A Novel Transformer-Based Segmentation Network for Renal Parenchyma
Abstract:
The kidney is an important organ in the human body, and renal parenchymal disease is a common kidney disorder. At present, renal parenchymal lesions are identified manually: clinicians annotate the images themselves, which costs considerable time and labor. An automated annotation and segmentation method is therefore urgently needed to improve both the efficiency and the accuracy of renal parenchyma segmentation. Targeting the segmentation of pediatric renal parenchyma, this paper constructs a renal parenchyma dataset of children and, based on the characteristics of this dataset, proposes a Transformer-based segmentation method. Unlike conventional architectures that extract features with convolutions, the Transformer attends to semantic context; we use it as part of the encoder to extract semantic information, offering a new, sequence-to-sequence perspective on image segmentation. This alleviates both the shrinking receptive field caused by reduced resolution and the semantic gap introduced by skip connections, and the resulting segmentation maps greatly reduce the cost of manual annotation. The code is implemented in the PyTorch framework, and experiments on the proposed kidney-image dataset compare our network with the classical FCN, SegNet, U-Net, and DeepLab-V3+. On the three evaluation metrics precision, dice_coeff, and recall (measured against each baseline network's best result on each metric), our method improves by 1.99%, 1.65%, 2.23%, and 3.001%, respectively, and the results have also been endorsed by professional physicians.
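The abstract describes using a Transformer as part of the encoder and treating segmentation as sequence-to-sequence learning over image patches. The PyTorch sketch below illustrates that general pipeline (patch embedding, Transformer encoder over the token sequence, then upsampling back to a dense mask). It is a minimal illustration only: the class name `TransformerSegSketch`, the layer sizes, depth, and patch size are all assumptions for demonstration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TransformerSegSketch(nn.Module):
    """Illustrative ViT-style segmenter: patch tokens -> Transformer encoder
    -> simple upsampling head. All hyperparameters are placeholder choices."""

    def __init__(self, in_ch=1, num_classes=2, img_size=224, patch=16, dim=256):
        super().__init__()
        self.grid = img_size // patch  # number of patches per side
        # Patch embedding: each patch becomes one token of the sequence
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        # Learned positional embedding, one vector per patch token
        self.pos = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Decoder head: per-token class logits, upsampled to full resolution
        self.head = nn.Sequential(
            nn.Conv2d(dim, num_classes, kernel_size=1),
            nn.Upsample(scale_factor=patch, mode="bilinear", align_corners=False),
        )

    def forward(self, x):
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens + self.pos)           # global context via self-attention
        feat = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        return self.head(feat)                             # (B, num_classes, H, W)
```

Because every token attends to every other token, the receptive field is global at every encoder layer, which is the property the abstract contrasts with resolution-reducing convolutional encoders.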
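The comparison above is reported in terms of precision, dice_coeff, and recall. As a reference for how these overlap-based metrics are commonly computed on flattened binary masks, here is a minimal plain-Python sketch; the small `eps` smoothing term is a common convention assumed here, not taken from the paper.

```python
def _counts(pred, target):
    """Count true positives and the positive-pixel totals of two binary masks."""
    tp = sum(p * t for p, t in zip(pred, target))
    return tp, sum(pred), sum(target)

def dice_coeff(pred, target, eps=1e-6):
    # 2*|P ∩ T| / (|P| + |T|), smoothed to avoid division by zero
    tp, p_sum, t_sum = _counts(pred, target)
    return (2 * tp + eps) / (p_sum + t_sum + eps)

def precision(pred, target, eps=1e-6):
    # Fraction of predicted-positive pixels that are truly positive
    tp, p_sum, _ = _counts(pred, target)
    return (tp + eps) / (p_sum + eps)

def recall(pred, target, eps=1e-6):
    # Fraction of truly positive pixels that were predicted positive
    tp, _, t_sum = _counts(pred, target)
    return (tp + eps) / (t_sum + eps)
```

For example, masks `[1, 1, 0, 0]` versus `[1, 0, 1, 0]` share one true-positive pixel out of two positives in each mask, giving 0.5 on all three metrics.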