A Dual-Branch Parallel Medical Image Segmentation Model Based on TransUNet
Abstract:
In medical image segmentation, to fully exploit the ability of the TransUNet model to capture both global and local features, we propose DouTransNet, built on TransUNet. In the encoder, to address the single learning perspective of the single-branch Transformer module and its tendency to lose fine details, the Transformer is redesigned as a dual-branch parallel structure that extracts features at different scales and fuses the two branches so that their features complement each other. Because fusing the two branches may introduce redundant information, a multi-kernel parallel pooling module is added to remove redundancy while preserving multi-scale features. In the decoder, a multi-scale fusion module (USF) fuses information from three encoder scales, effectively bridging the information gap between encoder and decoder. Extensive comparison experiments were conducted on the Synapse and ACDC datasets. On Synapse, DouTransNet reaches an average DSC of 79.20%, 3.34 percentage points higher than TransUNet, with an HD of 25.24, a reduction of 11.67; on ACDC, it reaches an average DSC of 90.30%, an improvement of 1.67 percentage points.
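The abstract names a multi-kernel parallel pooling module but does not specify its kernel sizes or fusion strategy. The sketch below is an assumption for illustration only: average pooling at several kernel sizes (here 1, 2, and 4), nearest-neighbour upsampling back to the input resolution, and channel-wise concatenation, which is one common way to retain multi-scale context while smoothing away fine-grained redundancy.

```python
import numpy as np

def avg_pool2d(x, k):
    # Non-overlapping average pooling with kernel size and stride k.
    # Assumes H and W are divisible by k; x has shape (C, H, W).
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def multi_kernel_pooling(x, kernels=(1, 2, 4)):
    # Pool the feature map at several kernel sizes, restore the spatial
    # resolution by nearest-neighbour upsampling, and concatenate the
    # results along the channel axis.
    feats = []
    for k in kernels:
        pooled = avg_pool2d(x, k)
        up = pooled.repeat(k, axis=1).repeat(k, axis=2)  # nearest-neighbour upsampling
        feats.append(up)
    return np.concatenate(feats, axis=0)

x = np.random.rand(8, 16, 16)          # a toy (C, H, W) feature map
y = multi_kernel_pooling(x)
print(y.shape)                          # (24, 16, 16): 8 channels per branch, 3 branches
```

The kernel-1 branch passes the input through unchanged, so the original detail is preserved alongside the smoothed multi-scale context; a 1×1 convolution would typically follow to compress the concatenated channels, which is omitted here.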
[1] Zheng, Z., Chen, D. and Huang, Y. (2024) Image Semantic Segmentation Approach for Studying Human Behavior on Image Data. Wuhan University Journal of Natural Sciences, 29, 145-153. https://doi.org/10.1051/wujns/2024292145
[2] Li, D., Lin, C. and Bing, L. (2023) Surveys on the Application of Neural Networks to Event Extraction. The Journal of China Universities of Posts and Telecommunications, 30, 43-54, 66.
[3] Wu, T., Hu, H., Feng, Y., et al. (2024) Application of the Segment Anything Model (SAM) in Medical Image Segmentation [J/OL]. http://kns.cnki.net/kcms/detail/31.1339.TN.20240520.1426.004.html, 2024-05-30.
[4] Chen, J.N., Lu, Y.Y., Yu, Q.H., et al. (2021) TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv:2102.04306.
[5] Gu, Z., Cheng, J., Fu, H., Zhou, K., Hao, H., Zhao, Y., et al. (2019) CE-Net: Context Encoder Network for 2D Medical Image Segmentation. IEEE Transactions on Medical Imaging, 38, 2281-2292. https://doi.org/10.1109/tmi.2019.2903562
[6] Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., et al. (2023) Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In: Lecture Notes in Computer Science, Springer, 205-218. https://doi.org/10.1007/978-3-031-25066-8_9
[7] He, K., Zhang, X., Ren, S. and Sun, J. (2016) Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 770-778. https://doi.org/10.1109/cvpr.2016.90
[8] Lecun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998) Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86, 2278-2324. https://doi.org/10.1109/5.726791
[9] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z. (2016) Rethinking the Inception Architecture for Computer Vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 2818-2826. https://doi.org/10.1109/cvpr.2016.308
[10] Milletari, F., Navab, N. and Ahmadi, S. (2016) V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. 2016 Fourth International Conference on 3D Vision, Stanford, 25-28 October 2016, 565-571. https://doi.org/10.1109/3dv.2016.79
[11] Landman, B., Xu, Z.B., Iglesias, J., et al. (2015) MICCAI Multi-Atlas Labeling Beyond the Cranial Vault: Workshop and Challenges.
[12] Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Yang, X., Heng, P., et al. (2018) Deep Learning Techniques for Automatic MRI Cardiac Multi-Structures Segmentation and Diagnosis: Is the Problem Solved? IEEE Transactions on Medical Imaging, 37, 2514-2525. https://doi.org/10.1109/tmi.2018.2837502
[13] Ronneberger, O., Fischer, P. and Brox, T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab, N., Hornegger, J., Wells, W.M. and Frangi, A.F., Eds., Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Springer, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
[14] Oktay, O., Schlemper, J., Folgoc, L.L., et al. (2018) Attention U-Net: Learning Where to Look for the Pancreas. arXiv:1804.03999.
[15] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2021) An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. Proceedings of the International Conference on Learning Representations (ICLR 2021). https://doi.org/10.48550/arXiv.2010.11929
[16] Huang, X., Deng, Z., Li, D., Yuan, X. and Fu, Y. (2023) Missformer: An Effective Transformer for 2D Medical Image Segmentation. IEEE Transactions on Medical Imaging, 42, 1484-1494. https://doi.org/10.1109/tmi.2022.3230943
[17] Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., et al. (2022) UNETR: Transformers for 3D Medical Image Segmentation. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, 3-8 January 2022, 1748-1758. https://doi.org/10.1109/wacv51458.2022.00181