A Contrastive Learning-Enhanced LoRA Fine-Tuned Ultrasound Image Segmentation Model
Abstract:
Ultrasound image analysis plays a critical role in modern medicine, but precise segmentation remains one of its main challenges. Although existing deep learning models such as SAM perform well on natural images, a performance gap remains in medical image segmentation. This study proposes USCL-Med3D, a contrastive learning-enhanced, LoRA fine-tuned SAM-Med3D model for ultrasound image segmentation, which aims to improve the accuracy and efficiency of 3D ultrasound image segmentation. To this end, we design a semi-supervised pseudo-label training method that obtains annotated data automatically, reducing annotation difficulty while maintaining annotation quality. We also introduce a contrastive learning architecture, VCL-head, to strengthen the model's ability to extract contextual information from 3D ultrasound images. In addition, we fine-tune SAM-Med3D with LoRA to improve its segmentation ability on 3D ultrasound data. Experimental results show that the proposed method achieves excellent segmentation performance on the 3D ultrasound dataset and on several public 3D medical imaging datasets.
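For readers unfamiliar with LoRA, the idea behind the fine-tuning step can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the class name `LoRALinear`, the helper `matvec`, the rank, the scaling factor, and the toy weights are all hypothetical choices made for illustration. The sketch only shows the general technique, in which a pretrained weight matrix W stays frozen while a low-rank update (alpha / r) * B * A is learned, with B initialised to zero so the adapted layer starts out identical to the pretrained one.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical illustration only; names and defaults are assumptions.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        # A gets small random values and B starts at zero, so the low-rank
        # update contributes nothing until B has been trained.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)                # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        """Number of parameters in A and B, the only ones that are updated."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy usage: a 2x4 frozen weight adapted with rank 1.
W = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 3.0]]
layer = LoRALinear(W, rank=1, alpha=2.0)
x = [1.0, 2.0, 3.0, 4.0]
initial_output = layer.forward(x)      # equals W x because B is zero
trainable = layer.num_trainable()      # 4 (A) + 2 (B) parameters
```

In this toy case the frozen matrix has 8 entries while the adapter has 6, so nothing is saved; the parameter reduction only appears when the rank is much smaller than the layer dimensions, as is typical for transformer projection matrices.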