An Image Segmentation Method Based on an Encoder-Decoder Architecture and the Potts Model
Abstract:
Deep learning-driven image segmentation has achieved remarkable results in fields such as medical imaging and autonomous driving, but its black-box decision-making provides little theoretical guidance for model selection and hyperparameter tuning and relies heavily on large datasets and substantial computational resources. Variational model-based methods, by contrast, are often limited to local feature extraction and tend to overlook global contextual relationships, yet their combination of global statistical regularities with local smoothness constraints gives them clear advantages in mathematical interpretability and in robustness to noise and artifacts. This paper therefore proposes a U-Net-like architecture obtained by unrolling the Potts model, aiming to improve segmentation accuracy and robustness. Unlike the standard U-Net, the proposed network inserts Potts-model-based regularization blocks into the downsampling and upsampling paths to strengthen region consistency and edge preservation. The Potts model is solved with the half-quadratic splitting (HQS) method combined with a Fields of Experts (FoE) regularization term; trainable discrete cosine transform (DCT)-Gaussian convolutions are used to learn the gradient operators, and the activation function takes the form of the soft-thresholding formula (STF). In addition, a Transformer module is placed at the deepest layer of the network to capture global contextual information and model long-range dependencies, further improving segmentation performance. Experimental results show that the proposed model learns effective features and improves segmentation accuracy with few parameters and limited training data. This study offers a new perspective on image segmentation and highlights the potential of hybrid architectures that combine deep neural networks with classical variational models.
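As a brief illustration of the unrolling idea summarized above, the following is a minimal sketch of a half-quadratic splitting (HQS) scheme for a generic two-phase, FoE-regularized Potts-type relaxation; the discrete notation ($u$, $f$, $K_k$, $\rho$, $\lambda$, $\beta$), the $\ell_1$ choice of expert, and the unrolling depth are illustrative assumptions, not the exact formulation developed in the paper.

\[
\min_{u \in [0,1]^{n}} \; \langle f, u \rangle \;+\; \lambda \sum_{k=1}^{N} \sum_{i=1}^{n} \rho\big( (K_k u)_i \big),
\]

where $f$ encodes the pixelwise data cost and the $K_k$ are learnable filters in the Fields of Experts sense. HQS decouples the filters from the nonlinearity by introducing auxiliary variables $v_k \approx K_k u$ with a penalty weight $\beta > 0$:

\[
\min_{u, \{v_k\}} \; \langle f, u \rangle \;+\; \lambda \sum_{k,i} \rho\big( (v_k)_i \big) \;+\; \frac{\beta}{2} \sum_{k} \| K_k u - v_k \|_2^2 .
\]

Alternating minimization gives two simple subproblems per iteration $t$. For an $\ell_1$-type expert $\rho(\cdot) = |\cdot|$, the $v$-update is exactly the soft-thresholding formula, applied entrywise:

\[
v_k^{t+1} \;=\; \operatorname{sign}\big(K_k u^{t}\big) \odot \max\big( \lvert K_k u^{t} \rvert - \lambda/\beta,\; 0 \big).
\]

The $u$-update is a quadratic problem whose stationarity condition,

\[
f \;+\; \beta \sum_{k} K_k^{\top} \big( K_k u^{t+1} - v_k^{t+1} \big) \;=\; 0,
\]

involves only the filters and their adjoints, i.e. convolutions and transposed convolutions; the box constraint on $u$ can be enforced afterwards by projection onto $[0,1]^{n}$. Unrolling a fixed number of these alternations, with each $K_k$ parameterized as a trainable DCT-Gaussian convolution and the threshold $\lambda/\beta$ made learnable, yields one regularization block of the kind inserted along the encoder-decoder path.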