Research on Deep Learning Algorithms for Semantic Segmentation of High-Resolution Remote Sensing Images
Abstract:
Semantic segmentation of remote sensing images is the computer vision task of assigning a semantic label to every pixel of a remote sensing image. With advances in sensor technology and deep learning, deep learning algorithms have far surpassed traditional algorithms in both accuracy and speed, and deep-learning-based semantic segmentation of high-resolution remote sensing images has become one of the main research directions for many scholars. This paper reviews the algorithms and network architectures that deep learning brings to remote sensing image semantic segmentation. It first introduces CNN-based semantic segmentation networks. It then discusses semantic segmentation algorithms for high-resolution remote sensing images from three perspectives: (1) combining multi-scale, multi-stage, and context aggregation strategies; (2) applying post-processing after semantic segmentation; and (3) incorporating attention mechanisms. It then introduces classic datasets, and finally summarizes and offers an outlook on the future development of deep learning algorithms for semantic segmentation of high-resolution remote sensing images.