Language-Driven Semantic Edge Detection
Abstract:
Semantic edge detection aims to delineate object boundaries precisely and to assign a category label to each boundary pixel, so it must solve localization and classification at the same time. This study introduces language-driven semantic edge detection, a simple framework that can be added to existing semantic contour detection models. The framework uses the semantic information embedded in text representations to recalibrate the attention of the edge detector, which strengthens the discriminative ability of high-level image features. Concretely, text features are introduced and combined with image features through cross-modal fusion, improving both the localization and the classification of the edge detector. Experiments on the SBD and Cityscapes datasets show consistent gains. For example, adding text feature information to CASENet raises the average ODS score on SBD from 70.4 to 72.6. Overall, language-driven semantic edge detection reaches a leading average ODS of 77.0, surpassing competing methods. Results for additional fusion strategies and backbone networks are also presented.
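The abstract does not specify how the text features are fused with the image features. As a rough illustration only, the sketch below assumes one plausible form of recalibration: a text embedding is projected to one sigmoid gate per image channel, and each channel is scaled by its gate, in the style of squeeze-and-excitation gating. The names `text_gates`, `recalibrate`, and `projection` are hypothetical, as are all toy values; this is not the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def text_gates(text_embedding, projection):
    """Project a text embedding to one gate per image channel.

    `projection` is a hypothetical learned matrix with one row per channel
    and one column per text-embedding dimension (an assumption of this sketch).
    """
    return [
        sigmoid(sum(w * t for w, t in zip(row, text_embedding)))
        for row in projection
    ]


def recalibrate(features, gates):
    """Scale every spatial value of channel c by gates[c]."""
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]


# Toy inputs: 2 channels of 2x2 image features and a 3-dimensional text embedding.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
text_embedding = [1.0, 0.0, -1.0]
projection = [
    [0.0, 0.0, 0.0],   # dot product 0, so the gate is sigmoid(0) = 0.5
    [5.0, 0.0, -5.0],  # dot product 10, so the gate is close to 1
]

gates = text_gates(text_embedding, projection)
fused = recalibrate(features, gates)
```

In this toy setup the first channel is halved while the second passes through almost unchanged, which is the sense in which text information could re-weight the image channels that matter for a given category.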