Image Retrieval Based on Deep Orthogonal Fusion of Self-Similarity Structural Features and Salient Features
Abstract:
In complex scenes, images have intricate content and rich detail, so the features extracted by deep learning networks often fail to represent the key information in an image. In this paper, we propose an image retrieval model that fuses orthogonal salient features with self-similarity descriptors. A self-similarity structural branch extracts the local self-similarity structure of the image and encodes it into compact self-similarity descriptors that describe the structural information within the image. An attention branch treats the values of all channels at the same spatial position of the feature map as one vector; norm-based attention generates a vector containing salient features, and self-attention and cross-attention then enhance these salient features. Finally, an orthogonal fusion module combines the structural and salient features to produce an effective representation of images in complex scenes. Experiments show that fusing salient and structural features improves the performance of image retrieval based on global representations.
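The self-similarity branch compares each feature vector with its spatial neighbours. The abstract does not specify the window size, the similarity measure, or the encoder, so the following is only a minimal pure-Python sketch of one common way to compute local self-similarity on a CNN feature map, under stated assumptions: cosine similarity over a (2r+1)×(2r+1) window, with out-of-map neighbours zero-padded. The helper `compact_descriptor` is hypothetical; it simply averages the per-position similarity vectors and stands in for whatever encoder the model actually uses to produce compact descriptors.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # fmap[y][x] is the C-dim channel vector at (y, x)


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two channel vectors (eps avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def local_self_similarity(fmap: FeatureMap, radius: int = 1) -> List[List[List[float]]]:
    """For every (y, x), return the cosine similarities between its vector and each
    neighbour in a (2r+1) x (2r+1) window, listed in row-major window order.
    Neighbours outside the map are zero-padded (similarity 0.0)."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sims = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < h and 0 <= nx < w
                    sims.append(cosine(fmap[y][x], fmap[ny][nx]) if inside else 0.0)
            row.append(sims)
        out.append(row)
    return out


def compact_descriptor(ssm: List[List[List[float]]]) -> List[float]:
    """Hypothetical stand-in for the descriptor encoder: average the per-position
    self-similarity vectors into a single (2r+1)^2-dimensional descriptor."""
    cells = [sims for row in ssm for sims in row]
    return [sum(c[i] for c in cells) / len(cells) for i in range(len(cells[0]))]
```

With `radius=1`, every position receives a 9-entry vector; entry 4 is the window centre, i.e. the vector compared with itself, so it equals 1 up to the `eps` term. The output size depends only on the window, not on the channel count, which is what makes the descriptor compact.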
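The attention branch treats the values of all channels at one spatial position as a single vector and uses norm-based attention to obtain a salient vector. The abstract does not define norm-based attention precisely, so this sketch rests on an assumption: the L2 norm of each pixel vector serves as its saliency score, and a softmax over those scores gives the weights used to pool the pixel vectors into one salient vector. The `temperature` parameter is likewise an illustrative assumption, not something the abstract mentions.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # fmap[y][x] is the C-dim channel vector at (y, x)


def pixel_vectors(fmap: FeatureMap) -> List[Vector]:
    """Flatten the map: the values of all channels at one position form one vector."""
    return [vec for row in fmap for vec in row]


def norm_attention(pixels: List[Vector], temperature: float = 1.0) -> Vector:
    """Pool pixel vectors into one salient vector (assumed formulation):
    score_i = ||v_i||_2, weight_i = softmax(score / temperature)_i, output = sum_i weight_i * v_i."""
    norms = [math.sqrt(sum(a * a for a in v)) for v in pixels]
    top = max(norms)  # subtract the max so the softmax cannot overflow
    exps = [math.exp((n - top) / temperature) for n in norms]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(pixels[0])
    return [sum(w * v[c] for w, v in zip(weights, pixels)) for c in range(dim)]
```

Vectors with large norms dominate the pooled result, which is the intuition behind using the norm as a saliency cue; lowering `temperature` pushes the pooling closer to selecting the single highest-norm pixel.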
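The abstract states that self-attention and cross-attention enhance the salient features but not which features act as queries, keys, and values. The sketch below implements plain scaled dot-product attention, softmax(q·k / √d) applied to the values, without learned projection matrices, and shows the two variants: in self-attention a token set attends to itself, and in cross-attention one set (for example, salient vectors) queries another (for example, structural vectors). That pairing is an assumption made for illustration only.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention without learned projections: each output row is
    the softmax(q . k / sqrt(d))-weighted average of the value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[c] for wi, v in zip(w, values)) for c in range(len(values[0]))])
    return out


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Queries, keys and values all come from the same token set."""
    return attention(tokens, tokens, tokens)


def cross_attention(queries: List[Vector], context: List[Vector]) -> List[Vector]:
    """Queries come from one set; keys and values come from another (the context)."""
    return attention(queries, context, context)
```

A full implementation would normally add learned query, key, and value projections, multiple heads, and a residual connection; they are omitted here so that the weighting mechanism itself stays visible.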
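The orthogonal fusion module combines the structural and salient features. The following is a simplified sketch in the style of the orthogonal fusion used in DOLG (Yang et al., ICCV 2021); whether this paper's module matches it exactly is an assumption. Each structural vector keeps only its component orthogonal to the salient vector, orth = s − (s·g / ‖g‖²)·g, so the fused descriptor does not repeat information the salient vector already carries. The orthogonal components are then average-pooled and concatenated after the salient vector. The sketch also assumes both feature types have the same dimension, and the average pooling is an illustrative choice.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthogonal_component(local: Vector, anchor: Vector, eps: float = 1e-8) -> Vector:
    """Subtract from `local` its projection onto `anchor`:
    orth = local - (local . anchor / ||anchor||^2) * anchor."""
    scale = dot(local, anchor) / (dot(anchor, anchor) + eps)
    return [a - scale * g for a, g in zip(local, anchor)]


def orthogonal_fusion(structural: List[Vector], salient: Vector) -> Vector:
    """Keep the part of each structural vector orthogonal to the salient vector,
    average-pool those parts, and concatenate the result after the salient vector."""
    orth = [orthogonal_component(s, salient) for s in structural]
    pooled = [sum(o[c] for o in orth) / len(orth) for c in range(len(salient))]
    return salient + pooled  # list concatenation: [salient | pooled orthogonal part]
```

Because projection removal is linear, the pooled vector remains orthogonal to the salient vector (up to `eps`), so the two halves of the fused descriptor carry complementary information.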