Triplet Cross-Modal Retrieval Based on TopN Pairwise Similarity Transfer

DOI: 10.12677/CSA.2021.1110256, pp. 2529-2537

Keywords: Cross-Modal Retrieval, Subspace Learning, Triplet Loss, Locality Preserving Projections, Pairwise Similarity Transfer


Abstract:

With the rapid development of science and technology, information on the Internet is increasingly multi-modal, and how to store and retrieve multi-modal information has become an active research topic. Cross-modal retrieval uses data of one modality to retrieve semantically related data of other modalities. Most current research focuses on bringing related samples as close together as possible, and pushing unrelated samples as far apart as possible, in a common subspace, while largely ignoring how the related samples are ranked. We therefore propose a triplet cross-modal retrieval method based on TopN pairwise similarity transfer. It uses a triplet loss and Locality Preserving Projections to construct a common subspace shared across modalities, and at the same time transfers the high-similarity relations between samples in the original space into the common subspace, imposing reasonable ranking constraints. Experiments on two classical cross-modal datasets demonstrate the effectiveness of the method.
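
The paper itself is not reproduced on this page, but the abstract names three concrete ingredients: a triplet loss over the shared subspace, a Locality Preserving Projections (LPP) style neighborhood penalty, and a TopN pairwise similarity transfer term. Below is a minimal, hypothetical PyTorch sketch of how such terms could compose; the function names, feature dimensions, margin, neighbor count n, and equal loss weighting are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of the three ingredients named in the abstract, not the
# authors' code: (1) a triplet loss in a shared subspace, (2) an LPP-style
# locality penalty, (3) transferring top-N original-space similarities.
import torch
import torch.nn.functional as F

def project(x, W):
    """Linear projection into the common subspace, L2-normalized."""
    return F.normalize(x @ W, dim=1)

def topn_similarity_transfer(orig_feats, sub_feats, n=5):
    """Match subspace similarities to each sample's top-n original-space ones.

    For each sample, find its n most similar neighbors in the original
    feature space and penalize the gap between those similarity values and
    the corresponding cosine similarities in the common subspace.
    """
    x = F.normalize(orig_feats, dim=1)
    s_orig = x @ x.T
    s_sub = sub_feats @ sub_feats.T           # sub_feats already normalized
    vals, idx = s_orig.topk(n + 1, dim=1)     # +1: self-similarity is 1.0
    vals, idx = vals[:, 1:], idx[:, 1:]       # drop the self pair
    return F.mse_loss(s_sub.gather(1, idx), vals)

def locality_preserving_penalty(sub_feats, affinity):
    """LPP-style term: sum_ij W_ij * ||z_i - z_j||^2 over neighbor pairs."""
    d2 = torch.cdist(sub_feats, sub_feats).pow(2)
    return (affinity * d2).sum() / affinity.sum().clamp(min=1e-8)

# Toy usage with random features; anchor/positive/negative triplets would in
# practice come from labeled image-text pairs.
torch.manual_seed(0)
img, txt = torch.randn(8, 128), torch.randn(8, 64)
W_img = torch.randn(128, 32, requires_grad=True)
W_txt = torch.randn(64, 32, requires_grad=True)
z_img, z_txt = project(img, W_img), project(txt, W_txt)

triplet = torch.nn.TripletMarginLoss(margin=0.3)
# e.g. anchor = image, positive = its paired text, negative = unrelated text
loss = (triplet(z_img[:4], z_txt[:4], z_txt[4:])
        + topn_similarity_transfer(img, z_img, n=3)
        + locality_preserving_penalty(z_img, (torch.rand(8, 8) > 0.7).float()))
loss.backward()
```

The sketch only shows how the losses compose over one modality's projection; a full method would apply the transfer and locality terms to both modalities and weight the three terms with tuned hyperparameters.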

