Research on Optimization of Cross-Modal Retrieval Based on Ranking Loss

DOI: 10.12677/mos.2025.141012, PP. 116-121

Keywords: Cross-Modal Retrieval, Ranking Loss, Similarity Measure


Abstract:

Cross-modal retrieval retrieves data in one modality (such as text or images) given a query in another. Traditional cross-modal retrieval methods rely primarily on modality alignment and similarity measures to match features across modalities. This paper proposes a ranking-based cross-modal retrieval method that optimizes the retrieval process by introducing a ranking loss, so that items most relevant to the query are ranked at the top of the results. Experimental results show that introducing the ranking loss significantly improves cross-modal retrieval performance, particularly on text-image matching, providing a new methodological perspective and a solid technical foundation for future research.
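The abstract does not spell out the exact loss formulation. As an illustration of the general idea only, the sketch below shows a common margin-based (hinge) ranking loss used in image-text matching, written in PyTorch; the function name ranking_loss, the margin value, and the embedding setup are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ranking_loss(image_emb, text_emb, margin=0.2):
    """Bidirectional hinge-based ranking loss for image-text matching.

    image_emb, text_emb: (batch, dim) L2-normalized embeddings of matched
    image-text pairs (row i of each tensor belongs to the same pair).
    All other rows in the batch act as negatives.
    """
    # Cosine similarity matrix: scores[i, j] = sim(image_i, text_j)
    scores = image_emb @ text_emb.t()
    pos = scores.diag().view(-1, 1)  # similarity of matched pairs

    # Hinge: every negative should score at least `margin` below its positive
    cost_txt = (margin + scores - pos).clamp(min=0)      # image -> text retrieval
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # text -> image retrieval

    # Zero out the diagonal (the positives themselves)
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_txt = cost_txt.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)

    return cost_txt.sum() + cost_img.sum()

# Usage sketch: embeddings from any image/text encoders, L2-normalized
img = F.normalize(torch.randn(32, 512), dim=1)
txt = F.normalize(torch.randn(32, 512), dim=1)
loss = ranking_loss(img, txt)
```

Minimizing such a loss pushes matched image-text pairs above mismatched ones in the similarity ranking, which is the behavior the abstract describes: items more relevant to the query end up ranked higher in the retrieval results.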

