Research on Cross-Modal Retrieval Based on Matrix Factorization and Similarity Preservation
Abstract:
Earlier hashing-based cross-modal retrieval methods are poorly suited to big-data scenarios because of limitations in semantic extraction and slow running speed. This paper therefore proposes a new framework called Unique Similar Hashing (USH). USH is a two-step hashing method: it first learns the hash codes and then learns the hash functions. In the first step, a kernel function projects the data nonlinearly into a kernel space, and matrix factorization is then used to learn a latent space. The hash codes are learned from this latent space; to avoid quantization error, the discrete constraint on the hash codes is not relaxed, and their closed-form solution is computed directly. Once high-quality hash codes have been learned, a hash function is learned that maps the original samples into a low-dimensional Hamming space. In experiments on the Wiki dataset against state-of-the-art methods, USH achieves good results in mean average precision (mAP), which supports the effectiveness of the method.
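The two-step structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the USH objective, which the abstract does not state; it is a simplified stand-in under explicit assumptions: RBF kernel features against randomly chosen anchor points, a shared latent matrix `V` learned by ridge-regularized alternating least squares, codes obtained as `sign(V)` (the exact minimizer of the binarization subproblem only), and ridge regression as the hash function. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np


def rbf_features(X, anchors, sigma=None):
    """Map rows of X to RBF kernel features against a set of anchor points."""
    sq_dist = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    if sigma is None:
        # Assumption: simple bandwidth heuristic, not taken from the paper.
        sigma = np.sqrt(sq_dist.mean())
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def learn_hash_codes(K1, K2, n_bits, lam=1e-2, n_iter=20, seed=0):
    """Step 1: factorize both kernel matrices with a shared latent V, then binarize."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((K1.shape[0], n_bits))
    I = np.eye(n_bits)
    for _ in range(n_iter):
        # Per-modality bases given V: (V^T V + lam I) U = V^T K.
        G = V.T @ V + lam * I
        U1 = np.linalg.solve(G, V.T @ K1)
        U2 = np.linalg.solve(G, V.T @ K2)
        # Shared latent factors given the bases: V A = K1 U1^T + K2 U2^T.
        A = U1 @ U1.T + U2 @ U2.T + lam * I
        V = np.linalg.solve(A, (K1 @ U1.T + K2 @ U2.T).T).T
    # sign(V) minimizes ||B - V||_F^2 over B in {-1, +1} without relaxation.
    return np.where(V >= 0, 1.0, -1.0)


def learn_hash_function(K, B, lam=1e-2):
    """Step 2: ridge regression from kernel features to the learned codes."""
    m = K.shape[1]
    return np.linalg.solve(K.T @ K + lam * np.eye(m), K.T @ B)


def encode(K, W):
    """Apply a learned hash function and binarize to {-1, +1}."""
    return np.where(K @ W >= 0, 1, -1)


# Toy paired data: two modalities generated from the same hidden factors.
rng = np.random.default_rng(0)
n, d_img, d_txt, n_anchors, n_bits = 200, 16, 10, 32, 8
Z = rng.standard_normal((n, 4))
X_img = Z @ rng.standard_normal((4, d_img)) + 0.1 * rng.standard_normal((n, d_img))
X_txt = Z @ rng.standard_normal((4, d_txt)) + 0.1 * rng.standard_normal((n, d_txt))

anchor_idx = rng.choice(n, n_anchors, replace=False)
K_img = rbf_features(X_img, X_img[anchor_idx])
K_txt = rbf_features(X_txt, X_txt[anchor_idx])

B = learn_hash_codes(K_img, K_txt, n_bits)   # step 1: hash codes
W_img = learn_hash_function(K_img, B)        # step 2: one hash function per modality
W_txt = learn_hash_function(K_txt, B)

codes_img = encode(K_img, W_img)
codes_txt = encode(K_txt, W_txt)

# Hamming distance between every image code and every text code.
hamming = (n_bits - codes_img @ codes_txt.T) // 2
```

For cross-modal retrieval, each row of `hamming` would be sorted in ascending order to rank text items for an image query; mAP is then computed over such rankings using the class labels, which this toy example does not include.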