- 2015
Automatic Image Annotation Algorithm Using Deep Boltzmann Machine and Canonical Correlation Analysis
Abstract:
An automatic image annotation algorithm based on the deep Boltzmann machine (DBM) and canonical correlation analysis (CCA), named DBM-CCA, is proposed. The algorithm uses a DBM to transform low-level features of images and text into sparse high-level abstract concepts, and uses CCA to build a subspace mapping between the two modalities from which annotation words are generated. First, while the DBM extracts high-level features, a multiple Bernoulli distribution is used to fit the annotation words and a Gaussian distribution is used to fit the image features. Next, in the canonical variable space formed by the high-level image and annotation-word features, the Mahalanobis distance between the image to be annotated and each training image is computed, and these distances are used as weights to obtain the high-level annotation-word feature of the image to be annotated. Finally, the annotation words are generated by mean-field inference. Experimental results on the Corel5K image dataset show that the proposed algorithm improves annotation accuracy: compared with the classic supervised multi-class labeling (SML) method and the multiple Bernoulli relevance model (MBRM), the mean precision and the recall-precision mean ratio increase by 10% and 5%, respectively.
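The CCA and distance-weighting stage described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the DBM feature extraction and the final mean-field step are omitted, the arrays `X` (high-level image features) and `Y` (high-level annotation-word features) are assumed to be given, and the function names, the regularization constant `reg`, and the `exp(-d)` weighting rule are all hypothetical choices made for illustration, since the abstract does not specify how the distances are turned into weights.

```python
import numpy as np

def fit_cca(X, Y, k, reg=1e-3):
    """Linear CCA via Cholesky whitening and SVD (illustrative helper)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized covariance and cross-covariance matrices.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Sxx)  # Sxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Syy)  # Syy = Ly @ Ly.T
    # Whitened cross-covariance T = Lx^{-1} Sxy Ly^{-T}.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return mx, my, Wx, Wy, s[:k]

def weighted_text_feature(x_query, X_train, H_text, mx, Wx):
    """Estimate a query image's text feature from its training neighbors.

    Distances are Mahalanobis distances in the canonical space; the
    exp(-d) weighting is an assumption, not taken from the paper.
    """
    Z = (X_train - mx) @ Wx          # training images in canonical space
    z = (x_query - mx) @ Wx          # query image in canonical space
    cov = np.atleast_2d(np.cov(Z, rowvar=False)) + 1e-6 * np.eye(Z.shape[1])
    diff = Z - z
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    d = np.sqrt(np.maximum(d2, 0.0))
    w = np.exp(-d)
    w /= w.sum()                     # normalize weights to sum to 1
    return w @ H_text, w

# Synthetic stand-ins for DBM outputs (purely illustrative data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(200, 6))
Y = latent @ rng.normal(size=(3, 5)) + 0.1 * rng.normal(size=(200, 5))

mx, my, Wx, Wy, corr = fit_cca(X, Y, k=3)
feat, w = weighted_text_feature(X[7], X, Y, mx, Wx)
```

In the algorithm as described, the resulting feature vector `feat` would then be passed to the text pathway of the DBM, where mean-field inference produces the annotation words; that step is outside the scope of this sketch.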