Multi-Video Summarization Based on an Importance-Aware Sparse Autoencoder

2018
Abstract:
Managing and searching massive volumes of video data effectively is an urgent problem in the era of big data. Query-based multi-video summarization, which provides comprehensive yet concise information about the content relevant to a query, is one promising way to address it. However, the content of multiple videos is diverse, noisy, and redundant, so finding the most representative information among them is very challenging. To meet this challenge, we propose a multi-video summarization model that is based on a sparse autoencoder and uses the content of web query images as a regularization term. The model not only satisfies the criteria of representativeness and conciseness but can also perceive query-dependent importance. Extensive experiments demonstrate the effectiveness and superiority of the proposed model.
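The abstract names only the ingredients of the model: a sparse autoencoder over frame features, with web query images entering as a regularization term. The following is a minimal NumPy sketch of what such an objective could look like. It assumes the classic sparse-autoencoder formulation (a sigmoid encoder, a linear decoder, and a KL-divergence sparsity penalty on the mean hidden activations), plus a query term that asks the same network to reconstruct the query images as well. The function names, this particular query term, and the hyperparameters `rho`, `beta`, `lam`, and `gamma` are hypothetical illustrations. They are not the paper's exact formulation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def kl_sparsity(rho, rho_hat, eps=1e-8):
    """Sum over hidden units of KL(rho || rho_hat_j), the classic sparse-AE penalty."""
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))


def reconstruct(X, W1, b1, W2, b2):
    """One-hidden-layer autoencoder: sigmoid encoder, linear decoder."""
    H = sigmoid(X @ W1 + b1)      # hidden codes, shape (n, k)
    return H, H @ W2 + b2         # reconstruction, shape (n, d)


def objective(X, Q, params, rho=0.05, beta=3.0, lam=1e-4, gamma=1.0):
    """Hypothetical query-regularized sparse-autoencoder loss.

    X: (n, d) frame features pooled from all videos returned for one query.
    Q: (m, d) features of web images returned for the same query.
    """
    W1, b1, W2, b2 = params
    H, X_rec = reconstruct(X, W1, b1, W2, b2)
    recon = 0.5 * np.mean(np.sum((X - X_rec) ** 2, axis=1))
    sparsity = beta * kl_sparsity(rho, H.mean(axis=0))
    weight_decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    # Assumed query term: the same encoder/decoder should also reconstruct
    # the query images well, pulling the learned codes toward query content.
    _, Q_rec = reconstruct(Q, W1, b1, W2, b2)
    query = 0.5 * gamma * np.mean(np.sum((Q - Q_rec) ** 2, axis=1))
    return recon + sparsity + weight_decay + query


# Toy usage with random stand-in features (no real video data).
rng = np.random.default_rng(0)
d, k = 16, 8
X = rng.normal(size=(50, d))
Q = rng.normal(size=(5, d))
params = (0.1 * rng.normal(size=(d, k)), np.zeros(k),
          0.1 * rng.normal(size=(k, d)), np.zeros(d))
loss = objective(X, Q, params)
```

Minimizing this loss with any gradient-based optimizer trades off three things: how well sparse codes summarize all frames, how compact those codes are, and how well they cover query-image content. The importance-aware weighting and the keyframe selection step described in the paper are not reproduced in this sketch.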