-  2018 

Video Summarization Based on Decoder Attention Mechanism

DOI: 10.11784/tdxbz201801077

Keywords: video summarization, visual attention model, encoder-decoder model, long short-term memory network


Abstract:

As a way to quickly browse and understand video content, video summarization has attracted wide attention. This paper treats video summarization as a sequence-to-sequence prediction problem, designs a novel decoder-based visual attention mechanism, and builds a supervised video summarization algorithm on it. To account for the intrinsic correlation between video frames, the method uses a long short-term memory (LSTM) network to focus attention on the previously decoded sequence, and fuses this decoding history to guide the decoding process, which improves prediction accuracy. Extensive experiments, mainly on the TVSum and SumMe datasets, demonstrate the effectiveness and superiority of the proposed method.
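The abstract gives no equations, so the following is only a minimal sketch of the general idea of attending over previously decoded states, not the paper's actual model. It assumes scaled dot-product attention and a sigmoid output layer, and every name in it (`attend_to_history`, `score_frame`, `w_out`) is hypothetical. In a real system the states would come from an LSTM decoder; here they are random vectors.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_to_history(current_state, past_states):
    """Scaled dot-product attention over past decoder states (an assumption).

    current_state: shape (d,), the decoder state at the current step
    past_states:   shape (t, d), decoder states from the t earlier steps
    Returns the context vector (d,) and the attention weights (t,).
    """
    d = current_state.shape[0]
    scores = past_states @ current_state / np.sqrt(d)
    weights = softmax(scores)
    context = weights @ past_states  # weighted sum of the decoding history
    return context, weights

def score_frame(current_state, past_states, w_out):
    # Hypothetical output layer: fuse the current state with the history
    # context and squash to a frame-importance score in (0, 1).
    context, _ = attend_to_history(current_state, past_states)
    features = np.concatenate([current_state, context])
    return 1.0 / (1.0 + np.exp(-(w_out @ features)))

# Toy usage with random stand-ins for LSTM decoder states.
rng = np.random.default_rng(0)
d = 4
past = rng.normal(size=(3, d))    # three previously decoded steps
state = rng.normal(size=d)        # current decoder state
w_out = rng.normal(size=2 * d)    # hypothetical output weights
context, weights = attend_to_history(state, past)
importance = score_frame(state, past, w_out)
```

The attention weights sum to one over the decoding history, so the context vector is a convex combination of earlier decoder states. This is one simple way that history could inform the score of the current frame.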

