-  2017 

Zero Shot Action Recognition Based on Local Preserving Canonical Correlation Analysis

DOI: 10.11784/tdxbz201607010

Keywords: zero shot learning (ZSL), action recognition, canonical correlation analysis (CCA), local preserving


Abstract:

The number of action categories to be recognized keeps growing, which makes it increasingly hard to label enough training data to learn a classifier for every category. Zero-shot learning (ZSL) was proposed to address this growing difficulty of collecting and exhaustively labeling data in conventional machine learning. This paper proposes a ZSL-based action recognition method built on a locality preserving canonical correlation analysis (LPCCA) mapping. Specifically, manifold-regularized CCA maps the visual features and the side information (auxiliary features) into a common feature space while preserving the local structure of both. The adverse effect of domain shift is also taken into account, and self-training and hubness correction are applied to improve the robustness of the method. Extensive experiments on the widely used human action datasets HMDB51 and UCF101 show that the proposed method achieves good zero-shot learning performance and outperforms state-of-the-art methods with a simple and efficient pipeline.
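The core idea in the abstract, projecting visual features and side information into a common CCA space and then matching test samples to unseen-class prototypes, can be illustrated with a minimal sketch. It rests on loud assumptions: it uses plain regularized linear CCA rather than the paper's manifold-regularized LPCCA, it omits self-training and hubness correction, and all data, dimensions, and names (`fit_cca`, `predict_unseen`, `protos`, `A`) are synthetic or hypothetical choices made for illustration, not taken from the paper.

```python
import numpy as np

def fit_cca(X, Y, dim, reg=1e-3):
    """Regularized linear CCA (not LPCCA). Returns view means and projections."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Lx = np.linalg.cholesky(Cxx)  # Cxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Cyy)  # Cyy = Ly @ Ly.T
    # Whitened cross-covariance Lx^{-1} Cxy Ly^{-T}; its SVD gives the canonical pairs.
    T = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, _, Vt = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :dim])
    Wy = np.linalg.solve(Ly.T, Vt[:dim].T)
    return mx, my, Wx, Wy

def predict_unseen(X, prototypes, model):
    """Assign each sample to the unseen-class prototype with highest cosine similarity."""
    mx, my, Wx, Wy = model
    zx = (X - mx) @ Wx
    zp = (prototypes - my) @ Wy
    zx = zx / np.linalg.norm(zx, axis=1, keepdims=True)
    zp = zp / np.linalg.norm(zp, axis=1, keepdims=True)
    return np.argmax(zx @ zp.T, axis=1)

# Synthetic stand-in data (assumption): visual features are a noisy linear
# function of each class's semantic prototype.
rng = np.random.default_rng(0)
d_vis, d_sem = 20, 5
n_seen, n_unseen, per_class = 8, 3, 40
protos = rng.normal(size=(n_seen + n_unseen, d_sem))
A = rng.normal(size=(d_sem, d_vis))

def sample(class_ids):
    labels = np.repeat(class_ids, per_class)
    feats = protos[labels] @ A + 0.05 * rng.normal(size=(labels.size, d_vis))
    return feats, labels

X_tr, y_tr = sample(np.arange(n_seen))                      # seen classes only
X_te, y_te = sample(np.arange(n_seen, n_seen + n_unseen))   # unseen classes

model = fit_cca(X_tr, protos[y_tr], dim=d_sem)
pred = predict_unseen(X_te, protos[n_seen:], model) + n_seen
acc = float(np.mean(pred == y_te))
print(f"unseen-class accuracy: {acc:.2f}")
```

The projections are learned from seen classes only; unseen classes enter solely through their prototypes at test time, which is what makes the setting zero-shot. A locality preserving variant would, roughly, weight the covariance terms by neighborhood graphs over each view, which this sketch does not attempt.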

