Sensors 2013

A Generalized Pyramid Matching Kernel for Human Action Recognition in Realistic Videos

DOI: 10.3390/s131114398

Jun Zhu, Quan Zhou, Weijia Zou, Rui Zhang, Wenjun Zhang

Keywords: video analysis, human action recognition, pyramid matching kernel, kernel-based classification method

Abstract:

Human action recognition is an increasingly important research topic in the fields of video sensing, analysis and understanding. Because of unconstrained sensing conditions, realistic videos exhibit large intra-class variations and inter-class ambiguities, which limit the recognition performance of current vision-based action recognition systems. In this paper, we propose a generalized pyramid matching kernel (GPMK) for recognizing human actions in realistic videos, based on a multi-channel “bag of words” representation constructed from local spatial-temporal features of video clips. As an extension of the spatial-temporal pyramid matching (STPM) kernel, the GPMK leverages heterogeneous visual cues across multiple feature descriptor types and spatial-temporal grid granularity levels to build a valid similarity metric between two video clips for kernel-based classification. Instead of the predefined, fixed weights used in STPM, we present a simple yet effective method that computes adaptive channel weights for the GPMK from training data, based on kernel target alignment. This method incorporates prior knowledge and the data-driven information of different channels in a principled way. Experimental results on three challenging video datasets (Hollywood2, Youtube and HMDB51) demonstrate that the GPMK outperforms the traditional STPM kernel for realistic human action recognition and also surpasses the state-of-the-art results reported in the literature.
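The abstract describes the GPMK only at a high level. The sketch below illustrates its two named ingredients, a weighted combination of per-channel kernels and channel weights derived from kernel target alignment, under several assumptions that are not taken from the paper: each channel is one bag-of-words histogram per clip for one (descriptor type, grid level) pair; the per-channel base kernel is histogram intersection; and each weight is the channel's alignment A(K, yy^T) = <K, yy^T>_F / sqrt(<K, K>_F <yy^T, yy^T>_F), clipped at zero and normalized to sum to one. All function names and this weighting rule are hypothetical; per the abstract, the authors' method also incorporates prior knowledge, which this sketch omits.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: one bag-of-words histogram per channel, where a
# channel is one (descriptor type, spatial-temporal grid level) pair.
Histogram = Sequence[float]
Matrix = List[List[float]]


def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """Base kernel (an assumption): sum of bin-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def gram_matrix(hists: List[Histogram]) -> Matrix:
    """Kernel matrix of a single channel over all training clips."""
    return [[histogram_intersection(a, b) for b in hists] for a in hists]


def frobenius(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product <A, B>_F = sum_ij A_ij * B_ij."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def kernel_target_alignment(k: Matrix, labels: Sequence[int]) -> float:
    """Alignment of K with the ideal kernel y y^T for labels in {-1, +1}."""
    target = [[yi * yj for yj in labels] for yi in labels]
    denom = math.sqrt(frobenius(k, k) * frobenius(target, target))
    return frobenius(k, target) / denom if denom > 0 else 0.0


def alignment_weights(channels: Dict[str, List[Histogram]],
                      labels: Sequence[int]) -> Dict[str, float]:
    """Hypothetical rule: clip negative alignments to zero, then normalize."""
    scores = {name: max(kernel_target_alignment(gram_matrix(h), labels), 0.0)
              for name, h in channels.items()}
    total = sum(scores.values())
    if total == 0.0:
        # No channel is aligned with the labels: fall back to uniform weights.
        return {name: 1.0 / len(scores) for name in scores}
    return {name: s / total for name, s in scores.items()}


def combined_kernel(x: Dict[str, Histogram], z: Dict[str, Histogram],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of per-channel kernels between two clips x and z."""
    return sum(w * histogram_intersection(x[name], z[name])
               for name, w in weights.items())


if __name__ == "__main__":
    labels = [1, 1, -1, -1]
    channels = {
        "good": [[1, 0], [1, 0], [0, 1], [0, 1]],   # separates the two classes
        "noise": [[1, 0], [0, 1], [1, 0], [0, 1]],  # unrelated to the labels
    }
    print(alignment_weights(channels, labels))  # → {'good': 1.0, 'noise': 0.0}
```

In this toy example the channel that separates the classes receives all of the weight and the label-independent channel receives none. The resulting combined kernel could then be supplied to any kernel classifier that accepts a precomputed Gram matrix, such as an SVM.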
