Cascaded tracklet-based spatio-temporal model for video pose estimation (2018)

DOI: 10.6040/j.issn.1672-3961.0.2017.431

Keywords: tracklet, pose estimation, Markov random field, hidden Markov model


Abstract:

To address the problem of full-body human pose estimation in monocular video, a coarse-to-fine cascade of spatio-temporal models was developed, with the tracklet of a body part as the basic unit. The notion of a "tracklet" ranges from a trajectory covering the whole video down to a single body part in one frame; successive levels of the cascade progressively shrink the temporal coverage of the tracklets, and each coarse level filters the state space for the next level via its max-marginals. Because loops make exact inference in the graphical models intractable, the models were decomposed into Markov random fields and hidden Markov models, and an iterative strategy of alternating spatial and temporal parsing obtains the optimal solution in polynomial time: inference starts from complete trajectories and filters the state space level by level until the optimal state of every body part in every frame is reached. To generate reliable state hypotheses, single-frame pose detections were propagated across the whole video through global motion cues to form trajectories, which constitute the initial state space. Comparative experiments on three publicly available datasets show quantitative and qualitative improvements over state-of-the-art video pose estimation approaches.
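The cascade's central operation, as described above, is pruning each level's candidate state space via max-marginals before handing the survivors to the next, finer level; on the temporal side this reduces to max-product message passing along a chain of tracklet states (a hidden Markov model). The following Python/NumPy sketch is an illustration of that step, not the authors' implementation: the log-space unary scores, the shared pairwise transition matrix, and the margin-based pruning threshold are all assumptions made for the example.

# Minimal sketch (not the paper's code) of max-marginal pruning on a
# chain-structured temporal model. All scores are in log space:
# unary[t, s] scores state s at step t, pairwise[i, j] scores the
# transition i -> j. States whose max-marginal falls more than `margin`
# below the MAP score are dropped before the next cascade level.
import numpy as np

def max_marginals(unary, pairwise):
    """Forward/backward max-product on a chain.

    unary:    (T, S) per-node state scores.
    pairwise: (S, S) transition scores.
    Returns the (T, S) max-marginals and the MAP score.
    """
    T, S = unary.shape
    fwd = np.zeros((T, S))   # best score of any path ending at (t, s)
    bwd = np.zeros((T, S))   # best score of any path starting at (t, s)
    fwd[0] = unary[0]
    for t in range(1, T):
        fwd[t] = unary[t] + np.max(fwd[t - 1][:, None] + pairwise, axis=0)
    bwd[T - 1] = unary[T - 1]
    for t in range(T - 2, -1, -1):
        bwd[t] = unary[t] + np.max(pairwise + bwd[t + 1][None, :], axis=1)
    mm = fwd + bwd - unary   # unary is counted twice, subtract it once
    return mm, fwd[T - 1].max()

def prune_states(unary, pairwise, margin=5.0):
    """Boolean mask of states whose max-marginal is within `margin` of the MAP."""
    mm, map_score = max_marginals(unary, pairwise)
    return mm >= map_score - margin

# Toy usage: 4 time steps, 6 candidate states per step.
rng = np.random.default_rng(0)
unary = rng.normal(size=(4, 6))
pairwise = -np.abs(np.arange(6)[:, None] - np.arange(6)[None, :]).astype(float)
print(prune_states(unary, pairwise, margin=3.0))

In the paper's cascade, the surviving states would seed the candidate set of the next level, whose tracklets cover a shorter temporal span, and this temporal pass would alternate with a spatial pass over the Markov random field linking body parts within a frame.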
