
Flexible Human Behavior Analysis Framework for Video Surveillance Applications

DOI: 10.1155/2010/920121


Abstract:

We study a flexible framework for semantic analysis of human motion from surveillance video. Successful trajectory estimation and human-body modeling facilitate the semantic analysis of human activities in video sequences. Although human motion is widely investigated, we have extended such research in three aspects. First, by adding a second camera, not only does behavior analysis become more reliable, but the ongoing scene events can also be mapped onto a 3D setting to facilitate further semantic analysis. Second, we introduce a 3D reconstruction scheme for scene understanding. Third, we present a fast scheme that detects different body parts and generates a fitting skeleton model, without the explicit assumption of an upright body posture. The extension to multiple-view fusion improves the event-based semantic analysis by 15%–30%. The proposed framework proves its effectiveness by achieving near real-time performance: 13–15 frames/second for monocular and 6–8 frames/second for two-view video sequences.

1. Introduction

Visual surveillance for human-behavior analysis has been investigated worldwide as an active research topic [1]. For automatic surveillance to be accepted by a large community, it requires sufficiently high accuracy, and its computational complexity should permit real-time performance. In video-based surveillance applications, knowing the motion of persons is not sufficient to describe their postures, yet posture provides important clues for understanding their activities. Accurate detection and recognition of various human postures therefore both contribute to scene understanding. With a single camera, accuracy is hampered in complex situations where several people undertake actions in the same scene. The posture of a person is often occluded, so that the behavior cannot be recognized with high accuracy.
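The two-view idea above rests on a standard geometric step: once the same scene point is observed from two calibrated cameras, its 3D position can be recovered by intersecting the two viewing rays. The sketch below illustrates this with the classic ray-midpoint method; the camera centers, ray directions, and scene point are hypothetical values for illustration, not data from the paper.

```python
# Minimal two-view triangulation sketch (ray-midpoint method).
# All coordinates below are hypothetical; a real system would derive the
# ray directions from calibrated camera matrices and pixel observations.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between the rays
    P1(s) = c1 + s*d1 and P2(t) = c2 + t*d2."""
    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b          # zero only for parallel rays
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]

# Both rays pass exactly through the scene point (0.5, 1.0, 3.0),
# so the recovered midpoint equals that point.
point = triangulate_midpoint((0, 0, 0), (0.5, 1.0, 3.0),
                             (1, 0, 0), (-0.5, 1.0, 3.0))
```

With noisy detections the two rays no longer intersect, and the midpoint serves as the least-squares compromise; this is one simple way a second camera turns 2D tracks into 3D scene positions.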
In this paper, we improve the analysis accuracy by exploiting a second camera and mapping the events into a 3D scene model, which enables analysis of the behavior in the 3D domain. Let us now discuss related work from the literature.

1.1. Related Work

Most surveillance systems have focused on understanding events through the study of trajectories and positions of persons, using a priori knowledge about the scene. The Pfinder [2] system was developed to describe a moving person in an indoor environment; it tracks a single nonoccluded person in complex scenes. The VSAM [3] system can monitor

References

[1]  W. Hu, T. Tan, L. Wang, and S. Maybank, “A survey on visual surveillance of object motion and behaviors,” IEEE Transactions on Systems, Man and Cybernetics Part C, vol. 34, no. 3, pp. 334–352, 2004.
[2]  C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, “Pfinder: real-time tracking of the human body,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 780–785, 1997.
[3]  R. T. Collins, A. J. Lipton, T. Kanade, et al., “A system for video surveillance and monitoring,” Tech. Rep. CMU-RI-TR-00-12, CMU, Pittsburgh, Pa, USA, 2000.
[4]  I. Haritaoglu, D. Harwood, and L. S. Davis, “W4: real-time surveillance of people and their activities,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 809–830, 2000.
[5]  J. Han, D. Farin, and P. H. N. de With, “A real-time augmented-reality system for sports broadcast video enhancement,” in Proceedings of the ACM International Multimedia Conference and Exhibition, pp. 337–340, Augsburg, Germany, 2007.
[6]  Z. Zivkovic and F. van der Heijden, “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Pattern Recognition Letters, vol. 27, no. 7, pp. 773–780, 2006.
[7]  J. Han, D. Farin, P. H. N. de With, and W. Lao, “Real-time video content analysis tool for consumer media storage system,” IEEE Transactions on Consumer Electronics, vol. 52, no. 3, pp. 870–878, 2006.
[8]  L. R. Rabiner, “Tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[9]  P. Viola, M. J. Jones, and D. Snow, “Detecting pedestrians using patterns of motion and appearance,” in Proceedings of the IEEE International Conference on Computer Vision, vol. 2, pp. 734–741, 2003.
[10]  S. Park and J. K. Aggarwal, “Simultaneous tracking of multiple body parts of interacting persons,” Computer Vision and Image Understanding, vol. 102, no. 1, pp. 1–21, 2006.
[11]  H. Fujiyoshi, A. J. Lipton, and T. Kanade, “Real-time human motion analysis by image skeletonization,” IEICE Transactions on Information and Systems, vol. E87, no. 1, pp. 113–120, 2004.
[12]  C.-C. Yu, J.-N. Hwang, G.-F. Ho, and C.-H. Hsieh, “Automatic human body tracking and modeling from monocular video sequences,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 1, pp. I917–I920, Honolulu, Hawaii, USA, 2007.
[13]  P. Peursum, H. H. Bui, S. Venkatesh, and G. West, “Robust recognition and segmentation of human actions using HMMs with missing observations,” EURASIP Journal on Applied Signal Processing, vol. 2005, no. 13, pp. 2110–2126, 2005.
[14]  W. Lao, J. Han, and P. H. N. de With, “Fast detection and modeling of human-body parts from monocular video,” in Articulated Motion and Deformable Objects, vol. 5098 of Lecture Notes in Computer Science, pp. 380–389, Springer, Berlin, Germany, 2008.
[15]  J. F. Allen and G. Ferguson, “Actions and events in interval temporal logic,” Journal of Logic Computation, vol. 4, pp. 531–579, 1994.
[16]  J. Han, M. Feng, and P. H. N. de With, “A real-time video surveillance system with human occlusion handling using nonlinear regression,” in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '08), pp. 305–308, Hannover, Germany, 2008.

