%0 Journal Article
%T Flexible Human Behavior Analysis Framework for Video Surveillance Applications
%A Weilun Lao
%A Jungong Han
%A Peter H. N. de With
%J International Journal of Digital Multimedia Broadcasting
%D 2010
%I Hindawi Publishing Corporation
%R 10.1155/2010/920121
%X We study a flexible framework for semantic analysis of human motion from surveillance video. Successful trajectory estimation and human-body modeling facilitate the semantic analysis of human activities in video sequences. Although human motion is widely investigated, we have extended such research in three aspects. First, adding a second camera not only makes behavior analysis more reliable but also allows ongoing scene events to be mapped onto a 3D setting to facilitate further semantic analysis. The second contribution is the introduction of a 3D reconstruction scheme for scene understanding. Third, we apply a fast scheme to detect different body parts and generate a fitting skeleton model, without explicitly assuming an upright body posture. The extension to multiple-view fusion improves the event-based semantic analysis by 15%–30%. Our proposed framework proves its effectiveness, as it achieves near real-time performance (13–15 frames/second and 6–8 frames/second) for monocular and two-view video sequences, respectively.
1. Introduction
Visual surveillance for human-behavior analysis has been investigated worldwide as an active research topic [1]. For automatic surveillance to be accepted by a large community, it requires sufficiently high accuracy, and its computational complexity should permit real-time performance. In video-based surveillance applications, knowing the motion of a person is not sufficient to describe that person's posture. The postures of persons can provide important clues for understanding their activities. Therefore, accurate detection and recognition of various human postures both contribute to scene understanding.
The accuracy of a single-camera system is hampered in complex situations where several people act in the same scene. Often, people's postures are occluded, so their behavior cannot be recognized with high accuracy. In this paper, we improve the analysis accuracy by exploiting a second camera and mapping events into a 3D scene model, which enables analysis of the behavior in the 3D domain. Let us now discuss related work from the literature.
1.1. Related Work
Most surveillance systems have focused on understanding events through the study of trajectories and positions of persons, using a priori knowledge about the scene. The Pfinder [2] system was developed to describe a moving person in an indoor environment. It tracks a single nonoccluded person in complex scenes. The VSAM [3] system can monitor
%U http://www.hindawi.com/journals/ijdmb/2010/920121/