We present a novel system for detection, localization, and tracking of multiple people, which fuses a multi-view computer vision approach with a radio-based localization system. The proposed fusion combines the strengths of both: the accurate localization of computer vision and the reliable identity information provided by the radio system. It can therefore perform tracking by identification, which prevents identity switches from propagating. We also present a comprehensive methodology for evaluating systems that localize people in a world coordinate system, and use it to evaluate the proposed system and its individual components. Experimental results on a challenging indoor dataset, in which multiple people walk around a realistically cluttered room, confirm that the proposed fusion significantly outperforms each of its components. Compared to the radio-based system alone, it achieves better localization, and it prevents the propagation of identity switches that occurs in purely computer-vision-based tracking.
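The abstract does not specify how the two modalities are fused, so the following is only a minimal sketch of the general idea behind tracking by identification, not the method used in this work. It assumes that, in each frame, the radio system reports a coarse position per tag identity and the vision system reports accurate but anonymous positions. The function name `assign_identities`, the gating distance `max_dist`, and the brute-force matching are all hypothetical choices made for illustration.

```python
from itertools import permutations
from math import hypot, inf


def assign_identities(radio, detections, max_dist=1.0):
    """Attach radio identities to anonymous vision detections for one frame.

    radio:      dict mapping tag id -> (x, y) coarse radio position (assumed input)
    detections: list of (x, y) accurate but anonymous vision positions (assumed input)
    max_dist:   hypothetical gating distance; farther pairs are never matched

    Returns a dict mapping tag id -> (x, y). A matched tag gets the vision
    position; an unmatched tag falls back to its radio position.
    """
    tags = sorted(radio)
    # Each tag picks one detection index or None (unmatched).
    slots = list(range(len(detections))) + [None] * len(tags)
    best_cost, best_perm = inf, ()
    # Brute-force search over one-to-one assignments; fine for a few people.
    for perm in permutations(slots, len(tags)):
        cost = 0.0
        for tag, slot in zip(tags, perm):
            if slot is None:
                cost += max_dist  # penalty for leaving a tag unmatched
                continue
            rx, ry = radio[tag]
            dx, dy = detections[slot]
            dist = hypot(rx - dx, ry - dy)
            cost += dist if dist <= max_dist else inf
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return {
        tag: radio[tag] if slot is None else detections[slot]
        for tag, slot in zip(tags, best_perm)
    }


# Example frame: two people seen by both systems, one seen by radio only.
radio = {"tag_A": (1.0, 1.0), "tag_B": (4.0, 4.0), "tag_C": (8.0, 8.0)}
detections = [(4.1, 3.9), (1.2, 0.9)]
print(assign_identities(radio, detections))
```

Because identities are re-derived from the radio tags in every frame rather than carried over from the previous frame, a wrong match in one frame does not persist into later frames in this sketch. For larger groups, the exhaustive search would be replaced by a polynomial-time assignment solver.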