We formulate a multi-matrices factorization model (MMF) for the missing sensor data estimation problem. The estimation problem is adequately transformed into a matrix completion one. With MMF, an n-by-t real matrix, R, is adopted to represent the data collected by mobile sensors from n areas at the time, T1, T2, ... , Tt, where the entry, Rij, is the aggregate value of the data collected in the ith area at T j . We propose to approximate R by seeking a family of d-by-n probabilistic spatial feature matrices, U(1), U(2) , ... , U(t) , and a probabilistic temporal feature matrix, V E R dxt, where Rj?≈ U T (j) Tj . We also present a solution algorithm to the proposed model. We evaluate MMF with synthetic data and a real-world sensor dataset extensively. Experimental results demonstrate that our approach outperforms the state-of-the-art comparison algorithms.
References
[1]
García-Laencina, P.J.; Sancho-Gómez, J.L.; Figueiras-Vidal, A.R.; Verleysen, M. K nearest neighbours with mutual information for simultaneous classification and missing data imputation. Neurocomputation 2009, 72, 1483–1493.
[2]
Ni, D.; Leonard, J.D.; Guin, A.; Feng, C. Multiple imputation scheme for overcoming the missing values and variability issues in ITS data. J. Transport. Eng. 2005, 131, 931–938.
[3]
Smith, B.L.; Scherer, W.T.; Conklin, J.H. Exploring imputation techniques for missing data in transportation management systems. Transport. Res. Record. J. Transport. Res. Board 2003, 1836, 132–142.
[4]
Qu, L.; Zhang, Y.; Hu, J.; Jia, L.; Li, L. A BPCA Based Missing Value Imputing Method for Traffic Flow Volume Data. Proceedings of 2008 IEEE Intelligent Vehicles Symposium, Eindhoven, The Neatherlands, 4–6 June 2008; pp. 985–990.
[5]
Jiang, N.; Gruenwald, L. Estimating Missing Data in Data Streams. In Advances in Databases: Concepts, Systems and Applications; Springer: Berlin/Heidelberg, Germany, 2007; pp. 981–987.
[6]
Netflix Prize. Avaiable online: http://www.netflixprize.com (accessed on 1 July 2013).
Koren, Y. Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA; 2008; pp. 426–434.
[9]
Seung, D.; Lee, L. Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst. 2001, 13, 556–562.
[10]
Srebro, N. Learning with Matrix Factorizations. Ph.D Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA, 2004.
Independent and identically distributed random variables. Avaiable online: http://en.wikipedia.org/wiki/ndependent_and_identically_distributed_random_variables (accessed on 1 July 2013).
[13]
Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for rcommender systems. Computer 2009, 42, 30–37.
[14]
Xu, W.; Liu, X.; Gong, Y. Document Clustering Based on Non-Negative Matrix Factorization. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, Pisa, Italy, 28 July–1 August 2003; pp. 267–273.
[15]
Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
[16]
Brunet, J.P.; Tamayo, P.; Golub, T.R.; Mesirov, J.P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. USA 2004, 101, 4164–4169.
[17]
Candès, E.J.; Recht, B. Exact matrix completion via convex optimization. Found. Comput. Math. 2009, 9, 717–772.
[18]
Hoyer, P.O. Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 2004, 5, 1457–1469.
[19]
Mairal, J.; Bach, F.; Ponce, J.; Sapiro, G. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 2010, 11, 19–60.
[20]
Ke, Q.; Kanade, T. Robust L1Norm Factorization in the Presence of Outliers and Missing Data by Alternative Convex Programming. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, 20–26 June 2005; pp. 739–746.
[21]
Nati, N.S.; Jaakkola, T. Weighted Low-Rank Approximations. proceedings of the 20th International Conference on Machine Learning (ICML 2003), Washington, DC, USA, 21–24 August 2003; pp. 720–727.
[22]
Abernethy, J.; Bach, F.; Evgeniou, T.; Vert, J.P. Low-Rank Matrix Factorization with Attributes. Technical Report; N-24/06/MM; Paris, France, 2006.
[23]
Vapnik, V. Statistical Learning Theory; Wiley: New York, NY, USA, 1998.
[24]
Rissanen, J. Minimum Description Length Principle; Springer: Berlin, Germany, 2010.
[25]
Candè s, E.J.; Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inf. Theory 2010, 56, 2053–2080.
[26]
Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2012.
[27]
Candès, E.J.; Romberg, J.; Tao, T. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequencyinformation. IEEE Trans. Inf. Theory 2006, 52, 489–509.
Bertsekas, D.P. Nonlinear Programming; Athena Scientific: Belmont, MA, USA, 1999.
[30]
Tipping, M.E.; Bishop, C.M. Probabilistic principal component analysis. J. Royal Stat. Soc. Ser. B Stat. Methodol. 1999, 61, 611–622.
[31]
Qu, L.; Hu, J.; Li, L.; Zhang, Y. PPCA-based missing data imputation for traffic flow volume: A systematical approach. IEEE Trans. Intell. Transport. Syst. 2009, 10, 512–522.
[32]
Marlin, B. Collaborative Filtering: A Machine Learning Perspective. Ph.D Thesis, University of Toronto, ON, Canada, 2004.
[33]
Nguyen, L.N.; Scherer, W.T. Imputation Techniques to Account for Missing Data in Support of Intelligent Transportation Systems Applications. UVACTS-13-0-78; University of Virginia: Charlottesville: Charlottesville, VA, USA, 2003.
[34]
Gold, D.L.; Turner, S.M.; Gajewski, B.J.; Spiegelman, C. Imputing Missing Values in Its Data Archives for Intervals under 5 Minutes. Proceedings of 80th Annual Meeting of Transportation Research Board, Washington, DC, USA, 7–11 January 2001.
[35]
Shuai, M.; Xie, K.; Pu, W.; Song, G.; Ma, X. An Online Approach Based on Locally Weighted Learning for Real Time Traffic Flow Prediction. The 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008), Irvine, CA, USA, 5–7 November 2008.
[36]
Zhuhai. Avaiable online: http://en.wikipedia.org/wiki/Zhuhai (accessed on 1 July 2013).
[37]
Floating Car Data. Avaiable online: http://en.wikipedia.org/wiki/Floating_car_data (accessed on 1 July 2013).