Data Mining has become an important
technique for exploring and extracting data in research projects across many
fields (technology, information technology, business, the environment,
economics, etc.). For the analysis and visualisation of large amounts of
time-ordered data (time series) extracted using Data Mining, free software such
as R has emerged internationally as a low-cost and efficient tool for exploiting
and visualising time series. This has allowed the development of models that
help to extract the most relevant information from large volumes of data. In
this regard, a script has been developed to implement ARIMA models, showing them
to be useful and quick mechanisms for the extraction, analysis and visualisation
of large data volumes, with the further advantage that they can be applied in
many branches of knowledge, including economics, demography, physics,
mathematics and fisheries, among others. ARIMA models therefore appear as a Data
Mining technique offering reliable, robust and high-quality results that help to
validate and support the research carried out.
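
As a rough illustration of what fitting and forecasting with an ARIMA model involves, the following minimal sketch implements an ARIMA(1,1,0) model by hand. It is not the R script referred to above: the language (Python), the function names `difference`, `fit_ar1` and `forecast`, the choice of order (1,1,0), the conditional least-squares estimator and the synthetic series are all assumptions made purely for illustration.

```python
from typing import List, Tuple


def difference(series: List[float]) -> List[float]:
    """First-order differencing: the 'I' step of ARIMA with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs: List[float]) -> Tuple[float, float]:
    """Fit w[t] = c + phi * w[t-1] + e[t] by conditional least squares.

    Returns the intercept c and the autoregressive coefficient phi.
    """
    lagged = diffs[:-1]   # w[t-1]
    current = diffs[1:]   # w[t]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series: List[float], c: float, phi: float, steps: int) -> List[float]:
    """Forecast future levels by iterating the AR(1) recursion on the
    differences and cumulating them back onto the last observed level."""
    level = series[-1]
    diff = series[-1] - series[-2]
    predictions = []
    for _ in range(steps):
        diff = c + phi * diff
        level += diff
        predictions.append(level)
    return predictions


if __name__ == "__main__":
    # Synthetic, noise-free series used only to exercise the functions.
    levels = [10.0]
    step = 2.0
    for _ in range(12):
        levels.append(levels[-1] + step)
        step = 0.5 + 0.6 * step

    c, phi = fit_ar1(difference(levels))
    print("intercept:", c, "phi:", phi)
    print("next three levels:", forecast(levels, c, phi, 3))
```

In practice a statistical package would also estimate moving-average terms, select the model order and run residual diagnostics. The sketch covers only the differencing, estimation and forecasting steps.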