Outlier detection is an important form of data screening. The RIM is an outlier-detection measure that quantifies the contribution of individual data points to a regression model. The BIC-based RIM is, in essence, a technique developed in this work to simultaneously detect influential data points and select optimal predictor variables.
It adds to the existing literature in this area in two ways: it offers an alternative to the AIC-based and Mallows's Cp statistic-based RIMs, and it proposes conditions for no influence, for some degree of influence, and for a single perfectly outlying data point in an entire data set.
The method is implemented in R by an algorithm that iterates over all data points, deleting them one at a time while computing the BIC, selecting the optimal predictors, and computing the corresponding RIMs; a sketch of this procedure is given after this paragraph. In analyses of evaporation data comparing the proposed method with the two existing methods, the data cases identified as highly influential by the existing methods were also identified by the proposed method. The three methods perform identically; hence the relevance of the BIC-based RIM cannot be discounted.
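The R sketch below illustrates the leave-one-out procedure described above. It is a minimal illustration under stated assumptions, not the paper's implementation: the function name bic_rim_sketch is hypothetical, stepwise search via step() with k = log(n) stands in for whichever BIC-based selection routine the paper uses, and the influence score (the absolute change in the selected model's BIC after case deletion) is a placeholder for the actual RIM formula.

# Minimal sketch. Assumptions: bic_rim_sketch is a hypothetical name, and
# the influence score below is a placeholder for the paper's RIM formula.
bic_rim_sketch <- function(data, full_formula) {
  n <- nrow(data)
  # BIC-optimal model on the full data: step() with k = log(n) performs
  # stepwise selection under the BIC penalty.
  full_fit <- step(lm(full_formula, data = data), k = log(n), trace = 0)
  full_bic <- BIC(full_fit)
  results <- data.frame(case = seq_len(n), bic = NA_real_,
                        predictors = NA_character_, influence = NA_real_)
  for (i in seq_len(n)) {
    d_i   <- data[-i, ]                                  # delete case i
    fit_i <- step(lm(full_formula, data = d_i),
                  k = log(nrow(d_i)), trace = 0)         # reselect predictors
    results$bic[i]        <- BIC(fit_i)
    results$predictors[i] <- deparse(formula(fit_i)[[3]])
    results$influence[i]  <- abs(full_bic - BIC(fit_i))  # placeholder score
  }
  results
}

# Usage: rank cases by the placeholder influence score, with simulated data
# standing in for the evaporation data.
set.seed(1)
d   <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
out <- bic_rim_sketch(d, y ~ x1 + x2 + x3)
head(out[order(-out$influence), ])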