An Alternative Approach to AIC and Mallow’s Cp Statistic-Based Relative Influence Measures (RIMS) in Regression Variable Selection

DOI: 10.4236/ojs.2016.61009, PP. 70-75

Keywords: Relative Influence Measure (RIM), BIC, AIC, Mallow’s Cp Statistic, Cook’s Distance


Abstract:

Outlier detection is an important form of data screening. The Relative Influence Measure (RIM) is an outlier-detection mechanism that quantifies the contribution of individual data points to a regression model. This work develops a BIC-based RIM that simultaneously detects influential data points and selects optimal predictor variables. It adds to the existing literature in this area both an alternative to the AIC- and Mallow’s Cp statistic-based RIMs and proposed conditions for no influence, some influence, and a perfectly single outlying data point in an entire data set. The method is implemented in R by an algorithm that iterates over all data points, deleting one data point at a time while computing BICs and selecting optimal predictors alongside RIMs. Analyses of evaporation data comparing the proposed method with the two existing methods show that the data cases identified as highly influential by the existing methods are also identified by the proposed method. All three methods perform the same; hence the relevance of the BIC-based RIM cannot be undermined.
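
As a rough illustration of the leave-one-out procedure described above, the R sketch below deletes each case in turn, reselects predictor variables by BIC-penalized stepwise search (recall BIC = p*log(n) - 2*log(Lhat) for a model with p parameters fitted to n observations, so step() with k = log(n) selects by BIC rather than AIC), and records a relative influence measure for the deleted case. This is a minimal sketch, not the paper's implementation: the rim_bic() helper is hypothetical, and taking the RIM as the relative change in BIC between the full-data and case-deleted models is an assumption, since the exact formula is not reproduced in this abstract.

# Hypothetical helper: leave-one-out, BIC-based RIMs with
# simultaneous predictor selection (sketch; assumed RIM formula).
rim_bic <- function(data, full_formula) {
  n <- nrow(data)
  # Stepwise selection on the full data; k = log(n) makes step()
  # penalize by BIC instead of the default AIC.
  full_fit <- step(lm(full_formula, data = data), k = log(n), trace = 0)
  bic_full <- BIC(full_fit)
  rims   <- numeric(n)
  models <- character(n)
  for (i in seq_len(n)) {
    d_i   <- data[-i, ]                           # delete case i
    fit_i <- step(lm(full_formula, data = d_i),   # reselect predictors
                  k = log(n - 1), trace = 0)      # by BIC on n - 1 cases
    rims[i]   <- (bic_full - BIC(fit_i)) / bic_full  # assumed RIM form
    models[i] <- paste(deparse(formula(fit_i)), collapse = " ")
  }
  data.frame(case = seq_len(n), RIM = rims, selected_model = models)
}

For the evaporation data analyzed in the paper, a call might look like res <- rim_bic(evap, evap_rate ~ .), where the data-frame and variable names are placeholders; cases with unusually large RIM values would then be flagged as influential.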

