Outlier Detection and Effects on Modeling

doi:10.4236/oalib.1106619

OALib Journal
ISSN: 2333-9721

Open Access Library Journal, Vol. 7, 2020

DOI: 10.4236/oalib.1106619, PP. 1-30

Christopher O. Arimie, Emmanuel O. Biu, Maxwell A. Ijomah

Subject Areas: Applied Statistical Mathematics

Keywords: Outliers’ Detection, Classification and Comparisons, Simple and Multiple Linear Regression Models

Abstract

This work studies a comprehensive framework of traditional outlier detection techniques based on simple and multiple linear regression models. Two data sets were used to illustrate and evaluate each class of outlier detection technique (analytical and graphical methods). Outlier detection aims to identify outlying observations so that the data analysis improves and a suitable model can be built. The methods were also compared to highlight the advantages, disadvantages and performance issues of each class. The results show that removing the influential points (or outliers) increased model adequacy (from R² = 0.72 to R² = 0.97). Jackknife residuals and Atkinson’s measure were found to be very useful in detecting outliers; hence, both methods are recommended for outlier detection.
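To make the two recommended diagnostics concrete, the following is a minimal sketch, not the authors' code. It assumes the usual textbook definitions: the jackknife (externally studentized) residual t_i = e_i / (s_(i) * sqrt(1 - h_ii)), and one common form of Atkinson's measure, A_i = |t_i| * sqrt(((n - p) / p) * h_ii / (1 - h_ii)). The data are synthetic (a straight line with one planted outlier), and the function names and the |t_i| > 3 cutoff are illustrative choices, not values taken from the paper.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on X (X already contains the intercept column).
    Returns the coefficients, the residuals, and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2

def jackknife_residuals(X, y):
    """Externally studentized residuals and leverages h_ii."""
    n, p = X.shape
    _, e, _ = fit_ols(X, y)
    Q, _ = np.linalg.qr(X)            # hat matrix H = Q Q^T
    h = np.sum(Q ** 2, axis=1)        # diagonal of H
    s2 = (e @ e) / (n - p)            # residual variance, all points
    # Residual variance with observation i deleted (closed form).
    s2_del = ((n - p) * s2 - e ** 2 / (1.0 - h)) / (n - p - 1)
    return e / np.sqrt(s2_del * (1.0 - h)), h

def atkinson_measure(t, h, n, p):
    """Atkinson's measure in the form assumed in the lead-in."""
    return np.abs(t) * np.sqrt((n - p) / p * h / (1.0 - h))

# Synthetic data (an assumption for illustration): y = 2 + 3x + small noise.
x = np.arange(10.0)
noise = np.array([0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, -0.1])
y = 2.0 + 3.0 * x + noise
y[7] += 12.0                          # planted outlier at index 7

X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

_, _, r2_full = fit_ols(X, y)
t, h = jackknife_residuals(X, y)
A = atkinson_measure(t, h, n, p)

flagged = np.flatnonzero(np.abs(t) > 3.0)   # illustrative cutoff
keep = np.setdiff1d(np.arange(n), flagged)
_, _, r2_clean = fit_ols(X[keep], y[keep])

print("flagged indices:", flagged)
print("R^2 with all points:", round(r2_full, 4))
print("R^2 after removal:  ", round(r2_clean, 4))
```

On this toy data the planted point dominates both diagnostics, and refitting without it raises R², which mirrors the qualitative effect the abstract reports. With real data, flagged points should be examined before removal rather than dropped automatically.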

Cite this paper

Arimie, C. O., Biu, E. O. and Ijomah, M. A. (2020). Outlier Detection and Effects on Modeling. Open Access Library Journal, 7, e6619. doi: http://dx.doi.org/10.4236/oalib.1106619.
