
# Modeling Vehicle Crash Frequency When Multicollinearity Exists in Vehicle Crash Data: Ridge Regression versus Ordinary Least Squares Linear Regression

DOI: 10.4236/oalib.1108873, PP. 1-17

Subject Areas: Mathematical Analysis, Applied Statistical Mathematics


Abstract

Ridge regression is an important statistical method for modeling vehicle crash frequency when crash data contain collinear predictors. Multicollinearity refers to the condition in which two or more predictors are highly correlated with one another. It makes the estimated coefficients very sensitive to small changes in the model and reduces their precision, which weakens the statistical power of the regression model. Two common methods for addressing multicollinearity are variable selection and ridge regression. Variable selection simply drops predictors that are highly correlated with others in the model, but this is not always possible, especially when a variable that contributes to the collinearity is a main predictor in the model. Ridge regression, by contrast, retains all explanatory variables of interest, even if they are highly collinear, and provides information about which coefficients are most sensitive to multicollinearity. It works by adding a degree of bias to the regression estimates, which reduces their standard errors and produces estimates that are much more reliable. This paper uses five years of vehicle crash data, from 2011 to 2015, on the interstate highway I-90 in the state of Minnesota, USA. The data show multicollinearity among some of the independent variables. The results show that ridge regression is an effective tool for addressing the existing multicollinearity and produces more accurate regression estimates than multiple linear regression.
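The bias-for-variance trade described in the abstract can be illustrated with a minimal sketch. It does not use the paper's I-90 crash data or reproduce its analysis; it assumes synthetic data with two nearly identical predictors, the standard closed-form ridge estimator (X'X + kI)^(-1) X'y, and an arbitrary ridge parameter `k = 1.0`. The function names `ols_fit`, `ridge_fit` and `simulate` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares coefficients for centered data (no intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def ridge_fit(X, y, k):
    """Closed-form ridge coefficients: solve (X'X + k I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


def simulate(n_reps=200, n=50, k=1.0, seed=0):
    """Refit both models on many synthetic samples with collinear predictors.

    All settings here (sample size, noise level, k) are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    true_beta = np.array([1.0, 1.0])
    ols, ridge = [], []
    for _ in range(n_reps):
        x1 = rng.normal(size=n)
        x2 = x1 + 0.05 * rng.normal(size=n)  # x2 is nearly a copy of x1
        X = np.column_stack([x1, x2])
        X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize predictors
        y = X @ true_beta + rng.normal(size=n)
        y = y - y.mean()  # center the response so no intercept is needed
        ols.append(ols_fit(X, y))
        ridge.append(ridge_fit(X, y, k))
    return np.array(ols), np.array(ridge)


ols_est, ridge_est = simulate()
# Spread of each coefficient across repeated samples (an empirical standard error)
print("OLS coefficient std:  ", ols_est.std(axis=0))
print("Ridge coefficient std:", ridge_est.std(axis=0))
```

Under these assumptions the ridge coefficients vary far less from sample to sample than the OLS coefficients, at the cost of some bias. In practice the ridge parameter would be chosen from the data, for example by cross-validation, rather than fixed in advance.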

Cite this paper

Abdulhafedh, A. (2022). Modeling Vehicle Crash Frequency When Multicollinearity Exists in Vehicle Crash Data: Ridge Regression versus Ordinary Least Squares Linear Regression. Open Access Library Journal, 9, e8873. doi: http://dx.doi.org/10.4236/oalib.1108873.

