%0 Journal Article
%T Incorporating Multiple Linear Regression in Predicting the House Prices Using a Big Real Estate Dataset with 80 Independent Variables
%A Azad Abdulhafedh
%J Open Access Library Journal
%V 9
%N 1
%P 1-21
%@ 2333-9721
%D 2022
%I Open Access Library
%R 10.4236/oalib.1108346
%X This paper uses multiple linear regression analysis to predict the final price of a house in a big real estate dataset. The data describes the sale of individual properties, various features, and details of each home in Ames, Iowa, USA from 2006 to 2010. The dataset comprises 80 explanatory variables, which include 23 nominal, 23 ordinal, 14 discrete, and 20 continuous variables. The goal was to use the training data to predict the sale prices of the houses in the testing data. The most important predictors were determined by random forest and kept in the analysis. The highly correlated predictors were dropped from the dataset. All assumptions of linear regression were checked, and an optimal final predictive model was achieved by keeping only the most influential predictors. The model accuracy assessments produced very good results with an adjusted R-squared value of 0.9283, a residual standard error (RSE) of 0.094, and a root squared mean error (RSME) of 0.12792. In addition, the prediction error (Mean Squared Error, MSE) of the final model was found to be very small (12%) by applying different cross-validation techniques, including the validation set approach, the K-fold approach, and the Leave-One-Out Cross-Validation (LOOCV) approach. Results show that multiple linear regression can precisely predict house prices with a big dataset and a large number of both categorical and numerical predictors.
%K Multiple Linear Regression
%K Ames House Price Prediction
%K RSE
%K RSME
%K MSE
%K K-Fold
%K LOOCV
%U http://www.oalib.com/paper/6768834