To reduce the risk of non-performing loans and the resulting losses, and to improve loan approval efficiency, an intelligent loan risk and approval prediction system is needed. This paper proposes a hybrid deep learning model for loan approval prediction that combines a 1DCNN-attention network with enhanced preprocessing techniques. The model consists of an enhanced data preprocessing stage followed by a stack of hybrid modules. First, the preprocessing stage combines standardization, SMOTE oversampling, feature construction, recursive feature elimination (RFE), information value (IV), and principal component analysis (PCA); this not only mitigates the effects of data jitter and class imbalance, but also removes redundant features while improving feature representation. Next, a hybrid module that combines a 1DCNN with an attention mechanism extracts local and global spatio-temporal features. Finally, comprehensive experiments show that the proposed model outperforms state-of-the-art baseline models on accuracy, precision, recall, F1 score, and AUC. The proposed model helps automate the loan approval process and provides scientific guidance to financial institutions for loan risk control.
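Of the preprocessing steps listed above, information value (IV) is the simplest to state in isolation, so a minimal sketch may help make it concrete. It uses the standard weight-of-evidence definition, IV = Σ (p_good − p_bad) · ln(p_good / p_bad), summed over the bins of one feature. The function name `information_value`, the label convention (1 = bad loan, 0 = good loan), and the additive smoothing constant `eps` are assumptions made for illustration; the abstract does not specify how the authors bin features or handle empty bins.

```python
import math
from collections import Counter


def information_value(bins, labels, eps=0.5):
    """Information value of one already-binned feature.

    bins   : sequence of bin identifiers, one per sample
    labels : sequence of 0/1 labels (assumed: 1 = bad loan, 0 = good loan)
    eps    : additive smoothing (an assumption) so that a bin with no
             good or no bad samples does not lead to log(0)
    """
    good, bad = Counter(), Counter()
    for b, y in zip(bins, labels):
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1

    categories = set(good) | set(bad)
    total_good = sum(good.values()) + eps * len(categories)
    total_bad = sum(bad.values()) + eps * len(categories)

    iv = 0.0
    for c in categories:
        p_good = (good[c] + eps) / total_good   # smoothed share of good loans in bin c
        p_bad = (bad[c] + eps) / total_bad      # smoothed share of bad loans in bin c
        woe = math.log(p_good / p_bad)          # weight of evidence of bin c
        iv += (p_good - p_bad) * woe
    return iv


# Toy usage: rank two hypothetical binned features by IV.
labels = [0, 0, 1, 1]
features = {
    "income_band": ["high", "high", "low", "low"],   # separates the classes
    "branch_code": ["x", "y", "x", "y"],             # carries no signal
}
ranking = sorted(features, key=lambda f: information_value(features[f], labels), reverse=True)
```

A feature whose bins have the same good/bad mix as the whole population scores an IV of zero, while a feature that separates the two classes scores higher, so sorting by IV gives a simple filter that can be applied before RFE or PCA. Any cutoff used to drop low-IV features would be a further modelling choice not stated in the abstract.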