Research on a Cost-Sensitive Credit Scoring Model Based on Gaussian-Weighted Local Outlier Factor Filtering
Abstract:
Credit scoring is a core part of financial risk management and decision making, and it is crucial to the sound operation and market competitiveness of financial institutions. However, credit data are typically class-imbalanced: default samples are far fewer than non-default samples. This makes credit scoring models hard to build, because a model trained on such data tends to favor the majority class and neglect the minority class, which reduces its predictive accuracy and generalization ability. To address this problem, the proposed GLOF-BFL-LightGBM model adopts a staged optimization strategy. First, because anomalous samples further aggravate the effect of class imbalance and reduce model robustness, this study introduces the Gaussian-weighted local outlier factor (GLOF) technique to identify and remove potentially anomalous samples, cleaning the dataset and improving model stability. Second, to improve the model's ability to identify the minority class, Focal Loss is used to reduce the influence of majority-class samples on training, and Bayesian optimization is used to search automatically for the optimal Focal Loss parameters so as to obtain the best class-imbalance learning performance. To verify the effectiveness of the model, experiments are conducted on four credit datasets from the UCI repository, and GLOF-BFL-LightGBM is compared with a range of baseline models, including traditional classification methods and conventional ensemble learning models. The experimental results show that GLOF-BFL-LightGBM outperforms the comparison models on key metrics such as AUC and the KS statistic, effectively improving the accuracy of credit scoring and the generalization ability of the model, and providing a reliable tool for personal credit risk assessment.
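The Focal Loss step can be illustrated with a minimal sketch. Gradient-boosting libraries such as LightGBM accept a custom objective that supplies, for each sample, the first and second derivatives of the loss with respect to the raw score. The code below computes these for the standard binary Focal Loss. The abstract does not name the tuned parameters, so the names `alpha` (class weight) and `gamma` (focusing parameter) and their default values are assumptions taken from the usual Focal Loss formulation; the function names are hypothetical, and this is not the authors' implementation.

```python
import math

def _sigmoid(z):
    # Logistic function mapping a raw score to a probability.
    return 1.0 / (1.0 + math.exp(-z))

def focal_loss(z, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for raw score z and label y in {0, 1}.

    alpha and gamma are assumed hyperparameter names (not from the source).
    """
    p = _sigmoid(z)
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

def focal_grad_hess(z, y, alpha=0.25, gamma=2.0):
    """First and second derivatives of focal_loss with respect to z."""
    p = _sigmoid(z)
    s = 1.0 if y == 1 else -1.0             # sign of d p_t / d z
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    log_pt = math.log(p_t)
    mod = (1.0 - p_t) ** gamma              # down-weights easy samples
    v = gamma * p_t * log_pt + p_t - 1.0
    grad = a_t * s * mod * v
    hess = a_t * p_t * mod * (
        (1.0 - p_t) * (gamma * log_pt + gamma + 1.0) - gamma * v
    )
    return grad, hess

# With gamma = 0 and alpha = 0.5 the result is half the usual log-loss
# derivatives, i.e. 0.5 * (p - y) and 0.5 * p * (1 - p).
g, h = focal_grad_hess(0.3, 1, alpha=0.5, gamma=0.0)
```

In a tuning loop, a Bayesian optimizer would propose candidate `alpha` and `gamma` values, train the model with this objective, and score each candidate on a validation metric such as AUC. The Focal Loss Hessian can become negative for badly misclassified samples, so a practical objective may need to clip it to a small positive value.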