Research on the Applicability of Unbalanced Data Algorithm Based on Logistic Regression

DOI: 10.12677/CSA.2020.1011216, PP. 2049-2057

Keywords: Logistic Regression, Random Under-Sampling, Borderline-SMOTE (BLS) Over-Sampling, ADASYN Over-Sampling

Abstract:

The logistic regression model is sensitive to imbalanced data. This paper examines how well three imbalanced-data algorithms suit the logistic regression model: random under-sampling, Borderline-SMOTE (BLS) over-sampling, and adaptive synthetic (ADASYN) over-sampling. A logistic regression model was fitted to the data balanced by each of the three methods. BLS over-sampling produced the best value on every evaluation metric and ADASYN over-sampling the worst, so the paper concludes that BLS over-sampling is the more suitable method for handling imbalanced data sets with logistic regression.
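To make two of the resampling ideas named in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes the usual textbook formulations: random under-sampling discards majority rows until the classes are the same size, and Borderline-SMOTE interpolates only from "danger" minority rows, meaning rows whose k nearest neighbours are mostly, but not entirely, majority rows. The function names, the toy data, and the choice k=3 are all hypothetical; ADASYN and the logistic regression fit are left out.

```python
import math
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Randomly keep as many rows of each class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def danger_points(X, y, minority, k=5):
    """Indices of minority rows with k/2 <= (majority neighbours among k nearest) < k."""
    danger = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(xi, X[j]))
        n_major = sum(1 for j in others[:k] if y[j] != minority)
        if k / 2 <= n_major < k:
            danger.append(i)
    return danger


def borderline_smote(X, y, minority, n_new, k=5, seed=0):
    """Build n_new synthetic minority rows by interpolating from a danger row
    toward one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    danger = danger_points(X, y, minority, k)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    if not danger or len(minority_idx) < 2:
        return []  # nothing on the border, or no neighbour to interpolate toward
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        neighbours = sorted((j for j in minority_idx if j != i),
                            key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from X[i] to X[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data (hypothetical): 10 majority rows (label 0), 4 minority rows (label 1).
X = [[float(v), 0.0] for v in range(10)] + [[8.5, 0.0], [9.5, 0.0], [10.5, 0.0], [11.5, 0.0]]
y = [0] * 10 + [1] * 4

X_under, y_under = random_under_sample(X, y)
synthetic = borderline_smote(X, y, minority=1, n_new=6, k=3)
print(Counter(y_under), len(synthetic))
```

In this toy set only the minority row at 8.5 sits among majority rows, so every synthetic row starts from it. This reflects the design choice behind Borderline-SMOTE: new samples are concentrated near the class boundary, which is where a linear classifier such as logistic regression places its decision surface.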
