Research on Confusion-Matrix-Based Machine Learning Classification Performance Measures and Their Python Practice
Abstract:
Performance measures derived from the confusion matrix are the most widely used means of evaluating classifiers in machine learning. This paper systematically reviews how these measures are computed. Two measures originally defined for binary classification, the G-mean and the Matthews correlation coefficient, are generalized to problems with three or more classes. Using the Vehicle Silhouettes dataset from the UCI Machine Learning Repository, the corresponding Python experiments are carried out with Sklearn, and the Python code and its output are given. In particular, Python functions are defined for the two generalized measures, and these functions are tested and validated experimentally. The paper offers a theoretical and practical reference, with accompanying Python code, for researchers selecting among the most commonly used classification performance measures.
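As a rough illustration of the two generalized measures, the sketch below computes both from a confusion matrix in plain Python. It rests on assumptions not stated in this abstract: the multiclass G-mean is taken to be the geometric mean of the per-class recalls, and the multiclass Matthews correlation coefficient is taken to be Gorodkin's R_K statistic, which is also what `sklearn.metrics.matthews_corrcoef` computes for more than two classes. The definitions given in the body of the paper may differ. The function names and the toy labels are hypothetical and are not results from the Vehicle Silhouettes experiments.

```python
import math


def confusion_matrix(y_true, y_pred, labels):
    """Return C where C[i][j] counts samples of true class i predicted as class j."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def multiclass_g_mean(matrix):
    """Geometric mean of per-class recalls (assumed generalization of G-mean)."""
    recalls = []
    for i, row in enumerate(matrix):
        support = sum(row)
        if support == 0:
            continue  # class never occurs in y_true, so its recall is undefined
        recalls.append(row[i] / support)
    if not recalls:
        return 0.0
    return math.prod(recalls) ** (1.0 / len(recalls))


def multiclass_mcc(matrix):
    """Multiclass MCC (Gorodkin's R_K, an assumed generalization)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    true_totals = [sum(row) for row in matrix]
    pred_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    numerator = correct * n - sum(t * p for t, p in zip(true_totals, pred_totals))
    denominator = math.sqrt(
        (n * n - sum(p * p for p in pred_totals))
        * (n * n - sum(t * t for t in true_totals))
    )
    # Return 0.0 when the coefficient is undefined (zero denominator).
    return numerator / denominator if denominator else 0.0


# Toy labels for illustration only (hypothetical, not taken from the paper).
labels = ["bus", "saab", "van"]
y_true = ["bus", "bus", "bus", "saab", "saab", "saab", "saab", "van", "van", "van"]
y_pred = ["bus", "bus", "van", "saab", "saab", "bus", "saab", "van", "van", "van"]

cm = confusion_matrix(y_true, y_pred, labels)
print(cm)                               # [[2, 0, 1], [1, 3, 0], [0, 0, 3]]
print(round(multiclass_g_mean(cm), 4))  # 0.7937
print(round(multiclass_mcc(cm), 4))     # 0.7121
```

With two classes, the first function reduces to the square root of sensitivity times specificity and the second to the usual binary Matthews correlation coefficient, so both forms are consistent with the binary definitions they extend.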