Comparative Study and Application of Classification Algorithms for Unbalanced Thyroid Eye Disease Datasets

DOI: 10.12677/SEA.2023.123049, pp. 495-505

Keywords: Unbalanced Data, Machine Learning, Thyroid-Associated Ophthalmopathy, Classification


Abstract:

Data classification is an active research topic in machine learning, yet class imbalance is pervasive across application domains. Although many existing machine learning algorithms achieve good results, they assume by default that the dataset is evenly distributed and that the misclassification costs of different classes are equal, which leads to poor performance on imbalanced datasets. Targeting the imbalance between positive and negative samples in a dataset used for thyroid eye disease diagnosis, this paper compares four optimization methods designed for imbalanced data: WCE loss, LDAM loss, Focal loss, and Minimax. The experimental results show that classification models trained with these imbalance-oriented methods achieve better classification performance than the original model. The experiments also reveal that the gains vary with the ratio of positive to negative samples: under severe imbalance, LDAM loss and Minimax are more robust, and the Minimax method in particular classifies the minority class better. In summary, the comparative experiments presented in this paper offer guidance for selecting a classification algorithm when thyroid eye disease diagnostic data are imbalanced.
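
As a concrete illustration, the sketch below implements two of the four compared losses, WCE (weighted cross-entropy) and Focal loss, in PyTorch. It is a minimal sketch, not the paper's implementation: the helper names, the gamma value, the inverse-frequency weighting scheme, and the toy 9:1 batch are all illustrative assumptions.

import torch
import torch.nn.functional as F

def weighted_ce_loss(logits, targets, class_counts):
    # WCE: weight each class inversely to its frequency, so errors on the
    # rare (minority) class contribute more to the loss. This weighting
    # scheme is one common choice, assumed here for illustration.
    weights = class_counts.sum() / (len(class_counts) * class_counts)
    return F.cross_entropy(logits, targets, weight=weights)

def focal_loss(logits, targets, gamma=2.0):
    # Focal loss (Lin et al., 2017, reference [4]): the modulating factor
    # (1 - p_t)^gamma down-weights easy, well-classified examples, focusing
    # training on hard examples, which often belong to the minority class.
    log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return ((1.0 - pt) ** gamma * -log_pt).mean()

# Toy batch with a 9:1 negative/positive imbalance (illustrative only).
logits = torch.randn(10, 2)
targets = torch.tensor([0] * 9 + [1])
counts = torch.tensor([9.0, 1.0])
print(weighted_ce_loss(logits, targets, counts).item())
print(focal_loss(logits, targets).item())

LDAM loss (reference [12]) follows the same cross-entropy pattern but instead enforces a per-class margin proportional to n_j^(-1/4), where n_j is the size of class j, which fits the robustness under severe imbalance reported above.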

References

[1]  Li, Y.X., Chai, Y., Hu, Y.Q., et al. (2019) A Survey of Classification Methods for Imbalanced Data. Control and Decision, 34, 673-688. (In Chinese)
[2]  Zhao, N., Zhang, X.F. and Zhang, L.J. (2018) A Survey of Research on Imbalanced Data Classification. Computer Science, 45, 22-27, 57. (In Chinese)
[3]  Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P. (2002) SMOTE: Synthetic Minority Over-Sampling Technique. Journal of Artificial Intelligence Research, 16, 321-357.
https://doi.org/10.1613/jair.953
[4]  Lin, T.Y., Goyal, P., Girshick, R., He, K.M. and Dollár, P. (2017) Focal Loss for Dense Object Detection. 2017 IEEE International Conference on Computer Vision, Venice, 22-29 October 2017, 2999-3007.
https://doi.org/10.1109/ICCV.2017.324
[5]  Freund, Y. and Schapire, R.E. (1997) A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55, 119-139.
https://doi.org/10.1006/jcss.1997.1504
[6]  Ho, T.K. (1995) Random Decision Forests. Proceedings of 3rd International Conference on Document Analysis and Recognition, Montreal, 14-16 August 1995, 278-282.
[7]  Chen, H.H. and Yang, T. (2015) Research Progress on the Pathogenesis of Thyroid-Associated Ophthalmopathy. Chinese Journal of Practical Internal Medicine, 35, 561-565. (In Chinese)
[8]  Yang, K., Feng, L.L. and Huang, H.X. (2022) Application of Quantitative Orbital CT Analysis in the Diagnosis and Treatment of Thyroid-Associated Ophthalmopathy. Journal of Imaging Research and Medical Applications, 6, 58-60. (In Chinese)
[9]  Cong, Z.Y., Lu, T., Fan, J.Y., Song, X.F., Wang, H. and Zhou, L. (2022) Research on a Thyroid Eye Disease Prediction Method Based on Eye Feature Extraction. Software Engineering and Applications, 11, 1288-1296. (In Chinese)
[10]  Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986) Learning Representations by Back-Propagating Errors. Nature, 323, 533-536.
https://doi.org/10.1038/323533a0
[11]  Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. (2014) Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15, 1929-1958.
[12]  Cao, K., Wei, C., Gaidon, A., Arechiga, N. and Ma, T.Y. (2019) Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. 33rd Conference on Neural Information Processing Systems, Vancouver, 8-14 December 2019, 1567-1578.
[13]  Xu, Q. and Xuan, X. (2018) Nonlinear Regression without i.i.d. Assumption. Probability, Uncertainty and Quantitative Risk, 4, Article No. 8.
https://doi.org/10.1186/s41546-019-0042-6
[14]  Loshchilov, I. and Hutter, F. (2016) SGDR: Stochastic Gradient Descent with Warm Restarts.
https://arxiv.org/abs/1608.03983
