A Three-Way K-Means Clustering Algorithm Based on Density Clustering
Abstract:
This paper proposes a three-way K-Means algorithm based on density clustering. Traditional K-Means typically selects its initial cluster centers at random and cannot handle uncertain data objects. To address both problems, the proposed method uses a density-based clustering algorithm to select the initial cluster centers, refines how the cutoff distance is chosen, and then applies three-way decision to the clustering results. Experimental results show that the improved algorithm achieves higher clustering accuracy and stability than traditional K-Means.
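The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so every concrete rule here is an assumption. The sketch scores points in the style of density-peak clustering (local density times distance to the nearest denser point), takes the cutoff distance as a low percentile of the pairwise distances, and places a point in the fringe of every cluster whose center lies within a fixed ratio of its nearest center distance. The names `density_peak_centers`, `kmeans`, `three_way_assign`, `dc_percentile`, and `ratio` are hypothetical.

```python
import numpy as np

def density_peak_centers(X, k, dc_percentile=2.0):
    """Pick k initial centers: points with high local density that are
    far from any denser point (density-peak style score, assumed here)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Assumed rule: cutoff distance = low percentile of pairwise distances.
    iu = np.triu_indices(len(X), k=1)
    dc = max(np.percentile(d[iu], dc_percentile), 1e-12)
    # Gaussian-kernel local density, excluding each point's own contribution.
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho)  # indices from densest to sparsest
    delta = np.empty(len(X))
    delta[order[0]] = d[order[0]].max()
    for rank in range(1, len(X)):
        i = order[rank]
        delta[i] = d[i, order[:rank]].min()  # distance to nearest denser point
    gamma = rho * delta
    return X[np.argsort(-gamma)[:k]]

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new, centers):
            break
        centers = new
    # Recompute labels so they always match the returned centers.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, d.argmin(axis=1)

def three_way_assign(X, centers, ratio=1.2):
    """Assumed three-way rule: a point is a candidate for every cluster whose
    center is within `ratio` times its nearest-center distance. One candidate
    puts it in that cluster's core; several put it in each one's fringe."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    near = d <= ratio * d.min(axis=1, keepdims=True)
    single = near.sum(axis=1) == 1
    core = [np.flatnonzero(near[:, j] & single) for j in range(len(centers))]
    fringe = [np.flatnonzero(near[:, j] & ~single) for j in range(len(centers))]
    return core, fringe

# Synthetic data: three tight blobs plus one ambiguous point halfway
# between the first two blobs.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(c, 0.5, size=(50, 2))
                   for c in [(0, 0), (20, 0), (0, 20)]])
X = np.vstack([blobs, [[10.0, 0.0]]])

init = density_peak_centers(X, k=3)
centers, labels = kmeans(X, init)
core, fringe = three_way_assign(X, centers)
```

In this toy setup the blob points fall into core regions, while the halfway point is held in the fringe of its two neighboring clusters instead of being forced into one of them, which is the kind of uncertain object that three-way decision is meant to handle.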