基于离群点检测的优化初始中心的三支K-Means算法
Three-Way K-Means Algorithm with Optimized Initial Centers Based on Outlier Detection

DOI: 10.12677/csa.2025.152039, PP. 118-131

Keywords: K-Means Algorithm, Three-Way Clustering, LOF Outlier Detection Algorithm, Cluster Center


Abstract:

To address three weaknesses of the traditional k-means algorithm, namely that the number of clusters k cannot be determined in advance, that the initial cluster centers are chosen at random, and that the result is easily affected by outliers, this paper proposes the ODT-kmeans algorithm. The algorithm first applies the LOF (Local Outlier Factor) outlier detection algorithm to compute an outlier factor for every data object and removes the objects whose factor exceeds a specified threshold. The elbow method is then used to determine the k value best suited to the data set. Initial cluster centers are selected according to the ideas of maximum density and maximum distance combined with each point's outlier factor, and the subsequent center iterations proceed as usual. After clustering is complete, the idea of three-way decisions is applied to further refine the data objects within each resulting cluster. Experimental results show that ODT-kmeans selects a reasonable k value, reduces the influence of outliers, eliminates the randomness of initial center selection, and improves the accuracy of k-means clustering.
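
For readers who want a concrete picture of the pipeline, the sketch below strings the described steps together in Python, with scikit-learn's LocalOutlierFactor and KMeans standing in for the paper's own LOF computation and clustering loop. The synthetic data set, the LOF threshold of 1.5, the density radius, and the three-way margin alpha are illustrative assumptions, not the authors' published settings, and the max-density/max-distance seeding is a simplified reading of the abstract rather than the exact ODT-kmeans procedure.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import LocalOutlierFactor

# Toy data standing in for the benchmark data sets used in the paper.
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=1.0, random_state=42)

# Step 1: compute an LOF score for every object and drop objects above a threshold.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
lof_scores = -lof.negative_outlier_factor_      # close to 1 for inliers, larger for local outliers
threshold = 1.5                                  # assumed value; the paper treats it as user-specified
keep = lof_scores <= threshold
X_clean, clean_scores = X[keep], lof_scores[keep]

# Step 2: elbow method - choose k where the SSE curve bends most sharply.
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_clean).inertia_
       for k in range(1, 10)]
k = int(np.argmax(np.diff(sse, n=2))) + 2        # crude second-difference estimate of the elbow

# Step 3: seed the centers by the max-density / max-distance idea, discounted by LOF,
# so that dense, non-outlying points far from the already chosen centers are preferred.
dists = np.linalg.norm(X_clean[:, None, :] - X_clean[None, :, :], axis=-1)
radius = np.percentile(dists, 10)                # assumed neighbourhood radius
density = (dists < radius).sum(axis=1) / clean_scores
centers = [X_clean[np.argmax(density)]]
for _ in range(1, k):
    d_to_centers = np.linalg.norm(
        X_clean[:, None, :] - np.asarray(centers)[None, :, :], axis=-1).min(axis=1)
    centers.append(X_clean[np.argmax(d_to_centers / clean_scores)])

# Step 4: run the ordinary k-means iterations from those seeds.
km = KMeans(n_clusters=k, init=np.asarray(centers), n_init=1).fit(X_clean)

# Step 5: three-way refinement - split every cluster into a core and a fringe region
# by comparing each object's distance to its nearest and second-nearest centers.
d = np.linalg.norm(X_clean[:, None, :] - km.cluster_centers_[None, :, :], axis=-1)
d_sorted = np.sort(d, axis=1)
alpha = 0.8                                      # assumed margin for "confidently assigned"
core = d_sorted[:, 0] / d_sorted[:, 1] < alpha   # core region: clearly closest to one center
fringe = ~core                                   # boundary region: deferred for further treatment
print(f"k = {k}, core objects: {core.sum()}, fringe objects: {fringe.sum()}")

In the paper's framework the fringe objects would be re-examined with the three-way decision rules rather than merely counted; the core/fringe split above only shows where that refinement attaches to the standard k-means output.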

