K-Means Initial Cluster Center Selection Based on Spatial Translation

DOI: 10.12677/aam.2024.139418, PP. 4381-4390

Keywords: K-Means, Initial Cluster Center, Density, Spatial Translation


Abstract:

The K-means clustering algorithm is widely used in machine learning, data mining, and related fields because of its simplicity and computational efficiency. However, the traditional K-means algorithm selects its initial cluster centers at random, which can make the clustering results unstable. To address this problem, this study proposes an initial cluster center selection algorithm based on spatial translation. The algorithm traverses the minimum space containing all samples with a unit space moved at a fixed step size and counts the sample density within each unit space, which reduces the amount of computation. The k points with the highest density are then selected one by one as the initial cluster centers, replacing the random initialization of K-means and improving its clustering performance. Experiments on 12 UCI datasets show that, compared with the traditional K-means, K-means++, and other algorithms, the improved algorithm requires fewer iterations and achieves significantly higher clustering accuracy.
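To make the idea concrete, the following is a minimal sketch of density-based initialization followed by K-means. The abstract does not specify the window size, step size, or how nearly coincident high-density points are avoided, so the sketch substitutes a regular non-overlapping grid over the bounding box and a simple cell-adjacency separation rule; the function density_grid_centers, the parameter n_windows_per_dim, and that separation rule are illustrative assumptions, not the authors' exact procedure.

# Hypothetical sketch: pick k initial centers from the densest grid cells,
# then run K-means with that fixed initialization instead of a random one.
import numpy as np
from sklearn.cluster import KMeans

def density_grid_centers(X, k, n_windows_per_dim=10):
    """Return k initial centers as centroids of the densest grid cells.

    The bounding box of X is split into a regular grid (a stand-in for the
    paper's unit-space traversal); cells are ranked by point count and the
    centroids of the densest, mutually non-adjacent cells are taken first.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)              # avoid division by zero
    # Map every sample to an integer grid cell per dimension.
    cell = np.clip(((X - lo) / span * n_windows_per_dim).astype(int),
                   0, n_windows_per_dim - 1)
    keys, inv, counts = np.unique(cell, axis=0,
                                  return_inverse=True, return_counts=True)
    inv = inv.reshape(-1)                               # flatten for NumPy-version safety
    centroids = np.array([X[inv == i].mean(axis=0) for i in range(len(keys))])
    order = np.argsort(-counts)                         # densest cells first
    centers, chosen = [], []
    for i in order:
        if len(centers) == k:
            break
        # Assumed separation rule: skip cells adjacent (Chebyshev distance <= 1)
        # to an already chosen cell so centers are not nearly coincident.
        if all(np.abs(keys[i] - c).max() > 1 for c in chosen):
            centers.append(centroids[i])
            chosen.append(keys[i])
    for i in order:                                     # fallback if the rule ran short
        if len(centers) == k:
            break
        if not any(np.allclose(centroids[i], c) for c in centers):
            centers.append(centroids[i])
    return np.array(centers[:k])

# Usage: pass the density-based centers to K-means via init= (n_init=1 since
# the initialization is deterministic), replacing random initialization.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in ((0, 0), (3, 3), (0, 3))])
init = density_grid_centers(X, k=3)
km = KMeans(n_clusters=3, init=init, n_init=1).fit(X)
print(km.n_iter_, km.inertia_)

Passing an explicit array to init= is the standard scikit-learn way to swap in a custom initialization, which is how the selected high-density points would stand in for the random starting centers described above.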

