Three-Way Clustering Algorithm Based on Mean Shift

DOI: 10.12677/PM.2023.1312353, PP. 3401-3411

Keywords: Three-Way Clustering, Mean Shift, Offset Vector


Abstract:

This paper combines mean-shift clustering with three-way decision theory. First, a kernel function is used to weight and sum the vectors from the current center point to the sample points, which defines an offset vector; the center point is then moved repeatedly along this vector, following the density gradient, until it reaches the region of highest density. Next, the data are divided into non-noise points and noise points according to how frequently each sample point visits each cluster. Non-noise points are assigned by traditional two-way clustering to obtain the core regions. Noise points are handled by three-way clustering: each is assigned to the boundary regions of the corresponding clusters by comparing its visit frequencies across the different clusters. The clustering result is therefore represented by core regions and boundary regions. Experiments on UCI datasets indicate that, compared with traditional clustering algorithms, the proposed algorithm improves clustering accuracy, intra-cluster compactness, and inter-cluster separation.
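The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the Gaussian kernel, the normalization by the sum of weights (the standard mean-shift form), the parameter `bandwidth`, and the thresholds `alpha` and `beta` used to separate core from boundary points are all assumptions made here for illustration. The function names are hypothetical as well.

```python
import math

def offset_vector(center, points, bandwidth=1.0):
    """Kernel-weighted mean of the vectors from `center` to each sample point.
    Gaussian kernel and weight normalization are assumptions of this sketch."""
    dim = len(center)
    total = [0.0] * dim
    weight_sum = 0.0
    for p in points:
        diff = [p[i] - center[i] for i in range(dim)]
        sq_dist = sum(d * d for d in diff)
        w = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        weight_sum += w
        for i in range(dim):
            total[i] += w * diff[i]
    return [t / weight_sum for t in total]

def shift_to_mode(center, points, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Move the center along the offset vector until the step is negligible."""
    center = list(center)
    for _ in range(max_iter):
        step = offset_vector(center, points, bandwidth)
        center = [c + s for c, s in zip(center, step)]
        if math.sqrt(sum(s * s for s in step)) < tol:
            break
    return center

def three_way_split(visits, alpha=0.8, beta=0.2):
    """visits[i][k] is the frequency count of sample i for cluster k.
    Hypothetical rule: a sample whose dominant share reaches `alpha` goes to
    that cluster's core; otherwise it joins the boundary of every cluster
    whose share reaches `beta`."""
    core, boundary = {}, {}
    for i, counts in enumerate(visits):
        total = sum(counts)
        if total == 0:
            continue  # never visited: left unassigned in this sketch
        shares = [c / total for c in counts]
        best = max(range(len(shares)), key=shares.__getitem__)
        if shares[best] >= alpha:
            core.setdefault(best, []).append(i)
        else:
            for k, share in enumerate(shares):
                if share >= beta:
                    boundary.setdefault(k, []).append(i)
    return core, boundary
```

For example, `three_way_split([[9, 1], [5, 5], [0, 10]])` places samples 0 and 2 in the cores of clusters 0 and 1 respectively, and places sample 1, whose counts are evenly split, in the boundary of both clusters.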

