Three-Way DBSCAN Text Clustering Based on Shadowed Sets and Shared Nearest Neighbor

DOI: 10.12677/hjdm.2025.152012, PP. 137-150

Keywords: Three-Way Decision, Three-Way Clustering, Shadowed Sets, Text Clustering

Abstract:

Traditional DBSCAN forces uncertain samples into a single cluster, which introduces decision risk. To address this, a three-way DBSCAN algorithm based on shadowed sets and shared nearest neighbors (SNN) is proposed. Following the three-way decision framework, the algorithm assigns core points to the core region of a cluster. For non-core points, it applies shadowed set theory to compute each sample's membership degree and uses that degree to place the sample in either the core region or the boundary region. The SNN algorithm then refines the assignment of samples in the boundary region, with the aim of improving clustering accuracy and robustness. The algorithm is applied to text analysis, and comparative experiments indicate that it performs well and improves the accuracy of text clustering.
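
The abstract does not give the paper's formulas, so the following is only a minimal Python sketch of the three steps it describes, not the authors' implementation. Several details are assumptions made for illustration: the membership function `1 - d_min / (2 * eps)`, the symmetric shadowed-set threshold `alpha`, the parameters `k` and `min_shared`, and all function names. As in common DBSCAN implementations, `min_pts` is assumed to count the sample itself.

```python
import numpy as np

def shadowed_region(mu, alpha):
    """Map a membership degree to a region using a shadowed-set threshold alpha in (0, 0.5)."""
    if mu >= 1 - alpha:
        return "core"      # membership elevated to 1
    if mu <= alpha:
        return "outside"   # membership reduced to 0
    return "boundary"      # shadow: membership stays uncertain

def knn_sets(dist, k):
    """k nearest neighbours of every sample, excluding the sample itself."""
    order = np.argsort(dist, axis=1, kind="stable")
    return [set(row[row != i][:k].tolist()) for i, row in enumerate(order)]

def three_way_dbscan(X, eps, min_pts, alpha=0.3, k=3, min_shared=1):
    """Return (core, boundary): two lists of index sets, one set per cluster."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neigh = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_pts for nb in neigh])

    # Step 1: link density-connected core points into clusters (plain DBSCAN).
    labels = np.full(n, -1)
    n_clusters = 0
    for seed in np.flatnonzero(is_core):
        if labels[seed] != -1:
            continue
        labels[seed] = n_clusters
        stack = [seed]
        while stack:
            p = stack.pop()
            for q in neigh[p]:
                if is_core[q] and labels[q] == -1:
                    labels[q] = n_clusters
                    stack.append(q)
        n_clusters += 1

    core_pts = [np.flatnonzero(labels == c) for c in range(n_clusters)]
    core = [set(pts.tolist()) for pts in core_pts]
    boundary = [set() for _ in range(n_clusters)]
    knn = knn_sets(dist, k)

    for i in np.flatnonzero(~is_core):
        # Step 2: ASSUMED membership, decaying with distance to the nearest core point.
        mus = [max(0.0, 1.0 - dist[i, pts].min() / (2 * eps)) for pts in core_pts]
        regions = [shadowed_region(mu, alpha) for mu in mus]
        if "core" in regions:
            core[int(np.argmax(mus))].add(int(i))
            continue
        # Step 3: a shadow sample stays in a boundary region only if it shares
        # at least min_shared nearest neighbours with some core point there.
        for c, region in enumerate(regions):
            if region == "boundary":
                shared = max(len(knn[i] & knn[j]) for j in core_pts[c])
                if shared >= min_shared:
                    boundary[c].add(int(i))
    return core, boundary
```

For text clustering, `X` would be document vectors (for example TF-IDF rows), and a cosine distance may suit that setting better than the Euclidean distance used here. Samples that end up in neither a core nor a boundary region are treated as noise in this sketch.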
