
Control and Decision (控制与决策), 2015

A Clustering Algorithm for Categorical Variables Based on Connected Components

DOI: 10.13195/j.kzyjc.2013.1501, pp. 39-45

Zhou Hongfang (周红芳), Zhou Yang (周扬), Zhang Xiaopeng (张晓鹏), Tan Shuchen (谈姝辰)

Keywords: clustering, categorical variables, similarity, connected components, clustering accuracy


Abstract:

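Existing similarity definitions for categorical variables have shortcomings, so this paper proposes a new similarity definition. Using it, the dataset is abstracted as an undirected graph, and the clustering process is converted into the problem of finding the connected components of that graph; on this basis, a connected-component-based clustering algorithm for categorical variables is proposed. To analyze the algorithm's clustering quality quantitatively on datasets whose class memberships are known, a new evaluation index for clustering results is also proposed. Experimental results show that the proposed algorithm achieves high clustering accuracy and efficiency.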
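The graph formulation in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's new similarity definition, its edge threshold, or its evaluation index, so this sketch makes three loudly labeled assumptions: records are linked when a stand-in similarity (the simple matching coefficient, i.e. the fraction of attributes on which two records agree) reaches a hypothetical `threshold`; clusters are the connected components of the resulting undirected graph, found here with union-find; and cluster purity stands in for the paper's proposed evaluation index. The function names are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def matching_similarity(a, b):
    """Hypothetical stand-in (simple matching coefficient), NOT the paper's
    new similarity: fraction of attributes on which records a and b agree."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / len(a)


def connected_component_clusters(records, threshold):
    """Treat each record as a vertex, add an undirected edge between i and j
    when their similarity >= threshold (assumed rule), and return the
    connected components as lists of record indices."""
    n = len(records)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pairwise pass: merge the components of every linked pair.
    for i, j in combinations(range(n), 2):
        if matching_similarity(records[i], records[j]) >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    # Group record indices by the root of their component.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def purity(clusters, labels):
    """Stand-in evaluation for labelled data (cluster purity), NOT the
    paper's proposed index: share of records carrying their cluster's
    majority label."""
    hits = sum(Counter(labels[i] for i in c).most_common(1)[0][1] for c in clusters)
    return hits / len(labels)


# Toy categorical data (hypothetical) with known class labels.
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
    ("green", "medium", "oval"),
]
labels = ["A", "A", "B", "B", "C"]

clusters = connected_component_clusters(records, threshold=2 / 3)
print(clusters)                  # [[0, 1], [2, 3], [4]]
print(purity(clusters, labels))  # 1.0
```

Because clusters are connected components, membership is transitive: two records can share a cluster without being directly similar, as long as a chain of sufficiently similar records links them. Lowering `threshold` to 1/3 in this toy example merges records 0 to 3 into one component through such chains.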

