%0 Journal Article
%T Improved BIRCH clustering algorithm
%A JIANG Shen-yi (蒋盛益)
%A LI Xia (李霞)
%J Journal of Computer Applications (计算机应用)
%D 2009
%X The BIRCH algorithm is a clustering algorithm suited to very large data sets. It builds a CF-tree in which every entry of every leaf node must satisfy a uniform threshold T, and it rebuilds the CF-tree at each stage with a different threshold. However, the algorithm does not specify how to set the initial threshold or how to increase the threshold at each stage. In addition, it works only with "metric" attributes, which restricts its range of applications. This paper makes three improvements to BIRCH: 1) it modifies the CF structure so that heterogeneous (mixed-type) attributes can be handled; 2) it gives a heuristic for choosing the initial threshold and for increasing the threshold in the second stage of the algorithm; 3) it analyzes the parameters B and L (the branching factor of non-leaf nodes and the capacity of leaf nodes), finds that the algorithm performs equally well when B = L, and gives a reasonable range for B. Experiments on public data sets show that the improved algorithm performs well.
%K BIRCH algorithm
%K clustering
%K threshold
%K heterogeneous attributes
%K data mining
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=831E194C147C78FAAFCC50BC7ADD1732&aid=550143E6E3EE088AE1A6BAB02A219E1E&yid=DE12191FBD62783C&vid=771469D9D58C34FF&iid=CA4FD0336C81A37A&sid=2AC7DCCBBC26ECF8&eid=D0182A31A5EB14BA&journal_id=1001-9081&journal_name=计算机应用&referenced_num=3&reference_num=11
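As background for the abstract's reference to CF entries and the leaf threshold T, here is a minimal Python sketch of the clustering feature (CF) used by the original BIRCH algorithm. It covers numeric ("metric") attributes only. It is not this paper's modified CF structure for heterogeneous attributes, which the abstract does not describe. The names `CF` and `insert_point` are hypothetical. A flat list stands in for the leaf level of a CF-tree, so there is no node splitting and no B or L limit. The threshold test uses the radius; BIRCH also allows the diameter.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class CF:
    """Clustering feature of a sub-cluster: point count, linear sum, sum of squared norms."""
    n: int
    ls: list[float]
    ss: float

    @classmethod
    def from_point(cls, x) -> CF:
        return cls(1, list(x), sum(v * v for v in x))

    def merged(self, other: CF) -> CF:
        # CF additivity: the CF of the union of two sub-clusters is the component-wise sum.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> list[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # R^2 = SS/N - ||LS/N||^2, the mean squared distance of members to the centroid.
        centroid_sq = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.ss / self.n - centroid_sq, 0.0))


def insert_point(entries: list[CF], x, threshold: float) -> list[CF]:
    """Hypothetical leaf-level insert: absorb x into the entry with the nearest centroid
    (Euclidean distance) if the merged radius stays <= threshold, else start a new entry."""
    point = CF.from_point(x)
    if entries:
        closest = min(range(len(entries)),
                      key=lambda i: math.dist(entries[i].centroid(), x))
        candidate = entries[closest].merged(point)
        if candidate.radius() <= threshold:
            entries[closest] = candidate
            return entries
    entries.append(point)
    return entries


if __name__ == "__main__":
    entries: list[CF] = []
    for p in [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0)]:
        insert_point(entries, p, threshold=0.5)
    print(len(entries))  # 2
```

With T = 0.5 each of the two nearby pairs collapses into a single entry. A smaller T tends to produce more, smaller entries and therefore a larger tree, which is why the choice of the initial threshold and how it grows between rebuilds matter.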