Research on the Selection of Initial Cluster Centers
pp. 161-165
Keywords: k-means, sequential pattern, Huffman tree, clustering, initial center
Abstract:
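This paper studies how frequent sequential patterns that have already been discovered can be used to re-cluster a sequence database and mine it further. Existing k-means clustering selects its initial centers at random, which makes the clustering results unstable. To address this weakness, the paper proposes an initial-center selection algorithm based on the Huffman idea, called k-SPAM (k-means algorithm of sequence pattern mining based on the Huffman method). The algorithm reduces, to some extent, the chance of converging to a local optimum. It also computes the similarity between sequences with efficient bitwise AND and OR operations, which greatly improves execution efficiency.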
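The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions, not the authors' k-SPAM. It assumes that each sequence is encoded as a bitmap over the discovered frequent patterns (bit i is set if the sequence contains pattern i). It also assumes that the AND/OR similarity is the Jaccard ratio popcount(a AND b) / popcount(a OR b). Finally, it reads the "Huffman idea" as a bottom-up merge: Huffman coding repeatedly joins the two lowest-weight nodes, and this hypothetical analogue repeatedly joins the two most similar nodes until k remain. Each remaining group then contributes its medoid as an initial center. The names `similarity`, `huffman_style_centers`, and `assign` are illustrative only.

```python
from itertools import combinations


def similarity(a: int, b: int) -> float:
    """Jaccard similarity of two pattern bitmaps (assumed measure),
    computed only with bitwise AND, OR and a popcount."""
    union = a | b
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / bin(union).count("1")


def huffman_style_centers(bitmaps: list[int], k: int) -> list[int]:
    """Hypothetical Huffman-style seeding: repeatedly merge the two most
    similar nodes until k nodes remain, then return each node's medoid."""
    if not 1 <= k <= len(bitmaps):
        raise ValueError("k must be between 1 and the number of sequences")
    # Each node holds a merged bitmap and the indices of its member sequences.
    nodes = [(bm, [i]) for i, bm in enumerate(bitmaps)]
    while len(nodes) > k:
        # Choose the pair of nodes whose bitmaps are most similar.
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda p: similarity(nodes[p[0]][0], nodes[p[1]][0]),
        )
        (bi, mi), (bj, mj) = nodes[i], nodes[j]
        merged = (bi | bj, mi + mj)  # union of patterns, union of members
        nodes = [n for t, n in enumerate(nodes) if t not in (i, j)] + [merged]
    centers = []
    for _, members in nodes:
        # Medoid: the member with the highest total similarity to its group.
        best = max(
            members,
            key=lambda m: sum(similarity(bitmaps[m], bitmaps[o]) for o in members),
        )
        centers.append(bitmaps[best])
    return centers


def assign(bitmaps: list[int], centers: list[int]) -> list[int]:
    """One k-means assignment step: index of the most similar center."""
    return [
        max(range(len(centers)), key=lambda c: similarity(bm, centers[c]))
        for bm in bitmaps
    ]


if __name__ == "__main__":
    seqs = [0b1100, 0b1110, 0b0011, 0b0111]
    centers = huffman_style_centers(seqs, 2)
    print(centers)                # [12, 3]
    print(assign(seqs, centers))  # [0, 0, 1, 1]
```

Because the seeding is deterministic, repeated runs on the same database produce the same initial centers. The random-initialization instability described above therefore does not arise. The pairwise search here is O(n³) and is meant only for illustration. A practical version would cache pairwise similarities or use a priority queue, much as Huffman construction uses a min-heap.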