%0 Journal Article
%T Study of Fast Parallel Clustering Partition Algorithm for Large Data Sets
%A NIU Xin-zheng
%A SHE Kun
%A 牛新征
%A 佘堑
%J 计算机科学
%D 2012
%X As the volume of data processed by clustering algorithms grows rapidly, the traditional K-Means clustering algorithm faces major challenges on large data sets. To improve its efficiency, this paper proposes improvements to cluster center initialization and to the communication mode, and implements them for a parallel clustering algorithm based on MPI and a distributed clustering algorithm based on Hadoop in the cloud. The results show that the improved algorithm substantially reduces communication and computation and achieves higher execution efficiency. These findings will help in designing better and faster parallel clustering partition algorithms for large data sets.
%K Cloud computing
%K K-means
%K Large data sets
%K Message passing interface
%K Hadoop
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=64A12D73428C8B8DBFB978D04DFEB3C1&aid=1A50AD49C0C949494FDF0469B2BE5106&yid=99E9153A83D4CB11&vid=7C3A4C1EE6A45749&iid=CA4FD0336C81A37A&sid=03A030BB0C519C60&eid=419A92846E208267&journal_id=1002-137X&journal_name=计算机科学&referenced_num=0&reference_num=11