
- 2017

Improvement of the K-Means Algorithm and Its Implementation Based on the Spark Computing Model

徐鹏程,王诚

Keywords: K-Means; Canopy algorithm; minimum-maximum distance algorithm; Spark


Abstract:

K-Means is a partition-based clustering algorithm that is simple to implement and reasonably efficient. It has three drawbacks, however: the result depends strongly on the choice of initial centers, the number of clusters K is not always known in advance, and frequent iterations incur a high resource overhead. To address these problems, the original K-Means algorithm is improved by introducing the Canopy algorithm and the minimum-maximum distance algorithm. To handle big data, the improved algorithm is implemented on the Spark parallel computing framework. Experimental results show that the improved clustering algorithm gains in classification stability, accuracy, and convergence speed, and shows a clear performance advantage when processing large-scale data.
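The abstract does not spell out how the three pieces fit together, so the following is only a minimal single-machine sketch of one plausible combination, not the authors' method. It assumes that a Canopy pass estimates K (the number of canopies found), that a minimum-maximum distance rule then picks K well-separated seeds, and that ordinary K-Means iterations refine those seeds. The function names (`canopy`, `max_min_seeds`, `kmeans`), the thresholds `t1` and `t2`, and the sample points are all hypothetical, and the Spark parallelization is left out.

```python
import math


def canopy(points, t1, t2):
    """Canopy pass with a loose threshold t1 and a tight threshold t2 (t1 > t2).

    Returns a list of (center, members); the number of canopies is used
    here as an estimate of K (an assumption of this sketch).
    """
    if not t1 > t2 > 0:
        raise ValueError("expected t1 > t2 > 0")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        # Every remaining point within t1 of the center joins this canopy.
        members = [p for p in remaining if math.dist(center, p) <= t1]
        canopies.append((center, members))
        # Points within t2 can no longer become centers of later canopies.
        remaining = [p for p in remaining if math.dist(center, p) > t2]
    return canopies


def max_min_seeds(points, k):
    """Pick k seeds: each new seed maximizes its distance to the nearest chosen seed."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, centers, max_iter=100):
    """Plain K-Means iterations starting from the given centers."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep it if the cluster is empty.
        new_centers = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical sample data: three small, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k = len(canopy(points, t1=6, t2=3))
centers, clusters = kmeans(points, max_min_seeds(points, k))
```

Because the seeds are chosen far apart, each one tends to start in a different group, which is the intuition behind the stability and convergence gains the abstract reports. In a Spark setting, the assignment step would typically be the part distributed across partitions.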
