%0 Journal Article
%T 基于Parameter Server框架的大数据挖掘优化算法 [Optimization algorithm for big data mining based on parameter server framework]
%A LIU Yang (刘洋)
%A LIU Bo (刘博)
%A WANG Feng (王峰)
%J 山东大学学报(工学版) [Journal of Shandong University (Engineering Science)]
%D 2017
%R 10.6040/j.issn.1672-3961.0.2016.339
%X Abstract: Traditional machine learning algorithms designed for small data sets are not well suited to big data mining, which must run in near real time over highly diverse samples. This paper proposes an optimization algorithm for training machine learning models in big data mining. An analysis of the iterative computation in current algorithms shows that the iterations can be divided into two phases, coarse tuning and fine tuning, according to how much the model vector changes. In the fine-tuning phase the vast majority of samples have very little effect on the result, so the algorithm does not recompute the gradients of such samples and instead reuses the values computed in the previous iteration, which reduces the computation load and improves efficiency. Experimental results show that, in a distributed cluster environment, the algorithm reduces the computation load of model training by about 35% while the accuracy of the trained model stays within the normal range, which effectively improves the real-time performance of big data mining.
%K big data
%K sample diversity
%K machine learning
%K distributed system
%K optimization
%U http://gxbwk.njournal.sdu.edu.cn/CN/10.6040/j.issn.1672-3961.0.2016.339
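
The gradient-reuse idea in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation and it omits the Parameter Server distribution entirely. It assumes full-batch gradient descent on a logistic loss, and the names `train`, `phase_tol`, and `sample_tol` are hypothetical. How the paper detects the phase switch and identifies low-impact samples is not stated in the abstract, so the sketch assumes two simple rules: fine tuning starts once the norm of the model update falls below `phase_tol`, and a sample counts as low-impact when the norm of its cached gradient is below `sample_tol`.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_gradient(w, x, y):
    """Gradient of the logistic loss for one sample (x, y), with y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def train(data, dim, lr=0.5, iters=200, phase_tol=0.05, sample_tol=0.05):
    """Full-batch gradient descent with gradient reuse during fine tuning.

    phase_tol and sample_tol are assumed thresholds, not values from the paper.
    Returns (weights, per-sample gradient evaluations, iteration at which
    fine tuning started, or None if it never started).
    """
    w = [0.0] * dim
    cache = [None] * len(data)  # last computed gradient of each sample
    evaluations = 0
    fine_tuning_start = None

    for it in range(iters):
        fine = fine_tuning_start is not None
        total = [0.0] * dim
        for i, (x, y) in enumerate(data):
            # Fine tuning: keep the cached gradient of low-impact samples.
            reuse = fine and norm(cache[i]) < sample_tol
            if not reuse:
                cache[i] = sample_gradient(w, x, y)
                evaluations += 1
            for j in range(dim):
                total[j] += cache[i][j]

        step = [lr * g / len(data) for g in total]
        w = [wi - si for wi, si in zip(w, step)]

        # Coarse tuning ends once the model vector barely changes.
        if not fine and norm(step) < phase_tol:
            fine_tuning_start = it + 1

    return w, evaluations, fine_tuning_start


if __name__ == "__main__":
    random.seed(0)
    data = []
    for _ in range(500):
        x = [1.0, random.gauss(0, 1), random.gauss(0, 1)]  # bias + 2 features
        data.append((x, 1 if x[1] + x[2] > 0 else 0))
    w, used, start = train(data, dim=3)
    print("fine tuning started at iteration:", start)
    print("gradient evaluations:", used, "of", 200 * len(data))
```

The savings in this sketch depend entirely on the two assumed thresholds and on the synthetic data, so its output should not be compared with the roughly 35% reduction reported in the abstract.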