


Efficient Implementation of the LDA Algorithm in Mahout

pp. 118-130

Keywords: Latent Dirichlet Allocation, Gibbs sampling, Mahout, distributed parallel computing, MapReduce computing framework


Abstract:

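Based on a detailed study of the Latent Dirichlet Allocation (LDA) algorithm with Gibbs sampling and of the MapReduce computing framework, this paper implements distributed parallel computation of LDA in Mahout. It examines the computational performance of the resulting distributed parallel program in detail and discusses in depth several key issues that affect that performance.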
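The abstract does not show the authors' code. As a rough, non-authoritative illustration of the two ingredients it names, the Python sketch below pairs a collapsed Gibbs sampler for LDA with a map/reduce-style update in the spirit of approximate distributed LDA (AD-LDA). AD-LDA is a swapped-in technique and is not necessarily what the paper implements. Each "mapper" sweeps its own shard of documents against a snapshot of the global topic-word counts, and a "reducer" sums the shards' count changes. The sketch uses neither Mahout nor Hadoop. The function names, the hyperparameters `ALPHA` and `BETA`, and the round-robin sharding are all hypothetical.

```python
import random

# Hypothetical symmetric Dirichlet hyperparameters, chosen for illustration.
ALPHA = 0.1   # prior on per-document topic proportions
BETA = 0.01   # prior on per-topic word distributions


def init_state(docs, K, V, rng):
    """Assign a random topic to every token and build the count tables."""
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    ndk = [[0] * K for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts (global)
    nk = [0] * K                         # total tokens assigned to each topic
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    return z, ndk, nkw, nk


def map_shard(docs, shard, z, ndk, nkw, nk, K, V, rng):
    """'Map' step (hypothetical): one collapsed Gibbs sweep over `shard`.

    The shard works on a private copy of the global topic-word counts and
    returns only its change to them. Rows of z and ndk are updated in place,
    which is safe because each document belongs to exactly one shard.
    """
    local_nkw = [row[:] for row in nkw]
    local_nk = nk[:]
    for d in shard:
        for i, w in enumerate(docs[d]):
            old = z[d][i]
            # Remove the current token from the counts.
            ndk[d][old] -= 1
            local_nkw[old][w] -= 1
            local_nk[old] -= 1
            # Unnormalized full conditional p(z_i = k | all other assignments).
            weights = [
                (ndk[d][k] + ALPHA) * (local_nkw[k][w] + BETA) / (local_nk[k] + V * BETA)
                for k in range(K)
            ]
            new = rng.choices(range(K), weights=weights)[0]
            # Add the token back under its newly sampled topic.
            z[d][i] = new
            ndk[d][new] += 1
            local_nkw[new][w] += 1
            local_nk[new] += 1
    return [[local_nkw[k][w] - nkw[k][w] for w in range(V)] for k in range(K)]


def reduce_deltas(nkw, nk, deltas, K, V):
    """'Reduce' step (hypothetical): add every shard's changes to the globals."""
    for delta in deltas:
        for k in range(K):
            for w in range(V):
                nkw[k][w] += delta[k][w]
    for k in range(K):
        nk[k] = sum(nkw[k])


def run_lda(docs, K, V, n_shards=2, iterations=50, seed=0):
    rng = random.Random(seed)
    z, ndk, nkw, nk = init_state(docs, K, V, rng)
    # Round-robin partition of document indices into shards.
    shards = [list(range(s, len(docs), n_shards)) for s in range(n_shards)]
    for _ in range(iterations):
        # Every shard samples against the same snapshot of nkw/nk, as parallel
        # mappers would; here they simply run one after another.
        deltas = [map_shard(docs, s, z, ndk, nkw, nk, K, V, rng) for s in shards]
        reduce_deltas(nkw, nk, deltas, K, V)
    return z, ndk, nkw, nk


def topic_word_dist(nkw, nk, V):
    """Point estimate of each topic's word distribution (phi)."""
    return [[(nkw[k][w] + BETA) / (nk[k] + V * BETA) for w in range(V)]
            for k in range(len(nkw))]


if __name__ == "__main__":
    # Toy corpus of word ids: ids 0-2 and 3-5 tend to co-occur.
    docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [4, 5, 3, 5]]
    z, ndk, nkw, nk = run_lda(docs, K=2, V=6)
    for k, row in enumerate(topic_word_dist(nkw, nk, V=6)):
        print(k, [round(p, 2) for p in row])
```

Because every shard samples against the same stale snapshot of the topic-word counts, the parallel sweep only approximates sequential Gibbs sampling. The merged counts themselves stay exact, since each shard changes only the counts of its own tokens. This is also where the iterative cost of plain MapReduce shows up: every iteration is a full map/reduce round over the corpus.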

