Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2014

Query-Independent Ranking Topic Model (查询无关排序主题模型)

pp. 623-630

Xiao Zhibo (肖智博), Che Feng (车丰), Wu Di (吴镝), Li Qingfeng (李庆丰), Lu Mingyu (鲁明羽)

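Keywords: ranking topic model, topic model evaluation, multi-document automatic summarization, extractive summarization, summary sentence ranking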

Abstract:

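Topic models have become an important research tool in machine learning and natural language processing because they can discover the latent topics in a large corpus. As the corpus grows, so does the number of topics discovered. Most topic models are built on the bag-of-words model, which cannot describe order relations among terms, and as a result the topics cannot be distinguished by importance. This paper proposes a query-independent ranking topic model framework that uses the various relations among topics to rank them and produce an ordered topic list. Topic relations evaluate a topic's influence at the topic level. A term contribution measure is then introduced to evaluate topics at the level of term semantics, demoting topics that are popular but semantically vacuous. Because ranking topic models do not yet have a generally accepted evaluation standard, the ordered topics are used as features for multi-document automatic summarization, and the quality of the topic ranking is evaluated indirectly through the quality of the summaries. Experimental results show that the ranked topic model outperforms non-ranked topic models.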
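
The abstract does not give the paper's scoring formulas, so the following is only a minimal sketch of the two-level idea it describes, under stated assumptions. Topic influence is approximated by a PageRank-style iteration over a cosine-similarity graph of topics, and term contribution by the mean inverse document frequency of each topic's top terms. Both scores, their product, and all function names are hypothetical stand-ins, not the authors' method.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic-word distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def topic_influence(topic_word, damping=0.85, iters=50):
    """Assumed stand-in for 'topic relations': PageRank over topic similarity."""
    k = len(topic_word)
    sim = [[cosine(topic_word[i], topic_word[j]) if i != j else 0.0
            for j in range(k)] for i in range(k)]
    out = [sum(row) for row in sim]  # total outgoing weight per topic
    score = [1.0 / k] * k
    for _ in range(iters):
        new = []
        for j in range(k):
            inflow = 0.0
            for i in range(k):
                # A topic with no similar neighbours spreads its score evenly.
                share = sim[i][j] / out[i] if out[i] > 0 else 1.0 / k
                inflow += score[i] * share
            new.append((1.0 - damping) / k + damping * inflow)
        score = new
    return score


def term_contribution(topic_word, doc_freq, n_docs, top_n=2):
    """Assumed stand-in for 'term contribution': mean IDF of the top terms.

    A topic whose top terms occur in nearly every document scores near zero,
    which demotes popular but semantically vacuous topics.
    """
    scores = []
    for dist in topic_word:
        top = sorted(range(len(dist)), key=lambda w: dist[w], reverse=True)[:top_n]
        idf = [math.log(n_docs / doc_freq[w]) for w in top]
        scores.append(sum(idf) / len(idf))
    return scores


def rank_topics(topic_word, doc_freq, n_docs, top_n=2):
    """Return topic indices ordered from most to least important."""
    influence = topic_influence(topic_word)
    contribution = term_contribution(topic_word, doc_freq, n_docs, top_n)
    combined = [a * b for a, b in zip(influence, contribution)]
    return sorted(range(len(topic_word)), key=lambda t: combined[t], reverse=True)
```

Here `topic_word` is a list of per-topic word distributions (for example, the rows of a fitted LDA topic-word matrix) and `doc_freq[w]` is the number of documents containing word `w`. Multiplying the two scores is one arbitrary way to combine them; a weighted sum would serve equally well for illustration.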
