- 2018
Design and Implementation of Ontology-Based Topic Detection for Russian News
Abstract: To address topic detection in Russian news text, this paper designs an ontology-based method for describing Russian news texts and topic information and for computing their similarity, with automatic morphological analysis and named entity recognition of Russian text as auxiliary steps. Topic detection experiments on Russian text are then carried out with the Single-pass algorithm. A comparison of topic detection results based on the vector space model (VSM) with those based on the ontology model shows that the latter achieves relatively higher accuracy and effectiveness.
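The Single-pass clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each document is already a list of lemmatized tokens, represents each topic by the summed term counts of its members, and uses a hypothetical threshold of 0.3. The `cosine` function stands in for the VSM baseline; an ontology-based similarity such as the one the paper describes would be passed in through the `similarity` parameter instead.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (VSM stand-in)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def single_pass(docs, similarity=cosine, threshold=0.3):
    """Assign each document, in arrival order, to its most similar topic.

    A new topic is opened when no existing topic reaches the threshold.
    The threshold value is an illustrative assumption, not taken from the paper.
    """
    centroids = []  # one Counter of summed term counts per topic
    labels = []     # topic index assigned to each document
    for doc in docs:
        vec = Counter(doc)
        scores = [similarity(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)      # merge the document into the topic
            labels.append(best)
        else:
            centroids.append(Counter(vec))   # open a new topic
            labels.append(len(centroids) - 1)
    return labels
```

Because documents are processed once in arrival order, the result depends on both the order of the stream and the chosen threshold. Swapping the similarity function is the only change needed to compare a VSM baseline with another document model under the same clustering procedure.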