Similar Text Fragments Extraction for Identifying Common Wikipedia Communities

DOI: https://doi.org/10.3390/data3040066

Keywords: information extraction, short text fragment similarity, Wikipedia communities, NLP

Abstract:

Similar text fragment extraction from weakly formalized data is a task of natural language processing and intelligent data analysis, used to solve the problem of automatically identifying connected knowledge fields. In order to search for such common communities in Wikipedia, we propose using a logical-algebraic model for similar collocation extraction as an additional stage. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words; with WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of highly frequent synonymous collocations can provide an indication of key common up-to-date Wikipedia communities.
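The pipeline the abstract describes (POS tagging, dependency parsing, WordNet synonym expansion, then matching synonymous collocations) can be illustrated with a minimal sketch. Note this is an assumption-laden stand-in, not the paper's logical-algebraic model: it substitutes stanza (a Universal Dependencies parser from the Stanford NLP group) for the Stanford tools, uses NLTK's WordNet interface, and restricts collocations to the adjective-noun ("amod") pattern for brevity. All helper names are hypothetical.

```python
# A minimal sketch of the collocation-similarity idea, assuming stanza for
# Universal Dependencies parsing and NLTK's WordNet for synonym lookup.
# The "amod" collocation pattern and all helper names are illustrative
# assumptions, not the paper's actual logical-algebraic model.
import itertools

import stanza                          # pip install stanza
from nltk.corpus import wordnet as wn  # pip install nltk; nltk.download("wordnet")

# stanza.download("en")  # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")


def adj_noun_collocations(text):
    """Extract (adjective, noun) collocations via the UD 'amod' relation."""
    pairs = []
    for sent in nlp(text).sentences:
        for word in sent.words:
            if word.deprel == "amod" and word.head > 0:
                head = sent.words[word.head - 1]  # UD heads are 1-indexed
                if head.upos == "NOUN":
                    pairs.append((word.lemma, head.lemma))
    return pairs


def synonym_variants(modifier, noun):
    """Expand one collocation into synonymous variants using WordNet synsets."""
    mods = {l.name() for s in wn.synsets(modifier, pos=wn.ADJ) for l in s.lemmas()}
    nouns = {l.name() for s in wn.synsets(noun, pos=wn.NOUN) for l in s.lemmas()}
    return set(itertools.product(mods or {modifier}, nouns or {noun}))


def collocation_variants(text):
    """All synonym-expanded collocations found in one text fragment."""
    variants = set()
    for modifier, noun in adj_noun_collocations(text):
        variants |= synonym_variants(modifier, noun)
    return variants


def fragments_similar(fragment_a, fragment_b):
    """Treat two fragments as similar if their expanded collocations overlap."""
    return bool(collocation_variants(fragment_a) & collocation_variants(fragment_b))


# "large" and "big" share a WordNet synset, so these fragments match.
print(fragments_similar("It is a large city.", "It is a big city."))
```

Counting how often such synonymous collocations recur across articles from different portals and projects would then, per the abstract, point to common Wikipedia communities.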
