%0 Journal Article %T 基于维基百科的俄汉可比语料库构建及可比度计算<br>Building a Russian-Chinese comparable corpus based on Wikipedia and its comparability calculation %A 原伟 %A 易绵竹< %A br> %A YUAN Wei %A YI Mian-zhu %J 山东大学学报(理学版) %D 2017 %R 10.6040/j.issn.1671-9352.0.2017.095 %X 摘要: 可比语料库由于其自身优势和广泛用途逐渐成为语料库研究的热点方向之一,而目前国内俄汉可比语料库相关研究未见学者涉及。通过梳理国内外相关研究成果,设计了一种基于维基百科构建俄汉可比语料库的思路和方法,研制了语料自动获取系统,以篇章对齐为基础建立了俄汉可比语料库,语料字(词)总数达到了百万级,最后利用跨语言相似度计算的方法对俄汉语料的可比度进行计算。计算结果表明该方法能够有效获取可比度较高的俄汉语料,所构建的语料库可被用于俄汉翻译、话语分析及计算语言学研究中。<br>Abstract: Currently Russian and Chinese corpus research is urgently needed new breakthroughs in data sources, research angles and applications. Comparable corpus is one of the research hotspots in corpus linguistics and natural language processing. So far there has been no study of Russian-Chinese comparable corpora in China. This paper reviews the existing achievements in this area, designs an method to construct Russian-Chinese comparable corpus based on Wikipedia, develops a system for automatic acquiring comparable texts, and builds a Russian-Chinese comparable corpus, which contents more than a million words. In the end, the comparability of this comparable corpora was evaluated by using cross-language similarity calculation methods. The results demonstrate that using this method can effectively obtain Russian-Chinese comparable texts with high comparability, and the corpus can be used for translation, discourse analysis and computational linguistics studies %K 可比语料库 %K 俄语 %K 维基百科 %K < %K br> %K Russian %K comparable corpora %K Wikipedia %U http://lxbwk.njournal.sdu.edu.cn/CN/10.6040/j.issn.1671-9352.0.2017.095