|
- 2017
Uyghur Person Name Recognition Based on Fuzzy Matching and Syllable-to-Character Conversion
|
Abstract:
Uyghur is an agglutinative language of the Altaic family with complex word formation. Person names in Uyghur text are especially hard to recognize because they come from widely varying origins, and no mature Uyghur name recognition tool exists to date. Statistics on the names found in a large body of Uyghur text show that Uyghur names and Han Chinese names together account for about 83% of all names. This paper therefore proposes a separate recognition method for each of these two name types as they appear in Uyghur text: a letter-based fuzzy matching method for Uyghur names and, borrowing ideas from machine translation, a method based on syllable-to-character conversion for Han Chinese names. Experiments show that the proposed methods reach an F1 score of 91.84% on Uyghur names and 95.86% on Han Chinese names.
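To make the idea of letter-based fuzzy matching concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small hypothetical name lexicon written in Latin transliteration, scores each token by normalized edit distance against the lexicon, and compares only the token's leading letters so that an attached suffix does not count against the match. The function names, the lexicon entries, and the 0.8 threshold are all illustrative assumptions.

```python
from typing import Iterable, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Letter-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus distance over the longer length."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def best_match(token: str, lexicon: Iterable[str],
               threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (name, score) for the closest lexicon name, or None.

    Assumption: suffixes attach to the end of a name, so only the
    token's first len(name) letters are compared with each name.
    """
    best_name, best_score = None, 0.0
    for name in lexicon:
        score = similarity(token[:len(name)], name)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return best_name, best_score
    return None


# Hypothetical lexicon in Latin transliteration (illustrative only).
LEXICON = ["alim", "gulnar"]
```

With this sketch, `best_match("alimning", LEXICON)` matches `alim` exactly despite the attached suffix, `best_match("gulnor", LEXICON)` still matches `gulnar` with one substituted letter, and an unrelated token such as `kitab` returns `None`.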