Modern Linguistics 2021
Implementation of a Lexicon- and Rule-Based Morphological Disambiguation System for Turkish
Abstract:
This paper proposes a hybrid approach to Turkish morphological disambiguation based on a morphological analysis lexicon (the Turkish Frequency List lexicon) and contextual constraint rules. Using this approach, we built a practical Turkish morphological disambiguation system consisting of six modules: text preprocessing, named entity recognition, fixed collocation recognition, unknown word handling, morphological analysis, and morphological disambiguation. In the experiment, the system processed 15 randomly selected online news texts. Compared with the baseline system without disambiguation rules, 78.57% of the morphological ambiguities in the text were resolved, and morphosyntactic tagging accuracy reached 96.84%, an increase of 1.7 percentage points.
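As a rough illustration of how a lexicon plus contextual constraint rules can resolve an ambiguity, the following minimal Python sketch may help. It is not the system described above: the toy lexicon, the tag names (`Gen`, `P2sg`, `P3sg`), the single rule, and all function names are assumptions made for this example. It uses the well-known ambiguity of "evin" ("of the house" vs. "your house"), which the following possessed noun in "evin kapısı" ("the door of the house") resolves in favor of the genitive reading.

```python
from dataclasses import dataclass
from typing import List, Tuple

# An analysis is a tuple of (lemma, part of speech, feature tags...).
Analysis = Tuple[str, ...]


@dataclass
class Token:
    surface: str
    analyses: List[Analysis]  # candidate parses returned by the lexicon


# Toy lexicon for illustration only (hypothetical, not the paper's lexicon).
TOY_LEXICON = {
    "evin": [("ev", "Noun", "Gen"), ("ev", "Noun", "P2sg")],
    "kapısı": [("kapı", "Noun", "P3sg")],
}


def analyze(words: List[str]) -> List[Token]:
    """Look up every word; unknown words get a single placeholder analysis."""
    return [Token(w, list(TOY_LEXICON.get(w, [(w, "Unknown")]))) for w in words]


def prefer_genitive_before_possessed(tokens: List[Token], i: int) -> None:
    """Hypothetical constraint rule: if the next token is unambiguously
    third-person possessed, keep only the genitive readings of token i."""
    if i + 1 >= len(tokens):
        return
    following = tokens[i + 1].analyses
    if following and all("P3sg" in a for a in following):
        kept = [a for a in tokens[i].analyses if "Gen" in a]
        if kept:  # never delete the last remaining reading
            tokens[i].analyses = kept


RULES = [prefer_genitive_before_possessed]


def disambiguate(tokens: List[Token]) -> List[Token]:
    """Apply every rule to every token that still has more than one reading."""
    for i, token in enumerate(tokens):
        if len(token.analyses) > 1:
            for rule in RULES:
                rule(tokens, i)
    return tokens


def ambiguity_count(tokens: List[Token]) -> int:
    """Number of tokens that still carry more than one analysis."""
    return sum(1 for t in tokens if len(t.analyses) > 1)
```

Comparing `ambiguity_count` before and after `disambiguate` on a test text gives the kind of "share of ambiguities resolved" figure reported in the abstract, though the actual evaluation procedure of the paper is not specified here.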