Quantitative Research on Chinese Semantic Relations Based on Distributional Semantics

DOI: 10.12677/ml.2024.129821, pp. 527-536

Keywords: Distributional Semantics, Word Embeddings, Calculation of Semantic Relations, Semantic Similarity, Semantic Relatedness


Abstract:

Word embeddings grounded in distributional semantics encode rich linguistic information and have contributed substantially to the development of large language models (LLMs) in natural language processing and computational linguistics. Because word embeddings are directly computable, a variety of semantic computation tasks built on them have emerged; semantic relation discrimination is one of the most important. In this study, we use two sets of pre-trained embeddings, the fastText Chinese word embeddings and the Tencent Chinese word embeddings, to compute Chinese semantic relations, with cosine similarity serving as the measure of semantic association strength between words. Our findings are as follows. First, the fastText and Tencent embeddings differ in how well they distinguish the four types of Chinese semantic relations examined here: synonymy, antonymy, hyponymy, and meronymy. Second, a comparison of Spearman correlation coefficients shows that the fastText embeddings have acquired stronger knowledge of semantic similarity between words, whereas the Tencent embeddings have acquired stronger knowledge of semantic relatedness. Third, both sets of embeddings assign very high cosine similarity values to highly conventionalized antonym pairs.
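As a concrete illustration of the method the abstract describes, here is a minimal Python sketch of its two core computations: cosine similarity between word vectors as the measure of semantic association strength, and the Spearman correlation between model scores and human ratings. It assumes the official fastText Chinese vectors (cc.zh.300.bin) and the fasttext Python package; the word pairs and gold ratings are illustrative placeholders, not the paper's actual evaluation data.

```python
# Minimal sketch of the abstract's two computations, assuming the official
# fastText Chinese vectors (cc.zh.300.bin) and the `fasttext` package.
# Word pairs and gold ratings below are placeholders, not the paper's data.
import numpy as np
import fasttext
from scipy.stats import spearmanr

model = fasttext.load_model("cc.zh.300.bin")

def cosine_similarity(w1: str, w2: str) -> float:
    """Cosine of the angle between two word vectors: the paper's measure
    of semantic association strength between words."""
    v1 = model.get_word_vector(w1)
    v2 = model.get_word_vector(w2)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Illustrative pairs for the four relation types the paper examines.
pairs = [
    ("高兴", "快乐"),  # synonymy
    ("大", "小"),      # antonymy
    ("水果", "苹果"),  # hyponymy
    ("轮子", "汽车"),  # meronymy
]
model_scores = [cosine_similarity(a, b) for a, b in pairs]

# The paper compares embedding sets by the Spearman correlation between
# model scores and human judgments; these ratings are invented placeholders.
gold_ratings = [0.90, 0.35, 0.60, 0.50]
rho, p_value = spearmanr(model_scores, gold_ratings)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```

The Tencent Chinese embeddings are distributed in a different format and would need a different loader (e.g., gensim's KeyedVectors in word2vec text format), but the similarity and correlation steps above stay the same.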

