A Similarity-Based Relationship Attribution Method for Entity Data
Abstract:
Domain-specific data contains a large amount of valuable knowledge and relationships, and assigning such data to the correct relationship categories remains an open problem. Current relationship classification methods rely on models trained on large numbers of samples. Because relationship samples for domain-specific entity data are scarce, these methods are of limited use in specific domains. To address this problem, this paper proposes a similarity-based relationship attribution method for entity data. The method builds a few-sample entity relationship term tree for a specific domain, then computes the similarity between each entity to be classified and the terms in the tree to determine the entity's position in the tree. This addresses the misattribution problem, significantly reduces the cost of manual management, and effectively improves the usability of the system.
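The idea of locating an entity in a term tree by similarity can be illustrated with a minimal sketch. The abstract does not state which similarity measure or tree structure the paper uses, so everything here is an assumption made for illustration: the Jaccard coefficient over character bigrams stands in for the similarity measure, and the names `TermNode`, `bigrams`, `jaccard`, and `attribute`, along with the sample terms, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TermNode:
    """One node of a hypothetical entity relationship term tree."""
    term: str
    children: list["TermNode"] = field(default_factory=list)


def bigrams(text: str) -> set[str]:
    """Character bigrams of a string, ignoring case and spaces."""
    text = text.lower().replace(" ", "")
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the bigram sets of two strings (assumed measure)."""
    sa, sb = bigrams(a), bigrams(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def attribute(root: TermNode, entity: str) -> tuple[list[str], float]:
    """Return the path to the most similar term in the tree, plus its score."""
    best_path, best_score = [root.term], jaccard(root.term, entity)
    stack = [(root, [root.term])]
    while stack:
        node, path = stack.pop()
        for child in node.children:
            child_path = path + [child.term]
            score = jaccard(child.term, entity)
            if score > best_score:  # keep the first best match on ties
                best_path, best_score = child_path, score
            stack.append((child, child_path))
    return best_path, best_score


# Hypothetical term tree and entity, for illustration only.
tree = TermNode("equipment", [
    TermNode("power transformer", [TermNode("transformer winding")]),
    TermNode("circuit breaker"),
])
path, score = attribute(tree, "transformer windings")
print(" > ".join(path), round(score, 3))
```

A real system would likely add a minimum-score threshold so that entities with no close match are sent for manual review rather than attached to a weakly similar term. That threshold is also an assumption and is not taken from the paper.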