%0 Journal Article
%T 基于LEBERT-CRF和知识图谱的中文地址修正补全方法
Chinese Address Correction Completion Method Based on LEBERT-CRF and Knowledge Graph
%A 王钦民
%A 刘鹏
%A 邓国威
%J Computer Science and Application
%P 808-818
%@ 2161-881X
%D 2023
%I Hans Publishing
%R 10.12677/CSA.2023.134080
%X 为解决人工中文地址因输入不准确造成的地址解析错误问题,本文首先结合词汇增强的基于Transformer的双向编码表征模型(LEBERT)与条件随机场(CRF),提出了LEBERT-CRF模型,相较BERT-长短期记忆-CRF模型(BERT-BiLSTM-CRF)在分词准确率、召回率以及F值上分别提升了1.45%、1.89%和1.67%。然后,通过标准层级地址数据,并引入别名、旧名等地址信息构建了地址知识图谱库。最终,利用经过分词处理的地址数据,并根据地址数据存在的几种可能错误类型,设计出一种基于地址知识图谱库的匹配算法,对分词完的地址数据进行匹配修正并得到准确地址信息,相较于中文省份城市地区匹配器(CPCA),地址解析在一级地址、二级地址、三级地址上解析准确率分别提升了2.12%、2.36%和1.12%。
To address the parsing errors caused by inaccurately entered Chinese addresses, this paper first proposes the LEBERT-CRF model, which combines the Lexicon Enhanced Bidirectional Encoder Representations from Transformers (LEBERT) model with Conditional Random Fields (CRF). Compared with the BERT-Bidirectional Long Short-Term Memory-CRF (BERT-BiLSTM-CRF) model, it improves segmentation accuracy, recall, and F-score by 1.45%, 1.89%, and 1.67%, respectively. Next, an address knowledge graph is constructed from standard hierarchical address data, enriched with address information such as aliases and former names. Finally, a matching algorithm based on the address knowledge graph is designed around several error types that may occur in address data; it matches and corrects the segmented addresses to obtain accurate address information. Compared with the Chinese Province City Area mapper (CPCA), parsing accuracy for first-, second-, and third-level addresses improves by 2.12%, 2.36%, and 1.12%, respectively.
%K 中文地址分词,中文地址匹配,LEBERT,CRF,知识图谱
%K Chinese Address Segmentation
%K Chinese Address Matching
%K LEBERT
%K CRF
%K Knowledge Graph
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=64404