

2016

Paraphrase Identification Based on Sentence Semantic Distance

Keywords: paraphrase identification, word vector, sentence semantic distance, Twitter


Abstract:

To capture contextual semantics in paraphrase identification, this paper proposes a model that represents the semantic distance between sentences using word embeddings. First, a large-scale word vector model is trained with word2vec, so that the semantic information of each word is encoded as a distributed vector representation. Next, the travel cost between words of the two sentences is computed as the Euclidean distance in the word2vec embedding space. Finally, the Earth Mover's Distance (EMD) is used to lift word-level semantic distances to sentence-level distances: a sentence transportation matrix measures the semantic distance between two sentences, and this distance is then used to identify paraphrases by semantic similarity. In experiments on the SemEval-2015 PIT task, with logistic regression and weighted matrix factorization as baselines, the proposed model comes close to the baseline in the supervised setting and, in the unsupervised setting, improves by 5.8% over weighted matrix factorization.
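The pipeline in the abstract (word vectors, Euclidean travel costs, EMD over the two sentences) can be illustrated with a small sketch. This is not the authors' implementation. It assumes uniform word weights (each word in a sentence carries equal mass), non-empty tokenized sentences, and a hypothetical `vectors` dictionary of toy 2-D embeddings standing in for a trained word2vec model. The EMD is solved here as a min-cost flow in pure Python, which is one possible way to solve the transportation problem; the function names `euclidean`, `min_cost_flow`, and `sentence_distance` are illustrative only.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def min_cost_flow(num_nodes, edges, source, sink):
    """Total cost of a min-cost max-flow (successive shortest paths)."""
    graph = [[] for _ in range(num_nodes)]
    for u, v, cap, cost in edges:
        # Each arc is [target, residual capacity, cost, index of reverse arc].
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    total = 0.0
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [math.inf] * num_nodes
        prev = [None] * num_nodes
        dist[source] = 0.0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + cost, (u, k), True
            if not changed:
                break
        if prev[sink] is None:
            return total
        # Find the bottleneck, then push that much flow along the path.
        push, v = math.inf, sink
        while v != source:
            u, k = prev[v]
            push, v = min(push, graph[u][k][1]), u
        v = sink
        while v != source:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        total += push * dist[sink]


def sentence_distance(words_a, words_b, vectors):
    """EMD between two non-empty token lists, assuming uniform word weights."""
    n, m = len(words_a), len(words_b)
    source, sink = 0, n + m + 1
    edges = []
    for i, a in enumerate(words_a):
        # Mass 1/n per word of A, scaled by n*m so all masses are integers.
        edges.append((source, 1 + i, m, 0.0))
        for j, b in enumerate(words_b):
            # Travel cost between two words: Euclidean distance of their vectors.
            edges.append((1 + i, 1 + n + j, n * m, euclidean(vectors[a], vectors[b])))
    for j in range(m):
        # Mass 1/m per word of B, scaled the same way.
        edges.append((1 + n + j, sink, n, 0.0))
    return min_cost_flow(n + m + 2, edges, source, sink) / (n * m)


# Toy embeddings (assumption): stand-ins for word2vec vectors.
toy_vectors = {"cat": (0.0, 0.0), "kitten": (0.0, 1.0), "car": (3.0, 4.0)}
print(sentence_distance(["cat"], ["kitten"], toy_vectors))
print(sentence_distance(["cat"], ["car"], toy_vectors))
```

A smaller distance means the two sentences are semantically closer. In an unsupervised setting, a paraphrase decision could be made by comparing this distance with a threshold tuned on development data; in a supervised setting, the distance could serve as a feature for a classifier. Both uses are sketched here as assumptions, since the abstract does not give those details.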
