全部 标题 作者
关键词 摘要

OALib Journal期刊
ISSN: 2333-9721
费用:99美元

查看量下载量

相关文章

更多...
-  2017 

面向文本分类的有监督显式语义表示
Supervised Explicit Semantic Representation for Text Categorization

DOI: 10.16337/j.1004-9037.2017.03.014

Keywords: 文本分类,文本表达,有监督显式语义表示
text categorization
, text representation, supervised explicit semantic representation

Full-Text   Cite this paper   Add to My Lib

Abstract:

文本表示作为文本分类的一个基本问题,一直广受关注。目前文本表示主要有词袋模型、隐式语义表达和基于知识库的显式语义表达3种方式。本文首先分析对比了这3种文本表示方式在文本分类中的效果。实验发现,基于知识库的显式语义表达并没有如预期一样提高文本分类的效果。经分析,其原因在于显式语义表达在扩展文档表达时易引入噪声。针对该问题,本文提出了一种有监督的显式语义表达方法。该方法利用数据集的标注信息识别文档中与分类最相关的核心概念,并扩展核心概念以形成文档显式语义表达。3个标准分类数据集上的结果证实了本文所提文本表示方法的有效性。
As a fundamental problem of text categorization, text representation is widely concerned. Currently, there are three main ways of text representation: bag-of-words model, latent semantic representation and knowledge-based explicit semantic representation. The paper analyzes and compared the effects of these methods applied to text categorization. Experiments show that the knowledge-based explicit semantic representation cannot improve the text categorization performance as expected. To tackle the problem that the knowledge-based explicit semantic representation easily introduces noise in extending text, a supervised explicit semantic representation method is proposed. The dataset label information is used to identify the most relevant concepts in document and the document is represented in explicit semantic based on expanding those key concepts. The results of three datasets confirm the effectiveness of the proposed method.

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133