Study on a General Method for Extracting Web Topic Text

Keywords: Web text, text extraction, text corpus, topic, text length, punctuation, symbol sequence, discrimination rules, HTML, tag analysis, web page, generality, speed, experiment


Abstract:

This paper proposes a simple and efficient general method for extracting Chinese topic text from Web pages, with the goal of building a large Chinese text corpus. The method relies only on the length of Chinese text runs and on sequences of punctuation marks, combined with a few discrimination rules, to extract the required text from Web pages accurately without analyzing HTML tags. Experiments show that the extraction is fast and accurate enough to meet the requirements of constructing a large Chinese text corpus.
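
The abstract does not give the actual rules or thresholds, so the following is only a minimal sketch of the general idea: keep text runs that contain enough Chinese characters and enough Chinese punctuation, and discard the rest. The function name `extract_topic_text`, the thresholds `MIN_CJK_CHARS` and `MIN_PUNCT`, and the punctuation set are all assumptions made for illustration, not the paper's rules. Tags are treated here as opaque separators and are never interpreted.

```python
import re

# Anything between "<" and ">" is treated as an opaque separator.
# The tag name and attributes are never inspected.
SEPARATOR = re.compile(r"<[^>]*>")

# Characters in the CJK Unified Ideographs block.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")

# Common Chinese punctuation marks (assumed set, not from the paper).
CJK_PUNCT = re.compile(r"[，。！？；：、]")

# Hypothetical thresholds; the paper's actual values are not given.
MIN_CJK_CHARS = 20
MIN_PUNCT = 2


def extract_topic_text(html: str) -> str:
    """Return text runs that look like Chinese body text.

    A run is kept when it has at least MIN_CJK_CHARS Chinese characters
    and at least MIN_PUNCT Chinese punctuation marks. Short runs such as
    menu items, and runs without punctuation such as scripts, are dropped.
    """
    kept = []
    for run in SEPARATOR.split(html):
        run = run.strip()
        n_chars = len(CJK_CHAR.findall(run))
        n_punct = len(CJK_PUNCT.findall(run))
        if n_chars >= MIN_CJK_CHARS and n_punct >= MIN_PUNCT:
            kept.append(run)
    return "\n".join(kept)
```

In this sketch, navigation links and copyright lines fall below the length threshold, while script and style content contains no Chinese punctuation, so both are filtered out without any tag-specific logic. The thresholds would need tuning against real pages.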
