Journal of Computer Applications (计算机应用), 2007

Study on general extracting method of Web topic text

PU Qiang (蒲强), LI Xin (李鑫), LIU Qi-he (刘启和), YANG Guo-wei (杨国纬)

Keywords: Web text, text extraction, text corpus

Abstract:

To build a large Chinese text corpus, this paper proposes a simple, efficient, general-purpose method for extracting Chinese topic text from Web pages. The method relies only on the length of Chinese text segments and on punctuation sequences, combined with a small set of discrimination rules, to extract the required text accurately without analyzing HTML tags. Experiments show that extraction is fast and accurate enough to meet the requirements of building a large Chinese text corpus.
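The abstract does not spell out the actual rules, so the following is only a minimal Python sketch of the general idea under stated assumptions. Every tag is treated as a plain segment boundary and discarded. A segment counts as body text if it contains enough Chinese characters and Chinese punctuation marks. The topic text is assumed to be the contiguous run of such segments with the most Chinese characters. The thresholds (`MIN_HAN`, `MIN_PUNCT`, `MAX_GAP`), the function names, and the "largest block wins" rule are hypothetical illustrations, not the authors' method.

```python
import re

# Hypothetical parameters: the abstract gives no concrete thresholds or rules,
# so these values are illustrative assumptions, not the paper's settings.
MIN_HAN = 10   # Chinese characters a segment needs to count as body text
MIN_PUNCT = 1  # Chinese punctuation marks a segment needs to count as body text
MAX_GAP = 2    # non-body segments tolerated between body segments of one block

SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.S | re.I)
TAG = re.compile(r"<[^>]*>")
HAN = re.compile(r"[\u4e00-\u9fff]")    # CJK Unified Ideographs (basic block only)
PUNCT = re.compile(r"[，。！？；：、]")  # common full-width Chinese punctuation


def segments(html: str) -> list[str]:
    """Split a page into text segments, using every tag only as a boundary.

    Script and style bodies are dropped first so their string literals are
    not kept; all other tags are discarded without being interpreted.
    """
    text = TAG.sub("\n", SCRIPT_STYLE.sub("\n", html))
    return [s.strip() for s in text.splitlines() if s.strip()]


def is_body(seg: str) -> bool:
    """Length-and-punctuation test applied to a single segment."""
    return (len(HAN.findall(seg)) >= MIN_HAN
            and len(PUNCT.findall(seg)) >= MIN_PUNCT)


def extract_topic_text(html: str) -> str:
    """Return the contiguous run of body-like segments with the most Chinese characters."""
    segs = segments(html)
    body_idx = [i for i, s in enumerate(segs) if is_body(s)]
    if not body_idx:
        return ""
    # Merge body segments separated by at most MAX_GAP other segments into blocks.
    blocks = [[body_idx[0], body_idx[0]]]
    for i in body_idx[1:]:
        if i - blocks[-1][1] - 1 <= MAX_GAP:
            blocks[-1][1] = i
        else:
            blocks.append([i, i])

    # Assumed discrimination rule: the topic text is the block with the most Chinese characters.
    def han_count(block: list[int]) -> int:
        lo, hi = block
        return sum(len(HAN.findall(s)) for s in segs[lo:hi + 1])

    first, last = max(blocks, key=han_count)
    return "\n".join(segs[first:last + 1])


if __name__ == "__main__":
    page = ("<html><head><script>var s = \"这是脚本里的中文字符串，不应该被提取出来。\";</script></head>"
            "<body><div>首页 | 新闻 | 体育 | 娱乐</div>"
            "<p>本文提出一种简单高效的方法，用于从网页中提取中文主题文本。</p>"
            "<p>该方法只利用中文文本的长度和标点符号序列，配合少量判别规则。</p>"
            "<div>版权所有 联系我们</div></body></html>")
    # Prints the two <p> paragraphs on separate lines; the navigation bar,
    # footer, and script string are dropped.
    print(extract_topic_text(page))
```

Because every tag acts as a boundary, inline markup such as `<b>` can split one sentence into several segments. The `MAX_GAP` tolerance keeps short pieces that fall between two body-like segments inside the selected block instead of discarding them.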
