Content extraction of Web pages based on characteristic symbols
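Original title (translated from Chinese): A Method for Extracting Topic Information from Web Pages Based on Characteristic Symbols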

Keywords: document tree model, characteristic symbols, relevance, content extraction

Abstract:

With the growing popularity of the Internet, the large amount of data on the Web poses many challenges for data mining techniques, especially for content extraction from Web pages. Existing methods cannot guarantee both the generality and the effectiveness of Web mining approaches. By studying the internal structure of Web pages, this paper proposes an improved document tree model and identifies general rules for analyzing it. In addition, content is extracted from Web pages based on characteristic symbols. Experimental results show that the proposed method is both accurate and generic.
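
The abstract does not define the document tree model or the characteristic symbols in detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that punctuation marks serve as the characteristic symbols, and it uses a hypothetical rule: starting from the root of the tree, descend into a child as long as that child holds at least a `dominance` share of the node's symbols. The names `SYMBOLS`, `TreeBuilder`, `extract_content`, and `dominance` are all illustrative.

```python
from html.parser import HTMLParser

# Assumption: punctuation marks stand in for the "characteristic symbols".
SYMBOLS = set(",.;:!?，。；：！？、")
SKIP = {"script", "style"}          # text inside these is never content
VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mixed list of Node objects and text strings


class TreeBuilder(HTMLParser):
    """Builds a minimal document tree from reasonably well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # Climb to the nearest open element with this tag, then close it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIP:
            self.current.children.append(data)


def text_of(node):
    """Concatenate all text in the subtree, in document order."""
    return "".join(c if isinstance(c, str) else text_of(c) for c in node.children)


def symbol_count(node):
    return sum(ch in SYMBOLS for ch in text_of(node))


def extract_content(html, dominance=0.8):
    """Return the text of the deepest node that still dominates the symbols."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    node = builder.root
    while True:
        total = symbol_count(node)
        elements = [c for c in node.children if isinstance(c, Node)]
        best = max(elements, key=symbol_count, default=None)
        if best is None or total == 0 or symbol_count(best) < dominance * total:
            break
        node = best
    return " ".join(text_of(node).split())


page = """
<html><head><title>Demo</title><script>var a = 1, b = 2;</script></head>
<body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="main">
<p>Web pages mix content with navigation, ads, and footers.</p>
<p>Body text, however, tends to contain far more punctuation.</p>
</div>
<div id="footer">Copyright 2010</div>
</body></html>
"""

print(extract_content(page))
```

On this toy page the descent stops at the `main` block, because neither paragraph alone holds 80% of the punctuation, so the navigation links and the footer are left out. The 0.8 threshold is an arbitrary illustrative value, and real pages would likely need the relevance measure the paper mentions rather than a raw symbol count.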
