Application Research of Computers (计算机应用研究), 2009

Content extraction of Web pages based on characteristic symbols

WANG Shu (王舒), ZHU Min (朱敏), ZHANG Ming (张明), NIU Hao (牛颢), ZHAO Yu (赵瑜)

Keywords: document tree model, characteristic symbols, relevance, content extraction


Abstract:

With the growing popularity of the Internet, the large amount of data on the Web poses many challenges for data mining techniques, especially for content extraction from Web pages. Existing methods cannot guarantee both the generality and the effectiveness of Web mining approaches. By studying the internal structure of Web pages, this paper proposes an improved document tree model and identifies general rules for analyzing it. It then extracts content from Web pages based on characteristic symbols. The experimental results indicate that the proposed method is both accurate and general.
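
The abstract does not define the tree model, the characteristic symbols, or the relevance measure, so the following is only a rough illustrative sketch and not the authors' algorithm. It assumes that "characteristic symbols" are sentence punctuation marks, builds a simple document tree with Python's standard `html.parser`, and descends into whichever child subtree holds most of the symbols. The names `SYMBOLS`, `Node`, `TreeBuilder`, `extract_main`, and the `threshold` parameter are all hypothetical.

```python
from html.parser import HTMLParser

# Assumption: characteristic symbols are punctuation marks (ASCII and CJK).
SYMBOLS = set(",.;:!?，。；：！？、")
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
SKIP_TEXT_TAGS = {"script", "style"}


class Node:
    """One element of the document tree; children are Nodes or text strings."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified document tree from HTML markup."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never contain text
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIP_TEXT_TAGS:
            self.current.children.append(data)


def full_text(node):
    """Concatenate all text in the subtree rooted at node, in document order."""
    return " ".join(c if isinstance(c, str) else full_text(c)
                    for c in node.children)


def symbol_count(node):
    """Number of characteristic symbols in the subtree's text."""
    return sum(ch in SYMBOLS for ch in full_text(node))


def extract_main(html, threshold=0.8):
    """Descend while a single child holds at least `threshold` of the symbols."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    node = builder.root
    while True:
        total = symbol_count(node)
        kids = [c for c in node.children if isinstance(c, Node)]
        best = max(kids, key=symbol_count, default=None)
        if best is None or total == 0 or symbol_count(best) < threshold * total:
            return " ".join(full_text(node).split())  # normalize whitespace
        node = best


sample = """<html><body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="main"><p>Web pages mix content with navigation, ads, and links.</p>
<p>Main text, however, is rich in punctuation; menus are not.</p></div>
<div id="footer">Copyright 2009</div>
</body></html>"""
print(extract_main(sample))
```

The intuition behind this sketch is that running text contains many punctuation marks while navigation bars and footers contain few, so the subtree where the symbols concentrate is a plausible candidate for the main content. A real system would need a more careful tree model and relevance measure than the single ratio used here.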
