New Technology of Library and Information Service (现代图书情报技术), 2007

A General Approach to Extracting Topical Information in HTML Pages

Xu Wen (许文), Du Yuncheng (都云程), Li Yuqin (李渝勤), Shi Shuicai (施水才)

Keywords: DOM, information extraction, block segmentation, relevance

Abstract:

Based on a study of how topical content can be extracted from Web pages built on different kinds of templates, this paper introduces a new DOM-based extraction method. The approach parses an HTML document into a DOM tree, extracts the topical content from it, and deletes the topic-unrelated content. The output is an HTML document that contains only the topical information.
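The abstract does not specify the paper's segmentation or relevance rules. The following is therefore only a minimal Python sketch of the general pipeline it describes: parse HTML into a tree, delete topic-unrelated nodes, and keep the rest. It uses the standard-library `html.parser` module. Several pieces are hypothetical stand-ins, not the authors' method:

- the noise-tag list;
- the block-tag list;
- the link-density threshold;
- every function name (`DomBuilder`, `text_stats`, `prune`, `to_text`, `extract_topic`).

Link density is a common generic heuristic, used here in place of the paper's relevance measure.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed onto the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "area", "base", "col", "source", "wbr"}
# Hypothetical noise list: elements treated as topic-unrelated outright.
NOISE_TAGS = {"script", "style", "nav", "footer", "aside"}
# Hypothetical block list: containers checked for link density and rendered on their own lines.
BLOCK_TAGS = {"div", "p", "ul", "ol", "li", "table", "tr", "td", "section", "article"}


class Node:
    """Minimal DOM node: a tag name plus children (Node objects or text strings)."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simplified DOM tree using the standard-library HTMLParser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def text_stats(node, in_link=False):
    """Return (total_chars, link_chars) of non-whitespace text under node."""
    total = links = 0
    for child in node.children:
        if isinstance(child, str):
            n = len("".join(child.split()))
            total += n
            links += n if in_link else 0
        else:
            t, l = text_stats(child, in_link or child.tag == "a")
            total, links = total + t, links + l
    return total, links


def prune(node, max_link_density=0.5):
    """Delete noise elements and link-heavy blocks in place, bottom-up."""
    kept = []
    for child in node.children:
        if isinstance(child, Node):
            if child.tag in NOISE_TAGS:
                continue  # drop the noise element and its whole subtree
            prune(child, max_link_density)  # clean descendants first
            if child.tag in BLOCK_TAGS:
                total, links = text_stats(child)
                if total and links / total > max_link_density:
                    continue  # mostly link text: likely navigation
        kept.append(child)
    node.children = kept


def to_text(node):
    """Render kept text: inline pieces joined by spaces, blocks on new lines."""
    lines, current = [], []
    for child in node.children:
        if isinstance(child, str):
            current.extend(child.split())
            continue
        inner = to_text(child)
        if child.tag in BLOCK_TAGS:
            if current:
                lines.append(" ".join(current))
                current = []
            if inner:
                lines.append(inner)
        elif inner:
            current.append(inner)
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)


def extract_topic(html, max_link_density=0.5):
    """Parse HTML into a tree, prune topic-unrelated nodes, return the remaining text."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root, max_link_density)
    return to_text(builder.root)


page = """<html><body>
<ul><li><a href="/">Home</a></li><li><a href="/news">News</a></li></ul>
<div><p>DOM-based extraction keeps the article body.</p>
<p>See the <a href="/ref">reference</a> for details.</p></div>
<script>var x = 1;</script>
<footer>Copyright</footer>
</body></html>"""
print(extract_topic(page))
```

Pruning runs bottom-up. As a result, a navigation list nested inside a wrapper `div` is removed on its own, and the article text that shares the wrapper is not discarded with it. Lowering `max_link_density` removes more aggressively. At 0.3, for example, the second paragraph above is also dropped, because about a third of its text is link text.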
