
OALib Journal
ISSN: 2333-9721



A General Approach to Extracting Topical Information in HTML Pages

Keywords: DOM, information extraction, block segmentation, relevance


Abstract:

Building on a study of how to extract topical content from Web pages generated from different kinds of templates, this paper introduces a new DOM-based extraction method. The approach parses an HTML document into a DOM tree, extracts the topical content from it, and deletes the content unrelated to the topic. The result is an HTML document that contains only the topical information.
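The abstract describes the pipeline only at a high level (build a DOM tree, keep topical content, delete the rest), and the keywords point to block segmentation and relevance scoring. The Python sketch below illustrates that general idea under loudly stated assumptions; it is not the paper's algorithm. It uses the standard-library `html.parser` instead of a full DOM implementation, treats the innermost block-level elements as the "blocks", and measures relevance as the fraction of `<title>` terms a block contains, keeping blocks at or above a hypothetical threshold of 0.5. All names (`DomBuilder`, `leaf_blocks`, `relevance`, `extract_topical`, `BLOCK_TAGS`) are illustrative.

```python
import html
import re
from html.parser import HTMLParser

# Hypothetical tag sets chosen for this sketch; the paper does not list them.
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article",
              "h1", "h2", "h3", "h4", "h5", "h6"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "wbr"}
SKIP_TAGS = {"script", "style", "noscript"}


class Node:
    """A minimal DOM element; children are Node objects or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simplified DOM tree with the standard-library HTML parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root
        self.title = ""  # text of <title>, used as the topic in this sketch

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)  # attributes are ignored in this sketch
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never receive children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag == "title":
            self.title += data
        if data.strip():  # drop whitespace-only text between tags
            self.current.children.append(data)


def text_of(node):
    """Visible text of a subtree; script/style content is skipped."""
    if isinstance(node, str):
        return node
    if node.tag in SKIP_TAGS:
        return ""
    return " ".join(text_of(c) for c in node.children)


def contains_block(node):
    return any(isinstance(c, Node) and (c.tag in BLOCK_TAGS or contains_block(c))
               for c in node.children)


def leaf_blocks(node):
    """Segment the tree into innermost block elements, in document order."""
    if isinstance(node, str) or node.tag in SKIP_TAGS:
        return []
    if node.tag in BLOCK_TAGS and not contains_block(node):
        return [node]
    return [b for child in node.children for b in leaf_blocks(child)]


def terms(text):
    return set(re.findall(r"\w+", text.lower()))


def relevance(block, topic_terms):
    """Fraction of topic terms that also occur in the block (0.0 to 1.0)."""
    if not topic_terms:
        return 0.0
    return len(terms(text_of(block)) & topic_terms) / len(topic_terms)


def to_html(node):
    """Serialize a subtree back to HTML without attributes."""
    if isinstance(node, str):
        return html.escape(node, quote=False)
    if node.tag in SKIP_TAGS:
        return ""
    if node.tag in VOID_TAGS:
        return f"<{node.tag}>"
    inner = "".join(to_html(c) for c in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


def extract_topical(page, threshold=0.5):
    """Keep only blocks whose relevance to the <title> reaches the threshold."""
    builder = DomBuilder()
    builder.feed(page)
    builder.close()
    topic = terms(builder.title)
    kept = [b for b in leaf_blocks(builder.root) if relevance(b, topic) >= threshold]
    return "<html><body>" + "".join(to_html(b) for b in kept) + "</body></html>"
```

For example, on a page titled "Rust async runtime tutorial", a navigation bar ("Home | About") and an advertisement block share no title terms and are dropped, while the heading and a paragraph that mentions all four title terms are kept. The sketch deliberately simplifies several things: text sitting directly inside a non-leaf block is ignored, attributes are not preserved, and `\w+` tokenization does not segment Chinese text, so a real system for Chinese pages would need a word segmenter.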

