Journal of Software 2010

A Novel Method for Extracting Information from Web Pages with Multiple Presentation Templates

DOI: 10.4304/jsw.5.5.506-513

Qingzhong Li, Yanhui Ding, An Feng, Yongquan Dong

Keywords: Information Extraction, Multiple Presentation Templates, Path Entropy, Presentation Regularity, Ontology


Abstract:

Web information extraction is a key part of web data integration. Driven by the needs of e-commerce websites and by advances in web design, web pages with multiple presentation templates have emerged. Current web information extraction systems usually assume a single presentation template, so they cannot extract data from multi-template pages efficiently. This paper addresses the extraction problem for web pages with multiple presentation templates. It considers four variants of the problem and presents a novel method based on path entropy, presentation regularity, and ontology knowledge. Experiments indicate that the method is promising and achieves high recall and precision.
