Extracting Semi-Structured Information Based On Subtrees

Keywords: Trees

Abstract:

This paper proposes a new web content structure based on visual representation. Many web applications, such as information retrieval, information extraction, and automatic page adaptation, can benefit from this structure. The paper presents an automatic, top-down, tag-tree-independent approach to detecting web content structure, one that simulates how a user understands web layout structure through visual perception. The approach then extracts each record from the data region and identifies whether it is a flat or nested record based on visual information: the area the record covers and the number of data items it contains. The next step extracts the data items from these records and transfers them into a database. Compared with other existing techniques, the approach is independent of the underlying document representation, such as HTML, and works well even when the HTML structure differs greatly from the layout structure.
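
The abstract does not state the rule used to separate flat from nested records, so the following is only an illustrative sketch of how the two visual cues it names (area covered and number of data items) might be combined. The `Record` type, the `classify_records` function, the median baseline, and the `ratio` threshold are all assumptions made for this example, not the paper's method.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Record:
    """Hypothetical visual summary of one extracted record."""
    width: float      # rendered width in pixels
    height: float     # rendered height in pixels
    item_count: int   # number of data items found inside the record

    @property
    def area(self) -> float:
        return self.width * self.height


def classify_records(records, ratio=1.5):
    """Label each record 'flat' or 'nested'.

    Assumed heuristic (not from the paper): a record is treated as nested
    when both its area and its item count exceed `ratio` times the median
    of the records in the same data region.
    """
    if not records:
        return []
    typical_area = median(r.area for r in records)
    typical_items = median(r.item_count for r in records)
    labels = []
    for r in records:
        is_nested = (
            r.area > ratio * typical_area
            and r.item_count > ratio * typical_items
        )
        labels.append("nested" if is_nested else "flat")
    return labels


# Made-up sample data: three compact records and one much larger one.
records = [
    Record(600, 80, 4),
    Record(600, 80, 4),
    Record(600, 240, 12),
    Record(600, 90, 5),
]
print(classify_records(records))  # ['flat', 'flat', 'nested', 'flat']
```

Requiring both cues to exceed the baseline is a conservative choice: a record that is merely tall, or merely dense, stays labelled flat.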
