International Journal of Computer Trends and Technology 2012

Extracting Semi-Structured Information Based On Subtrees

B. Swapna kumari, S. Rajesh

Keywords: Trees

Abstract:

A new web content structure based on visual representation is proposed in this paper. Many web applications, such as information retrieval, information extraction, and automatic page adaptation, can benefit from this structure. The paper presents an automatic, top-down, tag-tree-independent approach to detecting web content structure, which simulates how a user understands the layout of a web page through visual perception. The approach extracts each record from the data region and identifies whether it is a flat or a nested record based on visual information: the area the record covers and the number of data items it contains. The next step extracts the data items from these records and transfers them into a database. Compared with existing techniques, the approach is independent of the underlying document representation, such as HTML, and works well even when the HTML structure differs greatly from the layout structure.
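
The abstract names the two visual cues used to separate flat from nested records (covered area and number of data items) but gives no decision rule. The following is a minimal sketch of one possible rule, not the authors' method: the `Record` class, the `classify_records` function, and the median-based threshold with `factor=1.5` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Record:
    """A rendered record: bounding-box size in pixels plus its data items (hypothetical)."""
    width: float
    height: float
    items: list[str]

    @property
    def area(self) -> float:
        return self.width * self.height


def classify_records(records: list[Record], factor: float = 1.5) -> list[str]:
    """Label each record 'flat' or 'nested'.

    Assumed rule: a record is 'nested' when both its area and its item
    count exceed `factor` times the median value within the data region.
    """
    if not records:
        return []
    area_cut = factor * median(r.area for r in records)
    item_cut = factor * median(len(r.items) for r in records)
    return [
        "nested" if r.area > area_cut and len(r.items) > item_cut else "flat"
        for r in records
    ]


# Example data region: two compact records and one larger record with sub-items.
region = [
    Record(300, 40, ["Title A", "$10"]),
    Record(300, 40, ["Title B", "$12"]),
    Record(300, 160, ["Title C", "$15", "Red", "$15", "Blue", "$16"]),
]
labels = classify_records(region)
```

Comparing against the median of the region keeps the rule relative to the page being processed, so no absolute pixel threshold is needed; a real system would likely tune or replace this heuristic.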
