International Journal of Engineering Sciences and Emerging Technologies 2013

HAULING TEMPLATES FROM WEB PAGES USING CLUSTERING TECHNIQUES

R. Manjula, A. Chilambuchelvan

Keywords: Document Object Model, Minimum Description Length, Template Extraction, VIPS

Abstract:

Today, the World Wide Web is one of the most popular information providers. A website is a collection of web pages, and those pages usually carry information for users. Websites are commonly built by combining shared templates with content. A template gives pages a consistent structure that makes the content easy to access, even though the template itself is not explicitly announced. With current template extraction techniques, irrelevant terms in templates degrade the performance of web applications such as search engines. In this work, we present a new method for extracting templates from a large number of web documents generated from heterogeneous templates. The method clusters the web documents by the similarity of their underlying template structures, so that the template for each cluster is extracted at the same time as the clusters are formed.
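
The clustering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no implementation details, and the Minimum Description Length criterion named in the keywords is replaced here by a plain Jaccard similarity with a fixed threshold. All names (`dom_paths`, `cluster_by_structure`, `threshold`) are hypothetical. Each page is reduced to its set of DOM tag paths, pages with similar path sets are grouped, and each cluster's template is taken to be the paths shared by all of its members.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects the set of root-to-node tag paths in an HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def dom_paths(html):
    """Return the set of DOM tag paths, e.g. {'html', 'html/body', ...}."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed stand-in for the MDL cost)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_structure(pages, threshold=0.6):
    """Greedy one-pass clustering; `threshold` is an arbitrary assumption.

    Each cluster keeps a running template: the paths common to all members,
    so templates are available as soon as clustering finishes.
    """
    clusters = []
    for name, html in pages.items():
        paths = dom_paths(html)
        best, best_score = None, threshold
        for cluster in clusters:
            score = jaccard(paths, cluster["template"])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append({"members": [name], "template": set(paths)})
        else:
            best["members"].append(name)
            best["template"] &= paths
    return clusters
```

Calling `cluster_by_structure` on a dictionary that maps page names to HTML strings returns a list of clusters, each with its `members` and the shared `template` paths. A greedy single pass is order-dependent; it is used here only to keep the sketch short.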
