计算机科学 (Computer Science), 2005
How to Get the Main Part of Web Pages
Abstract:
A Web page can be divided into several parts: the main part, the department logo, the navigation bar, the hyperlinks, and the copyright notice. Extracting the main part of a Web page is easy for a human reader but hard for a computer. In this paper we tackle the problem by exploring a tag tree, which can suitably express the structure and layout of a Web page. We propose a method for building the tag tree and, in addition, develop a single-path tag tree, named the tag tree model, which describes only the main part of a Web page.
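To make the idea of a tag tree and a single path through it concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not say how the tree is built or how the path is chosen. The names `Node`, `TagTreeBuilder`, `build_tag_tree`, and `main_path` are hypothetical, and the rule of following the child that holds the most text is an assumption made purely for illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Text inside these elements is code, not visible content.
NON_CONTENT_TAGS = {"script", "style"}


class Node:
    """One element of the tag tree (hypothetical structure, not from the paper)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text_len = 0  # characters of text directly inside this element

    def total_text(self):
        """Text length of this element plus all of its descendants."""
        return self.text_len + sum(c.total_text() for c in self.children)


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree with the standard-library HTML parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name;
        # a stray end tag with no matching open element is ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in NON_CONTENT_TAGS:
            self.current.text_len += len(data.strip())


def build_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def main_path(root):
    """Assumed heuristic: at each level, follow the child holding the most text."""
    path = [root.tag]
    node = root
    while node.children:
        best = max(node.children, key=lambda c: c.total_text())
        if best.total_text() == 0:
            break  # no remaining child carries any text
        path.append(best.tag)
        node = best
    return path


page = """
<html><body>
  <div id="nav"><a href="/">Home</a><a href="/news">News</a></div>
  <div id="content"><p>This paragraph stands in for the long main text of the page.</p></div>
  <div id="footer">Copyright</div>
</body></html>
"""
print(" > ".join(main_path(build_tag_tree(page))))
# prints: document > html > body > div > p
```

In this toy page the path descends into the content `div` because it holds more text than the navigation bar or the footer. A real page would likely need a more careful criterion, for example one that discounts text inside hyperlinks.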