Journal of Computer Applications (计算机应用), 2005
Auto-extraction methods of Web pagelet
Abstract:
Web pages contain a great deal of navigation information and advertising in addition to the data users actually need. This paper proposes a DOM tree comparison algorithm that compares several pages of the same class and identifies the main content of each page. Experimental results show that the approach is feasible and effective.
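
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of comparing DOM trees across pages of one class, not the authors' method. It assumes that a text block found at the same tag path with the same text in every page is template material (navigation, advertisements) and that the remaining blocks are main content. The names `TreeBuilder`, `text_blocks`, and `main_content` are hypothetical, and the tree is built with Python's standard `html.parser` module.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A very small DOM node: tag name, parent, children, and direct text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree (hypothetical helper, assumption of this sketch)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.current.text.append(data)


def text_blocks(html):
    """Return (tag_path, text) pairs for every element that directly holds text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    blocks = []

    def walk(node, path):
        path = path + "/" + node.tag
        if node.text:
            blocks.append((path, " ".join(node.text)))
        for child in node.children:
            walk(child, path)

    walk(builder.root, "")
    return blocks


def main_content(pages):
    """For each page, keep the text blocks that are NOT shared by every page.

    Assumption: at least two pages of the same class are supplied; with a
    single page every block counts as shared and nothing is returned.
    """
    per_page = [text_blocks(page) for page in pages]
    counts = Counter()
    for blocks in per_page:
        counts.update(set(blocks))
    shared = {block for block, n in counts.items() if n == len(pages)}
    return [[text for (path, text) in blocks if (path, text) not in shared]
            for blocks in per_page]
```

Calling `main_content` with two pages that share a navigation bar and an advertisement but differ in their article paragraph would return only the article text for each page. Exact matching on path and text is a deliberately crude similarity test; a practical system would likely need a more tolerant comparison of subtrees.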