Application Research of Computers (计算机应用研究), 2011
Study and design on Web spider in Internet forums
Abstract:
To improve the efficiency of Web spiders when crawling forums, this paper analyzes the layout and structure features common to forums and designs a targeted crawling strategy. An analysis of many forums shows that most information is presented to users through a pre-designed layout and structure, which is reflected in the DOM tree. By operating on this tree, URLs can be collected and duplicate URLs filtered out. Experimental results show that the proposed crawling strategy improves the crawling efficiency of Web spiders and saves substantial network bandwidth and local storage space.
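
As an illustration only, the two steps named in the abstract (collecting URLs from the parsed page and filtering duplicates) might be sketched in Python as follows. This is not the paper's implementation: the standard library's event-based `HTMLParser` stands in for a full DOM tree traversal, and the names `LinkCollector`, `normalize`, `collect_unique_urls`, the `IGNORED_PARAMS` set, and the `forum.example.com` URLs are all hypothetical choices made for this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> element the parser encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Assumption: these query parameters (e.g. a session id) do not change
# which page is served, so URLs differing only in them are duplicates.
IGNORED_PARAMS = {"sid", "highlight"}


def normalize(base_url, href):
    """Resolve href against base_url and reduce it to a canonical form."""
    parts = urlsplit(urljoin(base_url, href))
    # Drop ignored parameters and sort the rest so parameter order is irrelevant.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in IGNORED_PARAMS
    )
    # The fragment is discarded because it never reaches the server.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), "")
    )


def collect_unique_urls(base_url, html, seen):
    """Return normalized URLs found in html that are not yet in seen."""
    parser = LinkCollector()
    parser.feed(html)
    fresh = []
    for href in parser.links:
        url = normalize(base_url, href)
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

A crawler built along these lines would keep one `seen` set for the whole crawl and pass it to `collect_unique_urls` for every fetched page, so that a thread reachable through several differently decorated links is downloaded only once. Which parameters are safe to ignore depends on the forum software and would have to be determined per site.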