Application Research of Computers (计算机应用研究), 2010
GNP-based scheduling strategy for distributed crawling
Abstract:
To address the task scheduling and load balancing problems of distributed search engines, this paper proposed a GNP-based scheduling strategy for distributed crawling, together with a load balancing method. It adopted an Internet distance estimation mechanism in place of large-scale network distance measurement, which not only improved the response time of the system but also reduced the load the system places on the WAN. By deploying crawling nodes across WANs, it built a distributed search engine and implemented several sche...
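To illustrate the idea of replacing direct measurement with distance estimation, here is a minimal sketch. It assumes that GNP (Global Network Positioning) coordinates for every crawling node and every target web host have already been computed from landmark measurements, so the estimated network distance between any two of them is simply the Euclidean distance between their coordinates and no probe is sent. The capacity-based fallback is a hypothetical load balancing rule used for illustration, not necessarily the paper's method. The names `assign_hosts`, `capacity`, and the node and host labels are likewise hypothetical.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, ...]


def gnp_distance(a: Coord, b: Coord) -> float:
    """Estimated network distance: Euclidean distance between GNP coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_hosts(
    crawlers: Dict[str, Coord],
    hosts: Dict[str, Coord],
    capacity: Dict[str, int],  # hypothetical per-node limit on assigned hosts
) -> Dict[str, str]:
    """Assign each web host to the nearest crawler with spare capacity.

    Distances are estimated from coordinates only, so no RTT probes are
    sent between crawling nodes and web hosts during scheduling.
    """
    load = {name: 0 for name in crawlers}
    assignment: Dict[str, str] = {}
    for host in sorted(hosts):  # sorted for a deterministic schedule
        # Rank crawlers by estimated distance; ties are broken by node name.
        ranked = sorted(crawlers, key=lambda c: (gnp_distance(crawlers[c], hosts[host]), c))
        for c in ranked:
            if load[c] < capacity[c]:
                assignment[host] = c
                load[c] += 1
                break
        else:
            raise RuntimeError(f"no crawler has spare capacity for {host}")
    return assignment


if __name__ == "__main__":
    crawlers = {"node-a": (0.0, 0.0), "node-b": (30.0, 40.0)}
    hosts = {"a.example.com": (1.0, 1.0), "b.example.com": (2.0, 0.0), "c.example.com": (29.0, 41.0)}
    print(assign_hosts(crawlers, hosts, {"node-a": 1, "node-b": 5}))
```

In this run, `b.example.com` is closest to `node-a`, but it spills over to `node-b` because `node-a` has already reached its hypothetical capacity of one host. This shows how a scheduler can trade proximity against load.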