Application Research of Computers (计算机应用研究), 2009
Ajax crawling algorithm based on state transition graph
Abstract:
Traditional Web crawlers cannot meet the challenges of crawling Ajax applications, such as JavaScript execution, state identification and navigation, and duplicate-state elimination. After examining these challenges, this paper introduces a state transition graph and, based on it, proposes an algorithm to retrieve Ajax states and the underlying Deep Web content. To improve accuracy and reduce unnecessary states, the algorithm is further refined with Ajax fingerprinting and DOM filtering. Experimental results indicate that the algorithm is both effective and efficient.
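The idea of crawling over a state transition graph can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the fingerprinting or filtering rules, so the SHA-1 hash over a filtered DOM string, the comment-stripping filter, the breadth-first traversal, and the toy in-memory "application" (`PAGES`, `LINKS`, `toy_events`, `toy_fire`) are all assumptions made for illustration. A real crawler would execute JavaScript in a browser engine instead of looking pages up in a dictionary.

```python
import hashlib
import itertools
import re
from collections import deque


def filter_dom(dom: str) -> str:
    """Assumed DOM filter: drop HTML comments and collapse whitespace."""
    dom = re.sub(r"<!--.*?-->", "", dom, flags=re.S)
    return re.sub(r"\s+", " ", dom).strip()


def fingerprint(dom: str) -> str:
    """Assumed fingerprint: SHA-1 digest of the filtered DOM string."""
    return hashlib.sha1(filter_dom(dom).encode("utf-8")).hexdigest()


def crawl(initial_dom, events_of, fire):
    """Breadth-first exploration that builds a state transition graph.

    Returns (states, transitions): states maps fingerprint -> DOM,
    transitions is a list of (source, event, target) fingerprint edges.
    """
    states = {fingerprint(initial_dom): initial_dom}
    transitions = []
    queue = deque([initial_dom])
    while queue:
        dom = queue.popleft()
        src = fingerprint(dom)
        for event in events_of(dom):
            new_dom = fire(dom, event)
            dst = fingerprint(new_dom)
            transitions.append((src, event, dst))
            if dst not in states:  # duplicate states are not re-explored
                states[dst] = new_dom
                queue.append(new_dom)
    return states, transitions


# Hypothetical toy application standing in for a real Ajax page.
PAGES = {
    "home": "<div id='home'>Home</div>",
    "list": "<div id='list'>Items</div>",
    "detail": "<div id='detail'>Item 1</div>",
}
LINKS = {"home": ["list"], "list": ["detail", "home"], "detail": ["list"]}
_tick = itertools.count()


def toy_events(dom):
    """Events available in a state, looked up by the page's id attribute."""
    page = re.search(r"id='(\w+)'", dom).group(1)
    return LINKS[page]


def toy_fire(dom, event):
    """Simulate firing an event; the comment changes on every call."""
    return f"<!-- t={next(_tick)} -->" + PAGES[event]


states, transitions = crawl(PAGES["home"], toy_events, toy_fire)
print(len(states), "states,", len(transitions), "transitions")
```

In this toy run the volatile comment differs on every visit, yet the filter removes it before hashing, so returning to the home view is recognized as an already-known state rather than a new one.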