Application Research of Computers (计算机应用研究), 2012
Method of generalized sitemaps with emergency hot degree
Abstract:
To collect data on emergency-related Web public opinion from target sites in a timely manner, this paper proposes a method for generating generalized sitemaps based on emergency hot degree. Using an emergency topic dictionary and an advanced Shark-Search algorithm, the method collects enough sample Web pages to produce a sitemap that records the emergency hot degree of every emergency-related board in the target Web site. Guided by the sitemaps generated in this way, the Web crawler adapts well to dynamic changes in the target sites, collects the needed Web pages precisely, and adjusts its update frequency when necessary. Experiments show that, with the help of the sitemap, the Web crawler achieves outstanding effectiveness and efficiency.
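The abstract does not say how the emergency hot degree is computed or how it drives the update frequency, so the following Python sketch is only an illustration under stated assumptions. It assumes the hot degree of a board is the share of its sampled pages that contain at least one term from the topic dictionary, and that the revisit interval shrinks linearly as the hot degree rises. The names `TOPIC_TERMS`, `BoardEntry`, `hot_degree`, `revisit_interval` and `build_sitemap`, the example terms, and the interval bounds are all hypothetical and do not come from the paper. The Shark-Search sampling step is not modelled here; the sketch starts from pages that have already been sampled.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical topic dictionary; the paper's actual dictionary is not given.
TOPIC_TERMS = {"earthquake", "flood", "explosion"}


@dataclass
class BoardEntry:
    url: str
    hot_degree: float     # assumed: share of sampled pages mentioning a topic term
    revisit_minutes: int  # assumed: crawl interval derived from hot_degree


def hot_degree(sample_pages: list[str]) -> float:
    """Fraction of sampled pages containing at least one dictionary term."""
    if not sample_pages:
        return 0.0
    hits = sum(
        1
        for text in sample_pages
        if any(term in text.lower() for term in TOPIC_TERMS)
    )
    return hits / len(sample_pages)


def revisit_interval(degree: float, fastest: int = 10, slowest: int = 720) -> int:
    """Linearly map a hot degree in [0, 1] to a revisit interval in minutes."""
    return round(slowest - degree * (slowest - fastest))


def build_sitemap(samples: dict[str, list[str]]) -> list[BoardEntry]:
    """Keep only boards with a non-zero hot degree, hottest first."""
    entries = []
    for url, pages in samples.items():
        degree = hot_degree(pages)
        if degree > 0:
            entries.append(BoardEntry(url, degree, revisit_interval(degree)))
    return sorted(entries, key=lambda entry: entry.hot_degree, reverse=True)
```

In this sketch, a crawler would read the resulting list, visit the hottest boards most often, and rebuild the list periodically so that boards whose hot degree changes get a new revisit interval.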