%0 Journal Article
%T Study and design on Web spider in Internet forums (网络蜘蛛在网络论坛领域的研究与设计)
%A TENG Zhao-sheng
%A HU De-min
%A 滕召生
%A 胡德敏
%J Application Research of Computers (计算机应用研究)
%D 2011
%X To improve the efficiency of Web spiders when crawling forums, this paper analyzes the features common to forums, starting from their layout and structure, and designs a targeted crawling strategy. An analysis of many forums shows that most information is presented to users through a pre-designed layout and structure, which is reflected in the DOM tree. By operating on this tree, URLs can be collected and duplicate URLs then filtered out. Experimental results show that the proposed crawling strategy increases the crawling efficiency of Web spiders and saves substantial network bandwidth and local storage space.
%K Web spider
%K DOM (document object model) tree
%K repetitive region
%K crawling strategies
%K repetitive template
%K 网络蜘蛛
%K 文档对象模型树
%K 页面重复区域
%K 爬行策略
%K 重复模板
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=A9D9BE08CDC44144BE8B5685705D3AED&aid=AB0B3CA084CBA3FF15CFB63441FBE9CC&yid=9377ED8094509821&vid=D3E34374A0D77D7F&iid=0B39A22176CE99FB&sid=ABF2590617D31FFD&eid=8C267C8DC97FEEEF&journal_id=1001-3695&journal_name=计算机应用研究&referenced_num=0&reference_num=10