Study and design on Web spider in Internet forums
Original title (Chinese): 网络蜘蛛在网络论坛领域的研究与设计

Keywords: Web spider, DOM (document object model) tree, repetitive region, crawling strategies, repetitive template

Abstract:

To improve the efficiency of a Web spider when it crawls forums, this paper analyzes the features that forums have in common in terms of layout and structure, and designs a crawling strategy targeted at forums. An analysis of many forums shows that most information is presented to users through a pre-designed layout and structure, which is reflected in the page's DOM tree. By operating on this tree, URLs can be collected and repeated URLs filtered out. Experimental results show that the crawling strategy proposed in this paper increases the crawling efficiency of Web spiders and saves substantial network bandwidth and local storage space.
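
The abstract does not give the details of the strategy, so the following is only a minimal sketch of the general idea of collecting URLs from a page's element structure and filtering out repeats. It is not the authors' method. It assumes Python's standard-library `html.parser`, which walks the elements as events rather than building a full DOM tree. The names `LinkCollector` and `collect_unique_urls` and the host `forum.example` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element seen while parsing (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Drop the fragment so that page and page#reply count as one URL.
                self.links.append(urldefrag(absolute).url)


def collect_unique_urls(html, base_url, seen=None):
    """Return the URLs in html that are not already in seen, in document order.

    seen is updated in place, so one set can be shared across many pages
    of a crawl. This is an assumed design, not taken from the paper.
    """
    seen = set() if seen is None else seen
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    fresh = []
    for url in parser.links:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


# Assumed sample page: two links point at the same thread.
page = (
    '<ul>'
    '<li><a href="/thread/1">First thread</a></li>'
    '<li><a href="/thread/1#reply">Latest reply</a></li>'
    '<li><a href="/thread/2">Second thread</a></li>'
    '</ul>'
)
visited = set()
new_urls = collect_unique_urls(page, "http://forum.example/", visited)
```

Passing the same `visited` set to every call means a URL already queued from one page is skipped on later pages, which is the kind of saving in bandwidth and storage the abstract refers to.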
