|
- 2013
一种分布式网络爬虫的设计与实现
|
Abstract:
利用用户指定的关键字和搜索引擎生成URL种子,通过分布式网络爬虫抽取符合用户需求的网页作为研究所用的语料.实验结果表明:分布式网络爬虫可以较好地解决在短时间内抽取大量语料的需求.
User-specified keywords to generate URL seeds by search engine has been used.Webpage for user's requirements as research corpus through distributed web crawler has been extracted.Experiments show that the distributed web crawler can be good solution to extract a large number of corpora in a short time