%0 Journal Article
%T Search Strategy and Implementation of the Topic Search Engine Spider (主题搜索引擎网络爬虫搜索策略的研究与实现)
%A LIU Shu-Mei (刘淑梅)
%A XIA Liang (夏亮)
%A XU Nan-Shan (许南山)
%J Computer Systems & Applications (计算机系统应用)
%D 2010
%X Based on the structural characteristics of web pages, this paper proposes a strategy that predicts topic relevance by passing topic information from page to page, and that addresses the problems of channel jamming and missed pages. First, a relevance value is passed along each link according to its anchor text. If the anchor text is relevant to the topic, the relevance threshold is passed on directly; otherwise, it is multiplied by the genetic ratio before being passed on. During this propagation, the relevance value is reset to its initial value whenever a relevant web page is encountered. Experimental results show that the recall ratio is greatly improved.
%K web crawler
%K search engine
%K topic relevance
%K genetic algorithm
%K crawling
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=D4F6864C950C88FFCE5B6C948A639E39&aid=13185E6709AF7A10447977BE3764881F&yid=140ECF96957D60B2&vid=2A8D03AD8076A2E3&iid=38B194292C032A66&sid=2A3781E88AB1776F&eid=286FB2D22CF8D013&journal_id=1003-3254&journal_name=计算机系统应用&referenced_num=0&reference_num=6
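
The propagation rule described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the constants `INITIAL_VALUE`, `GENETIC_RATIO` and `CUTOFF`, the keyword-based `is_relevant` test, the priority-queue frontier, and the in-memory `pages` graph are all assumptions made for the example, since the abstract does not give concrete values or data structures.

```python
import heapq

INITIAL_VALUE = 1.0   # assumed starting relevance value
GENETIC_RATIO = 0.5   # assumed "genetic ratio" applied on irrelevant anchors
CUTOFF = 0.2          # assumed minimum value for a link to be followed


def is_relevant(text, topic):
    """Toy relevance test (an assumption): the topic word appears in the text."""
    return topic.lower() in text.lower()


def value_to_pass(current_value, anchor_text, topic):
    """Pass the value on unchanged for a relevant anchor, otherwise decay it."""
    if is_relevant(anchor_text, topic):
        return current_value
    return current_value * GENETIC_RATIO


def crawl(pages, seed, topic):
    """Crawl an in-memory graph: pages maps url -> (body, [(anchor, url), ...])."""
    best = {seed: INITIAL_VALUE}          # highest value queued so far per URL
    frontier = [(-INITIAL_VALUE, seed)]   # max-heap emulated with negated values
    visited, relevant = set(), []
    while frontier:
        neg_value, url = heapq.heappop(frontier)
        if url in visited or url not in pages:
            continue
        visited.add(url)
        body, links = pages[url]
        value = -neg_value
        if is_relevant(body, topic):
            relevant.append(url)
            value = INITIAL_VALUE         # reset on reaching a relevant page
        for anchor, target in links:
            passed = value_to_pass(value, anchor, topic)
            if passed >= CUTOFF and passed > best.get(target, 0.0):
                best[target] = passed
                heapq.heappush(frontier, (-passed, target))
    return relevant


# Hypothetical site: page "d" is relevant but only reachable via irrelevant pages.
pages = {
    "a": ("crawler overview", [("news", "b")]),
    "b": ("site news", [("archive", "c")]),
    "c": ("old archive", [("crawler design notes", "d"), ("misc", "e")]),
    "d": ("crawler design", []),
    "e": ("unrelated", [("more", "f")]),
    "f": ("crawler tips", []),
}
result = crawl(pages, "a", "crawler")
print(result)
```

In this toy graph the crawler follows two irrelevant links in a row because the decayed value stays at or above the cutoff, so it still reaches the relevant page "d". The link to "e" is dropped once the value falls below the cutoff, so "f" is never visited. How aggressively to decay and where to cut off are tuning choices that the abstract leaves open.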