%0 Journal Article
%T Focused crawling method based on improved C4.5 exploiting anchor text
%A LIU Jin-hong
%A LU Yu-liang
%A 刘金红
%A 陆余良
%J 计算机应用
%D 2006
%I
%X A focused crawling method based on anchor text and an improved C4.5 decision tree algorithm is proposed. The method trains a decision tree on the anchor text of URLs, then applies the resulting model to decide whether a downloaded page is on topic and which URL to visit next. A prototype system named DTFC was implemented using this method, and experiments targeting the topic "academic report" were carried out on four university websites. The experimental results show that DTFC outperforms two standard crawlers at focused crawling.
%K focused crawler
%K anchor text
%K decision tree
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=831E194C147C78FAAFCC50BC7ADD1732&aid=FA918A0F245A3590&yid=37904DC365DD7266&vid=96C778EE049EE47D&iid=59906B3B2830C2C5&sid=30237B193729CC5C&eid=A2A361E8179A54A7&journal_id=1001-9081&journal_name=计算机应用&referenced_num=0&reference_num=9