
Focused crawling method based on improved C4.5 exploiting anchor text
(Chinese title, translated: A focused crawling method based on anchor text and an improved C4.5 decision tree algorithm)

Keywords: focused crawler, anchor text, decision tree


Abstract:

This paper proposes a focused crawling method based on anchor text and an improved C4.5 decision tree algorithm. The method trains a decision tree on the anchor text of URLs, then applies the resulting model to decide whether a downloaded page is on topic and which URL to visit next. A prototype system, DTFC, was implemented based on this method, and experiments targeting the topic "academic report" were carried out on four university websites. The experimental results show that DTFC outperforms two standard crawlers at focused crawling.
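
To make the idea concrete, here is a minimal sketch of anchor-text-driven link prioritisation. It is not the DTFC implementation and does not reproduce the paper's improvements to C4.5: it trains only a one-level tree (a decision stump) using the standard C4.5 gain ratio over word-presence features, then orders a crawl frontier by the stump's on-topic estimate. The training anchors, the example.edu URLs, and all function names are hypothetical.

```python
import heapq
import math
import re
from collections import Counter

def tokens(text):
    """Lower-cased set of alphanumeric words in an anchor text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def split(samples, word):
    """Partition labels by whether `word` occurs in the anchor text."""
    present = [label for text, label in samples if word in tokens(text)]
    absent = [label for text, label in samples if word not in tokens(text)]
    return present, absent

def gain_ratio(samples, word):
    """C4.5 split criterion: information gain divided by split information."""
    present, absent = split(samples, word)
    if not present or not absent:
        return 0.0  # the word separates nothing
    total = len(samples)
    weights = [len(present) / total, len(absent) / total]
    labels = [label for _, label in samples]
    gain = entropy(labels) - weights[0] * entropy(present) - weights[1] * entropy(absent)
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info

def train_stump(samples):
    """Pick the word with the highest gain ratio; return (word, p_present, p_absent)."""
    vocabulary = sorted(set().union(*(tokens(text) for text, _ in samples)))
    word = max(vocabulary, key=lambda w: gain_ratio(samples, w))
    present, absent = split(samples, word)
    rate = lambda labels: sum(labels) / len(labels) if labels else 0.0
    return word, rate(present), rate(absent)

def score(stump, anchor):
    """Estimated probability that the link behind `anchor` is on topic."""
    word, p_present, p_absent = stump
    return p_present if word in tokens(anchor) else p_absent

def order_frontier(stump, links):
    """Return URLs best-first; ties keep their discovery order."""
    heap = [(-score(stump, anchor), i, url) for i, (anchor, url) in enumerate(links)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Hypothetical labelled anchors: True means the link led to an "academic report" page.
TRAINING = [
    ("Academic report on data mining", True),
    ("Academic report by a visiting professor", True),
    ("Academic report: graph algorithms", True),
    ("Annual financial report", False),
    ("Campus map", False),
    ("Student admissions", False),
    ("Library opening hours", False),
]

# Hypothetical links extracted from a downloaded page.
LINKS = [
    ("Campus map", "https://example.edu/map"),
    ("Academic report: deep learning", "https://example.edu/reports/1"),
    ("Contact us", "https://example.edu/contact"),
]

stump = train_stump(TRAINING)
ordered = order_frontier(stump, LINKS)
```

On this toy data the stump splits on a single word, so the link whose anchor contains that word is dequeued first. A full decision tree would recurse on each branch and could also be applied to page text for the on-topic decision, which this sketch leaves out.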
