


MatchLink: A Focused Crawling Method

Keywords: focused crawler, document vector model, naive Bayes


Abstract:

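To find the information users care about more quickly in the vast amount of information on the Web, this paper proposes a focused crawling method, MatchLink. It uses a document vector model to evaluate the topical relevance of the links on a page, and it uses the naive Bayes algorithm together with multi-level classification to compute the topical relevance of the page that contains each link. Based on these two relevance scores, it downloads topic-relevant pages first. Experiments show that its results are better than those of BestFirst and BreadthFirst.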
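The abstract does not give the scoring formulas, so the following is only a minimal sketch of how two relevance scores might drive a crawl frontier, not the paper's actual implementation. It assumes the link score is the cosine similarity between term-frequency vectors of the anchor text and of a topic description, that the page score is a probability in [0, 1] supplied by some classifier (for example naive Bayes), and that the two are combined by a weighted sum. The names `link_priority`, `Frontier`, and the weight `alpha` are hypothetical.

```python
import heapq
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_priority(anchor_text, topic_terms, page_score, alpha=0.5):
    """Combine link relevance and parent-page relevance.

    Assumption: a weighted sum with weight `alpha`; the paper may
    combine the two scores differently.
    """
    link_vec = Counter(anchor_text.lower().split())
    topic_vec = Counter(t.lower() for t in topic_terms)
    link_score = cosine(link_vec, topic_vec)
    return alpha * link_score + (1.0 - alpha) * page_score


class Frontier:
    """Priority queue of URLs; the highest-priority URL is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, priority):
        # Skip URLs that were already queued once.
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated priority.
            heapq.heappush(self._heap, (-priority, url))

    def pop(self):
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    topic = ["focused", "crawler"]
    frontier = Frontier()
    # page_score values below are placeholders for a classifier's output.
    frontier.push("http://example.com/a",
                  link_priority("focused crawler tutorial", topic, 0.9))
    frontier.push("http://example.com/b",
                  link_priority("cooking recipes", topic, 0.1))
    while len(frontier):
        print(frontier.pop())
```

A BreadthFirst baseline would replace the heap with a FIFO queue, which is what makes the ordering above "focused": links with on-topic anchor text found on on-topic pages are fetched before everything else.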

