Distributed Web Information Retrieval Based on Link Partitioning

pp. 519-524

Keywords: Web page links, clustering, distributed information retrieval

Abstract:

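Distributed information retrieval is an effective approach to retrieval over massive Web data. This paper partitions Web pages with a link-based clustering algorithm (LIBCA) and uses a Bloom filter to improve the computational efficiency of LIBCA. At retrieval time it uses the CORI collection selection algorithm and the Okapi BM25 retrieval algorithm. On the TREC Web test collections from the most recent three years, the method is compared with centralized retrieval and with distributed retrieval based on random partitioning. The results show that, measured by P@10, the proposed method matches or even exceeds centralized retrieval. Efficiency experiments show that LIBCA with a Bloom filter partitions data efficiently and is suitable for processing massive data sets.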
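
Two building blocks named in the abstract can be illustrated with a minimal Python sketch: a Bloom filter for approximate set membership, and the P@10 metric. The abstract does not say how LIBCA uses the Bloom filter, so the usage here (recording which link URLs have been seen) is an assumption. The names `BloomFilter`, `precision_at_k`, the bit-array size, and the hash count are all hypothetical choices for illustration, not the authors' implementation.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative, not the paper's code)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits          # assumed size; tune for the data volume
        self.num_hashes = num_hashes      # assumed number of hash positions per item
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, but never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def precision_at_k(ranked_doc_ids, relevant_doc_ids, k=10):
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    relevant = set(relevant_doc_ids)
    return sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant) / k


# Hypothetical usage: remember link URLs, then test membership in constant time.
seen_links = BloomFilter()
seen_links.add("http://a.example/page1")
seen_links.add("http://b.example/page2")
already_seen = "http://a.example/page1" in seen_links
```

A Bloom filter trades a small false-positive rate for a compact memory footprint, which is consistent with the abstract's efficiency claim for large collections. The exact parameters the authors used are not given in the abstract.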
