Journal of Computer Applications (计算机应用), 2006

Concept based algorithm of dealing near-replicas of documents on the Web
基于概念的网页相似度处理算法研究

GUO Chen-juan, LI Zhan-huai
郭晨娟,李战怀

Keywords: near-replica documents, concept extraction, cluster analysis, near-replica detection
相似网页,概念抽取,聚类分析,消重


Abstract:

To handle near-replica documents among the Web pages returned by a search engine, a similarity-processing algorithm is proposed. Using concepts extracted from the Web pages together with an inverted file, the algorithm builds a model that reduces the number of Web pages that have to be processed. This saves considerable time and space and provides a good foundation for near-replica detection.
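
The abstract does not give the details of the model, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes concept extraction has already produced a set of concepts per page, and it uses an inverted file (concept to page ids) so that only pages sharing concepts are compared. The function names, the `min_shared` and `threshold` parameters, the sample data, and the use of Jaccard similarity as the comparison measure are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(page_concepts):
    """Map each concept to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, concepts in page_concepts.items():
        for concept in concepts:
            index[concept].add(page_id)
    return index


def candidate_pairs(index, min_shared=2):
    """Return page pairs that share at least `min_shared` concepts.

    Only pages listed under the same concept are ever paired, which is
    how the inverted file shrinks the number of comparisons.
    """
    shared = defaultdict(int)
    for pages in index.values():
        for a, b in combinations(sorted(pages), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


def jaccard(a, b):
    """Jaccard similarity of two concept sets (an assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_replicas(page_concepts, min_shared=2, threshold=0.5):
    """Score candidate pairs and keep those at or above `threshold`."""
    index = build_inverted_index(page_concepts)
    result = []
    for a, b in sorted(candidate_pairs(index, min_shared)):
        score = jaccard(page_concepts[a], page_concepts[b])
        if score >= threshold:
            result.append((a, b, score))
    return result


# Hypothetical concept sets, standing in for the output of concept extraction.
pages = {
    "p1": {"search", "engine", "index", "crawler"},
    "p2": {"search", "engine", "index", "ranking"},
    "p3": {"football", "league", "score"},
    "p4": {"football", "league", "score", "transfer"},
}

for a, b, score in near_replicas(pages):
    print(a, b, round(score, 2))
```

With this sample data, only two of the six possible page pairs are ever scored, because pages with no concept in common never meet in the inverted file.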
