%0 Journal Article
%T Concept-based algorithm for handling near-replicas of documents on the Web
%A GUO Chen-juan
%A LI Zhan-huai
%A 郭晨娟
%A 李战怀
%J Journal of Computer Applications (计算机应用)
%D 2006
%X To handle near-replica documents among the Web pages obtained by a search engine, a similarity-processing algorithm is proposed. Using concepts extracted from the Web pages together with an inverted file, the algorithm builds a model that reduces the number of Web pages that must be processed. The algorithm saves considerable time and space and provides a good foundation for near-replica detection.
%K near-replica documents
%K concept extraction
%K cluster analysis
%K near-replica detection
%K similar Web pages
%K duplicate removal
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=831E194C147C78FAAFCC50BC7ADD1732&aid=D47EC63412AFA69E&yid=37904DC365DD7266&vid=96C778EE049EE47D&iid=59906B3B2830C2C5&sid=3094DBAA2D955205&eid=096594D9D9174975&journal_id=1001-9081&journal_name=计算机应用&referenced_num=3&reference_num=9