%0 Journal Article
%T Detection and elimination of similar Web pages based on text structure (基于网页文本结构的网页去重)
%A WEI Li-xia (魏丽霞)
%A ZHENG Jia-heng (郑家恒)
%J Journal of Computer Applications (计算机应用)
%D 2007
%I
%X Similar Web pages returned by search engines not only waste storage resources but also burden Web users. A dynamic method for detecting similar Web pages is proposed. In this method, the text of each Web page is represented as a catalogue structure tree, based on the features of similar Web pages and of Web pages themselves. A dynamic algorithm for extracting text features and a layer fingerprint algorithm for computing the degree of similarity were implemented. The experimental results show that fully similar Web pages are detected accurately and that partially similar Web pages are also detected correctly.
%K layer fingerprint (层次指纹)
%K text structure (文本结构)
%K detection and elimination of similar Web pages (网页去重)
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=831E194C147C78FAAFCC50BC7ADD1732&aid=AF9ECC763BB45A15646EAE47217A8BAB&yid=A732AF04DDA03BB3&vid=DB817633AA4F79B9&iid=708DD6B15D2464E8&sid=004149C4743B9A91&eid=D5D9606B73A02185&journal_id=1001-9081&journal_name=计算机应用&referenced_num=1&reference_num=7