%0 Journal Article
%T Document Clustering Algorithm Based on Dynamic Interval Mapping
%A SUN Yong-lin
%A LIU Zhong
%J Computer Science (计算机科学)
%D 2010
%X Archival storage is becoming a research hotspot as information digitization accelerates, and space utilization and scalability are critical in this field. Using content-based chunking storage to achieve data deduplication is an effective way to improve storage space utilization; however, finding shared chunks across the huge volume of archival data is inefficient. We introduced the idea of dynamic interval mapping into information clustering and presented DC-DIM (Document Clustering algorithm based on Dynamic Interval Mapping). The algorithm uses chunking and feature extraction to generate a document's feature set, maps that feature set onto interval links, and then chooses the document's storage container according to how its feature set is distributed on the interval links. In this way, documents with high similarity (those sharing a large amount of content) are clustered together, which makes it convenient to improve space utilization and data management.
%K Document clustering
%K Archival storage
%K Dynamic interval mapping
%K Space utilization
%K Scalability
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=64A12D73428C8B8DBFB978D04DFEB3C1&aid=84AB8B1E4E97335C2222F15C069ECABE&yid=140ECF96957D60B2&vid=42425781F0B1C26E&iid=B31275AF3241DB2D&sid=EA389574707BDED3&eid=DB817633AA4F79B9&journal_id=1002-137X&journal_name=计算机科学&referenced_num=0&reference_num=0