%0 Journal Article
%T High Dimensional Hybrid Index Based on Query Sampling
基于查询采样的高维数据混合索引
%A ZHANG Jun-Qi
%A ZHOU Xiang-Dong
%A SHI Bai-Le
%A
张军旗
%A 周向东
%A 施伯乐
%J 软件学报
%D 2008
%I
%X In order to improve the query answering of high-dimensional database,data distribution is necessary to select appropriate indexing strategy.However,traditional data distribution models can not estimate the accurate data distribution in the complex real multimedia data of image and video.This paper presents a method to estimate the accurate data distribution based on query sampling,and proposes a novel hybrid index to speed up processing of high-dimensional K-nearest neighbor (KNN) queries.The proposed hybrid index improves the query efficiency by adaptively selecting different index strategies for the data with different distribution.In the first step,the cluster analysis and cluster splitting methods are applied to construct a tree-based index,and then the relationship between data distribution and index performance is derived by sampling.At last some tree branches with sparse data are extracted for linear scan,while the aggregate data remains in the tree.Extensive experiments on four real image data sets show that the proposed hybrid index structure performs better than iDistance,M-Tree and linear scan,and scales better with dimensions.The index is still faster than linear scan when the dimension reaches 336.The experiments also show that the proposed query sampling algorithm can obtain the accurate data distribution when the amount of sampling is below N~(1/2)(N is the size of data set).
%K nearest neighbor query
%K high dimensional index
%K marginal data
%K cluster partitioning
最近邻查询
%K 采样
%K 高维索引
%K 边缘数据
%K 聚类分解
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=B8D95153CB5FD35BECE9903EB3B1FBA1&yid=67289AFF6305E306&vid=2A8D03AD8076A2E3&iid=5D311CA918CA9A03&sid=84B0BD1ACE2C0D86&eid=358F2B73B31EF50A&journal_id=1000-9825&journal_name=软件学报&referenced_num=0&reference_num=19