Journal of Software (软件学报), 2008

High Dimensional Hybrid Index Based on Query Sampling

Keywords: nearest neighbor query, sampling, high dimensional index, marginal data, cluster partitioning


Abstract:

Choosing an appropriate indexing strategy for a high-dimensional database requires knowledge of the data distribution. However, traditional data distribution models cannot accurately estimate the distribution of complex real-world multimedia data such as images and video. This paper presents a method for estimating the data distribution accurately by query sampling, and proposes a novel hybrid index that speeds up high-dimensional K-nearest neighbor (KNN) queries. The hybrid index improves query efficiency by adaptively selecting different indexing strategies for data with different distributions. First, cluster analysis and cluster splitting are applied to construct a tree-based index, and the relationship between data distribution and index performance is then derived by sampling. Finally, tree branches holding sparse data are extracted for linear scan, while the densely clustered data remains in the tree. Extensive experiments on four real image data sets show that the proposed hybrid index outperforms iDistance, M-Tree and linear scan, and scales better with dimensionality. The index is still faster than linear scan when the dimensionality reaches 336. The experiments also show that the proposed query sampling algorithm obtains an accurate data distribution when the number of samples is below N^(1/2), where N is the size of the data set.
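The general idea of a hybrid index can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the tree structure, the cluster splitting rule, or the criterion for extracting sparse branches, so everything below is an assumption made for illustration. The sketch uses a flat set of k-means clusters in place of a tree, draws about N^(1/2) data points as sample queries, and moves any cluster that the sample queries almost never manage to prune into a linear-scan pool. The names `HybridIndex`, `knn_scan`, `kmeans` and the parameter `visit_threshold` are hypothetical.

```python
import heapq
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist(p, centers[c]))
            groups[j].append(p)
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def knn_scan(q, points, k, heap=None):
    """Linear scan; heap holds (-distance, point) for the best k found so far."""
    heap = [] if heap is None else heap
    for p in points:
        d = dist(q, p)
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, p))
    return heap


class HybridIndex:
    def __init__(self, points, n_clusters=8, k=5, visit_threshold=0.9, seed=0):
        self.scan_pool = []  # points answered by linear scan
        self.clusters = []   # (centroid, radius, members) kept for pruning
        for g in kmeans(points, n_clusters, seed=seed):
            c = [sum(col) / len(g) for col in zip(*g)]
            self.clusters.append((c, max(dist(c, p) for p in g), g))
        # Query sampling: about sqrt(N) data points serve as sample queries.
        rng = random.Random(seed)
        queries = rng.sample(points, max(1, math.isqrt(len(points))))
        visits = [0] * len(self.clusters)
        for q in queries:
            for i in self._query(q, k)[1]:
                visits[i] += 1
        # Assumed rule: a cluster that is rarely pruned goes to the scan pool.
        kept = []
        for i, cluster in enumerate(self.clusters):
            if visits[i] / len(queries) >= visit_threshold:
                self.scan_pool.extend(cluster[2])
            else:
                kept.append(cluster)
        self.clusters = kept

    def _query(self, q, k):
        heap = knn_scan(q, self.scan_pool, k)
        visited = []
        order = sorted(range(len(self.clusters)),
                       key=lambda i: dist(q, self.clusters[i][0]))
        for i in order:
            c, r, members = self.clusters[i]
            # Triangle inequality: every member is at least dist(q, c) - r away.
            if len(heap) == k and dist(q, c) - r >= -heap[0][0]:
                continue
            visited.append(i)
            knn_scan(q, members, k, heap)
        return heap, visited

    def query(self, q, k):
        """Exact k nearest neighbors as (distance, point), nearest first."""
        heap, _ = self._query(q, k)
        return sorted((-nd, p) for nd, p in heap)
```

Calling `HybridIndex(points).query(q, 5)` returns the same neighbor distances as a brute-force scan, because the pruning test only skips clusters that cannot contain a closer point. Whether moving a cluster to the scan pool pays off in practice depends on I/O costs that this in-memory sketch does not model.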
