%0 Journal Article
%T Efficient Algorithm for Sequence Similarity Search Based on Reference Indexing
基于参考集索引的高效序列相似性查找算法
%A DAI Dong-Bo
%A XIONG Yun
%A ZHU Yang-Yong
%A 戴东波
%A 熊赟
%A 朱扬勇
%J 软件学报
%D 2010
%X Sequence data are ubiquitous in many domains, such as text, Web access logs, and biological databases. Similarity search over such data is important for knowledge acquisition and analysis. Reference-based indexing is an effective method for sequence similarity search: some sequences in the database are designated as a reference set, which is then used to filter out sequences unrelated to the query sequence so that the answer can be obtained efficiently. This paper presents IRI (improved reference indexing), a similarity search algorithm that builds on the existing reference-set indexing technique and has stronger filtering power. First, the results of previous queries are used to accelerate the current query. Second, upper and lower bounds based on sequence characteristics are proposed, which tighten the bounds and improve the filtering capability. Finally, to avoid time-consuming edit distance computation, only the partial edit distance between sequence prefixes needs to be computed, which makes the algorithm run faster. The experiments use real data, including DNA and protein sequences. Comprehensive experimental results show that IRI is more efficient than RI (reference indexing), the current reference-based indexing algorithm.
%K sequence similarity search
%K reference indexing
%K edit distance
%K 序列相似性查找
%K 参考集索引
%K 编辑距离
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=A3CBC1CBFAD8621A75BC96D0232FB570&yid=140ECF96957D60B2&vid=659D3B06EBF534A7&iid=E158A972A605785F&sid=5A027C8E6C570AAB&eid=507521DBC725630F&journal_id=1000-9825&journal_name=软件学报&referenced_num=0&reference_num=34