


Optimal neighborhood indexing for protein similarity search

DOI: 10.1186/1471-2105-9-534


Abstract:

The paper presents two main contributions. First, we show that an optimal neighborhood indexing, combining an alphabet reduction with a longer neighborhood, reduces the memory used by the process by 35% without sacrificing either the quality of the results or the computation time. Second, our approach led us to develop a new kind of substitution score matrix, together with the associated e-value parameters. In contrast to the usual matrices, these matrices are rectangular, since they compare amino acid groups from different alphabets. We describe the method used to compute these matrices and provide some typical examples that can be used in such comparisons. Supplementary data can be found on the website http://bioinfo.lifl.fr/reblosum.

We propose a practical reduction of the index size of the neighborhood data that does not negatively affect the performance of large-scale searches in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can be applied in any study involving an alphabet reduction.

One fundamental task in bioinformatics is the large-scale comparison of proteins or families of proteins. It is often the first step before further investigation. A typical comparison, for example, is to query a database with a newly discovered sequence. Observed similarities suggest a putative common biological function and direct further studies.

In this paper, we focus on massive protein sequence comparisons: a large database is iteratively compared with relatively short queries (such as newly sequenced data). A possible approach is to use the exact dynamic programming method [1]. For a given similarity model, this method provides optimal alignments in quadratic computation time. Some optimizations achieve sub-quadratic complexity [2], but the computation time remains prohibitive for large-scale comparisons. Thus
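The trade-off between alphabet size and neighborhood length described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 8-group alphabet in `GROUPS`, the helper names, and the neighborhood lengths used in the usage lines are all assumptions chosen for illustration, and the arithmetic is not meant to reproduce the 35% figure reported in the paper. The sketch only shows why fewer bits per residue leave room for a longer stored neighborhood.

```python
from math import ceil, log2

# Hypothetical grouping of the 20 amino acids into 8 classes.
# Illustrative only; this is NOT the reduced alphabet derived in the paper.
GROUPS = ["AG", "ST", "C", "DENQ", "HKR", "ILMV", "FWY", "P"]

# Map each amino acid letter to the id of its group.
REDUCE = {aa: gid for gid, members in enumerate(GROUPS) for aa in members}


def bits_per_symbol(alphabet_size: int) -> int:
    """Fixed-width bits needed to encode one symbol of the alphabet."""
    return ceil(log2(alphabet_size))


def neighborhood_bits(length: int, alphabet_size: int) -> int:
    """Bits needed to store a neighborhood of `length` residues."""
    return length * bits_per_symbol(alphabet_size)


def pack_neighborhood(residues: str) -> int:
    """Pack a neighborhood into an integer using the reduced alphabet."""
    width = bits_per_symbol(len(GROUPS))
    packed = 0
    for aa in residues:
        packed = (packed << width) | REDUCE[aa]
    return packed


# Assumed lengths: 8 residues over the full alphabet versus 12 residues
# over the hypothetical 8-group alphabet.
full_cost = neighborhood_bits(8, 20)
reduced_cost = neighborhood_bits(12, len(GROUPS))
```

With these assumed values, the full 20-letter alphabet needs 5 bits per residue and the 8-group alphabet needs 3, so the longer 12-residue neighborhood fits in fewer bits than the shorter 8-residue one. Scoring a packed neighborhood against an unreduced query is where a rectangular matrix (full alphabet on one axis, reduced alphabet on the other) would come in; no such scores are sketched here because they would have to be invented.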
