|
BMC Bioinformatics 2007
Protein structural similarity search by Ramachandran codesAbstract: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE) and works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms.As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are supposed applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era.The number of proteins found in structural databases has increased at such an unprecedented rate in recent years that achieving speed and accuracy simultaneously in protein structure similarity searches has become a formidable task. During evolution, three-dimensional (3D) structures are more conserved than amino acid sequences [1], and protein homologs that share highly conserved 3D structures may have unrecognizable sequence homology [2]. Amino acid sequence search tools are fast; however, they have proven to be insufficient for detection of remote homology in structural databases [3]. Structure alignment using delicate geometric algorithms is much more accurate than amino acid sequence compar
|