|
BMC Bioinformatics 2009
Iterative refinement of structure-based sequence alignments by Seed Extension

Abstract: At its core, RSE uses SE (Seed Extension), an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the fraction of correctly aligned residues before and after refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis, and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program: the average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements were seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs.

RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment, or to replace the traditional iterative refinement procedures based on residue-level dynamic programming in many structure alignment programs.

In searching for protein functions and in building homology models, it is desirable to have accurate sequence motifs and profiles [1-3], which are obtained from sequence alignments of homologous proteins. However, it is often difficult to obtain accurate sequence alignments based on sequence similarity alone when sequence similarity is low. Therefore, structural alignments, when available, have been used to guide sequence alignments.
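
The evaluation summarized in the abstract compares the fraction of correctly aligned residues before and after refinement, with or without an allowed shift error. The sketch below illustrates one common way such a fraction can be computed; it is not taken from the paper. The function name `fraction_correct`, the dictionary representation of an alignment (residue index in protein A mapped to residue index in protein B), and the `max_shift` parameter are all assumptions made for illustration, and the toy alignments are invented.

```python
def fraction_correct(test, reference, max_shift=0):
    """Fraction of reference residue pairs reproduced by the test alignment.

    Both alignments are dicts mapping a residue index in protein A to the
    residue index in protein B it is aligned with (an assumed representation).
    A reference pair (i, j) counts as correct when the test alignment pairs i
    with some j' such that |j' - j| <= max_shift; max_shift=0 means that no
    shift error is allowed.
    """
    if not reference:
        raise ValueError("reference alignment is empty")
    correct = 0
    for i, j_ref in reference.items():
        j_test = test.get(i)  # None if residue i is unaligned in the test
        if j_test is not None and abs(j_test - j_ref) <= max_shift:
            correct += 1
    return correct / len(reference)


# Invented toy alignments, for illustration only.
reference = {1: 1, 2: 2, 3: 3, 4: 4}
before = {1: 1, 2: 3, 3: 4, 4: 5}   # three pairs shifted by one residue
after = {1: 1, 2: 2, 3: 3, 4: 5}    # two of the shifts corrected

print(fraction_correct(before, reference))               # no shift allowed
print(fraction_correct(after, reference))                # no shift allowed
print(fraction_correct(before, reference, max_shift=1))  # one-residue shift tolerated
```

Under this definition, tolerating a shift error hides exactly the kind of small register errors that a refinement step would be expected to repair, which is why the zero-shift setting is the more sensitive comparison.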
Such structure-based sequence alignments have been used as the gold standard to evaluate pure sequence alignment methods [4,5] and to derive structural environment-specific substitution matrices which have been
|