|
BMC Bioinformatics 2012
Comprehensive comparison of graph based multiple protein sequence alignment strategiesAbstract: Our results indicate the optimal alignment strategy based on the choices compared. First, we show that pairwise global and local alignments contain sufficient information to construct a high quality multiple alignment. Second, single linkage clustering is almost invariably the best algorithm to build a guiding tree for progressive alignment. Third, triplet library extension, with introduction of new edges, is the most efficient consistency transformation of those compared. Alternatively, one can apply tree dependent partitioning as a post processing step, which was shown to be comparable with the best consistency transformation in both time and accuracy. Finally, propagating information beyond four transitive links introduces more noise than signal.This is the first time multiple protein alignment strategies are comprehensively and clearly compared using a single implementation platform. In particular, we showed which of the existing consistency transformations and iterative refinement techniques are the most valid. Our implementation is freely available at http://ekhidna.biocenter.helsinki.fi/MMSA webcite and as a supplementary file attached to this article (see Additional file 1).A variety of methods used by modern molecular biology such as structural modeling, function annotation, phylogenetic analysis and similarity searches are based on multiple protein sequence alignments (MPSA). Correct MPSA arranges in one column position amino acids that share a common ancestor or are functionally/structurally equivalent. MPSA provides position-specific information on evolutionary conserved characters, correlation between characters and their distribution. These features can be used in many further applications for which the quality of MPSA is, therefore, crucial [1]. Traditionally, the quality of the alignment is evaluated using a scoring function based on gap penalties and a substitution matrix. When two sequences are aligned, an exact solution can be found using dynamic
|