|
BMC Bioinformatics 2006
Efficient pairwise RNA structure prediction and alignment using sequence alignment constraintsAbstract: We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed which assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment.Pairwise RNA structural alignment improves on structure prediction accuracy relative to single sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm – this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN – have comparable overall performance with different strengths and weaknesses.RNA secondary structure can be predicted accurately from sequence data alone. For example, the predicted secondary structure of ribosomal RNA has been essentially confirmed by recent crystal structures; 97–98% of the predicted base pairs are confirmed by experimental structures [1]. The trouble is that rRNA predictions were refined by experts over twenty years, ultimately utilizing data from about 7000 small subunit rRNA sequences and 1050 large subunit rRNA sequences [1]. As there are many RNA structures of biological interest [2,3], it is important to find computational means of accelerating, automating, and improving RNA secondary structure prediction [4].There are two main sources of information for RNA secondary structure prediction. The most accurate means of prediction is comparative analysis [5-7], which uses evolutionary information. Homologous RNAs tend to conserve a common base-paired secondary structure. Important base pairing interactions are conserved by compensatory mutations and compensatory mutations induce detectable pairwise sequence correlations between posit
|