All Title Author
Keywords Abstract

SimulFold: Simultaneously Inferring RNA Structures Including Pseudoknots, Alignments, and Trees Using a Bayesian MCMC Framework

DOI: 10.1371/journal.pcbi.0030149

Full-Text   Cite this paper   Add to My Lib


Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.


[1]  Tinoco I Jr, Uhlenbeck OC, Levine MD (1971) Estimation of secondary structure in ribonucleic acids. Nature 230: 362–367.
[2]  Tinoco I Jr, Borer PN, Dengler B, Levine MD, Uhlenbeck OC, et al. (1973) Improved estimation of secondary structure in ribonucleic acids. Nature New Biol 246: 40–41.
[3]  Nussinov R, Jacobson A (1980) Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci U S A 77: 6309–6313.
[4]  Zuker M, Sankoff D (1984) RNA secondary structures and their prediction. Bull Math Biol 46: 591–621.
[5]  Mathews D, Sabina J, Zuker M, Turner D (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288: 911–940.
[6]  Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31: 3406–3415.
[7]  Hofacker IL, Fontana W, Stadler PF, Bonhoeffer S, Tacker M, et al. (1994) Fast folding and comparison of RNA secondary structures. Monatsh Chem 125: 167–188.
[8]  Zuker M, Stiegler P (1981) Optimal computer folding of large RNA sequences using thermodynamic and auxiliary information. Nucleic Acids Res 9: 133–148.
[9]  Wuchty S, Fontana W, Hofacker I, Schuster P (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 49: 145–165.
[10]  Hofacker I, Fekete M, Stadler P (2002) Secondary structure prediction for aligned RNA sequences. J Mol Biol 319: 1059–1066.
[11]  Hofacker IL (2003) Vienna RNA secondary structure server. Nucleic Acids Res 31: 3429–3431.
[12]  Morgan SR, Higgs PG (1996) Evidence for kinetic effects in the folding of large RNA molecules. J Chem Phys 105: 7152–7157.
[13]  Boyle J, Robillard G, Kim S (1980) Sequential folding of transfer RNA. A nuclear magnetic resonance study of successively longer tRNA fragments with a common 5′ end. J Mol Biol 139: 601–625.
[14]  Kramer F, Mills D (1981) Secondary structure formation during RNA-synthesis. Nucleic Acids Res 9: 5109–5124.
[15]  Meyer IM, Miklós I (2004) Co-transcriptional folding is encoded within RNA genes. BMC Mol Biol 5: 10.
[16]  Gultyaev A (1991) The computer-simulation of RNA folding involving pseudoknot formation. Nucleic Acids Res 19: 2489–2493.
[17]  Gultyaev A, von Batenburg F, Pleij C (1995) The computer-simulation of RNA folding pathways using a genetic algorithm. J Mol Biol 250: 37–51.
[18]  Isambert H, Siggia E (2000) Modeling RNA folding paths with pseudoknots: Application to hepatitis delta virus ribozyme. Proc Natl Acad Sci U S A 97: 6515–6520.
[19]  Xayaphoummine A, Bucher T, Thalmann F, Isambert H (2003) Prediction and statistics of pseudoknots in RNA structures using exactly clustered stochastic simulations. Proc Natl Acad Sci U S A 100: 15310–15315.
[20]  Staple DW, Butcher SE (2005) Pseudoknots: RNA structures with diverse functions. PLoS Biology 3: e213.. doi:10.1371/journal.pbio.0030213.
[21]  Lyngs? R, Pedersen C (2000) RNA pseudoknot prediction in energy based models. J Comp Biol 7: 409–428.
[22]  Lyngs? R (2004) Complexity of pseudoknot prediction in simple models. In: Diaz J, Karhum?ki J, Lepist? A, Sannella D, editors. Proceedings of the 31st International Colloquium on Automata, Languages, and Programming (ICALP). pp. 919–931.
[23]  Rivas E, Eddy SR (1999) A dynamic programming algorithm for RNA structure prediction including pseudoknots. J Mol Biol 285: 2053–2068.
[24]  Rivas E, Eddy SR (2000) The language of RNA: A formal grammar that includes pseudoknots. Bioinformatics 16: 334–340.
[25]  Lyngso R, Pedersen C (2000) Pseudoknots in RNA secondary structures. In: Shamir R, Miyano S, Istrail S, Pevzner P, Waterman M, editors. Proceedings of the Fourth Annual International Conference on Computational Molecular Viology. New York: ACM Press. pp. 201–209.
[26]  Akutsu T (2000) Dynamic programming algorithms for RNA secondary prediction with pseudoknots. Discrete Appl Math 104: 45–62.
[27]  Dirks RM, Pierce NA (2003) A partition function algorithm for nucleic acid secondary structure including pseudoknots. J Comput Chem 24: 1664–1677.
[28]  Cai L, Malmberg R, Wu Y (2003) Stochastic modeling of RNA pseudoknotted structures: A grammatical approach. Bioinformatics 19: 66–73.
[29]  Deogun J, Donis E, Komina O, Ma F (2004) RNA secondary structure prediction with simple pseudoknots. In: Chen YP, editor. Proceedings of the Second Asia Pacific Bioinformatics Conference. pp. 239–246.
[30]  Reeder J, Giegerich R (2004) Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics 5: 104.
[31]  Knudsen B, Hein J (1999) RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 15: 446–454.
[32]  Knudsen B, Hein J (2003) Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Res 31: 3423–3428.
[33]  Pedersen JS, Forsberg R, Meyer IM, Hein J (2004) An evolutionary model for protein-coding regions with conserved RNA structure. Mol Biol Evol 21: 1913–1922.
[34]  Pedersen JS, Meyer IM, Forsberg R, Simmonds P, Hein J (2004) A comparative method for finding and folding RNA secondary structures within protein-coding regions. Nucleic Acids Res 32: 4925–4936.
[35]  Gabow HN (1973) Implementation of algorithms for maximum matching on nonbipartite graphs. Stanford (California): Stanford University. 248 p. [dissertation].
[36]  Gabow HN (1976) An efficient implementation of Edmonds' algorithm for maximum matching on graphs. J ACM 23: 221–234.
[37]  Tabaska J, Cary R, Gabow H, Stormo G (1998) An RNA folding method capable of identifying pseudoknots and base triples. Bioinformatics 14: 691–699.
[38]  Witwer C (2003) Prediction of conserved and consensus RNA structures. Vienna: Universit?t Wien. 187 p. [dissertation].
[39]  Haslinger C, Stadler PF (1999) RNA structures with pseudo-knots: Graph-theoretical, combinatorical, and statistical properties. Bull Math Biol 61: 437–467.
[40]  Ruan J, Stormo G, Zhang W (2004) An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics 20: 58–66.
[41]  Mathews DH, Turner DH (2002) Dynalign: An algorithm for finding the secondary structure common to two RNA sequences. J Mol Biol 317: 191–203.
[42]  Mathews DH (2005) Predicting a set of minimal free energy RNA secondary structures common to two sequences. Bioinformatics 21: 2246–2253.
[43]  Havgaard JH, Lyngs? RB, Stormo GD, Gorodkin J (2005) Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics 21: 1815–1824.
[44]  Havgaard JH, Lyngs? RB, Gorodkin J (2005) The FOLDALIGN web server for pairwise structural RNA alignment and mutual motif search. Nucleic Acids Res 33: W650–W653.
[45]  Perriquet O, Touzet H, Dauchet M (2003) Finding the common structure shared by two homologous RNAs. Bioinformatics 19: 108–116.
[46]  Touzet H, Perriquet O (2004) CARNAC: Folding families of related RNAs. Nucleic Acids Res 32: W142–W145.
[47]  Ji Y, Xu X, Stormo G (2004) A graph theoretical approach for predicting common RNA secondary structure motifs including pseudoknots in unaligned sequences. Bioinformatics 20: 1591–1602.
[48]  Holmes I (2005) Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics 6: 73.
[49]  Dowell RD, Eddy SR (2006) Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics 7: 400.
[50]  Sankoff D (1985) Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J Appl Math 45: 810–825.
[51]  Eddy SR, Durbin R (1994) RNA sequence analysis using covariance models. Nucleic Acids Res 22: 2079–2088.
[52]  Sakakibara Y, Brown M, Underwood R, Mian IS, Haussler D (1994) Stochastic context-free grammars for modeling RNA. Proceedings of the 27th Hawaii International Conference on System Sciences. Honolulu: IEEE Computer Society Press. pp. 284–283.
[53]  Holmes I, Rubin G (2002) Pairwise RNA structure comparison with stochastic context-free grammars. Pac Symp Biocomput 2002: 163–174.
[54]  Holmes I (2004) A probabilistic model for the evolution of RNA structure. BMC Bioinformatics 5: 166.
[55]  Corpet F, Michot B (1994) RNAlign program: Alignment of RNA sequences using both primary and secondary structures. Comput Appl Biosci 10: 389–399.
[56]  Lanhof H, Reinert K, Vingron M (1998) A polyhedral approach to RNA sequence structural alignment. J Comp Biol 5: 517–530.
[57]  Felsenstein J (1981) Evolutionary trees from DNA sequences: A maximum likelihood approach. J Mol Evol 17: 368–376.
[58]  L?ytynoja A, Goldman N (2005) An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci U S A 102: 10557–10562.
[59]  Kingman JFC (1982) The coalescent. Stoch Process Appl 13: 235–248.
[60]  Zuker M, Mathews DH, Turner DH (1999) Algorithms and thermodynamics for RNA secondary structure prediction: A practical guide. In: Barciszewski J, Clark BFC, editors. RNA biochemistry and biotechnology. Dordrecht (The Netherlands): Kluwer. pp. 11–43.
[61]  Chenna R, Sugawara H, Koike T, Lopez R, Gibson T, et al. (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res 31: 3497–3500.
[62]  Metropolis N, Rosenbluth AN, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculation by fast computing machines. J Chem Phys 21: 1087–1092.
[63]  Liu JS (2001) Monte Carlo strategies in scientific computing. New York: Springer. 343 p.
[64]  Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57: 97–109.
[65]  MacKay D (2003) Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press. 628 p.
[66]  Miklos I, Ittzes P, Hein J (2005) ParIS Genome Rearrangement server. Bioinformatics 21: 817–820.
[67]  Lunter G, Miklós I, Drummond A, Jensen J, Hein J (2005) Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics 6: 83.
[68]  Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W (2002) Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 161: 1307–1320.
[69]  Day W (1983) Properties of the nearest neighbor interchange metric for trees of small size. J Theor Biol 101: 275–288.
[70]  Vinh L, von Haeseler A (2004) Shortest triplet clustering: Reconstructing large phylogenies using representative sets. Mol Biol Evol 21: 1565–1571.
[71]  Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.
[72]  Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press. 356 p.
[73]  Fontana W, Konings DAM, Stadler PF, Schuster P (1993) Statistics of RNA secondary structures. Biopolymers 33: 1389–1404.
[74]  H?chsmann M, T?ller T, Giegerich R, Kurtz S (2003) Local similarity in RNA secondary structures. Proc IEEE Comput Soc Bioinform Conf 2003: 159–168.
[75]  H?chsmann M, Voss B, Giegerich R (2004) Pure multiple RNA secondary structure alignments: A progressive profile approach. IEEE/ACM Trans Comput Biol Bioinform 1: 53–62.
[76]  Rothberg E (1985) Solver-1 [computer program]. Available: Accessed 9 July 2007.
[77]  Condon A, Davy B, Rastegari B, Tarrant F, Zhao S (2004) Classifying RNA pseudoknotted structures. Theor Comput Sci 320: 35–50.
[78]  Witwer C, Hofacker IL, Stadler PF (2004) Prediction of consensus RNA structures including pseudoknots. IEEE/ACM Trans Comput Biol Bioinform 1: 66–77.
[79]  Gardner PP, Giegerich R (2004) A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics 5: 140.
[80]  Gardner P, Wilm A, Washietl S (2005) A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res 33: 2433–2439.
[81]  Geyer CJ (1991) Markov chain Monte Carlo maximum likelihood. In: Keramigas E, editor. Computing science and statistics: Proceedings of the 23rd Symposium on the Interface. Fairfax (Virginia): Interface Foundation of North America. pp. 156–163.
[82]  Holland B, Moulton V (2003) Consensus networks: A method for visualising incompatibilities in collections of trees. In: Benson G, Page R, editors. Algorithms in Bioinformatics. Berlin: Springer. pp. 165–176.
[83]  Huson DH, Bryant D (2006) Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 23: 254–267.


comments powered by Disqus

Contact Us


微信:OALib Journal