In order to be able to capture effects from co-transcriptional folding, we extend stochastic context-free grammars such that the probability of applying a rule can depend on the length of the subword that is eventually generated from the symbols introduced by the rule, and we show that existing algorithms for training and for determining the most probable parse tree can easily be adapted to the extended model without losses in performance. Furthermore, we show that the extended model is suited to improve the quality of predictions of RNA secondary structures. The extended model may also be applied to other fields where stochastic context-free grammars are used like natural language processing. Additionally some interesting questions in the field of formal languages arise from it.
References
[1]
Nussinov, R.; Pieczenik, G.; R'Griggs, J.; Kleitmann, D.J. Algorithms for loop matchings. SIAM J. Appl. Math. 1978, 35, 68–82.
[2]
Zuker, M.; Stiegler, P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 1981, 9, 133–148.
[3]
Knudsen, B.; Hein, J. RNA secondary structure prediction using stochastic context-free grammars and evolutionary history. Bioinformatics 1999, 15, 446–454.
[4]
Andersen, E.S. Prediction and design of DNA and RNA structures. New Biotechnol. 2010, 27, 184–193.
[5]
Xayaphoummine, A.; Bucher, T.; Isambert, H. Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Res. 2005, 33, W605–W610.
[6]
Boyle, J.; Robillard, G.T.; Kim, S. Sequential Folding of Transfer RNA. A nuclear magnetic resonance study of successively longer tRNA fragments with a common 5′ end. J. Mol. Biol. 1980, 139, 601–625.
[7]
Meyer, I.; Miklos, I. Co-transcriptional folding is encoded within RNA genes. BMC Mol. Biol. 2004, 5, 10.
[8]
Harrison, M.A. Introduction to Formal Language Theory; Addison-Wesley: Boston, MA, USA, 1978.
[9]
Durbin, R.; Eddy, S.R.; Krogh, A.; Mitchison, G. Biological Sequence Analysis; Cambridge University Press: Cambridge, UK, 1998.
[10]
Stolcke, A. An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities. Comput. Linguist. 1995, 21, 165–201.
[11]
Prescher, D. A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars. Available online: http://arxiv.org/pdf/cs/0412015 (accessed on 27 July 2011).
[12]
Chi, T.; Geman, S. Estimation of Probabilistic Context-Free Grammars. Comput. Linguist. 1998, 24, 299–305.
[13]
Dowell, R.D.; Eddy, S.R. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinforma. 2004, 5, 71.
[14]
Sprinzl, M.; Vassilenko, K.S.; Emmerich, J.; Bauer, F. Compilation of tRNA sequences and sequences of tRNA genes. Available online: http://www.uni-bayreuth.de/departments/biochemie/trna/ (accessed on 21 October 2011).
[15]
Nebel, M.E. Identifying Good Predictions of RNA Secondary Structure. Proceedings of the Pacific Symposium on Biocomputing, Big Island, HI, USA, 6–10 January 2004.
[16]
Wild, S. An Earley-style Parser for Solving the RNA-RNA Interaction Problem. B.Sc. Thesis, Kaiserslautern, Germany, 2010.
[17]
Weinberg, F. Position-and-Length-Dependent Context-free Grammars. In Theorietag Automaten und Formale Sprachen 2009; Joran, M., Ludwig, S., Renate, W., Eds.; University Halle-Wittenberg: Halle, Germany, 2009.