The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc-annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO, which offers tools for evaluating such software on real and synthetic datasets.
1. Introduction
Motivated by the fundamental role of RNAs, and especially of small noncoding RNAs, several methods for high-throughput generation of noncoding RNA candidates have been developed recently [1–3]. A fundamental problem is then to infer functional annotation for such putative RNA genes [4, 5], which often involves RNA structure comparisons. Most approaches to comparing RNA structures focus on the secondary structure, an intermediate level between the sequence and the full three-dimensional structure, which is both tractable from a computational point of view and relevant from a functional genomics point of view.
The problem we consider here is the following: given a new RNA secondary structure (the query) and a database of known and annotated RNA secondary structures, which of these known structures display the most structural features similar to the query? Databases such as RFAM [6] or RNA STRAND [7] come naturally to mind, but in-house collections of RNA structures resulting from high-throughput experiments can also be considered. Fundamentally, mining a database of RNA secondary structures reduces to pairwise comparisons between the query and the structures (or a subset of the structures) recorded in the database.
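As a rough illustration of this reduction, the sketch below ranks the entries of a small database against a query by running one pairwise comparison per entry. It is only a minimal sketch under stated assumptions: structures are assumed to be given as dot-bracket strings without pseudoknots, and the score is a simple base-pair distance used as a stand-in, not any of the tree edit distance or alignment methods discussed in this paper. The function names (base_pairs, base_pair_distance, rank_database) and the example structures are hypothetical and do not come from BRASERO.

```python
from typing import Callable, Dict, List, Set, Tuple


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: Set[Tuple[int, int]] = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_distance(a: str, b: str) -> int:
    """Stand-in score (assumption): base pairs present in exactly one structure."""
    return len(base_pairs(a) ^ base_pairs(b))


def rank_database(
    query: str,
    database: Dict[str, str],
    distance: Callable[[str, str], int] = base_pair_distance,
) -> List[Tuple[str, int]]:
    """Compare the query to every entry; return (name, distance) by increasing distance."""
    scored = [(name, distance(query, structure)) for name, structure in database.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy database; real entries would come from a structure collection.
    toy_database = {
        "hairpin": "((((....))))",
        "two_stems": "((..))((..))",
        "unpaired": "............",
    }
    for name, d in rank_database("(((......)))", toy_database):
        print(name, d)
```

The distance function is passed as a parameter so that any pairwise comparison method can be substituted without changing the ranking loop, which is the only part of the sketch that reflects the reduction described above.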
The pairwise comparison of RNA secondary structures is a long-standing problem in computational biology that is still being investigated, as shown by several recent papers based on different RNA structure representations and computational principles (e.g., [8–12]). We present here BRASERO, a website that contains several benchmark data sets and automatic software tools to compare the performance of RNA secondary structure comparison methods. The software tools available on BRASERO are flexible and can be used with alternative benchmark data sets, for example ones designed by a user with a specific application in mind, in order to assess which models, software tools, and parameters are relevant for that application. We describe below the main features
References
[1]
E. Zhu, F. Zhao, G. Xu et al., “MirTools: microRNA profiling and discovery based on high-throughput sequencing,” Nucleic Acids Research, vol. 38, no. 2, Article ID gkq393, pp. W392–W397, 2010.
[2]
C. M. Sharma, S. Hoffmann, F. Darfeuille et al., “The primary transcriptome of the major human pathogen Helicobacter pylori,” Nature, vol. 464, no. 7286, pp. 250–255, 2010.
[3]
I. Irnov, C. M. Sharma, J. Vogel, and W. C. Winkler, “Identification of regulatory RNAs in Bacillus subtilis,” Nucleic Acids Research, vol. 38, no. 19, Article ID gkq454, pp. 6637–6651, 2010.
[4]
L. Childs, Z. Nikoloski, P. May, and D. Walther, “Identification and classification of ncRNA molecules using graph properties,” Nucleic Acids Research, vol. 37, no. 9, article e66, 2009.
[5]
P. Menzel, J. Gorodkin, and P. F. Stadler, “The tedious task of finding homologous noncoding RNA genes,” RNA, vol. 15, no. 12, pp. 2075–2082, 2009.
[6]
P. P. Gardner, J. Daub, J. Tate et al., “Rfam: Wikipedia, clans and the ‘decimal’ release,” Nucleic Acids Research, vol. 39, supplement 1, pp. D141–D145, 2011.
[7]
M. Andronescu, V. Bereg, H. H. Hoos, and A. Condon, “RNA STRAND: the RNA secondary structure and statistical analysis database,” BMC Bioinformatics, vol. 9, article 340, 2008.
[8]
J. Allali and M. F. Sagot, “A multiple layer model to compare RNA secondary structures,” Software—Practice and Experience, vol. 38, no. 8, pp. 775–792, 2008.
[9]
G. Blin, A. Denise, S. Dulucq, C. Herrbach, and H. Touzet, “Alignments of RNA structures,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 309–322, 2010.
[10]
M. Höchsmann, T. Töller, R. Giegerich, and S. Kurtz, “Local similarity in RNA secondary structures,” Proceedings/IEEE Computer Society Bioinformatics Conference, vol. 2, pp. 159–168, 2003.
[11]
V. Guignon, C. Chauve, and S. Hamel, “RNA StrAT: RNA Structure Analysis Toolkit,” in 16th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2008), p. D31, 2008.
[12]
A. Ouangraoua, P. Ferraro, L. Tichit, and S. Dulucq, “Local similarity between quotiented ordered trees,” Journal of Discrete Algorithms, vol. 5, no. 1, pp. 23–35, 2007.
[13]
N. R. Markham and M. Zuker, “DINAMelt web server for nucleic acid melting prediction,” Nucleic Acids Research, vol. 33, no. 2, pp. W577–W581, 2005.
[14]
S. Janssen and R. Giegerich, “Faster computation of exact RNA shape probabilities,” Bioinformatics, vol. 26, no. 5, Article ID btq014, pp. 632–639, 2010.
[15]
I. L. Hofacker, “Vienna RNA secondary structure server,” Nucleic Acids Research, vol. 31, no. 13, pp. 3429–3431, 2003.
[16]
D. L. Wheeler, T. Barrett, D. A. Benson et al., “Database resources of the National Center for Biotechnology Information,” Nucleic Acids Research, vol. 35, no. 1, pp. D5–D12, 2007.
[17]
E. A. Feingold, P. J. Good, M. S. Guyer et al., “The ENCODE (ENCyclopedia of DNA Elements) Project,” Science, vol. 306, no. 5696, pp. 636–640, 2004.
[18]
Y. Ponty, M. Termier, and A. Denise, “GenRGenS: software for generating random genomic sequences and structures,” Bioinformatics, vol. 22, no. 12, pp. 1534–1535, 2006.
[19]
B. A. Shapiro and K. Zhang, “Comparing multiple RNA secondary structures using tree comparisons,” Computer Applications in the Biosciences, vol. 6, no. 4, pp. 309–318, 1990.
[20]
C. Herrbach, “Etude algorithmique et statistique de la comparaison des structures secondaires d’ARN,” Ph.D. thesis, Université Bordeaux 1, 2007.
[21]
K. Zhang and D. Shasha, “Simple fast algorithms for the editing distance between trees and related problems,” SIAM Journal on Computing, vol. 18, no. 6, pp. 1245–1262, 1989.
[22]
T. Jiang, L. Wang, and K. Zhang, “Alignment of trees—an alternative to tree edit,” Theoretical Computer Science, vol. 143, no. 1, pp. 137–148, 1995.
[23]
T. Jiang, G. Lin, B. Ma, and K. Zhang, “A general edit distance between RNA structures,” Journal of Computational Biology, vol. 9, no. 2, pp. 371–388, 2002.
[24]
S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” Journal of Molecular Biology, vol. 215, no. 3, pp. 403–410, 1990.