We propose a new method for small RNA (sRNA) identification. First, we build an effective target genome (ETG) by means of a strand-specific procedure. We then propose a bioinformatic pipeline that combines two types of information: an expression map based on RNA-seq data (Reads Map) and a Conservation Map obtained by applying principles of comparative genomics. Superimposing these two maps yields a robust method for the search of sRNAs. We apply this methodology to investigate sRNAs in Mycobacterium tuberculosis H37Rv. The procedure yields a total of 1948 candidate sRNAs. The size of the candidate list depends closely on the aim of the study and on the technology used during the verification process. We provide performance measures of the algorithm in identifying annotated sRNAs reported in three recently published studies.
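The superposition of the two maps can be illustrated with a minimal sketch. It is not the pipeline described here: it assumes that, for one strand, both maps are available as per-base arrays of equal length, and that a candidate is a contiguous run of positions where both signals exceed a cutoff. The function name `candidate_regions`, the thresholds `min_reads` and `min_conservation`, and the length bounds `min_len` and `max_len` are hypothetical values chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def candidate_regions(
    reads_map: Sequence[float],
    conservation_map: Sequence[float],
    min_reads: float = 10,          # assumed coverage cutoff (illustrative)
    min_conservation: float = 0.8,  # assumed conservation cutoff (illustrative)
    min_len: int = 30,              # assumed shortest candidate length
    max_len: int = 350,             # assumed longest candidate length
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals supported by both maps.

    A position is kept only if it passes the coverage cutoff AND the
    conservation cutoff; consecutive kept positions are merged into runs,
    and runs outside the length bounds are discarded.
    """
    if len(reads_map) != len(conservation_map):
        raise ValueError("both maps must cover the same coordinates")

    regions: List[Tuple[int, int]] = []
    start = None  # start of the run currently being extended, if any
    for pos, (depth, cons) in enumerate(zip(reads_map, conservation_map)):
        passes = depth >= min_reads and cons >= min_conservation
        if passes and start is None:
            start = pos  # open a new run
        elif not passes and start is not None:
            if min_len <= pos - start <= max_len:
                regions.append((start, pos))
            start = None  # close the run
    # Close a run that extends to the end of the sequence.
    if start is not None and min_len <= len(reads_map) - start <= max_len:
        regions.append((start, len(reads_map)))
    return regions


# Toy input: expression covers positions 5..44, conservation covers 10..39.
reads = [0] * 5 + [20] * 40 + [0] * 5
cons = [0.0] * 10 + [0.9] * 30 + [0.0] * 10
regions = candidate_regions(reads, cons)
```

In this toy input only the overlap of the expressed and conserved stretches survives. Loosening or tightening the cutoffs changes the number of reported regions, which mirrors the point that the size of the candidate list depends on the aim of the study.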