Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees

DOI: 10.1155/2012/652979


Abstract:

MicroRNAs (miRNAs) are nonprotein coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein coding sequences. MirTools, mirDeep2, and miRanalyzer require a “read count” to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor on a cross-section of different species so that it accurately predicts miRNAs outside the training set. We wanted a system that did not require read counts for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It only requires that the corresponding genome or transcriptome is available within a sequence window that includes the precursor candidate, so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA* duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.

1. Introduction

MicroRNAs (miRNAs) are nonprotein coding RNAs of between 20 and 22 nucleotides that attenuate protein production by cleavage, translational inhibition, or sequestering of mRNA in P bodies [1]. They are implicated in several different biological pathways, including plant and animal development, and cancer [2–4]. To better understand the role that miRNAs play in these pathways, large datasets containing RNA-seq, expressed sequence tags (ESTs), and genomic sequences are being investigated for new miRNAs [5, 6].
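The sensitivity and specificity quoted in the abstract, and the leave-one-out validation used to obtain them, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy data are invented, the single feature (number of duplex mismatches) is chosen only because the abstract names it as important, and the one-threshold "stump" is a hypothetical stand-in for a C5.0 decision tree, not the model described in the paper. The excerpt does not state how the hold-out units were defined, so the sketch simply holds out one sample at a time.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = miRNA, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def train_stump(xs, ys):
    """Hypothetical stand-in for a decision tree: choose the threshold on one
    numeric feature with the fewest training errors, predicting 1 when
    x <= threshold."""
    best_thr, best_err = None, None
    for thr in sorted(set(xs)):
        err = sum(1 for x, y in zip(xs, ys) if (1 if x <= thr else 0) != y)
        if best_err is None or err < best_err:
            best_thr, best_err = thr, err
    return lambda x: 1 if x <= best_thr else 0


def leave_one_out(xs, ys, train):
    """Hold out each sample once, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(xs)):
        model = train(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


# Invented toy data: duplex mismatch counts and labels (1 = miRNA).
mismatches = [0, 1, 1, 2, 5, 6, 7, 8]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

preds = leave_one_out(mismatches, labels, train_stump)
sens, spec = sensitivity_specificity(labels, preds)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Because each prediction is made by a model that never saw the held-out sample, the resulting figures estimate performance on unseen data rather than fit to the training set.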
As these datasets grow at an ever-increasing rate, their rapid analysis has become critical. Understanding miRNA biogenesis is important when developing predictive models. The mature miRNA originates from an expressed RNA precursor. The precursor folds back to base pair with itself, forming a characteristic stem-loop structure. However, not all stem-loop structures are miRNA precursors. The Dicer protein cuts a short, double-stranded RNA (the miRNA:miRNA* duplex) from the precursor. This double-stranded RNA associates with the RISC complex, where the mature miRNA is retained while the miRNA* is assumed to degrade [7]. The miRNA-loaded RISC complex is responsible for

References

[1]  J. Liu, M. A. Valencia-Sanchez, G. J. Hannon, and R. Parker, “MicroRNA-dependent localization of targeted mRNAs to mammalian P-bodies,” Nature Cell Biology, vol. 7, no. 7, pp. 719–723, 2005.
[2]  T. D. Schmittgen, “Regulation of microRNA processing in development, differentiation and cancer,” Journal of Cellular and Molecular Medicine, vol. 12, no. 5B, pp. 1811–1819, 2008.
[3]  J. Lu, G. Getz, E. A. Miska et al., “MicroRNA expression profiles classify human cancers,” Nature, vol. 435, no. 7043, pp. 834–838, 2005.
[4]  L. Zhang, P. S. Sullivan, J. C. Goodman, P. H. Gunaratne, and D. Marchetti, “MicroRNA-1258 suppresses breast cancer brain metastasis by targeting heparanase,” Cancer Research, vol. 71, no. 3, pp. 645–654, 2011.
[5]  J. Hertel and P. F. Stadler, “Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data,” Bioinformatics, vol. 22, no. 14, pp. E197–E202, 2006.
[6]  J. Wen, B. J. Parker, and G. F. Weiller, “In silico identification and characterization of mRNA-Like noncoding transcripts in Medicago truncatula,” In Silico Biology, vol. 7, no. 4-5, pp. 485–505, 2007.
[7]  Z. S. Kai and A. E. Pasquinelli, “MicroRNA assassins: factors that regulate the disappearance of miRNAs,” Nature Structural & Molecular Biology, vol. 17, no. 1, pp. 5–10, 2010.
[8]  J. Wen, T. Frickey, and G. F. Weiller, “Computational prediction of candidate miRNAs and their targets from Medicago truncatula non-protein-coding transcripts,” In Silico Biology, vol. 8, no. 3-4, pp. 291–306, 2008.
[9]  J. Wen, G. F. Weiller, and B. J. Parker, “Analysis of structural strand asymmetry in non-coding RNAs,” in Proceedings of the 6th Asia-Pacific Bioinformatics Conference (APBC '08), vol. 6 of Advances in Bioinformatics and Computational Biology, pp. 187–198, Imperial College Press, 2008.
[10]  E. Zhu, F. Zhao, G. Xu et al., “MirTools: microRNA profiling and discovery based on high-throughput sequencing,” Nucleic Acids Research, vol. 38, no. 2, Article ID gkq393, pp. W392–W397, 2010.
[11]  M. Hackenberg, M. Sturm, D. Langenberger, J. M. Falcón-Pérez, and A. M. Aransay, “miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments,” Nucleic Acids Research, vol. 37, no. 2, pp. W68–W76, 2009.
[12]  M. R. Friedländer, S. D. Mackowiak, N. Li, W. Chen, and N. Rajewsky, “MiRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades,” Nucleic Acids Research, vol. 40, no. 1, pp. 37–52, 2012.
[13]  P. Jiang, H. Wu, W. Wang, W. Ma, X. Sun, and Z. Lu, “MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features,” Nucleic Acids Research, vol. 35, pp. W339–W344, 2007.
[14]  C. Xue, F. Li, T. He, G. P. Liu, Y. Li, and X. Zhang, “Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine,” BMC Bioinformatics, vol. 6, article 310, 2005.
[15]  B. Pant, K. Pant, and K. R. Pardasani, “Decision tree classifier for classification of plant and animal micro RNA's,” Communications in Computer and Information Science, vol. 51, pp. 443–451, 2009.
[16]  J. H. Teune and G. Steger, “NOVOMIR: de novo prediction of microRNA-coding regions in a single plant-genome,” Journal of Nucleic Acids, vol. 2010, Article ID 495904, 10 pages, 2010.
[17]  M. R. Friedländer, W. Chen, C. Adamidi et al., “Discovering microRNAs from deep sequencing data using miRDeep,” Nature Biotechnology, vol. 26, no. 4, pp. 407–415, 2008.
[18]  A. Kozomara and S. Griffiths-Jones, “MiRBase: integrating microRNA annotation and deep-sequencing data,” Nucleic Acids Research, vol. 39, no. 1, pp. D152–D157, 2011.
[19]  X. Chen, Q. Li, J. Wang et al., “Identification and characterization of novel amphioxus microRNAs by Solexa sequencing,” Genome Biology, vol. 10, no. 7, article R78, 2009.
[20]  E. Berezikov, E. Cuppen, and R. H. A. Plasterk, “Approaches to microRNA discovery,” Nature Genetics, vol. 38, supplement, pp. S2–S7, 2006.
[21]  K. L. Childs, J. P. Hamilton, W. Zhu et al., “The TIGR plant transcript assemblies database,” Nucleic Acids Research, vol. 35, no. 1, pp. D846–D851, 2007.
[22]  H. Li and N. Homer, “A survey of sequence alignment algorithms for next-generation sequencing,” Briefings in Bioinformatics, vol. 11, no. 5, pp. 473–483, 2010.
[23]  Q. Dong, C. J. Lawrence, S. D. Schlueter et al., “Comparative plant genomics resources at PlantGDB,” Plant Physiology, vol. 139, no. 2, pp. 610–618, 2005.
[24]  I. L. Hofacker, “RNA secondary structure analysis using the Vienna RNA package,” Current Protocols in Bioinformatics, Chapter 12:Unit 12.2, 2009.
[25]  S. Arlot and A. Celisse, “A survey of cross-validation procedures for model selection,” Statistics Surveys, vol. 4, pp. 40–79, 2010.
[26]  D. G. Altman and J. M. Bland, “Diagnostic tests 1: sensitivity and specificity,” BMJ, vol. 308, no. 6943, p. 1552, 1994.
[27]  P. Rice, L. Longden, and A. Bleasby, “EMBOSS: the European molecular biology open software suite,” Trends in Genetics, vol. 16, no. 6, pp. 276–277, 2000.
[28]  R. E. Schapire, “A brief introduction to boosting,” in Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), vol. 1&2, pp. 1401–1406, 1999.
[29]  N. R. Markham and M. Zuker, “UNAFold: software for nucleic acid folding and hybridization,” Methods in Molecular Biology, vol. 453, pp. 3–31, 2008.
