|
BMC Bioinformatics 2005
Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine

Abstract: A set of novel features of local contiguous structure-sequence information is proposed for distinguishing the hairpins of real pre-miRNAs from those of pseudo pre-miRNAs. A support vector machine (SVM) is applied to these features to classify real vs. pseudo pre-miRNAs, achieving about 90% accuracy on human data. Remarkably, the SVM classifier built on human data can correctly identify up to 90% of the pre-miRNAs from other species, including plants and viruses, without using any comparative genomics information. The local structure-sequence features reflect discriminative and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs opens a new approach for discovering new miRNAs.

MicroRNAs (miRNAs) are non-coding RNAs about 21–26 nucleotides (nt) in length that can play important roles in gene regulation by targeting mRNAs for cleavage or translational repression [1,2]. According to the current understanding, a miRNA is transcribed as a long primary miRNA, which is processed into a 60~70 nt miRNA precursor (pre-miRNA) by the nuclear RNase III Drosha [3,4]. The pre-miRNA is transported from the nucleus to the cytoplasm by Exportin-5 [5,6] and then cleaved into ~22 nt duplexes [2]. Almost all pre-miRNAs have characteristic stem-loop hairpin structures. During the biogenesis of a mature miRNA, the hairpin structure of the pre-miRNA acts not only as the structural motif for Exportin-5 in nuclear-cytoplasmic transport, but also as a substrate for the Dicer enzyme [5-7]. This indicates the importance of secondary structure in miRNA biogenesis.

Due to the difficulty of systematically detecting miRNAs from a genome with existing experimental techniques, computational methods play important roles in the identification of new miRNAs.
As a characteristic secondary structure, the hairpin of the pre-miRNA is an important feature used in the computational identification of miRNAs. For example, MiRscan relies on the observation that the kn
|