BMC Bioinformatics 2007
A computational survey of candidate exonic splicing enhancer motifs in the model plant Arabidopsis thaliana

Abstract: We have developed a new computational technique to identify significantly conserved motifs involved in splice site regulation. First, 84 putative exonic splicing enhancer (ESE) hexamers were identified in Arabidopsis thaliana. Then a Gibbs sampling program called ELPH was used to locate conserved motifs represented by these hexamers in exonic regions near splice sites in confirmed genes. Oligomers containing 35 of these motifs have been shown experimentally to induce significant inclusion of A. thaliana exons. Second, integration of our regulatory motifs into two different splice site recognition programs significantly improved the ability of the software to correctly predict splice sites in a large database of confirmed genes. We have released GeneSplicerESE, the improved splice site recognition code, as open source software. Our results show that the use of the ESE motifs consistently improves splice site prediction accuracy.

Alternative splicing is an important regulatory mechanism for many species, allowing them to generate multiple variant proteins from the same primary transcript. In order to predict the complete protein complement of any eukaryote, we need to detect alternative splice sites and put them together in the correct combinations. Algorithmic approaches to splice site prediction have relied mainly on the consensus patterns found at the boundaries between protein coding and non-coding regions [1]. However, the sequence conservation found at the splice site junctions is not strong enough to accurately differentiate between introns and exons [2]. Additional sequences, residing at variable distances from splice sites, have been shown to function as cis-acting factor binding sites that regulate splicing either in vivo or in vitro.
Although such splicing regulators have been identified in both exons and introns, exonic splicing regulators (ESRs) are generally better characterized, and are probably more common [3,4]. Such ESRs either enhance or suppress the utilizat