BMC Bioinformatics 2007
Predictive modeling of plant messenger RNA polyadenylation sites
Abstract: Based on our current working model and the profile of nucleotide sequence distribution of the poly(A) signals and around poly(A) sites in Arabidopsis, we have devised a Generalized Hidden Markov Model based algorithm to predict potential poly(A) sites. The high specificity and sensitivity of the algorithm were demonstrated by testing several datasets, and at the best combinations, both reach 97%. The accuracy of the program, called poly(A) site sleuth or PASS, has been demonstrated by the prediction of many validated poly(A) sites. PASS also predicted the changes of poly(A) site efficiency in poly(A) signal mutants that were constructed and characterized by traditional genetic experiments. The efficacy of PASS was demonstrated by predicting poly(A) sites within long genomic sequences.
Based on the features of plant poly(A) signals, a computational model was built to effectively predict the poly(A) sites in Arabidopsis genes. The algorithm will be useful in gene annotation because a poly(A) site signifies the end of the transcript. This algorithm can also be used to predict alternative poly(A) sites in known genes, and will be useful in the design of transgenes for crop genetic engineering by predicting and eliminating undesirable poly(A) sites.
Eukaryotic messenger RNA (mRNA), after being transcribed from its coding gene, typically undergoes processing events, such as capping, splicing, and polyadenylation, before it is translocated to the cytoplasm and translated into proteins. While these three essential steps of processing are interrelated, each step is performed by a defined set of protein factors and uses specific signals encoded in the precursor mRNA (pre-mRNA) [1].
The polyadenylation signals for all eukaryotes seem to have three common parts: a cleavage site (CS), a near upstream element (called NUE in plants, equivalent to AAUAAA in animals) about 20–30 nucleotides (nt) upstream of the CS, and an element about 50 nt upstream of the CS (termed far upstream ele