BMC Bioinformatics 2012
Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance

Keywords: Codon deviation coefficient, CDC, Codon usage bias, CUB, Statistical significance, Background nucleotide composition, GC content, Purine content, Bootstrapping

Abstract: Here we propose a novel measure, the Codon Deviation Coefficient (CDC), that provides an informative measurement of codon usage bias (CUB) and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and uses bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data, and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance. As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.

Codon usage bias (CUB), a phenomenon in which synonymous codons (those that encode the same amino acid) are used at different frequencies, is generally believed to be a combined outcome of mutation pressure, natural selection, and genetic drift [1-5]. Within any given species, genes often exhibit variable degrees of CUB. Moreover, the CUB of an individual gene is closely related to gene expression, reflecting translational efficiency and/or accuracy [6-10].
Therefore, the ability to accurately quantify CUB for protein-coding sequences is of fundamental importance in revealing the mechanisms underlying codon usage and in understanding gene evolution and function in general.

Over the past few years, a number of measures have been proposed for the quantification of CUB [11-23], leading to investigations of CUB patterns within and across species [24-30]. Since CUB is primarily shaped by selection and mutation [5], measures differ in how well they distinguish between these causes. For instance, there are purely descriptive measures of CUB as caused by the joint