Modeling compositional dynamics based on GC and purine contents of protein-coding sequences

DOI: 10.1186/1745-6150-5-63

To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and that use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences.

We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents, and we suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms.

This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft.

Compositional biases at the levels of nucleotides, codons, and amino acids are found among bacteria [1-4], fungi [5,6], insects [7-10], plants [11,12], and vertebrates [13,14]; they presumably arise from unbalanced forces of mutation and selection and are maintained by the species in their populations [15-17]. For any individual gene, its compositional biases reflect the action of both mutation and selection, which is also linked to the abundance of iso-accepting transfer RNAs and the catalytic efficiencies of their synthetases, and thereby to translation efficiency [2,6,18-22]. Composition analysis is therefore important for understanding compositional dynamics and for providing evidence about molecular evolution [23,24].

Nucleotide compositions are highly variable among genomes, and the guanine-p


