BMC Bioinformatics 2005
Empirical codon substitution matrix

Abstract: A set of 17,502 alignments of orthologous sequences from five vertebrate genomes yielded 8.3 million aligned codons, from which the number of substitutions between codons was counted. From these data, both a probability matrix and a matrix of similarity scores were computed. They are 64 × 64 matrices describing the substitutions between all codons. Substitutions from sense codons to stop codons are not considered, resulting in block-diagonal matrices consisting of 61 × 61 entries for the sense codons and 3 × 3 entries for the stop codons.

The amount of genomic data currently available allowed for the construction of an empirical codon substitution matrix. However, more sequence data are still needed to construct matrices from different subsets of DNA: specific to kingdoms, to evolutionary distances, or to different amounts of synonymous change. Codon mutation matrices have advantages for alignments up to medium evolutionary distances and for applications that require DNA, such as the ancestral reconstruction of DNA sequences and the calculation of Ka/Ks ratios.

Models for codon substitutions are used in computational biology for a wide range of applications, such as reconstructing ancestral DNA sequences, determining Ka/Ks ratios to identify periods of adaptive evolution, and aligning coding DNA.

Methods for estimating mutation matrices from observed substitutions in sequence alignments of proteins were established by Dayhoff [1].
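The counting idea behind such estimates, applied to codons as described in the abstract, can be sketched as follows: tally codon pairs from aligned coding sequences into a symmetric 64 × 64 count matrix, skip sense-to-stop pairs, and normalize each row to obtain substitution probabilities. This is a minimal sketch under stated assumptions, not the authors' estimation procedure. The input format (pairs of equal-length, in-frame DNA strings), the helper names `count_substitutions`, `probability_matrix` and `log_odds_scores`, and the Dayhoff-style score 10·log10(P_ij / f_j) are illustrative choices; the exact normalization and score scaling used for the published matrices are not given here. The stop codons TAA, TAG and TGA of the standard genetic code are placed last so that the 61 × 61 and 3 × 3 blocks are visible.

```python
import math
from itertools import product

# Stop codons of the standard genetic code (assumption: nuclear code).
STOPS = ("TAA", "TAG", "TGA")
# All 64 codons; sense codons first, then stops, so the matrix is block diagonal.
ALL_CODONS = ["".join(p) for p in product("TCAG", repeat=3)]
ORDER = [c for c in ALL_CODONS if c not in STOPS] + list(STOPS)
INDEX = {codon: i for i, codon in enumerate(ORDER)}


def count_substitutions(alignments):
    """Tally aligned codon pairs into a symmetric 64 x 64 count matrix.

    Hypothetical input: pairs of equal-length, in-frame DNA strings.
    Codons containing gaps or ambiguous bases are skipped, and so are
    sense <-> stop pairs, which the matrix does not model.
    """
    n = len(ORDER)
    counts = [[0] * n for _ in range(n)]
    for seq_a, seq_b in alignments:
        # Step through complete codons only; a trailing partial codon is ignored.
        for k in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
            ca, cb = seq_a[k:k + 3].upper(), seq_b[k:k + 3].upper()
            if ca not in INDEX or cb not in INDEX:
                continue  # gap or ambiguous base
            if (ca in STOPS) != (cb in STOPS):
                continue  # sense <-> stop substitution: not considered
            i, j = INDEX[ca], INDEX[cb]
            # Count both directions: the direction of change is unknown.
            counts[i][j] += 1
            counts[j][i] += 1
    return counts


def probability_matrix(counts):
    """Row-normalize counts so that P[i][j] estimates Pr(codon j | codon i)."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([x / total if total else 0.0 for x in row])
    return probs


def log_odds_scores(counts):
    """Illustrative Dayhoff-style scores 10 * log10(P[i][j] / f[j])."""
    total = sum(map(sum, counts))
    # Row sums equal codon occurrences, so this is each codon's background frequency.
    freqs = [sum(row) / total for row in counts]
    probs = probability_matrix(counts)
    return [
        [10 * math.log10(p / freqs[j]) if p > 0 else float("-inf")
         for j, p in enumerate(row)]
        for row in probs
    ]
```

For example, `count_substitutions([("ATGTTTAAA", "ATGTTCAAG")])` records the TTT/TTC and AAA/AAG substitutions once in each direction and the conserved ATG twice on the diagonal. Any entry linking a sense codon to a stop codon stays zero, which is what makes the matrix block diagonal. Counting each aligned pair in both directions keeps the matrix symmetric, reflecting that the direction of change between two extant sequences is unknown.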
Dayhoff's matrices contain the probabilities of amino acid mutations for a given period of evolution and have long been used for scoring protein sequence alignments, in evolutionary studies and in homology searches.

More than a decade ago, when large-scale protein databases became established, several amino acid substitution matrices based on observed mutation counts in protein alignments were constructed [2-4], replacing the original Dayhoff matrices, which were based on relatively few alignments.

However, to describe substitutions at the codon level, parameterized models have b