%0 Journal Article
%T Context dependent substitution biases vary within the human genome
%A P Andrew Nevarez
%A Christopher M DeBoever
%A Benjamin J Freeland
%A Marissa A Quitt
%A Eliot C Bush
%J BMC Bioinformatics
%D 2010
%I BioMed Central
%R 10.1186/1471-2105-11-462
%X To address this, we have developed a method that identifies over- and under-represented substitution patterns and assesses their overall impact on the evolution of genome composition. Our method is designed to account for biases at smaller pattern sizes and remove their effects. We used this method to investigate context bias in the human lineage after the divergence from chimpanzee. We examined bias effects in substitution patterns 2 to 5 bp long and found significant effects at all sizes. These included some individual three- and four-base-pair patterns with relatively large biases. We also found that bias effects vary across the genome, differing between transposons and non-transposons, between different classes of transposons, and between regions near to and far from genes. We found that nucleotides beyond the immediately adjacent one are responsible for substantial context effects, and that these biases vary across the genome. Early models of nucleotide substitution made strong simplifying assumptions, for example, that all nucleotides substitute for one another at the same rate [1,2]. Over time, it has become clear that many of these assumptions were too strong [3,4]. One assumption that has often been made is that the probability of a substitution at a particular nucleotide position is independent of its context, that is, of the identity of its neighboring nucleotides. However, it is now known that context can substantially bias the substitution process. The most dramatic example of such substitution bias in vertebrates is the CG → TG bias. Typically, deamination of a cytosine produces uracil, which is recognized by uracil-DNA glycosylase and repaired by the cell [5].
However, if the cytosine is methylated, deamination produces thymine instead. Such cases result in mismatches and lead to an unusually high rate of C → T and G → A transitions [6]. Because most methylated C residues in vertebrates occur in a CG context, this process causes high rates of CG
%U http://www.biomedcentral.com/1471-2105/11/462