%0 Journal Article
%T The Average Mutual Information Profile as a Genomic Signature
%A Mark Bauer
%A Sheldon M Schuster
%A Khalid Sayood
%J BMC Bioinformatics
%D 2008
%I BioMed Central
%R 10.1186/1471-2105-9-48
%X We have analyzed genomic sequences of eukaryotic and prokaryotic chromosomes as well as various subtypes of viruses using an information-theoretic framework. We confirm the existence of a species-specific average mutual information (AMI) profile. We use these profiles to define a very simple, computationally efficient, alignment-free distance measure that reflects the evolutionary relationships between genomic sequences. We use this distance measure to classify chromosomes according to species of origin, to separate and cluster subtypes of the HIV-1 virus, and to classify DNA fragments to species of origin. AMI profiles of DNA sequences prove to be species-specific and easy to compute. The structure of AMI profiles is conserved, even in short subsequences of a species' genome, rendering a pervasive signature. This signature can be used to classify relatively short DNA fragments to species of origin. The existence of patterns that can be used as a signature of data is indicative of statistical or deterministic structures in the data. In DNA sequences this structure can be due to biological processes that involve the DNA, or it may arise from events and processes in the evolutionary history of the DNA. There have been significant efforts to understand the sequential structure and complexity of DNA using various approaches, such as information-theoretic measures or other mathematical models. The standard approach to studying statistical relationships in a sequence is the use of correlation profiles or spectral profiles such as periodograms and power spectra.
To translate the sequence of letters that form the DNA sequence into a sequence of numbers, which can then be easily analyzed using autocorrelation or spectral techniques, different mappings have been proposed by Gates [1], Voss [2] and Peng et al. [3]. The power spectral densities obtained from these approaches show a power-law relationship, which points to the existence of long-range correlations. A number of m
%U http://www.biomedcentral.com/1471-2105/9/48