Eukaryotic genomes, particularly animal genomes, have a complex, nonuniform, and nonrandom internal compositional organization. The compositional organization of animal genomes can be described as a mosaic of discrete genomic regions, called “compositional domains,” each with a distinct GC content that differs significantly from those of its upstream and downstream neighboring domains. A typical animal genome consists of a mixture of compositionally homogeneous and nonhomogeneous domains of varying lengths and nucleotide compositions that are interspersed with one another. We have devised IsoPlotter, an unbiased segmentation algorithm for inferring the compositional organization of genomes. IsoPlotter has become an indispensable tool for describing genomic composition and has been used in the analysis of more than a dozen genomes. Applications include describing new genomes, correlating domain composition with gene composition and gene density, studying the evolution of genomes, testing phylogenomic hypotheses, and detecting regions of potential interbreeding between humans and extinct hominines. To extend the use of IsoPlotter, we designed a completely automated pipeline, called IsoPlotter+, to carry out all segmentation analyses, including graphical display, and built a repository for compositional domain maps of all fully sequenced vertebrate and invertebrate genomes. The IsoPlotter+ pipeline and repository offer a comprehensive solution to the study of genome compositional architecture. Here, we demonstrate IsoPlotter+ by applying it to human and insect genomes. The computational tools and data repository are available online.

1. Introduction

While the genomes of multicellular eukaryotes are generally larger and more variable in size than those of prokaryotes, guanine and cytosine (GC) content exhibits much smaller variation in eukaryotes than in prokaryotes. In particular, vertebrate genomes show a fairly uniform GC content, distributed over a very narrow range from about 40% to 45% [1]. Despite the uniformity of their genomic GC content, vertebrate genomes have a much more complex compositional organization than prokaryotic genomes. Recent studies have shown that this narrow distribution cloaks a complex mosaic of homogeneous and nonhomogeneous compositional domains whose sizes range from 3 kilobases (kb) to more than 10 megabases (Mb) and whose GC contents range from ~7% to ~72% (e.g., [2, 3]). Molecular evolutionists have had a long-standing interest in deciphering the internal compositional organization of genomes, describing their
References
[1]
D. Graur and W.-H. Li, Fundamentals of Molecular Evolution, Sinauer Associates, Sunderland, Mass, USA, 2000.
[2]
C. G. Elsik, R. L. Tellam, K. C. Worley et al., “The genome sequence of taurine cattle: a window to ruminant biology and evolution,” Science, vol. 324, no. 5926, pp. 522–528, 2009.
[3]
E. Elhaik, D. Graur, K. Josić, and G. Landan, “Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm,” Nucleic Acids Research, vol. 38, no. 15, article 158, 2010.
[4]
N. Cohen, T. Dagan, L. Stone, and D. Graur, “GC composition of the human genome: in search of isochores,” Molecular Biology and Evolution, vol. 22, no. 5, pp. 1260–1272, 2005.
[5]
G. Bernardi, B. Olofsson, and J. Filipski, “The mosaic genome of warm-blooded vertebrates,” Science, vol. 228, no. 4702, pp. 953–958, 1985.
[6]
E. M. S. Belle, N. Smith, and A. Eyre-Walker, “Analysis of the phylogenetic distribution of isochores in vertebrates and a test of the thermal stability hypothesis,” Journal of Molecular Evolution, vol. 55, no. 3, pp. 356–363, 2002.
[7]
L. Duret and N. Galtier, “Biased gene conversion and the evolution of mammalian genomic landscapes,” Annual Review of Genomics and Human Genetics, vol. 10, pp. 285–311, 2009.
[8]
W.-H. Li, “On parameters of the human genome,” Journal of Theoretical Biology, vol. 288, pp. 92–104, 2011.
[9]
A. Eyre-Walker and L. D. Hurst, “The evolution of isochores,” Nature Reviews Genetics, vol. 2, no. 7, pp. 549–555, 2001.
[10]
E. Elhaik, D. Graur, and K. Josić, “Comparative testing of DNA segmentation algorithms using benchmark simulations,” Molecular Biology and Evolution, vol. 27, no. 5, pp. 1015–1024, 2010.
[11]
C. R. Smith, C. D. Smith, H. M. Robertson et al., “Draft genome of the red harvester ant Pogonomyrmex barbatus,” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 14, pp. 5667–5672, 2011.
[12]
C. D. Smith, A. Zimin, C. Holt et al., “Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile),” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 14, pp. 5673–5678, 2011.
[13]
G. Suen, C. Teiling, L. Li et al., “The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle,” PLoS Genetics, vol. 7, no. 2, Article ID e1002007, 2011.
[14]
E. Sodergren, G. M. Weinstock, E. H. Davidson, R. A. Cameron, R. A. Gibbs et al., “Insights into social insects from the genome of the honeybee Apis mellifera,” Nature, vol. 443, pp. 931–949, 2006.
[15]
E. F. Kirkness, B. J. Haas, W. Sun, H. R. Braig, M. A. Perotti et al., “Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, pp. 12168–12173, 2010.
[16]
J. H. Werren, S. Richards, C. A. Desjardins, O. Niehuis, J. Gadau et al., “Functional and evolutionary insights from the genomes of three parasitoid Nasonia species,” Science, vol. 327, pp. 343–348, 2010.
[17]
E. Sodergren, G. M. Weinstock, E. H. Davidson et al., “The genome of the sea urchin Strongylocentrotus purpuratus,” Science, vol. 314, no. 5801, pp. 941–952, 2006.
[18]
S. Richards, R. A. Gibbs, G. M. Weinstock et al., “The genome of the model beetle and pest Tribolium castaneum,” Nature, vol. 452, no. 7190, pp. 949–955, 2008.
[19]
D. F. Simola, L. Wissler, G. Donahue, R. M. Waterhouse, M. Helmkampf et al., “The (r)evolution of social insect genomes,” Proceedings of the National Academy of Sciences of the United States of America.
[20]
E. Elhaik, G. Landan, and D. Graur, “Can GC content at third-codon positions be used as a proxy for isochore composition?” Molecular Biology and Evolution, vol. 26, no. 8, pp. 1829–1833, 2009.
[21]
E. Elhaik and T. V. Tatarinova, “GC3 biology in eukaryotes and prokaryotes,” in DNA Methylation—From Genomics To Technology, T. Tatarinova and O. Kerton, Eds., pp. 55–68, 2012.
[22]
T. V. Tatarinova, N. N. Alexandrov, J. B. Bouck, and K. A. Feldmann, “GC3 biology in corn, rice, sorghum and other grasses,” BMC Genomics, vol. 11, no. 1, article 308, 2010.
[23]
E. Elhaik, E. Greenspan, E. S. Staats, T. Krahn, C. Tyler-Smith et al., “The GenoChip: a new tool for genetic anthropology,” Genome Biology and Evolution.
[24]
M. Costantini, O. Clay, F. Auletta, and G. Bernardi, “An isochore map of human chromosomes,” Genome Research, vol. 16, no. 4, pp. 536–541, 2006.
[25]
J. L. Oliver, P. Carpena, M. Hackenberg, and P. Bernaola-Galván, “IsoFinder: computational prediction of isochores in genome sequences,” Nucleic Acids Research, vol. 32, pp. W287–W292, 2004.
[26]
D. Mouchiroud, C. Gautier, and G. Bernardi, “The compositional distribution of coding sequences and DNA molecules in humans and murids,” Journal of Molecular Evolution, vol. 27, no. 4, pp. 311–320, 1988.
[27]
A. McLysaght, D. Huson, L. Carmel, I. B. Rogozin, Y. I. Wolf, and E. V. Koonin, “An expectation-maximization algorithm for analysis of evolution of exon-intron structure of eukaryotic genes,” Comparative Genomics, vol. 3678, pp. 35–46, 2005.