Gymnosperms, comprising cycads, Ginkgo, Gnetales, and conifers, represent one of the major groups of extant seed plants. Yet compared to angiosperms, little is known about the patterns of diversification and genome evolution in gymnosperms. We assembled a phylogenetic supermatrix containing over 4.5 million nucleotides from 739 gymnosperm taxa. Although 93.6% of the cells in the supermatrix are empty, the data reveal many strongly supported nodes that are generally consistent with previous phylogenetic analyses, including weak support for Gnetales as sister to Pinaceae. A lineage through time plot suggests elevated rates of diversification within the last 100 million years, and there is evidence of shifts in diversification rates in several clades within cycads and conifers. A likelihood-based analysis of the evolution of genome size in 165 gymnosperms finds evidence for heterogeneous rates of genome size evolution due to an elevated rate in Pinus.
1. Introduction
Recent advances in sequencing technology offer the possibility of identifying the genetic mechanisms that influence evolutionarily important characters and ultimately drive diversification. Within angiosperms, large-scale phylogenetic analyses have identified complex patterns of diversification (e.g., [1–3]), and numerous genomes are at least partially sequenced. Yet the other major clade of seed plants, the gymnosperms, has received far less attention, with few comprehensive studies of diversification and no sequenced genomes. Note that throughout this paper “gymnosperms” specifies only the approximately 1000 extant species within cycads, Ginkgo, Gnetales, and conifers. These comprise the Acrogymnospermae clade described by Cantino et al.
Many gymnosperms have exceptionally large genomes (e.g., [5–7]), and this has hindered whole-genome sequencing projects, especially among economically important Pinus species.
This large genome size is interesting because one suggested mechanism for rapid increases in genome size, polyploidy, is rare among gymnosperms. Recent sequencing efforts have elucidated some of the genomic characteristics associated with the large genome size in Pinus. Morse et al. identified a large retrotransposon family in Pinus that, together with other retrotransposon families, accounts for much of the genomic complexity. Similarly, recent sequencing of 10 BAC (bacterial artificial chromosome) clones from Pinus taeda identified many conifer-specific LTR (long terminal repeat) retroelements. These studies suggest that the large genome size may be caused by rapid expansion of
S. A. Smith, J. M. Beaulieu, A. Stamatakis, and M. J. Donoghue, “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany, vol. 98, no. 3, pp. 404–414, 2011.
S. E. Hall, W. S. Dvorak, J. S. Johnston, H. J. Price, and C. G. Williams, “Flow cytometric analysis of DNA content for tropical and temperate new world pines,” Annals of Botany, vol. 86, no. 6, pp. 1081–1086, 2000.
E. Grotkopp, M. Rejmánek, M. J. Sanderson, and T. L. Rost, “Evolution of genome size in pines (Pinus) and its life-history correlates: supertree analyses,” Evolution, vol. 58, no. 8, pp. 1705–1729, 2004.
J. D. Thompson, D. G. Higgins, and T. J. Gibson, “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Research, vol. 22, no. 22, pp. 4673–4680, 1994.
H. Won and S. S. Renner, “Dating dispersal and radiation in the gymnosperm Gnetum (Gnetales)—clock calibration when outgroup relationships are uncertain,” Systematic Biology, vol. 55, no. 4, pp. 610–622, 2006.
B. R. Moore, K. M. A. Chan, and M. J. Donoghue, “Detecting diversification rate variation in supertrees,” in Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, O. R. P. Bininda-Emonds, Ed., pp. 487–533, Kluwer Academic, Dordrecht, The Netherlands, 2004.
H. Wang, M. J. Moore, P. S. Soltis et al., “Rosid radiation and the rapid rise of angiosperm-dominated forests,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 10, pp. 3853–3858, 2009.
A. R. Lemmon, J. M. Brown, K. Stanger-Hall, and E. M. Lemmon, “The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference,” Systematic Biology, vol. 58, no. 1, pp. 130–145, 2009.