|
BMC Genomics 2009
High throughput sequencing in mice: a platform comparison identifies a preponderance of cryptic SNPsAbstract: We sequenced the two most commonly used inbred mouse strains, DBA/2J and C57BL/6J, across a region of chromosome 1 (171.6 – 174.6 megabases) using two next generation high-throughput sequencing platforms: Applied Biosystems (SOLiD) and Illumina (Genome Analyzer). Using the same templates on both platforms, we compared realignments and single nucleotide polymorphism (SNP) detection with an 80 fold average read depth across platforms and samples. While public datasets currently annotate 4,527 SNPs between the two strains in this interval, thorough high-throughput sequencing identified a total of 11,824 SNPs in the interval, including 7,663 new SNPs. Furthermore, we confirmed 40 missense SNPs and discovered 36 new missense SNPs.Comparisons utilizing even two of the best characterized mouse genetic models, DBA/2J and C57BL/6J, indicate that more than half of naturally-occurring SNPs remain cryptic. The magnitude of this problem is compounded when using more divergent or poorly annotated genetic models. This warrants full genomic sequencing of the mouse strains used as genetic models.With the recent completion of the Perlegen/NIEHS mouse resequencing [1], over ten million mouse single nucleotide polymorphisms (SNPs) are now annotated in the public databases, resulting in a dramatic increase in the genome-wide knowledge of variation among 16 of the most widely used mouse strains. Importantly, this is a lower bound estimate because the C57BL/6J (B6) strain, used for the mouse genome reference sequence (NCBI m37, Apr 2007), is the only mouse strain sequenced in its entirety. Given the proliferation of mouse genetic models (e.g., knockout models, selectively bred lines, heterogeneous stocks derived from standard inbred strains and wild mice) and the recent insurgence of gene expression microarray and phenotypic studies using these mouse models, the impact of naturally-occurring polymorphisms on these data is critical. With the advent of next-generation, high-throughput seque
|