The domestic dog, Canis familiaris, is a well-established model system for mapping trait and disease loci. While the original draft sequence was of good quality, gaps were abundant, particularly in promoter regions of the genome, which hampered the annotation and study of candidate genes. Here, we present an improved genome build, canFam3.1, which includes 85 MB of novel sequence and now covers 99.8% of the euchromatic portion of the genome. We also present multiple RNA-Sequencing data sets from 10 different canine tissues, which we use to catalog ~175,000 expressed loci. While about 90% of the coding genes previously annotated by EnsEMBL have measurable expression in at least one sample, the number of transcript isoforms detected in our data expands the EnsEMBL annotations by a factor of four. Syntenic comparison with the human genome revealed an additional ~3,000 loci that are characterized as protein coding in human and were also expressed in the dog, suggesting that they were missing from the EnsEMBL canine gene set. In addition to ~20,700 high-confidence protein coding loci, we found ~4,600 antisense transcripts overlapping exons of protein coding genes; ~7,200 intergenic multi-exon transcripts without coding potential, which are likely candidates for long intergenic non-coding RNAs (lincRNAs); and ~11,000 transcripts that were reported by two different library construction methods but did not fit any of the above categories. Of the lincRNAs, about 6,000 have no annotated orthologs in human or mouse. Knockdown of two novel transcripts with shRNA in a mouse kidney cell line altered cell morphology and motility. Overall, we provide a much-improved annotation of the canine genome and suggest regulatory functions for several of the novel non-coding transcripts.