PLOS Genetics  2015 

Origins of De Novo Genes in Human and Chimpanzee

DOI: 10.1371/journal.pgen.1005721


Abstract:

The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate in genomic regions that did not previously contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence remain unclear. To obtain a comprehensive view of this process, we performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse) and then compared the assembled transcripts and the corresponding syntenic genomic regions. This identified over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes with evidence of protein translation. Taken together, the data support a model in which frequently occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.
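The species filter described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline, which compared assembled transcripts and syntenic genomic regions; it only shows the logical rule "multiexonic, observed in human and/or chimpanzee, not observed in macaque or mouse". The `Transcript` class, its fields, and the toy records are hypothetical and introduced here for illustration only.

```python
from dataclasses import dataclass

# Species sets taken from the abstract; the grouping names are assumptions.
HOMINOID = frozenset({"human", "chimpanzee"})
OUTGROUPS = frozenset({"macaque", "mouse"})


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for one assembled transcript."""
    transcript_id: str
    n_exons: int
    # Species in which a corresponding transcript was detected (assumed given).
    observed_in: frozenset


def is_candidate_de_novo(t: Transcript) -> bool:
    """True if the transcript is multiexonic, seen in human and/or chimpanzee,
    and not seen in any outgroup species."""
    multiexonic = t.n_exons >= 2
    in_hominoid = bool(t.observed_in & HOMINOID)
    in_outgroup = bool(t.observed_in & OUTGROUPS)
    return multiexonic and in_hominoid and not in_outgroup


# Toy data, invented for illustration only.
transcripts = [
    Transcript("tx1", 3, frozenset({"human"})),
    Transcript("tx2", 2, frozenset({"human", "macaque"})),
    Transcript("tx3", 2, frozenset({"human", "chimpanzee"})),
    Transcript("tx4", 1, frozenset({"chimpanzee"})),
]

candidates = [t.transcript_id for t in transcripts if is_candidate_de_novo(t)]
print(candidates)
```

In this toy set, `tx2` is excluded because it is also seen in macaque, and `tx4` because it has a single exon.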

