oalib
Identification and Analysis of Genes and Pseudogenes within Duplicated Regions in the Human and Mouse Genomes
Mikita Suyama, Eoghan Harrington, Peer Bork, David Torrents
PLOS Computational Biology, 2006, DOI: 10.1371/journal.pcbi.0020076
Abstract: The identification and classification of genes and pseudogenes in duplicated regions still constitutes a challenge for standard automated genome annotation procedures. Using an integrated homology and orthology analysis independent of current gene annotation, we have identified 9,484 and 9,017 gene duplicates in human and mouse, respectively. On the basis of the integrity of their coding regions, we have classified them into functional and inactive duplicates, allowing us to define the first consistent and comprehensive collection of 1,811 human and 1,581 mouse unprocessed pseudogenes. Furthermore, of the total of 14,172 human and mouse duplicates predicted to be functional genes, as many as 420 are not included in current reference gene databases and therefore correspond to likely novel mammalian genes. Some of these correspond to partial duplicates with less than half of the length of the original source genes, yet they are conserved and syntenic among different mammalian lineages. The genes and unprocessed pseudogenes obtained here will enable further studies on the mechanisms involved in gene duplication as well as of the fate of duplicated genes.
Comparative analysis of transposed element insertion within human and mouse genomes reveals Alu's unique role in shaping the human transcriptome
Noa Sela, Britta Mersch, Nurit Gal-Mark, Galit Lev-Maor, Agnes Hotz-Wagenblatt, Gil Ast
Genome Biology, 2007, DOI: 10.1186/gb-2007-8-6-r127
Abstract: We compiled a dataset of all TEs in the human and mouse genomes, identifying 3,932,058 and 3,122,416 TEs, respectively. We then extracted TEs located within human and mouse genes and, surprisingly, we found that 60% of TEs in both human and mouse are located in intronic sequences, even though introns comprise only 24% of the human genome. All TE families in both human and mouse can exonize. TE families that are shared between human and mouse exhibit the same percentage of TE exonization in the two species, but the exonization level of Alu, a primate-specific retroelement, is significantly greater than that of other TEs within the human genome, leading to a higher level of TE exonization in human than in mouse (1,824 exons compared with 506 exons, respectively). We detected a primate-specific mechanism for intron gain, in which Alu insertion into an exon creates a new intron located in the 3' untranslated region (termed 'intronization'). Finally, the insertion of TEs into the first and last exons of a gene is more frequent in human than in mouse, leading to longer exons in human. Our findings reveal many effects of TEs on these two transcriptomes. These effects are substantially greater in human than in mouse, which is due to the presence of Alu elements in human. The completion of the human and mouse genome draft sequences confirmed that transposed elements (TEs) play a major role in shaping mammalian genomes [1,2]. Transposed elements comprise at least 45% of the human and 37% of the mouse genomes. In the human genome, Alu is the most abundant transposed element (TE), comprising more than one million copies, which is about 10% of the genome. We previously reported that more than 5% of the alternatively spliced internal exons in the human genome are derived from Alu, and to the best of our knowledge all Alu-driven exons originated from exonization of intronic sequences [3,4].
Alu elements were shown to create alternative cassette exons, whereas exonization of a consti
Genome comparison without alignment using shortest unique substrings
Bernhard Haubold, Nora Pierstorff, Friedrich Möller, Thomas Wiehe
BMC Bioinformatics, 2005, DOI: 10.1186/1471-2105-6-123
Abstract: We find that the shortest unique substrings in Caenorhabditis elegans, human and mouse are no longer than 11 bp in the autosomes of these organisms. In mouse and human these unique substrings are significantly clustered in upstream regions of known genes. Moreover, the probability of finding such short unique substrings in the genomes of human or mouse by chance is extremely small. We derive an analytical expression for the null distribution of shortest unique substrings, given the GC-content of the query sequences. Furthermore, we apply our method to rapidly detect unique genomic regions in the genome of Staphylococcus aureus strain MSSA476 compared to four other staphylococcal genomes. We combine a method to rapidly search for shortest unique substrings in DNA sequences and a derivation of their null distribution. We show that unique regions in an arbitrary sample of genomes can be efficiently detected with this method. The corresponding programs shustring (SHortest Unique subSTRING) and shulen are written in C and available at http://adenine.biz.fh-weihenstephan.de/shustring/. Sequence comparison is traditionally carried out using alignments. The alignment procedure ensures that only homologous positions are compared and corresponding algorithms form the classical core of bioinformatics [1-3]. Once a sequence alignment has been computed, it can be used to determine, for example, signature oligonucleotides or unique genomic regions among a group of closely related organisms. Perhaps surprisingly, the applications of alignments just mentioned – signature oligos and detection of unique genomic regions – do not necessarily involve an alignment step.
Since the computation of alignments tends to take time proportional to the product of the lengths of the sampled sequences, elimination of this step often leads to dramatic increases in the speed of sequence analysis algorithms [4]. Our method of alignment-free sequence comparison is based on the idea of "shortest uni
Complex Loci in Human and Mouse Genomes
Pär G Engström, Harukazu Suzuki, Noriko Ninomiya, Altuna Akalin, Luca Sessa, Giovanni Lavorgna, Alessandro Brozzi, Lucilla Luzi, Sin Lam Tan, Liang Yang, Galih Kunarso, Edwin Lian-Chong Ng, Serge Batalov, Claes Wahlestedt, Chikatoshi Kai, Jun Kawai, Piero Carninci, Yoshihide Hayashizaki, Christine Wells, Vladimir B Bajic, Valerio Orlando, James F Reid, Boris Lenhard, Leonard Lipovich
PLOS Genetics, 2006, DOI: 10.1371/journal.pgen.0020047
Abstract: Mammalian genomes harbor a larger than expected number of complex loci, in which multiple genes are coupled by shared transcribed regions in antisense orientation and/or by bidirectional core promoters. To determine the incidence, functional significance, and evolutionary context of mammalian complex loci, we identified and characterized 5,248 cis–antisense pairs, 1,638 bidirectional promoters, and 1,153 chains of multiple cis–antisense and/or bidirectionally promoted pairs from 36,606 mouse transcriptional units (TUs), along with 6,141 cis–antisense pairs, 2,113 bidirectional promoters, and 1,480 chains from 42,887 human TUs. In both human and mouse, 25% of TUs resided in cis–antisense pairs, only 17% of which were conserved between the two organisms, indicating frequent species specificity of antisense gene arrangements. A sampling approach indicated that over 40% of all TUs might actually be in cis–antisense pairs, and that only a minority of these arrangements are likely to be conserved between human and mouse. Bidirectional promoters were characterized by variable transcriptional start sites and an identifiable midpoint at which overall sequence composition changed strand and the direction of transcriptional initiation switched. In microarray data covering a wide range of mouse tissues, genes in cis–antisense and bidirectionally promoted arrangement showed a higher probability of being coordinately expressed than random pairs of genes. In a case study on homeotic loci, we observed extensive transcription of nonconserved sequences on the noncoding strand, implying that the presence rather than the sequence of these transcripts is of functional importance. Complex loci are ubiquitous, host numerous nonconserved gene structures and lineage-specific exonification events, and may have a cis-regulatory impact on the member genes.
Survival strategies for transposons and genomes
Sandra L Martin, David J Garfinkel
Genome Biology, 2003, DOI: 10.1186/gb-2003-4-4-313
Abstract: The Keystone Symposium "Transposition and other genome rearrangements" covered every level of genome dynamics. The topics ranged from the mechanistic details of transposition and site-specific recombination at the atomic level, through to interactions between transposable elements and the genomes in which they reside that enhance, prevent or control transposon movement, and on to the recognition and classification of novel mobile elements. Recently released genome sequences of human, mouse, mosquito, Chlamydomonas, rice, and fission yeast provided a rich context for a lively and informative meeting. Here, we highlight the presentations that are of special interest to the readership of Genome Biology, as they emphasized how transposable elements and their hosts interact in the 'genomic ecosystem' (Figure 1). Transposons often persist in genomes over millions of years. This requires an exquisite balance between replication and suppression of their activity. The mechanisms used to achieve this balance are as unique and varied as the mechanisms of transposition, which may be either DNA-based (classical transposition) or RNA-based (retrotransposition). Retrotransposons have been particularly successful in eukaryotic genomes, and this success may - as suggested by Jef Boeke (Johns Hopkins University School of Medicine, Baltimore, USA) in his keynote address - reflect enhanced persistence in eukaryotes of the RNA intermediates necessary for reverse transcription. New elements of all classes can be identified in genomic sequences because they occur in multiple copies throughout the genome and on the basis of stereotypic sequence features, such as short target-site duplications, longer direct or inverted repeats, and protein-coding sequences identifiable as transposase or reverse transcriptase. Not all elements are equally successful in all genomes. For example, in maize retrotransposons comprise 70% of the genome, compared to only 20% in rice.
In plants, retrotransposons tend
Computational comparison of two mouse draft genomes and the human golden path
Zhenyu Xuan, Jinhua Wang, Michael Q Zhang
Genome Biology, 2002, DOI: 10.1186/gb-2002-4-1-r1
Abstract: We present here a critical comparison of the two latest mouse genome assemblies. The utility of the combined genomes is further demonstrated by comparing them with the human 'golden path' and through a subsequent analysis of a resulting conserved sequence element (CSE) database, which allows us to identify over 6,000 potential novel genes and to derive independent estimates of the number of human protein-coding genes. The Celera and public mouse assemblies differ in about 10% of the mouse genome. Each assembly has advantages over the other: Celera has higher accuracy in base-pairs and overall higher coverage of the genome; the public assembly, however, has higher sequence quality in some newly finished bacterial artificial chromosome clone (BAC) regions and the data are freely accessible. Perhaps most important, by combining both assemblies, we can get a better annotation of the human genome; in particular, we can obtain the most complete set of CSEs, one third of which are related to known genes and some others are related to other functional genomic regions. More than half the CSEs are of unknown function. From the CSEs, we estimate the total number of human protein-coding genes to be about 40,000. This searchable publicly available online CSEdb will expedite new discoveries through comparative genomics. In May 2002, two new mouse genome assemblies were released. One was the second version of the mouse genome assembly from Celera Genomics, created by using both private and public sequence information (denoted Cel2 [1]), and the other was the third version of the assembly from the public Mouse Genome Sequencing Consortium (denoted MGSCv3 [2]).
Both these draft mouse genomes were obtained using a whole-genome shotgun (WGS) strategy, but using different mouse strains and distinct sequence-assembly algorithms. Assembled by direct overlapping sequence fragments, Cel2 has about 260,000 contigs with a total size of 2.51 × 10^9 base-pairs (2.51 gigabases (Gb)), whereas MGSCv3
Overlapping genes in the human and mouse genomes
Chaitanya R Sanna, Wen-Hsiung Li, Liqing Zhang
BMC Genomics, 2008, DOI: 10.1186/1471-2164-9-169
Abstract: About 10% of the genes under study are overlapping genes, the majority of which are different-strand overlaps. The majority of the same-strand overlaps are embedded forms, whereas most different-strand overlaps are not embedded and in the convergent transcription orientation. Most of the same-strand overlapping gene pairs show at least a tenfold difference in length, much larger than the length difference between non-overlapping neighboring gene pairs. The length difference between the two different-strand overlapping genes is less dramatic. Over 27% of the different-strand-overlap relationships are shared between human and mouse, compared to only ~8% conservation for same-strand-overlap relationships. More than 96% of the same-strand and different-strand overlaps that are not shared between human and mouse have both genes located on the same chromosomes in the species that does not show the overlap. We examined the causes of transition between the overlapping and non-overlapping states in the two species and found that 3' UTR change plays an important role in the transition. Our study contributes to the understanding of the evolutionary transition between overlapping genes and non-overlapping genes and demonstrates the high rates of evolutionary changes in the un-translated regions. Overlapping genes are known to be common in viruses, mitochondria, bacteria, and plasmids [1], but are thought to be rare in eukaryotes. This view is changing because recent studies have suggested the existence of many overlapping genes in eukaryotic genomes, including human [2-5], mouse [6], rat [7], fish [8], and flies [9,10]. There are two principal types of overlap: (1) the same-strand overlapping type in which the two genes involved are transcribed from the same strand and (2) the different-strand overlapping type in which the two genes are transcribed from different strands.
Most of the recent large-scale analyses in higher eukaryotes have been restricted to different-strand-overlap
Gene expression regulation in the context of mouse interspecific mosaic genomes
David L'Hôte, Catherine Serres, Reiner A Veitia, Xavier Montagutelli, Ahmad Oulmouden, Daniel Vaiman
Genome Biology, 2008, DOI: 10.1186/gb-2008-9-8-r133
Abstract: Most genes (75%) were not transcriptionally modified either in the IRCSs or in the parent M. spretus mice, compared to M. musculus. The expression levels of most of the remaining transcripts were 'dictated' by either M. musculus transcription factors ('trans-driven'; 20%), or M. spretus cis-acting elements ('cis-driven'; 4%). Finally, 1% of transcripts were dysregulated following a cis-trans mismatch. We observed a higher sequence divergence between M. spretus and M. musculus promoters of strongly dysregulated genes than in promoters of similarly expressed genes. Our study indicates that it is possible to classify the molecular events leading to expressional alterations when a homozygous graft of foreign genome segments is made in an interspecific host genome. The inadequacy of transcription factors of this host genome to recognize the foreign targets was clearly the major path leading to dysregulation. Speciation is defined as the evolutionary process generating new species. It relies on reproductive isolation leading to the separate evolution of genomes. In the 'house mouse species complex' genomic exchanges do occur, and the laboratory mouse itself is considered as a mosaic of other subspecies. Indeed, laboratory mouse strains have originated from a limited number of founder populations of mixed genetic constitution [1,2]. A recent analysis of the fine structure of single nucleotide polymorphism (SNP) variation in the mouse genome revealed the existence of long segments with extremely high levels of polymorphism (one-third of the genome). This highly polymorphic subgenome is expected to originate partly from multiple subspecies [2], which suggests that the genomes of inbred strains (that is, Mus musculus) are mosaics of chromosome segments derived from other subspecies [1].
These results have been confirmed and extended to other mouse strains derived from the wild [3]. In spite of the accumulating evidence pointing to the mosaic nature of the inbred mouse genome in s
The fine-scale architecture of structural variants in 17 mouse genomes
Binnaz Yalcin, Kim Wong, Amarjit Bhomra, Martin Goodson, Thomas M Keane, David J Adams, Jonathan Flint
Genome Biology, 2012, DOI: 10.1186/gb-2012-13-3-r18
Abstract: By visual inspection of 100 Mbp of genome to which next generation sequence data from 17 inbred mouse strains had been aligned, we identify and interpret 21 paired-end mapping patterns, which we validate by PCR. These paired-end mapping patterns reveal a greater diversity and complexity in SVs than previously recognized. In addition, Sanger-based sequence analysis of 4,176 breakpoints at 261 SV sites reveals additional complexity at approximately a quarter of structural variants analyzed. We find micro-deletions and micro-insertions at SV breakpoints, ranging from 1 to 107 bp, and SNPs that extend breakpoint micro-homology and may catalyze SV formation. An integrative approach using experimental analyses to train computational SV calling is essential for the accurate resolution of the architecture of SVs. We find considerable complexity in SV formation; about a quarter of SVs in the mouse are composed of a complex mixture of deletion, insertion, inversion and copy number gain. Computational methods can be adapted to identify most paired-end mapping patterns. The identification of structural variants (SVs) in mammalian genomes [1-4] has important implications for our understanding of genetic diversity, has elucidated the concept of genomic disorders [5,6] and has improved the analysis of genetic association in common and rare diseases [7-12], cancer development [13] and genomic evolution [14,15]. However, the accurate identification of SVs in mammalian genomes remains challenging. Next generation sequencing provides a novel approach for identifying structural variants [16] and exploits read-pair information [17,18], split reads [19,20], read depth [21] and sequence assembly [22] to localize SVs.
Typically, variation in the expected number of reads mapping to the reference sequence is used to identify copy number variants while deviations from the expected distance between reads, and the orientation of reads, are used to infer the presence and type of structural variant at