oalib

Search Results: 1 - 10 of 216112 matches for " Steven L Salzberg "
All listed articles are free for downloading (OA Articles)
Genome re-annotation: a wiki solution?
Steven L Salzberg
Genome Biology, 2007, DOI: 10.1186/gb-2007-8-1-102
Abstract: So you think that gene you just retrieved from GenBank [1] is correct? Are you certain? If it is a eukaryotic gene, and especially if it is from an unfinished genome, there is a pretty good chance that the amino acid sequence is wrong. And depending on when the genome was sequenced and annotated, there is a chance that the description of its function is wrong too. Large-scale genome sequencing has revolutionized biology over the past ten years, generating vast amounts of new information that has radically transformed our understanding of hundreds of species, including ourselves. Sequencing centers continue to churn out new DNA sequences for a fantastic variety of species, covering more and more of the tree of life. Along with these sequences, the centers also produce genome annotation, which includes the locations and descriptions of all identifiable genes. These gene lists are the first pictures we get of what's inside a newly sequenced genome, and they can reveal key insights into what makes an organism distinctive. Sometimes the gene lists themselves are part of the story; for example, when the human genome was published [2,3], the headline was that humans have 'only' 25,000 genes, in contrast to earlier estimates of 100,000 or more. For many microbial species, the genome helps us to understand how the organism can accomplish something particularly difficult, such as how Deinococcus radiodurans (to cite just one of many examples) can withstand exposure to radiation levels far in excess of what a human could tolerate [4]. With each new human pathogen, the gene list helps us determine how the organism infects humans, how it causes sickness and (sometimes) how it becomes resistant to antibiotics. For these and other reasons, the accuracy of the gene list is tremendously important. Before addressing the problems with annotation, I will first summarize how it is done.
The process of sequencing and annotating the DNA of a bacterial species has become highly automated in
Re-Assembly of the Genome of Francisella tularensis Subsp. holarctica OSU18
Daniela Puiu, Steven L. Salzberg
PLOS ONE, 2008, DOI: 10.1371/journal.pone.0003427
Abstract: Francisella tularensis is a highly infectious human intracellular pathogen that is the causative agent of tularemia. It occurs in several major subtypes, including the live vaccine strain holarctica (type B). F. tularensis is classified as category A biodefense agent in part because a relatively small number of organisms can cause severe illness. Three complete genomes of subspecies holarctica have been sequenced and deposited in public archives, of which OSU18 was the first and the only strain for which a scientific publication has appeared [1]. We re-assembled the OSU18 strain using both de novo and comparative assembly techniques, and found that the published sequence has two large inversion mis-assemblies. We generated a corrected assembly of the entire genome along with detailed information on the placement of individual reads within the assembly. This assembly will provide a more accurate basis for future comparative studies of this pathogen.
Do-it-yourself genetic testing
Steven L Salzberg, Mihaela Pertea
Genome Biology, 2010, DOI: 10.1186/gb-2010-11-10-404
Abstract: As we learn more about the associations between genes and disease, a growing number of diagnostic tests have been developed to detect mutations that increase the risks of various diseases. However, anyone who wants to develop a diagnostic test or a treatment based on human genes faces a potential roadblock: gene patents. A 2005 study [1] reported that 4,382 human genes (~20% of the total number in our genome) are covered by patents or other intellectual property claims. These patents cover a wide range of methods for assaying the DNA sequence of an individual for the presence of disease-associated mutations. For example, one of the most consequential gene patents covers mutations in the BRCA1 [2] and BRCA2 [3] genes, which are associated with a significantly increased risk of breast and ovarian cancer [4-6]. The BRCA gene patents, which are held by Myriad Genetics, cover all known cancer-causing mutations in addition to those that might be discovered in the future. No one can develop a commercial diagnostic test or a treatment based on the BRCA gene sequences without a license from Myriad. Although a US federal court recently overturned seven of Myriad's BRCA patents, Myriad is appealing the ruling, and it holds 16 other BRCA-related patents that it claims are unaffected by the court's ruling [7]. As the cost of DNA sequencing falls, the idea of testing for mutations one gene at a time is rapidly becoming obsolete. We are also rapidly approaching the day when it will be cheaper to fully sequence a genome before testing the sequence for all known genetic mutations associated with a given disease than to conduct multiple separate tests for each gene. Currently Myriad charges more than $3000 for its tests on the BRCA genes, while sequencing one's entire genome now costs less than $20,000.
Furthermore, once an individual's genome has been sequenced, it becomes a resource that can be re-tested as new disease-causing mutations are discovered. In contrast to whole-genome seq
Between a chicken and a grape: estimating the number of human genes
Mihaela Pertea, Steven L Salzberg
Genome Biology, 2010, DOI: 10.1186/gb-2010-11-s1-i1
Abstract:
Between a chicken and a grape: estimating the number of human genes
Mihaela Pertea, Steven L Salzberg
Genome Biology, 2010, DOI: 10.1186/gb-2010-11-5-206
Abstract: Ever since the discovery of the genetic code, scientists have been trying to catalog all the genes in the human genome. Over the years, the best estimate of the number of human genes has grown steadily smaller, but we still do not have an accurate count. Here we review the history of efforts to establish the human gene count and present the current best estimates. The first attempt to estimate the number of genes in the human genome appeared more than 45 years ago, while the genetic code was still being deciphered. Friedrich Vogel published his 'preliminary estimate' in 1964 [1], based on the number of amino acids in the alpha- and beta-chains of hemoglobin (141 and 146, respectively). Knowing that three nucleotides corresponded to each amino acid, he extrapolated to compute the molecular weight of the DNA comprising these genes. He then made several assumptions in order to produce his estimate: that these proteins were typical in size (they are actually smaller than average); that nucleotide sequences were uninterrupted on the chromosomes (introns were discovered more than 10 years later [2,3]); and that the entire genome was protein coding. All these assumptions were reasonable at the time, but later discoveries would reveal that none of them was correct. Vogel then used the molecular weight of the human haploid chromosomes to correctly calculate the genome size as 3 × 10^9 nucleotides, and dividing that by the size of a 'typical' gene, came up with an estimate of 6.7 million genes. Even at the time, Vogel found this number 'disturbingly high', but no one suspected in 1964 that most human genes were interrupted by multiple introns, nor did anyone know that vast regions of the human genome would turn out to contain seemingly meaningless repetitive sequences. Since Vogel's initial attempt, many scientists have tried to estimate the number of genes in the human genome, using increasingly sophisticated molecular tools.
Over the years, the number has gradually come down,
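The arithmetic behind Vogel's 1964 estimate, as summarized in the abstract above, can be sketched roughly as follows. This is an illustrative approximation only: it assumes a 'typical' gene is simply three nucleotides per amino acid of an average hemoglobin chain, whereas the abstract says Vogel worked from molecular weights, so the sketch is meant to reproduce the scale of his figure rather than the exact 6.7 million. The function and constant names are hypothetical.

```python
# Rough reconstruction of the arithmetic behind Vogel's 1964 gene-count estimate.
# Assumptions (the ones the abstract lists): hemoglobin chains are typical in
# size, genes contain no introns, and the whole genome is protein coding.

GENOME_SIZE_NT = 3e9   # haploid genome size quoted in the abstract
ALPHA_CHAIN_AA = 141   # hemoglobin alpha-chain length (amino acids)
BETA_CHAIN_AA = 146    # hemoglobin beta-chain length (amino acids)
NT_PER_CODON = 3       # three nucleotides encode one amino acid


def estimate_gene_count(genome_size_nt: float, protein_length_aa: float) -> float:
    """Divide the genome into back-to-back genes of one fixed coding length."""
    gene_length_nt = protein_length_aa * NT_PER_CODON
    return genome_size_nt / gene_length_nt


# Treat the mean of the two chain lengths as the 'typical' protein size.
typical_protein_aa = (ALPHA_CHAIN_AA + BETA_CHAIN_AA) / 2
gene_count = estimate_gene_count(GENOME_SIZE_NT, typical_protein_aa)
print(f"Estimated genes: {gene_count / 1e6:.1f} million")
```

Under these simplifications the result falls in the same range as the 6.7 million genes reported in the abstract, which shows how strongly the estimate depends on the assumed gene length: any correction for introns or non-coding sequence increases the denominator and lowers the count.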
TopHat-Fusion: an algorithm for discovery of novel fusion transcripts
Daehwan Kim, Steven L Salzberg
Genome Biology, 2011, DOI: 10.1186/gb-2011-12-8-r72
Abstract: Direct sequencing of messenger RNA transcripts using the RNA-seq protocol [1-3] is rapidly becoming the method of choice for detecting and quantifying all the genes being expressed in a cell [4]. One advantage of RNA-seq is that, unlike microarray expression techniques, it does not rely on pre-existing knowledge of gene content, and therefore it can detect entirely novel genes and novel splice variants of existing genes. In order to detect novel genes, however, the software used to analyze RNA-seq experiments must be able to align the transcript sequences anywhere on the genome, without relying on existing annotation. TopHat [5] was one of the first spliced alignment programs able to perform such ab initio spliced alignment, and in combination with the Cufflinks program [6], it is part of a software analysis suite that can detect and quantify the complete set of genes captured by an RNA-seq experiment. In addition to detection of novel genes, RNA-seq has the potential to discover genes created by complex chromosomal rearrangements. 'Fusion' genes formed by the breakage and re-joining of two different chromosomes have repeatedly been implicated in the development of cancer, notably the BCR/ABL1 gene fusion in chronic myeloid leukemia [7-9]. Fusion genes can also be created by the breakage and rearrangement of a single chromosome, bringing together transcribed sequences that are normally separate. As of early 2011, the Mitelman database [10] documented nearly 60,000 cases of chromosome aberrations and gene fusions in cancer. Discovering these fusions via RNA-seq has a distinct advantage over whole-genome sequencing, due to the fact that in the highly rearranged genomes of some tumor samples, many rearrangements might be present although only a fraction might alter transcription. RNA-seq identifies only those chromosomal fusion events that produce transcripts.
It has the further advantage that it allows one to detect multiple alternative splice variants that might be pr
2009 Swine-Origin Influenza A (H1N1) Resembles Previous Influenza Isolates
Carl Kingsford, Niranjan Nagarajan, Steven L. Salzberg
PLOS ONE, 2009, DOI: 10.1371/journal.pone.0006402
Abstract: Background: In April 2009, novel swine-origin influenza viruses (S-OIV) were identified in patients from Mexico and the United States. The viruses were genetically characterized as a novel influenza A (H1N1) strain originating in swine, and within a very short time the S-OIV strain spread across the globe via human-to-human contact. Methodology: We conducted a comprehensive computational search of all available sequences of the surface proteins of H1N1 swine influenza isolates and found that a similar strain to S-OIV appeared in Thailand in 2000. The earlier isolates caused infections in pigs but only one sequenced human case, A/Thailand/271/2005 (H1N1). Significance: Differences between the Thai cases and S-OIV may help shed light on the ability of the current outbreak strain to spread rapidly among humans.
Detection and correction of false segmental duplications caused by genome mis-assembly
David R Kelley, Steven L Salzberg
Genome Biology, 2010, DOI: 10.1186/gb-2010-11-3-r28
Abstract: Ever since the publication of the Drosophila melanogaster genome [1], large-scale eukaryotic sequencing projects have increasingly used the whole-genome shotgun (WGS) strategy to sequence and assemble genomes. Algorithms to assemble a genome from WGS data have grown increasingly sophisticated, but problems nonetheless remain, and despite the ever-accelerating pace of 'complete' genome announcements, not a single vertebrate genome is truly complete. While it is widely known that draft assemblies contain gaps, the extent of errors in published assemblies is less well known. One particular type of error that confounds analysis is an erroneously duplicated sequence. Duplications involving large genomic regions, known as segmental duplications, have been the subject of intensive study in the human genome [2,3] and other species (for example, [4,5]). Although much effort has gone into avoiding the problem of artificially collapsing duplicated regions [6], less attention has been paid to the assembly processes that improperly reconstruct duplicated regions from WGS data, which is a problem for assembly of diploid organisms. Genome assembly software is generally designed as if the sequencing data ('reads') were derived from a clonal, haploid chromosome. This was indeed the case for early WGS projects, which targeted bacteria [7] or archaea [8], but in general is not true for more genetically complex organisms like vertebrates. Diploid organisms inevitably have differences between their two copies of each chromosome, and these differences complicate assembly. This problem can be alleviated somewhat by choosing highly inbred individuals with few differences between chromosomes for sequencing. But for many species such inbred lines are not available, and for others the inbreeding has not resulted in the desired homozygosity [9].
Adding further to the confusion is the fact that virtually all DNA sequence databases (including GenBank, EMBL, and DDBJ) maintain only a single copy o
A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons
Jonathan E Allen, Steven L Salzberg
Algorithms for Molecular Biology, 2006, DOI: 10.1186/1748-7188-1-14
Abstract: A non-expression based statistical method is presented to annotate alternatively spliced exons using a single genome sequence and evidence from cross-species sequence conservation. The computational method is implemented in the program ExAlt and an analysis of prediction accuracy is given for Drosophila melanogaster. ExAlt identifies the structure of most alternatively spliced exons in the test set and cross-species sequence conservation is shown to improve the precision of predictions. The software package is available to run on Drosophila genomes to search for new cases of alternative splicing. High-throughput sequencing of expression data provides compelling evidence that the long held hypothesis "one gene produces one protein" is far less common than previously thought. Surveys from the human genome estimate that as many as 70% of human genes produce more than one transcribed form [1]. Examples are found in a variety of metazoan organisms confirming that a significant number of genes produce multiple distinct transcripts [2,3]. Alternative splicing is an important biological mechanism for producing multiple distinct transcripts from a single gene locus. Exon intron junctions are pieced together to produce differing mRNAs. In some cases alternative exon splicing leads to different functional proteins thereby increasing protein diversity. In other cases an alternatively spliced exon leads to non-functional mRNA, effectively regulating gene expression [3]. Given an input genomic sequence and the locations of gene regions, our goal is to find the functional exons originating from each gene locus, identifying their respective amino acid codons and splice sites. Figure 1 shows examples of alternatively spliced exons examined in this study: intron retention (IR), cassette exon (CE), and multiple splice sites (MS).
Also considered are constitutive exons (CS), defined to be an exon included with the same splice site boundaries in all functional mRNA forms. Gene expression pr
Clustering metagenomic sequences with interpolated Markov models
David R Kelley, Steven L Salzberg
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-544
Abstract: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and supervised learning method Phymm called PHYSCIMM that performs better when evolutionarily close training genomes are available. SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm. Over the last 15 years, DNA sequencing technologies have advanced rapidly, allowing sequencing of over one thousand microbial genomes [1]. Still, this accounts for only a sliver of the fantastic diversity of microbes on the planet [2]. Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to drive the discovery and understanding of the "unculturable majority" of species -- the vast number of unknown microbes that cannot be cultured in the laboratory [3]. Successful metagenomics projects have sequenced DNA from ocean water sampled from around the world [4], microbial communities in and on humans [5-8], and acid drainage from an abandoned mine [9]. These and many other projects (e.g. [10-12]) promise to uncover the true extent of microbial diversity and give us a better understanding of how these unknown microbes live. However, progress has been slowed by the difficulty of analysis of metagenomic data. The output from an environmental shotgun sequencing project is a large set of DNA sequence "reads" of unknown origin.
Because these reads come from a diverse population of microbial strains, assembly pr
Copyright © 2008-2017 Open Access Library. All rights reserved.