Search Results: 1 - 10 of 211055 matches for " Arthur L Delcher "
All listed articles are free for downloading (OA Articles)
High-throughput sequence alignment using Graphics Processing Units
Michael C Schatz, Cole Trapnell, Arthur L Delcher, Amitabh Varshney
BMC Bioinformatics , 2007, DOI: 10.1186/1471-2105-8-474
Abstract: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. Sequence alignment has a long history in genomics research and continues to be a key component in the analysis of genes and genomes. Simply stated, sequence alignment algorithms find regions in one sequence, called here the query sequence, that are similar or identical to regions in another sequence, called the reference sequence. Such regions may represent genes, conserved regulatory regions, or any of a host of other sequence features. Alignment also plays a central role in de novo and comparative genome assembly [1,2], where thousands or millions of sequencing reads are aligned to each other or to a previously sequenced reference genome. New, inexpensive large-scale sequencing technologies [3] can now generate enormous amounts of sequence data in a very short time, enabling researchers to attempt genome sequencing projects on a much larger scale than previously.
Aligning these sequence data using c
Logarithmic-Time Updates and Queries in Probabilistic Networks
Arthur L. Delcher, Adam J. Grove, Simon Kasif, Judea Pearl
Computer Science , 2014,
Abstract: In this paper we propose a dynamic data structure that supports efficient algorithms for updating and querying singly connected Bayesian networks (causal trees and polytrees). In the conventional algorithms, new evidence is absorbed in time O(1) and queries are processed in time O(N), where N is the size of the network. We propose a practical algorithm which, after a preprocessing phase, allows us to answer queries in time O(log N) at the expense of O(log N) time per evidence absorption. The usefulness of sub-linear processing time manifests itself in applications requiring (near) real-time response over large probabilistic databases.
Versatile and open software for comparing large genomes
Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu, Steven L Salzberg
Genome Biology , 2004, DOI: 10.1186/gb-2004-5-2-r12
Abstract: Genome sequence comparison has been an important method for understanding gene function and genome evolution since the early days of gene sequencing. The pairwise sequence-comparison methods implemented in BLAST [1] and FASTA [2] have proved invaluable in discovering the evolutionary relationships and functions of thousands of proteins from hundreds of different species. The most commonly used application of these sequence-analysis programs is for comparing a single gene (either a DNA sequence or the protein translation of that sequence) to a large database of other genes. The results of such protein and nucleotide database searches have been used in recent years as the basis for assigning function to most of the newly discovered genes emerging from genome projects. In recent years, an important new sequence-analysis task has emerged: comparing an entire genome with another. Until 1999, each new genome published was so distant from all previous genomes that aligning them would not yield interesting results. With the publication of the second strain of Helicobacter pylori [3] in 1999, following the publication of the first strain [4] in 1997, the scientific world had its first chance to look at two complete bacterial genomes whose DNA sequences lined up very closely. Comparison of these genomes revealed an overall genomic structure that was very similar, but showed evidence of two large inversion events centered on the replication origin. The comparison also made it clear that a new type of bioinformatics program was needed, one that could efficiently compare two megabase-scale sequences, something that BLAST cannot do. In response to this need, TIGR released MUMmer 1.0, the first system that could perform genome comparisons of this scale [5]. 
The first two releases of MUMmer had over 1,600 site licensees, a number that has grown since moving to an open-source license in May 2003. The number of pairs of closely related genomes has increased dramatically in recent year
Minimus: a fast, lightweight genome assembler
Daniel D Sommer, Arthur L Delcher, Steven L Salzberg, Mihai Pop
BMC Bioinformatics , 2007, DOI: 10.1186/1471-2105-8-64
Abstract: We have developed the Minimus assembler to address these issues, and tested it on a range of assembly problems. We show that Minimus performs well on several small assembly tasks, including the assembly of viral genomes, individual genes, and BAC clones. In addition, we evaluate Minimus' performance in assembling bacterial genomes in order to assess its suitability as a component of a larger assembly pipeline. We show that, unlike other software currently used for these tasks, Minimus produces significantly fewer assembly errors, at the cost of generating a more fragmented assembly. We find that for small genomes and other small assembly tasks, Minimus is faster and far more flexible than existing tools. Due to its small size and modular design Minimus is perfectly suited to be a component of complex assembly pipelines. Minimus is released as an open-source software project and the code is available as part of the AMOS project at Sourceforge. With the advent of whole-genome shotgun (WGS) sequencing in the mid-1990s, the genomics community had an urgent need for software that could process tens of thousands of individual sequence "reads" and assemble those into the genome from which they had come. The first generation of assemblers, including TIGR Assembler [1], phrap [2], and CAP3 [3], were able to assemble small- to medium-sized bacterial genomes, often requiring several weeks of computer time on the fastest computers then available. As sequencing technology improved, ever larger projects were attempted with the WGS method, and it became clear that new methods were needed. For the 130 million base pair (Mbp) genome of the fruit fly Drosophila melanogaster, an entirely new assembler was developed [4], which incorporated many new ideas about efficient memory usage and sophisticated repeat processing.
The Celera Assembler (CelAsm) was also the first algorithm to use mate pair information to any serious degree: taking advantage of the fact that most reads in a WGS projec
Efficient decoding algorithms for generalized hidden Markov model gene finders
William H Majoros, Mihaela Pertea, Arthur L Delcher, Steven L Salzberg
BMC Bioinformatics , 2005, DOI: 10.1186/1471-2105-6-16
Abstract: As a first step toward addressing the implementation challenges of these next-generation systems, we describe in detail two software architectures for GHMM-based gene finders, one comprising the common array-based approach, and the other a highly optimized algorithm which requires significantly less memory while achieving virtually identical speed. We then show how both of these architectures can be accelerated by a factor of two by optimizing their content sensors. We finish with a brief illustration of the impact these optimizations have had on the feasibility of our new homology-based gene finder, TWAIN. In describing a number of optimizations for GHMM-based gene finders and making available two complete open-source software systems embodying these methods, it is our hope that others will be more enabled to explore promising extensions to the GHMM framework, thereby improving the state-of-the-art in gene prediction techniques. Generalized Hidden Markov Models have seen wide use in recent years in the field of computational gene prediction. A number of ab initio gene-finding programs are now available which utilize this mathematical framework internally for the modeling and evaluation of gene structure [1-6], and newer systems are now emerging which expand this framework by simultaneously modeling two genomes at once, in order to harness the mutually informative signals present in homologous gene structures from recently diverged species. As greater numbers of such genomes become available, it is tempting to consider the possibility of integrating all this information into increasingly complex models of gene structure and evolution. Notwithstanding our eagerness to utilize this expected flood of genomic data, methods have yet to be demonstrated which can perform such large-scale parallel analyses without requiring inordinate computational resources.
In the case of Generalized Pair HMMs (GPHMMs), for example, the only systems in existence of which we are familiar make
Core Gene Set As the Basis of Multilocus Sequence Analysis of the Subclass Actinobacteridae
Toïdi Adékambi, Ray W. Butler, Finnian Hanrahan, Arthur L. Delcher, Michel Drancourt, Thomas M. Shinnick
PLOS ONE , 2012, DOI: 10.1371/journal.pone.0014792
Abstract: Comparative genomic sequencing is shedding new light on bacterial identification, taxonomy and phylogeny. An in silico assessment of a core gene set necessary for cellular functioning was made to determine a consensus set of genes that would be useful for the identification, taxonomy and phylogeny of the species belonging to the subclass Actinobacteridae, which contained two orders, Actinomycetales and Bifidobacteriales. The subclass Actinobacteridae comprised about 85% of the actinobacteria families. The following recommended criteria were used to establish a comprehensive gene set: the gene should (i) be long enough to contain phylogenetically useful information, (ii) not be subject to horizontal gene transfer, (iii) be a single copy, (iv) have at least two regions sufficiently conserved to allow the design of amplification and sequencing primers, and (v) predict whole-genome relationships. We applied these constraints to 50 different Actinobacteridae genomes and made 1,224 pairwise comparisons of the genome conserved regions and gene fragments obtained by using the Sequence VARiability Analysis Program (SVARAP), which allows primers to be designed. Following a comparative statistical modeling phase, 3 gene fragments were selected, ychF, rpoB, and secY, with R² > 0.85. Selected sets of broad range primers were tested from the 3 gene fragments and were demonstrated to be useful for amplification and sequencing of 25 species belonging to 9 genera of Actinobacteridae. The intraspecies similarities were 96.3–100% for ychF, 97.8–100% for rpoB and 96.9–100% for secY among 73 strains belonging to 15 species of the subclass Actinobacteridae, compared to 99.4–100% for 16S rRNA. The phylogenetic topology obtained from the combined datasets ychF+rpoB+secY was globally similar to that inferred from the 16S rRNA but with higher confidence.
It was concluded that multi-locus sequence analysis using a core gene set might represent the first consensus and valid approach for investigating bacterial identification, phylogeny and taxonomy.
Correction: Serendipitous discovery of Wolbachia genomes in multiple Drosophila species
Steven L Salzberg, Julie Dunning Hotopp, Arthur L Delcher, Mihai Pop, Douglas R Smith, Michael B Eisen, William C Nelson
Genome Biology , 2005, DOI: 10.1186/gb-2005-6-7-402
Abstract: While searching the Trace Archive to verify this correction, however, one of us (S.L.S.) found that the traces for a new fly sequencing project, that of D. willistoni, had just been deposited. On searching the D. willistoni traces, a substantial Wolbachia infection in this species was discovered and 2,291 sequences belonging to Wolbachia were found. They were assembled into 485 contigs using the comparative assembler AMOS-Cmp [2] and the methods described in [1]. These sequences and assemblies are freely available for download from [3]. We thank Therese Markow of the University of Arizona for bringing this error in the Trace Archive data to our attention, and Jack Werren of the University of Rochester for suggesting that D. willistoni might have a Wolbachia infection.
Serendipitous discovery of Wolbachia genomes in multiple Drosophila species
Steven L Salzberg, Julie Hotopp, Arthur L Delcher, Mihai Pop, Douglas R Smith, Michael B Eisen, William C Nelson
Genome Biology , 2005, DOI: 10.1186/gb-2005-6-3-r23
Abstract: By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank. Large-scale sequencing projects continue to generate a growing number of new genomes from an ever-wider range of species. A rarely noted and unappreciated side effect of some projects occurs when the organism being sequenced contains an intracellular endosymbiont. In some cases, the existence of the endosymbiont is unknown to both the sequencing center and the laboratory providing the source DNA. Fortunately, many genome projects deposit all their raw sequence data into a publicly available, unrestricted repository known as the Trace Archive [1].
By conducting large-scale searches of the Trace Archive, one can discover the presence of these endosymbionts and, with the aid of bioinformatics tools including genome assembly algorithms, reconstruct some or most of the endosymbiont genomes. The amount of endosymbio
A whole-genome assembly of the domestic cow, Bos taurus
Aleksey V Zimin, Arthur L Delcher, Liliana Florea, David R Kelley, Michael C Schatz, Daniela Puiu, Finnian Hanrahan, Geo Pertea, Curtis P Van Tassell, Tad S Sonstegard, Guillaume Marçais, Michael Roberts, Poorani Subramanian, James A Yorke, Steven L Salzberg
Genome Biology , 2009, DOI: 10.1186/gb-2009-10-4-r42
Abstract: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion base pairs that has multiple improvements over previous assemblies: it is more complete, covering more of the genome; thousands of gaps have been closed; many erroneous inversions, deletions, and translocations have been corrected; and thousands of single-nucleotide errors have been corrected. Our evaluation using independent metrics demonstrates that the resulting assembly is substantially more accurate and complete than alternative versions. By using independent mapping data and conserved synteny between the cow and human genomes, we were able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes. We constructed a new cow-human synteny map that expands upon previous maps. We also identified for the first time a portion of the B. taurus Y chromosome. Seven years after the first whole-genome assembly of the human genome [1], sequencing and assembly of mammalian genomes has become almost routine. However, despite the continuing progress on sequencing technology, the assembly problem is far from solved. Assemblies of large genomes contain numerous errors, and many years of work can be dedicated to correcting errors and improving an assembly [2]. Technical progress in computational assembly methods offers the potential to make many of these improvements far faster and more efficiently than would be possible by laboratory methods. Having an accurate assembly of the genome of an important species provides an invaluable substrate for future research. For example, studies of genetic diversity need a good reference genome in order to catalog differences in new strains or lineages.
Expression analyses that sequence RNA from various tissues rely on the genome to map out gene models and to discover such features as alterna
Logarithmic-Time Updates and Queries in Probabilistic Networks
A. L. Delcher, A. J. Grove, S. Kasif, J. Pearl
Computer Science , 1996,
Abstract: Traditional databases commonly support efficient query and update procedures that operate in time which is sublinear in the size of the database. Our goal in this paper is to take a first step toward dynamic reasoning in probabilistic databases with comparable efficiency. We propose a dynamic data structure that supports efficient algorithms for updating and querying singly connected Bayesian networks. In the conventional algorithm, new evidence is absorbed in O(1) time and queries are processed in time O(N), where N is the size of the network. We propose an algorithm which, after a preprocessing phase, allows us to answer queries in time O(log N) at the expense of O(log N) time per evidence absorption. The usefulness of sub-linear processing time manifests itself in applications requiring (near) real-time response over large probabilistic databases. We briefly discuss a potential application of dynamic probabilistic reasoning in computational biology.
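The trade-off described in the abstract above (O(log N) per update in exchange for O(log N) per query, instead of O(1) and O(N)) can be illustrated with a loose analogy. The sketch below is not the authors' data structure and does not handle causal trees or polytrees. It assumes the simplest case, a chain of variables in which each link is a conditional probability matrix, and stores products of those matrices in a balanced binary tree. The class name `ChainProductTree` and its methods are hypothetical names chosen for this illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two small dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


class ChainProductTree:
    """Balanced tree over a chain of square matrices (illustrative only).

    Each internal node caches the ordered product of its two children, so
    replacing one link and asking for the product over a range of links
    both touch O(log N) nodes.
    """

    def __init__(self, mats: List[Matrix]) -> None:
        self.dim = len(mats[0])
        self.size = 1
        while self.size < len(mats):
            self.size *= 2
        # Leaves live at indices size .. 2*size-1; unused leaves hold the identity.
        self.tree = [identity(self.dim) for _ in range(2 * self.size)]
        for i, m in enumerate(mats):
            self.tree[self.size + i] = m
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = matmul(self.tree[2 * node], self.tree[2 * node + 1])

    def update(self, i: int, m: Matrix) -> None:
        """Replace link i, then refresh the cached products on the path to the root."""
        node = self.size + i
        self.tree[node] = m
        node //= 2
        while node >= 1:
            self.tree[node] = matmul(self.tree[2 * node], self.tree[2 * node + 1])
            node //= 2

    def query(self, lo: int, hi: int) -> Matrix:
        """Ordered product of links lo, lo+1, ..., hi-1."""
        left, right = identity(self.dim), identity(self.dim)
        l, r = lo + self.size, hi + self.size
        while l < r:
            if l & 1:  # l is a right child: fold it into the left accumulator
                left = matmul(left, self.tree[l])
                l += 1
            if r & 1:  # r-1 is a left child: fold it into the right accumulator
                r -= 1
                right = matmul(self.tree[r], right)
            l //= 2
            r //= 2
        return matmul(left, right)
```

In this analogy, `update` plays the role of absorbing new information about one link of the chain and `query` the role of asking how one end of a segment depends on the other. Order matters because matrix products do not commute, which is why the sketch keeps separate left and right accumulators.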
Copyright © 2008-2017 Open Access Library. All rights reserved.