
Search Results: 1 - 10 of 8395 matches for "Scott Mardis"
All listed articles are free for downloading (OA Articles)
Nephele: genotyping via complete composition vectors and MapReduce
Marc E Colosimo, Matthew W Peterson, Scott Mardis, Lynette Hirschman
Source Code for Biology and Medicine , 2011, DOI: 10.1186/1751-0473-6-13
Abstract: Nephele is a suite of tools that uses the complete composition vector algorithm to represent each sequence in the dataset as a vector derived from its constituent k-mers, bypassing the need for multiple sequence alignment, and affinity propagation clustering to group the sequences into genotypes based on a distance measure over the vectors. Our methods produce results that correlate well with expert-defined clades or genotypes, at a fraction of the computational cost of traditional phylogenetic methods run on traditional hardware. Nephele can use the open-source Hadoop implementation of MapReduce to parallelize execution across multiple compute nodes. We were able to generate a neighbour-joined tree of over 10,000 16S samples in less than 2 hours. We conclude that using Nephele can substantially decrease the processing time required for generating genotype trees of tens to hundreds of organisms at genome-scale sequence coverage. In the post-genomic era, as sequencing becomes ever cheaper and more routine, biological sequence analysis has provided many useful tools for the study and combat of infectious disease. These tools, which can include both experimental and computational methods, are important for molecular epidemiological studies [1-3], vaccine development [4-6], and microbial forensics [7-9]. One such method is genotyping, the grouping of samples based on their genetic sequence. This can be done experimentally [10-12] or computationally, either by identifying genetic signatures (nucleotide substrings found only in a single group of sequences) [13] or on the basis of genetic distance among the sequences [14-16]. These methods allow a researcher to split a group of sequences into distinct partitions for further analysis. In a forensics context, genotyping a sequence can yield clues about where the sequence comes from.
In surveillance, genotyping can be used to examine the evolutionary footprint of a pathogen, for example, to identify areas where certain v
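The composition-vector and distance steps summarized in the abstract above can be sketched in miniature. This Python sketch is illustrative only, not Nephele's implementation: real complete composition vectors also subtract a Markov background model, and Nephele applies affinity propagation clustering rather than the plain Euclidean comparison shown here.

```python
from collections import Counter
from itertools import product
import math

def kmer_vector(seq, k=3):
    """Frequency vector over all 4**k possible k-mers (background-model
    subtraction used by true complete composition vectors is omitted)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(mer, 0) / total for mer in alphabet]

def distance(u, v):
    """Euclidean distance between two composition vectors; any such
    pairwise distance matrix could feed a clustering step."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u = kmer_vector("ACGTACGTACGT")
v = kmer_vector("ACGTACGAACGT")
print(distance(u, v))
```

Because the vectors have fixed length regardless of sequence length, no multiple sequence alignment is needed before comparing them, which is the property the abstract highlights.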
Improved oligonucleotides for microarrays
Elaine Mardis
Genome Biology , 2000, DOI: 10.1186/gb-2000-1-1-reports032
Abstract: Under the conditions tested, the authors report that the best yield of inverted oligonucleotides represented about 25% of the maximum amount of material that could have been synthesized. Analysis of inversion products following in situ synthesis of mixed-length oligonucleotides, which mimic truncated synthesis products, indicated that the inversion process resulted in effective in situ oligonucleotide purification by eliminating truncated products from the solid support. The inverted products were further tested for function in minisequencing and pyrosequencing assays. Both assays indicated that DNA polymerase could add nucleotides to the 3'-OH end of an oligonucleotide resulting from inversion. There are extensive descriptions of the innovative chemical derivatization of the solid support and of the in situ synthesis protocols which enable the inversion of oligonucleotides. Because "synthesis on planar solid supports results in a limited quantity of product", the authors used 50-70 μm diameter polystyrene beads to obtain sufficient product to monitor the individual steps of the inversion procedure. The 5' functionality that allowed inversion was an added o-chlorophenyl phosphodiester moiety, which in the presence of a condensing agent forms a phosphodiester linkage between the 5' end of the oligonucleotide and the solid support. Analysis by capillary electrophoresis of reaction products cleaved from the solid support indicated that two major products were obtained after inversion - a mix of 5' inverted and non-inverted 19 base oligonucleotides (19mers).
The 5' inverted 19mers occupied a physically distinct peak on the chromatogram as a result of the attachment of several triethyleneglycol moieties, added to facilitate cleavage of the oligonucleotides from the solid support and to distinguish inverted oligonucleotides from truncated products. According to the authors, this method is suitable for the production of oligonucleotide arrays and is compatible with all ex
Cancer genomics identifies determinants of tumor biology
Elaine R Mardis
Genome Biology , 2010, DOI: 10.1186/gb-2010-11-5-211
Abstract: Whole-genome sequencing and analysis of tumor and matched normal genomes with next-generation sequencing platforms has begun to illuminate commonly mutated genes and transcript-level events that contribute to the underlying tumor biology. To elucidate the role of frequent somatic mutations, the mutant proteins have been biochemically characterized and the results interpreted in terms of the selective advantages these variants may confer on the tumor. Certain somatic alterations have demonstrable prognostic value for specific tumor types in which they commonly occur, although their downstream metabolic signatures may obviate genotyping to identify their mutational status. The metabolic signature is a direct result of the mutation's impact on a given protein/enzyme; therefore, rather than performing sequencing to detect whether a mutation is present, metabolic profiling may be more straightforward, cheaper, and have a lower error rate, for example. New insights into the relationship between a primary tumor and its fatal metastatic disease are also beginning to emerge from genomic comparisons, with the fine detail afforded by next-generation sequencing enabling these comparisons. The transcriptomes of cancer cells also have their own unique somatic complexities, which often result from structural perturbations to the genome, but can be due to transcription-level events such as alternative splicing, RNA editing or transcript fusion. These types of alterations may explain certain aspects of tumor biology and may also be corroborated by cytogenetic phenomena.
In this review, I will describe some tumor-specific alterations that were discovered as a result of analyses of unbiased genome or transcriptome sequencing data (unbiased sequencing does not select for portions of the genome or transcriptome in advance, and the entire genome or transcriptome is therefore surveyed) and then illustrate how these discoveries were pursued further to reveal insights into tumor biology that
A glimpse at tumor genome evolution
Elaine R Mardis
Genome Biology , 2011, DOI: 10.1186/gb-2011-12-s1-i10
Anticipating the $1,000 genome
Elaine R Mardis
Genome Biology , 2006, DOI: 10.1186/gb-2006-7-7-112
Abstract: In April 2003, 50 years after Watson and Crick first described the chemical structure of DNA [1], the DNA sequence that makes up the human genome was proclaimed "essentially complete" [2]. Following on from this, in October 2005, the HapMap consortium's project to identify the locations of one million common single-nucleotide polymorphisms (SNPs) in the context of this reference human genome sequence was completed [3]. Accomplishing these two genomic milestones required the development, testing and implementation of technology platforms that could produce data at previously unprecedented throughputs, as well as of the bioinformatics tools and computational capabilities to analyze the resulting data and to interpret it in meaningful ways. It is this critical interplay of technology and bioinformatics that will usher in the next era of genome sequencing technology, commonly referred to as 'the $1,000 genome' on the basis of its targeted price per genome in US dollars; today, we find ourselves poised at the brink of this era. In this paradigm, the cost of determining an individual genome sequence would fall to a price of around $1,000, placing it firmly in the realm of advanced clinical diagnostic tests. As a result, determining a person's genome sequence might ultimately become an important first step upon entering a health insurance network or a health care provider's practice, akin to determining their height, weight and blood type, for example. Given this paradigm, one might ask why a $1,000 genome is an important or necessary goal to achieve. Fundamentally, even with the significant achievements of the HapMap Project [3], we have little context for comprehending the breadth of human genomic diversity, encompassing all types of variation beyond common single-nucleotide variants. Capturing this range of diversity, at the current cost of around $10-20 million per genome sequence, places it firmly outside the bounds of fiscal reality.
Yet without this 'baseline',
The $1,000 genome, the $100,000 analysis?
Elaine R Mardis
Genome Medicine , 2010, DOI: 10.1186/gm205
Abstract: One source of difficulty in using resequencing approaches for diagnosis centers on the need to improve the quality and completeness of the human reference genome. In terms of quality, it is clear that the clone-based methods used to map, assign a minimal tiling path, and sequence the human reference genome did not yield a properly assembled or contiguous sequence equally across all loci. Lack of proper assembly is often due to collapsing of sequence within repetitive regions, such as segmental duplications, wherein genes can be found once the correct clones are identified and sequenced. At some loci, the current reference contains a single nucleotide polymorphism (SNP) that represents the minor allele rather than the major allele. In addition, some loci cannot be represented by a single tiling path and require multiple clone tiling paths to capture all of the sequence variations. All of these deficiencies and others not cited provide a less-than-optimal alignment target for next-generation sequencing data and can confound the analytical validity of variants necessary to properly interpret patient-derived data. Hence, although it is difficult work to perform, the ongoing efforts of the Genome Reference Consortium [1] to improve the overall completeness and correctness of the human reference genome should be enhanced. Along these lines, although projects such as the early SNP Consortium [2], the subsequent HapMap projects [3-5], and more recently the 1,000 Genomes Project [6] have identified millions of SNPs in multiple ethnic groups, there is much more diversity to the human genome than single base differences. In some ways, the broader scope of 'beyond SNP' diversity of the genome across human populations remains mysterious, including common copy number polymorphisms, large insertions and deletions, and inversions.
Mining the 1,000 Genomes data using methods to identify genome-wide structural variation should augment this considerably [7], with validati
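The minor-allele deficiency described above (a reference base that is actually the rarer variant at its site) can be made concrete with a toy check against population allele frequencies. The site records and frequencies below are invented for illustration; a real pipeline would draw them from a VCF annotated with population data.

```python
# Hypothetical allele-frequency records: site -> {allele: population frequency}.
allele_freqs = {
    ("chr1", 12345): {"A": 0.08, "G": 0.92},
    ("chr1", 67890): {"C": 0.55, "T": 0.45},
}

# Hypothetical reference bases at the same sites.
reference_base = {("chr1", 12345): "A", ("chr1", 67890): "C"}

def reference_is_minor(site):
    """True when the reference carries the minor allele at this site,
    the situation the review flags as confounding variant interpretation."""
    freqs = allele_freqs[site]
    major = max(freqs, key=freqs.get)
    return reference_base[site] != major

for site in sorted(allele_freqs):
    print(site, "reference is minor allele:", reference_is_minor(site))
```

At the first site the reference base "A" has frequency 0.08, so every individual carrying the 92%-frequency "G" allele would appear to harbor a variant relative to the reference.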
New strategies and emerging technologies for massively parallel sequencing: applications in medical research
Elaine R Mardis
Genome Medicine , 2009, DOI: 10.1186/gm40
Abstract: The human genome lies at the core of research into human disease. New technologies for obtaining genome sequence data are being combined with novel bioinformatics analyses to characterize disease samples of many types, in the hope of enhancing our fundamental understanding of susceptibility and onset for inherited diseases, of the somatic changes that take place to initiate cancers and cause metastatic disease, and of the identity and allelic spectra of pathogenic and commensal microbes that infect humans. These sequencing-based discoveries will have a major impact on medical practice, including the development of diagnostic and prognostic assays, the identification of altered proteins to which targeted therapies may be developed, the ability to predict onset and severity of disease, and an improved capability to predict our range of responses to pathogenic agents. They will also create large datasets that effectively identify each patient by their sequence information, establishing the potential of linking a patient to a disease and heightening the need to safeguard the privacy of these data through legislation against genetic discrimination. Inherited complex diseases have proved the most pervasive yet recalcitrant examples of human disease to reveal their genomic secrets. From a standpoint of statistical significance, studying inherited disease at the genomic level requires large numbers (ideally thousands) of cases (affected) versus controls (unaffected) to uncover initial findings, as well as the replication of any primary discoveries in other case-control cohorts to solidify the association of a given genomic variant(s) with disease.
Although genome-wide association studies (GWAS) have been broadly applied across the spectrum of hypertension, diabetes, autism and other diseases, the identification of disease-associated genes by GWAS has so far identified mainly genes of low effect size or within regions of the genome that do not contain annotated genes, hence m
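For a single variant, the case-control design described above reduces to a contingency-table comparison of allele counts in affected versus unaffected groups. The counts below are invented, and a real GWAS would test millions of sites under a genome-wide significance threshold; this is only a minimal sketch of the test statistic.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]], e.g.
    risk-allele and other-allele counts in cases (row 1) vs. controls
    (row 2). No multiple-testing correction is applied here."""
    n = a + b + c + d
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 600 risk alleles among 1000 case chromosomes vs. 500 among 1000 controls.
stat = chi_square_2x2(600, 400, 500, 500)
print(round(stat, 2))  # larger values indicate stronger association
```

The need for thousands of samples follows directly from this statistic: small allele-frequency differences between cases and controls only produce large chi-square values when the cell counts are large.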
Evidence or Evidence Based Practice? An Analysis of IASL Research Forum Papers, 1998-2009
Marcia A. Mardis
Evidence Based Library and Information Practice , 2011,
Abstract: Objective - Conferences are essential opportunities for professional development and for learning about research. This study analyses papers presented in the Research Forum track of the International Association of School Librarianship (IASL) conferences to determine whether the amount of school library research reporting increased or decreased over time; who (i.e., what author roles and affiliations) has written about research; which countries were represented in the research articles; what topics were discussed in research articles; and what research methodologies were used. The aim was to determine the extent to which the Research Forum provides research evidence that relates to practice. Methods - This study continues the longitudinal analysis of published school library research begun by Clyde (1996) by analyzing Research Forum papers published in IASL conference proceedings from 1998-2009 and using the same approaches and metrics as previous studies by Clyde (e.g., 1996; 2002; 2004), Clyde and Oberg (2004), and Oberg (2006). Results - Conference paper topics, author origins, quantities, and research approaches remained static through the 11 years analyzed. The analysis reveals that the papers’ authors, methods, and topics reflected those found in previous studies of school library research. As well as replicating previous studies, the role of academic research at a practitioner-based conference was investigated. Conclusions - Based on long-established imperatives from leaders in the profession, the IASL conferences provide both evidence and evidence-based practice for school librarians from all over the world. However, when scholarly research is shared at practitioner venues, it is possible that school librarians may assume that research results constitute evidence-based practice (EBP), not evidence upon which practice should be based.
This distinction is important if considering that the purpose of academic research is to objectively inform, not to advocate a particular position or practice. The Research Forum can be a valuable venue for the presentation of empirical research findings and conclusions and objective program evaluations and provide a valuable complement to the evidence-based practice descriptions shared in the Professional Papers portion of the conference program. It is argued that the Research Forum must be clear in its purpose: to present the results of research; to present effective practice determined by rigorous evaluation; or to present research-supported arguments for the support of school libraries. Through a reconceptualizat
De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data
Scott DiGuistini, Nancy Y Liao, Darren Platt, Gordon Robertson, Michael Seidel, Simon K Chan, T Roderick Docking, Inanc Birol, Robert A Holt, Martin Hirst, Elaine Mardis, Marco A Marra, Richard C Hamelin, Jörg Bohlmann, Colette Breuil, Steven JM Jones
Genome Biology , 2009, DOI: 10.1186/gb-2009-10-9-r94
Abstract: The efficiency of de novo genome sequence assembly processes depends heavily on the length, fold-coverage and per-base accuracy of the sequence data. Despite substantial improvements in the quality, speed and cost of Sanger sequencing, generating a high quality draft de novo genome sequence for a eukaryotic genome remains expensive. New sequencing-by-synthesis systems from Roche (454), Illumina (Genome Analyzer) and ABI (SOLiD) offer greatly reduced per-base sequencing costs. While they are attractive for generating de novo sequence assemblies for eukaryotes, these technologies add several complicating factors: they generate short (typically 450 bp for 454; 50 to 100 bp for Illumina and SOLiD) reads that cannot resolve low complexity sequence regions or distributed repetitive elements; they have system-specific error models; and they can have higher base-calling error rates. To this point, then, de novo assemblies that use either 454 data alone, or that combine 454 with Sanger data in a 'hybrid' approach, have been reported only for prokaryote genomes, and no de novo assemblies that use Illumina reads, either alone or in combination with Sanger and 454 read data, have been reported for a eukaryotic genome. In principle, it should be possible to generate a de novo genome sequence for a eukaryotic genome by combining sequence information from different technologies. However, the new sequencing technologies are evolving rapidly, and no comprehensive bioinformatic system has been developed for optimizing such an approach. Such a system should flexibly integrate read data from different sequencing platforms while addressing sequencing depth, read quality and error models. Read quality and error models raise two challenges. First, while it is desirable to identify a subset of high quality reads prior to genome assembly, and established read quality scoring methods exist for Sanger sequence data, there are no rigorous equivalents for 454 or Illumina reads [1].
Second, error
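One pragmatic form of the pre-assembly quality selection mentioned above is filtering reads on their Phred scores. The mean-score criterion, the threshold, and the Phred+33 quality encoding below are illustrative assumptions, not the scoring approach the authors describe.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a read from its FASTQ quality string
    (Phred+33 ASCII encoding assumed)."""
    scores = [ord(c) - offset for c in quality_string]
    return sum(scores) / len(scores)

def high_quality(reads, min_mean=20):
    """Keep reads whose mean Phred score meets the threshold.
    `reads` is a list of (sequence, quality_string) pairs."""
    return [(seq, qual) for seq, qual in reads if mean_phred(qual) >= min_mean]

reads = [
    ("ACGTACGT", "IIIIIIII"),  # Phred 40 throughout: kept
    ("ACGTACGT", '""""""""'),  # Phred 1 throughout: discarded
]
print(len(high_quality(reads)))
```

A mean-score filter like this is deliberately crude; platform-specific error models (homopolymer errors on 454, cycle-dependent errors on Illumina) are exactly what such a simple scheme fails to capture, which is the gap the abstract points out.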
Design and implementation of a generalized laboratory data model
Michael C Wendl, Scott Smith, Craig S Pohl, David J Dooling, Asif T Chinwalla, Kevin Crouse, Todd Hepler, Shin Leong, Lynn Carmichael, Mike Nhan, Benjamin J Oberkfell, Elaine R Mardis, LaDeana W Hillier, Richard K Wilson
BMC Bioinformatics , 2007, DOI: 10.1186/1471-2105-8-362
Abstract: We describe a general modeling framework for laboratory data and its implementation as an information management system. The model utilizes several abstraction techniques, focusing especially on the concepts of inheritance and meta-data. Traditional approaches commingle event-oriented data with regular entity data in ad hoc ways. Instead, we define distinct regular entity and event schemas, but fully integrate these via a standardized interface. The design allows straightforward definition of a "processing pipeline" as a sequence of events, obviating the need for separate workflow management systems. A layer above the event-oriented schema integrates events into a workflow by defining "processing directives", which act as automated project managers of items in the system. Directives can be added or modified in an almost trivial fashion, i.e., without the need for schema modification or re-certification of applications. Association between regular entities and events is managed via simple "many-to-many" relationships. We describe the programming interface, as well as techniques for handling input/output, process control, and state transitions. The implementation described here has served as the Washington University Genome Sequencing Center's primary information system for several years. It handles all transactions underlying a throughput rate of about 9 million sequencing reactions of various kinds per month and has handily weathered a number of major pipeline reconfigurations. The basic data model can be readily adapted to other high-volume processing environments. Over the past several decades, many of the biomedical sciences have been transformed into what might be called "high-throughput" areas of study, e.g., DNA mapping and sequencing, gene expression, and proteomics. In a number of cases, the rate at which data can now be generated has increased by several orders of magnitude.
This scale-up has contributed to the rise of "big biology" projects of the type that
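The entity/event split with a many-to-many association described above can be sketched as a toy relational schema. The table and column names here are hypothetical, and the production system's schema is far richer; the "processing directive" layer is only hinted at in a comment.

```python
import sqlite3

# Toy version of the described design: regular entities (e.g. DNA samples)
# in one table, events (processing steps) in another, joined through a
# many-to-many bridge table. All names are illustrative.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, kind TEXT, name TEXT);
    CREATE TABLE event  (id INTEGER PRIMARY KEY, step TEXT, status TEXT);
    CREATE TABLE entity_event (entity_id INTEGER REFERENCES entity(id),
                               event_id  INTEGER REFERENCES event(id));
""")
db.execute("INSERT INTO entity VALUES (1, 'sample', 'LIB-001')")
db.execute("INSERT INTO event VALUES (10, 'sequencing_reaction', 'done')")
db.execute("INSERT INTO entity_event VALUES (1, 10)")

# A "processing directive" would watch this join to decide each item's
# next step, turning the event log into a workflow without a separate
# workflow management system.
rows = db.execute("""
    SELECT entity.name, event.step, event.status
    FROM entity JOIN entity_event ON entity.id = entity_event.entity_id
                JOIN event ON event.id = entity_event.event_id
""").fetchall()
print(rows)
```

Keeping the pipeline definition in data (directives over events) rather than in the schema is what lets steps be added or reordered without schema migration, which is the property the abstract emphasizes.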

Copyright © 2008-2017 Open Access Library. All rights reserved.