Search Results: 1 - 10 of 11413 matches for "Rafael Irizarry"
All listed articles are free for downloading (OA Articles)
A statistical framework for the analysis of microarray probe-level data
Zhijin Wu, Rafael A. Irizarry
Statistics, 2007, DOI: 10.1214/07-AOAS116
Abstract: In microarray technology, a number of critical steps are required to convert the raw measurements into the data relied upon by biologists and clinicians. These data manipulations, referred to as preprocessing, influence the quality of the ultimate measurements and studies that rely upon them. Standard operating procedure for microarray researchers is to use preprocessed data as the starting point for the statistical analyses that produce reported results. This has prevented many researchers from carefully considering their choice of preprocessing methodology. Furthermore, the fact that the preprocessing step affects the stochastic properties of the final statistical summaries is often ignored. In this paper we propose a statistical framework that permits the integration of preprocessing into the standard statistical analysis flow of microarray data. This general framework is relevant in many microarray platforms and motivates targeted analysis methods for specific applications. We demonstrate its usefulness by applying the idea in three different applications of the technology.
Thawing Frozen Robust Multi-array Analysis (fRMA)
Matthew N McCall, Rafael A Irizarry
BMC Bioinformatics, 2011, DOI: 10.1186/1471-2105-12-369
Abstract: We present an R package, frmaTools, that allows the user to quickly create his or her own frozen parameter vectors. We describe how this package fits into a preprocessing workflow and explore the size of the training dataset needed to generate reliable frozen parameter estimates. This is followed by a discussion of specific situations in which one might wish to create one's own fRMA implementation. For a few specific scenarios, we demonstrate that fRMA performs well even when a large database of arrays is unavailable. By allowing the user to easily create his or her own fRMA implementation, the frmaTools package greatly increases the applicability of the fRMA algorithm. The frmaTools package is freely available as part of the Bioconductor project. In microarray data analysis, the process of converting probe-level fluorescent intensities from a scanner to gene-level expression estimates is commonly referred to as preprocessing. The vast majority of preprocessing algorithms require multiple arrays to be analyzed simultaneously, and in general such multi-array preprocessing algorithms outperform single-array algorithms [1]. Therefore, it is not surprising that four of the most widely used preprocessing algorithms - RMA [2], gcRMA [3], MBEI [4], and PLIER [5] - are multi-array. However, multi-array preprocessing algorithms restrict scientific inquiry because it is necessary to analyze all arrays simultaneously. Because data preprocessed separately cannot be combined without introducing artifacts [6-10], the total number of arrays one can compare is limited by computer memory, restricting large meta-analyses; furthermore, datasets that grow incrementally need to be preprocessed each time an array is added. Lastly, for microarrays to aid in clinical diagnosis and treatment, one needs to obtain information based on a single sample hybridized to a single microarray. Recent work by McCall et al. (2010) provided a method of single-array preprocessing, Frozen Robust Multiarray Ana
BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions
Kasper D Hansen, Benjamin Langmead, Rafael A Irizarry
Genome Biology, 2012, DOI: 10.1186/gb-2012-13-10-r83
Abstract: DNA methylation is an important epigenetic modification involved in gene silencing, tissue differentiation, and cancer [1]. High-resolution, genome-wide measurement of DNA methylation is now possible using whole-genome bisulfite sequencing (WGBS), a process whereby input DNA is treated with sodium bisulfite and sequenced. While WGBS is comprehensive, it is also quite costly [2]. For instance, an application of WGBS by Lister et al. [3] compared DNA methylation profiles of an embryonic stem cell line and a fibroblast cell line. Both were sequenced to about 30× coverage (25× coverage of all CpGs), requiring 376 total lanes of bisulfite sequencing on the Illumina GA II instrument. While conventional wisdom is that 30× coverage or deeper is needed to achieve accurate results, advanced statistical techniques proposed here, such as local likelihood smoothing, can reduce this requirement to as little as 4×. It has also been shown that different genomic regions exhibit different levels of DNA methylation variation among individuals [4]. As a consequence, regions that are inherently variable can easily be confused with regions that differ consistently between groups when few replicates are available [1] (Figure 1). But performing WGBS on the number of biological replicates required to overcome such issues can be quite expensive. The techniques proposed here address this issue both by making full use of replicate information during analysis, and by potentially reducing the coverage needed for (and therefore the cost of) replication. Analysis of WGBS data starts with alignment of bisulfite converted reads. After alignment, statistical methods are employed to identify differentially methylated regions (DMRs) between two or more conditions. Extensive work has been dedicated to alignment [5-10] but methods for post-alignment analysis are limited. Published work based on WGBS has relied on a modular approach that first identifies differentially methylated CpGs that are then grouped
Feature-level exploration of a published Affymetrix GeneChip control dataset
Rafael A Irizarry, Leslie M Cope, Zhijin Wu
Genome Biology, 2006, DOI: 10.1186/gb-2006-7-8-404
Abstract: In a recent Genome Biology article, Choe et al. [1] describe a spike-in experiment that they use to compare expression measures for Affymetrix GeneChip technology. In this work, two sets of triplicates were created to represent control (C) and experimental (S) samples. We describe here some properties of the Choe et al. [1] control dataset one should consider before using it to assess GeneChip expression measures. In [2] and [3] we describe a benchmark for such measures based on experiments developed by Affymetrix and GeneLogic. These datasets are described in detail in [2]. A web-based implementation of the benchmark is available at [4]. The experiment described in [1] is a worthy contribution to the field as it permits assessments with data that is likely to better emulate the nonspecific binding (NSB) and cross-hybridization seen in typical experiments. However, there are various inconsistencies between the conclusions reached by [1] and [3] that we do not believe are due to NSB and cross-hybridization effects. In this Correspondence we describe certain characteristics of the feature-level data produced by [1] which we believe explain these inconsistencies. These can be divided into characteristics induced by the experimental design and an artifact. There are three characteristics of the experimental design described by [1] that one should consider before using it for assessments like those carried out by Affycomp. We enumerate them below and explain how they may lead to unfair assessments. Other considerations are described by Dabney and Storey [5]. First, the spike-in concentrations are unrealistically high. In [3] we demonstrate that background noise makes it harder to detect differential expression for genes that are present at low concentrations. We point out that in the Affymetrix spike-in experiments [2,3] the concentrations for spiked-in features result in artificially high intensities but that a large range of the nominal concentrations are actually in
Overcoming bias and systematic errors in next generation sequencing data
Margaret A Taub, Hector Corrada Bravo, Rafael A Irizarry
Genome Medicine, 2010, DOI: 10.1186/gm208
Abstract: While microarrays were rapidly accepted in research applications, incorporating them in clinical settings has required over a decade of benchmarking, standardization and the development of appropriate analysis methods. Extensive cross-platform and cross-laboratory analyses demonstrated the importance of low-level processing choices [1-3], including data summarization, normalization, and adjustment for laboratory or 'batch' effects [4], on outcome accuracy. Some of this work was done under the auspices of the Food and Drug Administration (FDA), most notably the Microarray Quality Control (MAQC) studies, which were developed specifically in order to determine the utility of microarray technologies in a clinical setting [5,6]. Microarray-measured gene expression signatures now form the basis of several FDA-approved clinical diagnostic tests, including MammaPrint, and Pathwork's Tissue of Origin test [7,8]. With high-throughput sequencing still in its infancy, many questions remain to be addressed before any hope of achieving approval for clinical applications is warranted. Although a study on the scale of the MAQC analyses for microarrays has yet to be carried out for sequencing (although one is in the works), there is already evidence that similar technical biases are present in sequencing data, and these will need to be understood and adjusted for to enable use of these new technologies in a clinical setting. In this commentary, we present some of these known biases and discuss the current state of solutions aimed at addressing them. Looking ahead to the application of this new technology in the clinical setting, we see both hurdles and promise. Biases arise when an observed measurement does not reflect the quantity to be measured due to a systematic distorting effect. For a concrete example from microarrays, non-specific hybridization at microarray probes produces an observed intensity that is not an unbiased measure of the presence of the target sequence in the popul
Validation and extension of an empirical Bayes method for SNP calling on Affymetrix microarrays
Shin Lin, Benilton Carvalho, David J Cutler, Dan E Arking, Aravinda Chakravarti, Rafael A Irizarry
Genome Biology, 2008, DOI: 10.1186/gb-2008-9-4-r63
Abstract: Genome-wide association studies hold great promise in discovering genes underlying complex, heritable disorders for which less powerful study designs have failed in the past [1-3]. Much effort spanning academia and industry and across multiple disciplines has already been invested in making this type of study a reality, with the most recent and largest effort being the Human HapMap Project [4]. Single nucleotide polymorphism (SNP) microarrays represent a key technology allowing for the high throughput genotyping necessary to assess genome-wide variation and conduct association studies [5-9]. Over the years, Affymetrix has introduced SNP microarrays of ever increasing density. The GeneChip® Human Mapping 100K and 500K arrays are beginning to be widely used in association studies, and the 6.0 array with >900,000 SNPs has recently been introduced. At these genotype densities, association studies are theoretically well-powered to detect variants of small phenotypic effect in samples involving hundreds to thousands of subjects [10], and indeed, a number of such successes have recently been reported [11-16]. Practically though, the use of SNP microarrays in association studies has not been entirely straightforward. Genotyping errors, even at a low rate, are known to produce large numbers of putative disease loci, which upon further investigation are found to be false positives. Work by Mitchell and colleagues [17] suggests a per single SNP rate of 0.5% as a maximal threshold for error, particularly for family-based tests. Arriving short of a dataset with such a low rate of error is not so much a failure of the microarray platform per se but rather the inadequacy of current SNP calling programs to extract the greatest information from the raw data and, more importantly, to quantify SNP quality, so that unreliable SNPs may be eliminated from further analysis. In general, genotyping algorithms make a call (AA, AB, or BB) for a SNP of each sample assuming diploids. Typically, a
Assessing affymetrix GeneChip microarray quality
Matthew N McCall, Peter N Murakami, Margus Lukk, Wolfgang Huber, Rafael A Irizarry
BMC Bioinformatics, 2011, DOI: 10.1186/1471-2105-12-137
Abstract: We begin by providing a precise definition of microarray quality and reviewing existing Affymetrix GeneChip quality metrics in light of this definition. We show that the best-performing metrics require multiple arrays to be assessed simultaneously. While such multi-array quality metrics are adequate for bench science, as microarrays begin to be used in clinical settings, single-array quality metrics will be indispensable. To this end, we define a single-array version of one of the best multi-array quality metrics and show that this metric performs as well as the best multi-array metrics. We then use this new quality metric to assess the quality of microarray data available via the Gene Expression Omnibus (GEO) using more than 22,000 Affymetrix HGU133a and HGU133plus2 arrays from 809 studies. We find that approximately 10 percent of these publicly available arrays are of poor quality. Moreover, the quality of microarray measurements varies greatly from hybridization to hybridization, study to study, and lab to lab, with some experiments producing unusable data. Many of the concepts described here are applicable to other high-throughput technologies. Microarray technology has become a widely used tool in the biological sciences. Over the past decade, the number of users has grown exponentially, and with the number of applications and secondary data analyses rapidly increasing, we expect this rate to continue. Various initiatives such as the External RNA Control Consortium (ERCC) [1] and the MicroArray Quality Control (MAQC) projects [2,3] have explored ways to provide standards for the technology. For microarrays to become generally accepted as a reliable technology, statistical methods for assessing quality will be an indispensable component; however, there remains a lack of consensus in both defining and measuring microarray quality. Defining quality in the context of a microarray experiment is not an easy task. The American Society for Quality (ASQ) defines quality as
A summarization approach for Affymetrix GeneChip data using a reference training set from a large, biologically diverse database
Simon Katz, Rafael A Irizarry, Xue Lin, Mark Tripputi, Mark W Porter
BMC Bioinformatics, 2006, DOI: 10.1186/1471-2105-7-464
Abstract: We assess performance based on multiple sets of samples processed over HG U133A Affymetrix GeneChip® arrays. We show that the refRMA workflow, when used in conjunction with a large, biologically diverse training set, results in the same general characteristics as that of RMA in its classic form when comparing overall data structure, sample-to-sample correlation, and variation. Further, we demonstrate that the refRMA workflow and reference set can be robustly applied to naïve organ types and to benchmark data where its performance indicates respectable results. Our results indicate that a biologically diverse reference database can be used to train a model for estimating probe set intensities of exclusive test sets, while retaining the overall characteristics of the base algorithm. Although the results we present are specific for RMA, similar versions of other multi-array normalization and summarization schemes can be developed. Pre-processing of Affymetrix GeneChip® feature-level data has been a widely researched topic over the past few years. Many of the commonly used algorithms utilize models where parameters are estimated using data from multiple arrays. These approaches are typically used in the normalization and summarization steps. Examples of multi-array procedures are RMA, gcRMA, MBEI, and, most recently, PLIER [1-4]. Each of the algorithms has been extensively compared to one another based on a variety of dilution and spike-in series of data sets [5-8]. From these studies, measures of precision and accuracy have been utilized to determine advantages and disadvantages for each of these methods. In general, multi-array based methods outperform those that derive expression measures using data from just the array in question. A problem associated with these algorithms that has not received much attention is the limitation they impose on data archiving. When data from a new study becomes available, all arrays are pre-processed together to obtain expression measure
The partitioned LASSO-patternsearch algorithm with application to gene expression data
Weiliang Shi, Grace Wahba, Rafael A Irizarry, Hector Corrada Bravo, Stephen J Wright
BMC Bioinformatics, 2012, DOI: 10.1186/1471-2105-13-98
Abstract: The Partitioned LASSO-Patternsearch algorithm is proposed to identify patterns of multiple dichotomous risk factors for outcomes of interest in genomic studies. A partitioning scheme is used to identify promising patterns by solving many LASSO-Patternsearch subproblems in parallel. All variables that survive this stage proceed to an aggregation stage where the most significant patterns are identified by solving a reduced LASSO-Patternsearch problem in just these variables. This approach was applied to genetic data sets with expression levels dichotomized by gene expression bar code. Most of the genes and second-order interactions thus selected are known to be related to the outcomes. We demonstrate with simulations and data analyses that the proposed method not only selects variables and patterns more accurately, but also provides smaller models with better prediction accuracy, in comparison to several alternative methodologies.
Processing of Agilent microRNA array data
Pedro López-Romero, Manuel A González, Sergio Callejas, Ana Dopazo, Rafael A Irizarry
BMC Research Notes, 2010, DOI: 10.1186/1756-0500-3-18
Abstract: We have adapted the RMA method to obtain a processed signal for the Agilent arrays and have compared the RMA summarized signal to the TGS generated with the image analysis software provided by the vendor. We also compared the use of the RMA algorithm with uncorrected and background-corrected signals, and compared quantile normalization with the normalization method recommended by the vendor. The pre-processing methods were compared in terms of their ability to reduce the variability (increase precision) of the signals between biological replicates. Application of the RMA method to non-background corrected signals produced more precise signals than either the RMA-background-corrected signal or the quantile-normalized Agilent TGS. The Agilent TGS normalized to the 75th percentile showed more variation than the other measures. Used without background correction, a summarized signal that takes into account the probe effect might provide a more precise estimate of microRNA expression. The variability of quantile normalization was lower compared with the normalization method recommended by the vendor. MicroRNAs are a family of small single-stranded non-coding RNAs which regulate gene expression [1]. Functional studies show that microRNAs participate in virtually every cellular process investigated, and changes in their expression might underlie many human pathologies [2]. The main research tool for identifying microRNAs involved in specific cellular processes is gene expression profiling using microarray technology. The microRNA Agilent microarrays [3] use different oligonucleotide probes for each individual microRNA that are replicated a number of times across the array surface. The Agilent Feature Extraction image analysis software (AFE) computes a summary measure for each microRNA, referred to as total gene signal (TGS), based on the robust average of all the background subtracted signals for each replicated probe. To make statistical inferences, Agilent recommends using
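The abstract above compares quantile normalization with normalization to the 75th percentile. As a rough illustration of the difference between the two ideas, here is a minimal Python sketch. It is not the authors' code or the vendor's software: the function names `quantile_normalize` and `scale_to_75th_percentile` are hypothetical, each array is assumed to be a plain list of positive signal values of equal length, and the 75th percentile is computed with the standard library's default interpolation, which may differ from what any vendor tool does.

```python
from statistics import mean, quantiles


def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    `arrays` is a list of equal-length lists, one per array (assumed layout).
    Ties are broken by position, a simplification of common implementations.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of values")
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(a[r] for a in sorted_arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        # order[r] is the index of the r-th smallest value in this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def scale_to_75th_percentile(arrays):
    """Divide each array by its own 75th percentile (simple scaling).

    Assumes the 75th percentile of each array is strictly positive.
    """
    scaled = []
    for a in arrays:
        p75 = quantiles(a, n=4)[2]  # third quartile, default method
        scaled.append([x / p75 for x in a])
    return scaled


if __name__ == "__main__":
    toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
    print(quantile_normalize(toy))
    print(scale_to_75th_percentile(toy))
```

Quantile normalization forces the whole distribution of every array to match, while percentile scaling aligns only a single point of each distribution. That difference is one plausible reason the two approaches can yield different between-replicate variability.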