Search Results: 1 - 10 of 123705 matches for " Jeffrey T Leek "
Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis
Jeffrey T Leek,John D Storey
PLOS Genetics , 2007, DOI: 10.1371/journal.pgen.0030161
Abstract: It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon occurs even in well-designed, randomized studies. We introduce “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
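The core SVA idea — estimate unmodeled heterogeneity from the residuals left after fitting the primary variable — can be sketched in a few lines. This is a simplified illustration, not the published algorithm (which iterates and weights genes); all data below are simulated:

```python
import numpy as np

def surrogate_variables(expr, primary, n_sv=1):
    """Estimate surrogate variables from residuals of a primary model.

    expr:    genes x samples expression matrix
    primary: length-n vector coding the modeled variable of interest
    n_sv:    number of surrogate variables to return

    Simplified sketch: fit the primary variable gene by gene, then take
    the top right-singular vectors of the residual matrix as candidate
    surrogate variables capturing unmodeled structure.
    """
    X = np.column_stack([np.ones_like(primary, dtype=float), primary])
    beta, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
    resid = expr - (X @ beta).T
    _, _, vt = np.linalg.svd(resid, full_matrices=False)
    return vt[:n_sv].T  # samples x n_sv

# Toy example: 100 genes, 10 samples, a hidden batch affecting half the genes.
rng = np.random.default_rng(0)
primary = np.repeat([0.0, 1.0], 5)
batch = np.tile([0.0, 1.0], 5)
expr = rng.normal(size=(100, 10))
expr[:50] += 2.0 * batch          # unmodeled heterogeneity
sv = surrogate_variables(expr, primary)
# The surrogate variable should track the hidden batch closely.
print(abs(np.corrcoef(sv[:, 0], batch)[0, 1]))
```

The recovered surrogate variable can then be included as a covariate in the downstream per-gene model, which is how SVA is used in practice.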
Reproducible Research Can Still Be Wrong: Adopting a Prevention Approach
Jeffrey T. Leek,Roger D. Peng
Computer Science , 2015, DOI: 10.1073/pnas.1421412111
Abstract: Reproducibility, the ability to recompute results, and replicability, the chance that other experimenters will achieve a consistent result, are two foundational characteristics of successful scientific research. Consistent findings from independent investigators are the primary means by which scientific evidence accumulates for or against a hypothesis. Yet there has lately been a crisis of confidence among researchers worried about the rate at which studies are either reproducible or replicable. To maintain the integrity of scientific research and the public's trust in science, the scientific community must ensure reproducibility and replicability by adopting a more preventative approach that greatly expands data analysis education and routinely employs software tools.
Empirical estimates suggest most published medical research is true
Leah R. Jager,Jeffrey T. Leek
Statistics , 2013,
Abstract: The accuracy of published medical research is critical for the scientists, physicians, and patients who rely on these results. But the fundamental reliability of the medical literature was called into serious question by a paper suggesting most published medical research is false. Here we adapt estimation methods from the genomics community to the problem of estimating the rate of false positives in the medical literature, using reported P-values as the data. We then collect P-values from the abstracts of all 77,430 papers published in The Lancet, The Journal of the American Medical Association, The New England Journal of Medicine, The British Medical Journal, and The American Journal of Epidemiology between 2000 and 2010. We estimate that the overall rate of false positives among reported results is 14% (s.d. 1%), contrary to previous claims. We also find no significant increase in the estimated rate of reported false positive results over time (0.5% more FP per year, P = 0.18) or with respect to journal submissions (0.1% more FP per 100 submissions, P = 0.48). Statistical analysis must allow for false positives in order to make claims on the basis of noisy data. But our analysis suggests that the medical literature remains a reliable record of scientific progress.
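The estimators adapted from genomics treat reported P-values as draws from a mixture of uniform nulls and small "signal" values. A minimal sketch of that genre of estimator — a Storey-style estimate of the null proportion, not the authors' exact truncated-P-value model:

```python
import numpy as np

def estimate_pi0(pvals, lam=0.5):
    """Storey-style estimate of the proportion of true null hypotheses.

    Under the null, P-values are uniform on [0, 1], so the density of
    P-values above `lam` estimates the null fraction. This illustrates
    the style of estimator the paper adapts; the published method models
    only the truncated, significant P-values and is more involved.
    """
    pvals = np.asarray(pvals, dtype=float)
    return min(1.0, np.mean(pvals > lam) / (1.0 - lam))

# Toy mixture: 80% uniform nulls, 20% small "signal" P-values.
rng = np.random.default_rng(1)
pvals = np.concatenate([rng.uniform(size=800), rng.beta(0.5, 20, size=200)])
print(round(estimate_pi0(pvals), 2))  # roughly 0.8 for this mixture
```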
Cooperation between Referees and Authors Increases Peer Review Accuracy
Jeffrey T. Leek, Margaret A. Taub, Fernando J. Pineda
PLOS ONE , 2011, DOI: 10.1371/journal.pone.0026895
Abstract: Peer review is fundamentally a cooperative process between scientists in a community who agree to review each other's work in an unbiased fashion. Peer review is the foundation for decisions concerning publication in journals, awarding of grants, and academic promotion. Here we perform a laboratory study of open and closed peer review based on an online game. We show that when reviewer behavior was made public under open review, reviewers were rewarded for refereeing and formed significantly more cooperative interactions (13% increase in cooperation, P = 0.018). We also show that referees and authors who participated in cooperative interactions had an 11% higher reviewing accuracy rate (P = 0.016). Our results suggest that increasing cooperation in the peer review process can lead to a decreased risk of reviewing errors.
Cloud-scale RNA-sequencing differential expression analysis with Myrna
Ben Langmead, Kasper D Hansen, Jeffrey T Leek
Genome Biology , 2010, DOI: 10.1186/gb-2010-11-8-r83
Abstract: As cost and throughput continue to improve, second-generation sequencing [1], in conjunction with RNA-Seq [2,3], is becoming an increasingly efficient and popular tool for studying gene expression. Currently, an RNA-Seq sequencing run generates hundreds of millions of reads derived from coding mRNA molecules in one or more biological samples. A typical RNA-Seq differential-expression analysis proceeds in three stages. First, reads are computationally categorized according to the transcribed feature from which each likely originated. Features of interest could be genes, exons, or isoforms. This categorization might be conducted comparatively with respect to a reference [4], by de novo assembly [5], or by a combination of both [6-8]. Second, a normalized count of the number of reads assigned to each feature is calculated. The count acts as a proxy for the feature's true abundance in the sample. Third, a statistical test is applied to identify which features exhibit differential abundance, or expression, between samples.

Since second-generation sequencing produces a very large number of reads distributed across the entire transcriptome, RNA-Seq affords greater resolution than expression arrays. Preliminary comparisons also suggest that RNA-Seq may measure RNA abundance in spike-in experiments more precisely than gene expression microarrays, provided appropriate normalization is applied [4,9].

But improvements in sequencing cost and throughput also pose a data analysis challenge. While sequencing throughput grows at a rate of about 5× per year [10-12], computer speeds are thought to double approximately every 18 to 24 months [13]. Recent studies and commentaries [13-17] propose cloud computing as a paradigm that counteracts this disparity by tapping into the economies of scale afforded by commercial and institutional computing centers.
If an algorithm can be made to run efficiently on many loosely coupled processors, implementing it as a clou
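The three-stage analysis described above can be sketched once a count table exists. This is an illustrative toy in Python with simulated counts, not the Myrna pipeline (which handles alignment and parallelization on raw reads):

```python
import numpy as np
from scipy import stats

# Stage 1 (read assignment to features) is assumed done; start from a
# hypothetical gene-by-sample count table: 4 control, 4 treated samples.
rng = np.random.default_rng(2)
counts = rng.poisson(50, size=(1000, 8))
counts[0, 4:] *= 4  # spike one differentially expressed gene
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Stage 2: normalize counts for library size (counts per million).
cpm = counts / counts.sum(axis=0) * 1e6
logcpm = np.log2(cpm + 1)

# Stage 3: per-gene test for differential expression between groups
# (a plain t-test here; real methods model count variance more carefully).
tstat, pval = stats.ttest_ind(logcpm[:, group == 0],
                              logcpm[:, group == 1], axis=1)
print(int(np.argmin(pval)))  # the spiked gene ranks first
```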
ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets
Alyssa C Frazee, Ben Langmead, Jeffrey T Leek
BMC Bioinformatics , 2011, DOI: 10.1186/1471-2105-12-449
Abstract: ReCount is an online resource of RNA-seq gene count tables and auxiliary data. Tables were built from raw RNA sequencing data from 18 different published studies comprising 475 samples and over 8 billion reads. Using the Myrna package, reads were aligned, overlapped with gene models, and tabulated into gene-by-sample count tables that are ready for statistical analysis. Count tables and phenotype data were combined into Bioconductor ExpressionSet objects for ease of analysis. ReCount also contains the Myrna manifest files and R source code used to process the samples, allowing statistical and computational scientists to consider alternative parameter values.

By combining datasets from many studies and providing data that has already been processed from .fastq format into ready-to-use .RData and .txt files, ReCount facilitates analysis and methods development for RNA-seq count data. We anticipate that ReCount will also be useful for investigators who wish to consider cross-study comparisons and alternative normalization strategies for RNA-seq.

RNA-seq, or short-read sequencing of mRNA, has emerged as a powerful and flexible tool for studying gene expression [1]. As with other new technologies, the analysis of RNA-seq data requires the development of new statistical methods. Data from many RNA-seq experiments are publicly available, but processing raw data into a form suitable for statistical analysis remains challenging [2]. This difficulty, together with the high cost of using second-generation sequencing technology, means that most computational scientists have only a limited number of samples to work with [3]. However, replication is critical to understanding biological variation in RNA-sequencing [4].

The Gene Expression Omnibus [5] is a useful repository that contains both processed and raw microarray data, but there is no comparable resource for processed RNA-seq data.
We have compiled a resource, called ReCount, consisting of aligned, preprocessed RNA-seq data from
A statistical approach to selecting and confirming validation targets in -omics experiments
Jeffrey T Leek, Margaret A Taub, Jason L Rasgon
BMC Bioinformatics , 2012, DOI: 10.1186/1471-2105-13-150
Abstract: Here we present a new statistical method for validating lists of significant results by confirming only a small random sample. We apply our method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result.

For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results.
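The random-subset idea can be illustrated with a simple binomial confidence interval: confirm a small random sample from the significance list and bound the list-wide validation rate. A sketch with simulated data (the paper's method is more complete than this normal approximation):

```python
import math
import random

def validate_by_sampling(results, validate, n_sample=20, z=1.96):
    """Estimate the fraction of a significance list that would validate.

    Draw a simple random sample from the list, run the (expensive)
    confirmation experiment on just those, and form a normal-approximation
    confidence interval for the list-wide validation rate.
    """
    sample = random.sample(results, n_sample)
    successes = sum(1 for r in sample if validate(r))
    p_hat = successes / n_sample
    half = z * math.sqrt(p_hat * (1 - p_hat) / n_sample)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)

# Toy list: 90% of 500 "significant" results are truly real; the
# validate callback stands in for a wet-lab confirmation experiment.
random.seed(3)
results = [True] * 450 + [False] * 50
rate, lo, hi = validate_by_sampling(results, validate=lambda r: r)
print(rate, lo, hi)
```

Validating 20 random results bounds the whole 500-item list, which is the source of the cost savings the abstract describes.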
Removing batch effects for prediction problems with frozen surrogate variable analysis
Hilary S. Parker,Héctor Corrada Bravo,Jeffrey T. Leek
PeerJ , 2015, DOI: 10.7717/peerj.561
Abstract: Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely-publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. Genomic technologies, however, are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.
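A conceptual sketch of the "frozen" idea: learn artifact-associated directions from a training set once, then remove them from each new sample individually. This is a simplified PCA-based stand-in for illustration, not the fSVA algorithm (which freezes surrogate variables from SVA), and all data are simulated:

```python
import numpy as np

def freeze_directions(train, n_dirs=1):
    """Learn fixed artifact directions from a training set.

    train: genes x samples matrix. Returns the gene-space mean and the
    top principal directions, 'frozen' for later single-sample use.
    """
    mean = train.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, u[:, :n_dirs]

def clean_sample(sample, mean, dirs):
    """Remove the frozen directions from one new sample at a time."""
    centered = sample - mean.ravel()
    return centered - dirs @ (dirs.T @ centered)

rng = np.random.default_rng(4)
train = rng.normal(size=(200, 30))
train[:100] += 3.0 * rng.integers(0, 2, size=30)  # batch structure
mean, dirs = freeze_directions(train)

new = rng.normal(size=200)
new[:100] += 3.0                                  # same artifact, new sample
cleaned = clean_sample(new, mean, dirs)
# The artifact on the first 100 genes should be largely removed.
print(float(cleaned[:100].mean()))
```

The point of freezing is that `clean_sample` needs no other new samples, matching the one-at-a-time clinical setting the abstract describes.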
Removing batch effects for prediction problems with frozen surrogate variable analysis
Hilary S. Parker,Héctor Corrada Bravo,Jeffrey T. Leek
Statistics , 2013,
Abstract: Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely-publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. Genomic technologies, however, are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.
A glass half full interpretation of the replicability of psychological science
Jeffrey T. Leek,Prasad Patil,Roger D. Peng
Statistics , 2015,
Abstract: A recent study of the replicability of key psychological findings is a major contribution toward understanding the human side of the scientific process. Despite the careful and nuanced analysis reported in the paper, mass and social media adhered to the simple narrative that only 36% of the studies replicated their original results. Here we show that 77% of the replication effect sizes reported were within a prediction interval based on the original effect size. In this light, the results of Reproducibility Project: Psychology can be viewed as a positive result for the scientific process.
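The 77% figure rests on checking whether each replication effect estimate lies in a prediction interval derived from the original study. The interval's half-width combines both studies' standard errors, since both estimates are noisy. A minimal sketch of that check with illustrative (made-up) numbers:

```python
import math

def in_prediction_interval(orig, se_orig, rep, se_rep, z=1.96):
    """Is a replication estimate consistent with the original?

    The 95% prediction interval for the replication accounts for
    sampling error in BOTH studies, so its half-width uses the
    combined standard error, not the original's alone.
    """
    half = z * math.sqrt(se_orig**2 + se_rep**2)
    return abs(rep - orig) <= half

# Illustrative numbers: a noticeably smaller replication effect can
# still be consistent with the original once both uncertainties count.
print(in_prediction_interval(orig=0.50, se_orig=0.15, rep=0.20, se_rep=0.12))
# → True
```

This is why a prediction-interval criterion can rate far more replications as consistent than a simple "did it reach significance again" criterion.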


Copyright © 2008-2017 Open Access Library. All rights reserved.