

Effects of dependence in high-dimensional multiple testing problems

DOI: 10.1186/1471-2105-9-114


Abstract:

We study the robustness against dependence of several FDR procedures that are popular in microarray studies, such as the Benjamini-Hochberg FDR procedure, Storey's q-value, SAM and resampling-based FDR procedures. False non-discovery rates and estimates of the number of null hypotheses are computed from those methods and compared. Our simulation study shows that methods such as SAM and the q-value do not adequately control the FDR at the claimed level under dependence. On the other hand, the adaptive Benjamini-Hochberg procedure seems to be the most robust while remaining conservative. Finally, the estimates of the number of true null hypotheses under various dependence conditions are variable.

We discuss a new method for efficient guided simulation of dependent data that satisfy imposed network constraints expressed as conditional independence structures. Our simulation set-up allows for a structural study of the effect of dependencies on multiple testing criteria and is useful for testing a potentially new method for π0 or FDR estimation in a dependency context.

Scientists now regularly face multiple testing of a large number of hypotheses. Typically in microarray data, one performs a hypothesis test for each gene, and the number of genes usually runs into the thousands. In this situation, applying single-hypothesis testing directly thousands of times produces a large number of false discoveries. Hence, alternative testing criteria for controlling false discoveries have been introduced.

It is widely recognized that dependencies are omnipresent in many high-throughput studies. Such dependencies may be regulatory or functional, as in gene pathways, but also spatial, as in SNP or DNA copy number arrays, because of the genomic order. Although attempts have been made to infer such interactions from data, it is a notoriously difficult problem. Solutions usually focus on a few modules with relatively few elements and many samples, in particular for model organisms (see e
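To make concrete the kind of FDR procedure the abstract above compares, here is a minimal Python sketch of the standard Benjamini-Hochberg step-up rule. It is an illustration only, not the authors' simulation code and not the adaptive variant they evaluate. The function name `benjamini_hochberg`, the `alpha` parameter and the example p-values are assumptions introduced here. The plain rule is known to control the FDR under independence or positive dependence of the test statistics, which is why its behaviour under more general dependence is worth studying.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg step-up rule (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank

    # Step-up: reject every hypothesis ranked at or below the cutoff,
    # even if some of them individually missed their own threshold.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, chosen only to exercise the function.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
print(example_rejected)
```

With these hypothetical inputs only the two smallest p-values fall below their rank-dependent thresholds, so only those two hypotheses are rejected. Testing each p-value against 0.05 separately would reject five.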
