Power and Stability Properties of Resampling-Based Multiple Testing Procedures with Applications to Gene Oncology Studies

DOI: 10.1155/2013/610297

Abstract:

Resampling-based multiple testing procedures are widely used in genomic studies to identify differentially expressed genes and to conduct genome-wide association studies. However, the power and stability properties of these popular procedures have not been extensively evaluated. Our study investigates, through simulations and gene oncology examples, the power and stability of seven resampling-based multiple testing procedures frequently used in high-throughput data analysis with small sample sizes. The bootstrap single-step minP procedure and the bootstrap step-down minP procedure perform best among all tested procedures when the sample size is as small as 3 in each group and either familywise error rate or false discovery rate control is desired. When the sample size increases to 12 and false discovery rate control is desired, the permutation maxT procedure and the permutation minP procedure perform best. Our results provide guidance for high-throughput data analysis when the sample size is small.

1. Introduction

With rapidly developing biotechnology, microarrays and next generation sequencing have been widely used in biomedical and biological fields for identifying differentially expressed genes, detecting transcription factor binding sites, and mapping complex traits using single nucleotide polymorphisms (SNPs) [1–7]. The multiple testing error rates that arise when thousands, or even millions, of hypotheses are tested need to be taken into account. The error rates commonly controlled in multiple hypothesis testing are the familywise error rate (FWER), which is the probability of at least one false rejection [8, 9], and the false discovery rate (FDR), which is the expected proportion of falsely rejected null hypotheses among all rejected hypotheses [10].
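To make the difference between FWER and FDR control concrete, the following is a minimal sketch, not one of the seven procedures evaluated in this study. It assumes a plain list of raw p-values and contrasts the Bonferroni adjustment (which controls the FWER) with the Benjamini-Hochberg step-up adjustment of [10] (which controls the FDR). The function names and the example p-values are illustrative assumptions.

```python
def bonferroni(pvalues):
    """FWER-controlling adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR-controlling step-up adjustment (Benjamini and Hochberg)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustrative only).
raw = [0.01, 0.04, 0.03, 0.005]
fwer_adjusted = bonferroni(raw)
fdr_adjusted = benjamini_hochberg(raw)
```

A hypothesis is rejected at level alpha when its adjusted p-value is at most alpha. The FDR-adjusted values are never larger than the Bonferroni-adjusted ones, which is why FDR control typically rejects more hypotheses.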
Resampling-based multiple testing procedures are widely used in high-throughput data analysis (e.g., microarray and next generation sequencing), especially when the sample size is small or the distribution of the test statistic is non-normal or unknown. Resampling-based multiple testing procedures can account for dependence structures among p-values or test statistics, resulting in lower type II errors. The commonly used resampling techniques include permutation tests and bootstrap methods. Permutation tests are nonparametric statistical significance tests, where the test statistics’ distribution under the null hypothesis is constructed by calculating either all possible values or a fixed number (usually 1000 or more) of test statistics from permuted observations under the null
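The idea of building a null distribution from permuted observations can be sketched for a single gene as follows. This is an illustrative example under stated assumptions, not the procedure used in this study: the test statistic is taken to be the absolute difference in group means, all relabelings of the pooled observations are enumerated, and the function name and data are hypothetical.

```python
from itertools import combinations


def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_total = len(pooled)
    n_a = len(group_a)

    def abs_mean_diff(idx_a):
        # Split the pooled data according to the indices assigned to group A.
        chosen = set(idx_a)
        a = [pooled[i] for i in range(n_total) if i in chosen]
        b = [pooled[i] for i in range(n_total) if i not in chosen]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(range(n_a))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling n_a observations as group A.
    for idx_a in combinations(range(n_total), n_a):
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(idx_a) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical expression values for one gene, 3 samples per group.
p_value = permutation_pvalue([1.0, 2.0, 3.0], [10.0, 11.0, 12.0])
```

With 3 samples per group there are only 20 distinct relabelings, so even perfectly separated groups cannot yield a two-sided permutation p-value below 2/20 = 0.1. For larger samples, full enumeration becomes expensive and a fixed number of random permutations is drawn instead.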

References

[1]  D. A. Kulesh, D. R. Clive, D. S. Zarlenga, and J. J. Greene, “Identification of interferon-modulated proliferation-related cDNA sequences,” Proceedings of the National Academy of Sciences of the United States of America, vol. 84, no. 23, pp. 8453–8457, 1987.
[2]  M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, “Quantitative monitoring of gene expression patterns with a complementary DNA microarray,” Science, vol. 270, no. 5235, pp. 467–470, 1995.
[3]  D. A. Lashkari, J. L. DeRisi, J. H. McCusker et al., “Yeast microarrays for genome wide parallel genetic and gene expression analysis,” Proceedings of the National Academy of Sciences of the United States of America, vol. 94, no. 24, pp. 13057–13062, 1997.
[4]  J. R. Pollack, C. M. Perou, A. A. Alizadeh et al., “Genome-wide analysis of DNA copy-number changes using cDNA microarrays,” Nature Genetics, vol. 23, no. 1, pp. 41–46, 1999.
[5]  M. J. Buck and J. D. Lieb, “ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments,” Genomics, vol. 83, no. 3, pp. 349–360, 2004.
[6]  R. Mei, P. C. Galipeau, C. Prass et al., “Genome-wide detection of allelic imbalance using human SNPs and high-density DNA arrays,” Genome Research, vol. 10, no. 8, pp. 1126–1137, 2000.
[7]  J. Y. Hehir-Kwa, M. Egmont-Petersen, I. M. Janssen, D. Smeets, A. G. van Kessel, and J. A. Veltman, “Genome-wide copy number profiling on high-density bacterial artificial chromosomes, single-nucleotide polymorphisms, and oligonucleotide microarrays: a platform comparison based on statistical power analysis,” DNA Research, vol. 14, no. 1, pp. 1–11, 2007.
[8]  Y. Hochberg and A. C. Tamhane, Multiple Comparison Procedures, John Wiley & Sons, New York, NY, USA, 1987.
[9]  J. P. Shaffer, “Multiple hypothesis testing: a review,” Annual Review of Psychology, vol. 46, pp. 561–584, 1995.
[10]  Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society B, vol. 57, no. 1, pp. 289–300, 1995.
[11]  B. Efron, “Bootstrap methods: another look at the jackknife,” The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.
[12]  B. Efron and R. Tibshirani, An Introduction to the Bootstrap, CRC Press, New York, NY, USA, 1994.
[13]  D. A. Freedman, “Bootstrapping regression models,” The Annals of Statistics, vol. 9, no. 6, pp. 1218–1228, 1981.
[14]  P. Hall, “On the bootstrap and confidence intervals,” The Annals of Statistics, vol. 14, no. 4, pp. 1431–1452, 1986.
[15]  K. S. Pollard and M. K. van der Laan, “Choice of a null distribution in resampling-based multiple testing,” Journal of Statistical Planning and Inference, vol. 125, no. 1-2, pp. 85–100, 2004.
[16]  P. H. Westfall and S. S. Young, Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment, John Wiley & Sons, New York, NY, USA, 1993.
[17]  V. G. Tusher, R. Tibshirani, and G. Chu, “Significance analysis of microarrays applied to the ionizing radiation response,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 9, pp. 5116–5121, 2001.
[18]  Y. Ge, S. Dudoit, and T. P. Speed, “Resampling-based multiple testing for microarray data analysis,” Test, vol. 12, no. 1, pp. 1–77, 2003.
[19]  D. Rubin, S. Dudoit, and M. van der Laan, “A method to increase the power of multiple testing procedures through sample splitting,” Statistical Applications in Genetics and Molecular Biology, vol. 5, no. 1, article 19, 2006.
[20]  A. Jemal, R. Siegel, E. Ward et al., “Cancer statistics, 2006,” CA: A Cancer Journal for Clinicians, vol. 56, no. 2, pp. 106–130, 2006.
[21]  C. S. Moreno, L. Matyunina, E. B. Dickerson et al., “Evidence that p53-mediated cell-cycle-arrest inhibits chemotherapeutic treatment of ovarian carcinomas,” PLoS ONE, vol. 2, no. 5, article e441, 2007.
[22]  R. Edgar, M. Domrachev, and A. E. Lash, “Gene Expression Omnibus: NCBI gene expression and hybridization array data repository,” Nucleic Acids Research, vol. 30, no. 1, pp. 207–210, 2002.
