


Exploiting Identifiability and Intergene Correlation for Improved Detection of Differential Expression

DOI: 10.1155/2013/404717


Abstract:

Accurate differential analysis of microarray data strongly depends on effective treatment of intergene correlation. Such dependence is ordinarily accounted for in terms of its effect on significance cutoffs. In this paper, it is shown that correlation can, in fact, be exploited to share information across tests and to reorder expression differentials for increased statistical power, regardless of the threshold. Significantly improved differential analysis results from two simple measures: (i) adjusting test statistics to exploit information from identifiable genes (the large subset of genes represented on a microarray that can be classified a priori as nondifferential with very high confidence), and (ii) doing so in a way that accounts for linear dependencies among identifiable and nonidentifiable genes. A method is developed that builds upon the widely used two-sample t-statistic approach and uses analysis in Hilbert space to decompose the nonidentified gene vector into two components: one correlated with the identified set and one uncorrelated with it. In an application to data derived from a widely studied prostate cancer database, the proposed method outperforms some of the most highly regarded approaches published to date. Algorithms in MATLAB and in R are available for public download.

1. Preamble

In certain ways, this paper represents a departure from current trends in scientific publishing. The World Wide Web has made available extraordinary resources in the form of databases for comparative analysis of methods in bioinformatics and numerous other disciplines. The benefits of using common sets of real data to compare and contrast new algorithms are obvious. In some fields of investigation, especially, perhaps, research in early stages of knowledge (e.g., genomics), there is an equally obvious drawback in using real data: the “correct answers are not known,” making it difficult to ultimately interpret differences in performance as anything but differences.
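The abstract names two ingredients: a two-sample t-statistic and a Hilbert-space decomposition of a nonidentified gene into a part correlated with the identified (a priori nondifferential) genes and a part uncorrelated with them. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' MATLAB/R algorithm. It assumes 0/1 group labels, the unequal-variance form of the two-sample t-statistic, and a set of identified genes much smaller than the number of arrays (otherwise the projection would absorb essentially all within-group variation). The function names `center_within_groups`, `two_sample_t`, and `adjusted_t` are hypothetical. Because the identified genes are assumed to have no group difference, subtracting their linear prediction should leave the expected group difference of the target gene unchanged while removing shared noise.

```python
import math
from statistics import fmean, variance


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center_within_groups(x, labels):
    """Subtract each group's mean so only within-group (noise) variation remains."""
    means = {g: fmean([xi for xi, li in zip(x, labels) if li == g]) for g in set(labels)}
    return [xi - means[li] for xi, li in zip(x, labels)]


def two_sample_t(x, labels):
    """Unequal-variance two-sample t-statistic, group 1 minus group 0 (labels are 0/1)."""
    a = [xi for xi, li in zip(x, labels) if li == 1]
    b = [xi for xi, li in zip(x, labels) if li == 0]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (fmean(a) - fmean(b)) / se


def adjusted_t(y, identified, labels, tol=1e-10):
    """Hypothetical sketch: remove from gene y the component linearly predictable
    from the identified genes, then compute the two-sample t-statistic."""
    # Modified Gram-Schmidt on the within-group-centered identified genes. Each
    # orthonormal vector q is stored with the same linear combination q_raw of the
    # raw (uncentered) genes, so that centering q_raw gives back q.
    basis = []
    for x in identified:
        w = center_within_groups(x, labels)
        w_raw = list(x)
        for q, q_raw in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
            w_raw = [wi - c * qi for wi, qi in zip(w_raw, q_raw)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip identified genes linearly dependent on earlier ones
            basis.append(([wi / norm for wi in w], [wi / norm for wi in w_raw]))
    # Project the centered target gene onto span(basis) (the correlated component)
    # and subtract the matching raw combination; what remains within groups is the
    # uncorrelated component. Degrees of freedom spent on the fit are ignored here.
    y_c = center_within_groups(y, labels)
    y_adj = list(y)
    for q, q_raw in basis:
        a = dot(y_c, q)
        y_adj = [yi - a * qi for yi, qi in zip(y_adj, q_raw)]
    return two_sample_t(y_adj, labels)
```

When most of a target gene's noise is shared with an identified gene, `adjusted_t` can be much larger in magnitude than `two_sample_t` applied to the raw values; with an empty identified set the two coincide.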
Lest the reader be preparing for an argument promoting classic simulation studies, we hasten to state at the outset that this argument is not forthcoming. Before the age of the internet, simulation studies using reasonably justified data models (Gaussian errors, etc.) were a time-honored standard in all areas of math, science, and engineering. The ready availability of rich data resources makes it irrational to advocate a return to “pure simulation” using models that are untested against these existing data sets. The authors of this paper in no way promote a return to such methods and appeal to

