DNA copy number aberrations (DCNA) and subsequent altered gene expression profiles may have a major impact on tumor initiation and development, and eventually on recurrence and cancer-specific mortality. However, most methods employed in integrative genomic analysis of the two biological levels, DNA and RNA, do not consider survival time. In the present note, we propose a survival analysis-based framework for the integrative analysis of DCNA and mRNA levels to reveal their implications for patient clinical outcome, under the premise that the effect of DCNA on survival is mediated by mRNA levels. The specific aim of the paper is to offer a feasible framework to test the DCNA-mRNA-survival pathway. We provide statistical inference algorithms for mediation based on asymptotic results. Furthermore, we illustrate the applicability of the method in an integrative genomic analysis setting using a breast cancer data set consisting of 141 invasive breast tumors. In addition, we provide an implementation in R.

1. Introduction

Analyzing the two biological levels, DNA and RNA, concomitantly and elucidating their implications for cancer development and cancer-related mortality is a key objective of studies within the cancer genetics field. Integrative analysis of DNA copy number aberrations (DCNA) and mRNA levels has received considerable interest, with studies employing a wide range of statistical methods [1–5]. Integrative genomic analyses aim to identify novel biomarkers that can distinguish between patients with favorable and unfavorable prognosis. However, a sole focus on DCNA-driven altered gene expression profiles falls short of this goal. To better understand the impact that DCNA-driven altered gene expression profiles have on tumor recurrence or cancer-specific mortality, we need to consider patient survival status and survival time, that is, survival analysis.
However, the unique relationship between DNA and RNA raises the need to interpret the data from a more refined viewpoint. Ascertaining causality between two biological factors is far from straightforward, yet no one would question that mRNA is transcribed from a DNA template. Thus, mRNA mediates the genetic information imprinted in DNA and possibly the effect that copy number aberrations have on survival status. The chosen statistical-mathematical framework has to address this issue properly. Mediation assumes that an independent variable (DNA) causes the mediator (mRNA), which in turn causes the outcome (survival status). Thus, the mediator accounts partially or totally
References
[1]
M. Schafer, H. Schwender, S. Merk, C. Haferlach, K. Ickstadt, and M. Dugas, “Integrated analysis of copy number alterations and gene expression: a bivariate assessment of equally directed abnormalities,” Bioinformatics, vol. 25, no. 24, pp. 3228–3235, 2009.
[2]
S. Nemes, T. Z. Parris, A. Danielsson et al., “Segmented regression, a versatile tool to analyze mRNA levels in relation to DNA copy number aberrations,” Genes Chromosomes and Cancer, vol. 51, no. 1, pp. 77–82, 2012.
[3]
Y. Xie and C. Ahn, “Statistical methods for integrating multiple types of high-throughput data,” Methods in Molecular Biology, vol. 620, pp. 511–529, 2010.
[4]
H. K. Solvang, O. C. Lingjærde, A. Frigessi, A. L. Børresen-Dale, and V. N. Kristensen, “Linear and non-linear dependencies between copy number aberrations and mRNA expression reveal distinct molecular pathways in breast cancer,” BMC Bioinformatics, vol. 12, article 197, 2011.
[5]
C. Soneson, H. Lilljebjörn, T. Fioretos, and M. Fontes, “Integrative analysis of gene expression and copy number alterations using canonical correlation analysis,” BMC Bioinformatics, vol. 11, article 191, 2010.
[6]
T. J. VanderWeele, “Causal mediation analysis with survival data,” Epidemiology, vol. 22, no. 4, pp. 582–585, 2011.
[7]
T. Lange and J. V. Hansen, “Direct and indirect effects in a survival context,” Epidemiology, vol. 22, no. 4, pp. 575–581, 2011.
[8]
E. J. Tchetgen Tchetgen, “On causal mediation analysis with a survival outcome,” International Journal of Biostatistics, vol. 7, no. 1, article 33, 2011.
[9]
L. Chin, W. C. Hahn, G. Getz, and M. Meyerson, “Making sense of cancer genomic data,” Genes and Development, vol. 25, no. 6, pp. 534–555, 2011.
[10]
G. Cortese, T. H. Scheike, and T. Martinussen, “Flexible survival regression modelling,” Statistical Methods in Medical Research, vol. 19, no. 1, pp. 5–28, 2010.
[11]
O. O. Aalen, “Further results on the non-parametric linear regression model in survival analysis,” Statistics in Medicine, vol. 12, no. 17, pp. 1569–1588, 1993.
[12]
Z. A. Lomnicki, “On the distribution of products of random variables,” Journal of the Royal Statistical Society B, vol. 29, no. 3, pp. 513–524, 1967.
[13]
M. D. Springer and W. E. Thompson, “The distribution of products of independent random variables,” SIAM Journal on Applied Mathematics, vol. 14, no. 3, pp. 511–526, 1966.
[14]
C. C. Craig, “On the frequency function of xy,” The Annals of Mathematical Statistics, vol. 7, no. 1, pp. 1–15, 1936.
[15]
A. G. Glen, L. M. Leemis, and J. H. Drew, “Computing the distribution of the product of two continuous random variables,” Computational Statistics and Data Analysis, vol. 44, no. 3, pp. 451–464, 2004.
[16]
G. W. Oehlert, “A note on the delta method,” The American Statistician, vol. 46, no. 1, pp. 27–29, 1992.
[17]
D. Tofighi and D. P. MacKinnon, “Rmediation: an R package for mediation analysis confidence intervals,” Behavior Research Methods, vol. 43, no. 3, pp. 692–700, 2011.
[18]
A. Burton, D. G. Altman, P. Royston, and R. L. Holder, “The design of simulation studies in medical statistics,” Statistics in Medicine, vol. 25, no. 24, pp. 4279–4292, 2006.
[19]
D. E. Jennings, “How do we judge confidence-interval adequacy?” The American Statistician, vol. 41, no. 4, pp. 335–337, 1987.
[20]
G. Jönsson, J. Staaf, E. Olsson et al., “High-resolution genomic profiles of breast cancer cell lines assessed by tiling BAC array comparative genomic hybridization,” Genes Chromosomes and Cancer, vol. 46, no. 6, pp. 543–558, 2007.
[21]
E. Lim, F. Vaillant, D. Wu et al., “Aberrant luminal progenitors as the candidate target population for basal tumor development in BRCA1 mutation carriers,” Nature Medicine, vol. 15, no. 8, pp. 907–913, 2009.
[22]
T. Z. Parris, A. Danielsson, S. Nemes et al., “Clinical implications of gene dosage and gene expression patterns in diploid breast carcinoma,” Clinical Cancer Research, vol. 16, no. 15, pp. 3860–3874, 2010.
[23]
W. N. van Wieringen, K. Unger, G. G. R. Leday et al., “Matching of array CGH and gene expression microarray features for the purpose of integrative genomic analyses,” BMC Bioinformatics, vol. 13, no. 1, article 80, 2012.
[24]
J. Fosen, E. Ferkingstad, O. Borgan, and O. O. Aalen, “Dynamic path analysis-a new approach to analyzing time-dependent covariates,” Lifetime Data Analysis, vol. 12, no. 2, pp. 143–167, 2006.
[25]
T. Martinussen, “Dynamic path analysis for event time data: large sample properties and inference,” Lifetime Data Analysis, vol. 16, no. 1, pp. 85–101, 2010.
[26]
T. Martinussen, S. Vansteelandt, M. Gerster, and J. V. B. Hjelmborg, “Estimation of direct effects for survival data by using the Aalen additive hazards model,” Journal of the Royal Statistical Society B, vol. 73, no. 5, pp. 773–788, 2011.
[27]
E. E. Schadt, J. Lamb, X. Yang et al., “An integrative genomics approach to infer causal associations between gene expression and disease,” Nature Genetics, vol. 37, no. 7, pp. 710–717, 2005.
[28]
Y. Li, B. M. Tesson, G. A. Churchill, and R. C. Jansen, “Critical reasoning on causal inference in genome-wide linkage and association studies,” Trends in Genetics, vol. 26, no. 12, pp. 493–498, 2010.
[29]
E. Lee, S. Cho, K. Kim, and T. Park, “An integrated approach to infer causal associations among gene expression, genotype variation, and disease,” Genomics, vol. 94, no. 4, pp. 269–277, 2009.
[30]
A. C. Lozano, N. Abe, Y. Liu, and S. Rosset, “Grouped graphical Granger modeling for gene expression regulatory networks discovery,” Bioinformatics, vol. 25, no. 12, pp. I110–I118, 2009.
[31]
L. Chindelevitch, P.-R. Loh, A. Enayetallah, B. Berger, and D. Ziemek, “Assessing statistical significance in causal graphs,” BMC Bioinformatics, vol. 13, no. 1, article 35, 2012.
[32]
E. Ferkingstad, A. Frigessi, and H. Lyng, “Indirect genomic effects on survival from gene expression data,” Genome Biology, vol. 9, no. 3, article R58, 2008.
[33]
D. P. MacKinnon, C. M. Lockwood, and J. Williams, “Confidence limits for the indirect effect: distribution of the product and resampling methods,” Multivariate Behavioral Research, vol. 39, no. 1, pp. 99–128, 2004.
[34]
D. P. MacKinnon, C. M. Lockwood, J. M. Hoffman, S. G. West, and V. Sheets, “A comparison of methods to test mediation and other intervening variable effects,” Psychological Methods, vol. 7, no. 1, pp. 83–104, 2002.
[35]
K. J. Preacher and K. Kelley, “Effect size measures for mediation models: quantitative strategies for communicating indirect effects,” Psychological Methods, vol. 16, no. 2, pp. 93–115, 2011.
[36]
D. P. MacKinnon and G. Warsi, “A simulation study of mediated effect measures,” Multivariate Behavioral Research, vol. 30, no. 1, p. 41, 1995.
[37]
D. Tofighi, D. P. MacKinnon, and M. Yoon, “Covariances between regression coefficient estimates in a single mediator model,” British Journal of Mathematical and Statistical Psychology, vol. 62, no. 3, pp. 457–484, 2009.
[38]
I. Orlow, D. V. Tommasi, B. Bloom et al., “Evaluation of the clonal origin of multiple primary melanomas using molecular profiling,” Journal of Investigative Dermatology, vol. 129, no. 8, pp. 1972–1982, 2009.
[39]
I. Ostrovnaya and C. B. Begg, “Testing clonal relatedness of tumors using array comparative genomic hybridization: a statistical challenge,” Clinical Cancer Research, vol. 16, no. 5, pp. 1358–1367, 2010.
[40]
M. S. Fritz and D. P. MacKinnon, “Required sample size to detect the mediated effect,” Psychological Science, vol. 18, no. 3, pp. 233–239, 2007.
[41]
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, and A. L. Boulesteix, “Over-optimism in bioinformatics: an illustration,” Bioinformatics, vol. 26, no. 16, pp. 1990–1998, 2010.
[42]
J. Peng, J. Zhu, A. Bergamaschi et al., “Regularized multivariate regression for identifying master predictors with application to integrative genomic study of breast cancer,” Annals of Applied Statistics, vol. 4, no. 1, pp. 53–77, 2010.
[43]
R. Jornsten, T. Abenius, T. King et al., “Network modeling of the transcriptional effects of copy number aberrations in glioblastoma,” Molecular Systems Biology, vol. 7, article 486, 2011.
[44]
H. Lee, S. W. Kong, and P. J. Park, “Integrative analysis reveals the direct and indirect interactions between DNA copy number aberrations and gene expression changes,” Bioinformatics, vol. 24, no. 7, pp. 889–896, 2008.
[45]
J. Textor, J. Hardt, and S. Knüppel, “DAGitty: a graphical tool for analyzing causal diagrams,” Epidemiology, vol. 22, no. 5, article 745, 2011.