Survival Analysis Model Based on Multi-Omics Integration and Adversarial Autoencoder
Abstract:
Multi-omics integration analysis can exploit complementary information across different omics layers, which supports a more systematic and comprehensive understanding of the molecular mechanisms of cancer. Because multi-omics data are high-dimensional but have small sample sizes, traditional survival analysis models suffer from severe overfitting. Deep learning models can extract features automatically from high-dimensional data and therefore have significant advantages in processing complex multi-omics data. To integrate multi-omics data effectively, this study proposes a multi-omics feature extraction network based on an adversarial autoencoder. By combining this network with the 1D-CNNCox survival analysis module, we construct GAN-1DCCox, a survival analysis model based on multi-omics integration and generative adversarial networks. In ablation and comparative experiments on TCGA datasets covering 8 different cancer types, GAN-1DCCox achieved the highest C-index values, outperforming popular benchmark survival analysis models. These results indicate that GAN-1DCCox can effectively integrate multi-omics data and screen out important prognostic signature genes, thereby improving the prediction performance and robustness of survival analysis.
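For orientation, the generic adversarial autoencoder of Makhzani et al. alternates a reconstruction phase with a regularization phase. In the regularization phase, a discriminator D learns to tell samples from a chosen prior p(z) apart from the encoder's latent codes, and the encoder is trained to fool it. The objectives below follow that generic formulation only. The abstract does not give the exact losses used in GAN-1DCCox, so the squared-error reconstruction term, the prior p(z), and the symbols f_φ (encoder), g_θ (decoder) and D are illustrative assumptions.

```latex
% Reconstruction phase (assumed squared error): encoder f_phi, decoder g_theta
\mathcal{L}_{\mathrm{rec}}(\phi,\theta)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\, \lVert x - g_\theta(f_\phi(x)) \rVert_2^2 \,\big]

% Regularization phase: D separates prior samples from encoded codes
\min_{\phi}\; \max_{D}\;
  \mathbb{E}_{z \sim p(z)}\big[\log D(z)\big]
  + \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log\big(1 - D(f_\phi(x))\big)\big]
```

After training, the latent code f_φ(x) serves as the compressed multi-omics representation that a downstream survival model would consume.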
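The abstract does not state the training objective of the 1D-CNNCox module. Cox-based neural survival models such as Cox-nnet and DeepSurv are commonly trained by minimizing the negative log partial likelihood of the Cox proportional hazards model. The sketch below assumes that objective and uses Breslow handling of tied event times. The function name `cox_neg_log_partial_likelihood` is hypothetical, and `risk` stands for whatever log-hazard score the network outputs per patient.

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(
    risk: Sequence[float], time: Sequence[float], event: Sequence[int]
) -> float:
    """Negative log Cox partial likelihood, averaged over observed events.

    risk[i]  : predicted log-hazard ratio for patient i (hypothetical model output)
    time[i]  : observed survival or censoring time
    event[i] : 1 if death was observed, 0 if censored
    Tied event times use the Breslow approximation (each uses the full risk set).
    """
    n = len(risk)
    if not (n == len(time) == len(event)):
        raise ValueError("risk, time and event must have the same length")
    total = 0.0
    n_events = 0
    for i in range(n):
        if not event[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at time[i]
        at_risk = [risk[j] for j in range(n) if time[j] >= time[i]]
        # Log-sum-exp with max subtraction for numerical stability
        m = max(at_risk)
        log_denom = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk[i] - log_denom
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events
```

Ranking early deaths as higher risk lowers the loss. For example, with times `[1, 2, 3]` and all events observed, `risk=[2, 1, 0]` gives a smaller value than `risk=[0, 1, 2]`.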
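The C-index used to compare models measures how well predicted risks rank patients by survival time. A value of 1.0 means perfect ranking and 0.5 matches random guessing. The following is a minimal pure-Python sketch of Harrell's C-index. It is a simplified variant that skips pairs with tied times, and the abstract does not say which C-index variant the authors used. The function name `harrell_c_index` is hypothetical.

```python
from typing import Sequence


def harrell_c_index(
    risk: Sequence[float], time: Sequence[float], event: Sequence[int]
) -> float:
    """Harrell's concordance index: a higher risk should mean a shorter survival time.

    A pair (i, j) is comparable when time[i] < time[j] and patient i had an
    observed event. Pairs with tied times are skipped (simplification).
    Tied risk predictions count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(risk)
    for i in range(n):
        if not event[i]:
            continue  # the earlier time in a comparable pair must be an observed event
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Usage: risks perfectly ordered against survival times give a C-index of 1.0
print(harrell_c_index([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], [1, 1, 1]))
```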