Multi-Omics Data Integration and Construction of a Prognosis Prediction Model Based on Deep Multi-View Contrastive Learning
Abstract:
In cancer research, accurately identifying cancer subtypes and assessing patient prognosis are crucial for designing optimized treatment strategies. High-throughput sequencing generates large volumes of multi-omics data, which are a valuable resource for cancer prognosis studies, and deep learning methods can integrate these data effectively to identify a greater number of cancer subtypes accurately. In this study, we analyzed multi-omics datasets from 12 cancer types and used them as model input. We propose dmCLCAE, a deep multi-view contrastive learning model built on a convolutional autoencoder, which uses multi-omics data to predict survival-related cancer subtypes. To evaluate the model, we compared its subtyping performance across the different cancer types with that of Multi-Omics Factor Analysis v2 (MOFA+) and ProgCAE, a deep learning prognostic model based on a convolutional autoencoder. The results show that dmCLCAE has a significant advantage in distinguishing survival subtypes and also achieves superior prediction consistency.
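The abstract does not specify dmCLCAE's training objective. As a rough illustration of what "multi-view contrastive learning" usually means for multi-omics data, the sketch below implements a symmetric InfoNCE loss in which each omics type is one view and the two embeddings of the same patient form the positive pair. It assumes each omics layer has already been encoded (for example, by a convolutional autoencoder, not shown) into a d-dimensional embedding per patient, with rows aligned by patient; the function names, the cosine similarity and the temperature of 0.1 are all hypothetical choices, and this is not the authors' implementation.

```python
import numpy as np


def l2_normalize(z, eps=1e-12):
    """Scale each row (one patient's embedding) to unit length."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, eps)


def cross_view_info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss between two omics views (hypothetical helper).

    z_a, z_b: arrays of shape (n_patients, d). Row i of both views belongs to
    the same patient, so (i, i) pairs are positives and (i, j != i) pairs are
    negatives.
    """
    a = l2_normalize(z_a)
    b = l2_normalize(z_b)
    logits = a @ b.T / temperature  # (n, n) cosine similarities scaled by 1/tau

    def diagonal_cross_entropy(l):
        # Row-wise log-softmax, stabilised by subtracting each row's maximum.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        # Cross-entropy where the target class of row i is column i.
        return -np.mean(np.diag(log_prob))

    # Average the a->b and b->a directions so neither view is privileged.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


def multi_view_loss(views, temperature=0.1):
    """Average the pairwise InfoNCE loss over every pair of omics views."""
    losses = [
        cross_view_info_nce(views[i], views[j], temperature)
        for i in range(len(views))
        for j in range(i + 1, len(views))
    ]
    return float(np.mean(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical embeddings: two omics views sharing a patient-level signal.
    shared = rng.normal(size=(8, 16))
    mrna = shared + 0.05 * rng.normal(size=shared.shape)
    methylation = shared + 0.05 * rng.normal(size=shared.shape)
    aligned = multi_view_loss([mrna, methylation])
    # Shifting rows by one breaks every patient pairing.
    mismatched = multi_view_loss([mrna, np.roll(methylation, 1, axis=0)])
    print(f"aligned loss: {aligned:.4f}, mismatched loss: {mismatched:.4f}")
```

Minimizing this loss pulls the views of the same patient together and pushes different patients apart, so the shared embedding can then be clustered into candidate subtypes. When patient rows are correctly aligned across views, the loss is far lower than when the pairing is broken.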