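MOGCWMLP: A Multi-Omics Data Integration Model Based on Graph Convolutional Networks and a Weighted Multilayer Perceptron for Improved Lung Cancer Staging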
Abstract:
Cancer is one of the leading causes of death worldwide, and treating advanced or metastatic disease remains especially challenging. Accurate staging is critical in clinical practice for selecting treatment strategies and assessing patient prognosis. Traditional staging methods rely mainly on imaging and clinical examination data; however, with rapid advances in genomics and molecular biology, leveraging multi-omics data for early cancer diagnosis and staging has become increasingly important. To improve the accuracy of cancer classification and staging, this study proposes a novel multi-omics analysis framework, MOGCWMLP. The framework uses graph convolutional networks (GCNs) to learn features from each omics data type and a weighted multilayer perceptron (MLP) to make the classification decision. Specifically, MOGCWMLP integrates three types of omics data, namely mRNA (RNA-seq), miRNA, and lncRNA, learning features from each type and fusing them through a weighting mechanism, thereby maximizing the complementary information across omics modalities. Experimental results show that MOGCWMLP achieves significantly higher classification accuracy on the lung squamous cell carcinoma (LUSC) dataset than existing single-omics and multi-omics models. Notably, integrating multiple omics types yields substantial improvements in classification performance. Furthermore, the learnable weighted fusion mechanism dynamically adjusts each modality's contribution, further improving classification performance. This study provides an effective tool for precise cancer diagnosis and personalized treatment and offers new insights into multi-omics data integration.
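The abstract does not specify how the patient graphs are built, how many GCN layers are used, or the exact form of the weighting, so the following is only a minimal NumPy sketch of the general idea under stated assumptions, not the authors' implementation. Each omics view gets its own patient-similarity graph (assumed here to be a cosine k-nearest-neighbour graph), passes through two GCN layers using the standard Kipf and Welling propagation rule H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), and the per-view embeddings are fused with softmax-normalized learnable weights before a small MLP outputs class probabilities. Only the forward pass is shown; in practice all weights, including `view_logits`, would be trained by backpropagation. The function names, layer sizes, the four output classes, and the random data are hypothetical.

```python
import numpy as np


def cosine_knn_graph(x, k):
    """Assumed graph construction: link each patient to its k most cosine-similar patients."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = xn @ xn.T
    np.fill_diagonal(sim, -np.inf)           # exclude self-matches from the neighbour search
    idx = np.argsort(-sim, axis=1)[:, :k]    # indices of the k most similar patients per row
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(x.shape[0]), k)
    adj[rows, idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)            # symmetrize the kNN graph


def normalized_adjacency(adj):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w):
    """One GCN layer: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def forward(views, params, k=5):
    """Hypothetical forward pass: per-view GCN -> weighted fusion -> MLP classifier."""
    embeddings = []
    for x, (w1, w2) in zip(views, params["gcn"]):
        a_norm = normalized_adjacency(cosine_knn_graph(x, k))
        h = gcn_layer(a_norm, gcn_layer(a_norm, x, w1), w2)  # two GCN layers (assumed)
        embeddings.append(h)
    # Learnable view weights (assumption): softmax keeps them positive and summing to 1.
    alpha = softmax(params["view_logits"])
    fused = sum(a * h for a, h in zip(alpha, embeddings))  # needs equal embedding sizes
    hidden = np.maximum(fused @ params["mlp_w1"] + params["mlp_b1"], 0.0)
    logits = hidden @ params["mlp_w2"] + params["mlp_b2"]
    return softmax(logits, axis=1), alpha


# Usage with random stand-in data (hypothetical sizes, not the LUSC dataset).
rng = np.random.default_rng(0)
n_patients, n_classes, emb_dim = 40, 4, 16
dims = {"mRNA": 200, "miRNA": 50, "lncRNA": 120}
views = [rng.normal(size=(n_patients, d)) for d in dims.values()]
params = {
    "gcn": [(rng.normal(scale=0.1, size=(d, 32)), rng.normal(scale=0.1, size=(32, emb_dim)))
            for d in dims.values()],
    "view_logits": np.zeros(len(dims)),      # equal weights before any training
    "mlp_w1": rng.normal(scale=0.1, size=(emb_dim, 32)),
    "mlp_b1": np.zeros(32),
    "mlp_w2": rng.normal(scale=0.1, size=(32, n_classes)),
    "mlp_b2": np.zeros(n_classes),
}
probs, alpha = forward(views, params)
print(probs.shape, alpha.round(3))
```

Normalizing the view weights with a softmax is one simple way to let the model shift emphasis between mRNA, miRNA, and lncRNA during training while keeping the fused embedding on a comparable scale; attention over views or a concatenation followed by a learned projection are common alternatives.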