Predicting Lung Cancer Stage by Expressions of Protein-Encoding Genes

DOI: 10.4236/abb.2023.148024, PP. 368-377

Keywords: Lung Cancer Prediction, XGBoost, Central Dogma, Feature Selection


Abstract:

Accurately predicting cancer stage is crucial for effective treatment planning. In this study, we aimed to develop a model based on gene expression data and XGBoost (eXtreme Gradient Boosting) that includes clinical and demographic variables to predict specific lung cancer stages in patients. Using the Wilcoxon rank test for feature selection, we selected the genes most strongly associated with lung cancer stage. Our model achieved an overall accuracy of 82% in classifying lung cancer stages from patients' gene expression data. These findings demonstrate the potential of gene expression analysis and machine learning techniques to improve the accuracy of lung cancer stage prediction and to aid personalized treatment decisions.
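
The feature-selection step described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes the two-sample Wilcoxon rank-sum variant of the test, a binary grouping of samples (for example, early versus late stage), a normal approximation without tie correction, and synthetic data. The helper names (`average_ranks`, `rank_sum_z`, `select_genes`) and the gene name `GENE_X` are hypothetical. In practice the selected genes would then be passed to a classifier such as `xgboost.XGBClassifier`, which is left out here to keep the example dependency-free.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(group_a, group_b):
    """Wilcoxon rank-sum z statistic (normal approximation, no tie correction)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def select_genes(expr, labels, top_k):
    """Score each gene by |z| between label-0 and label-1 samples; keep top_k."""
    scores = {}
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(rank_sum_z(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Synthetic toy data (assumption): 20 samples per group, five noise genes,
# and one gene whose expression is strongly shifted in group 1.
rng = random.Random(0)
labels = [0] * 20 + [1] * 20
expr = {f"noise_{i}": [rng.gauss(0, 1) for _ in labels] for i in range(5)}
expr["GENE_X"] = [rng.gauss(0, 1) + (10.0 if y == 1 else 0.0) for y in labels]

selected = select_genes(expr, labels, top_k=2)
print(selected)
```

Ranking genes by the absolute z statistic is one simple selection rule; thresholding on a multiplicity-corrected p-value would be an equally reasonable choice, and the abstract does not say which was used.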
