Traditional cancer treatment selection relies mainly on clinical observation and the physician's judgment, and most outcomes are difficult to predict. Using Genomics Topology, we take the clinical and gene information of 272 breast cancer patients as an example to propose a system for treatment optimization and top gene identification. The data pose challenges such as collinearity and the curse of dimensionality; following the idea of Analysis of Variance (ANOVA), we apply Principal Component Analysis (PCA) to address them. Several genes, for example SLC40A1 and ACADSB, are found to be both statistically significant and supported by biological studies. The resulting model predicts breast cancer mortality, recurrence time, and survival time with an average MSE of 3.697, an accuracy of 88.97%, and an F1 score of 0.911. The results and methodology offer a route to more precise machine-learning prediction of other cancer outcomes and may assist in the discovery of targetable pathways for next-generation cancer treatments.
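For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how MSE, accuracy, and F1 score are commonly computed. It is not the study's code: the function names and the toy labels and times are hypothetical, and the pairing of MSE with the time outcomes and of accuracy and F1 with the binary mortality outcome is an assumption made only for illustration.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the observed labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, NOT taken from the 272-patient dataset.
event_true = [1, 0, 1, 1, 0, 1, 0, 0]   # assumed binary outcome (1 = event)
event_pred = [1, 0, 1, 0, 0, 1, 1, 0]
time_true = [10.0, 4.0, 7.5]            # assumed time outcome, arbitrary units
time_pred = [9.0, 5.0, 7.5]

acc = accuracy(event_true, event_pred)
f1 = f1_score(event_true, event_pred)
mse = mean_squared_error(time_true, time_pred)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  MSE={mse:.3f}")
```

Accuracy and F1 can diverge when the classes are imbalanced, which is one reason to report both alongside each other.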