Performance Comparison of Vision Transformer- and CNN-Based Image Classification Using Cross Entropy: A Preliminary Application to Lung Cancer Discrimination from CT Images
This study evaluates the performance and reliability of a vision transformer (ViT) compared with a convolutional neural network (CNN), the ResNet50 model, in classifying lung cancer from CT images into four categories: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), large cell carcinoma (LULC), and normal. Although CNNs have driven significant advances in medical imaging, their limited capacity to capture long-range dependencies has motivated the exploration of ViTs, which leverage self-attention mechanisms for a more comprehensive global understanding of images. The study used a dataset of 748 lung CT images to train both models with standardized input sizes, assessing their performance through conventional metrics (accuracy, precision, recall, F1 score, specificity, and AUC) as well as cross entropy, used here as a metric of prediction uncertainty. Both models achieved similar accuracy (95%), with the ViT showing a slight edge over ResNet50 in precision and F1 score for specific classes. However, ResNet50 exhibited higher recall for LULC, indicating fewer missed cases. Cross-entropy analysis showed that the ViT model had lower average uncertainty than ResNet50, particularly in the LUAD, normal, and LUSC classes. This finding suggests that ViT predictions are generally more reliable, although ResNet50 performed better for LULC. The study underscores that accuracy alone is insufficient for model comparison, as cross entropy offers deeper insight into the reliability and confidence of model predictions. The results highlight the importance of incorporating cross entropy alongside traditional metrics for a more comprehensive evaluation of deep learning models in medical image classification.
While the ViT outperformed the CNN-based ResNet50 in lung cancer classification based on cross-entropy values, the performance differences were minor and may not hold clinical significance. Therefore, it may be premature to consider replacing CNNs with ViTs in this specific application.
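The per-prediction uncertainty comparison described above can be sketched as follows. This is a minimal illustration, not the study's actual evaluation code: it computes the cross entropy of each softmax output against its true label, the quantity averaged per class when comparing the two models. The class ordering, probability values, and function name are illustrative assumptions.

```python
import numpy as np

def per_sample_cross_entropy(probs, labels, eps=1e-12):
    """Cross entropy (in nats) of each prediction against its true label.

    probs  : (N, C) array of softmax outputs, each row summing to 1.
    labels : (N,) array of integer class indices.
    Lower values indicate more confident (lower-uncertainty) predictions.
    """
    probs = np.clip(probs, eps, 1.0)  # guard against log(0)
    return -np.log(probs[np.arange(len(labels)), labels])

# Hypothetical outputs for the 4 classes (LUAD, LUSC, LULC, normal).
probs = np.array([
    [0.90, 0.05, 0.03, 0.02],  # confident, correct -> low cross entropy
    [0.40, 0.35, 0.15, 0.10],  # uncertain, correct -> higher cross entropy
])
labels = np.array([0, 0])
ce = per_sample_cross_entropy(probs, labels)
```

Averaging `ce` within each true class yields the per-class uncertainty figures used to compare the ViT and ResNet50: two models with identical accuracy can still differ markedly in how confidently they make their correct predictions.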