Accurate histological classification of lung cancer on CT images is essential for diagnosis and treatment planning. In this study, we propose a two-stage fine-tuning method for a vision transformer (ViT) that uses the wavelet transform to improve classification performance. In the first stage, the model is fine-tuned on wavelet-transformed images to strengthen feature extraction; in the second stage, it is further fine-tuned on the original CT images. This approach improves both classification accuracy and model robustness. Experimental results show that the proposed method outperforms conventional ViT and CNN fine-tuning: it achieves a classification accuracy of 0.971, compared with 0.953 for conventional ViT fine-tuning and 0.945 for ResNet50 fine-tuning. The proposed method also reduces classification uncertainty, with particularly large improvements in the classification of large cell lung carcinoma. These results demonstrate the effectiveness of incorporating wavelet-based feature extraction into ViT fine-tuning for lung cancer classification. Future work will focus on developing optimization techniques, applying the method to multimodal medical imaging, and integrating explainable AI techniques to further improve its applicability in clinical settings.
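As an illustration of the kind of input the first stage might use, the sketch below computes a one-level 2D Haar discrete wavelet transform of a grayscale image and tiles the four subbands into a single image of the original size. This is a minimal sketch under stated assumptions, not the authors' implementation: the Haar wavelet, the single decomposition level, the mosaic layout, and the helper names `haar_dwt2` and `dwt_mosaic` are all hypothetical choices, since the abstract does not specify the wavelet family, decomposition level, or how the transformed image is presented to the ViT.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def haar_dwt2(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One-level orthonormal 2D Haar DWT (assumed wavelet, illustration only).

    Returns (LL, LH, HL, HH). First letter: filter applied along each row;
    second letter: filter applied down each column. Naming conventions for
    LH/HL differ between libraries. Height and width must be even.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    s = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor

    # Row pass: pairwise sums (low-pass) and differences (high-pass).
    lo_rows = [[(r[2 * j] + r[2 * j + 1]) * s for j in range(w // 2)] for r in img]
    hi_rows = [[(r[2 * j] - r[2 * j + 1]) * s for j in range(w // 2)] for r in img]

    def column_pass(m: Image, sign: int) -> Image:
        # sign=+1 gives the low-pass (sum) output, sign=-1 the high-pass (difference).
        return [
            [(m[2 * i][j] + sign * m[2 * i + 1][j]) * s for j in range(len(m[0]))]
            for i in range(len(m) // 2)
        ]

    ll = column_pass(lo_rows, +1)  # approximation
    lh = column_pass(lo_rows, -1)  # low along rows, high down columns
    hl = column_pass(hi_rows, +1)  # high along rows, low down columns
    hh = column_pass(hi_rows, -1)  # diagonal detail
    return ll, lh, hl, hh


def dwt_mosaic(img: Image) -> Image:
    """Tile the subbands as [[LL, HL], [LH, HH]] into an image of the original size
    (one common layout; assumed here, not taken from the paper)."""
    ll, lh, hl, hh = haar_dwt2(img)
    top = [a + b for a, b in zip(ll, hl)]
    bottom = [a + b for a, b in zip(lh, hh)]
    return top + bottom


if __name__ == "__main__":
    print(dwt_mosaic([[1.0, 2.0], [3.0, 4.0]]))
```

Because this Haar transform is orthonormal, the mosaic has the same shape and total energy (sum of squared pixel values) as the input. In a two-stage scheme like the one described, stage 1 would fine-tune the ViT on wavelet-transformed slices (here, `dwt_mosaic(x)` for each slice `x`), and stage 2 would continue from those weights on the original slices; the training loop itself is omitted because it depends on the model and data pipeline.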