Feature Knowledge Distillation Based on Pyramid Pooling and Mask Generation

DOI: 10.12677/mos.2025.142151, PP. 279-290

Keywords: Model Compression, Knowledge Distillation, Feature Distillation


Abstract:

The goal of knowledge distillation (KD) is to transfer knowledge from a large teacher network to a lightweight student network. Mainstream KD methods can be divided into logit distillation and feature distillation. Feature-based knowledge distillation is a critical component of KD: it uses intermediate layers to supervise the training of the student network. However, potential mismatches between intermediate layers can backfire during training, and current student models usually learn by directly imitating the teacher's features. To address this issue, this paper proposes a new distillation framework, Decoupled Spatial Pyramid Pooling Knowledge Distillation, which distinguishes the importance of different regions in the feature maps. The paper also introduces a mask generation feature distillation module, which guides the student model to generate the teacher's features through a generation block rather than imitating the teacher's complete features. Compared with previous, more complex distillation methods, the proposed approach achieves higher classification accuracy for the distilled model on the CIFAR-100 and Tiny-ImageNet datasets.
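
To make the two components described above concrete, the following is a minimal sketch of how a pyramid-pooling feature loss and a masked-generation feature loss could be implemented in PyTorch. This is an illustration of the general technique, not the paper's actual implementation: the class names (SPPDistillLoss, MaskedGenerationLoss), the pyramid levels, the mask ratio, and the two-layer generation block are assumptions, and it is assumed the student and teacher feature maps have already been aligned to the same shape.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SPPDistillLoss(nn.Module):
    # Illustrative sketch: pool the student and teacher feature maps at several
    # pyramid levels and match the pooled descriptors, so global and local
    # regions contribute separate loss terms instead of one undifferentiated map.
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels  # assumed pyramid output sizes (k x k)

    def forward(self, f_s, f_t):
        # f_s, f_t: (N, C, H, W) student / teacher feature maps of matching shape
        loss = 0.0
        for k in self.levels:
            p_s = F.adaptive_avg_pool2d(f_s, k)
            p_t = F.adaptive_avg_pool2d(f_t, k)
            loss = loss + F.mse_loss(p_s, p_t)
        return loss

class MaskedGenerationLoss(nn.Module):
    # Illustrative sketch: randomly mask spatial locations of the student's
    # feature map and ask a small generation block to reconstruct the teacher's
    # features from what remains, rather than copying the teacher's full features.
    def __init__(self, channels, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio  # assumed fraction of locations dropped
        self.generator = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, f_s, f_t):
        n, c, h, w = f_s.shape
        # Per-location binary mask: 1 keeps a spatial position, 0 drops it.
        mask = (torch.rand(n, 1, h, w, device=f_s.device) > self.mask_ratio).float()
        rec = self.generator(f_s * mask)  # generate features from the masked input
        return F.mse_loss(rec, f_t)

In training, both terms would typically be added to the ordinary cross-entropy loss with weighting coefficients, e.g. loss = CE + alpha * spp_loss(f_s, f_t) + beta * mask_loss(f_s, f_t), where alpha and beta are hypothetical balancing hyperparameters.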
