Recently, deep convolutional neural networks (DCNNs)
have achieved remarkable results in image classification. Despite these
successes, training a DCNN relies on a large amount of data prepared in
advance, an assumption that often fails in real-world settings involving
streaming data and concept drift. For this reason, incremental learning
(also called continual learning) has attracted increasing attention from
researchers. However, incremental learning faces the challenge of
catastrophic forgetting: the performance on previous tasks drastically degrades
after learning a new task. In this paper, we propose a new strategy to
alleviate catastrophic forgetting when a neural network is trained
sequentially across domains. Specifically, two components are combined: data translation based on
transfer learning and knowledge distillation. The former translates a portion
of the new data to approximate part of the old domain's data distribution. The
latter uses the old model as a teacher to guide the new model. Experimental
results on three datasets show that combining these two methods effectively
alleviates catastrophic forgetting.
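
As a concrete illustration of the second component, the sketch below implements the standard soft-target knowledge distillation loss of Hinton et al., with a frozen copy of the old model acting as the teacher. The PyTorch framing, the temperature value, and the weighting factor `lam` are illustrative assumptions, not the paper's exact training configuration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation loss (Hinton et al., 2015).

    A frozen copy of the old model (the teacher) produces softened
    class probabilities that the new model (the student) is trained
    to match, preserving knowledge of the old domain.
    """
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    # KL divergence between the softened distributions; the T^2 factor
    # keeps its gradient scale comparable to the hard-label term.
    return F.kl_div(log_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

def total_loss(student_logits, teacher_logits, labels, lam=1.0):
    # Hard-label cross-entropy on the new domain plus an (assumed)
    # lambda-weighted distillation term on the same batch.
    ce = F.cross_entropy(student_logits, labels)
    kd = distillation_loss(student_logits, teacher_logits)
    return ce + lam * kd
```

In a training step, the teacher logits would come from the old model run under `torch.no_grad()` on the current batch, including any samples translated back toward the old domain, so the student is supervised by both the new labels and the old model's behavior.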