Convolutional auto-encoders have performed remarkably well in recent years when stacked into deep convolutional neural networks for classifying image data. However, their intrinsic architectures prevent them from matching state-of-the-art convolutional neural networks. In this paper, we propose an effective stacked convolutional auto-encoder that integrates a selective kernel attention mechanism for image classification. The model is based on a fully convolutional auto-encoder and can be trained end-to-end. It consists of two parts, an encoder and a decoder, composed of a chain of convolutional layers and a chain of deconvolution layers, respectively. The proposed method introduces three main modifications. First, a selective kernel (SK) convolution module and an SK deconvolution module are constructed to form the convolutional and deconvolution layer chains. Second, to mitigate network degradation, skip connections in the spirit of residual networks are added between each SK convolution module and its symmetrically placed SK deconvolution module. Third, to alleviate overfitting, noise is injected during data augmentation to improve the generalization ability of the model. Experimental results show that this method effectively integrates the channel attention module with the fully convolutional auto-encoder; although it is an unsupervised feature learning model, it still achieves good classification results.
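The core of SK attention is to fuse the outputs of convolution branches with different kernel sizes using channel-wise soft attention. The following is a minimal NumPy sketch of that fusion step, assuming the two branch feature maps have already been computed; the weight names `w_reduce`, `w_a`, and `w_b` (the reduction and per-branch selection matrices) are illustrative, not the paper's notation.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def selective_kernel_fuse(branch_a, branch_b, w_reduce, w_a, w_b):
    """Fuse two conv-branch outputs of shape (C, H, W) with SK attention.

    branch_a, branch_b: feature maps from branches with different kernel sizes.
    w_reduce: (d, C) reduction matrix; w_a, w_b: (C, d) per-branch selectors.
    """
    u = branch_a + branch_b                     # element-wise sum of branches
    s = u.mean(axis=(1, 2))                     # global average pooling -> (C,)
    z = np.maximum(w_reduce @ s, 0.0)           # compact descriptor, ReLU -> (d,)
    logits = np.stack([w_a @ z, w_b @ z])       # per-branch channel logits (2, C)
    attn = softmax(logits, axis=0)              # soft attention across branches
    # Weighted sum of branches, broadcasting channel weights over H and W.
    return attn[0][:, None, None] * branch_a + attn[1][:, None, None] * branch_b
```

Because the attention weights sum to one per channel, identical branch inputs are returned unchanged; otherwise the network can bias each channel toward the receptive field that suits it.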