
Software Engineering and Applications 2022

Application of SA-C3D Neural Network in Action Recognition

DOI: 10.12677/SEA.2022.116161, PP. 1561-1569

张宏博, 陈胜

Keywords: C3D, 3-Dimensional Convolutional Neural Networks, Self-Attention, Non-Local, Action Recognition


Abstract:

This paper aims to improve the action recognition accuracy of the C3D network by means of a self-attention mechanism. As one of the earlier models proposed for video action recognition, C3D holds an important place in the field, but as research has progressed it has become dated and its recognition accuracy is comparatively low. Taking C3D as the base, this paper integrates a Non-Local module into the network and replaces fixed learning rate decay with cosine annealing learning rate decay, which improves the model's ability to escape local optima. The resulting SA-C3D network uses 3D convolution to extract local features from action videos and then uses self-attention to capture global information about human actions. Trained on the UCF-101 dataset without pre-training, SA-C3D reaches a recognition accuracy of up to 95%, a substantial improvement over the original C3D network and a number of strong action recognition models.
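The two techniques named in the abstract can be sketched in a few lines of plain Python. This is an illustration only, not the authors' implementation: the function names, the default learning rates, and the choice of identity embeddings in the non-local operation are all assumptions made here for brevity. A real Non-Local block would use learned 1x1x1 convolutions for the embeddings and operate on 5D feature tensors produced by the 3D convolution layers.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max=0.01, lr_min=0.0):
    """Learning rate at `step` for one cosine annealing cycle (no restarts).

    lr_max and lr_min are illustrative defaults, not values from the paper.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the schedule range
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cosine


def non_local(features):
    """Simplified non-local operation over a list of feature vectors.

    Each vector stands for one spatio-temporal position. Assumption: all
    embeddings are the identity, so the pairwise weight is a softmax over
    dot products and the output is the input plus the weighted sum.
    """
    outputs = []
    for x_i in features:
        # Similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        # Attention-weighted sum of all positions (the global context).
        context = [
            sum(w * x_j[c] for w, x_j in zip(weights, features)) / norm
            for c in range(len(x_i))
        ]
        # Residual connection keeps the local feature alongside the context.
        outputs.append([a + y for a, y in zip(x_i, context)])
    return outputs


if __name__ == "__main__":
    for s in (0, 25, 50, 75, 100):
        print(s, round(cosine_annealing_lr(s, 100), 6))
    print(non_local([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Because every output position mixes information from all other positions, the non-local step complements the local receptive field of 3D convolution, which is the motivation the abstract gives for combining the two.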

