Research on a Train Driver Hand Gesture Behavior Detection Method Based on Deep Learning and MediaPipe
Abstract:
Monitoring train driver behavior is important for improving safety and reducing accidents in intelligent transportation systems. To recognize the hand gestures that train drivers perform while driving, this study proposes a detection method that combines deep learning with MediaPipe. The work focuses on improving detection accuracy and real-time performance, particularly in complex environments. First, a ResNet50 convolutional neural network (CNN) is trained on a dataset of train cockpit images to classify palm gestures and finger gestures; the model achieves an accuracy above 85%, which supports the effectiveness of deep learning for this kind of behavior recognition. Second, the MediaPipe framework is used for real-time hand keypoint detection and pose estimation, and the hand gestures of smart-rail drivers are analyzed from dynamic video data. By using the geometric relationships between keypoints, this approach reaches an accuracy of 90% and recognizes behavior efficiently in dynamic driving environments. The novelty of the study lies in combining the feature-extraction capability of deep learning with MediaPipe's real-time skeletal keypoint detection to improve the detection accuracy and environmental adaptability of hand gesture recognition. Experiments show that the proposed method runs stably in complex environments with good real-time performance and robustness. It offers a new technical approach to driver behavior monitoring in intelligent transportation systems, with particular value for improving the safety and interaction efficiency of intelligent cockpits.
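To make the keypoint-geometry idea concrete, the following is a minimal sketch of how a palm gesture could be separated from a finger gesture using distances between hand landmarks. It is not the rule set used in this study, which the abstract does not specify. The landmark indices follow the 21-point MediaPipe Hands layout (wrist 0, index fingertip 8, and so on); the function names and the decision rule (index finger only extended means "finger", all four fingers extended means "palm") are assumptions made for illustration.

```python
import math

# MediaPipe Hands landmark indices: (PIP joint, fingertip) for each non-thumb finger.
WRIST = 0
FINGERS = {
    "index": (6, 8),
    "middle": (10, 12),
    "ring": (14, 16),
    "pinky": (18, 20),
}


def _dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def extended_fingers(landmarks):
    """Return the fingers whose tip is farther from the wrist than its PIP joint.

    `landmarks` is a sequence of 21 (x, y) pairs in normalized image coordinates.
    """
    wrist = landmarks[WRIST]
    return [
        name
        for name, (pip, tip) in FINGERS.items()
        if _dist(landmarks[tip], wrist) > _dist(landmarks[pip], wrist)
    ]


def classify_gesture(landmarks):
    """Hypothetical rule: index only -> 'finger'; all four fingers -> 'palm'."""
    ext = extended_fingers(landmarks)
    if ext == ["index"]:
        return "finger"
    if len(ext) == 4:
        return "palm"
    return "other"
```

In practice the input could be built from a detected hand as `[(p.x, p.y) for p in hand.landmark]`, where `hand` is one entry of the `multi_hand_landmarks` result returned by the MediaPipe Hands solution. Comparing distances to the wrist rather than raw image coordinates keeps the rule less sensitive to hand rotation in the frame.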