OALib Journal (ISSN: 2333-9721)
Head Pose Estimation Method Based on Latent Space Regression

DOI: 10.12677/mos.2025.145385, PP. 194-202

Keywords: Head Pose Estimation, Latent Space Regression, Head Occlusion, Feature Enhancement


Abstract:

Head pose estimation has become an important research area in computer vision, with wide applications in robotics, surveillance, and driver attention monitoring. One of its greatest challenges is handling the head occlusions that frequently occur in real-world scenarios. To address this, this paper proposes a head pose estimation method based on latent space regression. In the feature extraction stage, a Vision Transformer is employed to capture global image information, and a feature enhancement module with a pyramid structure is designed to extract local feature information. To cope with occlusions, latent space regression is introduced to pull the latent features of occluded images toward those of non-occluded images; the angle prediction for head pose is also improved, and a multi-loss function is designed. Experimental results show that on the occlusion-processed AFLW2000 dataset, the mean absolute error of the proposed method is reduced to 9.872, outperforming existing approaches and demonstrating its effectiveness in handling head occlusion.
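The core idea described in the abstract, pulling the latent features of occluded images toward those of their non-occluded counterparts while jointly regressing the pose angles, can be sketched as a combined loss. This is a minimal illustration under assumed function names and a hypothetical weighting factor `lam`; it is not the authors' implementation, and the actual paper uses a Vision Transformer backbone and a more elaborate multi-loss design:

```python
import numpy as np

def latent_alignment_loss(z_occluded, z_clean):
    # MSE term that drives the latent code of an occluded image
    # toward the latent code of the corresponding clean image.
    return np.mean((z_occluded - z_clean) ** 2)

def pose_loss(pred_angles, true_angles):
    # Mean absolute error over the three pose angles
    # (yaw, pitch, roll), matching the MAE metric reported.
    return np.mean(np.abs(pred_angles - true_angles))

def total_loss(z_occ, z_clean, pred_angles, true_angles, lam=0.5):
    # Multi-loss objective: angle regression plus a weighted
    # latent-space alignment term (lam is a hypothetical weight).
    return pose_loss(pred_angles, true_angles) + lam * latent_alignment_loss(z_occ, z_clean)
```

In practice both images (occluded and clean) would pass through the same encoder, and only the alignment term would depend on the paired clean view; at inference time a single occluded image suffices.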

