A Spiking Neural Network Object Detection Method Based on Efficient Deep-Feature Extraction
Abstract:
Owing to their brain-inspired mechanisms and inherent energy efficiency, Spiking Neural Networks (SNNs) have been widely studied and have made notable progress in object classification, but their use in object detection is still at an early stage. To address the low detection accuracy of existing SNN-based detectors in complex scenes, this paper constructs ES-YOLO, an SNN object detection architecture with efficient, lightweight deep-feature extraction. The architecture introduces an efficient feature extraction module (SDF-Module) and combines it with a spatial-pyramid design for multi-scale feature extraction, significantly improving performance on complex detection tasks. A spiking decoupled detection head further improves detection accuracy and real-time performance. On the VOC2012 dataset, ES-YOLO achieves 60.5% mAP@0.5 and 37.1% mAP@0.5:0.95, improvements of 4 and 3.7 percentage points over EMS-YOLO. The model not only narrows the performance gap with an ANN of the same architecture but also consumes only about one fifth of that ANN's overall energy, supporting the broader application of SNNs to object detection.
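The abstract describes ES-YOLO only at the block level, so the SDF-Module's internals are not specified here. As a rough, hypothetical illustration of the kind of spike-driven building block such an architecture is built from, the following PyTorch sketch shows a LIF neuron trained with a surrogate gradient and a per-time-step Conv-BN-LIF block; all names (SurrogateSpike, LIFNeuron, SpikeConvBlock) and hyperparameters (threshold 1.0, decay 0.25, T time steps) are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of spike-driven building blocks; the paper's actual
# SDF-Module is not specified in the abstract, so names and hyperparameters
# here are illustrative assumptions.
import torch
import torch.nn as nn

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, rectangular surrogate gradient."""
    @staticmethod
    def forward(ctx, v_minus_thresh):
        ctx.save_for_backward(v_minus_thresh)
        return (v_minus_thresh >= 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradient only inside a unit-width window around the threshold.
        return grad_out * (x.abs() < 0.5).float()

class LIFNeuron(nn.Module):
    """Leaky integrate-and-fire neuron with hard reset, stepped over T."""
    def __init__(self, thresh=1.0, decay=0.25):
        super().__init__()
        self.thresh, self.decay = thresh, decay

    def forward(self, x_seq):           # x_seq: (T, N, C, H, W)
        v = torch.zeros_like(x_seq[0])
        spikes = []
        for x in x_seq:
            v = self.decay * v + x      # leak, then integrate input current
            s = SurrogateSpike.apply(v - self.thresh)
            v = v * (1.0 - s)           # hard reset where a spike fired
            spikes.append(s)
        return torch.stack(spikes)

class SpikeConvBlock(nn.Module):
    """Conv-BN applied per time step, followed by a LIF neuron."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.lif = LIFNeuron()

    def forward(self, x_seq):           # (T, N, C, H, W)
        T, N = x_seq.shape[:2]
        y = self.bn(self.conv(x_seq.flatten(0, 1)))
        return self.lif(y.view(T, N, *y.shape[1:]))
```

For example, `SpikeConvBlock(3, 64)(torch.rand(4, 2, 3, 64, 64))` returns a binary spike tensor of shape (4, 2, 64, 64, 64), one spike map per time step.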
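The roughly 5x energy saving claimed above follows the estimate standard in the SNN literature: an ANN layer performs multiply-accumulate (MAC) operations on every activation, while a spike-driven SNN performs only accumulate (AC) operations, gated by the mean firing rate fr and repeated over T time steps. Using Horowitz's widely cited 45 nm figures (E_MAC ≈ 4.6 pJ, E_AC ≈ 0.9 pJ), a back-of-the-envelope version of the comparison is sketched below; fr = 0.25 and T = 4 are illustrative assumptions, not the paper's measured statistics.

```latex
% Standard ANN-vs-SNN energy estimate (45 nm energies from Horowitz, ISSCC 2014).
% fr = 0.25 and T = 4 below are illustrative assumptions, not measured values.
E_{\mathrm{ANN}} = \mathrm{OPs} \times E_{\mathrm{MAC}}, \qquad
E_{\mathrm{SNN}} = T \times fr \times \mathrm{OPs} \times E_{\mathrm{AC}}
\\[6pt]
\frac{E_{\mathrm{SNN}}}{E_{\mathrm{ANN}}}
  = \frac{T \, fr \, E_{\mathrm{AC}}}{E_{\mathrm{MAC}}}
  \approx \frac{4 \times 0.25 \times 0.9\,\mathrm{pJ}}{4.6\,\mathrm{pJ}}
  \approx 0.20 \approx \tfrac{1}{5}
```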