With the development
of computer vision research, deep neural networks (DNNs) have been widely
applied in areas such as autonomous vehicles, weather forecasting,
counter-terrorism, surveillance, and traffic management, owing to their
state-of-the-art performance on image and video processing tasks. However, to
achieve such performance, DNN models have become increasingly deep and complex,
which imposes a heavy computational load. As a result, general-purpose central
processing units (CPUs) cannot meet the requirements of real-time applications.
To address this bottleneck, hardware acceleration of DNNs has attracted great
attention. Specifically, to serve various real-life applications, DNN
acceleration solutions mainly focus on hardware acceleration under limited
memory and computing resources. In this
paper, a novel resource-saving architecture based on a Field Programmable Gate
Array (FPGA) is proposed. Owing to a newly designed processing element (PE),
the proposed architecture achieves good
performance with extremely limited computing resources. The on-chip buffer
allocation scheme further reduces memory resource usage. Moreover, the
accelerator improves its performance by exploiting the sparsity of the input feature map.
Compared with other state-of-the-art FPGA-based solutions, our
architecture achieves good performance with quite limited resource
consumption, and thus fully meets the requirements of real-time applications.
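
To illustrate the general idea of exploiting input feature map sparsity, the following minimal Python sketch shows a multiply-accumulate (MAC) loop that skips zero-valued activations. It is not the PE design proposed in this paper; the function name `sparse_mac` and the sample values are hypothetical and chosen only for illustration.

```python
def sparse_mac(activations, weights):
    """Multiply-accumulate that skips zero activations.

    Hypothetical illustration of zero-skipping, not the paper's PE.
    Returns a tuple (accumulated_sum, number_of_macs_performed).
    """
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have equal length")

    acc = 0
    macs = 0
    for a, w in zip(activations, weights):
        if a == 0:
            # A zero activation contributes nothing, so the multiply is skipped.
            continue
        acc += a * w
        macs += 1
    return acc, macs


# Sample input with 5 zeros out of 8 activations (assumed values).
acts = [0, 3, 0, 0, 2, 0, 1, 0]
wts = [5, -1, 4, 2, 3, 7, -2, 6]
result, macs = sparse_mac(acts, wts)
print(result, macs)
```

In this sketch the result matches a dense dot product, while only the nonzero activations trigger a multiplication. In hardware, the saving would depend on how the skipped cycles are scheduled, which this example does not model.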