An FPGA-Based Resource-Saving Hardware Accelerator for Deep Neural Network

DOI: 10.4236/ijis.2021.112005, PP. 57-69

Keywords: Deep Neural Network, Resource-Saving, Hardware Accelerator, Data Flow

Abstract:

With the development of computer vision research, deep neural networks (DNNs) have been widely applied in various fields (autonomous vehicles, weather forecasting, counter-terrorism, surveillance, traffic management, etc.) owing to their state-of-the-art performance on image and video processing tasks. However, to achieve such performance, DNN models have become increasingly deep and complex, which results in heavy computational load. General-purpose central processing units (CPUs) are therefore insufficient to meet real-time application requirements. To address this bottleneck, research on hardware acceleration solutions for DNNs has attracted great attention. Specifically, to serve a wide range of real-life applications, DNN acceleration solutions mainly target workloads with intense memory and computation demands. In this paper, a novel resource-saving architecture based on a Field Programmable Gate Array (FPGA) is proposed. Owing to its newly designed processing element (PE), the proposed architecture achieves good performance with extremely limited computing resources. The on-chip buffer allocation further reduces on-chip memory consumption. Moreover, the accelerator improves its performance by exploiting the sparsity of the input feature maps. Compared with other state-of-the-art FPGA-based solutions, our architecture achieves good performance with quite limited resource consumption, and thus fully meets the requirements of real-time applications.
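
The central technical ideas summarized above, a lightweight multiply-accumulate processing element (PE) and the exploitation of input-feature-map sparsity, can be sketched in software. The following C fragment is only an illustrative model, not the authors' hardware design: the function sparse_mac, the tile length IFM_SIZE, and the integer data types are assumptions made for this example. It shows how skipping zero activations removes multiplications (and the corresponding weight fetches) from the accumulation loop, which is the effect a sparsity-aware PE achieves in hardware.

/*
 * Illustrative software model (assumed, not taken from the paper): a
 * zero-skipping multiply-accumulate over one input-feature-map tile.
 * In a sparsity-aware PE, each skipped multiplication corresponds to a
 * saved cycle and a saved weight fetch.
 */
#include <stdio.h>

#define IFM_SIZE 8   /* assumed tile length, for illustration only */

static int sparse_mac(const int ifm[IFM_SIZE], const int weights[IFM_SIZE])
{
    int acc = 0;
    for (int i = 0; i < IFM_SIZE; ++i) {
        if (ifm[i] == 0)          /* zero activation: skip the multiply */
            continue;
        acc += ifm[i] * weights[i];
    }
    return acc;
}

int main(void)
{
    /* ReLU-processed feature maps are typically rich in zeros. */
    const int ifm[IFM_SIZE]     = {0, 3, 0, 0, 7, 0, 2, 0};
    const int weights[IFM_SIZE] = {1, 2, 3, 4, 5, 6, 7, 8};

    printf("partial sum = %d\n", sparse_mac(ifm, weights));   /* 3*2 + 7*5 + 2*7 = 55 */
    return 0;
}

With only three nonzero activations in this tile, the loop performs three multiplications instead of eight, mirroring the cycle and resource savings the paper attributes to sparsity exploitation.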


