- 2015

Simultaneous Accelerator Parallelization and Point-to-Point Interconnect Insertion for Bus-Based Embedded SoCs

DOI: 10.1109/TST.2015.7350017

Daming Zhang, Yongpan Liu, Shuangchen Li, Tongda Wu, Huazhong Yang

Keywords: accelerator parallelization, point-to-point interconnect insertion, bus-based embedded system-on-chips


Abstract:

As performance requirements for bus-based embedded Systems-on-Chip (SoCs) increase, a growing number of on-chip application-specific hardware accelerators (e.g., filters, FFTs, JPEG encoders, GSMs, and AES encoders) are being integrated into their designs. These accelerators require system-level tradeoffs among performance, area, and scalability. Accelerator parallelization and Point-to-Point (P2P) interconnect insertion are two effective system-level adjustments: the former boosts computing performance at the cost of area, while the latter provides higher bandwidth at the cost of routability. Moreover, the two adjustments interact with each other. This paper proposes a design flow that optimizes accelerator parallelization and P2P interconnect insertion simultaneously. To explore the huge optimization space, we develop an effective algorithm whose goal is to reduce total SoC latency under constraints on SoC area and total P2P wire length. Experimental results show that the performance difference between the proposed algorithm and the optimal results is only 2.33% on average, while the running time of the algorithm is less than 17 s.
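To make the shape of the optimization problem concrete, the following is a minimal sketch of an exhaustive search over a toy design space. It is not the algorithm proposed in the paper: the latency model, the accelerator data, the bandwidth values, and every name (`ACCELERATORS`, `DEGREES`, `latency`, `feasible`, `best_config`) are assumptions made up for illustration. It only shows the kind of problem described above: choose a parallelization degree and a bus-or-P2P link per accelerator so that total latency is minimized under an area budget and a P2P wire-length budget.

```python
from itertools import product

# Hypothetical toy data, one tuple per accelerator:
# (compute_latency, area, data_volume, p2p_wire_length)
ACCELERATORS = [
    (100.0, 1.0, 40.0, 3.0),
    (40.0, 2.0, 80.0, 2.0),
]
DEGREES = (1, 2, 4)          # assumed candidate parallelization degrees
BUS_BW, P2P_BW = 1.0, 4.0    # assumed bandwidths (data units per time unit)


def latency(config):
    """Total latency under an assumed additive model.

    config holds one (degree, use_p2p) pair per accelerator. Compute time
    shrinks linearly with the degree; transfer time depends on the link.
    """
    total = 0.0
    for (compute, _area, data, _wire), (degree, p2p) in zip(ACCELERATORS, config):
        total += compute / degree + data / (P2P_BW if p2p else BUS_BW)
    return total


def feasible(config, max_area, max_wire):
    """Check the area budget and the total P2P wire-length budget."""
    area = sum(acc[1] * degree for acc, (degree, _) in zip(ACCELERATORS, config))
    wire = sum(acc[3] for acc, (_, p2p) in zip(ACCELERATORS, config) if p2p)
    return area <= max_area and wire <= max_wire


def best_config(max_area, max_wire):
    """Enumerate every joint choice and return the lowest-latency feasible one."""
    per_accelerator = list(product(DEGREES, (False, True)))
    candidates = (
        cfg
        for cfg in product(per_accelerator, repeat=len(ACCELERATORS))
        if feasible(cfg, max_area, max_wire)
    )
    return min(candidates, key=latency, default=None)


if __name__ == "__main__":
    cfg = best_config(max_area=6.0, max_wire=3.0)
    print(cfg, latency(cfg))
```

In this toy instance the wire budget admits only one P2P link, so the search gives the link to the data-heavy accelerator and spends the area budget parallelizing the compute-heavy one, which illustrates why the two decisions are best made jointly. Exhaustive enumeration grows exponentially with the number of accelerators, which is why a real design flow needs a more efficient search than this sketch.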

