International Journal of Reconfigurable Computing 2012

Redsharc: A Programming Model and On-Chip Network for Multi-Core Systems on a Programmable Chip

DOI: 10.1155/2012/872610

William V. Kritikos, Andrew G. Schmidt, Ron Sass, Erik K. Anderson, Matthew French


Abstract:

The reconfigurable data-stream hardware software architecture (Redsharc) is a programming model and network-on-a-chip solution designed to scale to meet the performance needs of multi-core systems on a programmable chip (MCSoPC). Redsharc uses an abstract API that allows programmers to develop systems of simultaneously executing kernels, in software and/or hardware, that communicate over a seamless interface. Redsharc incorporates two on-chip networks that directly implement the API to support high-performance systems with numerous hardware kernels. This paper documents the API, describes the common infrastructure, and quantifies the performance of a complete implementation. Furthermore, the overhead in terms of resource utilization is reported, and the ability to integrate hard and soft processor cores with purely hardware kernels is demonstrated.

1. Introduction

Since the resources found on FPGA devices continue to track Moore’s Law, modern, high-end chips provide hundreds of millions of equivalent transistors in the form of reconfigurable logic, memory, multipliers, processors, and a litany of increasingly sophisticated hard IP cores. As a result, engineers are turning to multi-core systems on a programmable chip (MCSoPC) solutions to leverage these FPGA resources. MCSoPCs allow system designers to mix hard processors, soft processors, third party IP, or custom hardware cores all within a single FPGA. In this work, we consider only multi-core systems with a single processor core, multiple third party IP cores, and multiple custom hardware cores.

A major challenge of MCSoPC is how to achieve intercore communication without sacrificing performance. This problem is compounded by the realization that cores may use different computational and communication models; threads running on a processor communicate very differently from cores running within the FPGA fabric. Furthermore, standard on-chip interconnects for FPGAs do not scale well and cannot be optimized for specific programming models; contention on a bus can quickly limit performance.

To address these issues, this paper investigates Redsharc, an API and common infrastructure for realizing MCSoPC designs. Redsharc’s contribution has two parts. First, an abstract programming model and API that specifically targets MCSoPC is introduced. An abstract API, as described by Jerraya and Wolf in [1], allows cores to exchange data without knowing how the opposite core is implemented. In a Redsharc system, computational units, known as kernels, are implemented as either software

References

[1]	A. Jerraya and W. Wolf, “Hardware/software interface codesign for embedded systems,” Computer, vol. 38, no. 2, pp. 63–69, 2005.
[2]	M. Jones, L. Scharf, J. Scott et al., “Implementing an API for distributed adaptive computing systems,” in Proceedings of the 7th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '99), pp. 222–230, April 1999.
[3]	R. Laufer, R. R. Taylor, and H. Schmit, “PCI-PipeRench and the SwordAPI: a system for stream-based reconfigurable computing,” in Proceedings of the 7th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '99), pp. 200–208, April 1999.
[4]	E. Lubbers and M. Platzner, “ReconOS: an RTOS supporting hard- and software threads,” in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL '07), pp. 441–446, August 2007.
[5]	D. Andrews, R. Sass, E. Anderson et al., “Achieving programming model abstractions for reconfigurable computing,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 1, pp. 34–44, 2008.
[6]	A. Patel, C. A. Madill, M. Saldaña, C. Comis, R. Pomès, and P. Chow, “A scalable FPGA-based multiprocessor,” in Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '06), pp. 111–120, April 2006.
[7]	P. Mahr, C. Lörchner, H. Ishebabi, and C. Bobda, “SoC-MPI: a flexible message passing library for multiprocessor systems-on-chips,” in Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig '08), pp. 187–192, 2008.
[8]	M. Gokhale, J. Stone, J. Arnold, and M. Kalinowski, “Stream-oriented FPGA computing in the Streams-C high level language,” in Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '00), pp. 49–56, April 2000.
[9]	D. Unnikrishnan, J. Zhao, and R. Tessier, “Application-specific customization and scalability of soft multiprocessors,” in Proceedings of the IEEE Symposium on Field Programmable Custom Computing Machines (FCCM '09), pp. 123–130, April 2009.
[10]	J. Liang, A. Laffely, S. Srinivasan, and R. Tessier, “An architecture and compiler for scalable on-chip communication,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 7, pp. 711–726, 2004.
[11]	L. Shannon and P. Chow, “Simplifying the integration of processing elements in computing systems using a programmable controller,” in Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '05), vol. 2005, pp. 63–72, 2005.
[12]	P. Mattison and W. Thies, “Streaming virtual machine specification, version 1.2,” Tech. Rep., January 2007.
[13]	M. B. Taylor, J. Kim, J. Miller et al., “The Raw microprocessor: a computational fabric for software circuits and general-purpose programs,” IEEE Micro, vol. 22, no. 2, pp. 25–35, 2002.
[14]	K. Sankaralingam, R. Nagarajan, H. Liu et al., “Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture,” IEEE Micro, vol. 23, no. 6, pp. 46–51, 2003.
[15]	R. Rettberg, W. Crowther, P. Carvey, and R. Tomlinson, “The Monarch parallel processor hardware design,” Computer, vol. 23, no. 4, pp. 18–28, 1990.
[16]	128-Bit Processor Local Bus Architecture Specifications, IBM, Version 4.7 edition.
[17]	Xilinx, http://www.xilinx.com/products/design_resources/conn_central/locallink_member/sp006.pdf.
[18]	A. G. Schmidt and R. Sass, “Characterizing effective memory bandwidth of designs with concurrent high-performance computing cores,” in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL '07), pp. 601–604, August 2007.
[19]	S. Datta, P. Beeraka, and R. Sass, “RC-BLASTn: implementation and evaluation of the BLASTn Scan function,” in IEEE Symposium on Field Programmable Custom Computing Machines (FCCM '09), pp. 88–95, April 2009.
[20]	S. Datta and R. Sass, “Scalability studies of the BLASTn scan and ungapped extension functions,” in Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig '09), pp. 131–136, December 2009.
[21]	A. G. Schmidt, S. Datta, A. A. Mendon, and R. Sass, “Investigation into scaling I/O bound streaming applications productively with an all-FPGA cluster,” International Journal on Parallel Computing. In press.
