This paper presents the hardware architecture and the software abstraction layer of an adaptive multiclient Network-on-Chip (NoC) memory core. The memory core supports the flexibility of a heterogeneous FPGA-based runtime adaptive multiprocessor system called RAMPSoC. The processing elements, also called clients, access the memory core via the NoC. The memory core supports a dynamic mapping of an address space for the different clients as well as different data transfer modes, such as variable burst sizes. In this way, two main limitations of FPGA-based multiprocessor systems are alleviated: the restricted on-chip memory resources and the fact that usually only one physical channel to an off-chip memory exists. Furthermore, a software abstraction layer is introduced, which hides the complexity of the memory core architecture and provides an easy-to-use interface for the application programmer. Finally, the advantages of the novel memory core in terms of performance, flexibility, and user friendliness are shown using a real-world image processing application.

1. Introduction and Motivation

Due to the increasing number of available logic blocks on today’s Field Programmable Gate Arrays (FPGAs), complete multiprocessor systems can be realized on an FPGA. Compared to traditional application-specific integrated circuit (ASIC) solutions, these FPGA-based Multiprocessor Systems-on-Chip (MPSoCs) can be realized with lower costs and a shorter (re)design cycle, because the flexible hardware architecture of the FPGA can be adapted to the needs of the application. However, the major limitations of these FPGA-based MPSoCs are the limited on-chip memory resources and the limited physical connection to an off-chip memory. A possible solution would be to connect each processing element to its own external memory. However, this would result in a very specific board design and reduce the flexibility of such an FPGA-based solution.
Moreover, due to different application scenarios, the memory requirements of a processor can vary at design time and at runtime. This is particularly the case for runtime adaptive MPSoCs, such as RAMPSoC [1], which support the modification of the MPSoC hardware architecture (number and type of processing elements, communication infrastructure, etc.) as well as the runtime adaptation of the software. To resolve the memory bottleneck for FPGA-based MPSoCs, an adaptive multiclient Network-on-Chip (NoC) memory core has been developed [2]. This intelligent memory core can support between 1 and 16 processing cores,
References
[1]
D. Göhringer, Flexible design and dynamic utilization of adaptive scalable multi-core systems [Ph.D. thesis], Dr. Hut München, 2011.
[2]
D. Göhringer, L. Meder, M. Hübner, and J. Becker, “Adaptive multi-client network-on-chip memory,” in Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig '11), pp. 7–12, Cancun, Mexico, 2011.
[3]
MPI, “A message-passing interface standard,” Version 2.2, Message Passing Interface Forum, September 2009, http://www.mpi-forum.org/.
[4]
Xilinx, “LogiCORE IP Processor Local Bus (PLB) v4.6 (v1.05a),” DS531, September 2010, http://www.xilinx.com/.
[5]
Xilinx, “LogiCORE IP Multi-Port Memory Controller (MPMC) (v6.01.a),” DS643, July 2010, http://www.xilinx.com/.
[6]
N. Voros, A. Rosti, and M. Hübner, Dynamic System Reconfiguration in Heterogeneous Platforms: The MORPHEUS Approach, Springer, 2009.
[7]
B. B. Fraguela, J. Renau, P. Feautrier, D. Padua, and J. Torrellas, “Programming the FlexRAM parallel intelligent memory system,” in Proceedings of the 9th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '03), pp. 49–60, New York, NY, USA, June 2003.
[8]
R. Buchty, O. Mattes, and W. Karl, “Self-aware memory: managing distributed memory in an autonomous multi-master environment,” in Proceedings of the 21st International Conference on Architecture of Computing Systems (ARCS '08), pp. 98–116, Dresden, Germany, February 2008.
[9]
Z. Dai and J. Zhu, “A bursty multi-port memory controller with quality-of-service guarantees,” in Proceedings of the 9th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '11), pp. 21–28, Taipei, Taiwan, October 2011.
[10]
W. Kritikos, A. Schmidt, R. Sass, E. K. Anderson, and M. French, “Redsharc: a programming model and on-chip network for multi-core systems on a programmable chip,” International Journal of Reconfigurable Computing, vol. 2012, Article ID 872610, 11 pages, 2012.
[11]
K. Goossens, J. Dielissen, and A. Rǎdulescu, “Æthereal network on chip: concepts, architectures, and implementations,” IEEE Design and Test of Computers, vol. 22, no. 5, pp. 414–421, 2005.
[12]
F. Lemonnier, P. Millet, G. M. Almeida, et al., “Towards future adaptive multiprocessor systems-on-chip: an innovative approach for flexible architectures,” in Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XII), Samos, Greece, July 2012.
[13]
Xilinx, “Fast Simplex Link (FSL) Bus (v2.11a),” DS449, June 2007, http://www.xilinx.com/.
[14]
S. Werner, O. Oey, D. Göhringer, M. Hübner, and J. Becker, “Virtualized on-chip distributed computing for heterogeneous reconfigurable multi-core systems,” in Proceedings of the Design, Automation & Test in Europe (DATE '12), pp. 280–283, Dresden, Germany, March 2012.
[15]
D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.