The design space of FPGA-based processor systems is huge because many parameters can be modified at design time and at runtime to achieve an efficient system solution in terms of performance, power, and energy consumption. Such parameters include the number of processors and their configurations, the clock frequencies chosen at design time, the use of dynamic frequency scaling at runtime, the application task distribution, and the FPGA type and size. The major contribution of this paper is the exploration of all these parameters and their impact on performance, power dissipation, and energy consumption for four different application scenarios. The goal is to introduce a first approach to a developer's guideline that supports the choice of an optimized, application-specific system parameterization for FPGA-based multiprocessor systems-on-chip. The FPGAs used for these explorations were the Xilinx Virtex-4 and Xilinx Virtex-5. Performance was measured on the FPGA, while power consumption was estimated using the Xilinx XPower Analyzer tool. Finally, a novel runtime-adaptive multiprocessor architecture for dynamic clock frequency scaling is introduced and used for the performance, power, and energy consumption evaluations.

1. Introduction

Parameterizable function blocks used in FPGA-based system development open a huge design space that is hard for the user to manage. Examples are arithmetic blocks such as dividers, adders, and soft-IP multipliers, which are adjustable in terms of bit width and parallelism. In addition to arithmetic blocks, soft-IP processor cores provide a variety of parameters that can be adapted to the requirements of the target application. In particular, Xilinx offers, with the MicroBlaze soft-IP 32-bit RISC processor [1], a variety of options for configuring the core individually.
These options include, among others, the use and size of cache memory, the arithmetic unit, a memory management unit, and the number of pipeline stages. Furthermore, the tools support the semiautomatic deployment of up to two processor cores as a multiprocessor on one FPGA. More cores can be integrated into the system design by using a custom tool chain. Each of the options described above can be adjusted to find an optimized parameterization of a single processor core for the target application. For example, a specific cache size can speed up the application tremendously; in addition, the partitioning of functions onto the two cores has a strong impact on speed and power consumption.
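To make the size of this design space concrete, the following minimal Python sketch enumerates a small set of configurations and ranks measured results by energy, using the relation energy = power × execution time. The parameter names and value ranges (`NUM_CORES`, `CLOCK_MHZ`, `CACHE_KB`) and the helper functions are hypothetical and chosen only for illustration; they are not taken from the paper or from any Xilinx tool. In practice, the runtime would be measured on the FPGA and the power would come from an estimate such as the one produced by XPower Analyzer.

```python
from itertools import product

# Hypothetical parameter ranges, chosen only for illustration.
NUM_CORES = (1, 2)      # number of soft-core processors
CLOCK_MHZ = (50, 100)   # design-time clock frequency
CACHE_KB = (0, 8)       # cache size per core (0 = no cache)


def design_space():
    """Enumerate every combination of the parameters above."""
    return [
        {"cores": cores, "clock_mhz": clock, "cache_kb": cache}
        for cores, clock, cache in product(NUM_CORES, CLOCK_MHZ, CACHE_KB)
    ]


def energy_joules(power_watts, runtime_seconds):
    """Energy is average power multiplied by execution time."""
    return power_watts * runtime_seconds


def select_min_energy(results):
    """Return the (config, power_w, runtime_s) entry with the lowest energy.

    `results` is a list of tuples supplied by the user, for example from
    on-board timing measurements and a power estimation report.
    """
    return min(results, key=lambda r: energy_joules(r[1], r[2]))
```

Even with only two values per parameter, the sketch already yields eight configurations to evaluate. Adding task distribution, dynamic frequency scaling, and the FPGA type multiplies this number further, which illustrates why a guideline for narrowing the search is useful.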
References
[1] “Xilinx MicroBlaze Reference Guide,” UG081 (v7.0), September 2006, http://www.xilinx.com/.
[2] D. Meintanis and I. Papaefstathiou, “Power consumption estimations vs measurements for FPGA-based security cores,” in Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig '08), pp. 433–437, Cancun, Mexico, December 2008.
[3] J. Becker, M. Huebner, and M. Ullmann, “Power estimation and power measurement of Xilinx Virtex FPGAs: trade-offs and limitations,” in Proceedings of the 16th Symposium on Integrated Circuits and Systems Design (SBCCI '03), Sao Paulo, Brazil, September 2003.
[4] K. Poon, A. Yan, and S. J. E. Wilton, “A flexible power model for FPGAs,” in Proceedings of the 12th International Conference on Field-Programmable Logic and Applications (FPL '02), September 2002.
[5] F. N. Najm, “Transition density: a new measure of activity in digital circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 2, pp. 310–323, 1993.
[6] K. Weiss, C. Oetker, I. Katchan, T. Steckstor, and W. Rosenstiel, “Power estimation approach for SRAM-based FPGAs,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '00), pp. 195–202, Monterey, Calif, USA, February 2000.
[7] V. Degalahal and T. Tuan, “Methodology for high level estimation of FPGA power consumption,” in Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC '05), Shanghai, China, January 2005.
[8] “Xilinx Power Estimator User Guide,” UG440 (v3.0), June 2009, http://www.xilinx.com/.
[9] “Development System Reference Guide,” v9.2i, Chapter 10 XPower, http://www.xilinx.com/.
[10] “Embedded System Tools Reference Manual,” Embedded Development Kit, EDK 9.2i, UG111 (v9.2i), Chapter 3, September 2007, http://www.xilinx.com/.
[11] “Fast Simplex Link (FSL) Bus (v2.00a),” DS449, December 2005, http://www.xilinx.com/.
[12] D. Göhringer, J. Obie, M. Hübner, and J. Becker, “Impact of task distribution, processor configurations and dynamic clock frequency scaling on the power consumption of FPGA-based multiprocessors,” in Proceedings of the 5th International Workshop on Reconfigurable Communication Centric Systems-on-Chip (ReCoSoC '10), Karlsruhe, Germany, May 2010.
[13] “Virtex-4 FPGA Configuration User Guide,” UG071 (v1.11), June 2009, http://www.xilinx.com/.
[14] “Virtex-4 FPGA User Guide,” UG070 (v2.6), December 2008, http://www.xilinx.com/.
[15] C. A. R. Hoare, “Quicksort,” Computer Journal, vol. 5, no. 1, pp. 10–15, 1962.
[16] A. Boukerche, J. M. Correa, A. C. M. Melo, and R. P. Jacobi, “A hardware accelerator for the fast retrieval of DIALIGN biological sequence alignments in linear space,” IEEE Transactions on Computers, vol. 59, no. 6, pp. 808–821, 2010.
[17] R. Palacios and A. Gupta, “A system for processing handwritten bank checks automatically,” Image and Vision Computing, vol. 26, no. 10, pp. 1297–1313, 2008.