HwPMI: An Extensible Performance Monitoring Infrastructure for Improving Hardware Design and Productivity on FPGAs

DOI: 10.1155/2012/162404

Abstract:

Designing hardware cores for FPGAs can quickly become a complicated task, difficult even for experienced engineers. With the addition of more sophisticated development tools and maturing high-level language-to-gates techniques, designs can be rapidly assembled; however, when the design is evaluated on the FPGA, the performance may not be what was expected. Therefore, an engineer may need to augment the design to include performance monitors to better understand the bottlenecks in the system or to aid in the debugging of the design. Unfortunately, identifying what to monitor and adding the infrastructure to retrieve the monitored data can be a challenging and time-consuming task. Our work alleviates this effort. We present the Hardware Performance Monitoring Infrastructure (HwPMI), which includes a collection of software tools and hardware cores that can be used to profile the current design, recommend and insert performance monitors directly into the HDL or netlist, and retrieve the monitored data with minimal invasiveness to the design. Three applications are used to demonstrate and evaluate HwPMI’s capabilities. The results are highly encouraging as the infrastructure adds numerous capabilities while requiring minimal effort by the designer and low resource overhead to the existing design.

1. Introduction

As hardware designers develop custom cores and assemble Systems-on-Chip (SoCs) targeting FPGAs, the challenge of the design meeting timing, fitting within the resource constraints, and balancing bandwidth and latency can lead to significant increases in development time. When a design does not meet a specific performance requirement, the designer typically must go back and manually add more custom logic to monitor the behavior of several components in the design.
While this performance information can be used to better understand the inner workings of the system, as well as the interfaces between the subcomponents of the system, identifying and inserting infrastructure can quickly become a daunting task. Furthermore, the addition of the monitors may change the original behavior of the system, potentially obfuscating the identified performance bottleneck or design bug. In this work, we focus on an extensible set of tools and hardware cores to enable a hardware designer to insert a minimally invasive performance monitoring infrastructure into an existing design, with little effort. The monitors are used in an introspective capacity, providing feedback about the design’s performance under real workloads, while running on real devices. This paper
