International Journal of Reconfigurable Computing 2012

NCOR: An FPGA-Friendly Nonblocking Data Cache for Soft Processors with Runahead Execution

DOI: 10.1155/2012/915178

Kaveh Aasaraai, Andreas Moshovos


Abstract:

Soft processors often use data caches to reduce the gap between processor and main memory speeds. To achieve high efficiency, they use simple, blocking caches. Such caches are not appropriate for processor designs such as Runahead and out-of-order execution, which require non-blocking caches to tolerate main memory latencies; these processors use non-blocking caches to extract memory-level parallelism and improve performance. However, conventional non-blocking cache designs are expensive and slow on FPGAs because they use content-addressable memories (CAMs). This work proposes NCOR, an FPGA-friendly non-blocking cache that exploits the key properties of Runahead execution. NCOR does not require CAMs and utilizes smart cache controllers. A 4 KB NCOR operates at 329 MHz on Stratix III FPGAs while using only 270 logic elements. A 32 KB NCOR operates at 278 MHz and uses 269 logic elements.

1. Introduction

Embedded system applications increasingly use soft processors implemented over reconfigurable logic. Embedded applications, like other application classes, evolve over time, and their computation needs and structure change. If the experience with other application classes is any indication, embedded applications will evolve and incorporate algorithms with unstructured instruction-level parallelism. Existing soft processor implementations use in-order organizations, since these organizations map well onto reconfigurable logic and offer adequate performance for most existing embedded applications.

Previous work has shown that for programs with unstructured instruction-level parallelism, a 1-way out-of-order (OoO) processor has the potential to outperform a 2-way or even a 4-way superscalar processor in a reconfigurable logic environment [1]. Conventional OoO processor designs are tuned for custom logic implementation and rely heavily on content-addressable memories, multiported register files, and wide, multisource and multidestination datapaths. These structures are inefficient when implemented on an FPGA fabric. It is an open question whether it is possible to design an FPGA-friendly soft core that offers the benefits of OoO execution while overcoming the complexity and inefficiency of conventional OoO structures.

A lower-complexity alternative to OoO architectures is Runahead Execution, or simply Runahead, which offers most of the benefits of OoO execution [2]. Runahead relies on the observation that most of the performance benefits of OoO execution often result from allowing multiple outstanding main memory requests. Runahead extends a conventional

References

[1]	K. Aasaraai and A. Moshovos, “Towards a viable out-of-order soft core: copy-free, checkpointed register renaming,” in the 19th International Conference on Field Programmable Logic and Applications (FPL '09), Prague, Czech Republic, September 2009.
[2]	J. Dundas and T. Mudge, “Improving data cache performance by pre-executing instructions under a cache miss,” in Proceedings of the International Conference on Supercomputing, pp. 68–75, July 1997.
[3]	K. Aasaraai and A. Moshovos, “An efficient non-blocking data cache for soft processors,” in Proceedings of the International Conference on ReConFigurable Computing and FPGAs, December 2010.
[4]	D. Kroft, “Lockup-free instruction fetch/prefetch cache organization,” in Proceedings of the 8th Annual International Symposium on Computer Architecture, pp. 81–87, 1982.
[5]	Altera Corp., “Nios II Processor Reference Handbook v10.0,” 2010.
[6]	Altera Corp., “Stratix III Device Handbook: Chapter 4. TriMatrix Embedded Memory Blocks in Stratix III Devices,” 2010.
[7]	Arcturus Networks Inc, “uClinux,” http://www.uclinux.org/.
[8]	Standard Performance Evaluation Corporation, “SPEC CPU 2006,” http://www.spec.org/cpu2006/.
[9]	P. Yiannacouras and J. Rose, “A parameterized automatic cache generator for FPGAs,” in Proceedings of Field-Programmable Technology (FPT), pp. 324–327, 2003.
[10]	IBM and LSI, “PowerPC 476FP Embedded Processor Core and PowerPC 470S Synthesizable Core User's Manual,” http://www-03.ibm.com/press/us/en/pressrelease/28399.wss.
[11]	G. Stitt and J. Coole, “Traversal caches: a framework for FPGA acceleration of pointer data structures,” International Journal of Reconfigurable Computing, vol. 2010, Article ID 652620, 16 pages, 2010.
