%0 Journal Article %T Investigation of a Superscalar Operand Stack Using FO4 and ASIC Wire-Delay Metrics %A Christopher Bailey %A Brendan Mullane %J VLSI Design %D 2014 %I Hindawi Publishing Corporation %R 10.1155/2014/493189 %X Complexity in processor microarchitecture, and the related issues of power density, hot spots, and wire delay, are seen as major concerns for design migration into the low-nanometer technologies of the future. This paper evaluates the hardware cost of an alternative to register-file organization, the superscalar stack issue array (SSIA). We believe this is the first such reported study using discrete stack elements. Several possible implementations are evaluated, using a 90 nm standard cell library as a reference model, yielding delay data and FO4 metrics. The evaluation, including reference to ASIC layout, RC extraction, and timing simulation, suggests a 4-wide issue rate of at least four Giga-ops/sec at 90 nm and opportunities for twofold future improvement by using more advanced design approaches. 1. Introduction Current trends in semiconductor technology, and in particular the International Technology Roadmap for Semiconductors [1], suggest that future concerns in microarchitecture at the VLSI level will pose significant challenges. These include increasing power density [2], progressively severe thermal hot spots in increasingly complex designs [3], the impact of growing static power [4], and the problem of wire versus gate-delay and power scaling [5, 6]. Such problems are often most acutely exposed in key mainstream processor components such as caches and register-related logic, including reorder buffers, rename logic, and the register file itself. Any alternative to the traditional register-based computing paradigm can therefore open up the possibility of new approaches to these problems.
However, register files are so highly optimized that measuring alternatives now requires the complete layout of an optimal design for comparison, followed by timing and power analysis, rather than anything as simple as a functional comparison of abstract logic. This paper focuses on one unexplored option for operand storage whose structure differs from that of a register file. The questions we examine are (a) can a LIFO (last-in-first-out) stack support superscalar operand access and (b) how does its performance compare with established mainstream approaches. This work is undertaken with a 90 nm UMC CMOS process library; however, we ultimately use FO4 as a delay metric [7] in order to provide a general measure of performance that can be scaled to other process nodes. The work is undertaken using standard cell digital libraries and not at the transistor level. Although this is not therefore an optimal solution, it permits rapid assessment of multiple %U http://www.hindawi.com/journals/vlsi/2014/493189/