Synchronous early-completion-prediction adders (ECPAs) are used in high-clock-rate, high-precision DSP datapaths, as they allow most operations to complete in a single cycle even when the worst-case carry propagation delay is longer than the clock period. Previous works have also demonstrated the advantages of ECPAs for reducing average leakage and NBTI effects in nanoscale CMOS technologies. This paper presents a general, systematic methodology for designing ECPA units targeting nanoscale CMOS technologies, which is not yet available in the literature. The method is fully compatible with standard VLSI macrocell design tools and standard adder structures, and it includes the automatic definition of critical test patterns for postlayout verification. A design example is included, reporting speed and power data superior to those of previous works.

1. Introduction

Fast integer adders are an essential component of most DSP datapaths. Synchronous early-completion-prediction adders (ECPAs) [1], also known as variable-latency adders [2], have been introduced for high-clock-rate, high-precision datapaths, as they allow single-cycle operations even if the clock period is shorter than the worst-case carry propagation delay. Because the length of the carry chain actually activated depends on the operand values, multicycle operations can be kept statistically rare, yielding an overall speed improvement. The industrial effectiveness of the idea was first proven by the design of a full-custom ECPA unit for a DSP datapath at Toshiba Labs [1]. The logic foundation of that adder is presented in [3], and an extension to multiplier design has been shown in [4]. The works in [2] and [5] have recently pointed out the potential of variable-latency adders in nano-CMOS arithmetic units for reducing average leakage power consumption and improving robustness to the NBTI faults that occur in nanoscale technologies.
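The claim that multicycle operations are statistically rare can be checked numerically. The following Python sketch estimates, by Monte Carlo simulation, the fraction of random operand pairs whose longest carry propagation chain exceeds a given number of bit positions. It assumes independent, uniformly distributed operands, and it measures a chain as the generating bit plus the consecutive propagating bits that the carry ripples through. The threshold standing in for the clock period, and the function names `longest_carry_chain` and `multicycle_fraction`, are hypothetical illustrations, not the timing model of any specific adder discussed here.

```python
import random


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Return the longest carry propagation chain, in bit positions,
    for the unsigned addition a + b with no carry-in.

    Assumed convention: a chain starts at a bit that generates a carry
    (a_i AND b_i) and grows by one for each consecutive higher bit that
    propagates it (a_i XOR b_i). A kill bit (both inputs 0) ends it.
    """
    longest = 0
    run = 0  # length of the chain currently carrying a live carry (0 = none)
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:
            # Generate: the carry out is 1 regardless of the incoming
            # carry, so a new chain starts at this position.
            run = 1
        elif ai ^ bi:
            # Propagate: extend the live chain, if there is one.
            if run:
                run += 1
        else:
            # Kill: any incoming carry is absorbed here.
            run = 0
        longest = max(longest, run)
    return longest


def multicycle_fraction(width: int, threshold: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of uniformly random operand
    pairs whose longest carry chain exceeds `threshold` bit positions,
    i.e. the additions that a hypothetical ECPA, clocked to cover chains
    of up to `threshold` positions, would have to extend past one cycle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        if longest_carry_chain(a, b, width) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Illustrative parameters only: a 64-bit adder whose clock period is
    # assumed to cover carry chains of up to t bit positions.
    for t in (8, 12, 16):
        frac = multicycle_fraction(width=64, threshold=t, trials=10000)
        print(f"chain > {t:2d} bits: {frac:.4%} of random additions")
```

Under the uniform-operand assumption, long chains become rare quickly as the threshold grows, consistent with the classical analyses of average carry length (see, e.g., [14] and [15]). Real DSP data are generally not uniform. For instance, adding small two's-complement values of opposite sign can activate chains spanning the whole word, so the estimate would need to be repeated on representative operand traces before sizing the clock period.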
An ECPA consists of a conventional adder plus a completion-prediction logic unit (Figure 1). Based on the operand values, the prediction unit estimates the length of the critical path actually activated in the adder and, from it, the cycle count of the operation for the target cycle time. This approach differs from asynchronous completion-detection units [6–8], as it is based on a fully synchronous scheme. From the design point of view, the logic specification of the prediction function depends on the target cycle time and on the estimated variable completion time of the adder, which together determine the cycle count output. Moreover, the speed of the prediction unit is critical,
References
[1]
Y. Kondo, N. Ikumi, K. Ueno, J. Mori, and M. Hirano, “Early-completion-detecting ALU for a 1 GHz 64 b datapath,” in Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC '97), pp. 418–419, February 1997.
[2]
Y. Chen, H. Li, C. K. Koh et al., “Variable-latency adder (VL-adder) designs for low power and NBTI tolerance,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 11, pp. 1621–1624, 2010.
[3]
J. Lee and K. Asada, “A synchronous completion prediction adder (SCPA),” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E80-A, no. 3, pp. 606–609, 1997.
[4]
M. Olivieri, “Design of synchronous and asynchronous variable-latency pipelined multipliers,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 9, no. 2, pp. 365–376, 2001.
[5]
Y. Chen, H. Li, J. Li, and C. K. Koh, “Variable-latency adder (VL-adder): new arithmetic circuit design practice to overcome NBTI,” in International Symposium on Low Power Electronics and Design (ISLPED '07), pp. 195–200, Portland, Ore, USA, August 2007.
[6]
A. De Gloria and M. Olivieri, “Statistical Carry Lookahead adders,” IEEE Transactions on Computers, vol. 45, no. 3, pp. 340–347, 1996.
[7]
A. De Gloria, “Completion-detecting carry select addition,” IEE Proceedings: Computers and Digital Techniques, vol. 147, no. 2, pp. 93–100, 2000.
[8]
D. J. Kinniment, “An evaluation of asynchronous addition,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 4, no. 1, pp. 137–140, 1996.
[9]
S. M. Nowick, K. Y. Yun, P. A. Beerel, and A. E. Dooply, “Speculative completion for the design of high-performance asynchronous dynamic adders,” in Proceedings of the 3rd International Symposium on Advanced Research in Asynchronous Circuits and Systems, pp. 210–223, Eindhoven, The Netherlands, April 1997.
[10]
O. J. Bedrij, “Carry-select adder,” IRE Transactions on Electronic Computers, vol. 11, pp. 340–346, 1962.
[11]
J. M. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice Hall, Upper Saddle River, NJ, USA, 2nd edition, 2003.
[12]
N. H. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Addison Wesley, Reading, Mass, USA, 2nd edition, 1994.
[13]
D. Koes, T. Chelcea, C. Onyeama, and S. Goldstein, “Adding faster with application specific early termination,” Technical Report No. CMU-CS-05-101, Carnegie Mellon University, May 2005, http://www.cs.cmu.edu/~seth/papers/koes-tr05.html.
[14]
B. E. Briley, “Some new results on average worst case carry,” IEEE Transactions on Computers, vol. 22, no. 5, pp. 459–463, 1973.
[15]
G. W. Reitwiesner, “The determination of carry propagation length for binary addition,” IRE Transactions on Electronic Computers, vol. 9, no. 1, pp. 35–38, 1960.
[16]
I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan-Kaufmann, San Francisco, Calif, USA, 1999.
[17]
F. Lannutti, P. Nenzi, and M. Olivieri, “KLU sparse direct linear solver implementation into NGSPICE,” in Proceedings of the 19th International Conference on Mixed Design of Integrated Circuits and Systems (MIXDES '12), p. 69, 2012.
[18]
A. Mastrandrea, M. Olivieri, and F. Menichelli, “A delay model allowing nano-CMOS standard cells statistical simulation at the logic level,” in Proceedings of the 7th IEEE Conference on Ph.D. Research in Microelectronics and Electronics (PRIME '11), pp. 217–220, Trento, Italy, July 2011.
[19]
R. Vattikonda, W. Wang, and Y. Cao, “Modeling and minimization of PMOS NBTI effect for robust nanometer design,” in Proceedings of the 43rd Annual Design Automation Conference (DAC '06), pp. 1047–1052, ACM, San Francisco, Calif, USA, 2006.