
Error Detection and Recovery Techniques for Variation-Aware CMOS Computing: A Comprehensive Review

DOI: 10.3390/jlpea1030334

Keywords: variation tolerance, error detection, error recovery


Abstract:

Moore's law scaling continues to double transistor density with every technology generation, but it also introduces new design challenges. One of these is variation, which causes transistor behavior to deviate from nominal, most importantly in switching delay. The resulting delay spread widens the gap between the average-case and worst-case behavior of a circuit. Conventionally, circuits are designed to accommodate the worst-case delay, which increasingly limits the performance gained from scaling. Designing for the average case instead is a promising way to maintain the pace of performance improvement over future generations. To remain correct, however, such an approach requires on-the-fly mechanisms to prevent, detect, and resolve violations. This paper explores such mechanisms, which allow circuit performance to improve under intensifying variation. We present speculative error detection techniques along with recovery mechanisms, and then discuss their ability to operate under extreme variation, including sub-threshold operation. While the main focus of this survey is on circuit-level approaches, for completeness we also discuss higher-level architectural and algorithmic techniques.
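The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes path delays follow a hypothetical normal distribution, that every operation whose delay exceeds the clock period is detected, and that each detected error costs a fixed number of recovery cycles. The names `effective_time_per_op`, `best_period`, and `penalty_cycles` are illustrative only.

```python
import random


def effective_time_per_op(delays, period, penalty_cycles):
    """Average time per operation under detect-and-recover clocking.

    An operation whose delay exceeds `period` counts as a timing error
    and is assumed to cost `penalty_cycles` extra cycles to recover.
    """
    errors = sum(1 for d in delays if d > period)
    error_rate = errors / len(delays)
    return period * (1 + error_rate * penalty_cycles)


def best_period(delays, penalty_cycles, candidates):
    """Pick the candidate clock period with the lowest effective time."""
    return min(
        candidates,
        key=lambda t: effective_time_per_op(delays, t, penalty_cycles),
    )


# Hypothetical delay samples (arbitrary units); the distribution is an assumption.
rng = random.Random(0)
delays = [rng.gauss(1.0, 0.1) for _ in range(10_000)]

# Worst-case design: the clock period covers the slowest observed path.
worst_case = max(delays)

# Speculative design: try shorter periods and pay for the resulting errors.
candidates = [worst_case * k / 100 for k in range(50, 100)] + [worst_case]
speculative = best_period(delays, penalty_cycles=5, candidates=candidates)

print("worst-case period:      ", worst_case)
print("speculative period:     ", speculative)
print("effective time per op:  ",
      effective_time_per_op(delays, speculative, penalty_cycles=5))
```

Because the worst-case period is itself among the candidates and incurs no errors, the selected period can never do worse than worst-case design in this model. How much it gains depends entirely on the assumed delay spread and recovery penalty.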

