Error Correction Code protected Data Processing Units

Abstract

The significant uncertainty associated with the fabrication and operation of current nanodevices calls for a change of circuit design paradigm, one that actively embraces the inherent unreliability of nanodevices to generate overall circuit architectures able to perform reliable computation. While viable solutions exist for data storage units, Data Processing Units (DPUs) are not amenable to a similar line of reasoning. The typical approach to fault-tolerant DPUs relies on modular redundancy (e.g., spatial, temporal), which, while effective from an error tolerance perspective, generally incurs high area and/or performance penalties. This paper proposes a generic methodology to obtain reliable DPU implementations built with unreliable components by intimately intertwining Error Correcting Code (ECC) codecs with the DPU functionality. The ECC protected DPU architecture is derived cluster-wise under area and reliability constraints, by exploiting dependence relations (logical and w.r.t. shared area) between internal signals pertaining to the DPU and the ECC codec. To evaluate the error rate and performance implications, a multitude of test corners were considered (e.g., gate criticality, ECC type and structure, faulty and low complexity decoder, time-space redundancy) for an ECC protected 6-bit adder architecture. Simulation results reveal that the ECC embedding approach can be effective from both the error rate and the area perspectives, for the Pareto designs, with performance figures of merit situated between the curves of consecutive modular redundancy based designs. The proposed approach is generic from the coding point of view, scalable, and enables fine-grained control of the desired DPU reliability degree and area overhead.
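
For orientation, the modular redundancy baseline against which the abstract compares can be sketched as follows. This is a minimal illustration only, not the paper's simulation setup: it assumes a simplified fault model in which each output bit of a 6-bit adder flips independently with a fixed probability, and an ideal (fault-free) majority voter. The names `noisy_add`, `majority3`, `tmr_add`, and `error_rate` are hypothetical, and the ECC embedding methodology itself is not reproduced here.

```python
import random

WIDTH = 6  # operand width; the abstract's case study is a 6-bit adder


def noisy_add(a, b, flip_prob, rng):
    """Add two WIDTH-bit operands; each of the WIDTH + 1 output bits
    flips independently with probability flip_prob (assumed fault model)."""
    result = a + b  # fits in WIDTH + 1 bits for WIDTH-bit operands
    for bit in range(WIDTH + 1):
        if rng.random() < flip_prob:
            result ^= 1 << bit
    return result


def majority3(x, y, z):
    """Bitwise majority vote over three words (voter assumed fault-free)."""
    return (x & y) | (x & z) | (y & z)


def tmr_add(a, b, flip_prob, rng):
    """Spatial triple modular redundancy: three noisy adders plus a voter."""
    copies = [noisy_add(a, b, flip_prob, rng) for _ in range(3)]
    return majority3(*copies)


def error_rate(adder, flip_prob, trials=5000, seed=0):
    """Fraction of random additions whose result differs from a + b."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        a = rng.randrange(1 << WIDTH)
        b = rng.randrange(1 << WIDTH)
        if adder(a, b, flip_prob, rng) != a + b:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05):
        print(p, error_rate(noisy_add, p), error_rate(tmr_add, p))
```

Under these assumptions the triplicated adder trades roughly three times the adder area (plus the voter) for a lower output error rate, which is the area versus reliability trade-off that the ECC embedding approach aims to improve on.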