Soft Error Handling for Embedded Systems using Compiler-OS Interaction

A Flexible Software-Based Technique for Soft Error Mitigation in Low-Cost Embedded Systems

2013

Commercial off-the-shelf microprocessors are the core of low-cost embedded systems due to their programmability and cost-effectiveness. Recent advances in electronic technologies have allowed remarkable improvements in their performance. However, these advances have also made microprocessors more susceptible to radiation-induced transient faults. These non-destructive events (soft errors) may cause a microprocessor to produce a wrong computation result or to lose control of a system, with catastrophic consequences. Therefore, soft error mitigation has become a compulsory requirement for a growing number of applications, ranging from space to ground level. In this context, this paper uses the concept of selective hardening, which aims to design flexible, reduced-overhead mitigation techniques. Following this concept, a novel flexible version of the software-based fault recovery technique known as SWIFT-R is proposed. Our approach makes it possible to select different subsets of registers from the microprocessor register file to be protected in software. The design space is thus enriched with a wide spectrum of new partially protected versions, which offer designers more flexibility and make it possible to find the best tradeoffs between performance, code size, and fault coverage. Three case studies have been developed to show the applicability and flexibility of the proposal.

Cost-efficient soft error protection for embedded microprocessors

2006

Device scaling trends dramatically increase the susceptibility of microprocessors to soft errors. Further, mounting demand for embedded microprocessors in a wide array of safety-critical applications, ranging from automobiles to pacemakers, compounds the importance of addressing the soft error problem. Historically, soft error tolerance techniques have been targeted mainly at high-end server markets, leading to solutions such as coarse-grained modular redundancy and redundant multithreading.

Compiler-Directed Soft Error Mitigation for Embedded Systems

IEEE Transactions on Dependable and Secure Computing, 2012

Protecting processor-based systems against the harmful effects of transient faults (soft errors) is gaining importance as technology shrinks. At the same time, for large segments of the embedded market, parameters such as cost and performance remain as important as reliability. This paper presents a compiler-based methodology for facilitating the design of fault-tolerant embedded systems. The methodology is supported by an infrastructure that makes it easy to combine hardware and software soft error mitigation techniques in order to best satisfy both the usual design constraints and dependability requirements. It is based on a generic microprocessor architecture that facilitates the implementation of software-based techniques, providing a uniform, target-independent hardening core that allows the automatic generation of protected source code (hardened code). Two case studies are presented. In the first, several software-based mitigation techniques are implemented and evaluated, showing the flexibility of the infrastructure. In the second, a customized fault-tolerant embedded system is designed by combining selective protection in both hardware and software. Several trade-offs among performance, code size, reliability, and hardware cost are explored. The results show the applicability of the approach. Among the developed software-based mitigation techniques, a novel selective version of the well-known SWIFT-R is presented.

A Compiler-Microarchitecture Hybrid Approach to Soft Error Reduction for Register Files

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2000

For embedded systems, where neither energy nor reliability can be easily sacrificed, this paper presents an energy-efficient soft error protection scheme for register files (RFs). Unlike previous approaches, the proposed method explicitly optimizes for energy efficiency and can exploit the fundamental tradeoff between reliability and energy. While even simple compiler-managed RF protection schemes can be more energy efficient than hardware schemes, this paper formulates and solves further compiler optimization problems that improve the energy efficiency of RF protection by an additional 30% on average, as demonstrated in experiments on a number of embedded application benchmarks.

Analysis of error detection schemes: Toolchain support and hardware/software implications

Proceedings of the 2012 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2012, 2012

Meeting safety requirements typically requires substantial invasive extensions to applications. Even in the absence of faults, the overhead associated with these extensions may unacceptably increase execution time. In this paper we focus on a number of experiments with error detection schemes, using a 3D path planning application for an avionics system as a case study. We analyze how these error detection schemes can be implemented to meet the system's time budget. The experiments allowed us to identify the requirements for automating the application of the error detection schemes within a hardware/software design flow, and to determine how those schemes can be addressed using a novel approach in which safety requirements are described in an aspect- and strategy-oriented programming language named LARA. For our experiments and validation, we consider an FPGA-based embedded system consisting of a general-purpose processor (GPP) coupled to custom computing units, which are used primarily for hardware acceleration and for implementing fault detection schemes.