Performability Analysis of Fault-Tolerant Computer Systems

A unified framework for the performability evaluation of fault-tolerant computer systems

IEEE Transactions on Computers, 1993

In this paper, we consider the problem of evaluating the performability density and distribution of degradable computer systems. A generalized model of performability is considered, wherein the dynamics of configuration modes are modeled as a nonhomogeneous Markov process, and the performance rate in each configuration mode can be time-dependent.

Analysis of fault tolerant computer systems

Microelectronics Reliability, 1987

In many critical applications of digital systems, fault tolerance has been an essential architectural attribute for achieving high reliability. In recent years, the concept of the performability of such systems has drawn the attention of many researchers. In this paper, we develop a general Markov model for fault-tolerant computer systems. Various important performance measures, including the performability measures as well as some new performance measures, are treated in a unified manner. Furthermore, general and efficient computational procedures are developed for calculating these performance measures based on the uniformization technique of Keilson (1974, 1979). A numerical example is given to illustrate the computational procedures developed.
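As an illustration of the uniformization idea mentioned in this abstract, here is a minimal textbook-style sketch in Python. It is not the paper's procedure: the function names, the truncation rule, and the three-state example (two processors, per-processor failure rate `lam`, no repair, reward equal to the number of working processors) are all assumptions made for illustration.

```python
import math

def transient_probabilities(Q, pi0, t, eps=1e-12, max_terms=100_000):
    """State probabilities at time t for a CTMC with generator Q (uniformization).

    Q is a list of rows, pi0 the initial distribution. The Poisson series is
    truncated once the accumulated Poisson weight exceeds 1 - eps.
    """
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))        # uniformization rate
    if rate == 0.0 or t == 0.0:
        return list(pi0)
    # Embedded discrete-time chain P = I + Q / rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    weight = math.exp(-rate * t)                  # Poisson pmf at k = 0
    if weight == 0.0:
        raise ValueError("rate * t is too large for this naive sketch")
    v = list(pi0)                                 # holds pi0 * P^k
    result = [weight * x for x in v]
    covered = weight
    for k in range(1, max_terms + 1):
        if 1.0 - covered <= eps:
            break
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k                    # Poisson pmf at k
        covered += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result

def expected_reward_rate(pi, rewards):
    """Expected instantaneous reward for state probabilities pi."""
    return sum(p * r for p, r in zip(pi, rewards))

if __name__ == "__main__":
    lam = 0.01                                    # hypothetical failure rate
    Q = [[-2 * lam, 2 * lam, 0.0],                # both processors up
         [0.0, -lam, lam],                        # one processor up
         [0.0, 0.0, 0.0]]                         # system failed (absorbing)
    pi = transient_probabilities(Q, [1.0, 0.0, 0.0], t=100.0)
    print(pi, expected_reward_rate(pi, [2.0, 1.0, 0.0]))
```

For very large values of `rate * t` the leading Poisson term underflows, which is why the sketch raises an error instead of looping; production implementations typically use a more careful computation of the Poisson weights.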

Performability analysis using semi-Markov reward processes

IEEE Transactions on Computers, 1990

With the increasing complexity of multiprocessor and distributed processing systems, the need to develop efficient and accurate modeling methods is evident. Fault tolerance and degradable performance of such systems have given rise to considerable interest in models for the combined evaluation of performance and reliability [1], [2]. Markov or semi-Markov reward models can be used to evaluate the effectiveness of degradable fault-tolerant systems. Beaudry [1] proposed a simple method for computing the distribution of performability in a Markov reward process. We present two extensions of Beaudry's approach. First, we generalize the method to a semi-Markov reward process. Second, we remove the restriction requiring the association of zero reward to absorbing states only. We illustrate the use of the approach with three interesting applications.
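The classical idea behind a Beaudry-style computation for a Markov reward process can be sketched as a rate-scaling step: dividing the outgoing rates of each transient state by its reward rate gives a chain whose time to absorption has the same distribution as the reward accumulated until absorption in the original chain. The sketch below shows only that step, under the classical restriction that every transient state has a positive reward (the restriction the paper relaxes). The function name and the example model are assumptions, not taken from the paper.

```python
def scale_rates_by_reward(Q, rewards):
    """Divide each transient state's transition rates by its reward rate.

    Rows whose rates are all zero (absorbing states) are copied unchanged.
    """
    scaled = []
    for row, r in zip(Q, rewards):
        if all(q == 0.0 for q in row):
            scaled.append(list(row))              # absorbing state
        elif r <= 0.0:
            raise ValueError("this sketch needs positive rewards in transient states")
        else:
            scaled.append([q / r for q in row])   # sojourn time becomes reward
    return scaled

if __name__ == "__main__":
    lam = 0.01                                    # hypothetical failure rate
    Q = [[-2 * lam, 2 * lam, 0.0],                # two processors up, reward 2
         [0.0, -lam, lam],                        # one processor up, reward 1
         [0.0, 0.0, 0.0]]                         # failed, reward 0
    print(scale_rates_by_reward(Q, [2.0, 1.0, 0.0]))
```

In this hypothetical example both transient states end up with exit rate `lam`, so the work completed before system failure follows an Erlang distribution with two stages of rate `lam`.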

Performability measure for acyclic Markovian models

Computers & Mathematics with Applications, 1998

Continuous-time Markov processes with a finite state space are generally used to model degradable fault-tolerant computer systems. The finite state space is partitioned as $\bigcup_{i=1}^{m} B_i$, where $B_i$ stands for the set of states corresponding to the configuration in which the system has a performance level (or reward rate) equal to $r_i$. The performability $Y_t$ is defined as the accumulated reward over a mission time $[0, t]$. In this paper, a renewal equation is established for the performability measure and solved for both "standard" and uniform acyclic models. Two closed-form expressions for the performability measure are derived for the two types of models. Furthermore, an algorithm with a low polynomial computational complexity is presented and applied to a degradable computer system.
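Written out, the definition of performability in this abstract reads as follows. The symbol $X_s$ for the state of the Markov process at time $s$ is an assumption introduced here; $B_1, \dots, B_m$ are the sets of states sharing reward rates $r_1, \dots, r_m$. The second identity, for the expected accumulated reward, follows by exchanging expectation and integration.

```latex
% X_s: state of the Markov process at time s (symbol assumed for illustration)
Y_t = \int_0^t \sum_{i=1}^{m} r_i \,\mathbf{1}\{X_s \in B_i\}\, ds,
\qquad
\mathbb{E}[Y_t] = \sum_{i=1}^{m} r_i \int_0^t \Pr(X_s \in B_i)\, ds .
```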

A Unified Approach to Reliability, Availability, Performability Analysis Based on Markov Processes with Rewards

Advances in Systems Science and Applications, 2018

This paper discusses a unified approach to reliability, availability and performability analysis of complex engineering systems. The theoretical basis of this approach is continuous-time, discrete-state Markov processes with rewards. From a reliability modeling point of view, complex systems are systems with static and dynamic redundancy, imperfect fault coverage, various recovery strategies, multilevel operation and varying severity of failure states. We propose a unified method of calculating the reliability, availability and performability indices based on the definition of special forms of the reward matrix. This method proved to be effective in calculating both cumulative and instantaneous measures in steady-state and transient cases. We describe special analytical software which implements the suggested method. We demonstrate the flexibility of the proposed method and software by analyzing a multilevel process unit with protection and a demand-based warm standby system.
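To illustrate how one Markov model can yield different indices just by changing the rewards, here is a minimal Python sketch. It is not the paper's method or software: it uses a simple reward vector on a hypothetical birth-death model (two units, one repair crew, states indexed by the number of failed units), and all names and rates are assumptions.

```python
def birth_death_steady_state(up_rates, down_rates):
    """Steady-state distribution of a birth-death chain.

    up_rates[k] is the rate from state k to k+1, down_rates[k] the rate
    from state k+1 back to k (product-form solution).
    """
    weights = [1.0]
    for b, d in zip(up_rates, down_rates):
        weights.append(weights[-1] * b / d)
    total = sum(weights)
    return [w / total for w in weights]

def expected_reward(pi, rewards):
    """Steady-state expected reward for a given reward vector."""
    return sum(p * r for p, r in zip(pi, rewards))

if __name__ == "__main__":
    lam, mu = 1.0, 10.0                           # hypothetical failure / repair rates
    pi = birth_death_steady_state([2 * lam, lam], [mu, mu])
    availability = expected_reward(pi, [1.0, 1.0, 0.0])   # 1 when the system is up
    capacity = expected_reward(pi, [2.0, 1.0, 0.0])       # number of working units
    print(availability, capacity)
```

The same distribution `pi` gives availability with a 0/1 reward vector and expected capacity with a multilevel reward vector, which is the sense in which the choice of rewards selects the index being computed.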

Investigating fault tolerant computing systems reliability

2008 IEEE International Symposium on Parallel and Distributed Processing, 2008

Nowadays, computers and networks are at the heart of many modern technologies. Computing systems are widely used in many application areas and are expected to carry out complex, safety-critical missions. As a consequence, greater attention is paid to the performance and dependability evaluation of computing systems. This leads to the specification of precise techniques and models that consider and evaluate aspects that were previously (consciously or unconsciously) approximated or ignored altogether. On the other hand, the increasing importance of such systems translates into tighter and tighter constraints, requirements and/or policies (QoS, fault tolerance, maintenance, redundancy, etc.), according to the systems' criticality. The evaluation must therefore take such dynamic behaviors into careful account, identifying and quantifying dependencies among devices. In this paper we address the problem of identifying and evaluating the most common dynamic behaviors and dependencies affecting fault-tolerant computing systems. We propose some models to represent such aspects in terms of reliability/availability, based on dynamic reliability block diagrams (DRBD), a new formalism we developed, derived from RBD. In this way we aim to provide guidelines for adequately evaluating the reliability/availability of fault-tolerant computing systems.

Reliability estimation of fault-tolerant systems: tools and techniques

Computer, 2000

A power has focused attention on tools and techniques we might use to accurately estimate the reliability of a proposed computing system on the basis of models derived from the design of that system. Reliability modeling of fault-tolerant computing systems has become an integral part of the system design process, especially for those systems with life-critical applications such as aircraft and spacecraft flight control.