Ann Tai - Academia.edu

Papers by Ann Tai

Toward Accessibility Enhancement of Dependability Modeling Techniques and Tools

A user-friendly dependability evaluation tool

Proceedings of the IEEE 1996 National Aerospace and Electronics Conference NAECON 1996

... Herbert Hecht Ann T. Tai Kishor S. Trivedi Andrew J. Chruscicki ... SDDS then translates the high-level specification into a representation that can be automatically solved by the underlying modeling engines SHARPE or SPNP (both from Trivedi, a Professor at Duke University). ...

On the development of dependability-evaluation workbench for high-assurance system designers

Proceedings 1997 High-Assurance Engineering Workshop

Long-Life Deep-Space Applications

IEEE Computer, 1998

Deductive Glue Code Synthesis for Embedded Software Systems Based on Code Patterns

Ninth IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC'06)

Viable Techniques for Model Construction and Solution

The Kluwer International Series in Engineering and Computer Science, 1996

In keeping with the modeling framework described in the previous chapter, let us now consider some specific modeling techniques that are suited to the purpose of software performability evaluation. This is not to say that a particular method, if used exclusively, will suffice; what will generally be required is some appropriate combination of techniques for model specification, construction, and solution. Also, although certain of these methods apply to simulation models as well as analytic models, we choose to confine our attention to the latter. Specifically, the four types of models we consider in this chapter are 1) series-parallel graphs, 2) Markov chains, 3) queueing models, and 4) stochastic Petri nets (SPNs) and stochastic activity networks (SANs).

Case Study III: Performability Management in Distributed Database Systems

The Kluwer International Series in Engineering and Computer Science, 1996

Among the applications of adaptive methods, those for distributed systems present greater challenges. In particular, capturing system conditions in the context of distributed computing for adaptive decision making is very difficult and costly for at least two reasons. First, obtaining information about remote sites requires additional message passing such that the benefits from adaptive mechanisms could be negated by a performance penalty. Secondly, there is always a latency between the time when the information about remote sites is collected and the time when an adaptive decision is made, which may make the adaptation ineffective since the information (based on which the adaptive decision is made) could become obsolete through the latency.

Case Study II: Performability-Management Oriented Adaptive Fault Tolerance

The Kluwer International Series in Engineering and Computer Science, 1996

In the presence of faults, timing anomalies, workload fluctuations, synchronization conflicts, etc., a computer or communication system is required to react adaptively to these changes in order to make the service dependable with respect to what the user demands. Accordingly, during the past several years, increased attention has been given to the notion of dynamic optimization with respect to either fault tolerance or performance. Kim and Lawrence proposed that the choice of operational strategies, such as redundant computing resource allocation, should be made adaptive to the operating modes of a fault-tolerant system [109]. Liu et al. introduced the imprecise computation techniques which reduce the adverse effects of real-time constraint violation by providing the user with an approximate result of acceptable quality [110]. More recently, de Meer and Mauser proposed a performability modeling approach for dynamically reconfigurable systems based on extended Markov reward models [111]. Bondavalli, Stankovic and Strigini developed a framework and a specification notation called FERT, for real-time adaptive, software implemented fault tolerance [112].

Case Study I: Comparative Studies of Fault-Tolerant Software

The Kluwer International Series in Engineering and Computer Science, 1996

The types of computational redundancy employed by fault-tolerant software typically result in performance penalties, particularly with regard to computation delays. These, in turn, may have an adverse effect on system dependability, e.g., in real-time applications, an increased probability of failing to meet a deadline. More generally, such interactions and tradeoffs between performance and dependability affect the user-perceived benefits of a particular fault tolerance scheme. Hence, consideration of combined performance-dependability, via the concept of performability, appears to be a promising basis for assessing and improving the effectiveness of fault-tolerant software.

General Concepts and Applications of Performability Modeling

The Kluwer International Series in Engineering and Computer Science, 1996

Generally, an evaluation of performability (relative to a designated measure or set of measures) can be either model-based or conducted experimentally via measurements of an actual system. The review that follows presumes the former, permitting us to be more precise in the statements of key concepts. Except for small changes in some of the notation and terminology, it summarizes the original framework described in [5, 6].

On-board guarded software upgrading for space missions

Gateway to the New Millennium. 18th Digital Avionics Systems Conference. Proceedings (Cat. No.99CH37033)

Key applications for high-assurance systems

IPDS Reviewers

Saurabh Bagchi, Paul Barford, Khalid Begain, Andrea Bobbio, Peter Buchholz, Savio Chau, Ram Chillarege, Giovanni Chiola, Gwan Choi, Gianfranco Ciardo, Michel Cukier, Susanna Donatelli, Joanne Dugan, Ricardo Fricks, Robert Geist, Reinhard German, Pedro Gil, Swapna Gokhale, Katerina Goseva-Popstojanova, Stefan Greiner, Gunter Haring, Rick Harper, Boudewijn Haverkort, Armin Heindl, Ravishankar Iyer, Lizy John, Mohamed Kaaniche, Karama Kanoun, Johan Karlson, Kimberly Keeton, Peter Kemper, P Krishnan ...

Model-Based Resilience Assessment Framework for Autonomous Systems

Volume 13: Safety Engineering, Risk, and Reliability Analysis

While automation technologies advance faster than ever, gaps of resilience capabilities between autonomous and human-operated systems have not yet been identified and addressed appropriately. To date, there exists no generic framework for resilience assessment that is applicable to a broad spectrum of domains or able to take into account the impacts on mission-scenario-level resilience from system-specific attributes. In the proposed framework, resilience is meant to describe the ability of a system, in an open range of adverse scenarios, to maintain normal operating conditions or to recover from degraded or failed states in order to provide anticipated functions or services to achieve mission success. The term resilience is introduced in relation with classical terms such as fault, error, failure, fault-tolerance, reliability, and risk. The proposed model-based resilience assessment framework is based on a resilience ontology that enables the use of system models into reliability a...

Cluster-Based Failure Detection Service for Large-Scale Ad Hoc Wireless Network Applications

Proceedings of the 2004 International Conference on Dependable Systems and Networks, 2004

Highly Survivable Avionics Systems for Long-Term Deep Space Exploration

Forum on Innovative Approaches to Outer Planetary Exploration 2001-2020, 2001

The design of highly survivable avionics systems for long-term (> 10 years) exploration of space is an essential technology for all current and future missions in the Outer Planets roadmap. Long-term exposure to extreme environmental conditions such as high radiation and low temperatures makes survivability in space a major challenge. Moreover, current and future missions are increasingly using commercial technology such as deep sub-micron (0.25 microns) fabrication processes with specialized circuit designs, commercial interfaces, processors, memory, and other commercial off-the-shelf components that were not designed for long-term survivability in space. Therefore, the design of highly reliable and available systems for the exploration of Europa, Pluto, and other destinations in deep space requires a comprehensive and fresh approach. This paper summarizes work in progress in three different areas: a framework for the design of highly reliable and highly available space avionics systems, distributed reliable computing architecture, and Guarded Software Upgrading (GSU) techniques for software upgrading during long-term missions. Additional information is contained in the original extended abstract.

Fault-tolerant communication channel structures

Fault-tolerant communication channel structures

Performability Modeling of Coordinated Software and Hardware Fault Tolerance Ann T. Tai

Abstract: In this paper, we tackle the problem of performability evaluation for a scheme of coordinated software and hardware fault tolerance we developed earlier [1]. The scheme involves 1) a time-based (TB) checkpointing protocol developed by Neves and Fuchs for tolerating hardware faults, and 2) our message-driven confidence-driven (MDCD) protocol for software error containment and recovery. The two protocols coordinate through their checkpointing activities, guaranteeing that when recovering from a...

Cluster-based failure detection service for large-scale ad hoc wireless network applications

International Conference on Dependable Systems and Networks, 2004, 2004
