Ann Tai - Academia.edu
Papers by Ann Tai
Proceedings of the IEEE 1996 National Aerospace and Electronics Conference NAECON 1996
... Herbert Hecht, Ann T. Tai, Kishor S. Trivedi, Andrew J. Chruscicki ... SDDS then translates the high-level specification into a representation that can be automatically solved by the underlying modeling engines SHARPE or SPNP (both from Trivedi, a professor at Duke University). ...
Proceedings 1997 High-Assurance Engineering Workshop
IEEE Computer, 1998
ABSTRACT
Ninth IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC'06)
The Kluwer International Series in Engineering and Computer Science, 1996
In keeping with the modeling framework described in the previous chapter, let us now consider some specific modeling techniques that are suited to the purpose of software performability evaluation. This is not to say that a particular method, if used exclusively, will suffice; what will generally be required is some appropriate combination of techniques for model specification, construction, and solution. Also, although certain of these methods apply to simulation models as well as analytic models, we choose to confine our attention to the latter. Specifically, the four types of models we consider in this chapter are 1) series-parallel graphs, 2) Markov chains, 3) queueing models, and 4) stochastic Petri nets (SPNs) and stochastic activity networks (SANs).
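The Markov-chain approach named among the four model types above can be illustrated with a minimal Markov-reward sketch: solve for the steady-state distribution of a small repairable system and weight each state by a performance level. All states, rates, and reward values below are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

# Minimal Markov-reward performability sketch for a 3-state repairable system.
# State 0 = fully operational, 1 = degraded, 2 = failed. Rates are assumed.
Q = np.array([
    [-0.02,  0.02,  0.00],   # operational -> degraded at rate 0.02
    [ 0.50, -0.51,  0.01],   # repair at 0.5, further failure at 0.01
    [ 0.00,  1.00, -1.00],   # failed -> degraded (repair) at rate 1.0
])                           # each row of the generator sums to zero

reward = np.array([1.0, 0.6, 0.0])  # assumed performance level per state

# Steady state: solve pi Q = 0 subject to sum(pi) = 1 via least squares.
A = np.vstack([Q.T, np.ones(3)])
b = np.concatenate([np.zeros(3), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Expected steady-state reward rate, a basic performability measure.
steady_reward = pi @ reward
print(round(steady_reward, 4))  # ≈ 0.9842
```

Transient measures (e.g., accumulated reward over a mission interval) follow the same pattern but require solving the Kolmogorov equations rather than the stationary balance equations.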
The Kluwer International Series in Engineering and Computer Science, 1996
Among the applications of adaptive methods, those for distributed systems present greater challenges. In particular, capturing system conditions in the context of distributed computing for adaptive decision making is very difficult and costly for at least two reasons. First, obtaining information about remote sites requires additional message passing, so the benefits of adaptive mechanisms could be negated by a performance penalty. Second, there is always a latency between the time when information about remote sites is collected and the time when an adaptive decision is made, which may render the adaptation ineffective, since the information on which the decision is based could become obsolete during that latency.
The Kluwer International Series in Engineering and Computer Science, 1996
In the presence of faults, timing anomalies, workload fluctuations, synchronization conflicts, etc., a computer or communication system is required to react adaptively to these changes in order to make the service dependable with respect to what the user demands. Accordingly, during the past several years, increased attention has been given to the notion of dynamic optimization with respect to either fault tolerance or performance. Kim and Lawrence proposed that the choice of operational strategies, such as redundant computing resource allocation, should be made adaptive to the operating modes of a fault-tolerant system [109]. Liu et al. introduced imprecise computation techniques, which reduce the adverse effects of real-time constraint violation by providing the user with an approximate result of acceptable quality [110]. More recently, de Meer and Mauser proposed a performability modeling approach for dynamically reconfigurable systems based on extended Markov reward models [111]. Bondavalli, Stankovic, and Strigini developed a framework and a specification notation called FERT for real-time adaptive, software-implemented fault tolerance [112].
The Kluwer International Series in Engineering and Computer Science, 1996
The types of computational redundancy employed by fault-tolerant software typically result in performance penalties, particularly with regard to computation delays. These, in turn, may have an adverse effect on system dependability, e.g., in real-time applications, an increased probability of failing to meet a deadline. More generally, such interactions and tradeoffs between performance and dependability affect the user-perceived benefits of a particular fault tolerance scheme. Hence, consideration of combined performance-dependability, via the concept of performability, appears to be a promising basis for assessing and improving the effectiveness of fault-tolerant software.
The Kluwer International Series in Engineering and Computer Science, 1996
Generally, an evaluation of performability (relative to a designated measure or set of measures) can be either model-based or conducted experimentally via measurements of an actual system. The review that follows presumes the former, permitting us to be more precise in the statements of key concepts. Except for small changes in some of the notation and terminology, it summarizes the original framework described in [5, 6].
Gateway to the New Millennium. 18th Digital Avionics Systems Conference. Proceedings (Cat. No.99CH37033)
IPDS Reviewers: Saurabh Bagchi, Paul Barford, Khalid Begain, Andrea Bobbio, Peter Buchholz, Savio Chau, Ram Chillarege, Giovanni Chiola, Gwan Choi, Gianfranco Ciardo, Michel Cukier, Susanna Donatelli, Joanne Dugan, Ricardo Fricks, Robert Geist, Reinhard German, Pedro Gil, Swapna Gokhale, Katerina Goseva-Popstojanova, Stefan Greiner, Gunter Haring, Rick Harper, Boudewijn Haverkort, Armin Heindl, Ravishankar Iyer, Lizy John, Mohamed Kaaniche, Karama Kanoun, Johan Karlson, Kimberly Keeton, Peter Kemper, P. Krishnan ...
Volume 13: Safety Engineering, Risk, and Reliability Analysis
While automation technologies advance faster than ever, gaps in resilience capabilities between autonomous and human-operated systems have not yet been identified and addressed appropriately. To date, there exists no generic framework for resilience assessment that is applicable to a broad spectrum of domains or able to take into account the impacts on mission-scenario-level resilience from system-specific attributes. In the proposed framework, resilience is meant to describe the ability of a system, in an open range of adverse scenarios, to maintain normal operating conditions or to recover from degraded or failed states in order to provide anticipated functions or services to achieve mission success. The term resilience is introduced in relation with classical terms such as fault, error, failure, fault tolerance, reliability, and risk. The proposed model-based resilience assessment framework is based on a resilience ontology that enables the use of system models in reliability a...
Proceedings of the 2004 International Conference on Dependable Systems and Networks, 2004
Forum on Innovative Approaches to Outer Planetary Exploration 2001 2020, 2001
ABSTRACT The design of highly survivable avionics systems for long-term (> 10 years) exploration of space is an essential technology for all current and future missions in the Outer Planets roadmap. Long-term exposure to extreme environmental conditions such as high radiation and low temperatures makes survivability in space a major challenge. Moreover, current and future missions increasingly use commercial technology such as deep sub-micron (0.25 micron) fabrication processes with specialized circuit designs, commercial interfaces, processors, memory, and other commercial off-the-shelf components that were not designed for long-term survivability in space. Therefore, the design of highly reliable and available systems for the exploration of Europa, Pluto, and other destinations in deep space requires a comprehensive and fresh approach to this problem. This paper summarizes work in progress in three different areas: a framework for the design of highly reliable and highly available space avionics systems, a distributed reliable computing architecture, and Guarded Software Upgrading (GSU) techniques for software upgrading during long-term missions. Additional information is contained in the original extended abstract.
Abstract: In this paper, we tackle the problem of performability evaluation for a scheme of coordinated software and hardware fault tolerance we developed earlier [1]. The scheme involves 1) a time-based (TB) checkpointing protocol developed by Neves and Fuchs for tolerating hardware faults, and 2) our message-driven confidence-driven (MDCD) protocol for software error containment and recovery. The two protocols coordinate through their checkpointing activities, guaranteeing that when recovering from a...
International Conference on Dependable Systems and Networks, 2004, 2004