Ann Tai - Academia.edu
Papers by Ann Tai
Proceedings of the IEEE 1996 National Aerospace and Electronics Conference NAECON 1996
... Herbert Hecht, Ann T. Tai, Kishor S. Trivedi, Andrew J. Chruscicki ... SDDS then translates the high-level specification into a representation that can be automatically solved by the underlying modeling engines SHARPE or SPNP (both from Trivedi, a professor at Duke University). ...
Proceedings 1997 High-Assurance Engineering Workshop
IEEE Computer, 1998
ABSTRACT
Ninth IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC'06)
The Kluwer International Series in Engineering and Computer Science, 1996
In keeping with the modeling framework described in the previous chapter, let us now consider some specific modeling techniques that are suited to the purpose of software performability evaluation. This is not to say that a particular method, if used exclusively, will suffice; what will generally be required is some appropriate combination of techniques for model specification, construction, and solution. Also, although certain of these methods apply to simulation models as well as analytic models, we choose to confine our attention to the latter. Specifically, the four types of models we consider in this chapter are 1) series-parallel graphs, 2) Markov chains, 3) queueing models, and 4) stochastic Petri nets (SPNs) and stochastic activity networks (SANs).
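The Markov-chain approach named among the four model types above can be illustrated with a minimal Markov-reward sketch: solve for the steady-state distribution of a small repairable system and weight each state by a performance level. All states, rates, and reward values below are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

# Minimal Markov-reward performability sketch for a 3-state repairable system.
# State 0 = fully operational, 1 = degraded, 2 = failed. Rates are assumed.
Q = np.array([
    [-0.02,  0.02,  0.00],   # operational -> degraded at rate 0.02
    [ 0.50, -0.51,  0.01],   # repair at 0.5, further failure at 0.01
    [ 0.00,  1.00, -1.00],   # failed -> degraded (repair) at rate 1.0
])                           # each row of the generator sums to zero

reward = np.array([1.0, 0.6, 0.0])  # assumed performance level per state

# Steady state: solve pi Q = 0 subject to sum(pi) = 1 via least squares.
A = np.vstack([Q.T, np.ones(3)])
b = np.concatenate([np.zeros(3), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Expected steady-state reward rate, a basic performability measure.
steady_reward = pi @ reward
print(round(steady_reward, 4))  # ≈ 0.9842
```

Transient measures (e.g., accumulated reward over a mission interval) follow the same pattern but require solving the Kolmogorov equations rather than the stationary balance equations.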
The Kluwer International Series in Engineering and Computer Science, 1996
Among the applications of adaptive methods, those for distributed systems present greater challenges. In particular, capturing system conditions in the context of distributed computing for adaptive decision making is very difficult and costly for at least two reasons. First, obtaining information about remote sites requires additional message passing, so the benefits of adaptive mechanisms could be negated by a performance penalty. Second, there is always a latency between the time when information about remote sites is collected and the time when an adaptive decision is made, which may render the adaptation ineffective, since the information on which the decision is based could become obsolete during that latency.
The Kluwer International Series in Engineering and Computer Science, 1996
In the presence of faults, timing anomalies, workload fluctuations, synchronization conflicts, etc., a computer or communication system is required to react adaptively to these changes in order to make the service dependable with respect to what the user demands. Accordingly, during the past several years, increased attention has been given to the notion of dynamic optimization with respect to either fault tolerance or performance. Kim and Lawrence proposed that the choice of operational strategies, such as redundant computing resource allocation, should be made adaptive to the operating modes of a fault-tolerant system [109]. Liu et al. introduced imprecise computation techniques, which reduce the adverse effects of real-time constraint violation by providing the user with an approximate result of acceptable quality [110]. More recently, de Meer and Mauser proposed a performability modeling approach for dynamically reconfigurable systems based on extended Markov reward models [111]. Bondavalli, Stankovic, and Strigini developed a framework and a specification notation called FERT for real-time adaptive, software-implemented fault tolerance [112].
The Kluwer International Series in Engineering and Computer Science, 1996
The types of computational redundancy employed by fault-tolerant software typically result in performance penalties, particularly with regard to computation delays. These, in turn, may have an adverse effect on system dependability, e.g., in real-time applications, an increased probability of failing to meet a deadline. More generally, such interactions and tradeoffs between performance and dependability affect the user-perceived benefits of a particular fault tolerance scheme. Hence, consideration of combined performance-dependability, via the concept of performability, appears to be a promising basis for assessing and improving the effectiveness of fault-tolerant software.
The Kluwer International Series in Engineering and Computer Science, 1996
Generally, an evaluation of performability (relative to a designated measure or set of measures) can be either model-based or conducted experimentally via measurements of an actual system. The review that follows presumes the former, permitting us to be more precise in the statements of key concepts. Except for small changes in some of the notation and terminology, it summarizes the original framework described in [5, 6].
Gateway to the New Millennium. 18th Digital Avionics Systems Conference. Proceedings (Cat. No.99CH37033)
IPDS Reviewers: Saurabh Bagchi, Paul Barford, Khalid Begain, Andrea Bobbio, Peter Buchholz, Savio Chau, Ram Chillarege, Giovanni Chiola, Gwan Choi, Gianfranco Ciardo, Michel Cukier, Susanna Donatelli, Joanne Dugan, Ricardo Fricks, Robert Geist, Reinhard German, Pedro Gil, Swapna Gokhale, Katerina Goseva-Popstojanova, Stefan Greiner, Gunter Haring, Rick Harper, Boudewijn Haverkort, Armin Heindl, Ravishankar Iyer, Lizy John, Mohamed Kaaniche, Karama Kanoun, Johan Karlson, Kimberly Keeton, Peter Kemper, P. Krishnan ...
Volume 13: Safety Engineering, Risk, and Reliability Analysis
While automation technologies advance faster than ever, gaps in resilience capabilities between autonomous and human-operated systems have not yet been identified and addressed appropriately. To date, there exists no generic framework for resilience assessment that is applicable to a broad spectrum of domains or able to take into account the impacts on mission-scenario-level resilience from system-specific attributes. In the proposed framework, resilience is meant to describe the ability of a system, in an open range of adverse scenarios, to maintain normal operating conditions or to recover from degraded or failed states in order to provide anticipated functions or services to achieve mission success. The term resilience is introduced in relation with classical terms such as fault, error, failure, fault tolerance, reliability, and risk. The proposed model-based resilience assessment framework is based on a resilience ontology that enables the use of system models in reliability a...
Proceedings of the 2004 International Conference on Dependable Systems and Networks, 2004
Forum on Innovative Approaches to Outer Planetary Exploration 2001 2020, 2001
ABSTRACT The design of highly survivable avionics systems for long-term (> 10 years) exploration of space is an essential technology for all current and future missions in the Outer Planets roadmap. Long-term exposure to extreme environmental conditions such as high radiation and low temperatures makes survivability in space a major challenge. Moreover, current and future missions increasingly use commercial technology such as deep sub-micron (0.25 micron) fabrication processes with specialized circuit designs, commercial interfaces, processors, memory, and other commercial off-the-shelf components that were not designed for long-term survivability in space. Therefore, the design of highly reliable and available systems for the exploration of Europa, Pluto, and other destinations in deep space requires a comprehensive and fresh approach to this problem. This paper summarizes work in progress in three different areas: a framework for the design of highly reliable and highly available space avionics systems, a distributed reliable computing architecture, and Guarded Software Upgrading (GSU) techniques for software upgrading during long-term missions. Additional information is contained in the original extended abstract.
Abstract: In this paper, we tackle the problem of performability evaluation for a scheme of coordinated software and hardware fault tolerance we developed earlier [1]. The scheme involves 1) a time-based (TB) checkpointing protocol developed by Neves and Fuchs for tolerating hardware faults, and 2) our message-driven confidence-driven (MDCD) protocol for software error containment and recovery. The two protocols coordinate through their checkpointing activities, guaranteeing that when recovering from a...
International Conference on Dependable Systems and Networks, 2004, 2004