Performance and Dependability Validation of Highly Parallel Fault-Tolerant Systems

Fault-Aware Runtime Strategies for High-Performance Computing

Zhiling Lan

IEEE Transactions on Parallel and Distributed Systems, 2009

Self-Healing Dilemmas in Distributed Systems: Fault Correction vs. Fault Tolerance

Jovan Nikolic

IEEE Transactions on Network and Service Management, 2021

Chameleon: a software infrastructure for adaptive fault tolerance

Saurabh Bagchi

IEEE Transactions on Parallel and Distributed Systems, 1999

Design and assessment of high performance fault-tolerant digital systems

Carl Elks

8th Computing in Aerospace Conference, 1991

Probabilistic diagnosis of performance faults in large-scale parallel applications

Saurabh Bagchi

Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT '12), 2012

Performance-reliability tradeoff analysis for multithreaded applications

Oguz Tosun

2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012

Understanding Variations for Better Adjusting Parallel Supplemental Redundant Executions to Tolerate Timing Faults

Yukihiro Sasagawa

IEICE Transactions on Information and Systems, 2014

Verifying Safety of Fault-Tolerant Distributed Components

Ludovic Henrio

Lecture Notes in Computer Science, 2012

Queueing Analysis of Fault-Tolerant Computer Systems

Victor F Nicola

IEEE Transactions on Software Engineering, 1987

Breaking the Limits of Redundancy Systems Analysis

Andrey Morozov

Proceedings of the 29th European Safety and Reliability Conference (ESREL)

Development of massively parallel applications

Thomas Trappenberg

Computer Physics Communications, 1994

Risk-Sensitive Control for the Parallel Server Model

Anindya Goswami

SIAM Journal on Control and Optimization, 2013

A framework for dependability engineering of critical computing systems

Mohamed Kaaniche

Safety Science, 2002

High Performance Dependable Multiprocessor II

Grzegorz Cieslewski

2007 IEEE Aerospace Conference, 2007

Control of cascading failures using protective measures

Mozhgan Khanjanianpak

Scientific Reports, 2024

Availability and performance aspects for mainframe consolidated servers

Jayant Shekhar

2016

Enhancing Dependability Through Flexible Adaptation to Changing Requirements

Luis Fernando Carrillo Andrade

Architecting Dependable Systems II, 2004

Concurrent error detection using watchdog processors: a survey

Aamer Mahmood

IEEE Transactions on Computers, 1988

Performance evaluation of fault tolerance techniques in grid computing system

Fiaz Khan

Computers & Electrical Engineering, 2010

Automatic verification of the Inter-consistency fault tolerance mechanism

Alessandro Fantechi

FaulTM-multi: Fault Tolerance for Multithreaded Applications Running on Transactional Memory Hardware

Gulay Yalcin

A framework for fault tolerance in distributed real time systems

Sheheryar Malik

IEEE International Conference on Emerging Technologies (ICET 2005), 2005

Formal Techniques for Synchronized Fault-Tolerant Systems

Ricky Butler

Dependable Computing and Fault-Tolerant Systems, 1993

An exception handling software architecture for developing fault-tolerant software

Alessandro Garcia

Proceedings. Fifth IEEE International Symposium on High Assurance Systems Engineering (HASE 2000), 2000

Numerical Evaluation of Performability and Job Completion Time in Repairable Fault-Tolerant Systems

Victor F Nicola

1990
