Practical Byzantine Fault Tolerance

Byzantine fault tolerance can be fast

Proceedings International Conference on Dependable Systems and Networks

Byzantine fault tolerance is important because it can be used to implement highly available systems that tolerate arbitrary behavior from faulty components. This paper presents a detailed performance evaluation of BFT, a state-machine replication algorithm that tolerates Byzantine faults in asynchronous systems. Our results contradict the common belief that Byzantine fault tolerance is too slow to be used in practice: BFT performs well enough to implement real systems. We implemented a replicated NFS file system using BFT that performs between 2% faster and 24% slower than production implementations of the NFS protocol that are not fault-tolerant.

Practical byzantine fault tolerance and proactive recovery

ACM Transactions on Computer Systems, 2002

Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions, and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs between 2% faster and 24% slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the BFT library can be used to build practical systems that tolerate Byzantine faults.
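The abstract credits much of BFT's performance to authenticating messages with symmetric cryptography instead of public-key signatures. A minimal Python sketch of that idea, assuming pairwise shared keys and HMAC-SHA256: the sender attaches a vector of MACs, one per receiving replica (often called an authenticator). The hash choice, function names, and message format below are illustrative assumptions, not the BFT library's interface.

```python
import hashlib
import hmac
import secrets


def make_session_keys(n: int) -> dict[tuple[int, int], bytes]:
    """Random pairwise keys: keys[(i, j)] is shared only by sender i and receiver j.

    Key distribution and key refresh are out of scope for this sketch.
    """
    return {(i, j): secrets.token_bytes(32)
            for i in range(n) for j in range(n) if i != j}


def make_authenticator(sender: int, message: bytes, n: int,
                       keys: dict[tuple[int, int], bytes]) -> dict[int, bytes]:
    """Attach one HMAC-SHA256 tag per receiving replica instead of one signature."""
    return {j: hmac.new(keys[(sender, j)], message, hashlib.sha256).digest()
            for j in range(n) if j != sender}


def verify_entry(receiver: int, sender: int, message: bytes,
                 authenticator: dict[int, bytes],
                 keys: dict[tuple[int, int], bytes]) -> bool:
    """A receiver checks only its own entry, using the key it shares with the sender."""
    tag = authenticator.get(receiver)
    if tag is None:
        return False
    expected = hmac.new(keys[(sender, receiver)], message, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(tag, expected)
```

One consequence of this design: unlike a signature, a MAC tag convinces only the replica that shares the key, so a receiver cannot forward it to third parties as proof of what the sender said.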

Efficient Byzantine fault tolerance

2011

We present two asynchronous Byzantine fault-tolerant state machine replication (BFT) algorithms that improve on previous algorithms in several metrics. First, they require only 2f+1 replicas, instead of the usual 3f+1. Second, the trusted service on which this reduction in replicas is based is quite simple, making a verified implementation straightforward (and even feasible using commercial trusted hardware).
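The replica counts above can be checked with a little quorum arithmetic. A minimal sketch, assuming quorums of 2f+1 out of 3f+1 replicas in the classic setting and f+1 out of 2f+1 when a trusted service is present (the second quorum size is an assumption about how such protocols are typically sized; the abstract gives only the replica counts). Any two quorums of size q drawn from n replicas share at least 2q - n members: f+1 in the classic case, so at least one correct replica, but only 1 in the trusted case, which is why the trusted service must stop a faulty replica from sending conflicting messages.

```python
def replicas_and_quorum(f: int, trusted_component: bool = False) -> tuple[int, int]:
    """Return (n, q): replicas needed to tolerate f Byzantine faults, and quorum size."""
    if f < 0:
        raise ValueError("f must be non-negative")
    if trusted_component:
        # Assumed sizing when a trusted service prevents equivocation.
        return 2 * f + 1, f + 1
    # Classic bound: fewer than 1/3 of the replicas may be faulty.
    return 3 * f + 1, 2 * f + 1


def max_faults(n: int, trusted_component: bool = False) -> int:
    """Largest f that n replicas can tolerate under each model."""
    return (n - 1) // 2 if trusted_component else (n - 1) // 3


def min_overlap(n: int, q: int) -> int:
    """Any two quorums of size q drawn from n replicas share at least 2q - n members."""
    return 2 * q - n
```

For example, tolerating one fault takes 4 replicas and quorums of 3 in the classic setting, versus 3 replicas and quorums of 2 with the trusted service.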

Steward: Scaling Byzantine Fault-Tolerant Replication to Wide Area Networks

IEEE Transactions on Dependable and Secure Computing, 2010

This paper presents the first hierarchical Byzantine fault-tolerant replication architecture suitable for systems that span multiple wide area sites. The architecture confines the effects of any malicious replica to its local site, reduces the message complexity of wide area communication, and allows read-only queries to be performed locally within a site at the price of additional standard hardware. We present proofs that our algorithm provides safety and liveness properties. A prototype implementation is evaluated over several network topologies and is compared with a flat Byzantine fault-tolerant approach. The experimental results show considerable improvement over flat Byzantine replication algorithms, bringing the performance of Byzantine replication closer to existing benign fault-tolerant replication techniques over wide area networks.
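To see why a hierarchy can reduce wide area message complexity, here is a deliberately simplified count for one all-to-all exchange round. It assumes replicas are spread evenly across sites and that, in the hierarchical case, each site behaves as a single logical participant that sends one message to every other site. Both functions are hypothetical illustrations; they ignore the local agreement each site must run and every other detail of the Steward protocol.

```python
def flat_wan_messages(sites: int, per_site: int) -> int:
    """Cross-site messages in one all-to-all round when every replica talks to every other."""
    n = sites * per_site
    # Each of the n replicas sends to all replicas outside its own site.
    return n * (n - per_site)


def hierarchical_wan_messages(sites: int) -> int:
    """Cross-site messages when each site acts as one logical participant:
    one message per ordered pair of distinct sites."""
    return sites * (sites - 1)
```

With 5 sites of 4 replicas each (enough for one fault per site under the 3f+1 bound), the flat round sends 320 wide area messages, while the hierarchical round sends 20.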

Byzantine Fault Tolerance as a Service

Communications in Computer and Information Science, 2012

In this paper, we argue for the need for, and the benefits of, providing Byzantine fault tolerance as a service to mission-critical Web applications. In this new approach to Byzantine fault tolerance, an application server can partition incoming requests into different domains for concurrent processing and, based on its application semantics, decide which messages must be totally ordered and which need not be ordered at all. This flexibility would reduce the end-to-end latency experienced by clients and significantly increase system throughput. Perhaps most importantly, we propose a middleware framework that provides a uniform interface to applications so that they are not tightly coupled to any particular Byzantine fault tolerance algorithm implementation.
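The request partitioning described above might look roughly like the following front end. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, and a per-domain sequence counter stands in for what would really be a separate Byzantine agreement instance per domain, so no actual BFT protocol runs here. Requests that the application marks as not needing total order skip sequencing entirely.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Request:
    domain: str              # hypothetical partition key chosen by the application
    op: str
    needs_total_order: bool  # decided from application semantics


@dataclass
class RequestRouter:
    """Hypothetical middleware front end. The per-domain counter is a stand-in
    for a per-domain agreement instance; domains are ordered independently."""
    next_seq: dict[str, int] = field(default_factory=lambda: defaultdict(int))
    ordered: dict[str, list[tuple[int, Request]]] = field(
        default_factory=lambda: defaultdict(list))
    unordered: list[Request] = field(default_factory=list)

    def submit(self, req: Request) -> Optional[tuple[str, int]]:
        if not req.needs_total_order:
            # Bypasses ordering, e.g. an operation the application deems order-insensitive.
            self.unordered.append(req)
            return None
        seq = self.next_seq[req.domain]
        self.next_seq[req.domain] = seq + 1
        self.ordered[req.domain].append((seq, req))
        return req.domain, seq
```

Because each domain has its own sequence, requests in different domains never wait on one another, which is the source of the latency and throughput gains the abstract anticipates.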

Minimal Byzantine fault tolerance: Algorithm and evaluation

2009

This paper presents two asynchronous Byzantine fault-tolerant state machine replication (BFT) algorithms that are minimal in several senses. First, they require only 2f+1 replicas, instead of the usual 3f+1. Second, the trusted service on which this reduction in replicas is based is arguably minimal, so it is simple to verify and implement (which is possible even using commercial trusted hardware). Third, in nice executions the two algorithms run in the minimum number of communication steps for nonspeculative and speculative algorithms: 4 and 3 steps, respectively. Besides the obvious benefits in cost, resilience, and management complexity of needing fewer replicas to tolerate a given number of faults, our algorithms are simpler than previous ones (being closer to crash fault-tolerant replication algorithms). The performance evaluation shows that, even with the overhead of accessing the trusted component, they can achieve better throughput than Castro and Liskov's PBFT, and better latency in networks with non-negligible communication delays.
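The abstract does not describe the trusted service itself. One common way to picture such a component is a monotonic counter that certifies each outgoing message together with a unique, strictly increasing counter value, so a faulty replica cannot obtain two certificates with the same counter for different messages. The sketch below assumes that design; the class names, the shared HMAC key between trusted components, and the receiver's in-order rule are all hypothetical.

```python
import hashlib
import hmac


def _tag(key: bytes, counter: int, message: bytes) -> bytes:
    """MAC over (counter, message) so the two cannot be separated."""
    return hmac.new(key, counter.to_bytes(8, "big") + message, hashlib.sha256).digest()


def verify_certificate(key: bytes, message: bytes, counter: int, tag: bytes) -> bool:
    return hmac.compare_digest(_tag(key, counter, message), tag)


class TrustedCounter:
    """Hypothetical stand-in for the trusted component. In a real deployment the
    key would never leave trusted hardware, and the counter could never go back."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._counter = 0

    def certify(self, message: bytes) -> tuple[int, bytes]:
        self._counter += 1
        return self._counter, _tag(self._key, self._counter, message)


class Receiver:
    """Accepts certified messages from one sender only in counter order."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self.last_seen = 0

    def accept(self, message: bytes, counter: int, tag: bytes) -> bool:
        if not verify_certificate(self._key, message, counter, tag):
            return False
        if counter != self.last_seen + 1:
            return False  # a gap (a message possibly sent only to others) or a replay
        self.last_seen = counter
        return True
```

If a faulty sender certifies one message for some replicas and a different one for others, the two messages necessarily get different counter values, so a receiver that sees only one of them finds a gap in the sequence instead of silently accepting conflicting histories.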

Steward: Scaling byzantine fault-tolerant systems to wide area networks

2005

This paper presents the first hierarchical Byzantine-tolerant replication architecture suitable for systems that span multiple wide area sites. The architecture confines the effects of any malicious replica to its local site, reduces the message complexity of wide area communication, and allows read-only queries to be performed locally within a site at the price of additional hardware. A prototype implementation is evaluated over several network topologies and is compared with a flat Byzantine-tolerant approach.

Zeno: Eventually Consistent Byzantine-Fault Tolerance

2009

Many distributed services are hosted at large, shared, geographically diverse data centers, and they use replication to achieve high availability despite the unreachability of an entire data center. Recent events show that non-crash faults occur in these services and may lead to long outages. While Byzantine-Fault Tolerance (BFT) could be used to withstand these faults, current BFT protocols can become unavailable if a small fraction of their replicas are unreachable. This is because existing BFT protocols favor strong safety guarantees (consistency) over liveness (availability).
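One way to picture the consistency/availability trade-off the abstract describes: a client waits for a large quorum of matching replies for strongly consistent operations but accepts a smaller one for weakly consistent operations. A minimal sketch, assuming n = 3f+1 replicas, strong quorums of 2f+1, and weak quorums of f+1 (these sizes and the function name are illustrative assumptions, not taken from the abstract). The sketch does not model how diverging weak results are later reconciled, which an eventually consistent protocol must also handle.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Optional


def quorum_result(replies: list[Hashable], f: int, strong: bool) -> Optional[Hashable]:
    """Return a reply value once enough matching replies have arrived, else None.

    `replies` holds at most one reply per distinct replica.
    Assumed quorum sizes: 2f + 1 for strong operations, f + 1 for weak ones.
    """
    needed = 2 * f + 1 if strong else f + 1
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= needed else None
```

With f = 1 and four replicas, if two replicas are unreachable, the two remaining matching replies satisfy a weak quorum (f+1 matching replies include at least one correct replica) but never a strong one, so only the weakly consistent operation can complete.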
