On the Efficiency of Durable State Machine Replication
Related papers
Scalable and Decoupled Logging for State Machine Replication
2020
State Machine Replication (SMR) is a widely used approach for fault tolerance of important services. Support for SMR implementations on shared infrastructures has emerged, allowing wider adoption. However, there are still non-trivial aspects that developers have to handle to build and deploy their dependable services. In this paper we tackle the need for recovery to maintain fault-tolerance levels, and propose an approach to: (i) simplify the development of logging; (ii) improve resource sharing in shared infrastructures; (iii) reduce replication costs in pay-per-use infrastructures. The central idea is to decouple service execution from logging and offer logging functionality as a service attachable to SMR deployments. Beyond making an SMR simpler to deploy, we show that this approach does not penalize the performance of replicated services, and that a logging service can scale to serve several applications.
Rethinking State-Machine Replication for Parallelism
2014 IEEE 34th International Conference on Distributed Computing Systems, 2014
State-machine replication, a fundamental approach to designing fault-tolerant services, requires commands to be executed in the same order by all replicas. Moreover, command execution must be deterministic: each replica must produce the same output upon executing the same sequence of commands. These requirements usually result in single-threaded replicas, which hinders service performance. This paper introduces Parallel State-Machine Replication (P-SMR), a new approach to parallelism in state-machine replication. P-SMR scales better than previous proposals since no component plays a centralizing role in the execution of independent commands, that is, those that can be executed concurrently, as defined by the service. The paper introduces P-SMR, describes a "commodified architecture" to implement it, and compares its performance to other proposals using a key-value store and a networked file system.
Boosting State Machine Replication with Concurrent Execution
2018 Eighth Latin-American Symposium on Dependable Computing (LADC), 2018
State machine replication is a fundamental technique to render services fault tolerant. One of the key assumptions of state machine replication is that replicas must execute operations deterministically. Deterministic execution often translates into sequential execution of requests at replicas. With the increasing demand for dependable services and widespread use of multi-core servers, several proposals for enabling concurrent execution in state machine replication have appeared in the literature. Invariably, these techniques exploit the fact that independent operations, those that do not share any common state or do not update shared state, can execute concurrently. Existing protocols differ in several important ways. In this paper, we survey this field of research and discuss the main aspects of the different protocols. Central aspects include conflict detection, representation, and enforcement; tradeoffs involving existing architectures and the level of allowed parallelism; workload-driven adaptation schemes; and implications of parallel state machine replication for recovery. Moreover, we discuss ongoing and future work directions for high-throughput state machine replication.
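The notion of independent operations used in this abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each command declares the state items it reads and writes; the `Command` class and the `conflicts` function are illustrative names and do not come from any of the surveyed protocols.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Command:
    """A hypothetical command described by the state items it reads and writes."""
    name: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def conflicts(a: Command, b: Command) -> bool:
    """Return True if a and b must run in the same relative order at every replica.

    Two commands conflict when one writes an item that the other reads or writes.
    Commands that only read shared items, or touch disjoint items, are independent.
    """
    return bool(
        a.writes & (b.reads | b.writes)
        or b.writes & a.reads
    )


get_x = Command("get_x", reads=frozenset({"x"}))
get_x_again = Command("get_x_again", reads=frozenset({"x"}))
put_x = Command("put_x", writes=frozenset({"x"}))
put_y = Command("put_y", writes=frozenset({"y"}))

print(conflicts(get_x, get_x_again))  # two reads of the same item: independent
print(conflicts(get_x, put_x))        # read and write of the same item: dependent
print(conflicts(put_x, put_y))        # writes to different items: independent
```

A scheduler built on such a predicate could run non-conflicting commands on different threads while still ordering conflicting ones identically at every replica; how conflicts are represented and enforced is exactly where the surveyed protocols differ.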
Checkpointing in Parallel State-Machine Replication
Lecture Notes in Computer Science, 2014
State-machine replication is a popular approach to building fault-tolerant systems, which relies on the sequential execution of commands to guarantee strong consistency. Sequential execution, however, threatens performance. Recently, several proposals have suggested parallelizing the execution model of the replicas to enhance state-machine replication's performance. Despite their success in accomplishing high performance, the implications of these models for checkpointing and recovery are mostly left unaddressed. In this paper, we focus on the checkpointing problem in the context of Parallel State-Machine Replication. We propose two novel algorithms and assess them through simulation and a real implementation.
Resilient state machine replication
2005
Nowadays, one of the major concerns about the services provided over the Internet is related to their availability. Replication is a well-known way to increase the availability of a service. However, replication has some associated costs; namely, it is necessary to guarantee correct coordination among the replicas. Moreover, since the Internet is such an unpredictable and insecure environment, coordination correctness should be tolerant to Byzantine faults and immune to timing failures.
Distributed Application Checkpointing for Replicated State Machines
Scalable Computing: Practice and Experience, 2021
Application checkpointing is a widely used recovery mechanism that consists of saving an application's state periodically to be used in case of a failure. In this study we investigate the utilisation of distributed checkpointing for replicated state machines. Conventionally, for replicated state machines, checkpointing information is stored in a replicated way in each of the replicas or separately in a single instance. Applying distributed checkpointing provides a means to adjust the level of fault tolerance of the checkpointing approach at the expense of recovery time. We use a local cluster and a cloud environment to examine the effects of distributed checkpointing in a simple state machine example and compare the results with conventional approaches. As expected, distributed checkpointing reduces memory consumption and supports different levels of fault tolerance, while performing worse in terms of recovery time.
Efficient checkpointing mechanisms for primary-backup replication on the cloud
Concurrency and Computation: Practice and Experience, 2018
Several distributed services, ranging from key-value stores to cloud storage, require fault-tolerance and reliability features. To enable fast recovery and seamless transition, primary-backup replication protocols are widely used in different application settings, including distributed databases, web services, and the Internet of Things. In this study, we elaborate on ways of enhancing the efficiency of the primary-backup replication protocol by introducing various checkpointing techniques. We develop a geographically replicated key-value store based on RocksDB and use the PlanetLab testbed network for large-scale performance analysis. Using various metrics of interest, including blocking time, checkpointing time, checkpoint size, failover time, and throughput, and testing with practical workloads via the YCSB tool, our findings indicate that periodic-incremental checkpointing promises up to a fivefold decrease in blocking time and a drastic improvement in overall throughput compared to traditional primary-backup replication. Furthermore, enabling the Snappy compression algorithm on periodic-incremental checkpointing leads to a further reduction in blocking time and increases system throughput compared to traditional primary-backup replication.
KEYWORDS: checkpointing, compressed checkpointing, incremental checkpointing, periodic checkpointing, primary-backup replication, replicated cloud key-value stores
1 INTRODUCTION
As cloud systems continue to grow, the underlying networks that power them also grow steadily to keep up with challenges such as immense user populations and big data. This growth is observed in two aspects: the geographical scaling of the nodes and the increase in node counts. Availability becomes more and more significant, as even milliseconds of outage or increased response time may result in high income losses [1]. Moreover, failures in these systems are inevitable due to the extensive use of software and hardware components in long-running applications whose run times exceed the mean time between failures of the components [2].
The most important and effective approach to dealing with crash failures is replication. It is widely used as a fault-tolerance mechanism, and finding optimal replication protocols is an active research area. There exist two main types of replication protocols, namely active and passive. In active replication, which is also known as state-machine replication, every incoming request is processed by every replica in the system, resulting in multiple results to be collected. Once collected, they are reduced into a single result value using various algorithms and the client is notified accordingly. In passive replication, which is also known as primary-backup replication, there exist a single primary replica and a group of backup replicas. Each request is executed only in the primary replica; the result is then copied to the backup replicas and the client is notified.
Another way of supporting recovery from failures is checkpointing, which refers to saving the system state to stable storage after critical executions. Afterwards, in the event of a failure during execution, the previously saved checkpoint can be restored as a failure-free system state, enabling execution to continue. This approach also facilitates quick rollback even against unforeseen failures and decreases the work needed to bring a replica back up from a zero state, since with a single rollback the system state catches up with the latest failure-free state [3]. In our recent work, we demonstrated the applicability and benefits of various checkpointing algorithms in replication protocols [4,5].
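The checkpoint-and-rollback idea described in this introduction can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's RocksDB-based implementation: stable storage is simulated with in-memory attributes, the `KeyValueReplica` class and its method names are hypothetical, and checkpoints are taken every fixed number of commands.

```python
import copy


class KeyValueReplica:
    """A toy in-memory replica; stable storage is simulated with plain attributes."""

    def __init__(self, checkpoint_interval=3):
        self.state = {}
        self.applied = 0  # number of commands applied so far
        self.checkpoint_interval = checkpoint_interval
        # Simulated stable storage: these two fields survive crash().
        self.stable_checkpoint = ({}, 0)  # (state snapshot, applied count)
        self.stable_log = []              # every command, in execution order

    def execute(self, key, value):
        self.stable_log.append((key, value))  # log the command before applying it
        self.state[key] = value
        self.applied += 1
        if self.applied % self.checkpoint_interval == 0:
            self.stable_checkpoint = (copy.deepcopy(self.state), self.applied)

    def crash(self):
        """Lose all volatile state."""
        self.state = {}
        self.applied = 0

    def recover(self):
        """Restore the latest checkpoint, then replay only the logged suffix."""
        snapshot, count = self.stable_checkpoint
        self.state = copy.deepcopy(snapshot)
        self.applied = count
        replayed = 0
        for key, value in self.stable_log[count:]:
            self.state[key] = value
            self.applied += 1
            replayed += 1
        return replayed


replica = KeyValueReplica(checkpoint_interval=3)
for i in range(5):
    replica.execute(f"k{i}", i)
before = dict(replica.state)
replica.crash()
replayed = replica.recover()
print(replayed, replica.state == before)  # prints: 2 True
```

With a checkpoint after the third command, recovery replays only the last two commands instead of all five. The sketch never truncates the log; a real system would typically discard log entries already covered by a checkpoint.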
Fast recovery in parallel state machine replication
2016
A well-established technique used to design fault-tolerant systems is state machine replication. In part, this is explained by the simplicity of the approach and its strong consistency guarantees. The traditional state machine replication model builds on the sequential execution of requests to ensure consistency among the replicas. Sequentiality of execution, however, threatens the scalability of replicas. Recently, some proposals have suggested parallelizing the execution of replicas to achieve higher performance. Despite the success of parallel state machine replication in accomplishing high performance, the implications of such models for recovery are mostly left unaddressed. Even for the traditional state machine replication approach, relatively few studies have considered the issues involved in recovering faulty replicas. The motivation of this thesis is to clarify the challenges and performance implications involved in checkpointing and recovery for parallel state machine rep...
Efficient and Deterministic Scheduling for Parallel State Machine Replication
2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2017
Many services used in large-scale web applications should be able to tolerate faults without impacting their performance. State machine replication is a well-known approach to implementing fault-tolerant services, providing high availability and strong consistency. To boost the performance of state machine replication, recent proposals have introduced parallel execution of commands. In parallel state machine replication, incoming commands may or may not depend on other commands that are waiting for execution. Although dependent commands must be processed in the same relative order at every replica to avoid inconsistencies, independent commands can be executed in parallel and benefit from multi-core architectures. Since many application workloads are mostly composed of independent commands, these parallel models promise high throughput without sacrificing strong consistency. The efficient execution of commands in such environments, however, requires effective scheduling strategies. Existing approaches rely on dependency tracking based on pairwise comparison between commands, which introduces scheduling contention. In this paper, we propose a new and highly efficient scheduler for parallel state machine replication. Our scheduler considers batches of commands, instead of commands individually. Moreover, each batch of commands is augmented with a compact data structure that encodes the command information needed for dependency analysis. We show, by means of experimental evaluation, that our technique outperforms existing schedulers for parallel state machine replication by a fairly large margin.
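One way to picture batch-level dependency analysis is sketched below. The abstract does not say which compact data structure the scheduler uses; this sketch assumes a fixed-size bitmap of hashed keys, and the names `BITMAP_BITS`, `key_bit`, `batch_digest`, and `may_conflict` are hypothetical. For simplicity it treats every access as a write, so it does not distinguish reads from updates.

```python
import zlib

BITMAP_BITS = 64  # assumed size; larger bitmaps give fewer false conflicts


def key_bit(key: str) -> int:
    """Map a key to one bit position using a deterministic hash (CRC32).

    Python's built-in hash() is randomized per process for strings, so it
    would not give the same bitmap at every replica.
    """
    return 1 << (zlib.crc32(key.encode("utf-8")) % BITMAP_BITS)


def batch_digest(keys) -> int:
    """Summarize all keys touched by a batch as a single integer bitmap."""
    digest = 0
    for key in keys:
        digest |= key_bit(key)
    return digest


def may_conflict(digest_a: int, digest_b: int) -> bool:
    """Batches may conflict only if their bitmaps share a set bit.

    A shared bit can be a hash collision (false positive), which costs
    parallelism but never correctness; disjoint bitmaps imply disjoint keys.
    """
    return (digest_a & digest_b) != 0


batch_a = batch_digest(["alice", "bob"])
batch_b = batch_digest(["bob", "carol"])
print(may_conflict(batch_a, batch_b))  # True: both batches touch "bob"
```

Comparing two batches then costs a single bitwise AND rather than a pairwise comparison of their commands, which is the kind of saving the abstract attributes to working at batch granularity.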
HT-Paxos: High Throughput State-Machine Replication Protocol for Large Clustered Data Centers
The Scientific World Journal, 2015
Paxos is a prominent theory of state-machine replication. Recent data-intensive systems that implement state-machine replication generally require high throughput. Earlier versions of Paxos, such as classical Paxos, Fast Paxos, and Generalized Paxos, focus mainly on fault tolerance and latency but are lacking in terms of throughput and scalability. A major reason for this is the heavyweight leader. By offloading the leader, we can further increase the throughput of the system. Ring Paxos, Multiring Paxos, and S-Paxos are a few prominent attempts in this direction for clustered data centers. In this paper, we propose HT-Paxos, a variant of Paxos that is best suited for any large clustered data center. HT-Paxos offloads the leader even more significantly and hence increases the throughput and scalability of the system, while at the same time, among high-throughput state-machine replication protocols, it provides reasonably low latency and response time.