EngagedScholarship@CSU Low Latency Fault Tolerance System

The low latency fault tolerance (LLFT) system provides fault tolerance for distributed applications within a local-area network, using a leader-follower replication strategy. LLFT provides application-transparent replication, with strong replica consistency, for applications that involve multiple interacting processes or threads. Its novel system model enables LLFT to maintain a single consistent infinite computation, despite faults and asynchronous communication. The LLFT messaging protocol provides reliable, totally ordered message delivery by employing a group multicast, where the message ordering is determined by the primary replica in the destination group. The leader-determined membership protocol provides reconfiguration and recovery when a replica becomes faulty and when a replica joins or leaves a group, where the membership of the group is determined by the primary replica. The virtual determinizer framework captures the ordering information at the primary replica and enfo...
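The leader-determined ordering described above can be illustrated with a minimal sketch. It is not the LLFT protocol itself: it assumes the primary conveys its chosen order as consecutive sequence numbers, which the abstract does not specify, and the names `Primary`, `Backup`, `order`, `receive`, and `demo` are hypothetical. The sketch shows only that a backup delivering strictly in the primary's order ends up with the same delivery sequence as the primary, even when ordering records arrive out of order or more than once.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical primary replica: it alone decides the delivery order."""
    next_seq: int = 0
    delivered: list = field(default_factory=list)

    def order(self, msg_id):
        # Assign the next sequence number and deliver locally right away.
        seq = self.next_seq
        self.next_seq += 1
        self.delivered.append(msg_id)
        return seq, msg_id


@dataclass
class Backup:
    """Hypothetical backup replica: delivers strictly in the primary's order."""
    next_seq: int = 0
    pending: dict = field(default_factory=dict)
    delivered: list = field(default_factory=list)

    def receive(self, seq, msg_id):
        # Buffer the ordering record unless it is a duplicate of a delivered one.
        if seq >= self.next_seq:
            self.pending[seq] = msg_id
        # Deliver every buffered message whose predecessors have been delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


def demo():
    primary, backup = Primary(), Backup()
    records = [primary.order(m) for m in ["m_a", "m_b", "m_c"]]
    # Simulate asynchronous communication: out-of-order arrival plus a duplicate.
    for seq, msg_id in [records[2], records[0], records[0], records[1]]:
        backup.receive(seq, msg_id)
    return primary.delivered, backup.delivered
```

Calling `demo()` returns the delivery sequences of the primary and the backup, which should be identical. Buffering at the backup is one simple way to tolerate reordering; a real system would also need retransmission and membership handling, which this sketch leaves out.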