Vijay Garg - Academia.edu

Papers by Vijay Garg

Applying Predicate Detection to the Constrained Optimization Problems

arXiv (Cornell University), Dec 26, 2018

We present a method to design parallel algorithms for constrained combinatorial optimization problems. Our method solves and generalizes many classical combinatorial optimization problems including the stable marriage problem, the shortest path problem and the market clearing price problem. These three problems are solved in the literature using the Gale-Shapley algorithm, Dijkstra's algorithm, and the Demange-Gale-Sotomayor algorithm, respectively. Our method solves all these problems by casting them as searching for an element that satisfies an appropriate predicate in a distributive lattice. Moreover, it solves generalizations of all these problems, namely finding the optimal solution satisfying additional constraints called lattice-linear predicates. For stable marriage problems, an example of such a constraint is that Peter's regret is less than that of Paul. For shortest path problems, an example of such a constraint is that the cost of reaching vertex v1 is at least the cost of reaching vertex v2. For the market clearing price problem, an example of such a constraint is that item1 is priced at least as much as item2. In addition to finding the optimal solution, our method is useful in enumerating all constrained stable matchings, and all constrained market clearing price vectors.
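
As a rough illustration of the search described in this abstract, the sketch below treats the stable marriage problem as a search over vectors G, where G[j] is the current position of man j in his preference list. The `forbidden` test and the advance rule are assumptions chosen for illustration and are not taken from the paper; all function and variable names are hypothetical. Preference lists are assumed to be complete.

```python
def llp_stable_marriage(mpref, wpref):
    """Search for the least vector G with no forbidden index (illustrative sketch).

    mpref[j] lists women in man j's order of preference;
    wpref[w] lists men in woman w's order of preference.
    Returns a list whose j-th entry is the woman matched to man j.
    """
    n = len(mpref)
    # rank[w][m] = position of man m in woman w's list (lower is better)
    rank = [{m: r for r, m in enumerate(prefs)} for prefs in wpref]
    G = [0] * n  # bottom element of the lattice: every man at his first choice

    def forbidden(j):
        # Assumed rule: j is forbidden if the woman he currently points to is
        # also pointed to by a man she ranks above j.
        w = mpref[j][G[j]]
        return any(
            i != j and mpref[i][G[i]] == w and rank[w][i] < rank[w][j]
            for i in range(n)
        )

    while True:
        bad = [j for j in range(n) if forbidden(j)]
        if not bad:
            return [mpref[j][G[j]] for j in range(n)]
        # Forbidden indices are found against the same G and advanced together;
        # in a parallel setting each could be advanced by its own worker.
        for j in bad:
            G[j] += 1


matching = llp_stable_marriage(
    mpref=[[0, 1, 2], [0, 2, 1], [1, 0, 2]],
    wpref=[[1, 0, 2], [0, 2, 1], [0, 1, 2]],
)
```

With these inputs the search ends with man 0 matched to woman 1, man 1 to woman 0, and man 2 to woman 2. An extra constraint of the kind mentioned in the abstract could be explored by adding a further condition to `forbidden`, provided that condition is itself lattice-linear.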

Byzantine Vector Consensus in Complete Graphs

arXiv (Cornell University), Feb 11, 2013

Consider a network of n processes each of which has a d-dimensional vector of reals as its input. Each process can communicate directly with all the processes in the system; thus the communication network is a complete graph. All the communication channels are reliable and FIFO (first-in-first-out). The problem of Byzantine vector consensus (BVC) requires agreement on a d-dimensional vector that is in the convex hull of the d-dimensional input vectors at the non-faulty processes. We obtain the following results for Byzantine vector consensus in complete graphs while tolerating up to f Byzantine failures:
• We prove that in a synchronous system, n ≥ max(3f + 1, (d + 1)f + 1) is necessary and sufficient for achieving Byzantine vector consensus.

This research is supported in part by National Science Foundation awards CNS-1059540 and CNS-1115808 and the Cullen Trust for Higher Education. Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the funding agencies or the U.S. government.

Characterization of Super-stable Matchings

arXiv (Cornell University), May 20, 2021

An instance of the super-stable matching problem with incomplete lists and ties is an undirected bipartite graph G = (A ∪ B, E), with an adjacency list being a linearly ordered list of ties. Ties are subsets of vertices equally good for a given vertex. An edge (x, y) ∈ E \ M is a blocking edge for a matching M if by getting matched to each other neither of the vertices x and y would become worse off. Thus, there is no disadvantage if the two vertices would like to match up. A matching M is super-stable if there is no blocking edge with respect to M. It has previously been shown that super-stable matchings form a distributive lattice [1, 2] and the number of super-stable matchings can be exponential in the number of vertices. We give two compact representations of size O(m) that can be used to construct all super-stable matchings, where m denotes the number of edges in the graph. The construction of the second representation takes O(mn) time, where n denotes the number of vertices in the graph, and gives an explicit rotation poset similar to the rotation poset in the classical stable marriage problem. We also give a polyhedral characterisation of the set of all super-stable matchings and prove that the super-stable matching polytope is integral, thus solving an open problem stated in the book by Gusfield and Irving [3].

Byzantine Lattice Agreement in Synchronous Message Passing Systems

International Conference on Distributed Computing, 2020

We propose three algorithms for the Byzantine lattice agreement problem in synchronous systems. The first algorithm runs in $\min\{3h(X) + 6,\ 6\sqrt{f_a} + 6\}$ rounds and takes $O(n^2 \min\{h(X), \sqrt{f_a}\})$ messages, where h(X) is the height of the input lattice X, n is the total number of processes in the system, f is the maximum number of Byzantine processes such that $n \ge 3f + 1$, and $f_a \le f$ is the actual number of Byzantine processes in an execution. The second algorithm takes $3\log n + 3$ rounds and $O(n^2 \log n)$ messages. The third algorithm takes $4\log f + 3$ rounds and $O(n^2 \log f)$ messages. All algorithms can tolerate $f < n/3$ Byzantine failures. This is the first work for the Byzantine lattice agreement problem in synchronous systems which achieves logarithmic rounds. In our algorithms, we apply a slightly modified version of the Gradecast algorithm given by Feldman et al. [10] as a building block. If we use the Gradecast algorithm for the authenticated setting given by Katz et al. [12], we obtain algorithms for the Byzantine lattice agreement problem in authenticated settings that tolerate $f < n/2$ failures.

Fault Tolerance in Distributed Systems using Fused State Machines

arXiv (Cornell University), Mar 23, 2013

Replication is a standard technique for fault tolerance in distributed systems modeled as deterministic finite state machines (DFSMs or machines). To correct f crash or ⌊f/2⌋ Byzantine faults among n different machines, replication requires nf additional backup machines. We present a solution called fusion that requires just f additional backup machines. First, we build a framework for fault tolerance in DFSMs based on the notion of Hamming distances. We introduce the concept of an (f, m)-fusion, which is a set of m backup machines that can correct f crash faults or ⌊f/2⌋ Byzantine faults among a given set of machines. Second, we present an algorithm to generate an (f, f)-fusion for a given set of machines. We ensure that our backups are efficient in terms of the size of their state and event sets. Third, we use locality sensitive hashing for the detection and correction of faults, which incurs almost the same overhead as that for replication. We detect Byzantine faults with time complexity O(nf) on average while we correct crash and Byzantine faults with time complexity O(nρf) with high probability, where ρ is the average state reduction achieved by fusion. Finally, our evaluation of fusion on the widely used MCNC'91 benchmarks for DFSMs shows that the average state space

This research was supported in part by the NSF Grants CNS-0718990, CNS-0509024, CNS-1115808 and Cullen Trust for Higher Education Endowed Professorship.

Byzantine Lattice Agreement in Synchronous Systems

arXiv (Cornell University), Oct 30, 2019

In this paper, we study the Byzantine lattice agreement problem in synchronous distributed message passing systems. The lattice agreement problem [2] in the crash failure model has been studied in both synchronous and asynchronous systems [2, 10, 22, 21], which leads to the current best upper bound of $O(\log f)$ rounds in both settings. Its applications in building linearizable replicated state machines have also been explored recently in [10, 19, 21]. However, very few algorithmic results are known for the lattice agreement problem in the Byzantine failure model. The paper by Nowak et al. [18] first gives an algorithm for a variant of the lattice agreement problem on cycle-free lattices that tolerates up to $f < n/(h(X) + 1)$ Byzantine faults, where n is the number of processes and h(X) is the height of the input lattice X. The recent preprint by Di Luna et al. [8] studies this problem in asynchronous systems and slightly modifies the validity condition of the original lattice agreement problem in order to accommodate extra values sent from Byzantine processes. They present an $O(f)$-round algorithm by using the reliable broadcast primitive as a first step and following a similar algorithmic framework as in [10, 22]. In this paper, we propose three algorithms for the Byzantine lattice agreement problem in synchronous systems. The first algorithm runs in $\min\{3h(X) + 6,\ 6\sqrt{f} + 6\}$ rounds and takes $O(n^2 \min\{h(X), \sqrt{f}\})$ messages, where h(X) is the height of the input lattice X, n is the total number of processes in the system and f is the maximum number of Byzantine processes such that $n \ge 3f + 1$. The second algorithm takes $3\log n + 3$ rounds and $O(n^2 \log n)$ messages. The third algorithm takes $4\log f + 3$ rounds and $O(n^2 \log f)$ messages. All algorithms can tolerate $f < n/3$ Byzantine failures. In our algorithms, we apply a slightly modified version of the Gradecast algorithm given by Feldman et al. [11] as a building block. If we use the Gradecast algorithm for the authenticated setting given by Katz et al. [13], we obtain algorithms for the Byzantine lattice agreement problem in authenticated settings that tolerate $f < n/2$ failures.

An Optimal Vector Clock Algorithm for Multithreaded Systems

arXiv (Cornell University), Jan 19, 2019

Tracking causality (or happened-before relation) between events is useful for many applications such as debugging and recovery from failures. Consider a concurrent system with n threads and m objects. For such systems, either a vector clock of size n is used with one component per thread or a vector clock of size m is used with one component per object. A natural question is whether one can use a vector clock of size strictly less than the minimum of m and n to timestamp events. We give an algorithm in this paper that uses a hybrid of thread and object components. Our algorithm is guaranteed to return the minimum number of components necessary for vector clocks. We first consider the case when the interaction between objects and threads is statically known. This interaction is modeled by a thread-object bipartite graph. Our algorithm is based on finding the maximum bipartite matching of such a graph and then applying the König-Egerváry Theorem to compute the minimum vertex cover to determine the optimal number of components necessary for the vector clock. We also propose two mechanisms to compute such a vector clock when computation is revealed in an online fashion. Evaluation on different types of graphs indicates that our offline algorithm generates a vector clock whose size is significantly less than the minimum of m and n. These mechanisms are more effective when the underlying bipartite graph is not dense.
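
The offline step named in this abstract (maximum bipartite matching followed by a König-style minimum vertex cover) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the matching uses a simple augmenting-path search, the graph is small enough for recursion, and names such as `min_vertex_cover`, `T1`, and `A` are hypothetical.

```python
def min_vertex_cover(threads, objects, edges):
    """Return (cover_threads, cover_objects) for a thread-object bipartite graph.

    edges is an iterable of (thread, object) pairs. `objects` is accepted for
    symmetry; the cover only ever contains objects that appear in some edge.
    """
    adj = {t: [] for t in threads}
    for t, o in edges:
        adj[t].append(o)

    match_of_object = {}  # object -> thread
    match_of_thread = {}  # thread -> object

    def augment(t, seen):
        # Try to match thread t, re-matching other threads along the way.
        for o in adj[t]:
            if o in seen:
                continue
            seen.add(o)
            if o not in match_of_object or augment(match_of_object[o], seen):
                match_of_object[o] = t
                match_of_thread[t] = o
                return True
        return False

    for t in threads:
        augment(t, set())

    # König construction: explore alternating paths from unmatched threads.
    reached_threads, reached_objects = set(), set()
    stack = [t for t in threads if t not in match_of_thread]
    while stack:
        t = stack.pop()
        if t in reached_threads:
            continue
        reached_threads.add(t)
        for o in adj[t]:
            if o not in reached_objects:
                reached_objects.add(o)
                if o in match_of_object:
                    stack.append(match_of_object[o])

    # Cover = unreached threads plus reached objects.
    return set(threads) - reached_threads, reached_objects


cover_threads, cover_objects = min_vertex_cover(
    threads=["T1", "T2", "T3"],
    objects=["A", "B", "C"],
    edges=[("T1", "A"), ("T1", "B"), ("T1", "C"), ("T2", "A"), ("T3", "A")],
)
```

In this example the cover consists of thread T1 and object A, so two components suffice even though there are three threads and three objects.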

A Lattice Linear Predicate Parallel Algorithm for the Dynamic Programming Problems

arXiv (Cornell University), Mar 10, 2021

It has been shown that the parallel Lattice Linear Predicate (LLP) algorithm solves many combinatorial optimization problems such as the shortest path problem, the stable marriage problem and the market clearing price problem. In this paper, we give the parallel LLP algorithm for many dynamic programming problems. In particular, we show that the LLP algorithm solves the longest subsequence problem, the optimal binary search tree problem, and the knapsack problem. Furthermore, the algorithm can be used to solve the constrained versions of these problems so long as the constraints are lattice linear. The parallel LLP algorithm requires only read-write atomicity and no higher-level atomic instructions.

A Generalization of Teo and Sethuraman's Median Stable Marriage Theorem

arXiv (Cornell University), Jan 9, 2020

Let L be any finite distributive lattice and B be any boolean predicate defined on L such that the set of elements satisfying B is a sublattice of L. Consider any subset M of L consisting of k elements that satisfy B. Then, we show that the k generalized median elements generated from M also satisfy B. We call this result the generalized median theorem on finite distributive lattices. When this result is applied to stable matching, we get Teo and Sethuraman's median stable matching theorem. Our proof is much simpler than that of Teo and Sethuraman. When the generalized median theorem is applied to the assignment problem, we get an analogous result for market clearing price vectors.
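
To make the statement concrete, the sketch below assumes lattice elements are represented as equal-length integer tuples ordered componentwise, so that meet and join are the componentwise minimum and maximum. Under that assumption, the i-th generalized median is taken to be the tuple whose every coordinate holds the i-th smallest value seen in that coordinate. The representation, the example predicate, and the function name are illustrative assumptions rather than definitions quoted from the paper.

```python
def generalized_medians(elements):
    """Return the k generalized medians of k equal-length tuples.

    Assumes componentwise order: the i-th median takes, in each coordinate,
    the i-th smallest value among the given elements.
    """
    columns = [sorted(column) for column in zip(*elements)]
    return [tuple(column[i] for column in columns) for i in range(len(elements))]


def satisfies_b(x):
    # Example predicate: closed under componentwise min and max, so the
    # tuples satisfying it form a sublattice.
    return x[0] <= x[1]


M = [(1, 4), (2, 2), (0, 3)]          # every element satisfies the predicate
medians = generalized_medians(M)      # [(0, 2), (1, 3), (2, 4)]
```

Each of the three medians again satisfies the example predicate, as the theorem predicts for this small instance.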

AutoSynch: An Automatic-Signal Monitor Based on Predicate Tagging

arXiv (Cornell University), Mar 1, 2013

Most programming languages use monitors with explicit signals for synchronization in shared-memory programs. Requiring programmers to signal threads explicitly results in many concurrency bugs due to missed notifications, or notifications on wrong condition variables. In this paper, we describe an implementation of an automatic signaling monitor in Java called AutoSynch that eliminates such concurrency bugs by removing the burden of signaling from the programmer. We show that the belief that automatic signaling monitors are prohibitively expensive is wrong. For most problems, programs based on AutoSynch are almost as fast as those based on explicit signaling. For some, AutoSynch is even faster than explicit signaling because it never uses signalAll, whereas programmers end up using signalAll with the explicit signal mechanism. AutoSynch achieves efficiency in synchronization based on three novel ideas. First, we introduce an operation called globalization that enables predicate evaluation in every thread, thereby reducing context switches during the execution of the program. Second, AutoSynch avoids signalAll by using a property called relay invariance, which guarantees that, whenever possible, there is always at least one signaled thread whose condition is true. Finally, AutoSynch uses a technique called predicate tagging to efficiently determine a thread that should be signaled. To evaluate the efficiency of AutoSynch, we have implemented many different well-known synchronization problems such as the producers/consumers problem, the readers/writers problem, and the dining philosophers problem. The results show that AutoSynch is almost as efficient as the explicit-signal monitor and even more efficient in some cases.

Detecting and tolerating faults in distributed systems

A Lightweight Algorithm for Causal Message Ordering in Mobile Computing Systems

Removing Sequential Bottleneck of Dijkstra's Algorithm for the Shortest Path Problem

arXiv (Cornell University), Dec 26, 2018

All traditional methods of computing shortest paths depend upon edge-relaxation where the cost of reaching a vertex from a source vertex is possibly decreased if that edge is used. We introduce a method which maintains lower bounds as well as upper bounds for reaching a vertex. This method enables one to find the optimal cost for multiple vertices in one iteration and thereby reduces the sequential bottleneck in Dijkstra's algorithm. We present four algorithms in this paper: SP1, SP2, SP3 and SP4. SP1 and SP2 reduce the number of heap operations in Dijkstra's algorithm. For directed acyclic graphs or directed unweighted graphs, they have the optimal complexity of O(e), where e is the number of edges in the graph, which is better than that of Dijkstra's algorithm. For general graphs, their worst case complexity matches that of Dijkstra's algorithm for a sequential implementation but allows for greater parallelism. Algorithms SP3 and SP4 allow for even more parallelism but with higher work complexity. Algorithm SP3 requires O(n + e(max(log n, ∆))) work, where n is the number of vertices and ∆ is the maximum in-degree of a node. Algorithm SP4 has the most parallelism. It requires O(ne) work. These algorithms generalize the work by Crauser, Mehlhorn, Meyer, and Sanders on parallelizing Dijkstra's algorithm.

Byzantine Lattice Agreement in Asynchronous Systems

arXiv (Cornell University), Feb 17, 2020

We study the Byzantine lattice agreement (BLA) problem in asynchronous distributed message passing systems. In the BLA problem, each process proposes a value from a join semi-lattice and needs to output a value also in the lattice such that all output values of correct processes lie on a chain despite the presence of Byzantine processes. We present an algorithm for this problem with round complexity of $O(\log f)$ which tolerates $f < n/5$ Byzantine failures in the asynchronous setting without digital signatures, where n is the number of processes. We also show how this algorithm can be modified to work in the authenticated setting (i.e., with digital signatures) to tolerate $f < n/3$ Byzantine failures.
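
The chain requirement in this abstract can be checked mechanically once a concrete lattice is fixed. The sketch below assumes, for illustration only, the join semi-lattice of finite sets with union as the join and subset inclusion as the order; `is_chain` is a hypothetical helper, not part of any algorithm in the paper.

```python
from itertools import combinations


def is_chain(outputs):
    """True if every pair of output values is comparable under set inclusion."""
    return all(a <= b or b <= a for a, b in combinations(outputs, 2))


# Outputs of three correct processes that lie on a chain.
good = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]
# Two incomparable outputs, which the chain requirement rules out.
bad = [frozenset({1}), frozenset({2})]

good_is_chain = is_chain(good)   # True
bad_is_chain = is_chain(bad)     # False
```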

Linearizable Replicated State Machines with Lattice Agreement

arXiv (Cornell University), Oct 13, 2018

This paper studies the lattice agreement problem in asynchronous systems and explores its application to building linearizable replicated state machines (RSM). First, we propose an algorithm to solve the lattice agreement problem in $O(\log f)$ asynchronous rounds, where $f$ is the number of crash failures that the system can tolerate. This is an exponential improvement over the previous best upper bound. Second, Faleiro et al. have shown in [Faleiro et al. PODC, 2012] that a combination of conflict-free data types and lattice agreement protocols can be applied to implement linearizable RSM. They give a Paxos-style lattice agreement protocol, which can be adapted to implement linearizable RSM and guarantee that a command can be learned in at most $O(n)$ message delays, where $n$ is the number of proposers. Later on, Xiong et al. in [Xiong et al. DISC, 2018] give a lattice agreement protocol which improves the $O(n)$ guarantee to $O(f)$. However, neither protocol is practical for building a linearizable RSM. Thus, in the second part of the paper, we first give an improved protocol based on the one proposed by Xiong et al. Then, we implement a simple linearizable RSM using our improved protocol and compare our implementation with an open source Java implementation of Paxos. Results show that better performance can be obtained by using lattice agreement based protocols to implement a linearizable RSM compared to traditional consensus based protocols.

Necessary and Sufficient Conditions on Partial Orders for Modeling Concurrent Computations

arXiv (Cornell University), Oct 5, 2014

Partial orders are used extensively for modeling and analyzing concurrent computations. In this paper, we define two properties of partially ordered sets: width-extensibility and interleaving-consistency, and show that a partial order can be a valid state based model: (1) of some synchronous concurrent computation iff it is width-extensible, and (2) of some asynchronous concurrent computation iff it is width-extensible and interleaving-consistent. We also show a duality between the event based and state based models of concurrent computations, and give algorithms to convert models between the two domains. When applied to the problem of checkpointing, our theory leads to a better understanding of some existing results and algorithms in the field. It also leads to efficient detection algorithms for predicates whose evaluation requires knowledge of states from all the processes in the system.

NC Algorithms for Popular Matchings in One-Sided Preference Systems and Related Problems

arXiv (Cornell University), Oct 23, 2019

The popular matching problem is that of matching a set of applicants to a set of posts, where each applicant has a preference list, ranking a non-empty subset of posts in the order of preference, possibly with ties. A matching M is popular if there is no other matching M′ such that more applicants prefer M′ to M. We give the first NC algorithm to solve the popular matching problem without ties. We also give an NC algorithm that solves the maximum-cardinality popular matching problem. No NC or RNC algorithms were known for the matching problem in preference systems prior to this work. Moreover, we give an NC algorithm for a weaker version of the stable matching problem, that is, the problem of finding the "next" stable matching given a stable matching.

A Max-Plus Algebra of Signals for the Supervisory Control of Real-Time Discrete Event Systems

IFAC Proceedings Volumes, 1998

Introduction to Lattice Theory with Computer Science Applications

Highly scalable algorithm for distributed real-time text indexing

2009 International Conference on High Performance Computing (HiPC), 2009

Stream computing research is moving from terascale to petascale levels. It aims to rapidly analyze data as it streams in from many sources and make decisions with high speed and accuracy in fields as diverse as security surveillance and financial services, including stock trading. We specifically consider real-time text indexing and search with high input data rates (10 GB/s or more) along with small index age-off (expiry) time. This makes it necessary to have maximal indexing rates for large volumes of data as well as minimal latency for indexing (the time between the start of indexing for a document and its availability for search) while maintaining very low search response time. In addition, future massively parallel architectures with storage class memories will enable high-speed in-memory real-time indexing, where the index can be completely stored in a high-capacity storage class memory. In this paper, we present the design of distributed data structures and a distributed real-time text indexing algorithm for parallel systems having a large number (thousands to a hundred thousand) of cores/processors, while simultaneously providing acceptable search performance [1]. The inherent trade-offs involved in index space, indexing throughput and search response time make this problem particularly challenging. Our algorithm uses group-based index construction and leverages novel index data structures that reduce load imbalance and make the text indexing and merge process more scalable and efficient. We show analytically that the asymptotic parallel time complexity of our distributed indexing algorithm is at least an Ω(log(P)) factor better than typical indexing approaches, where P is the number of indexing nodes in a group. We further demonstrate the performance and scalability of our distributed indexing algorithm on an MPP architecture (Blue Gene/L, http://www.research.ibm.com/bluegene) using actual IBM intranet data. We achieved high indexing throughput of around 312 GB/min on an 8K node Blue Gene/L machine. In comparison with parallel indexing implemented using typical approaches like CLucene (http://www.sourceforge.net/projects/clucene), this is 3× to 7× better. To the best of our knowledge, this is the first published result on indexing throughput at such a large scale, with sustained search performance. We further show that our approach is scalable to 128K nodes, giving an estimated indexing throughput of 5 TB/min. We also achieved indexing latency that is around 10× better than typical indexing approaches.

Research paper thumbnail of Applying Predicate Detection to the Constrained Optimization Problems

arXiv (Cornell University), Dec 26, 2018

We present a method to design parallel algorithms for constrained combinatorial optimization prob... more We present a method to design parallel algorithms for constrained combinatorial optimization problems. Our method solves and generalizes many classical combinatorial optimization problems including the stable marriage problem, the shortest path problem and the market clearing price problem. These three problems are solved in the literature using Gale-Shapley algorithm, Dijkstra's algorithm, and Demange, Gale, Sotomayor algorithm. Our method solves all these problems by casting them as searching for an element that satisfies an appropriate predicate in a distributive lattice. Moreover, it solves generalizations of all these problems-namely finding the optimal solution satisfying additional constraints called latticelinear predicates. For stable marriage problems, an example of such a constraint is that Peter's regret is less than that of Paul. For shortest path problems, an example of such a constraint is that cost of reaching vertex v1 is at least the cost of reaching vertex v2. For the market clearing price problem, an example of such a constraint is that item1 is priced at least as much as item2. In addition to finding the optimal solution, our method is useful in enumerating all constrained stable matchings, and all constrained market clearing price vectors.

Research paper thumbnail of Byzantine Vector Consensus in Complete Graphs

arXiv (Cornell University), Feb 11, 2013

Consider a network of n processes each of which has a d-dimensional vector of reals as its input.... more Consider a network of n processes each of which has a d-dimensional vector of reals as its input. Each process can communicate directly with all the processes in the system; thus the communication network is a complete graph. All the communication channels are reliable and FIFO (first-in-first-out). The problem of Byzantine vector consensus (BVC) requires agreement on a d-dimensional vector that is in the convex hull of the d-dimensional input vectors at the non-faulty processes. We obtain the following results for Byzantine vector consensus in complete graphs while tolerating up to f Byzantine failures: • We prove that in a synchronous system, n ≥ max(3f +1, (d+1)f +1) is necessary and sufficient for achieving Byzantine vector consensus. * This research is supported in part by National Science Foundation awards CNS-1059540 and CNS-1115808 and the Cullen Trust for Higher Education. Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the funding agencies or the U.S. government.

Research paper thumbnail of Characterization of Super-stable Matchings

arXiv (Cornell University), May 20, 2021

An instance of the super-stable matching problem with incomplete lists and ties is an undirected ... more An instance of the super-stable matching problem with incomplete lists and ties is an undirected bipartite graph G = (A ∪ B, E), with an adjacency list being a linearly ordered list of ties. Ties are subsets of vertices equally good for a given vertex. An edge (x, y) ∈ E\M is a blocking edge for a matching M if by getting matched to each other neither of the vertices x and y would become worse off. Thus, there is no disadvantage if the two vertices would like to match up. A matching M is super-stable if there is no blocking edge with respect to M. It has previously been shown that super-stable matchings form a distributive lattice [1, 2] and the number of super-stable matchings can be exponential in the number of vertices. We give two compact representations of size O(m) that can be used to construct all super-stable matchings, where m denotes the number of edges in the graph. The construction of the second representation takes O(mn) time, where n denotes the number of vertices in the graph, and gives an explicit rotation poset similar to the rotation poset in the classical stable marriage problem. We also give a polyhedral characterisation of the set of all super-stable matchings and prove that the super-stable matching polytope is integral, thus solving an open problem stated in the book by Gusfield and Irving [3].

Research paper thumbnail of Byzantine Lattice Agreement in Synchronous Message Passing Systems

International Conference on Distributed Computing, 2020

We propose three algorithms for the Byzantine lattice agreement problem in synchronous systems. T... more We propose three algorithms for the Byzantine lattice agreement problem in synchronous systems. The first algorithm runs in min{3h(X) + 6, 6 √ fa + 6}) rounds and takes O(n 2 min{h(X), √ fa}) messages, where h(X) is the height of the input lattice X, n is the total number of processes in the system, f is the maximum number of Byzantine processes such that n ≥ 3f + 1 and fa ≤ f is the actual number of Byzantine processes in an execution. The second algorithm takes 3 log n + 3 rounds and O(n 2 log n) messages. The third algorithm takes 4 log f + 3 rounds and O(n 2 log f) messages. All algorithms can tolerate f < n 3 Byzantine failures. This is the first work for the Byzantine lattice agreement problem in synchronous systems which achieves logarithmic rounds. In our algorithms, we apply a slightly modified version of the Gradecast algorithm given by Feldman et al [10] as a building block. If we use the Gradecast algorithm for authenticated setting given by Katz et al [12], we obtain algorithms for the Byzantine lattice agreement problem in authenticated settings and tolerate f < n 2 failures.

Research paper thumbnail of Fault Tolerance in Distributed Systems using Fused State Machines

arXiv (Cornell University), Mar 23, 2013

Replication is a standard technique for fault tolerance in distributed systems modeled as determi... more Replication is a standard technique for fault tolerance in distributed systems modeled as deterministic finite state machines (DFSMs or machines). To correct f crash or ⌊ f /2⌋ Byzantine faults among n different machines, replication requires n f additional backup machines. We present a solution called fusion that requires just f additional backup machines. First, we build a framework for fault tolerance in DFSMs based on the notion of Hamming distances. We introduce the concept of an (f , m)-fusion, which is a set of m backup machines that can correct f crash faults or ⌊ f /2⌋ Byzantine faults among a given set of machines. Second, we present an algorithm to generate an (f , f)-fusion for a given set of machines. We ensure that our backups are efficient in terms of the size of their state and event sets. Third, we use locality sensitive hashing for the detection and correction of faults that incurs almost the same overhead as that for replication. We detect Byzantine faults with time complexity O(n f) on average while we correct crash and Byzantine faults with time complexity O(nρ f) with high probability, where ρ is the average state reduction achieved by fusion. Finally, our evaluation of fusion on the widely used MCNC'91 benchmarks for DFSMs show that the average state space *This research was supported in part by the NSF Grants CNS-0718990, CNS-0509024, CNS-1115808 and Cullen Trust for Higher Education Endowed Professorship.

Byzantine Lattice Agreement in Synchronous Systems

arXiv (Cornell University), Oct 30, 2019

In this paper, we study the Byzantine lattice agreement problem in synchronous distributed message passing systems. The lattice agreement problem [2] in the crash failure model has been studied in both synchronous and asynchronous systems [2, 10, 22, 21], which leads to the current best upper bound of O(log f) rounds in both synchronous and asynchronous systems. Its application to building linearizable replicated state machines has also been explored recently in [10, 19, 21]. However, very few algorithmic results are known for the lattice agreement problem in the Byzantine failure model. The paper by Nowak et al. [18] first gives an algorithm for a variant of the lattice agreement problem on cycle-free lattices that tolerates up to f < n/(h(X) + 1) Byzantine faults, where n is the number of processes and h(X) is the height of the input lattice X. The recent preprint by Di Luna et al. [8] studies this problem in asynchronous systems and slightly modifies the validity condition of the original lattice agreement problem in order to accommodate extra values sent from Byzantine processes. They present an O(f)-round algorithm that uses the reliable broadcast primitive as a first step and follows an algorithmic framework similar to that of [10, 22]. In this paper, we propose three algorithms for the Byzantine lattice agreement problem in synchronous systems. The first algorithm runs in min{3h(X) + 6, 6√f + 6} rounds and takes O(n² min{h(X), √f}) messages, where h(X) is the height of the input lattice X, n is the total number of processes in the system, and f is the maximum number of Byzantine processes such that n ≥ 3f + 1. The second algorithm takes 3 log n + 3 rounds and O(n² log n) messages. The third algorithm takes 4 log f + 3 rounds and O(n² log f) messages. All algorithms can tolerate f < n/3 Byzantine failures. In our algorithms, we apply a slightly modified version of the Gradecast algorithm given by Feldman et al. [11] as a building block. If we use the Gradecast algorithm for the authenticated setting given by Katz et al. [13], we obtain algorithms for the Byzantine lattice agreement problem in authenticated settings that tolerate f < n/2 failures.

An Optimal Vector Clock Algorithm for Multithreaded Systems

arXiv (Cornell University), Jan 19, 2019

Tracking causality (or the happened-before relation) between events is useful for many applications such as debugging and recovery from failures. Consider a concurrent system with n threads and m objects. For such systems, either a vector clock of size n is used with one component per thread or a vector clock of size m is used with one component per object. A natural question is whether one can use a vector clock of size strictly less than the minimum of m and n to timestamp events. We give an algorithm in this paper that uses a hybrid of thread and object components. Our algorithm is guaranteed to return the minimum number of components necessary for vector clocks. We first consider the case when the interaction between objects and threads is statically known. This interaction is modeled by a thread-object bipartite graph. Our algorithm is based on finding the maximum bipartite matching of such a graph and then applying the König-Egerváry theorem to compute the minimum vertex cover, which determines the optimal number of components necessary for the vector clock. We also propose two mechanisms to compute such a vector clock when the computation is revealed in an online fashion. Evaluation on different types of graphs indicates that our offline algorithm generates a vector clock whose size is significantly less than the minimum of m and n. These mechanisms are more effective when the underlying bipartite graph is not dense.
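The offline step described in this abstract (maximum bipartite matching followed by a König-style minimum vertex cover) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the adjacency-dictionary input, and the simple augmenting-path matching are assumptions chosen for brevity. Each vertex in the returned cover (a thread or an object) would correspond to one vector clock component.

```python
from collections import deque

def max_bipartite_matching(threads, adj):
    """Augmenting-path matching; adj maps a thread to the objects it accesses.
    Returns a dict object -> matched thread. (Hypothetical helper.)"""
    match_obj = {}

    def try_augment(t, seen):
        for o in adj.get(t, ()):
            if o in seen:
                continue
            seen.add(o)
            # Use o if it is free, or if its current thread can be re-matched.
            if o not in match_obj or try_augment(match_obj[o], seen):
                match_obj[o] = t
                return True
        return False

    for t in threads:
        try_augment(t, set())
    return match_obj

def min_vertex_cover(threads, adj):
    """Koenig's construction: cover = (threads not reached) + (objects reached)
    by alternating paths that start at unmatched threads."""
    match_obj = max_bipartite_matching(threads, adj)
    matched_threads = set(match_obj.values())
    visited_t = {t for t in threads if t not in matched_threads}
    visited_o = set()
    queue = deque(visited_t)
    while queue:
        t = queue.popleft()
        for o in adj.get(t, ()):
            if o in visited_o:
                continue
            visited_o.add(o)
            nxt = match_obj.get(o)  # follow the matched edge back to a thread
            if nxt is not None and nxt not in visited_t:
                visited_t.add(nxt)
                queue.append(nxt)
    cover_threads = [t for t in threads if t not in visited_t]
    return cover_threads, sorted(visited_o)

# Hypothetical system: 4 threads, 4 objects, but a sparse interaction graph.
threads = ["T1", "T2", "T3", "T4"]
adj = {"T1": ["A", "B", "C", "D"], "T2": ["A"], "T3": ["A"], "T4": ["A"]}
cover = min_vertex_cover(threads, adj)
```

In this made-up example the cover has two vertices (thread T1 and object A), so two components would suffice even though min(m, n) = 4.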

A Lattice Linear Predicate Parallel Algorithm for the Dynamic Programming Problems

arXiv (Cornell University), Mar 10, 2021

It has been shown that the parallel Lattice Linear Predicate (LLP) algorithm solves many combinatorial optimization problems such as the shortest path problem, the stable marriage problem and the market clearing price problem. In this paper, we give the parallel LLP algorithm for many dynamic programming problems. In particular, we show that the LLP algorithm solves the longest subsequence problem, the optimal binary search tree problem, and the knapsack problem. Furthermore, the algorithm can be used to solve the constrained versions of these problems so long as the constraints are lattice linear. The parallel LLP algorithm requires only read-write atomicity and no higher-level atomic instructions.
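To give a feel for the "forbidden index, then advance" pattern behind LLP-style algorithms, here is a minimal sequential sketch for the 0/1 knapsack problem. It is an illustration under assumptions, not the formulation from the paper: the function name `llp_knapsack`, the table layout, and the sequential sweep (standing in for threads that would advance forbidden entries in parallel) are all hypothetical, and item values are assumed non-negative.

```python
def llp_knapsack(weights, values, capacity):
    """Raise table entries from the lattice bottom (all zeros) until no entry
    is 'forbidden', i.e. below what the knapsack recurrence requires."""
    n = len(weights)
    # G[i][c]: candidate best value using the first i items with capacity c.
    G = [[0] * (capacity + 1) for _ in range(n + 1)]

    def required(i, c):
        # Value that entry (i, c) must reach given the current row i - 1.
        best = G[i - 1][c]
        if weights[i - 1] <= c:
            best = max(best, G[i - 1][c - weights[i - 1]] + values[i - 1])
        return best

    changed = True
    while changed:  # repeat until no index is forbidden
        changed = False
        for i in range(1, n + 1):
            for c in range(capacity + 1):
                need = required(i, c)
                if G[i][c] < need:   # entry (i, c) is forbidden
                    G[i][c] = need   # advance it to the least allowed value
                    changed = True
    return G[n][capacity]

best = llp_knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7)
```

Because entries only ever increase and each advance moves an entry to the smallest value that the recurrence allows, the order in which forbidden entries are processed does not affect the final table; that order-independence is what makes this style of computation attractive for parallel execution.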

A Generalization of Teo and Sethuraman's Median Stable Marriage Theorem

arXiv (Cornell University), Jan 9, 2020

Let L be any finite distributive lattice and B be any boolean predicate defined on L such that the set of elements satisfying B is a sublattice of L. Consider any subset M of L consisting of k elements that satisfy B. Then, we show that the k generalized median elements generated from M also satisfy B. We call this result the generalized median theorem on finite distributive lattices. When this result is applied to stable matchings, we get Teo and Sethuraman's median stable matching theorem. Our proof is much simpler than that of Teo and Sethuraman. When the generalized median theorem is applied to the assignment problem, we get an analogous result for market clearing price vectors.
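A small worked example may help make the statement concrete. The sketch below assumes one common way to define generalized medians, namely the join over all i-element subsets of M of the meet of each subset; the paper's exact definition and indexing may differ. The lattice (integer vectors under componentwise order), the predicate `B`, and the set `M` are invented for illustration.

```python
from functools import reduce
from itertools import combinations

def meet(u, v):
    """Componentwise minimum: the meet in a lattice of integer vectors."""
    return tuple(min(a, b) for a, b in zip(u, v))

def join(u, v):
    """Componentwise maximum: the join in the same lattice."""
    return tuple(max(a, b) for a, b in zip(u, v))

def generalized_medians(M):
    """For i = 1..k, take the join of the meets of all i-element subsets of M.
    (Assumed definition, used here only for illustration.)"""
    k = len(M)
    medians = []
    for i in range(1, k + 1):
        meets = [reduce(meet, subset) for subset in combinations(M, i)]
        medians.append(reduce(join, meets))
    return medians

def B(v):
    """Hypothetical predicate whose satisfying set is closed under meet and join."""
    return v[0] <= v[1] + 2

M = [(1, 4), (2, 2), (3, 1)]          # every element of M satisfies B
medians = generalized_medians(M)      # [(3, 4), (2, 2), (1, 1)]
```

All three resulting vectors satisfy `B`, as the theorem predicts for a predicate whose satisfying elements form a sublattice. For vectors under componentwise order this construction amounts to taking the i-th largest value in each coordinate.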

AutoSynch: An Automatic-Signal Monitor Based on Predicate Tagging

arXiv (Cornell University), Mar 1, 2013

Most programming languages use monitors with explicit signals for synchronization in shared-memory programs. Requiring programmers to signal threads explicitly results in many concurrency bugs due to missed notifications or notifications on the wrong condition variables. In this paper, we describe an implementation of an automatic signaling monitor in Java called AutoSynch that eliminates such concurrency bugs by removing the burden of signaling from the programmer. We show that the belief that automatic signaling monitors are prohibitively expensive is wrong. For most problems, programs based on AutoSynch are almost as fast as those based on explicit signaling. For some, AutoSynch is even faster than explicit signaling because it never uses signalAll, whereas programmers end up using signalAll with the explicit signal mechanism. AutoSynch achieves efficiency in synchronization based on three novel ideas. First, we introduce an operation called globalization that enables predicate evaluation in every thread, thereby reducing context switches during the execution of the program. Second, AutoSynch avoids signalAll by using a property called relay invariance, which guarantees that, whenever possible, at least one thread whose condition is true has been signaled. Finally, AutoSynch uses a technique called predicate tagging to efficiently determine a thread that should be signaled. To evaluate the efficiency of AutoSynch, we have implemented many well-known synchronization problems such as the producers/consumers problem, the readers/writers problem, and the dining philosophers problem. The results show that AutoSynch is almost as efficient as the explicit-signal monitor and even more efficient in some cases.

Detecting and tolerating faults in distributed systems

A Lightweight Algorithm for Causal Message Ordering in Mobile Computing Systems

Removing Sequential Bottleneck of Dijkstra's Algorithm for the Shortest Path Problem

arXiv (Cornell University), Dec 26, 2018

All traditional methods of computing shortest paths depend upon edge relaxation, where the cost of reaching a vertex from a source vertex is possibly decreased if that edge is used. We introduce a method that maintains lower bounds as well as upper bounds for reaching a vertex. This method enables one to find the optimal cost for multiple vertices in one iteration and thereby reduces the sequential bottleneck in Dijkstra's algorithm. We present four algorithms in this paper: SP1, SP2, SP3, and SP4. SP1 and SP2 reduce the number of heap operations in Dijkstra's algorithm. For directed acyclic graphs or directed unweighted graphs, they have the optimal complexity of O(e), where e is the number of edges in the graph, which is better than that of Dijkstra's algorithm. For general graphs, their worst-case complexity matches that of Dijkstra's algorithm for a sequential implementation but allows for greater parallelism. Algorithms SP3 and SP4 allow for even more parallelism but with higher work complexity. Algorithm SP3 requires O(n + e(max(log n, ∆))) work, where n is the number of vertices and ∆ is the maximum in-degree of a node. Algorithm SP4 has the most parallelism. It requires O(ne) work. These algorithms generalize the work by Crauser, Mehlhorn, Meyer, and Sanders on parallelizing Dijkstra's algorithm.

Byzantine Lattice Agreement in Asynchronous Systems

arXiv (Cornell University), Feb 17, 2020

We study the Byzantine lattice agreement (BLA) problem in asynchronous distributed message passing systems. In the BLA problem, each process proposes a value from a join semi-lattice and needs to output a value also in the lattice such that all output values of correct processes lie on a chain despite the presence of Byzantine processes. We present an algorithm for this problem with round complexity of O(log f) which tolerates f < n/5 Byzantine failures in the asynchronous setting without digital signatures, where n is the number of processes. We also show how this algorithm can be modified to work in the authenticated setting (i.e., with digital signatures) to tolerate f < n/3 Byzantine failures.

Linearizable Replicated State Machines with Lattice Agreement

arXiv (Cornell University), Oct 13, 2018

This paper studies the lattice agreement problem in asynchronous systems and explores its application to building linearizable replicated state machines (RSM). First, we propose an algorithm to solve the lattice agreement problem in O(log f) asynchronous rounds, where f is the number of crash failures that the system can tolerate. This is an exponential improvement over the previous best upper bound. Second, Faleiro et al. have shown in [Faleiro et al. PODC, 2012] that a combination of conflict-free data types and lattice agreement protocols can be applied to implement a linearizable RSM. They give a Paxos-style lattice agreement protocol, which can be adapted to implement a linearizable RSM and guarantees that a command can be learned in at most O(n) message delays, where n is the number of proposers. Later on, Xiong et al. in [Xiong et al. DISC, 2018] give a lattice agreement protocol which improves the O(n) guarantee to O(f). However, neither protocol is practical for building a linearizable RSM. Thus, in the second part of the paper, we first give an improved protocol based on the one proposed by Xiong et al. Then, we implement a simple linearizable RSM using our improved protocol and compare our implementation with an open source Java implementation of Paxos. Results show that better performance can be obtained by using lattice agreement based protocols to implement a linearizable RSM compared to traditional consensus based protocols.

Necessary and Sufficient Conditions on Partial Orders for Modeling Concurrent Computations

arXiv (Cornell University), Oct 5, 2014

Partial orders are used extensively for modeling and analyzing concurrent computations. In this paper, we define two properties of partially ordered sets, width-extensibility and interleaving-consistency, and show that a partial order can be a valid state-based model (1) of some synchronous concurrent computation iff it is width-extensible, and (2) of some asynchronous concurrent computation iff it is width-extensible and interleaving-consistent. We also show a duality between the event-based and state-based models of concurrent computations, and give algorithms to convert models between the two domains. When applied to the problem of checkpointing, our theory leads to a better understanding of some existing results and algorithms in the field. It also leads to efficient detection algorithms for predicates whose evaluation requires knowledge of states from all the processes in the system.

NC Algorithms for Popular Matchings in One-Sided Preference Systems and Related Problems

arXiv (Cornell University), Oct 23, 2019

The popular matching problem is that of matching a set of applicants to a set of posts, where each applicant has a preference list ranking a non-empty subset of posts in order of preference, possibly with ties. A matching M is popular if there is no other matching M′ such that more applicants prefer M′ to M. We give the first NC algorithm to solve the popular matching problem without ties. We also give an NC algorithm that solves the maximum-cardinality popular matching problem. No NC or RNC algorithms were known for the matching problem in preference systems prior to this work. Moreover, we give an NC algorithm for a weaker version of the stable matching problem, that is, the problem of finding the "next" stable matching given a stable matching.

A Max-Plus Algebra of Signals for the Supervisory Control of Real-Time Discrete Event Systems

IFAC Proceedings Volumes, 1998

Introduction to Lattice Theory with Computer Science Applications

Highly scalable algorithm for distributed real-time text indexing

2009 International Conference on High Performance Computing (HiPC), 2009

Stream computing research is moving from terascale to petascale levels. It aims to rapidly analyze data as it streams in from many sources and make decisions with high speed and accuracy in fields as diverse as security surveillance and financial services including stock trading. We specifically consider real-time text indexing and search with high input data rates (10 GB/s or more) along with a small index age-off (expiry) time. This makes it necessary to have maximal indexing rates for large volumes of data as well as minimal latency for indexing (the time between the start of indexing for a document and its availability for search) while maintaining very low search response time. In addition, future massively parallel architectures with storage class memories will enable high-speed in-memory real-time indexing, where the index can be completely stored in a high-capacity storage class memory. In this paper, we present the design of distributed data structures and a distributed real-time text indexing algorithm for parallel systems with a large number (thousands to a hundred thousand) of cores/processors, while simultaneously providing acceptable search performance [1]. The inherent trade-offs involved in index space, indexing throughput and search response time make this problem particularly challenging. Our algorithm uses group-based index construction and leverages novel index data structures that reduce load imbalance and make the text indexing and merge process more scalable and efficient. We show analytically that the asymptotic parallel time complexity of our distributed indexing algorithm is at least an Ω(log P) factor better than typical indexing approaches, where P is the number of indexing nodes in a group. We further demonstrate the performance and scalability of our distributed indexing algorithm on an MPP architecture (Blue Gene/L, http://www.research.ibm.com/bluegene) using actual IBM intranet data. We achieved high indexing throughput of around 312 GB/min on an 8K node Blue Gene/L machine. In comparison with parallel indexing implemented using typical approaches like CLucene (http://www.sourceforge.net/projects/clucene), this is 3× to 7× better. To the best of our knowledge, this is the first published result on indexing throughput at such a large scale, with sustained search performance. We further show that our approach is scalable to 128K nodes, giving an estimated indexing throughput of 5 TB/min. We also achieved indexing latency that is around 10× better than typical indexing approaches.