Reliably Executing Tasks in the Presence of Malicious Processors

Robust Network Supercomputing with Malicious Processes

Lecture Notes in Computer Science, 2006

Internet supercomputing is becoming a powerful tool for harnessing massive amounts of computational resources. However, in typical master-worker settings the reliability of the computation crucially depends on the master's ability to trust the results returned by the workers. Fernandez, Georgiou, Lopez, and Santos [12,13] considered a system consisting of a master process and a collection of worker processes that can execute tasks on behalf of the master and that may act maliciously by deliberately returning fallacious results. The master decides on the correctness of the results by assigning the same task to several workers, and is charged one work unit for each task performed by a worker. The goal is to design an algorithm that enables the master to determine the correct result with high probability, at the least possible cost. Fernandez et al. assume that the number of faulty processes, or the probability of a process acting maliciously, is known to the master. In this paper this assumption is removed. In the setting with n processes and n tasks we consider two different failure models: model Fa, where an f-fraction, 0 < f < 1/2, of the workers provide faulty results with probability p, 0 < p < 1/2, and the master has no a priori knowledge of the values of p and f; and model Fb, where at most an f-fraction, 0 < f < 1/2, of the workers can reply with arbitrary results and the rest reply with incorrect results with probability p, 0 < p < 1/2, but the master knows the values of f and p. For model Fa we provide an algorithm, based on the Stopping Rule Algorithm of Dagum, Karp, Luby, and Ross [10], that can estimate f and p with an (ε, δ)-approximation, for any 0 < δ < 1 and ε > 0. This algorithm runs in O(log n) time, with O(log^2 n) message complexity, O(log^2 n) task-oriented work, and O(n log n) total work. We also provide a randomized algorithm for detecting the faulty processes, i.e., identifying the processes that have a non-zero probability of failure in model Fa, with task-oriented work O(n) and time O(log n). A lower bound on the total-work complexity of performing n tasks correctly with high probability is shown. Finally, two randomized algorithms to perform n tasks with high probability are given for both failure models, with closely matching upper bounds on total-work and task-oriented work complexities and time O(log n).
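The redundancy-with-voting idea that underlies these algorithms can be sketched in a few lines of Python. This is only an illustrative simulation under assumed parameters; the worker model, the fixed redundancy r, and all names are assumptions for the example, not the adaptive algorithms of Fernandez et al. or of this paper.

```python
import random

def worker_reply(correct_value, is_faulty, p):
    """A worker returns the correct binary value, except that a faulty
    worker flips its answer with probability p (illustrative model)."""
    if is_faulty and random.random() < p:
        return 1 - correct_value
    return correct_value

def master_decide(correct_value, n, f, p, r):
    """The master assigns one task to r workers chosen at random from n
    (an f-fraction of which are faulty) and accepts the majority value.
    Returns the accepted value and the work charged (one unit per reply)."""
    faulty = set(random.sample(range(n), int(f * n)))
    chosen = random.sample(range(n), r)
    replies = [worker_reply(correct_value, w in faulty, p) for w in chosen]
    accepted = 1 if sum(replies) > r // 2 else 0
    return accepted, r

if __name__ == "__main__":
    trials = 10_000
    wrong = sum(master_decide(1, n=1000, f=0.3, p=0.3, r=15)[0] != 1
                for _ in range(trials))
    print(f"empirical per-task error with redundancy 15: {wrong / trials:.4f}")
```

Increasing r drives the per-task error down quickly but raises the work charged for that task, which is exactly the cost/correctness trade-off these papers study.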

Reliably executing tasks in the presence of untrusted entities

2006

In this work we consider a distributed system formed by a master processor and a collection of n processors (workers) that can execute tasks; the workers are untrusted and might act maliciously. The master assigns tasks to workers to be executed. Each task returns a binary value, and we want the master to accept only correct values with high probability. Furthermore, we assume that the service provided by the workers is not free; for each task assigned to a worker, the master is charged one work unit.
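The cost/reliability trade-off in this charging model can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that every worker errs independently with the same probability p (a simplification of the failure models above) and computes the smallest odd redundancy whose majority-vote error per task meets a target; that redundancy is also the work charged per task.

```python
from math import comb

def majority_error(r, p):
    """Probability that a majority of r independent replies is wrong,
    when each reply is wrong independently with probability p (r odd)."""
    k = r // 2 + 1  # smallest number of wrong replies that flips the vote
    return sum(comb(r, i) * p**i * (1 - p)**(r - i) for i in range(k, r + 1))

def redundancy_for(p, target):
    """Smallest odd redundancy r whose per-task error is below `target`;
    the master is then charged exactly r work units for that task."""
    r = 1
    while majority_error(r, p) > target:
        r += 2
    return r

if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3):
        r = redundancy_for(p, target=1e-3)
        print(f"p={p}: assign each task to {r} workers "
              f"(error {majority_error(r, p):.1e}, cost {r} work units)")
```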

Reliable Internet-based Master-Worker Computing in the Presence of Malicious Workers

2012

We consider a Master-Worker distributed system where a master processor assigns, over the Internet, tasks to a collection of n workers, which are untrusted and might act maliciously. In addition, a worker may not reply to the master, or its reply may not reach the master, due to unavailability or failure of the worker or the network. Each task returns a value, and the goal is for the master to accept only correct values with high probability.

Dealing with Undependable Workers in Decentralized Network Supercomputing

Lecture Notes in Computer Science, 2013

Internet supercomputing is an approach to solving partitionable, computation-intensive problems by harnessing the power of a vast number of interconnected computers. This paper presents a new algorithm for the problem of using network supercomputing to perform a large collection of independent tasks while dealing with undependable processors. The adversary may cause the processors to return bogus results for tasks with certain probabilities, and may cause a subset F of the initial set of processors P to crash. The adversary is constrained in two ways. First, for the set of non-crashed processors P − F, the average probability of a processor returning a bogus result is less than 1/2. Second, the adversary may crash a subset of processors F, provided the size of P − F is bounded from below. We consider two models: the first bounds the size of P − F by a fractional polynomial, the second bounds this size by a poly-logarithm. Both models yield adversaries that are much stronger than previously studied. Our randomized synchronous algorithm is formulated for n processors and t tasks, with n ≤ t, where, depending on the number of crashes, each live processor is able to terminate dynamically with the knowledge that the problem is solved with high probability. For the adversary constrained by a fractional polynomial, the time complexity of the algorithm is O((t/n^ε) log n log log n), its work is O(t log n log log n), and its message complexity is O(n log n log log n). For the poly-log constrained adversary, the time complexity is O(n), the work is O(t polylog n), and the message complexity is O(n polylog n). All bounds are shown to hold with high probability.

A Probabilistic Approach for Task and Result Certification of Large-Scale Distributed Applications in Hostile Environments

Lecture Notes in Computer Science, 2005

This paper presents a new approach for certifying the correctness of program executions in hostile environments, where tasks or their results may have been corrupted by benign faults or malicious acts. Extending previous results in the restricted context of independent tasks, we introduce a probabilistic certification that establishes whether the results of computations are correct. This probabilistic approach does not make any assumptions about the attack, and certification errors are due only to unlucky random choices. Bounds associated with the certification are provided for general graphs and for tasks with out-tree dependencies, as found in the medical image analysis application that motivated this research. This work has been supported by CNRS ACI Grid-DOCG and the Region Rhône-Alpes (Ragtime project).
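For the restricted case of independent tasks, the underlying Monte Carlo idea can be sketched as follows. This is only an illustration, not the paper's certification procedure (which also covers dependency graphs); the function names, sampling budget, and trusted re-execution oracle are assumptions made for the example.

```python
import random

def certify(results, recompute, samples, rng=random):
    """Monte Carlo certification of a batch of independent task results by
    spot-checking: re-execute `samples` randomly chosen tasks on a trusted
    resource and reject as soon as one recomputed value disagrees.

    If a q-fraction of the results were falsified, the attack escapes
    detection with probability at most (1 - q) ** samples, regardless of
    which tasks the attacker chose to corrupt."""
    task_ids = list(results)
    for task_id in rng.sample(task_ids, min(samples, len(task_ids))):
        if recompute(task_id) != results[task_id]:
            return False  # a forged result was caught: reject the batch
    return True  # accepted; any error is due only to unlucky random choices

if __name__ == "__main__":
    trusted = lambda t: t * t                  # trusted re-execution oracle
    reported = {t: t * t for t in range(1000)}
    for t in random.sample(range(1000), 50):   # adversary forges 5% of results
        reported[t] += 1
    print("certified?", certify(reported, trusted, samples=100))
```

The key property, matching the claim above, is that the detection bound holds whatever the attacker does: the certification error depends only on the random choices.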

Secure Distributed Human Computation

2005

We suggest a general paradigm of using large-scale distributed computation to solve difficult problems, in which humans act as agents and provide candidate solutions. We are especially motivated by problem classes that appear to be difficult for computers to solve effectively but are easier for humans, e.g., image analysis, speech recognition, and natural language processing. This paradigm already seems to be employed in several real-world scenarios, but we are unaware of any formal and unified attempt to study it. Nonetheless, this concept spawns interesting research questions in cryptography, algorithm design, human-computer interfaces, and programming language / API design, among other fields. There are also interesting implications for Internet commerce and the B24b model. We describe this general research area at a high level and touch upon some preliminary work; a more extensive treatment can be found in [6].

Randomized Parallel Computation

Concurrent Computations, 1988

Informally, a randomized algorithm (in the sense of [59] and [82]) is one which bases some of its decisions on the outcomes of coin flips. We can regard the algorithm run with one possible sequence of coin-flip outcomes as different from the same algorithm run with a different sequence of outcomes; a randomized algorithm is therefore really a family of algorithms. For a given input, some of the algorithms in this family might run for an indefinitely long time. The objective in the design of a randomized algorithm is to ensure that the number of such bad algorithms in the family is only a small fraction of the total number of algorithms. If for any input we can find at least a (1 − ε) portion (ε being very close to 0) of algorithms in the family that run quickly on that input, then clearly a random algorithm in the family will run quickly on any input with probability ≥ (1 − ε). In this case we say that this family of algorithms (or this randomized algorithm) runs quickly with probability at least (1 − ε); ε is called the error probability. Observe that this probability is independent of the input and the input distribution.

To give a flavor of the above notions, we now give an example of a randomized algorithm. We are given a polynomial f(x_1, ..., x_n) in n variables over a field F. It is required to check whether f is identically zero. We generate a random n-vector (r_1, ..., r_n) (r_i ∈ F, i = 1, ..., n) and check whether f(r_1, ..., r_n) = 0. We repeat this for k independent random vectors. If there was at least one vector on which f evaluated to a nonzero value, then of course f is nonzero. If f evaluated to zero on all k vectors tried, we conclude that f is zero. It can be shown (see section 2.3.1) that the probability of error in this conclusion is very small if we choose a sufficiently large k. In comparison, the best known deterministic algorithm for this problem is much more complicated and has a much higher time bound.

1.2 Advantages of Randomization

Randomized algorithms have many advantages. Two extremely important ones are their simplicity and efficiency. A large portion of the randomized algorithms found in the literature are much simpler and easier to understand than the best deterministic algorithms for the same problems. The reader will already have gotten a feel for this from the example above of testing whether a polynomial is identically zero. Randomized algorithms have also been shown to yield better complexity bounds. Numerous examples could be given to illustrate this fact, but we will not list them here, since the algorithms described in the rest of the paper will convince the reader. A skeptical reader might ask at this point: how dependable are randomized algorithms in practice, given that there is a nonzero probability that they might fail? Such a reader must realize that there is a probability (however small it might be) that the hardware itself might fail. Adleman and Manders [1] remark that if we can find a fast algorithm for a problem with an error probability < 2^(-k) for some integer k independent of the problem size, we can reduce the error probability far below the hardware error probability by making k large enough.

1.3 Randomization in Parallel Algorithms

The tremendously low cost of hardware nowadays has prompted computer scientists to design parallel machines and algorithms to solve problems very efficiently. In an early paper, Reif [68] proposed using randomization in parallel computation. In that paper he also solved many algebraic and graph-theoretic problems in parallel using randomization. Since then a new area of CS research has evolved that tries to exploit the special features offered by both randomization and parallelization. This paper demonstrates the power of randomization in obtaining efficient parallel algorithms for various important computational problems.

1.4 Different types of randomized algorithms

Two types of randomized algorithms can be found in the literature: 1) those that always output the correct answer but whose run-time is a random variable with a specified mean; these are called Las Vegas algorithms. 2) Those that run for a specified amount of time and whose output is correct with a specified probability; these are called Monte Carlo algorithms. The primality testing algorithm of Rabin [59] is of the second type. The error of a randomized algorithm can be either 1-sided or 2-sided. Consider a randomized algorithm for recognizing a language; its output is either yes or no. There are algorithms which, when outputting yes, are always correct, but when outputting no are correct only with high probability. These algorithms are said to have 1-sided error. Algorithms that have a nonzero error probability on both possible outputs are said to have 2-sided error.
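The polynomial identity test described at the start of this excerpt is easy to render in code. The following Python sketch is an illustrative reading of it over a prime field Z_p; the black-box polynomials, the prime, and the trial count k = 20 are assumptions for the example, not taken from the paper. It also exhibits the 1-sided error just discussed: a "not zero" verdict is always correct, while a "zero" verdict can be wrong with small probability.

```python
import random

def probably_zero(f, n_vars, trials=20, prime=2_147_483_647):
    """Randomized test of whether a polynomial, given only as a black box
    f(x_1, ..., x_n) with integer coefficients evaluated modulo `prime`,
    is identically zero.  One-sided error: a nonzero evaluation proves
    f != 0; if every trial evaluates to zero we conclude f == 0, and by
    the Schwartz-Zippel lemma each trial errs with probability at most
    d/prime for a nonzero polynomial of total degree d."""
    for _ in range(trials):
        point = [random.randrange(prime) for _ in range(n_vars)]
        if f(*point) % prime != 0:
            return False  # witnessed a nonzero value: definitely not zero
    return True  # zero at all random points: identically zero w.h.p.

if __name__ == "__main__":
    # (x + y)^2 - (x^2 + 2xy + y^2) is identically zero ...
    g = lambda x, y: (x + y) ** 2 - (x * x + 2 * x * y + y * y)
    # ... while (x + y)^2 - (x^2 + y^2) = 2xy is not.
    h = lambda x, y: (x + y) ** 2 - (x * x + y * y)
    print(probably_zero(g, 2), probably_zero(h, 2))  # expected: True False
```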