One, two, three ... infinity: lower bounds for parallel computation

On the Limits to Speed Up Parallel Machines by Large Hardware and Unbounded Communication

Lower bounds for sequential and parallel random access machines (RAMs, WRAMs) and distributed systems of RAMs (DRAMs) are proved. We show that, when p processors instead of one are available, the computation of certain functions cannot be sped up by a factor p but only by a factor O(log p). For DRAMs with a communication graph of degree c, a maximal speedup O(log c) can be achieved for these problems. We apply these results to testing the solvability of linear diophantine equations. This generalizes a lower bound of Yao for parallel computation trees. Improving results of Dobkin/Lipton and Klein/Meyer auf der Heide, we establish large lower bounds for the above problem on RAMs. Finally, we prove that at least log(n)+1 steps are necessary for computing the sum of n integers on a WRAM, regardless of the number of processors and the resolution of write conflicts.
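The final claim, that summing n integers needs at least log(n)+1 WRAM steps no matter how many processors are used, is matched up to rounding by the familiar balanced-tree schedule. The sketch below is not the paper's lower-bound argument, only a simulation of that matching upper bound; the function name and round counting are illustrative assumptions.

```python
import math

def parallel_sum_rounds(xs):
    """Simulate the standard pairwise-combining schedule for summing
    n integers: in each synchronous round every (virtual) processor
    adds one disjoint pair, roughly halving the partial sums.

    Returns (total, rounds); the round count is ceil(log2(n)), which
    matches the log(n)+1 lower bound up to rounding.
    """
    vals = list(xs)
    rounds = 0
    while len(vals) > 1:
        # One synchronous step: all pairs are combined in parallel.
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)] + \
               ([vals[-1]] if len(vals) % 2 else [])
        rounds += 1
    return vals[0], rounds

total, rounds = parallel_sum_rounds(range(1, 17))   # n = 16
assert total == 136 and rounds == math.ceil(math.log2(16))
```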

Finding the maximum, merging, and sorting in a parallel computation model

Journal of Algorithms, 1981

A model for synchronized parallel computation is described in which all p processors have access to a common memory. This model is used to solve the problems of finding the maximum, merging, and sorting by p processors. The main results are: 1. Finding the maximum of n elements (1 < p <= n) within a depth of O(n/p + log log p) (optimal for p <= n/log log n). 2. Merging two sorted lists of length m and n (m <= n) within a depth of O(n/p + log n) for p <= n (optimal for p <= n/log n), and O(log m / log(p/n)) for p >= n (= O(k) if p = mn^k, k > 1). 3. Sorting n elements within a depth of O((n/p) log n + log n log p) for p <= n (optimal for p <= n/log n), and O(log^2 n / log(p/n) + log n) for p >= n (= O(k log n) if p = n^(1+1/k), k > 1). The depth of O(k log n) for p = n^(1+1/k) processors was also achieved by Hirschberg (Comm. ACM 21, No. 8 (1978), 657-661) and Preparata (IEEE Trans. Computers C-27 (July 1978), 669-673). Our algorithm is substantially simpler. All the elementary operations, including allocation of processors to their jobs, are taken into account in deriving the depth complexity, and not only comparisons.
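The O(log log p) term in result 1 comes from a doubly-logarithmic round structure. The sketch below simulates a Valiant-style schedule (our illustrative assumption, not necessarily the paper's exact algorithm): with m surviving candidates and p processors, groups of size about 2p/m keep each round within roughly p pairwise comparisons while the candidate count falls quadratically.

```python
def max_rounds(xs, p=None):
    """Count the synchronous rounds of a Valiant-style maximum
    algorithm with p processors (p = n by default).

    Each round partitions the m surviving candidates into groups of
    size g = max(2, 2*p // m); a group of size g is reduced to its
    maximum in one round using g*(g-1)/2 pairwise comparisons, so the
    whole round fits in about p comparisons, and the candidate count
    drops quadratically, giving O(log log n) rounds.
    """
    vals = list(xs)
    p = p or len(vals)
    rounds = 0
    while len(vals) > 1:
        g = max(2, (2 * p) // len(vals))
        vals = [max(vals[i:i + g]) for i in range(0, len(vals), g)]
        rounds += 1
    return vals[0], rounds

best, r = max_rounds(range(65536))   # n = p = 2**16
assert best == 65535
print(r)  # 5 rounds, on the order of log log n rather than log n
```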

Models of Parallel Computation and Parallel Complexity

2010

This thesis reviews selected topics from the theory of parallel computation. It begins with a survey of the proposed models of parallel computation, examining the characteristics of each model and discussing its use for theoretical studies or for practical applications. It then employs common simulation techniques to evaluate the computational power of these models; the simulations establish certain relations between the models before the thesis advances to a detailed study of parallel complexity theory, the subject of its second part. The second part examines classes of feasible, highly parallel problems and investigates the limits of parallelization: the benefits of parallel solutions and the extent to which they apply to all problems. It analyzes the parallel complexity of various well-known tractable problems and discusses the automatic parallelization of efficient sequential algorithms. Moreover, it ...

Complexity issues in general purpose parallel computing

1991

In recent years, powerful theoretical techniques have been developed for supporting communication, synchronization and fault tolerance in general purpose parallel computing. The proposition of this thesis is that different techniques should be used to support different algorithms. The determining factor is granularity, or the extent to which an algorithm uses long blocks for communication between processors. We consider the Block PRAM model of Aggarwal, Chandra and Snir, a synchronous model of parallel computation in which the processors communicate by accessing a shared memory. In the Block PRAM model, there is a time cost for each access by a processor to a block of locations in the shared memory. This feature of the model encourages the use of long blocks for communication. In the thesis we present Block PRAM algorithms and lower bounds for specific problems on arrays, lists, expression trees, graphs, strings, binary trees and butterflies. These results introduce useful basic techniques for parallel computation in practice, and provide a classification of problems and algorithms according to their granularity. Also presented are optimal algorithms for universal hashing and skewing, which are techniques for supporting conflict-free memory access in general- and special-purpose parallel computations, respectively. We explore the Block PRAM model as a theoretical basis for the design of scalable general purpose parallel computers. Several simulation results are presented which show the Block PRAM model to be comparable to, and competitive with, other models that have been proposed for this role. Two major advantages of machines based on the Block PRAM model are that they preserve the granularity properties of individual algorithms and that they can efficiently incorporate a significant degree of fault tolerance. The thesis also discusses methods for the design of algorithms that do not use synchronization, and applies these methods to define fast circuits for several fundamental Boolean functions.
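A minimal sketch of the kind of cost rule the abstract describes, assuming a Block PRAM-style charge of a fixed startup latency plus one unit per word for each block access; the latency value and function names are illustrative, not parameters from the thesis. It shows why the model rewards communication in long blocks.

```python
import math

def block_access_cost(block_len, latency=8):
    """Cost of one access to a contiguous block of shared memory under
    an assumed Block PRAM-style rule: a fixed startup latency plus one
    unit per word. (The latency of 8 is an arbitrary illustration.)"""
    return latency + block_len

def transfer_cost(total_words, block_len, latency=8):
    """Cost of moving total_words words using blocks of block_len words."""
    blocks = math.ceil(total_words / block_len)
    return blocks * block_access_cost(block_len, latency)

# Moving 1024 words one word at a time vs. in blocks of 128:
assert transfer_cost(1024, 1) == 1024 * 9    # 9216: startup dominates
assert transfer_cost(1024, 128) == 8 * 136   # 1088: startup amortized
```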

A complexity theory of efficient parallel algorithms

Theoretical Computer Science, 1990

This paper outlines a theory of parallel algorithms that emphasizes two crucial aspects of parallel computation: speedup, the improvement in running time due to parallelism, and efficiency, the ratio of work done by a parallel algorithm to the work done by a sequential algorithm. We define six classes of algorithms in these terms; of particular interest is the class EP, of algorithms that achieve a polynomial speedup with constant efficiency. The relations between these classes are examined. We investigate the robustness of these classes across various models of parallel computation. To do so, we examine simulations across models where the simulating machine may be smaller than the simulated machine. These simulations are analyzed with respect to their efficiency and to the reduction in the number of processors. We show that a large number of parallel computation models are related via efficient simulations, if a polynomial reduction of the number of processors is allowed. This implies that the class EP is invariant across all these models. Many open problems motivated by our approach are listed.

1. Introduction. As parallel computers become increasingly available, a theory of parallel algorithms is needed to guide the design of algorithms for such machines. To be useful, such a theory must address two major concerns in parallel computation, namely speedup and efficiency. It should classify algorithms and problems into a few, meaningful classes that are, to the largest extent possible, model independent. This paper outlines an approach to the analysis of parallel algorithms that we feel answers these concerns without sacrificing too much generality or abstractness. We propose a classification of parallel algorithms in terms of parallel running time and inefficiency, which is the extra amount of work done by a parallel algorithm as compared to a sequential algorithm. Both running time and inefficiency are measured as a function of the sequential running time, which is used as a yardstick. (A preliminary version of this paper was presented at the 15th International Colloquium on Automata, Languages and Programming.)
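Writing the abstract's informal definitions out makes the class EP concrete. The formulas below follow the usual conventions, with T(n) the sequential and T_p(n) the p-processor running time; the paper's exact normalizations (it measures against sequential time as a yardstick, and works with inefficiency rather than efficiency) may differ in detail.

```latex
% Speedup and efficiency, under the standard conventions assumed here.
\[
  S(n) = \frac{T(n)}{T_p(n)},
  \qquad
  E(n) = \frac{T(n)}{p \, T_p(n)} = \frac{S(n)}{p}
\]
% EP: polynomial speedup achieved at constant efficiency.
\[
  \text{EP} = \left\{\ \text{algorithms with }
  S(n) = \Omega\!\left(T(n)^{\varepsilon}\right)
  \text{ for some } \varepsilon > 0
  \ \text{and}\ E(n) = \Theta(1) \ \right\}
\]
```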

Parallel Integer Sorting Is More Efficient Than Parallel Comparison Sorting on Exclusive Write PRAMs

SIAM Journal on Computing, 2002

We present a significant improvement on parallel integer sorting. Our EREW PRAM algorithm sorts n integers in the range {0, 1, ..., m-1} in time O(log n) with O(n√(log n)) operations using word length k log(m + n), where 1 <= k <= log n. When k = log n this algorithm sorts n integers in O(log n) time with linear operations. When k = 1 this algorithm sorts n integers in O(log n) time with O(n√(log n)) operations.
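The word length k log(m + n) suggests that several integers are packed into a single machine word so that one word operation acts on many of them at once. The sketch below shows that generic word-parallelism trick (packed comparison behind a guard bit); it only illustrates the principle and is not the paper's sorting algorithm.

```python
def pack(fields, f):
    """Pack integers (each < 2**f) into one word, one per (f+1)-bit slot."""
    w = 0
    for i, v in enumerate(fields):
        w |= v << (i * (f + 1))
    return w

def packed_ge(xs, ys, f):
    """Test xs[i] >= ys[i] for all i with O(1) word operations.

    A guard bit 2**f is set above every x field, so subtracting the
    packed y word never borrows across slots; slot i keeps its guard
    bit exactly when xs[i] >= ys[i]."""
    X = pack([x | (1 << f) for x in xs], f)
    Y = pack(ys, f)
    D = X - Y
    return [bool((D >> (i * (f + 1) + f)) & 1) for i in range(len(xs))]

# Four 8-bit comparisons with a constant number of word operations:
assert packed_ge([5, 200, 7, 0], [5, 201, 3, 1], f=8) == [True, False, True, False]
```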

The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms

SIAM Journal on Computing, 1998

This paper introduces the queue-read queue-write (QRQW) parallel random access machine (PRAM) model, which permits concurrent reading and writing to shared-memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. Prior to this work there were no formal complexity models that accounted for the contention to memory locations, despite its large impact on the performance of parallel programs. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models: the CRCW model does not adequately penalize algorithms with high contention to shared-memory locations, while the EREW model is too strict in its insistence on zero contention at each step.
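A minimal sketch of the step-cost rule the abstract describes, assuming a step is charged the maximum number of accesses queued at any one location (the function name and list-of-addresses interface are illustrative):

```python
from collections import Counter

def qrqw_step_cost(accesses):
    """Cost of one synchronous step under the QRQW rule sketched in
    the abstract: each shared-memory location serves its readers and
    writers one at a time, so the step costs the maximum contention at
    any location. `accesses` lists the addresses touched this step."""
    return max(Counter(accesses).values(), default=1)

# p = 8 processors reading 8 distinct cells: EREW-style, cost 1.
assert qrqw_step_cost([0, 1, 2, 3, 4, 5, 6, 7]) == 1
# All 8 read one cell: CRCW would charge 1, QRQW charges the queue of 8.
assert qrqw_step_cost([42] * 8) == 8
```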