Access Complexity: A New Complexity Analysis Framework for Parallel Computation

Access Complexity: A New Framework for Computational Complexity

Computational complexity theory plays a central role in all areas of computer science. Traditionally, computational complexity is based on the random access memory (RAM) model, in which a unit of data at an arbitrary location in memory can be accessed at a fixed constant cost. Recent advances in information technology have made this assumption unrealistic. The speed gap between processing units and main memory has widened dramatically, demanding deep memory hierarchies. Recent parallel processing systems have more processors than can cost-effectively share the same memory without a cache mechanism, and distributed computation is becoming more and more popular. All of this calls for a computational complexity model that is more aware of locality. In this paper, we propose a new framework for computational complexity, named access complexity, in which the cost lies in data transfer rather than in computation itself. The model is designed so that it na...
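
The abstract does not give the model's formal definitions, so the following Python sketch is only a hypothetical illustration of the underlying idea: charge cost to transfers between a slow and a fast memory (with block size block_size) rather than to arithmetic. All names and parameter values here are illustrative, not taken from the paper.

    # Hypothetical sketch: cost is the number of block transfers from slow
    # memory, not the number of arithmetic operations performed.
    def sum_with_counted_transfers(data, block_size):
        transfers = 0
        total = 0
        for start in range(0, len(data), block_size):
            transfers += 1                                    # one block moved from slow memory
            total += sum(data[start:start + block_size])      # arithmetic is "free" in this model
        return total, transfers

    values = list(range(1_000_000))
    total, transfers = sum_with_counted_transfers(values, block_size=1024)
    print(total, transfers)    # complexity is measured by transfers, not additions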

On the Impact of Communication Complexity on the Design of Parallel Numerical Algorithms

IEEE Transactions on Computers, 2000

This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In this second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.
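
The abstract does not reproduce the models themselves; as a reminder of the Hockney-style cost it builds on, a message of n bytes is commonly modeled as a startup latency plus a bandwidth term. The parameter values in this sketch are illustrative only, not taken from the paper.

    # Hockney-style point-to-point cost: latency plus size / bandwidth.
    def message_time(message_bytes, latency_s=1e-6, bandwidth_bytes_per_s=1e9):
        return latency_s + message_bytes / bandwidth_bytes_per_s

    # Small messages are latency-dominated, large ones bandwidth-dominated.
    for n in (8, 8_192, 8_388_608):
        print(n, message_time(n))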

Models of Parallel Computation and Parallel Complexity

2010

This thesis reviews selected topics from the theory of parallel computation. The research begins with a survey of the proposed models of parallel computation. It examines the characteristics of each model and it discusses its use either for theoretical studies, or for practical applications. Subsequently, it employs common simulation techniques to evaluate the computational power of these models. The simulations establish certain model relations before advancing to a detailed study of the parallel complexity theory, which is the subject of the second part of this thesis. The second part examines classes of feasible highly parallel problems and it investigates the limits of parallelization. It is concerned with the benefits of the parallel solutions and the extent to which they can be applied to all problems. It analyzes the parallel complexity of various well-known tractable problems and it discusses the automatic parallelization of the efficient sequential algorithms. Moreover, it ...

A complexity theory of efficient parallel algorithms

Theoretical Computer Science, 1990

Abstract. This paper outlines a theory of parallel algorithms that emphasizes two crucial aspects of parallel computation: speedup, the improvement in running time due to parallelism, and efficiency, the ratio of work done by a parallel algorithm to the work done by a sequential algorithm. We define six classes of algorithms in these terms; of particular interest is the class EP, of algorithms that achieve a polynomial speedup with constant efficiency. The relations between these classes are examined. We investigate the robustness of these classes across various models of parallel computation. To do so, we examine simulations across models where the simulating machine may be smaller than the simulated machine. These simulations are analyzed with respect to their efficiency and to the reduction in the number of processors. We show that a large number of parallel computation models are related via efficient simulations, if a polynomial reduction of the number of processors is allowed. This implies that the class EP is invariant across all these models. Many open problems motivated by our approach are listed. 1. Introduction. As parallel computers become increasingly available, a theory of parallel algorithms is needed to guide the design of algorithms for such machines. To be useful, such a theory must address two major concerns in parallel computation, namely speedup and efficiency. It should classify algorithms and problems into a few, meaningful classes that are, to the largest extent possible, model independent. This paper outlines an approach to the analysis of parallel algorithms that we feel answers these concerns without sacrificing too much generality or abstractness. We propose a classification of parallel algorithms in terms of parallel running time and inefficiency, which is the extra amount of work done by a parallel algorithm as compared to a sequential algorithm. Both running time and inefficiency are measured as a function of the sequential running time, which is used as a yardstick. * A preliminary version of this paper was presented at the 15th International Colloquium on Automata,
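
Using the abstract's own terms, the quantities can be written down directly: with sequential time T_seq and parallel time T_par on p processors, speedup is T_seq / T_par and inefficiency is the work ratio p * T_par / T_seq. The sketch below is a minimal illustration with made-up numbers, not code from the paper.

    # Speedup and inefficiency in the paper's sense (toy numbers).
    def speedup(t_seq, t_par):
        return t_seq / t_par

    def inefficiency(t_seq, t_par, p):
        return (p * t_par) / t_seq     # extra work done by the parallel algorithm

    t_seq, t_par, p = 4096.0, 64.0, 128
    print(speedup(t_seq, t_par), inefficiency(t_seq, t_par, p))
    # Roughly: an algorithm family falls in EP when its speedup grows
    # polynomially in t_seq while its inefficiency stays bounded by a constant.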

Performance Analysis of Parallel Algorithms

2016

In this paper, we provide a qualitative and quantitative analysis of the performance of parallel algorithms on modern multi-core hardware. We present a comparative study of the performance of algorithms (traditionally perceived as sequential in nature) in a parallel environment, using the Message Passing Interface (MPI) and evaluated against Amdahl’s Law. First, we study sorting algorithms. Sorting is a fundamental problem in computer science, and one for which there is a limit on the efficiency of the algorithms that exist. In theory, sorting contains a large amount of parallelism, and it should not be difficult to accelerate the sorting of very large datasets on modern architectures. Unfortunately, most serial sorting algorithms do not lend themselves to easy parallelization, especially in a distributed memory system such as we might use with MPI. While initial results show a promising speedup for sorting algorithms, owing to inter-process communication latency we see a slower run-time overall with incr...
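
The study's baseline is Amdahl's Law: with parallel fraction f and p processes, the achievable speedup is S(p) = 1 / ((1 - f) + f / p). A minimal sketch, using an assumed parallel fraction rather than one measured in the paper:

    # Amdahl's Law: speedup is capped by the serial fraction of the work.
    def amdahl_speedup(parallel_fraction, num_processes):
        serial_fraction = 1.0 - parallel_fraction
        return 1.0 / (serial_fraction + parallel_fraction / num_processes)

    for p in (2, 8, 64, 1024):
        print(p, round(amdahl_speedup(0.95, p), 2))
    # Even with 95% parallel work the speedup saturates near 20x, before any
    # inter-process communication latency of the kind the abstract describes.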

An approach to scalability study of shared memory parallel systems

Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems - SIGMETRICS '94, 1994

The overheads in a parallel system that limit its scalability need to be identified and separated in order to enable parallel algorithm design and the development of parallel machines. Such overheads may be broadly classified into two components. The first one is intrinsic to the algorithm and arises due to factors such as the work-imbalance and the serial fraction. The second one is due to the interaction between the algorithm and the architecture and arises due to latency and contention in the network. A top-down approach to scalability study of shared memory parallel systems is proposed in this research. We define the notion of overhead functions associated with the different algorithmic and architectural characteristics to quantify the scalability of parallel systems; we isolate the algorithmic overhead and the overheads due to network latency and contention from the overall execution time of an application; we design and implement an execution-driven simulation platform that incorporates these methods for quantifying the overhead functions; and we use this simulator to study the scalability characteristics of five applications on shared memory platforms with different communication topologies.
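
A convenient way to state the decomposition the abstract describes is via the total overhead p * T_par - T_seq, which the approach splits into an algorithmic component (serial fraction, work imbalance) and an interaction component (network latency, contention). The numbers and the split below are purely illustrative, not measurements from the study.

    # Toy illustration of separating overheads (all values are made up).
    def total_overhead(t_seq, t_par, p):
        return p * t_par - t_seq       # total extra work across all processors

    t_seq, t_par, p = 100.0, 1.9, 64
    algorithmic = 8.0                                               # serial fraction + imbalance
    interaction = total_overhead(t_seq, t_par, p) - algorithmic     # latency + contention
    print(total_overhead(t_seq, t_par, p), algorithmic, interaction)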

Micro time cost analysis of parallel computations

IEEE Transactions on Computers, 1991

In this paper, we investigate the modeling and analysis of the time cost behavior of parallel computations. It is assumed that the parallel computations under investigation reside in a computer system with a limited number of processors, that all processors have the same speed, and that they communicate with each other through a shared memory. It has been found that the time costs of parallel computations depend on the input, the algorithm, the data structure, the processor speed, the number of processors, the processing power allocation, the communication, the execution overhead, and the execution environment. In this paper, we define the time costs of parallel computations as a function of the first seven factors listed above. The computation structure model is modified to describe the impact of these seven factors on time cost, and techniques based on the modified computation structure model are developed to analyze time cost. A software tool, TCAS (time cost analysis system), which uses both analytic and simulation approaches, is designed and implemented to aid users in determining the time cost behavior of their parallel computations.
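
The computation structure model itself is not given in the abstract, so the sketch below is only a generic stand-in showing how a time-cost function might combine a few of the listed factors (processor count, communication, execution overhead); the function and all values are hypothetical.

    # Hypothetical time-cost function over a few of the listed factors.
    def time_cost(total_work, p, comm_events, comm_latency_s, overhead_per_proc_s):
        compute = total_work / p                       # equal-speed processors, even allocation
        communicate = comm_events * comm_latency_s     # shared-memory communication cost
        overhead = overhead_per_proc_s * p             # execution overhead
        return compute + communicate + overhead

    print(time_cost(total_work=1e6, p=16, comm_events=200,
                    comm_latency_s=1e-4, overhead_per_proc_s=1e-3))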

Analysis of scalability of parallel algorithms and architectures

Proceedings of the 5th international conference on Supercomputing - ICS '91, 1991

The scalability of a parallel algorithm on a parallel architecture is a measure of its capability to effectively utilize an increasing number of processors. The scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predict the performance of a parallel algorithm and a parallel architecture for a large number of processors from the known performance on fewer processors. For a fixed problem size it may be used to determine the optimal number of processors to be used and the maximum possible speedup for that problem size. The objective of this paper is to critically assess the state of the art in the theory of scalability analysis, and motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures. We survey a number of techniques and formalisms that have been developed for studying the scalability issues, and discuss their interrelationships.
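
One formalism in this line of work (surveyed here, though not named in the abstract) relates efficiency to the problem size W and the total overhead T_o(W, p): E = 1 / (1 + T_o(W, p) / W). Holding E constant as p grows yields the rate at which W must grow, i.e. an isoefficiency-style scalability measure. The overhead function in this sketch is an assumption chosen only to make the behavior visible.

    import math

    # E = 1 / (1 + T_o(W, p) / W): efficiency falls as overhead grows relative to work.
    def efficiency(work, overhead):
        return 1.0 / (1.0 + overhead / work)

    def overhead_example(p):
        return 50.0 * p * math.log2(p)     # assumed latency-like overhead term

    work = 10_000.0
    for p in (4, 16, 64, 256):
        print(p, round(efficiency(work, overhead_example(p)), 3))
    # To keep efficiency fixed here, the work W would have to grow roughly as p*log(p).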