Emulation of a PRAM on Leveled Networks

Emulation of a PRAM on Leveled Networks, MS-CIS-91-06, GRASP Lab 251

2014

We present efficient emulations of the CRCW PRAM on a large class of processor interconnection networks called leveled networks. This class includes the star graph and the n-way shuffle, which have the interesting property that the network diameter is sub-logarithmic in the network size. We show that a CRCW PRAM can be emulated optimally on these networks (i.e., each emulation step takes time linear in the network diameter). This is the first result that demonstrates PRAM emulation in less than logarithmic time. We also present an efficient emulation of the CRCW PRAM on an n × n mesh. Although an O(n)-time emulation algorithm for the mesh is known, the constant hidden in its running time is large, making it impractical. We give an improved emulation algorithm whose time bound is only 4n + o(n).

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-91-06. This technical report is available at ScholarlyCommons: http://repository....

Packet Routing and PRAM Emulation on Star Graphs and Leveled Networks

Journal of Parallel and Distributed Computing, 1994

We consider the problem of permutation routing on a star graph, an interconnection network which has better properties than the hypercube. In particular, its degree and diameter are sublogarithmic in the network size. We present optimal randomized routing algorithms that run in O(D) steps (where D is the network diameter) for the worst-case input with high probability. We also show that for the n-way shuffle network with $N = n^n$ nodes, there exists a randomized routing algorithm which runs in O(n) time with high probability. Another contribution of this paper is a universal randomized routing algorithm that can perform optimal routing for a large class of networks (called leveled networks), which includes the star graph. The associated analysis is also network-independent. In addition, we present a deterministic routing algorithm for the star graph which is near optimal. All the algorithms we give are oblivious. As an application of our routing algorithms, we also show how to emulate a PRAM optimally on this class of networks.

2 An oblivious deterministic routing algorithm for the n-star graph

2.1 The star graph

Definition 1. Let $d_1 d_2 \ldots d_n$ be a permutation of $n$ symbols, e.g., $1, \ldots, n$. For $1 < j \le n$, we define $\mathrm{SWAP}_j(d_1 d_2 \ldots d_n) = d_j d_2 \ldots d_{j-1} d_1 d_{j+1} \ldots d_n$.
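To make the sub-logarithmic-diameter claim concrete, here is a small Python sketch (our own illustration, not code from the paper) that builds the n-star graph directly from $\mathrm{SWAP}_j$ and measures its diameter by breadth-first search. It assumes the standard star-graph definition: each node is a permutation of $1, \ldots, n$, and node $u$ is adjacent to $\mathrm{SWAP}_j(u)$ for every $1 < j \le n$. The helper names `swap`, `star_graph`, and `eccentricity` are hypothetical.

```python
from collections import deque
from itertools import permutations
from math import factorial, log2


def swap(perm, j):
    """SWAP_j from Definition 1: exchange the first symbol with the j-th (1-indexed, 1 < j <= n)."""
    p = list(perm)
    p[0], p[j - 1] = p[j - 1], p[0]
    return tuple(p)


def star_graph(n):
    """Adjacency lists of the n-star graph: one node per permutation of 1..n,
    linked to SWAP_j(node) for every 1 < j <= n (so every node has degree n - 1)."""
    return {u: [swap(u, j) for j in range(2, n + 1)]
            for u in permutations(range(1, n + 1))}


def eccentricity(adj, src):
    """Largest BFS hop distance from src. The star graph is vertex-symmetric,
    so the eccentricity of any one node equals the diameter."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


if __name__ == "__main__":
    for n in range(2, 8):
        adj = star_graph(n)
        diameter = eccentricity(adj, tuple(range(1, n + 1)))
        print(f"n={n} nodes={factorial(n)} degree={n - 1} "
              f"diameter={diameter} log2(nodes)={log2(factorial(n)):.2f}")
```

For small n the computed diameters should agree with the known closed form $\lfloor 3(n-1)/2 \rfloor$, which is $\Theta(n)$, whereas $\log_2(n!)$ is $\Theta(n \log n)$; that asymptotic gap is the sense in which the diameter is sub-logarithmic in the network size.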

Fast, Efficient Mutual and Self Simulations for Shared Memory and Reconfigurable Mesh

International Journal of Parallel, Emergent and Distributed Systems, 1996

This paper studies relations between the parallel random access machine (PRAM) model and the reconfigurable mesh (RMESH) model by providing mutual simulations between the models. We present an algorithm simulating one step of an $(n \lg\lg n)$-processor CRCW PRAM on an $n \times n$ RMESH with delay $O(\lg\lg n)$ with high probability. We use our PRAM simulation to obtain the first efficient self-simulation algorithm of an RMESH with general switches: an algorithm running on an $n \times n$ RMESH is simulated on a $p \times p$ RMESH with delay $O((n/p)^2 + \lg n \lg\lg p)$ with high probability, which is optimal for all $p \le n/\sqrt{\lg n \lg\lg n}$. Finally, we consider the simulation of the RMESH on the PRAM. We show that a $2 \times n$ RMESH can be optimally simulated on a CRCW PRAM in $\Theta(\alpha(n))$ time, where $\alpha(\cdot)$ is the slow-growing inverse Ackermann function. In contrast, a PRAM with a polynomial number of processors cannot simulate the $3 \times n$ RMESH in less than $\Omega(\lg n / \lg\lg n)$ expected time.
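The optimality range quoted above can be checked with a short calculation (our own reasoning, not taken from the paper). Simulating $n^2$ processors on $p^2$ processors costs at least $(n/p)^2$ time per step, so the bound is optimal exactly when the additive $\lg n \lg\lg p$ term is dominated. Squaring the condition on $p$ and using $p \le n$ (hence $\lg\lg p \le \lg\lg n$):

```latex
p \le \frac{n}{\sqrt{\lg n \,\lg\lg n}}
\;\Longrightarrow\;
\left(\frac{n}{p}\right)^{2} \ge \lg n \,\lg\lg n \ge \lg n \,\lg\lg p
\;\Longrightarrow\;
O\!\left(\left(\frac{n}{p}\right)^{2} + \lg n \,\lg\lg p\right)
= O\!\left(\left(\frac{n}{p}\right)^{2}\right).
```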

Efficient PRAM simulation on a distributed memory machine

Algorithmica, 1996

We present algorithms for the randomized simulation of a shared memory machine (PRAM) on a Distributed Memory Machine (DMM). In a PRAM, memory conflicts occur only through concurrent access to the same cell, whereas the memory of a DMM is divided into modules, one for each processor, and concurrent accesses to the same module create a conflict. The delay of a simulation is the time needed to simulate a parallel memory access of the PRAM. Any general simulation of an $m$-processor PRAM on an $n$-processor DMM will necessarily have delay at least $m/n$. A randomized simulation is called time-processor optimal if the delay is $O(m/n)$ with high probability. Using a novel simulation scheme based on hashing, we obtain a time-processor optimal simulation with delay $O(\log\log n \cdot \log^* n)$. The best previous simulations use a simpler scheme based on hashing and have much larger delay:
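The delay notion and the $m/n$ lower bound can be illustrated with a Python sketch of a simple single-hash-function baseline (the paper's own scheme is not described in this abstract, so this is not it). Assumptions, all ours: the $m$ requests in one step target distinct cells, each address is mapped to a module by a lazily sampled random function, and each module serves one request per time unit, so the step's delay equals the maximum module load. `simulate_step` is a hypothetical helper.

```python
import random
from collections import Counter
from math import ceil


def simulate_step(addresses, n_modules, seed=0):
    """Hypothetical single-hash baseline for one PRAM memory step on a DMM.

    Each distinct address is sent to a module chosen by a lazily sampled
    random function; a module serves one request per time unit, so the
    delay of the step is the largest number of requests on any module.
    Returns (delay, per-module load).
    """
    rng = random.Random(seed)
    module_of = {}  # the random hash function, sampled on first use
    load = Counter()
    for a in addresses:
        if a not in module_of:
            module_of[a] = rng.randrange(n_modules)
        load[module_of[a]] += 1
    return max(load.values()), load


if __name__ == "__main__":
    m, n = 4096, 64  # m PRAM processors (one request each), n DMM modules
    addresses = random.Random(1).sample(range(10**6), m)
    delay, _ = simulate_step(addresses, n)
    print(f"lower bound ceil(m/n) = {ceil(m / n)}, single-hash delay = {delay}")
```

By the pigeonhole principle some module receives at least $\lceil m/n \rceil$ requests, so the returned delay can never beat the lower bound; with a single random hash function the maximum load typically exceeds $m/n$ by a noticeable margin, which is the kind of overhead better schemes aim to remove.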

Fast Generation of Random Permutations Via Networks Simulation

Algorithmica, 1998

We consider the problem of generating random permutations with uniform distribution. That is, we require that for an arbitrary permutation π of n elements, with probability 1/n! the machine halts with the ith output cell containing π(i), for 1 ≤ i ≤ n. We study this problem on two models of parallel computation: the CREW PRAM and the EREW PRAM. The main result of the paper is an algorithm for generating random permutations that runs in O(log log n) time and uses $O(n^{1+o(1)})$ processors on the CREW PRAM. This is the first o(log n)-time CREW PRAM algorithm for this problem. On the EREW PRAM we present a simple algorithm that generates a random permutation in time O(log n) using n processors and O(n) space. This algorithm outperforms each of the previously known algorithms for the exclusive-write PRAMs. The common and novel feature of both our algorithms is to first design a suitable random switching network generating a permutation and then to simulate this network quickly on the PRAM model.
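The paper's switching networks are not described in the abstract. As a hedged illustration of the general idea (a network of randomly set switches whose outputs are uniformly distributed), the classic Fisher-Yates shuffle can be read as a sequential chain of n − 1 swap gates, where gate k exchanges position k with a uniformly chosen position in 0..k. This is a swapped-in textbook technique, not the paper's O(log log n)-time construction; the function names are hypothetical.

```python
import random
from itertools import product
from math import factorial


def apply_switches(n, controls):
    """Push the identity 1..n through a chain of n - 1 swap gates.

    Gate k (for k = n-1 down to 1) exchanges position k with the position
    given by its control value, which lies in 0..k.
    """
    out = list(range(1, n + 1))
    for k, j in zip(range(n - 1, 0, -1), controls):
        out[k], out[j] = out[j], out[k]
    return tuple(out)


def random_permutation(n, rng):
    """Fisher-Yates: set each gate's control independently and uniformly."""
    controls = [rng.randrange(k + 1) for k in range(n - 1, 0, -1)]
    return apply_switches(n, controls)


def every_outcome_once(n):
    """Uniformity check: the n! control settings must map one-to-one onto the n! permutations."""
    settings = product(*(range(k + 1) for k in range(n - 1, 0, -1)))
    outcomes = [apply_switches(n, s) for s in settings]
    return len(outcomes) == len(set(outcomes)) == factorial(n)


if __name__ == "__main__":
    print(random_permutation(8, random.Random(2024)))
    print(all(every_outcome_once(n) for n in range(1, 7)))
```

Because there are exactly n! equally likely control settings and each yields a different permutation, every permutation is produced with probability 1/n!, which is the uniformity requirement stated above.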

Simulating Shared Memory in Real Time: On the Computation Power of Reconfigurable Architectures

Information and Computation, 1997

We consider randomized simulations of shared memory on a distributed memory machine (DMM) where the n processors and the n memory modules of the DMM are connected via a reconfigurable architecture. We first present a randomized simulation of a CRCW PRAM on a reconfigurable DMM having a complete reconfigurable interconnection. It guarantees delay $O(\log^* n)$ with high probability. Next we study a reconfigurable mesh DMM (RM-DMM). Here the n processors and n modules are connected via an $n \times n$ reconfigurable mesh. It was already known that an $n \times m$ reconfigurable mesh can simulate in constant time an n-processor CRCW PRAM with shared memory of size m. In this paper we present a randomized step-by-step simulation of a CRCW PRAM with arbitrarily large shared memory on an RM-DMM. It guarantees constant delay with high probability, i.e., it simulates in real time. Finally we prove a lower bound showing that size $\Omega(n^2)$ for the reconfigurable mesh is necessary for real-time simulations.
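To show how small an $O(\log^* n)$ delay is in practice, here is a minimal Python sketch of the iterated logarithm: the number of times $\lg$ must be applied before the value drops to at most 1. The function name `log_star` is our own.

```python
import math


def log_star(x):
    """Iterated base-2 logarithm: how many times lg must be applied before the value is <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)  # math.log2 also accepts arbitrarily large Python ints
        count += 1
    return count


if __name__ == "__main__":
    for label, value in (("2", 2), ("16", 16), ("65536", 65536), ("2**65536", 2**65536)):
        print(f"log*({label}) = {log_star(value)}")
```

Since $\log^*(2^{65536}) = 5$, an $O(\log^* n)$ delay behaves like a small constant for any realistic machine size.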

Shuffle Exchange Mesh Topology for Networks On Chip

Networks on Chip (NoCs) have become the communication paradigm for systems on chip. NoCs build on the concepts of traditional interconnection networks, but they have some special properties that differ from those of traditional networks. In designing a NoC, network topology is an important issue, because the choice of topology can dramatically affect network characteristics such as average inter-IP distance, total wire length, and communication flow distribution.
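The point that topology drives metrics such as average inter-IP distance can be made concrete with a short Python sketch (hypothetical helpers, not from the paper; the shuffle-exchange mesh itself is not modelled). It computes the mean shortest-path hop count over all ordered pairs of distinct nodes for two 16-node topologies, a 4 × 4 mesh and a bidirectional ring.

```python
from collections import deque


def average_distance(adj):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes (BFS from each node)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs


def mesh(k):
    """k x k 2D mesh: node (x, y) links to its horizontal and vertical neighbours."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(x, y): [(x + dx, y + dy) for dx, dy in steps
                     if 0 <= x + dx < k and 0 <= y + dy < k]
            for x in range(k) for y in range(k)}


def ring(n):
    """Bidirectional ring of n nodes."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


if __name__ == "__main__":
    print(f"4x4 mesh: {average_distance(mesh(4)):.3f} hops")
    print(f"16-node ring: {average_distance(ring(16)):.3f} hops")
```

With the same 16 nodes, the mesh averages 8/3 ≈ 2.67 hops against 64/15 ≈ 4.27 for the ring; a real NoC comparison would also weigh wire length and traffic distribution, which this sketch ignores.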

Deterministic P-RAM simulation with constant redundancy

Information and Computation, 1991

In this paper, we show that distributing the memory of a parallel computer and, thereby, decreasing its granularity allows a reduction in the redundancy required to achieve polylog simulation time for each PRAM step. Previously, realistic models of parallel computation assigned one memory module to each processor and, as a result, insisted on relatively coarse-grain memory. We propose, on the other hand, a more flexible but equally valid model of computation, the distributed-memory, bounded-degree network (DMBDN) model. This model allows the use of fine-grain memory while maintaining the realism of a bounded-degree interconnection network. We describe a PRAM simulation scheme, admissible under the DMBDN model, that exploits the increased memory bandwidth provided by a two-dimensional mesh of trees (2DMOT) network to achieve lower memory redundancy than other fast, deterministic PRAM simulations. Specifically, for a deterministic simulation of an n-processor PRAM on a bounded-degree network, we are able to reduce the number of copies of each variable from O(log n/log log n) to Θ(1) and still simulate each PRAM step in polylog time.
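For context on what "number of copies of each variable" means, deterministic PRAM simulations commonly store several timestamped copies of each variable and access a majority of them. The Python sketch below shows that standard majority-of-copies technique with a constant number of copies; it is not necessarily the DMBDN/2DMOT scheme described above, and the class and method names are hypothetical.

```python
class ReplicatedVariable:
    """Hypothetical sketch: one shared variable kept as 2c - 1 timestamped copies.

    A write must update at least c copies and a read must inspect at least c.
    Any two sets of c copies out of 2c - 1 overlap, so (with increasing
    timestamps) every read sees at least one copy from the most recent write.
    """

    def __init__(self, c, initial=0):
        self.c = c
        self.copies = [(0, initial)] * (2 * c - 1)  # (timestamp, value) per copy

    def _check_majority(self, which):
        if len(set(which)) < self.c:
            raise ValueError(f"need at least {self.c} distinct copies, got {sorted(set(which))}")

    def write(self, value, timestamp, which):
        self._check_majority(which)
        for i in which:
            self.copies[i] = (timestamp, value)

    def read(self, which):
        self._check_majority(which)
        # The copy with the newest timestamp holds the latest written value.
        return max((self.copies[i] for i in which), key=lambda tv: tv[0])[1]


if __name__ == "__main__":
    x = ReplicatedVariable(c=3)               # 5 copies, majorities of 3
    x.write(42, timestamp=1, which=[0, 1, 2])
    print(x.read(which=[2, 3, 4]))            # overlap at copy 2, prints 42
```

Here c is a constant, so each variable has Θ(1) copies; the hard part that papers such as this one address is scheduling the majority accesses so that no memory module becomes a bottleneck.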

Randomized algorithms for packet routing on the mesh

1991

Packet routing is an important problem in parallel computing, since a fast packet-routing algorithm implies 1) fast inter-processor communication and 2) fast algorithms for emulating ideal models such as PRAMs on fixed-connection machines. There are three different models of packet routing, namely 1) store-and-forward, 2) multipacket, and 3) cut-through. In this paper we provide a survey of the best known randomized algorithms for store-and-forward routing, k-k routing, and cut-through routing on mesh-connected computers.
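As a hedged illustration of one classic randomized idea in this area (Valiant-style two-phase routing through a random intermediate node, not necessarily any specific algorithm from the survey), the Python sketch below computes store-and-forward paths on an n × n mesh. Only the paths are computed; queueing and contention, which the actual analyses bound, are not modelled. The function names are hypothetical.

```python
import random


def xy_path(src, dst):
    """Dimension-order path on a mesh: move along the row to the target column,
    then along the column to the target row. Nodes are (row, col) pairs."""
    (r, c), (tr, tc) = src, dst
    path = [(r, c)]
    while c != tc:
        c += 1 if tc > c else -1
        path.append((r, c))
    while r != tr:
        r += 1 if tr > r else -1
        path.append((r, c))
    return path


def two_phase_path(src, dst, n, rng):
    """Randomized two-phase route on an n x n mesh: XY-route to a uniformly
    random intermediate node, then XY-route from there to dst."""
    mid = (rng.randrange(n), rng.randrange(n))
    return xy_path(src, mid) + xy_path(mid, dst)[1:]


if __name__ == "__main__":
    rng = random.Random(0)
    print("direct hops:", len(xy_path((0, 0), (5, 7))) - 1)
    print("two-phase hops:", len(two_phase_path((0, 0), (5, 7), 8, rng)) - 1)
```

A two-phase route takes at most 4(n − 1) hops, twice the mesh diameter of 2(n − 1); that extra distance is the price paid for making congestion depend on random choices rather than on an adversarial input permutation.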

NC-G-SIM: A Parameterized Generic Simulator for 2D-Mesh, 3D-Mesh & Irregular On-chip Networks with Table-based Routing

Global Journal of Computer Science and Technology, 2013

As chip density keeps doubling with each generation, the NoC has become an integral part of modern microprocessors and a prevalent architectural feature of all types of SoCs. To meet ever-expanding communication challenges, diverse and novel NoC solutions are being developed, and evaluating their impact and performance relies on accurate modeling and simulation. This increases the need for simulation tools to probe and optimize these NoC architectures. In this work, we present NC-G-SIM (Network on Chip-Generic-SIMulator), a highly flexible, modular, cycle-accurate, configurable simulator for NoCs. To make NC-G-SIM suitable for advanced NoC exploration, it is designed to be highly generic, supporting a wide range of cores in any kind of topology, whether 2D, 3D, or irregular. Simulation results have been evaluated in terms of latency, throughput, and the energy consumed during the simulation period at different levels.