A comment on "A circular list-based mutual exclusion scheme for large shared-memory multiprocessor"

A new hierarchy of hypercube interconnection schemes for parallel computers

The Journal of Supercomputing, 1988

This paper introduces a new hierarchy of cube-based interconnection schemes, called the base-b cube (which properly contains the well-known binary cube), for the design of parallel computers. This hierarchy admits a recursive definition and allows many more reconfigurations than are possible with the binary cube. Our analysis addresses the inherent cost-delay trade-off for this hierarchy along with a number of related topological properties such as sparsity, diameter, existence of node disjoint paths, and odd and even cycles. Embeddings of standard interconnection schemes including linear and two-dimensional arrays, rings, and complete binary trees in a base-b cube are illustrated.
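As a point of reference for the embeddings mentioned above, the sketch below shows the standard Gray-code embedding of a ring in the ordinary binary cube only. The abstract does not define the base-b cube, so nothing here should be read as the paper's construction; the function names are illustrative assumptions.

```python
def gray_code(i: int) -> int:
    """Return the i-th word of the binary reflected Gray code."""
    return i ^ (i >> 1)


def ring_embedding(d: int) -> list[int]:
    """Map ring position k to a node label of the d-dimensional binary cube."""
    return [gray_code(k) for k in range(1 << d)]


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two node labels differ."""
    return bin(a ^ b).count("1")


# Consecutive ring positions (including the wrap-around pair) land on
# cube nodes whose labels differ in exactly one bit, i.e. on cube neighbors.
ring = ring_embedding(3)
```

Because every pair of consecutive ring positions maps to adjacent cube nodes, each ring edge uses exactly one cube link.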

A method for exploiting communication/computation overlap in hypercubes (expanded version of a talk presented at Euro-Par'96, Lyon, France, August 1996)

Parallel Computing - PC, 1998

This paper presents a method to derive efficient algorithms for hypercubes. The method exploits two features of the underlying hardware: (a) the parallelism provided by the multiple communication links of each node and (b) the possibility of overlapping computations and communications, which is a feature of machines supporting an asynchronous communication protocol. The method can be applied to a generic class of hypercube algorithms whose distinguishing features are quite frequent in common algorithms for hypercubes. Many examples of this class of algorithms are found in the literature for different problems. The paper shows the efficiency of the method for two case studies. The results show that the reduction in communication overhead is very significant in many cases. They also show that the algorithms produced by our method are always very close to the optimum in terms of execution time.

Efficient implementation of barrier synchronization in wormhole-routed hypercube multicomputers

Journal of Parallel and Distributed Computing, 1992

This paper addresses efficient implementation of barrier synchronization in wormhole-routed hypercube multicomputers. For those systems supporting only unicast communication in hardware, a novel software tree approach, the U-cube tree, is proposed. An important feature of the U-cube tree is that all messages injected into the network are guaranteed to be contention-free. Performance measurements of several barrier synchronization techniques implemented on a 64-node nCUBE-2 are given.

About message routing in different hypercube interconnection network types

Computer Science Journal of Moldova, 1999

The paper treats the problem of message routing in different hypercube interconnection network types. Because communication algorithms frequently use a few basic communication operations, the purpose was to obtain relationships for the total communication time when these basic operations are implemented in different hypercube interconnection types. The basic communication operations considered were: simple message transfer between two processors, one-to-all broadcast, all-to-all broadcast, one-to-all personalized communication, and all-to-all personalized communication. To establish the desired relationships, the starting point was the set of relationships for the total communication time of the above-mentioned operations implemented on three basic interconnection networks: classical hypercube, ring, and mesh. The different hypercube interconnection network types considered were: the cube connected cycles network, the extended hypercube, the hypernet network, the k array n hyp...
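To make one of the basic operations listed above concrete, here is a minimal sketch of the textbook one-to-all broadcast on a classical binary hypercube. It is not taken from the paper: the function name and the schedule representation are assumptions chosen for illustration, and no timing model is implied.

```python
def one_to_all_broadcast(d: int, source: int = 0) -> list[list[tuple[int, int]]]:
    """Return, for each step, the (sender, receiver) pairs of a broadcast
    on a d-dimensional binary hypercube with 2**d nodes."""
    have = {source}  # nodes that already hold the message
    schedule = []
    for i in range(d - 1, -1, -1):
        # Every node holding the message forwards it across dimension i.
        step = [(node, node ^ (1 << i)) for node in sorted(have)]
        have.update(receiver for _, receiver in step)
        schedule.append(step)
    return schedule


schedule = one_to_all_broadcast(3)
```

The number of informed nodes doubles at every step, so all 2^d nodes hold the message after d steps.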

Intensive hypercube communication: Prearranged communication in link-bound machines

Hypercube algorithms are developed for a variety of communication-intensive tasks such as transposing a matrix, histogramming, one node sending a long message to another, broadcasting a message from one node to all others, each node broadcasting a message to all others, and nodes exchanging messages via a fixed permutation. The algorithm for exchanging via a fixed permutation can be viewed as a deterministic analogue of Valiant's randomized routing. The algorithms are for link-bound hypercubes in which local processing time is ignored, communication time predominates, message headers are not needed because all nodes know the task being performed, and all nodes can use all communication links simultaneously. Through systematic use of techniques such as pipelining, batching, variable packet sizes, symmetrizing, and completing, for all problems algorithms are obtained which achieve a time with an optimal highest-order term.

Number Theoretic Model of Concurrent Access in Hypercube Interconnection Network

International Journal for Research in Applied Science and Engineering Technology

This paper proposes a mathematical-logical model for using shared communication links in the hypercube interconnection network. To achieve this objective, we use the orthogonality of the binary codes that label the processing nodes of the hypercube, and we apply a binary-coding-based multiplexing technique together with the XOR logic operation to separate each processing node's data from the multiplexed signal.

Communication pipelining in hypercubes

Many parallel algorithms exhibit a hypercube communication topology. Such algorithms can easily be executed on a multicomputer with a hypercube interconnection topology. However, in most cases these parallel algorithms only make use of a small fraction of the interconnection bandwidth offered by the multicomputer. In particular, each processor of a hypercube multicomputer is connected to d different neighbors by d different links. Nevertheless, hypercube algorithms usually do not use more than one of these d links at the same time. This paper presents a technique called communication pipelining that enables a more efficient use of the interconnection network and, in consequence, a significant reduction in the execution time. This technique is based on a transformation of the original algorithm. The resulting equivalent code makes use of several links of each node simultaneously. Given a particular problem and a particular architecture, the degree of pipelining to be applied is a design parameter that must be decided when transforming the original algorithm. The paper presents analytical models that allow for an optimal choice of the degree of pipelining for each problem and a given architecture. To illustrate the performance of the communication pipelining technique, its application to the FFT computation is presented as an example. It is shown that an optimal choice of the degree of pipelining can achieve a reduction by a factor of d in the communication overhead of the algorithm.
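The baseline communication pattern described above, in which each step uses only one of the d links of a node, can be sketched as follows. This is a plain recursive-doubling sum used as a stand-in for a generic hypercube algorithm; it is not the paper's pipelining transformation, and the function name and return shape are assumptions.

```python
def hypercube_allreduce(values: list[int]) -> tuple[list[int], list[int]]:
    """Simulate a sum all-reduce on a hypercube with len(values) nodes.

    Returns the final per-node values and the dimension used at each step.
    """
    p = len(values)
    d = p.bit_length() - 1
    if p < 1 or p != 1 << d:
        raise ValueError("number of nodes must be a power of two")
    current = list(values)
    dims_used = []
    for i in range(d):
        # In step i every node exchanges data only with its neighbor
        # across dimension i; the other d - 1 links stay idle.
        current = [current[node] + current[node ^ (1 << i)] for node in range(p)]
        dims_used.append(i)
    return current, dims_used


totals, dims_used = hypercube_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
```

Each of the d steps touches a single dimension, which is the unused bandwidth that a technique such as communication pipelining aims to recover.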