Traffic characteristics of a distributed memory system
Related papers
Traffic Characteristics of a Distributed Memory System
We believe that many distributed computing systems of the future will use distributed shared memory as a technique for interprocess communication. Thus, traffic generated by memory requests will be a major component of the traffic on any network that connects the nodes of such a system. In this paper, we study memory reference strings gathered with a tracing program we devised. We study several models. First, we look at raw reference data, as would be seen if the network were a backplane. Second, we examine references in units of "blocks", first using a one-block cache model and then with an infinite cache. Finally, we study the effect of predictive prepaging of these "blocks" on the traffic. We provide a novel representation of memory reference data which can be used to calculate interarrival distributions directly. Integrating communication with computation can be used to control both traffic and performance.
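The two block-level models named in the abstract can be illustrated with a small sketch. It is not the authors' tracing program or their representation of reference data; it only assumes that a trace is a list of integer addresses, that a block is `address // block_size`, and that time is counted in references. The function names and the sample trace are hypothetical.

```python
from collections import Counter


def block_requests(addresses, block_size, infinite_cache=False):
    """Return the trace positions at which a block must be fetched.

    One-block cache: a fetch occurs whenever the referenced block differs
    from the block referenced immediately before it.
    Infinite cache: a fetch occurs only on the first reference to a block.
    """
    fetch_times = []
    seen = set()        # blocks already fetched (infinite-cache model)
    last_block = None   # block currently held (one-block model)
    for t, addr in enumerate(addresses):
        block = addr // block_size
        if infinite_cache:
            if block not in seen:
                seen.add(block)
                fetch_times.append(t)
        elif block != last_block:
            last_block = block
            fetch_times.append(t)
    return fetch_times


def interarrival_histogram(fetch_times):
    """Count the gaps, in references, between consecutive block fetches."""
    gaps = [later - earlier for earlier, later in zip(fetch_times, fetch_times[1:])]
    return Counter(gaps)


# Hypothetical trace of byte addresses, 64-byte blocks.
trace = [0, 4, 8, 64, 68, 0, 4, 128, 64]
one_block = block_requests(trace, block_size=64)
infinite = block_requests(trace, block_size=64, infinite_cache=True)
print(one_block, interarrival_histogram(one_block))
print(infinite, interarrival_histogram(infinite))
```

With the one-block model every change of block produces network traffic, while the infinite cache produces traffic only for cold misses, so the two bracket the traffic any finite cache would generate on the same trace.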
1993
Architecture and evaluation of a high-speed networking subsystem for distributed-memory systems
ACM SIGARCH Computer Architecture News, 1994
Achieving high-speed network I/O on distributed-memory systems is difficult because their architecture is in general ill-suited for communication processing. Some of the common problems are: inability to do protocol processing, inefficient handling of data distribution, and poor management of the I/O. In this paper we present an I/O architecture that addresses these problems and supports high-speed network I/O on distributed-memory systems. The key to good performance is to partition the work appropriately between the system and the network interface. We perform some communication tasks on the distributed-memory parallel system since it is more powerful, and less likely to become a bottleneck than the network interface. Tasks that do not parallelize well are performed on the network interface and hardware support is provided for the most time-critical operations. We emphasize the use of simple I/O mechanisms that can be used by programming tools that map applications on the distributed-memory system to implement efficient I/O for the class of applications they support. This architecture has been implemented for the iWarp distributed-memory system. We describe this implementation and present performance results.
Architecture and Performance of the Mether Network Shared Memory
Mether is a Network Shared Memory (NSM). It allows applications on autonomous computers connected by a network to share a segment of memory. NSMs offer the attraction of a simple abstraction for shared state, i.e., shared memory. NSMs have a potential performance problem in the cost of remote references, which is typically solved by grouping memory into larger units such as pages, and caching pages. While Mether employs grouping and caching to reduce the average memory reference delay, it also removes the need for many remote references (page faults) by providing a facility with relaxed consistency requirements. Applications ported from a multiprocessor supercomputer with shared memory to a 16-workstation Mether configuration showed a cost/performance advantage of over 300 in favor of the Mether system. While Mether is currently implemented for Sun-3 and Sun-4 systems connected via Ethernet, other characteristics (such as a choice of page sizes and a semaphore-like access mode useful for process synchronization) should suit it to a wide variety of networks. A reimplementation for an alternate configuration employing packet-switched networks is in progress.
VM-Based Shared Memory on Low-Latency, Remote-Memory-Access Networks
ACM SIGARCH Computer Architecture News, 1997
Recent technological advances have produced network interfaces that provide users with very low-latency access to the memory of remote machines. We examine the impact of such networks on the implementation and performance of software DSM. Specifically, we compare two DSM systems, Cashmere and TreadMarks, on a 32-processor DEC Alpha cluster connected by a Memory Channel network.
Prospects of distributed shared memory for reducing global traffic in shared-bus multiprocessors
Journal of Systems Architecture, 1998
As an effort that is complementary to, rather than mutually exclusive with, developing a better backplane bus, this paper considers adapting distributed shared-memory (DSM) architectures to improve traditional shared-bus designs. We consider two well-known DSM architectures, namely Cache-coherent Non-Uniform Memory Architecture (NUMA) and Cache-Only Memory Architecture (COMA), in reducing bus traffic. Our study shows that COMA provides an excellent opportunity to significantly reduce the bandwidth requirement for the bus, while cache-coherent NUMA provides a marginal improvement.
A Queueing Network Model of Distributed Shared Memory
Computer Networks, Architecture and Applications, 1995
Distributed Shared Memory (DSM) is a mechanism which provides logical shared memory in a distributed system. This simplifies the task of distributed programming. We develop simple queueing network models that can be used to compare the performance characteristics of several DSM algorithms for location of shared pages. Our models can be evaluated efficiently using the MVA (Mean Value Analysis) technique. Using these models, we can explore the regions of applicability of various DSM algorithms. By modelling contention at shared resources such as the network and memory servers, our model is able to provide more accurate results than the earlier analysis by Stumm and Zhou.
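For readers unfamiliar with the technique, the following is a minimal sketch of textbook exact Mean Value Analysis for a closed, single-class network of queueing stations plus a think-time delay. It is not the model developed in the paper above: treating the network and a memory server as two stations, and the service demands and think time used in the example call, are assumptions made purely for illustration.

```python
def mva(demands, think_time, n_customers):
    """Exact MVA for a closed, single-class queueing network.

    demands      -- service demand per visit cycle at each queueing station
    think_time   -- delay between a customer's requests (delay station)
    n_customers  -- number of circulating customers (e.g. DSM nodes)

    Returns (throughput, response_time, mean_queue_lengths) at n_customers.
    """
    queue = [0.0] * len(demands)   # mean queue lengths with 0 customers
    throughput = 0.0
    response = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving customer sees the queue lengths of
        # the same network with one customer fewer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        # Little's law applied per station.
        queue = [throughput * r for r in residence]
    return throughput, response, queue


# Hypothetical demands (seconds): station 0 = network, station 1 = memory server.
x, r, q = mva(demands=[0.02, 0.05], think_time=1.0, n_customers=16)
print(f"throughput={x:.3f}/s response={r:.4f}s queues={q}")
```

The recursion costs time proportional to the number of customers times the number of stations, which is why models of this kind can be evaluated cheaply over a wide range of configurations.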
Reducing host load, network load, and latency in a distributed shared memory
Distributed Computing Systems, 1990 …, 1990
Mether is a Distributed Shared Memory (DSM) that runs on Sun workstations under the SunOS 4.0 operating system. User programs access the Mether address space in a way indistinguishable from ...
Efficient Use of Memory-Mapped Network Interfaces for Shared Memory Computing
1997
Memory-mapped network interfaces provide users with fast and cheap access to remote memory on clusters of workstations. Software distributed shared memory (DSM) protocols built on top of these networks can take advantage of fast messaging to improve performance. The low latencies and remote memory access capabilities of these networks suggest the need to re-evaluate the assumptions underlying the design of DSM protocols. This paper describes some of the approaches currently being used to support shared memory efficiently on such networks. We discuss other possible design options for DSM systems on a memory-mapped network interface and propose methods by which the interface can best be used to implement coherent shared memory in software.
Evaluation of cluster interconnects for a distributed shared memory
1999 IEEE International Performance, Computing and Communications Conference (Cat. No.99CH36305), 1999
Clusters of Symmetrical Multiprocessors (SMPs) have recently become popular as low-cost, high-performance computing solutions. The type of interconnection hardware used in these clusters can become a deciding factor in their overall performance. This paper evaluates the performance of three different communication systems, 10 Mbps Ethernet, 100 Mbps FastEthernet and 155 Mbps ATM, using a multithreaded Distributed Shared Memory system, Strings. The raw performance of each network is first measured using netperf. Ten different applications are then used for performance evaluation, including programs from the SPLASH-2 benchmarks, a medical computing application, and some computational kernels. It is found that half of the programs tested are not significantly affected by changes in the bandwidth. Though the ATM network provides the highest overall bandwidth, the remaining applications show that the increase in latency compared to FastEthernet prevents any performance improvement. On the other hand, applications that require only moderately high bandwidths perform substantially better with FastEthernet.