Reconfigurable interconnection networks in Distributed Shared Memory systems: a study on communication patterns
Related papers
Communication has always been a limiting factor in building efficient computing architectures with large processor counts. Reconfigurable interconnects can help in this respect, since they can adapt the interprocessor network to the changing communication requirements imposed by the running application. In this paper, we present a performance evaluation of these reconfigurable interconnection networks in the context of shared-memory multiprocessor (SMP) machines. We look at the effects of architectural parameters such as reconfiguration speed and topological constraints, and analyze how these results scale with higher processor counts. We find that for 16 processors connected in a torus topology, reconfigurable interconnects with switching speeds on the order of milliseconds can provide up to a 20% reduction in communication delay. For larger networks, up to 64 processors, the expected gain can rise to 40%. This shows that reconfigurable networks can help remove the communication bottleneck from future interconnection designs.
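To make the claimed gains concrete, here is a minimal toy model (not the authors' full-system simulator): it compares total communication delay on a static 4x4 torus against the same torus augmented with a few reconfigurable point-to-point links assigned to the busiest node pairs. The skewed traffic matrix, the four-link budget, and the one-hop cost of a direct link are all invented for illustration.

```python
# Illustrative toy model, not the paper's methodology: relative communication
# delay of a static 4x4 torus versus one with a few reconfigurable direct
# links between the most heavily communicating node pairs. Traffic volumes,
# link budget and hop costs are assumed placeholder values.
import random
from itertools import product

N = 4  # 4x4 torus -> 16 processors

def torus_hops(a, b, n=N):
    """Minimal hop count between nodes a and b on an n x n torus."""
    (ax, ay), (bx, by) = divmod(a, n), divmod(b, n)
    dx = min(abs(ax - bx), n - abs(ax - bx))
    dy = min(abs(ay - by), n - abs(ay - by))
    return dx + dy

random.seed(1)
pairs = [(a, b) for a, b in product(range(N * N), repeat=2) if a < b]
# Skewed traffic: a few pairs dominate, as observed in DSM workloads.
traffic = {p: (1000 if random.random() < 0.1 else 10) for p in pairs}

def total_delay(extra_links=frozenset()):
    # One time unit per hop; a reconfigured direct link costs a single hop.
    return sum(vol * (1 if p in extra_links else torus_hops(*p))
               for p, vol in traffic.items())

static = total_delay()
hottest = frozenset(sorted(traffic, key=traffic.get, reverse=True)[:4])
reconfigured = total_delay(hottest)
print(f"static delay:        {static}")
print(f"with 4 extra links:  {reconfigured}"
      f"  ({100 * (1 - reconfigured / static):.1f}% reduction)")
```

Even this crude model shows why skewed traffic matters: direct links only pay off when a small set of pairs carries most of the volume, which is the property the paper's reconfigurable networks exploit.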
Reconfigurable interconnects in DSM systems: a focus on context switch behavior
Recent advances in the development of reconfigurable optical interconnect technologies allow the fabrication of low-cost, run-time adaptable interconnects in large distributed shared-memory (DSM) multiprocessor machines. This enables adaptable interconnection networks that alleviate the severe bottleneck caused by the gap between processing speed and memory access time over the network. In this paper we study the scheduling of tasks by the operating system (OS) kernel and its influence on communication between the processing nodes of the system, focusing on the traffic generated just after a context switch. We aim to use these results as a basis for proposing a network reconfiguration that could provide a significant speedup.
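The measurement this abstract describes can be sketched as a simple trace analysis: align each outgoing message with the most recent context switch on its node and histogram the offsets, so a spike in the first bins would confirm a post-switch traffic burst. The event streams and bin width below are synthetic placeholders, not data from the paper.

```python
# A minimal sketch (with synthetic data) of post-context-switch traffic
# analysis: bin remote-memory messages by the time elapsed since the most
# recent context switch on the issuing node.
import bisect
from collections import Counter

# Timestamps (us) of context switches on one node, and of its outgoing messages.
switches = [0, 10_000, 20_000, 30_000]
messages = [50, 120, 900, 10_050, 10_200, 15_000, 20_030, 20_100, 20_400]

BIN_US = 500
hist = Counter()
for t in messages:
    i = bisect.bisect_right(switches, t) - 1  # most recent switch before t
    if i >= 0:
        hist[(t - switches[i]) // BIN_US] += 1

for b in sorted(hist):
    print(f"{b * BIN_US:>6}-{(b + 1) * BIN_US:>6} us after switch: {hist[b]} msgs")
```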
Predicting reconfigurable interconnect performance in distributed shared-memory systems
Integration, the VLSI Journal, 2007
Reconfigurable interconnection networks have been shown to benefit performance in distributed shared-memory multiprocessor machines. Usually, performance measurements for these networks require large numbers of slow full-system simulations, making design-space exploration a cumbersome and time-consuming task. In this paper, we present a prediction model for the performance of a reconfigurable network, based on a single full-system simulation and a much shorter, per-parameter-set post-processing phase. We provide simulation results establishing the relative accuracy of the technique and analyze the impact of several assumptions that were made. With our method, a quick evaluation of a large range of parameters is now possible, allowing the designer to make well-founded design trade-offs.
Prediction model for evaluation of reconfigurable interconnects in distributed shared-memory systems
International Workshop on System Level Interconnect Prediction, SLIP, 2005
Reconfigurable interconnection networks for distributed shared memory machines exploit properties of the workload dynamics that are not easily captured by statistical traffic models. Therefore, when designing such a network, one should make trade-offs based on full-system simulation for all viable workloads. It is however very time-consuming to do such simulations. In this paper, we present a technique that can predict the performance of a machine for different network parameters, based on the results of only one full simulation run. We also define confidence intervals for our prediction, and analyze the impact of several assumptions that were made.
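A hedged sketch of this trace-driven prediction idea, not the papers' actual model: record one message trace, then for each candidate switching time replay it, letting the network re-assign a small budget of fast links to the busiest pairs of each interval while charging a fixed reconfiguration penalty. The hop costs, link budget, penalty, and the trace itself are invented here; the real technique also derives confidence intervals for its predictions.

```python
# Sketch: predict total delay for several switching times from one recorded
# trace of (time_us, (src, dst)) messages. All cost parameters are assumed.
from collections import Counter

def predict(trace, interval_us, n_links=2, base_cost=3, fast_cost=1,
            switch_penalty_us=100):
    """Estimate total delay if n_links fast links track each interval's
    busiest pairs, paying switch_penalty_us per reconfiguration."""
    total, start = 0.0, 0
    while start <= trace[-1][0]:
        window = [p for t, p in trace if start <= t < start + interval_us]
        hot = {p for p, _ in Counter(window).most_common(n_links)}
        total += sum(fast_cost if p in hot else base_cost for p in window)
        total += switch_penalty_us  # one reconfiguration per interval
        start += interval_us
    return total

# Synthetic trace: the dominant pair alternates every 4 ms.
trace = [(t, (0, 1) if (t // 4000) % 2 == 0 else (2, 3))
         for t in range(0, 20_000, 50)]
for interval in (1_000, 4_000, 16_000):
    print(f"switching every {interval:>6} us -> "
          f"predicted delay {predict(trace, interval):.0f}")
```

The sweep illustrates the trade-off the papers explore: short intervals track the traffic closely but pay the switching penalty often, while long intervals amortize it at the cost of stale link assignments.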
Evaluation of cluster interconnects for a distributed shared memory
1999 IEEE International Performance, Computing and Communications Conference, 1999
Clusters of Symmetrical Multiprocessors (SMPs) have recently become popular as low-cost, high-performance computing solutions. The type of interconnection hardware used in these clusters can become a deciding factor in their overall performance. This paper evaluates the performance of three different communication systems, 10 Mbps Ethernet, 100 Mbps FastEthernet and 155 Mbps ATM, using a multithreaded Distributed Shared Memory system, Strings. The raw performance of each network is first measured using netperf. Ten different applications are then used for performance evaluation, including programs from the SPLASH-2 benchmarks, a medical computing application, and some computational kernels. It is found that half of the programs tested are not significantly affected by changes in the bandwidth. Though the ATM network provides the highest overall bandwidth, the remaining applications show that the increase in latency compared to FastEthernet prevents any performance improvement. On the other hand, applications that require only moderately high bandwidths perform substantially better with FastEthernet.
Architectural study of the opportunities for reconfigurable optical interconnects in distributed shared memory systems
Proceedings Symposium
An intriguing aspect of optical interconnects from an architectural point of view is their ability to reconfigure the topology in a data-transparent way. In this work we focus on the potential of such dynamically reconfigurable interconnects, in particular for parallel shared-memory machines, and we identify the timescales at which this interprocessor network should be able to reconfigure to yield a performance gain. By performing full-system simulations of shared-memory architectures running parallelized benchmark applications, we show the existence of a considerable number of traffic bursts lasting up to tens of milliseconds, indicating a possible match with current technology for reconfigurable optical interconnects.
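The burst lengths cited above can be extracted from a message trace with a very small routine: group a node pair's message timestamps into bursts whenever consecutive gaps stay below a threshold, then compare the burst durations against optical switching times. The timestamps and the 1 ms gap threshold below are synthetic assumptions.

```python
# Sketch of burst extraction from one node pair's message timestamps (ms).
def bursts(times_ms, max_gap_ms=1.0):
    """Yield (start, end) of maximal runs whose consecutive gaps stay
    at or below max_gap_ms."""
    start = prev = times_ms[0]
    for t in times_ms[1:]:
        if t - prev > max_gap_ms:
            yield (start, prev)
            start = t
        prev = t
    yield (start, prev)

# Synthetic stream: two bursts, roughly 10 ms and 25 ms long.
stream = [0.1 * i for i in range(100)] + [40 + 0.1 * i for i in range(250)]
for s, e in bursts(stream):
    print(f"burst from {s:.1f} ms to {e:.1f} ms (duration {e - s:.1f} ms)")
```

If most traffic falls inside bursts of tens of milliseconds, a reconfigurable interconnect with millisecond-scale switching can plausibly keep up, which is the match the abstract points to.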
Traffic temporal analysis for reconfigurable interconnects in shared-memory systems
Proceedings - 19th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2005, 2005
New advances in reconfigurable optical interconnect technologies will allow the fabrication of cheap, fast and run-time adaptable networks for connecting processors and memory modules in large shared-memory multiprocessor machines. Since the switching times of these components are typically high compared to the memory access time, reconfiguration can only take place on a time scale significantly above individual memory accesses. In this paper, we present preliminary results of our investigation into the exploitability of the space and time locality of address streams by a reconfigurable network.
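One way to probe the exploitable space and time locality described here is to slice a (time, source, destination) trace into windows at the candidate reconfiguration timescale and measure what fraction of each window's traffic its few busiest pairs carry; a consistently high fraction suggests a handful of reconfigured links could absorb most of the load. The trace, window size, and pair budget below are assumptions for illustration.

```python
# Sketch: per-window traffic share of the k busiest node pairs in a trace.
import random
from collections import Counter

random.seed(0)

def coverage(trace, window_us, k=2):
    """Print, for each window, the traffic share of the k busiest pairs."""
    start, end = 0, trace[-1][0]
    while start <= end:
        pairs = [p for t, p in trace if start <= t < start + window_us]
        if pairs:
            top = sum(c for _, c in Counter(pairs).most_common(k))
            print(f"[{start:>6}, {start + window_us:>6}) us: "
                  f"top-{k} pairs carry {100 * top / len(pairs):.0f}% of traffic")
        start += window_us

# Synthetic trace: one "hot" pair per 5 ms phase plus 20% background noise.
hot = lambda t: ((t // 5000) % 4, 4 + (t // 5000) % 4)
trace = [(t, hot(t) if random.random() < 0.8 else
             (random.randrange(8), random.randrange(8)))
         for t in range(0, 20_000, 100)]
coverage(trace, window_us=5_000)
```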
Traffic characteristics of a distributed memory system
Computer Networks and ISDN Systems, 1991
We believe that many distributed computing systems of the future will use distributed shared memory as a technique for interprocess communication. Thus, traffic generated by memory requests will be a major component of the traffic for any networks which connect nodes in such a system. In this paper, we study memory reference strings gathered with a tracing program we devised. We study several models. First, we look at raw reference data, as would be seen if the network were a backplane. Second, we examine references in units of "blocks", first using a one-block cache model and then with an infinite cache. Finally, we study the effect of predictive prepaging of these "blocks" on the traffic. We provide a novel representation of memory reference data which can be used to calculate interarrival distributions directly. Integrating communication with computation can be used to control both traffic and performance.
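As an illustration of the one-block cache model mentioned above (a reconstruction under assumptions, not the authors' tool): a reference generates network traffic only when it touches a different block than the previous reference, and the interarrival times of those block changes form the traffic distribution. The block size and reference string below are synthetic.

```python
# Sketch: interarrival times of network requests under a one-block cache.
from collections import Counter

BLOCK = 64  # bytes per block (assumed)

def miss_interarrivals(refs):
    """refs: list of (time, address); returns interarrival times of
    references that change the single cached block."""
    out, cached, last_t = [], None, None
    for t, addr in refs:
        blk = addr // BLOCK
        if blk != cached:          # block change -> network request
            if last_t is not None:
                out.append(t - last_t)
            cached, last_t = blk, t
    return out

# Synthetic reference string: runs of sequential accesses inside a block,
# punctuated by a jump to the next block every 10 references.
refs = [(t, (t // 10) * BLOCK + (t % 10) * 4) for t in range(200)]
print(Counter(miss_interarrivals(refs)))
```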
Overview of distributed shared memory
Trinity College Dublin, 1998
"So much has already been written about everything that you can't find out anything about it." (James Thurber, Lanterns and Lances, 1961) Loosely-coupled distributed systems have evolved using message passing as the main paradigm for sharing information. Other paradigms used in loosely-coupled distributed systems, such as RPC, are usually implemented on top of an underlying message-passing system. On the other hand, in tightly-coupled architectures, such as multi-processor machines, the paradigm is usually based on shared memory with its attractively simple programming model. The shared-memory paradigm has recently been extended for use in more loosely-coupled architectures and is known as distributed shared memory (DSM [153, 178, 58]) in this context. This chapter discusses some of the issues involved in the design and implementation of such a DSM in loosely-coupled distributed systems and briefly discusses related work in other fields.
Efficient Use of Memory-Mapped Network Interfaces for Shared Memory Computing
1997
Memory-mapped network interfaces provide users with fast and cheap access to remote memory on clusters of workstations. Software distributed shared memory (DSM) protocols built on top of these networks can take advantage of fast messaging to improve performance. The low latencies and remote memory access capabilities of these networks suggest the need to re-evaluate the assumptions underlying the design of DSM protocols. This paper describes some of the approaches currently being used to support shared memory efficiently on such networks. We discuss other possible design options for DSM systems on a memory-mapped network interface and propose methods by which the interface can best be used to implement coherent shared memory in software.