A Survey of Techniques for Analyzing Memory Interference in Multiprocessor Computer Systems

Analysis of Memory Interference in Multiprocessors

IEEE Transactions on Computers, 2000

This paper presents Markov chain models for analyzing the extent of memory interference in multiprocessor systems with a crosspoint switch for processor-memory communication. Processor behavior is simplified to an ordered sequence of a memory request followed by a certain amount of processing time. The results predicted by the model are compared with some simulation results and some actual measurements on C.mmp, a multiprocessor system being built at Carnegie-Mellon University.
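The kind of interference these models capture can be illustrated with a small Monte Carlo sketch. It is not the paper's model: it assumes zero processing time between requests, a unit memory cycle, uniformly random module references, and that a blocked processor retries the same module on the next cycle. The function name `simulate_crossbar` and its parameters are hypothetical.

```python
import random

def simulate_crossbar(n, m, cycles=100_000, seed=0):
    """Estimate the average number of memory requests served per cycle
    for n processors and m memory modules behind a crossbar.

    Assumptions (not from the source): zero processing time, unit memory
    cycle, uniform references, blocked requests retry the same module.
    """
    rng = random.Random(seed)
    # pending[i] is the module that processor i is currently requesting.
    pending = [rng.randrange(m) for _ in range(n)]
    served_total = 0
    for _ in range(cycles):
        # Group the processors by the module they are requesting.
        by_module = {}
        for proc, mod in enumerate(pending):
            by_module.setdefault(mod, []).append(proc)
        # Each requested module serves exactly one requester this cycle.
        for procs in by_module.values():
            winner = rng.choice(procs)
            # The served processor immediately issues a fresh request;
            # the losers keep waiting on the same module.
            pending[winner] = rng.randrange(m)
            served_total += 1
    return served_total / cycles

if __name__ == "__main__":
    for n, m in [(2, 2), (4, 4), (8, 8), (16, 16)]:
        print(n, m, round(simulate_crossbar(n, m), 3))
```

Under these assumptions the measured value is the effective memory bandwidth in requests per cycle; the gap between it and min(n, m) is the loss due to interference.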

A multiprocessor based packet-switch: performance analysis of the communication infrastructure

Intra-chip communication infrastructures are receiving increasing attention as they become a crucial part of current SoC development. Because pre-characterized hard IP is widely available, design complexity is shifting toward global interconnections, which impose more constraints at each technology node. Power consumption, timing closure, bandwidth requirements, and time to market are some of the factors driving proposals for new solutions for next-generation multi-million SoCs. The need for highly programmable systems and the availability of high gate counts are drawing more attention to multiprocessor systems (MP-SoC), so an adequate solution must be found for the communication infrastructure. One of the most promising technologies is the Network-on-Chip (NoC) architecture, which seems to fit the demanding complexity of such systems better. Before developing new solutions, it is crucial to understand fully whether and when current bus architectures strongly limit the development of high-speed systems. This article describes a case study of a multiprocessor-based Ethernet packet-switch application with a shared-bus communication infrastructure. The study aims to expose the bottlenecks that a shared bus introduces under heavy load. The analysis shows that, as expected, a shared bus is not scalable and strongly limits overall system performance. These results strengthen the hypothesis that new communication architectures (such as the NoC) must be found.

Markov chain models for analyzing memory interference in multiprocessor computer systems

Computer architecture news, 1973

This paper discusses various analytical techniques for studying the extent of memory interference in a multiprocessor system with a crosspoint switch for processor-memory communication. Processor behavior is simplified to an ordered sequence of a memory request followed by an interval of processing time. The system is assumed to be bus bound; in other words, by the time the processor-memory bus completes servicing a processor's request the processor is ready to initiate another request and the memory module is ready to accept another request. The techniques discussed include discrete and continuous time Markov chain models as well as several approximate analytic methods.

Discrete Markov chain: constant, constant (tp = tw); exact; solution is algorithmic, unwieldy for large n.
Strecker's approximation: constant, constant; approximate; closed form solution, simple formula.
Continuous time Markov chain: exponential, exponential; exact; closed form solution, simple formula.
Diffusion approximation: constant, constant; approximate; closed form solution, simple formula.
Simulation model: approximate; unwieldy due to slow stochastic convergence.
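Of the techniques listed, Strecker's approximation is the easiest to show concretely. In its commonly quoted form, the expected number of busy memory modules per cycle for n processors and m modules is m(1 - (1 - 1/m)^n). The sketch below assumes that form, with uniformly distributed references and one request per processor per cycle; the function name `strecker_bandwidth` is hypothetical.

```python
def strecker_bandwidth(n, m):
    """Approximate memory bandwidth (busy modules per cycle) for
    n processors and m memory modules.

    Assumes each processor issues one request per cycle to a uniformly
    chosen module, independently of what happened on earlier cycles.
    """
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 processors and m >= 1 modules")
    # (1 - 1/m)**n is the probability that a given module receives no
    # request; its complement is the probability the module is busy.
    return m * (1.0 - (1.0 - 1.0 / m) ** n)

if __name__ == "__main__":
    for n, m in [(2, 2), (4, 4), (8, 8), (16, 16)]:
        print(n, m, round(strecker_bandwidth(n, m), 3))
```

The formula treats each cycle as a fresh set of independent requests, which is what makes it a closed-form approximation rather than an exact result.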

Performance of processor-memory interconnections for multiprocessors

Computers, IEEE Transactions on, 1981

A class of interconnection networks based on some existing permutation networks is described with applications to processor-to-memory communication in multiprocessing systems. These networks, termed delta networks, allow a direct link between any processor and any memory module. The delta networks and full crossbars are analyzed with respect to their effective bandwidth and cost. The analysis shows that delta networks have a far better performance per cost than crossbars in large multiprocessing systems.
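The bandwidth and cost comparison can be sketched with the standard probabilistic analysis of such networks. The sketch assumes an a^n x b^n delta network built from a x b crossbar modules, independent and uniformly distributed requests, and blocked requests that are discarded; cost is counted in crosspoints and only for the square case a = b. All function names are hypothetical.

```python
def crossbar_bandwidth(M, N, r=1.0):
    """Expected requests accepted per cycle by an M x N crossbar when
    each of the M inputs issues a request with probability r."""
    return N * (1.0 - (1.0 - r / N) ** M)

def delta_bandwidth(a, b, n, r=1.0):
    """Expected requests accepted per cycle by an a**n x b**n delta
    network of n stages of a x b crossbar modules."""
    rate = r
    for _ in range(n):
        # Request rate on each output of a stage, given the rate on
        # each of its a inputs.
        rate = 1.0 - (1.0 - rate / b) ** a
    return (b ** n) * rate

def crossbar_cost(M, N):
    """Crosspoint count of a full M x N crossbar."""
    return M * N

def delta_cost_square(a, n):
    """Crosspoint count of a square delta network (a == b): n stages,
    a**(n-1) modules per stage, a*a crosspoints per module."""
    return n * a ** (n - 1) * a * a

if __name__ == "__main__":
    for n in (3, 6, 10):
        size = 2 ** n
        xb = crossbar_bandwidth(size, size) / crossbar_cost(size, size)
        dl = delta_bandwidth(2, 2, n) / delta_cost_square(2, n)
        print(size, "crossbar bw/cost:", xb, "delta bw/cost:", dl)
```

Under these assumptions the crossbar delivers more bandwidth at every size, but its crosspoint count grows quadratically, so the bandwidth-per-cost ratio favors the delta network as the system grows.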

Some Performance Issues in Multiprocessor System Design

IEEE Transactions on Computers, 1977

Dileep P. Bhandarkar

Abstract: Analytic and simulation models of memory interference have been reported in the literature. These models provide tools for analyzing various system architecture alternatives. Some of the design parameters are processor speed, memory speed, number of processors, number of memories, use of cache memories, high-order versus low-order interleaving, and memory allocation. This correspondence applies existing analytic and simulation models to the multiprocessor design space and presents guidelines for the multiprocessor system architect. Preferred design alternatives and tradeoffs are outlined.

In a multiprocessor system with n Pc's and m Mp modules, independent processes can make simultaneous requests to the same memory module and interference will occur. Several abstract models of the operation of a multiprocessor system at the memory request level have been developed [7]-[10]. These models provide tools for analyzing the performance of various multiprocessor configurations. This correspondence summarizes insights into several performance aspects gleaned from the application of these tools to the multiprocessor design space. The results presented in this correspondence are based on the analytic models described in [8], unless otherwise indicated. The main design parameters are processor speed, memory speed, number of processors, and number of memory modules. The speed of the processor, TP, is characterized by the average time interval between the completion of the last memory access and the initiation of the next. The speed of the memory is defined in terms of its access time, TA, and cycle time, TC. The rewrite time TW of the memory is the difference between the cycle and access times, i.e., TW = TC - TA.

Exploring the switch design space in a CC-NUMA multiprocessor environment

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000, 2000

The switch design for interconnection networks plays an important role in the overall performance of multiprocessors and computer networks. It is therefore crucial to study various factors in the switch design space and their influence on the system performance. In this paper we first propose a 4-D framework for the design of input queuing switches with wormhole routing and virtual channels. Then we explore the design space to examine in detail the impact of four parameters: virtual channel allocation, intraswitch connectivity, buffer space allocation and link arbitration policy. Our simulations, performed with an execution driven simulator with ILP processors, show that the cumulative effect of the four switch enhancements ranges between 7% and 38%. The most important parameter proves to be VC allocation method (up to 28% improvements in execution time). The other three bring about the same level of performance: between 1% and 7% depending on the application.

Performance of Multiprocessor Interconnection Networks

IEEE Computer, 1989

With device characteristics approaching physical limits, parallel or distributed processing has been widely advocated as a promising approach for building high performance computing systems. The continued impetus in research in these areas arises from two factors: (a) the technological development in the area of VLSI chips and (b) the observation that significant exploitable software parallelism is inherent in many scientific and engineering applications.

Modeling and analysis of a communication switching processor

Performance Evaluation, 1986

A communication switching processor is modeled as an M/G/1 queueing system with a processor sharing service discipline. Messages arrive at the processor according to a Poisson stochastic process. The service requirements of a message consist of various communication functions performed at the processor and are organized into a set of tasks. Each task service time is a random variable from a general distribution. Tasks are assigned prescribed priorities. The tasks of a message are served sequentially. The processor serves tasks according to their priority and tasks of the same priority on an FCFS basis. This paper analytically derives the average delay of an arbitrary task in the system. The use of the results in the design of communication processors is illustrated.
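As a point of reference for the delay analysis, the textbook mean-delay formulas for a plain M/G/1 queue can be sketched as follows. This is a baseline only, not the paper's derivation: it ignores the task structure and priorities described above and uses the standard Pollaczek-Khinchine result for FCFS and the standard mean sojourn time for egalitarian processor sharing. The function names and parameters are hypothetical.

```python
def mg1_fcfs_mean_delay(lam, mean_s, second_moment_s):
    """Mean time in system for an M/G/1 FCFS queue
    (Pollaczek-Khinchine formula).

    lam: Poisson arrival rate
    mean_s: mean service time E[S]
    second_moment_s: second moment of service time E[S^2]
    """
    rho = lam * mean_s  # server utilization
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    mean_wait = lam * second_moment_s / (2.0 * (1.0 - rho))
    return mean_wait + mean_s

def mg1_ps_mean_delay(lam, mean_s):
    """Mean time in system for an M/G/1 processor-sharing queue; it
    depends on the service distribution only through its mean."""
    rho = lam * mean_s
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    return mean_s / (1.0 - rho)

if __name__ == "__main__":
    # Exponential service with mean 1 has second moment 2.
    print(mg1_fcfs_mean_delay(0.5, 1.0, 2.0))
    print(mg1_ps_mean_delay(0.5, 1.0))
```

For exponentially distributed service times the two disciplines give the same mean delay; they diverge as the service-time variance grows, which is one reason the choice of service discipline matters in a model of this kind.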