Modelling and Evaluation of Multiprocessor Architecture
Related papers
A load balancing mechanism for large scale multiprocessor systems and its implementation
New Generation Computing, 1990
In large-scale multiprocessor systems, software should take the distance between processors into account to reduce network traffic and communication overhead. A load balancing method based on the p3 (Processing Power Plane) model is proposed that enables programmers to specify how computational load is distributed while keeping the locality of the computation. In this method, a process is allocated to a rectangle on a hypothetical processing power plane. The size of the rectangle represents the processing power given to the process, and the distance between rectangles represents the communication cost between them. The plane is divided among the processors, and a processor's region may be dynamically reshaped to alleviate imbalance on p3. A mechanism realizing the method has been implemented on the Multi-PSI/version 2, a parallel processing system with 64 processing elements connected in a 2-dimensional mesh network. The packet transmission mechanism of the Multi-PSI/version 2, which realizes process distribution according to the balancing method, is also described.
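For illustration, the sketch below (not the Multi-PSI implementation; the unit-square plane, mesh size and all names are assumptions) shows the core idea of the p3 model: a process occupies a rectangle on a hypothetical processing power plane, the plane is divided among the processors of a 2-dimensional mesh, and each processor's share of the process is proportional to how much of the rectangle falls inside its region.

```python
# Minimal sketch (not the Multi-PSI implementation): processes are placed as
# rectangles on a hypothetical "processing power plane"; each processor of a
# 2-D mesh owns one cell of the plane, and a process is served by the
# processors whose cells overlap its rectangle, weighted by overlap area.

from dataclasses import dataclass

@dataclass
class Rect:
    x: float; y: float; w: float; h: float   # lower-left corner plus extent

def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)

def mesh_regions(k: int) -> dict:
    """Divide the unit plane into k x k equal cells, one per processor."""
    cell = 1.0 / k
    return {(i, j): Rect(i * cell, j * cell, cell, cell)
            for i in range(k) for j in range(k)}

def distribute(process: Rect, regions: dict) -> dict:
    """Share of the process given to each processor = fraction of overlap."""
    shares = {pe: overlap_area(process, r) for pe, r in regions.items()}
    total = sum(shares.values()) or 1.0
    return {pe: s / total for pe, s in shares.items() if s > 0.0}

if __name__ == "__main__":
    regions = mesh_regions(4)                      # 4 x 4 mesh of processors
    proc = Rect(0.30, 0.30, 0.25, 0.25)            # rectangle size = processing power
    for pe, share in sorted(distribute(proc, regions).items()):
        print(f"PE{pe}: {share:.2f}")
```

Because the process rectangle straddles several mesh cells, only neighbouring processors receive shares, which is the locality property the model is after.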
Comparative Study on Load Balancing Algorithm for Multiprocessor Interconnection Networks
World Academy of Research in Science and Engineering, 2019
To achieve a high-performance network and effective sharing of computing resources, it is important to distribute the load evenly among the different nodes; an efficient scheduling strategy is therefore required to map the load onto the set of nodes. The main difficulty in designing such a scheduling algorithm is the lack of information about the network load distribution and, hence, about task execution times. A distributed system can be viewed as a collection of computing and communication resources shared by active users, and as the demand for computing power increases, the load balancing problem becomes important. The purpose of load balancing is to improve system performance by distributing the application load: the load is spread among the various nodes of an interconnection network to reduce both load imbalance and job execution time, avoiding situations where some nodes are heavily loaded while others are idle or doing very little work. Load balancing ensures that every processor in the system, or every node in the network, does approximately the same amount of work at any instant of time. In this paper, existing load balancing algorithms are briefly discussed along with their objectives, methods, advantages, and future work. In the simulation study, our previous schemes DLBS, ITSLB, and LBSM are compared on the FCC, MC, and X-Torus interconnection networks, and the experimental results are reported accordingly.
Two Round Scheduling (TRS) Scheme for Linearly Extensible Multiprocessor Systems
International Journal of Computer Applications, 2012
Balancing the computational load over multiprocessor networks is an important problem in massively parallel systems. The key advantage of such systems is that they allow concurrent execution of workload characterized by computation units known as processes or tasks. The scheduling problem is to maintain a balanced execution of all tasks among the available processors (nodes) in a multiprocessor network. This paper studies the scheduling of tasks on a pool of identical nodes connected through an interconnection network. A novel dynamic scheduling scheme named Two Round Scheduling (TRS) is proposed and implemented for scheduling load on various multiprocessor interconnection networks. In particular, the performance of the proposed scheme is evaluated for linearly extensible multiprocessor systems; a comparison is also made with other standard existing multiprocessor systems. TRS operates in two rounds to make the network fully balanced. Its performance is evaluated in terms of the Load Imbalance Factor (LIF), which represents the deviation of load among processors, and the balancing time for different types of load. The comparative simulation study shows that the proposed TRS scheme gives better task scheduling performance on various linearly extensible multiprocessor networks for both uniform and nonuniform loads.
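The abstract does not give the exact LIF formula, so the sketch below assumes a common definition, LIF = (maximum load - ideal load) / ideal load, and pairs it with a naive two-pass redistribution purely to illustrate how a "two round" scheme can drive the metric down; it is not the paper's TRS algorithm.

```python
# Illustrative sketch only: the exact formula is assumed, not taken from the paper.
#     LIF = (max_load - ideal_load) / ideal_load,  ideal_load = total / P
# A naive "two round" redistribution moves surplus work from the heaviest
# node to the lightest node in each round.

def lif(loads):
    ideal = sum(loads) / len(loads)
    return (max(loads) - ideal) / ideal if ideal else 0.0

def balance_round(loads):
    """Move half of the surplus from the heaviest node to the lightest node."""
    hi, lo = loads.index(max(loads)), loads.index(min(loads))
    move = (loads[hi] - loads[lo]) // 2
    loads[hi] -= move
    loads[lo] += move
    return loads

if __name__ == "__main__":
    loads = [12, 3, 9, 4]                 # tasks currently queued on 4 nodes
    print("LIF before:", round(lif(loads), 2))
    for _ in range(2):                    # the two balancing rounds
        loads = balance_round(loads)
    print("LIF after :", round(lif(loads), 2), loads)
```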
An algorithm for load balancing in multiprocessor systems
Information Processing Letters, 1990
We present an algorithm for dynamic load balancing in a multiprocessor system that minimizes the number of accesses to shared memory. The algorithm makes no assumptions, probabilistic or otherwise, regarding task arrivals or processing requirements. For k processors processing n tasks, the algorithm incurs O(k log k log n) potential memory collisions in the worst case. The algorithm itself is a simple variation of the strategy of visiting the longest queue. The key idea is to delay reporting task arrivals and completions, where the delay is a function of dynamic loading conditions.
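A minimal sketch of the delayed-reporting idea follows. It is not the paper's algorithm: the threshold rule (publish when the unreported change exceeds about half of the last reported load) and all names are illustrative assumptions, chosen only to show how batching updates reduces writes to shared memory.

```python
# Sketch of delayed reporting: each processor batches arrivals and completions
# and only writes to the shared queue-length table when its unreported change
# exceeds a threshold that grows with its current load. Names and the
# threshold rule are assumptions, not the paper's exact scheme.

class DelayedReportingQueue:
    def __init__(self, pid, shared_lengths):
        self.pid = pid
        self.shared = shared_lengths      # stands in for the shared-memory table
        self.local_len = 0                # true local queue length
        self.unreported = 0               # arrivals minus completions not yet published
        self.writes = 0                   # counts shared-memory updates

    def _maybe_report(self):
        # Delay: publish only when the drift exceeds ~half the reported load.
        threshold = max(1, self.shared[self.pid] // 2)
        if abs(self.unreported) >= threshold:
            self.shared[self.pid] = self.local_len
            self.unreported = 0
            self.writes += 1

    def arrive(self):
        self.local_len += 1
        self.unreported += 1
        self._maybe_report()

    def complete(self):
        self.local_len -= 1
        self.unreported -= 1
        self._maybe_report()

if __name__ == "__main__":
    shared = [0, 0]                       # reported lengths of two processors
    q = DelayedReportingQueue(0, shared)
    for _ in range(16):
        q.arrive()
    for _ in range(10):
        q.complete()
    print("true length:", q.local_len, "reported:", shared[0], "writes:", q.writes)
```

With 26 events, the sketch performs only a handful of shared writes, which is the effect the paper exploits to bound memory collisions.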
An Adaptive Task-Core Ratio Load Balancing Strategy for Multi-core Processors
With the proliferation of multi-core processors in servers, desktops, game consoles, mobile phones and a multitude of other embedded devices, the need to ensure effective utilization of the processing cores becomes essential. This calls for research and development of a well-engineered operating system load balancer for these multi-core processors. In this paper, an adaptive load balancing strategy is presented. The adaptive load balancer triggers task migrations based on the task-to-processing-core ratio, as well as when a processing core becomes idle. In our work, we use LinSched, a Linux operating system scheduler simulator, to analyze the number of task migrations. The Linux operating system is representative of the whole spectrum of computing, as it is used in supercomputers, servers, desktops, mobile phones and embedded devices. Results from the simulation show that unnecessary task migrations were eliminated while the load balancing function was maintained effectively, compared to the default strategy employed by the Linux operating system. The overhead introduced by the adaptive load balancer was measured by implementing it in a Linux kernel and running the hackbench scalability test. The implementation proves to have a negligible effect on scalability, and we conclude that it does not introduce overhead. Our research shows that the adaptive load balancer provides a scalable solution with fewer and more consistent task migration triggers.
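The trigger condition can be sketched roughly as follows; the ratio threshold of 1.5 and the function names are assumptions for illustration, not the scheduler changes evaluated in LinSched or the Linux kernel.

```python
# Sketch of the triggering idea only: consider migration when a core is idle,
# or when the busiest core's queue exceeds an assumed multiple of the average
# tasks-per-core ratio. The threshold value is illustrative.

def should_balance(run_queues, ratio_threshold=1.5):
    """run_queues[i] = number of runnable tasks queued on core i."""
    cores = len(run_queues)
    total = sum(run_queues)
    if total == 0:
        return False                      # nothing to migrate
    if 0 in run_queues:
        return True                       # an idle core can pull work
    avg_per_core = total / cores          # the task-to-core ratio
    return max(run_queues) > ratio_threshold * avg_per_core

if __name__ == "__main__":
    print(should_balance([4, 4, 3, 4]))   # False: roughly even, no idle core
    print(should_balance([6, 1, 1, 0]))   # True: core 3 is idle
    print(should_balance([9, 2, 2, 2]))   # True: busiest core far above average
```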
Load Balancing Performance of Dynamic Scheduling on NUMA Multiprocessors
Self-scheduling is a method for task scheduling in parallel programs in which each processor acquires a new block of tasks for execution whenever it becomes idle. To get the best performance, the block size must be chosen to balance the scheduling overhead against the load imbalance. To determine the best block size, a better understanding of the role of load imbalance in self-scheduling performance is needed. In this paper we study the effect of memory contention on task duration distributions and, hence, load balancing in self-scheduling on a Non-Uniform Memory Access (NUMA) machine. Experimental studies on a BBN TC2000 are used to reveal the strengths and weaknesses of analytical performance models to predict running time and optimal block size. The models are shown to be very accurate for small block sizes. However, the models fail when the block size is large due to a previously unrecognized source of load imbalance. We extend the analytical models to address this failure. The implications for the construction of compilers and runtime systems are discussed.
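The block-size trade-off can be illustrated with a toy simulation such as the one below; the task durations, overhead value and machine size are made up, and no attempt is made to model the NUMA memory contention studied in the paper.

```python
# Small simulation sketch of self-scheduling (chunked dynamic scheduling):
# larger blocks reduce the per-fetch scheduling overhead but increase
# end-of-loop load imbalance. All numeric parameters are illustrative.

import random

def self_schedule_makespan(task_times, n_procs, block, fetch_overhead):
    clock = [0.0] * n_procs           # per-processor finishing time so far
    i = 0
    while i < len(task_times):
        p = clock.index(min(clock))   # the idle processor fetches the next block
        chunk = task_times[i:i + block]
        clock[p] += fetch_overhead + sum(chunk)
        i += block
    return max(clock)                 # makespan = time of the last processor

if __name__ == "__main__":
    random.seed(1)
    tasks = [random.expovariate(1.0) for _ in range(2000)]   # variable task durations
    for block in (1, 8, 32, 128, 512):
        t = self_schedule_makespan(tasks, n_procs=16, block=block, fetch_overhead=0.2)
        print(f"block size {block:4d}: makespan {t:7.1f}")
```

Very small blocks pay the fetch overhead many times, very large blocks leave processors idle at the end of the loop, and the minimum makespan lies in between.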
A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems
Parallel Processing (ICPP), 2012 41st International Conference on, 2012
Multi-core compute nodes with non-uniform memory access (NUMA) are now a common architecture in the assembly of large-scale parallel machines. On these machines, in addition to the network communication costs, the memory access costs within a compute node are also asymmetric. Ignoring this can lead to an increase in data movement costs. Therefore, to fully exploit the potential of these nodes and reduce data access costs, it becomes crucial to have a complete view of the machine topology (i.e. the compute node topology and the interconnection network among the nodes). Furthermore, the parallel application behavior has an important role in determining how to utilize the machine efficiently. In this paper, we propose a hierarchical load balancing approach to improve the performance of applications on parallel multi-core systems. We introduce NUCOLB, a topology-aware load balancer that focuses on redistributing work while reducing communication costs among and within compute nodes. NUCOLB takes the asymmetric memory access costs present on NUMA multi-core compute nodes, the interconnection network overheads, and the application communication patterns into account in its balancing decisions. We have implemented NUCOLB using the CHARM++ parallel runtime system and evaluated its performance. Results show that our load balancer improves performance by up to 20% when compared to state-of-the-art load balancers on three different NUMA parallel machines.
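NUCOLB itself is implemented on CHARM++ and is not reproduced here; the sketch below only illustrates the hierarchical cost idea it builds on, with assumed intra-node and inter-node cost constants and a hypothetical task communication graph.

```python
# Not NUCOLB, only a sketch of the hierarchical cost idea: communication
# between tasks is weighted by where the partners sit (same NUMA node is cheap,
# a remote node pays the network cost). The cost constants are assumptions.

INTRA_NODE = 1.0       # access within a NUMA compute node
INTER_NODE = 5.0       # access across the interconnection network

def comm_cost(placement, comm_volume, node_of):
    """placement[task] = core; node_of[core] = compute node; volumes are symmetric."""
    cost = 0.0
    for (a, b), vol in comm_volume.items():
        same_node = node_of[placement[a]] == node_of[placement[b]]
        cost += vol * (INTRA_NODE if same_node else INTER_NODE)
    return cost

if __name__ == "__main__":
    node_of = {0: 0, 1: 0, 2: 1, 3: 1}                 # 4 cores, 2 NUMA nodes
    volume = {(0, 1): 10.0, (2, 3): 4.0, (1, 2): 1.0}  # task communication graph
    spread    = {0: 0, 1: 2, 2: 1, 3: 3}               # heavy pair split across nodes
    clustered = {0: 0, 1: 1, 2: 2, 3: 3}               # heavy pair kept on one node
    print("spread   :", comm_cost(spread, volume, node_of))
    print("clustered:", comm_cost(clustered, volume, node_of))
```

A topology-aware balancer prefers the cheaper placement while still spreading the total work, which is the trade-off NUCOLB resolves with its full view of the machine.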
A simple load balancing scheme for task allocation in parallel machines
Proceedings of the third annual ACM symposium on Parallel algorithms and architectures - SPAA '91, 1991
A collection of local workpiles (task queues) and a simple load balancing scheme are well suited for scheduling tasks in shared-memory parallel machines. Task scheduling on such machines has usually been done through a single, globally accessible workpile. The scheme introduced in this paper achieves balancing comparable to that of a global workpile while minimizing the overheads. In many parallel computer architectures, each processor has some memory that it can access more efficiently, so it is desirable that tasks do not migrate frequently. The load balancing is simple and distributed: whenever a processor accesses its local workpile, it performs a balancing operation with probability inversely proportional to the size of its workpile. The balancing operation consists of examining the workpile of a random processor and exchanging tasks so as to equalize the sizes of the two workpiles. A probabilistic analysis of the performance of the load balancing scheme proves that each task in the system receives its fair share of computation time. Specifically, the expected size of each local task queue is within a small constant factor of the average, i.e., the total number of tasks in the system divided by the number of processors.
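The balancing rule is described concretely enough to sketch directly; in the sketch below the simulation loop, seed and task counts are illustrative, and each workpile is reduced to an integer size.

```python
# Sketch of the scheme described above: on each access to its local workpile,
# a processor balances with probability inversely proportional to its size,
# picking a random partner and equalizing the two queues.

import random

def access_workpile(workpiles, p):
    """Processor p touches its workpile and maybe balances with a random peer."""
    size = workpiles[p]
    if random.random() < 1.0 / max(size, 1):      # probability inverse to queue size
        q = random.randrange(len(workpiles))      # random partner (may be itself)
        total = workpiles[p] + workpiles[q]
        workpiles[p], workpiles[q] = total // 2, total - total // 2

if __name__ == "__main__":
    random.seed(0)
    piles = [40, 0, 0, 0, 0, 0, 0, 0]             # all tasks start on one processor
    for step in range(200):
        access_workpile(piles, step % len(piles)) # each processor keeps accessing
    print(piles)                                  # sizes drift toward the average
```

Note that heavily loaded processors rarely initiate balancing, while lightly loaded ones almost always do, which is what keeps every queue within a constant factor of the average.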
A Pragmatic Analysis On Assorted Load Balancing Multiprocessing Scheduling Algorithms
2013
Parallel processing is an important mode of computation in which multiple tasks are executed on a number of processors at the same time. To utilize the processors in an optimized manner, various scheduling algorithms are used. In multiprocessor scheduling, both related and unrelated tasks may need to be scheduled. The main objective of any multiprocessor scheduling algorithm is to schedule the tasks or jobs in an optimized way so that the performance of the processors is maximized. In this paper, we discuss various multiprocessor scheduling algorithms.
International Journal of Computer Applications
In modern times, parallel and distributed computing is one of the greatest platforms for research and innovation in computer science. With the rapid growth of communication networks and the need to solve large-scale problems, the complexity and efficiency of the system as a whole become the key issues. Load balancing is one of the most important problems in attaining high performance in parallel and distributed systems, which may consist of many heterogeneous resources connected via one or more communication networks. Load balancing is the process of distributing or reassigning load over the different nodes to provide good resource utilization and better throughput. Although intense work has been done on the algorithm design of load balancing and its performance measurement, we present a brief overview of various load balancing conditions and an algorithmic classification for tailor-made applications. Various criteria are discussed for the classification of load balancing algorithms, helping designers compare and choose the most suitable algorithm for their application.