Bandwidth-Aware Resource Allocation for Heterogeneous Computing Systems to Maximize Throughput
Related papers
Bandwidth-centric allocation of independent tasks on heterogeneous platforms
2002
In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous "grid" computing platform. Such problems arise in collaborative computing efforts like SETI@home. We use a tree to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We define a base model, and show how to determine the maximum steady-state throughput of a node in the base model, assuming we already know the throughput of the subtrees rooted at the node's children. Thus, a bottom-up traversal of the tree determines the rate at which tasks can be processed in the full tree. The best allocation is bandwidth-centric: if enough bandwidth is available, then all nodes are kept busy; if bandwidth is limited, then tasks should be allocated only to the children which have sufficiently small communication times, regardless of their computation power.
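The bandwidth-centric rule in this abstract can be illustrated with a minimal Python sketch. It is not the paper's base model: it assumes a single outgoing port per node (one send at a time), full overlap of computation and communication, and strictly positive communication times for every child. The names `Node`, `compute_time`, `comm_time`, and `subtree_throughput` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node. All field names are illustrative assumptions."""
    compute_time: float       # time for this node to process one task
    comm_time: float = 0.0    # time for the parent to send one task here (unused for the root)
    children: List["Node"] = field(default_factory=list)


def subtree_throughput(node: Node) -> float:
    """Steady-state tasks per time unit that the subtree rooted at `node` can absorb."""
    own_rate = 1.0 / node.compute_time
    # Bottom-up: summarize each child by (comm_time, throughput of its subtree),
    # then order the children by communication time only.
    kids = sorted(
        ((c.comm_time, subtree_throughput(c)) for c in node.children),
        key=lambda pair: pair[0],
    )
    port_time_left = 1.0      # fraction of each time unit the outgoing port is still free
    delegated = 0.0
    for comm_time, rate in kids:
        if port_time_left <= 0.0:
            break             # bandwidth exhausted: remaining children get nothing
        # Feed the cheapest-to-reach child first, up to what it can consume
        # or what the remaining port time allows.
        sendable = min(rate, port_time_left / comm_time)
        delegated += sendable
        port_time_left -= sendable * comm_time
    return own_rate + delegated


if __name__ == "__main__":
    root = Node(compute_time=2.0, children=[
        Node(compute_time=4.0, comm_time=1.0),   # slow processor, cheap link
        Node(compute_time=1.0, comm_time=2.0),   # faster processor, costlier link
        Node(compute_time=0.5, comm_time=4.0),   # fastest processor, most expensive link
    ])
    print(subtree_throughput(root))
```

In this example the port saturates after the first two children, so the child with the most computing power receives no tasks at all, which is the "regardless of their computation power" behaviour the abstract describes.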
Bandwidth-Centric Allocation of Independent Tasks
2004
In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous "grid" computing platform. Such problems arise in collaborative computing efforts like SETI@home. We use a tree to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We define a base model, and show how to determine the maximum steady-state throughput of a node in the base model, assuming we already know the throughput of the subtrees rooted at the node's children. Thus, a bottom-up traversal of the tree determines the rate at which tasks can be processed in the full tree. The best allocation is bandwidth-centric: if enough bandwidth is available, then all nodes are kept busy; if bandwidth is limited, then tasks should be allocated only to the children which have sufficiently small communication times, regardless of their computation power.
Optimal allocation of tasks onto networked heterogeneous computers using minimax criterion
Proceedings of International Network Optimization …
Advances in microprocessors and computer networks have made distributed systems a reality. However, exploiting the full potential of these systems requires efficient allocation of the tasks comprising a distributed application to the available processors. This problem is known to be NP-hard and therefore intractable as soon as the number of tasks and/or processors exceeds a few units. This paper presents an optimal, memory-efficient algorithm for allocating an application program onto the processors of a distributed system to minimize the program completion time. The algorithm is derived from the well-known branch-and-bound technique, with some modifications to minimize its computational time. Some experimental results are given to show the effectiveness of the proposed algorithm.
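As a rough illustration of branch-and-bound applied to a minimax (completion-time) objective, the sketch below assigns independent tasks to heterogeneous processors. It is a generic depth-first variant, not the modified algorithm of the paper: communication costs and task dependencies are ignored, at least one task is assumed, and the name `assign_min_makespan` and the `exec_time` matrix layout are assumptions made for this example.

```python
from typing import List, Tuple


def assign_min_makespan(exec_time: List[List[float]]) -> Tuple[float, List[int]]:
    """exec_time[t][p] is the time of task t on processor p.

    Returns (minimal makespan, processor index chosen for each task).
    """
    n_tasks = len(exec_time)
    n_procs = len(exec_time[0])
    best_span = float("inf")
    best_assign: List[int] = []
    loads = [0.0] * n_procs       # current load of each processor
    current: List[int] = []       # partial assignment along the search path

    def search(task: int, span: float) -> None:
        nonlocal best_span, best_assign
        # Bound: the makespan of a partial assignment can only grow,
        # so a branch that already matches the incumbent is pruned.
        if span >= best_span:
            return
        if task == n_tasks:
            best_span, best_assign = span, current.copy()
            return
        # Branch: try the placement with the smallest resulting load first,
        # so that a good incumbent is found early.
        order = sorted(range(n_procs), key=lambda q: loads[q] + exec_time[task][q])
        for p in order:
            loads[p] += exec_time[task][p]
            current.append(p)
            search(task + 1, max(span, loads[p]))
            current.pop()
            loads[p] -= exec_time[task][p]

    search(0, 0.0)
    return best_span, best_assign


if __name__ == "__main__":
    times = [[2, 3], [4, 1], [3, 3]]   # 3 tasks, 2 processors
    print(assign_min_makespan(times))
```

Because the search is depth-first, it keeps only the current path and the best solution found so far in memory, which is one simple way to obtain the memory efficiency the abstract mentions. The worst-case running time remains exponential.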
Data partitioning with a realistic performance model of networks of heterogeneous computers
18th International Parallel and Distributed Processing Symposium, 2004. Proceedings., 2004
In this paper, we address the problem of optimal distribution of computational tasks on a network of heterogeneous computers when one or more tasks do not fit into the main memory of the processors and when relative speeds cannot be accurately approximated by constant functions of problem size. We design efficient algorithms to solve the scheduling problem using a realistic performance model of a network of heterogeneous computers. This model integrates many essential features of a network of heterogeneous computers that have a major impact on its performance, such as processor heterogeneity, the heterogeneity of the memory structure, and the effects of paging. Under this model, the speed of each processor is represented by a continuous and relatively smooth function of the size of the problem, whereas standard models use single numbers to represent the speeds of the processors. We formulate the problem of partitioning an n-element set over p heterogeneous processors using this model and design efficient algorithms for its solution whose worst-case complexity is $O(p^2 \times \log_2 n)$ but whose best-case complexity is $O(p \times \log_2 n)$.
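To make the idea of speed functions concrete, the following sketch partitions n elements so that all processors finish at roughly the same time. It is not the algorithm of the paper and makes no claim to match its complexity bounds: it swaps in a plain bisection on the common finish time, and it assumes that each processor's execution time x / speed(x) is non-decreasing in x, that n is at least 1, and that speeds are strictly positive. The names `SpeedFn`, `max_size_within`, and `partition` are hypothetical.

```python
from typing import Callable, List

SpeedFn = Callable[[int], float]   # elements processed per time unit at problem size x


def max_size_within(speed: SpeedFn, n: int, deadline: float) -> int:
    """Largest x in [0, n] with x / speed(x) <= deadline (time assumed non-decreasing in x)."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid / speed(mid) <= deadline:
            lo = mid
        else:
            hi = mid - 1
    return lo


def partition(n: int, speeds: List[SpeedFn], iters: int = 60) -> List[int]:
    """Split n elements so that every processor finishes by a common, near-minimal deadline."""
    # Feasible upper bound on the deadline: give everything to the best single processor.
    lo, hi = 0.0, min(n / s(n) for s in speeds)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(max_size_within(s, n, mid) for s in speeds) >= n:
            hi = mid      # all n elements fit within this deadline
        else:
            lo = mid
    sizes = [max_size_within(s, n, hi) for s in speeds]
    # hi is feasible, so the capacities cover n; hand back any surplus.
    surplus = sum(sizes) - n
    for i in range(len(sizes)):
        give_back = min(surplus, sizes[i])
        sizes[i] -= give_back
        surplus -= give_back
    return sizes


if __name__ == "__main__":
    constant = lambda x: 1.0
    paging = lambda x: 3.0 if x <= 50 else 1.0   # speed drops once data exceeds memory
    print(partition(100, [constant, paging]))
```

With single-number speeds of 1 and 3, the second processor would receive 75 of 100 elements. With the size-dependent `paging` function it is capped at 50, which is the kind of effect a constant-speed model cannot capture.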
Optimal Task Assignment in Heterogeneous Distributed Computing Systems
A distributed system comprising networked heterogeneous processors requires efficient task-to-processor assignment to achieve fast turnaround time. The authors propose two algorithms based on the A* technique, which are considerably faster, are more memory-efficient, and give optimal solutions. The first is a sequential algorithm that reduces the search space. The second aims to lower time complexity by running the assignment algorithm in parallel, and achieves significant speedup.
Performance Evaluation of Distributed Computing over Heterogeneous Networks
Lecture Notes in Computer Science, 2007
RWAPI is a low-level communication interface designed for clusters of PCs. It has been developed to provide performance to higher applications on a wide variety of architectures. We implemented RWAPI on top of the modular software architecture called GRWA. RWAPI supports Ethernet, InfiniBand and Myrinet network interconnects. This paper introduces RWAPI and the design of its network component on top of both InfiniBand and Myrinet interconnects. We obtained a very low latency and high throughput compared to MPI results.
An optimized cost-based data allocation model for heterogeneous distributed computing systems
International Journal of Electrical and Computer Engineering (IJECE)
Continuous attempts have been made to improve the flexibility and effectiveness of distributed computing systems. Extensive effort in the fields of connectivity technologies, network programs, high processing components, and storage helps to improve results. However, concerns such as slowness in response, long execution time, and long completion time have been identified as stumbling blocks that hinder performance and require additional attention. These defects increased the total system cost and made the data allocation procedure for a geographically dispersed setup difficult. The load-based architectural model has been strengthened to improve data allocation performance. To do this, an abstract job model is employed, and a data query file containing input data is processed on a directed acyclic graph. The jobs are executed on the processing engine with the lowest execution cost, and the system's total cost is calculated. The total cost is computed by summing the costs of com...
Parallel Computing on Heterogeneous Networks
2003
In this paper, we analyse the challenges associated with parallel programming for common networks of computers (NoCs) that are, unlike dedicated parallel computer systems, inherently heterogeneous and unreliable. This analysis results in a description of the main features of an ideal parallel program for NoCs. We also outline some recent parallel programming tools that try to respond to some of these challenges.
Scheduling Independent Tasks in Heterogeneous Environments under Communication Constraints
2006
With the advent of the Grid, task scheduling in heterogeneous environments is becoming more and more important. Of particular interest is the fact that, especially in scientific experiments, a non-negligible amount of data must be transferred to the processing node before a task can commence execution. Given bandwidth constraints, scheduling both computations and data transfers is required. In this paper we first develop a suitable model that captures heterogeneity in the processing nodes while imposing communication constraints. We proceed by proposing scheduling heuristics with the aim of minimizing the total makespan of a set of independent tasks. Through a series of experiments we illustrate the potential of a particular heuristic that is based on backfilling.
Journal of Parallel and Distributed Computing, 2005
Message-passing network-based multicomputer systems are emerging as a potential economical candidate to replace supercomputers. Despite enormous effort to evaluate the performance of those systems and to determine an optimum scheduling algorithm (a problem known to be NP-complete), we still lack a complete and good performance model to analyze distributed computing systems. The model is complete if all system parameters, network parameters, communication overhead parameters, and application parameters are considered explicitly in the solution. A good performance model, like a good scientific theory, should be able to explain all normal behavior, predict any abnormality in the system, and allow the designer to adjust some of the parameters, while abstracting unimportant details. In this paper, we develop a good and complete performance model, which predicts a minimum finish time or, equivalently, the maximum speedup. In addition, we develop a closed-form solution which forecasts the optimum share of the parallel job (task) that has to be assigned to each processor (node). Task assignment may then be undertaken in a distributed manner, which enhances the distributive nature of the system and thus improves system performance. Most importantly, our analytical solution presents a mechanism to select, based on system and application parameters, the optimum number of processors (nodes) that has to be assigned to a given parallel job. The model helps the designer to study the effect of each individual parameter on the overall system performance. This then becomes a tool for a designer of a multicomputer system to manage limited resources in an optimal manner, paying attention only to those parameters that are most critical.