Mapping Filtering Streaming Applications (original) (raw)

Mapping filtering streaming applications with communication costs

2009

In this paper, we explore the problem of mapping filtering streaming applications on large-scale homogeneous platforms, with a particular emphasis on communication models and their impact. Filtering application are streaming applications where each node also has a selectivity which either increases or decreases the size of its input data set. This selectivity makes the problem of scheduling these applications more challenging than the more studied problem of scheduling "non-filtering" streaming workflows.

Complexity Results for Collective Communications on Heterogeneous Platforms

International Journal of High Performance Computing Applications, 2006

In this paper, we consider the communications involved in the execution of a complex application, deployed on a heterogeneous platform. Such applications extensively use macro-communication schemes, for example to broadcast data items, either to all resources (broadcast) or to a restricted set of targets (multicast). Rather than aiming at minimizing the execution time of a single collective communication, we focus on the steady-state operation. We assume that there is a large number of messages to be broadcast or multicast in pipelined fashion, and we aim at maximizing the throughput, i.e. the (rational) number of messages which can be broadcast or multicast every timestep. We target heterogeneous platforms, modeled by a graph where resources have different communication and computation speeds. Achieving the best throughput may well require that the target platform is used in totality: different messages may need to be transferred along different paths.

Optimizing the steady-state throughput of Broadcasts on heterogeneous platforms

2003

In this paper, we consider the communications involved by the execution of a complex application, deployed on a heterogeneous "grid" platform. Such applications extensively use macrocommunication schemes, for example to broadcast data items. Rather than aiming at minimizing the execution time of a single broadcast, we focus on the steady-state operation. We assume that there is a large number of messages to be broadcast in pipeline fashion, and we aim at maximizing the throughput, i.e. the (rational) number of messages which can be broadcast every time-step. We target heterogeneous platforms, modeled by a graph where resources have different communication and computation speeds. Achieving the best throughput may well require that the target platform is used in totality: we show that neither spanning trees nor DAGs are as powerful as general graphs.

Pipelining broadcasts on heterogeneous platforms

2005

In this paper, we consider the communications involved by the execution of a complex application, deployed on a heterogeneous platform. Such applications extensively use macrocommunication schemes, for example, to broadcast data items. Rather than aiming at minimizing the execution time of a single broadcast, we focus on the steady-state operation. We assume that there is a large number of messages to be broadcast in pipeline fashion, and we aim at maximizing the throughput, i.e., the (rational) number of messages which can be broadcast every time-step. We target heterogeneous platforms, modeled by a graph where resources have different communication and computation speeds. Achieving the best throughput may well require that the target platform is used in totality: We show that neither spanning trees nor DAGs are as powerful as general graphs. We show how to compute the best throughput using linear programming, and how to exhibit a periodic schedule, first when restricting to a DAG, and then when using a general graph. The polynomial compactness of the description comes from the decomposition of the schedule into several broadcast trees that are used concurrently to reach the best throughput. It is important to point out that a concrete scheduling algorithm based upon the steady-state operation is asymptotically optimal, in the class of all possible schedules (not only periodic solutions).

Complexity results and heuristics for pipelined multicast operations on heterogeneous platforms

International Conference on Parallel Processing, 2004. ICPP 2004., 2004

In this paper, we consider the communications involved by the execution of a complex application deployed on a heterogeneous platform. Such applications extensively use macro-communication schemes, for example to broadcast data items to several targets, known as the multicast operation. Rather than seeking to minimize the execution time of a single multicast, we focus on steady-state performance. We target heterogeneous platforms, modeled by a graph where resources have different communication speeds. We show that the problem of computing the best throughput for a multicast operation is NP-hard, whereas the best throughput to broadcast a message to every node in a graph can be computed in polynomial time. Thus we introduce several heuristics to deal with this problem; most of them are based on linear programming. We prove that some of these heuristics are approximation algorithms. We perform simulations to test these heuristics and show that their results are close to a theoretical upper bound on the throughput that we obtain with the linear programming approach.

Mapping Linear Workflows with Computation/Communication Overlap

2008

This paper presents theoretical results related to mapping and scheduling linear workflows onto heterogeneous platforms. We use a realistic architectural model with bounded communication capabilities and full computation/communication overlap. This model is representative of current multi-threaded systems. In these workflow applications, the goal is often to maximize throughput or to minimize latency. We present several complexity results related to both these criteria. To be precise, we prove that maximizing the throughput is NP-complete even for homogeneous platforms and minimizing the latency is NP-complete for heterogeneous platforms. Moreover, we present an approximation algorithm for throughput maximization for linear chain applications on homogeneous platforms, and an approximation algorithm for latency minimization for linear chain applications on all platforms where communication is homogeneous (the processor speeds can differ). In addition, we present algorithms for several important special cases for linear chain applications. Finally, we consider the implications of adding feedback loops to linear chain applications.

Broadcasting on a budget in the multi-service communication model

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)

which implies the lemma. Let T 0 be an optimal broadcast schedule for for n processors that uses only q k. Certainly heightT heightT 0. However, heightT 0 = Opolylg n [3, 9].

Bandwidth-centric allocation of independent tasks on heterogeneous platforms

2002

In this paper, we consider the problem of allocating a large number of independent, equalsized tasks to a heterogenerous "grid" computing platform. Such problems arise in collaborative computing e orts like SETI@home. We use a tree to model a grid, where resources can have di erent speeds of computation and communication, as well as di erent overlap capabilities. We de ne a base model, and show how to determine the maximum steady-state throughput of a node in the base model, assuming we already know the throughput of the subtrees rooted at the node's children. Thus, a bottom-up traversal of the tree determines the rate at which tasks can be processed in the full tree. The best allocation is bandwidth-centric: if enough bandwidth is available, then all nodes are kept busy; if bandwidth is limited, then tasks should be allocated only to the children which have su ciently small communication times, regardless of their computation power.

On Multicast Algorithms for Heterogeneous

Networks of workstations (NOWs) provide an economical platform for high performance parallel computing. Such networks may comprise a variety of different types of workstations and network devices. This paper addresses the problem of efficient multicast in a heterogeneous communication model. Although the problem of finding optimal multicast schedules is known to be NP-complete in this model, a greedy algorithm has been shown experimentally to find good solutions in practice. In this paper we show that the greedy algorithm finds provably near-optimal schedules in polynomial time and that optimal schedules can be found in polynomial time when the number of distinct types of workstations is bounded by a constant. Specifically, this paper presents three results. First, when there are n workstations of some constant k distinct types, the greedy algorithm is shown to find schedules that complete at most a constant additive term later than optimal. Second, an algorithm is given that finds optimal schedules in time O(n 2k ). Finally, it is shown that for the general problem, the greedy algorithm finds solutions that complete the multicast in at most twice the optimal time.