Improved Resource Utilization with Buffered Coscheduling

Buffered coscheduling: a new methodology for multitasking parallel jobs on distributed systems

Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS), 2000

Buffered coscheduling is a scheduling methodology for time-sharing communicating processes in parallel and distributed systems. The methodology has two primary features: communication buffering and strobing. With communication buffering, communication generated by each processor is buffered and performed at the end of regular intervals to amortize communication and scheduling overhead. This infrastructure is then leveraged by a strobing mechanism to perform a total exchange of information at the end of each interval, thus providing global information to more efficiently schedule communicating processes. This paper describes how buffered coscheduling can optimize resource utilization by analyzing workloads with varying computational granularities, load imbalances, and communication patterns. The experimental results, performed using a detailed simulation model, show that buffered coscheduling is very effective on fast SANs such as Myrinet as well as slower switch-based LANs.
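The buffering-plus-strobing mechanism described above can be sketched in a few lines. This is a minimal illustrative model, not the paper's implementation; the class and method names are invented for the sketch:

```python
class BufferedProcessor:
    """Buffers outgoing messages and releases them only at strobe time."""

    def __init__(self, pid):
        self.pid = pid
        self.outgoing = []  # messages buffered during the current interval

    def send(self, dest, payload):
        # Communication is not performed immediately; it is buffered so that
        # its cost can be amortized at the end of the interval.
        self.outgoing.append((dest, payload))

    def strobe(self):
        # At the end of each interval, flush all buffered communication and
        # report local status, contributing to the total exchange.
        flushed, self.outgoing = self.outgoing, []
        return {"pid": self.pid, "messages": flushed}


def run_interval(processors):
    """One interval: every processor strobes; the collected results form the
    global view that can drive the next scheduling decision."""
    return [p.strobe() for p in processors]


procs = [BufferedProcessor(i) for i in range(4)]
procs[0].send(1, "halo-exchange")
procs[2].send(3, "reduce-partial")
global_view = run_interval(procs)
```

The key point the sketch captures is that no processor communicates mid-interval; the global view assembled at each strobe is what enables scheduling decisions based on machine-wide rather than purely local information.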

Efficient Scheduling of Parallel Jobs on Massively Parallel Systems

2007

We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of flow-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.

Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System

2003

A parallel application benefits from scheduling policies that take a global view of the application's process working set. As the interactions among cooperating processes increase, mechanisms to reduce waiting within one or more of the processes become more important. In particular, collective operations such as barriers and reductions are extremely sensitive to even normally harmless events such as context switches among members of the process working set. For the last 18 months, we have been studying the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, and developing strategies to minimize their impact on large processor-count SPMD bulk-synchronous programs. We present a novel co-scheduling scheme for improving the performance of fine-grain collective activities such as barriers and reductions, describe an implementation consisting of operating-system kernel modifications and a run-time system, and present a set of empirical results comparing the technique with traditional operating-system scheduling. Our results indicate a speedup of over 300% on synchronizing collectives.
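The sensitivity of collectives to short-lived interruptions follows from the fact that a barrier completes only when its last member arrives. A toy model makes this visible; the parameters (work time, noise probability, interruption length) are invented for illustration and do not correspond to the paper's measurements:

```python
import random


def barrier_time(n_procs, work=1.0, noise_prob=0.05, noise_len=0.5, rng=None):
    """Time for one barrier across n_procs processes.

    Each process needs `work` time units; with probability `noise_prob` it is
    also hit by an interruption of length `noise_len` (e.g. a timer tick or a
    daemon wakeup). The barrier completes at the LAST arrival, so a single
    interrupted process delays everyone.
    """
    rng = rng or random.Random(0)
    arrivals = []
    for _ in range(n_procs):
        t = work
        if rng.random() < noise_prob:
            t += noise_len
        arrivals.append(t)
    return max(arrivals)
```

As `n_procs` grows, the probability that at least one process is interrupted in any given interval approaches 1, so the expected barrier time approaches `work + noise_len` even when each individual process is rarely hit; co-scheduling the interruptions (so they overlap instead of striking at random) removes this amplification.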

Scheduling with global information in distributed systems

Proceedings of the 20th IEEE International Conference on Distributed Computing Systems (ICDCS), 2000

One of the major problems faced by the developers of parallel programs is the lack of a clear separation between the programming model and the operating system. In this paper, we present a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. This methodology is based on three innovative techniques: communication buffering, strobing, and non-blocking, one-sided communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of the proposed methodology include higher resource utilization, reduced communication overhead, efficient implementation of flow-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that this methodology is very effective in increasing overall performance in the presence of load imbalance and communication-intensive workloads.

Coscheduling techniques and monitoring tools for non-dedicated cluster computing

Our efforts are directed toward understanding the coscheduling mechanism in a NOW system when a parallel job is executed jointly with local workloads, balancing parallel performance against local interactive response. We have implemented explicit and implicit coscheduling techniques in a PVM-Linux NOW (or cluster).

Scalable co-scheduling strategies in distributed computing

… on Computer Systems …, 2010

In this paper, we present an approach to scalable coscheduling in distributed computing for complex sets of interrelated tasks (jobs). Scalability here means that schedules can be formed for job models with varying levels of task granularity and different data-replication policies, and that processor and memory resources can be upgraded. Guaranteeing job execution at the required quality of service requires taking the dynamics of the distributed environment into account, namely changes in the number of jobs awaiting service, in the volume of computation, possible failures of processor nodes, and so on. As a consequence, in the general case a set of scheduling versions, or a strategy, is required instead of a single schedule. We propose a scalable scheduling model based on multicriteria strategies. The choice of a specific schedule depends on the observed resource load and is formed as a resource query sent to a local batch-job management system.
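Selecting one schedule version from a precomputed multicriteria strategy according to the observed load can be sketched as follows; the thresholds, labels, and the single load criterion are hypothetical simplifications, not the paper's model:

```python
def choose_schedule(strategy, load):
    """Pick a schedule version from a strategy.

    strategy: list of (max_load, schedule) pairs, sorted by max_load.
    Returns the first schedule whose load ceiling covers the observed load;
    falls back to the most conservative version if the load exceeds them all.
    """
    for max_load, schedule in strategy:
        if load <= max_load:
            return schedule
    return strategy[-1][1]


# A hypothetical three-version strategy: aggressive data replication under
# light load, minimal replication and wide spreading under heavy load.
strategy = [
    (0.3, "replicate-data, pack-tight"),
    (0.7, "balanced"),
    (1.0, "spread, minimal-replication"),
]
```

In the paper's setting the choice would be driven by several criteria at once and shipped to a local batch-job management system as a resource query; the sketch only shows the version-selection step.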

Autoscheduling in a distributed shared-memory environment

Lecture Notes in Computer Science, 1995

The ease of programming and compiling for the shared memory multiprocessor model, coupled with the scalability and cost advantages of distributed memory computers, give an obvious appeal to distributed shared memory architectures. In this paper we discuss the design and implementation issues of a dynamic data management and scheduling environment for distributed shared memory architectures. Unlike the predominantly static approaches used on distributed and message passing machines, we advocate the advantages of dynamic resource allocation, especially in the case of multi-user environments. We propose hybrid data and work distribution techniques that adjust to variations in the physical partition, achieving better load balance than purely static schemes. We present the architecture of our execution environment and discuss implementation details of some of the critical components. Preliminary results using benchmarks of representative execution profiles support our main thesis: With minimal control, the load balancing and resource utilization advantages offered by dynamic methods often outweigh the disadvantage of increased memory latency stemming from slightly compromised data locality, and perhaps additional run-time overhead.

Modeling and analysis of dynamic coscheduling in parallel and distributed environments

ACM SIGMETRICS Performance Evaluation Review, 2002

Scheduling in large-scale parallel systems has been and continues to be an important and challenging research problem. Several key factors, including the increasing use of off-the-shelf clusters of workstations to build such parallel systems, have resulted in the emergence of a new class of scheduling strategies, broadly referred to as dynamic coscheduling. Unfortunately, the size of both the design and performance spaces of these emerging scheduling strategies is quite large, due in part to the numerous dynamic interactions among the different components of the parallel computing environment as well as the wide range of applications and systems that can comprise the parallel environment. This in turn makes it difficult to fully explore the benefits and limitations of the various proposed dynamic coscheduling approaches for large-scale systems solely with the use of simulation and/or experimentation.

Time-Sharing Parallel Jobs in the Presence of Multiple Resource Requirements

Lecture Notes in Computer Science, 2000

Buffered coscheduling is a new methodology that can substantially increase resource utilization, improve response time, and simplify the development of the run-time support in a parallel machine. In this paper, we provide an in-depth analysis of three important aspects of the proposed methodology: the impact of the communication pattern and type of synchronization, the impact of memory constraints, and the processor utilization. The experimental results show that if jobs use non-blocking or collective communication patterns, the response time becomes largely insensitive to the job communication pattern. Using a simple job access policy, we also demonstrate the robustness of buffered coscheduling in the presence of memory constraints. Overall, buffered coscheduling generally outperforms backfilling and backfilling gang scheduling with respect to response time, wait time, run-time slowdown, and processor utilization.