Job Coscheduling on Coupled High-End Computing Systems (original) (raw)

Supercomputer centers often deploy large-scale computing systems together with an associated data analysis or visualization system. In this paper, we propose a coscheduling mechanism, providing the ability to coordinate execution between jobs on different systems. The mechanism is built on top of a lightweight protocol for coordination between policy domains without manual intervention. We have evaluated this system using real job traces from Intrepid and Eureka, the production Blue Gene/P and data analysis systems, respectively, deployed at Argonne National Laboratory. Our experimental results quantify the costs of coscheduling and demonstrate that coscheduling can be achieved with limited impact on system performance under varying workloads.