Mor Harchol-Balter - Academia.edu

Papers by Mor Harchol-Balter

Queueing Theory Terminology

Performance Modeling and Design of Computer Systems

Optimally Scheduling Jobs with Multiple Tasks

ACM SIGMETRICS Performance Evaluation Review, 2017

We consider optimal job scheduling where each job consists of multiple tasks, each of unknown duration, with precedence constraints between tasks. A job is not considered complete until all of its tasks are complete. Traditional heuristics, such as favoring the job of shortest expected remaining processing time, are suboptimal in this setting. Furthermore, even if we know which job to run, it is not obvious which task within that job to serve. In this paper, we characterize the optimal policy for a class of such scheduling problems and show that the policy is simple to compute.

heSRPT

Performance Evaluation Review, Mar 5, 2021

Modern data centers serve workloads which can exploit parallelism. When a job parallelizes across multiple servers it completes more quickly. However, it is unclear how to share a limited number of servers between many parallelizable jobs. In this paper we consider a typical scenario where a data center composed of N servers will be tasked with completing a set of M parallelizable jobs. Typically, M is much smaller than N. In our scenario, each job consists of some amount of inherent work which we refer to as a job's size. We assume that job sizes are known up front to the system, and each job can utilize any number of servers at any moment in time. These assumptions are reasonable for many parallelizable workloads such as training neural networks using TensorFlow [2]. Our goal in this paper is to allocate servers to jobs so as to minimize the mean slowdown across all jobs, where the slowdown of a job is the job's completion time divided by its running time if given exclusive access to all N servers. Slowdown measures how a job was interfered with by other jobs in the system, and is often the metric of interest in the theoretical parallel scheduling literature (where it is also called stretch), as well as the HPC community (where it is called expansion factor).
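The slowdown metric defined in this abstract can be illustrated with a minimal sketch. The function name `mean_slowdown`, its parameters, and the default linear speedup (a job on k servers runs k times faster) are assumptions made for illustration only; the abstract does not specify the speedup function, and this is not the paper's allocation policy.

```python
from typing import Callable, Sequence


def mean_slowdown(
    completion_times: Sequence[float],
    sizes: Sequence[float],
    n_servers: int,
    speedup: Callable[[int], float] = lambda k: float(k),  # assumed: linear speedup
) -> float:
    """Mean slowdown across jobs.

    A job's slowdown is its completion time divided by the time it would
    take if it had exclusive access to all n_servers servers.
    """
    if len(completion_times) != len(sizes) or not sizes:
        raise ValueError("need one completion time per job, and at least one job")
    slowdowns = []
    for finished_at, size in zip(completion_times, sizes):
        time_alone = size / speedup(n_servers)  # running time with all N servers
        slowdowns.append(finished_at / time_alone)
    return sum(slowdowns) / len(slowdowns)
```

For example, two jobs of size 4 on 4 servers would each take 1 time unit alone, so completion times of 2 and 4 correspond to slowdowns of 2 and 4.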

Session details: Scheduling I

Performance Evaluation Review, Jun 12, 2018

Closed Networks of Queues

Cambridge University Press eBooks, Feb 5, 2013

Networks with Time-Sharing (PS) Servers (BCMP)

Cambridge University Press eBooks, Feb 5, 2013

Real-World Workloads: High Variability and Heavy Tails

Cambridge University Press eBooks, Feb 5, 2013

Scheduling: SRPT and Fairness

Cambridge University Press eBooks, Feb 5, 2013

Analysis of M/G/1/SRPT under transient overload

Performance Evaluation Review, Dec 1, 2001

This short paper contains an approximate analysis for the M/G/1/SRPT queue under alternating periods of overload and low load. The result in this paper along with several other results on systems under transient overload are contained in our recent technical report [2].

Stability for Two-class Multiserver-job Systems

arXiv (Cornell University), Oct 1, 2020

WCFS: a new framework for analyzing multiserver systems

Scaling Properties of Queues with Time-Varying Load Processes: Extensions and Applications

Probability in the Engineering and Informational Sciences

New computing and communications paradigms will result in traffic loads in information server systems that fluctuate over much broader ranges of time scales than current systems. In addition, these fluctuation time scales may only be indirectly known or even be unknown. However, we should still be able to accurately design and manage such systems. This paper addresses this issue: we consider an M/M/1 queueing system operating in a random environment (denoted M/M/1(R)) that alternates between HIGH and LOW phases, where the load in the HIGH phase is higher than in the LOW phase. Previous work on the performance characteristics of M/M/1(R) systems established fundamental properties of the shape of performance curves. In this paper, we extend monotonicity results to include convexity and concavity properties, provide a partial answer to an open problem on stochastic ordering, develop new computational techniques, and include boundary cases and various degenerate M/M/1(R) systems. The ba...

The most common queueing theory questions asked by computer systems practitioners

ACM SIGMETRICS Performance Evaluation Review

This document examines five performance questions which are repeatedly asked by practitioners in industry: (i) My system utilization is very low, so why are job delays so high? (ii) What should I do to lower job delays? (iii) How can I favor short jobs if I don't know which jobs are short? (iv) If some jobs are more important than others, how do I negotiate importance versus size? (v) How do answers change when dealing with a closed-loop system, rather than an open system? All these questions have simple answers through queueing theory. This short paper elaborates on the questions and their answers. To keep things readable, our tone is purposely informal throughout. For more formal statements of these questions and answers, please see [14].

Optimal Scheduling and Exact Response Time Analysis for Multistage Jobs

arXiv, 2018

Scheduling to minimize mean response time in an M/G/1 queue is a classic problem. The problem is usually addressed in one of two scenarios. In the perfect-information scenario, the scheduler knows each job's exact size, or service requirement. In the zero-information scenario, the scheduler knows only each job's size distribution. The well-known shortest remaining processing time (SRPT) policy is optimal in the perfect-information scenario, and the more complex Gittins policy is optimal in the zero-information scenario. In real systems the scheduler often has partial but incomplete information about each job's size. We introduce a new job model, that of multistage jobs, to capture this partial-information scenario. A multistage job consists of a sequence of stages, where both the sequence of stages and stage sizes are unknown, but the scheduler always knows which stage of a job is in progress. We give an optimal algorithm for scheduling multistage jobs in an M/G/1 queue ...
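As a rough illustration of the perfect-information baseline mentioned in this abstract, the sketch below runs preemptive SRPT on a single unit-speed server over a fixed, finite list of jobs. It is not the paper's multistage algorithm and does not model Poisson arrivals; the function name `srpt_mean_response_time` and the `(arrival_time, size)` input format are assumptions chosen for the example.

```python
import heapq
from typing import List, Tuple


def srpt_mean_response_time(jobs: List[Tuple[float, float]]) -> float:
    """Mean response time under preemptive SRPT on one unit-speed server.

    jobs is a list of (arrival_time, size) pairs with known sizes.
    """
    if not jobs:
        raise ValueError("need at least one job")
    pending = sorted(jobs)          # jobs ordered by arrival time
    heap = []                       # (remaining_size, arrival_time) of jobs in system
    now, i, total_response = 0.0, 0, 0.0
    while i < len(pending) or heap:
        if not heap:
            now = max(now, pending[i][0])   # server idles until the next arrival
        while i < len(pending) and pending[i][0] <= now:
            arrival, size = pending[i]
            heapq.heappush(heap, (size, arrival))
            i += 1
        # Serve the job with the shortest remaining processing time.
        remaining, arrival = heapq.heappop(heap)
        next_arrival = pending[i][0] if i < len(pending) else float("inf")
        if remaining <= next_arrival - now:
            now += remaining                # job completes before anything new arrives
            total_response += now - arrival
        else:
            # Preempt at the next arrival and re-queue with reduced remaining size.
            heapq.heappush(heap, (remaining - (next_arrival - now), arrival))
            now = next_arrival
    return total_response / len(pending)
```

With a size-3 job arriving at time 0 and a size-1 job arriving at time 1, SRPT preempts the long job, giving response times of 1 and 4; first-come-first-served would give 3 and 3.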

Session details: Session: Scheduling I

The Finite-Skip Method for Multiserver Analysis

arXiv, 2021

Multiserver queueing systems are found at the core of a wide variety of practical systems. Unfortunately, existing tools for analyzing multiserver models have major limitations: Techniques for exact analysis often struggle with high-dimensional models, while techniques for deriving bounds are often too specialized to handle realistic system features, such as variable service rates of jobs. New techniques are needed to handle these complex, important, high-dimensional models. In this paper we introduce the work-conserving finite-skip class of models. This class includes many important models, such as the heterogeneous M/G/k, the limited processor sharing policy for the M/G/1, the threshold parallelism model, and the multiserver-job model under a simple scheduling policy. We prove upper and lower bounds on mean response time for any model in the work-conserving finite-skip class. Our bounds are separated by an additive constant, giving a strong characterization of mean response time a...

SRPT Scheduling for Web Servers

Lecture Notes in Computer Science, 2001

Massive Indexed Directories in DeltaFS

Faster storage media, faster interconnection networks, and improvements in systems software have significantly mitigated the effect of I/O bottlenecks in HPC applications. Even so, applications that read and write data in small chunks are limited by the ability of both the hardware and the software to handle such workloads efficiently. Often, scientific applications partition their output using one file per process. This is a problem on HPC computers with hundreds of thousands of cores and will only worsen with exascale computers, which will be an order of magnitude larger. To avoid wasting time creating output files on such machines, scientific applications are forced to use libraries that combine multiple I/O streams into a single file. For many applications where output is produced out-of-order, this must be followed by a costly, massive data sorting operation. DeltaFS allows applications to write to an arbitrarily large number of files, while also guaranteeing efficient data acc...

WorkloadCompactor

Proceedings of the 2017 Symposium on Cloud Computing, 2017

To clean or not to clean: Malware removal strategies for servers under load

European Journal of Operational Research, 2021

We consider how to best schedule reparative downtime for a customer-facing online service that is vulnerable to cyber attacks such as malware infections. These infections can cause performance degradation (i.e., a slower service rate) and facilitate data theft, both of which have monetary repercussions. Infections may go undetected and can only be removed by time-consuming cleanup procedures, which require temporarily taking the service offline. From a security-oriented perspective, cleanups should be undertaken as frequently as possible. From a performance-oriented perspective, frequent cleanups are desirable because they maintain faster service, but they are simultaneously undesirable because they lead to more frequent downtimes and subsequent loss of revenue. We ask when and how often cleanups should happen. In order to analyze various downtime scheduling policies, we combine queueing-theoretic techniques with a revenue model to capture the problem’s tradeoffs. Unlike classical repair problems, this problem necessitates the analysis of a quasi-birth-death Markov chain, tracking the number of customer requests in the system and the (possibly unknown) infection state. We adapt a recent analytic technique, Clearing Analysis on Phases (CAP), to determine the exact steady-state distribution of the underlying Markov chain, which we then use to compute revenue rates and make recommendations. Prior work on downtime scheduling under cyber attacks relies on heuristic approaches, with our work being the first to address this problem analytically.