Optimal resource allocation for multiqueue systems with a shared server pool

Structural properties of the optimal resource allocation policy for single-queue systems

Annals of Operations Research, 2013

This paper studies structural properties of the optimal resource allocation policy for single-queue systems. Jobs arrive at a service facility and are sent one by one to a pool of computing resources for parallel processing. The facility imposes a constraint on the maximum expected sojourn time of a job. A central decision maker allocates the servers dynamically to the facility. We consider two models: a limited resource allocation model, where the allocation of resources can only be changed at the start of a new service, and a fully flexible allocation model, where the allocation of resources can also change during a service period. In these two models, the objective is to minimize the average utilization costs whilst satisfying the time constraint. To this end, we cast these optimization problems as Markov decision problems and derive structural properties of the relative value function. We show via dynamic programming that (1) the optimal allocation policy has a work-conservation property, and (2) the optimal number of servers follows a step function, with the bang-bang control policy as its extreme case.
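The step-function and bang-bang structures mentioned above can be made concrete with a minimal sketch. The threshold and level values below are hypothetical, chosen only to illustrate the shape of such policies, not taken from the paper:

```python
def step_policy(queue_length, thresholds, levels):
    """Step-function allocation: start at levels[0] servers and step up
    to levels[i+1] once queue_length reaches thresholds[i].
    Illustrative only; thresholds/levels are hypothetical inputs."""
    alloc = levels[0]
    for t, s in zip(thresholds, levels[1:]):
        if queue_length >= t:
            alloc = s
    return alloc

def bang_bang_policy(queue_length, threshold, max_servers):
    """The extreme case of a step function: allocate either all
    servers or none, depending on a single threshold."""
    return max_servers if queue_length >= threshold else 0
```

For example, with thresholds [2, 5] and levels [1, 3, 5], the allocation steps 1 → 3 → 5 as the queue grows; the bang-bang policy collapses this to a single 0 → max jump.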

Optimal Dynamic Server Allocation in Systems with On/Off Sources

Lecture Notes in Computer Science, 2007

A system consisting of several servers, where demands of different types arrive in bursts, is examined. The servers can be dynamically reallocated to deal with the different requests, but these switches take time and incur a cost. The problem is to find the optimal dynamic allocation policy. To this end, a Markov decision process is solved using two different techniques. The effects of the different solution methods and modeling decisions on the resulting policy are examined.

Server allocation subject to variance constraints

Performance Evaluation, 1996

The design of service policies whose aim is to minimize a linear combination of average queue lengths, while keeping their variances below given bounds, is considered in the context of a single-server system with two job types. A family of readily implementable threshold policies which can be used for this purpose is analysed in the steady state. The performance of these policies, as well as their ability to satisfy constraints, is examined numerically and is compared to that of another simple family.
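A threshold rule of the kind described above can be sketched as follows. This is an illustrative selection rule, not the exact policy family analysed in the paper: the single server favours type-1 jobs, but switches to type 2 once that queue exceeds a threshold, which caps the type-2 queue (and hence its variance) at the cost of some type-1 delay:

```python
def threshold_select(q1, q2, t2):
    """Pick which job type the single server works on next.
    q1, q2: current queue lengths; t2: hypothetical threshold on
    the type-2 queue. Returns 1, 2, or None if both queues are empty."""
    if q2 > t2:
        return 2              # type-2 queue too long: switch to it
    if q1 > 0:
        return 1              # otherwise favour type 1
    return 2 if q2 > 0 else None
```

Tuning t2 trades mean-cost performance against how tightly the type-2 queue-length variance is controlled.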

The optimal service time allocation of a versatile server to queue jobs and stochastically available non-queue jobs of different types

Computers & Operations Research, 2007

In this paper, we consider a service system in which the server can process N + 1 different types of jobs. Jobs of type 0 are generated randomly according to a Poisson stream. Jobs of types 1 to N are non-queue types which may or may not be available for completion by the server. To optimally allocate the server's time to these jobs, we formulate a finite-state semi-Markov decision process model for this environment. With this model, the optimal stationary policies can be numerically determined via a policy-iteration algorithm. We also discuss the practical applications of this model to tele-service and tele-marketing operations.

Optimal Scheduling in the Multiserver-job Model under Heavy Traffic

Proceedings of the ACM on Measurement and Analysis of Computing Systems

Multiserver-job systems, where jobs require concurrent service at many servers, occur widely in practice. Essentially all of the theoretical work on multiserver-job systems focuses on maximizing utilization, with almost nothing known about mean response time. In simpler settings, such as various known-size single-server-job settings, minimizing mean response time is merely a matter of prioritizing small jobs. However, for the multiserver-job system, prioritizing small jobs is not enough, because we must also ensure servers are not unnecessarily left idle. Thus, minimizing mean response time requires prioritizing small jobs while simultaneously maximizing throughput. Our question is how to achieve these joint objectives. We devise the ServerFilling-SRPT scheduling policy, which is the first policy to minimize mean response time in the multiserver-job model in the heavy traffic limit. In addition to proving this heavy-traffic result, we present empirical evidence that ServerFilling-SRPT ...
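The tension the abstract describes, serving small jobs first while keeping every server busy, can be illustrated with a greedy sketch. This is a simplification, not the exact ServerFilling-SRPT rule from the paper: scan jobs in SRPT order (smallest remaining size first) and admit any job whose server need still fits in the remaining capacity:

```python
def greedy_srpt_fill(jobs, k):
    """Illustrative greedy packing, not the paper's exact policy.
    jobs: list of (remaining_size, server_need) tuples.
    k: total number of servers.
    Returns the set of jobs put into service."""
    served, free = [], k
    for size, need in sorted(jobs):     # SRPT order: smallest size first
        if need <= free:                # admit only if it fits
            served.append((size, need))
            free -= need
        if free == 0:                   # all servers busy
            break
    return served
```

Note how a small job with a large server need can be skipped in favour of a larger job that fits, which is exactly the throughput consideration that plain SRPT ignores.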

Optimal resource allocation for time-reservation systems

Performance Evaluation, 2011

This paper studies the optimal resource allocation in time-reservation systems. Customers arrive at a service facility and receive service in two steps; in the first step information is gathered from the customer, which is then sent to a pool of computing resources, and in the second step the information is processed after which the customer leaves the system. A central decision maker has to decide when to reserve computing power from the pool of resources, such that the customer does not have to wait for the start of the second service step and that the processing capacity is not wasted due to the customer still being serviced at the first step. The decision maker simultaneously has to decide on how many processors to allocate for the second processing step such that reservation and holding costs are minimized. Since an exact analysis of the system is difficult, we decompose the system into two parts which are solved sequentially leading to nearly optimal solutions. We show via dynamic programming that the near-optimal number of processors follows a step function with as an extreme policy the bang-bang control. Moreover, we provide new fundamental insights in the dependence of the near-optimal policy on the distribution of the information gathering times. Numerical experiments demonstrate that the near-optimal policy closely matches the performance of the optimal policy of the original problem.

Randomized Assignment of Jobs to Servers in Heterogeneous Clusters of Shared Servers for Low Delay

We consider the job assignment problem in a multi-server system consisting of N parallel processor-sharing servers, categorized into M (≪ N) different types according to their processing capacity or speed. Jobs of random sizes arrive at the system according to a Poisson process with rate Nλ. Upon each arrival, a small number of servers from each type are sampled uniformly at random. The job is then assigned to one of the sampled servers based on a selection rule. We propose two schemes, each corresponding to a specific selection rule that aims at reducing the mean sojourn time of jobs in the system.
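The sampling step described above can be sketched as follows. The selection rule here, join the sampled server with the fewest jobs in service, is a hypothetical stand-in for the paper's two schemes, which it does not specify in the abstract:

```python
import random

def assign_job(servers_by_type, d=2):
    """Sample d servers uniformly at random from each type, then apply
    a hypothetical selection rule: join the sampled server with the
    lowest current occupancy. servers_by_type maps a type label to a
    list of per-server occupancy counts (mutated in place)."""
    sampled = []
    for t, occ in servers_by_type.items():
        for i in random.sample(range(len(occ)), min(d, len(occ))):
            sampled.append((occ[i], t, i))
    _, t, i = min(sampled)          # least-occupied sampled server
    servers_by_type[t][i] += 1
    return t, i
```

Because only d servers per type are probed, the assignment cost stays O(Md) per arrival rather than O(N), which is the usual appeal of power-of-d-choices schemes.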

A Lagrangian Approach to Dynamic Resource Allocation

2014

We define a class of discrete-time resource allocation problems where multiple renewable resources must be dynamically allocated to different types of jobs arriving randomly. Jobs have geometric service durations, demand resources, incur a holding cost while waiting in queue and a penalty cost of rejection when the queue is filled to capacity, and generate a reward on completion. The goal is to select which jobs to service in each time period so as to maximize total infinite-horizon discounted expected profit. We present Markov Decision Process (MDP) models of these problems and apply a Lagrangian relaxation-based method that exploits the structure of the MDP models to approximate their optimal value functions. We then develop a dynamic programming technique to efficiently recover resource allocation decisions from this approximate value function on the fly. Numerical experiments demonstrate that these decisions outperform well-known heuristics by at least 35% and by as much as 220% on ...
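The structure of the Lagrangian bound used in such relaxations can be sketched in one line. Pricing each unit of the shared resource at a multiplier lam, instead of enforcing the coupling constraint, lets the problem separate by job type; the inputs below are hypothetical, standing in for the per-type optimal values under that price:

```python
def lagrangian_bound(best_value_per_type, lam, capacity):
    """Sketch of the relaxation bound: with the coupling constraint
    replaced by a price lam per resource unit, the per-type problems
    solve independently; their summed values plus lam * capacity
    upper-bound the optimal value of the original coupled problem."""
    return sum(best_value_per_type) + lam * capacity
```

Minimizing this bound over lam then yields the tightest such approximation to the coupled value function.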

Server Allocation in Grid Systems with On/Off Sources

Lecture Notes in Computer Science, 2006

A system consisting of a number of servers, where demands of different types arrive in bursts (modelled by interrupted Poisson processes), is examined in the steady state. The problem is to decide how many servers to allocate to each job type, so as to minimize a cost function expressed in terms of average queue sizes. First, an exact analysis is provided for an isolated IP/M/n queue. The results are used to compute the optimal static server allocation policy. The latter is then compared to two heuristic policies which employ dynamic switching of servers from one queue to another (such switches take time and hence incur costs).

Optimal allocation of servers and processing time in a load balancing system

Computers & Operations Research, 2010

We consider the problem of allocating processing time in a multi-channel load balancing system, focusing on systems where processing times have distributions characterized by high variability. Our objective is to reduce congestion by routing jobs to servers based on their workload. Specifically, we arrange servers in two stations in series, and require that the load be balanced between the two stations. All arrivals join the first service center, where they receive a maximum of T units of service. Arrivals with service requirements that exceed the value T join the second station, where they receive their remaining service. For a variety of heavy-tailed service time distributions, analytical and numerical comparisons show that our scheme provides better system performance than the standard parallel multi-server model, in the sense of reducing the mean delay per customer when the traffic intensity is not too low. In particular, we develop lower bounds on the traffic intensity and the service time coefficient of variation beyond which the balanced series system outperforms the parallel system.
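The series arrangement with cutoff T amounts to a simple split of each job's service requirement, which can be sketched as (a minimal illustration of the routing rule, with the cutoff value an assumed parameter):

```python
def route_service(service_req, cutoff_t):
    """Split a job's service requirement between the two stations in
    series: up to cutoff_t units at station 1, any remainder at
    station 2. Short jobs thus never queue behind very long ones."""
    s1 = min(service_req, cutoff_t)
    s2 = service_req - s1
    return s1, s2
```

Under heavy-tailed service times this shields short jobs from the long ones, which is the source of the mean-delay improvement the abstract reports.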