An Efficient Model for Dynamic and Constrained Resource Allocation Problems

Modeling and Optimizing Resource Allocation Decisions through Multi-model Markov Decision Processes with Capacity Constraints

arXiv (Cornell University), 2020

This paper proposes a new formulation for the dynamic resource allocation problem, which converts the traditional MDP model with known parameters and no capacity constraints to a new model with uncertain parameters and a resource capacity constraint. Our motivating example comes from a medical resource allocation problem: patients with multiple chronic diseases can be provided either normal or special care, where the capacity of special care is limited due to financial or human resources. In such systems, it is difficult, if not impossible, to generate good estimates for the evolution of health for each patient. We formulate the problem as a two-stage stochastic integer program. However, this formulation quickly becomes intractable for larger instances, for which we propose and test a parallel approximate dynamic programming algorithm. We show that commercial solvers cannot solve problem instances with a large number of scenarios, whereas the proposed algorithm provides a solution in seconds even for very large instances. In our computational experiments, it finds the optimal solution for 42.86% of the instances. On aggregate, it achieves a mean gap of 0.073%. Finally, we estimate the value of our contribution for different realizations of the parameters. Our findings show that our model contributes a significant amount of additional utility.

Optimal resource allocation and policy formulation in loosely-coupled Markov decision processes

… of the Fourteenth International Conference on …, 2004

The problem of optimal policy formulation for teams of resource-limited agents in stochastic environments is composed of two strongly-coupled subproblems: a resource allocation problem and a policy optimization problem. We show how to combine the two problems into a single constrained optimization problem that yields optimal resource allocations and policies that are optimal under these allocations. We model the system as a multiagent Markov decision process (MDP), with social welfare of the group as the optimization criterion. The straightforward approach of modeling both the resource allocation and the actual operation of the agents as a multiagent MDP on the joint state and action spaces of all agents is not feasible, because of the exponential increase in the size of the state space. As an alternative, we describe a technique that exploits problem structure by recognizing that agents are only loosely-coupled via the shared resource constraints. This allows us to formulate a constrained policy optimization problem that yields optimal policies among the class of realizable ones given the shared resource limitations. Although our complexity analysis shows the constrained optimization problem to be NP-complete, our results demonstrate that, by exploiting problem structure and via a reduction to a mixed integer program, we are able to solve problems orders of magnitude larger than what is possible using a traditional multiagent MDP formulation.

Linear Dynamic Programs for Resource Management

Sustainable resource management in many domains presents large continuous stochastic optimization problems, which can often be modeled as Markov decision processes (MDPs). To solve such large MDPs, we identify and leverage linearity in state and action sets that is common in resource management. In particular, we introduce linear dynamic programs (LDPs) that generalize resource management problems and partially observable MDPs (POMDPs). We show that the LDP framework makes it possible to adapt point-based methods, the state of the art in solving POMDPs, to solving LDPs. The experimental results demonstrate the efficiency of this approach in managing the water level of a river reservoir. Finally, we discuss the relationship with dual dynamic programming, a method used to optimize hydroelectric systems.

Constrained Multiagent Markov Decision Processes: a Taxonomy of Problems and Algorithms

Journal of Artificial Intelligence Research

In domains such as electric vehicle charging, smart distribution grids and autonomous warehouses, multiple agents share the same resources. When planning the use of these resources, agents need to deal with the uncertainty in these domains. Although several models and algorithms for such constrained multiagent planning problems under uncertainty have been proposed in the literature, it remains unclear which algorithm can be applied when. In this survey we conceptualize these domains and establish a generic problem class based on Markov decision processes. We identify and compare the conditions under which algorithms from the planning literature for problems in this class can be applied: whether constraints are soft or hard, whether agents are continuously connected, whether the domain is fully observable, whether a constraint is momentary (instantaneous) or on a budget, and whether the constraint is on a single resource or on multiple resources. Further we discuss the advantages and disadva...

Dynamic policies for uncertain time-critical tasking problems

Naval Research Logistics, 2008

A recent paper by Gaver et al. [6] argued the importance of studying service control problems in which the usual assumptions (i) that tasks will wait indefinitely for service and (ii) that successful service completions can be observed instantaneously are relaxed. Military and other applications were cited. They proposed a model in which arriving tasks are available for service for a period whose duration is unknown to the system's controller. The allocation of a large amount of processing to a task may make its own successful completion more likely but may also result in the loss of many unserved tasks from the system. Gaver et al. called for the design of dynamic policies for the allocation of service that maximize the rate of successful task completions achieved, or that come close to doing so. This is the theme of the paper. We utilize dynamic programming policy improvement approaches to design heuristic dynamic policies for service allocation which may be easily computed. In all cases studied, these policies achieve throughputs close to optimal.

Approximate Dynamic Programming for Stochastic Resource Allocation Problems

IEEE/CAA Journal of Automatica Sinica, 2020

A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed, which takes into account resource requests for both instant and future needs. The considered framework can handle two types of reservations (i.e., specified and unspecified time interval reservation requests), and implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman’s backward principle of optimality is exploited in order to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality inevitably arises in DP, both for instant resource requests and for future resource reservations. To address this scalability issue, an approximate dynamic programming (ADP) technique based on linear function approximations is applied. Several examples are provided to show the effectiveness of the proposed approach.
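The backward principle of optimality mentioned in this abstract can be illustrated with a minimal sketch of finite-horizon backward induction. The toy single-resource pricing model below (the function name `backward_induction`, the price list, the acceptance probabilities and the request probability) is entirely hypothetical and is not taken from the paper; it only shows the shape of the recursion.

```python
def backward_induction(capacity, horizon, prices, accept_prob, request_prob):
    """Finite-horizon DP for a toy single-resource pricing problem.

    State: units of capacity left. Action: index of the posted price.
    All parameters are illustrative assumptions, not the paper's model.
    """
    # value[c] = expected revenue-to-go with c units left; zero after the last stage.
    value = [0.0] * (capacity + 1)
    policy = []  # filled from the last stage to the first, reversed at the end
    for _ in range(horizon):
        new_value = [0.0] * (capacity + 1)
        stage_policy = [None] * (capacity + 1)  # no action when nothing is left
        for c in range(1, capacity + 1):
            best_v, best_a = float("-inf"), None
            for a, (price, q) in enumerate(zip(prices, accept_prob)):
                sale = request_prob * q  # probability that one unit is sold
                v = sale * (price + value[c - 1]) + (1.0 - sale) * value[c]
                if v > best_v:
                    best_v, best_a = v, a
            new_value[c] = best_v
            stage_policy[c] = best_a
        value = new_value
        policy.append(stage_policy)
    policy.reverse()  # policy[t][c] is the action at stage t with c units left
    return value, policy


if __name__ == "__main__":
    value, policy = backward_induction(
        capacity=1, horizon=2, prices=[10, 20],
        accept_prob=[0.8, 0.5], request_prob=1.0,
    )
    print(value, policy)
```

The table `value` grows with the number of states, which is the scalability issue that motivates replacing it with a function approximation in approximate dynamic programming.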

R-FRTDP: A Real-Time DP Algorithm with Tight Bounds for a Stochastic Resource Allocation Problem

Resource allocation is a widely studied class of problems in Operations Research and Artificial Intelligence. Of particular interest are constrained stochastic resource allocation problems, where the assignment of a constrained resource does not automatically imply the realization of the task. Such problems are generally addressed with Markov decision processes (MDPs). In this paper, we present efficient lower and upper bounds in the context of a constrained stochastic resource allocation problem for a heuristic search algorithm called Focused Real-Time Dynamic Programming (FRTDP). Experiments show that this algorithm is relevant for this kind of problem and that the proposed tight bounds reduce the number of backups required compared with previously existing bounds.

"Stochastic Constraint Programming: A Scenario-Based Approach," S. A. Tarim, S. Manandhar and T. Walsh, Constraints, Vol. 11, pp. 53-80, 2006

To model combinatorial decision problems involving uncertainty and probability, we introduce scenario-based stochastic constraint programming. Stochastic constraint programs contain both decision variables, which we can set, and stochastic variables, which follow a discrete probability distribution. We provide a semantics for stochastic constraint programs based on scenario trees. Using this semantics, we can compile stochastic constraint programs down into conventional (nonstochastic) constraint programs. This allows us to exploit the full power of existing constraint solvers. We have implemented this framework for decision making under uncertainty in stochastic OPL, a language which is based on the OPL constraint modelling language [Hentenryck et al., 1999]. To illustrate the potential of this framework, we model a wide range of problems in areas as diverse as portfolio diversification, agricultural planning and production/inventory management.
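The scenario idea in this abstract can be sketched for the simplest one-stage case: enumerate every scenario of the stochastic variable, so that a chance constraint and an expected cost become ordinary deterministic sums. The sketch below is plain Python rather than stochastic OPL, uses brute-force search instead of a constraint solver, and all names and data (`solve_by_scenarios`, the demand distribution, the threshold) are hypothetical.

```python
def solve_by_scenarios(candidates, scenarios, cost, satisfied, threshold):
    """Pick the decision with the lowest expected cost among those whose
    constraint holds with probability >= threshold.

    scenarios: list of (probability, outcome) pairs of a discrete distribution.
    Returns (decision, expected_cost), or None if no candidate is feasible.
    """
    best = None
    for x in candidates:
        # Chance constraint, evaluated deterministically over all scenarios.
        prob_ok = sum(p for p, outcome in scenarios if satisfied(x, outcome))
        if prob_ok + 1e-12 < threshold:  # small tolerance for float rounding
            continue
        expected_cost = sum(p * cost(x, outcome) for p, outcome in scenarios)
        if best is None or expected_cost < best[1]:
            best = (x, expected_cost)
    return best


if __name__ == "__main__":
    # Hypothetical inventory example: order x units, demand is uncertain.
    demand = [(0.5, 1), (0.3, 2), (0.2, 3)]
    result = solve_by_scenarios(
        candidates=range(4),
        scenarios=demand,
        cost=lambda x, d: x,             # pay one unit of cost per item ordered
        satisfied=lambda x, d: x >= d,   # demand is met
        threshold=0.8,                   # meet demand in at least 80% of cases
    )
    print(result)
```

With several decision stages the scenarios form a tree rather than a flat list, and the number of scenarios grows quickly, which is one reason to hand the compiled problem to a dedicated solver.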

Hierarchical solution of Markov decision processes using macro-actions

Proceedings of the …, 1998

We present a technique for computing approximately optimal solutions to stochastic resource allocation problems modeled as Markov decision processes (MDPs). We exploit two key properties to avoid explicitly enumerating the very large state and action spaces associated with these problems. First, the problems are composed of multiple tasks whose utilities are independent. Second, the actions taken with respect to (or resources allocated to) a task do not influence the status of any other task. We can therefore view each task as an MDP. However, these MDPs are weakly coupled by resource constraints: actions selected for one MDP restrict the actions available to others. We describe heuristic techniques for dealing with several classes of constraints that use the solutions for individual MDPs to construct an approximate global solution. We demonstrate this technique on problems involving hundreds of state variables and thousands of tasks, approximating the solution to problems that are far beyond the reach of standard methods.
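One generic way to use per-task solutions under a shared resource constraint is a greedy allocation by marginal value, sketched below. This is an illustrative heuristic under stated assumptions, not necessarily the technique of the paper: `task_values[i][k]` is assumed to be the value of task i's own MDP when it is given k resource units, computed beforehand by solving each task separately, and the function name `greedy_allocate` is hypothetical.

```python
def greedy_allocate(task_values, budget):
    """Allocate `budget` identical resource units across independent tasks.

    task_values[i][k] is the (precomputed, hypothetical) optimal value of
    task i's individual MDP when it receives k units. Each unit goes to the
    task with the largest marginal gain; ties go to the lower task index.
    """
    alloc = [0] * len(task_values)
    for _ in range(budget):
        best_gain, best_task = 0.0, None
        for i, values in enumerate(task_values):
            k = alloc[i]
            if k + 1 < len(values):  # task i can still absorb another unit
                gain = values[k + 1] - values[k]
                if gain > best_gain:
                    best_gain, best_task = gain, i
        if best_task is None:  # no task benefits from another unit
            break
        alloc[best_task] += 1
    return alloc


if __name__ == "__main__":
    # Two hypothetical tasks with diminishing returns in the resource.
    values = [[0, 5, 8, 9], [0, 4, 7, 7.5]]
    print(greedy_allocate(values, budget=3))
```

The joint state space is never enumerated: only the per-task value tables are consulted, which is what makes this kind of decomposition scale to many tasks. The greedy rule is exact only under diminishing marginal gains and is otherwise an approximation.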