Online Resource Allocation with Stochastic Resource Consumption
Related papers
Online Resource Allocation with Samples
SSRN Electronic Journal
The problem of online allocation of scarce resources is one of the most important problems faced by governments (e.g., during a pandemic crisis), hospitals, e-commerce platforms, etc. We study an online resource allocation problem where we have uncertainty about demand and about the reward of each type of demand (agents) for resources. While dealing with demand uncertainty in resource allocation problems has been the topic of many papers in the literature, the challenge of not knowing rewards has barely been explored. The lack of knowledge about agents' rewards is inspired by the problem of allocating new resources (e.g., newly developed vaccines or drugs) with unknown effectiveness/value. For such settings, we assume that we have the ability to test the market before the allocation period starts. During the test period, we collect some limited information, called sample information, about agents' expected rewards, as well as the size of the market for each type of agent. We study how to optimally exploit the sample information in our online resource allocation problem under adversarial arrival processes. We present an asymptotically optimal protection level algorithm that achieves a 1 − Θ(1/√m) competitive ratio, where m is the number of resources. By presenting an upper bound on the competitive ratio of any randomized or deterministic algorithm, we show that our competitive ratio of 1 − Θ(1/√m) is tight. We further demonstrate the efficacy of our proposed algorithm using a dataset that contains the number of COVID-19-related hospitalized patients across different age groups.
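To make the protection-level idea concrete, here is a minimal Python sketch of a generic protection-level rule for two agent types; it is not the paper's algorithm, and the type names, the sample-based demand estimate, and the sqrt(m) slack in the threshold are illustrative assumptions only.

# Illustrative sketch only, not the paper's algorithm: a generic protection-level rule
# with two agent types ("high" and "low" expected reward, as estimated in a test period)
# and m identical resource units arriving in an adversarial order.
import math

def run_protection_level(arrivals, m, protection_level):
    """Accept a "low" arrival only while more than protection_level units remain,
    so that roughly protection_level units stay protected for "high" arrivals."""
    remaining = m
    served = 0
    for agent_type in arrivals:
        if remaining == 0:
            break
        if agent_type == "high":
            remaining -= 1
            served += 1
        elif remaining > protection_level:   # spend only unprotected capacity on "low"
            remaining -= 1
            served += 1
    return served

m = 100
# Hypothetical threshold: protect the sample-estimated "high" demand minus a sqrt(m)
# buffer, echoing the 1 - Theta(1/sqrt(m)) guarantee in spirit only.
estimated_high_demand = 40
protection = max(estimated_high_demand - int(math.sqrt(m)), 0)
arrivals = ["low"] * 80 + ["high"] * 50       # one adversarially ordered sequence
print(run_protection_level(arrivals, m, protection), "requests served out of", m, "units")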
Online Resource Allocation with Time-Flexible Customers
ArXiv, 2021
In classic online resource allocation problems, a decision-maker tries to maximize her reward by making immediate and irrevocable decisions about arriving demand points (agents). However, in many settings such as hospitals, services, ride-sharing, and cloud computing platforms, some arriving agents may be patient and willing to wait a short amount of time for the resource. Motivated by this, we study the online resource allocation problem in the presence of time-flexible agents under an adversarial online arrival model. We present a setting with flexible and inflexible agents who seek a resource or service that replenishes periodically. Inflexible agents demand the resource immediately upon arrival, while flexible agents are willing to wait a short period of time. Our work presents a class of POLYtope-based Resource Allocation (polyra) algorithms that achieve optimal or near-optimal competitive ratios under an adversarial arrival process. Such polyra algorithms work by consult...
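As a rough illustration of the setting only (flexible versus inflexible agents and a periodically replenishing resource), the following Python sketch simulates a naive reserve-and-defer heuristic; it is not one of the paper's polyra algorithms, and the per-period capacity, the reserve parameter, and the single-period wait are assumptions made for the example.

# Sketch of the setting, not the paper's method: "I" agents must be served on arrival,
# "F" agents may be deferred to the next period; the resource replenishes each period.
def simulate(periods, capacity_per_period, reserve):
    served = 0
    deferred = 0                        # flexible agents still waiting from last period
    for arrivals in periods:            # each period is a string such as "FFI"
        capacity = capacity_per_period  # resource replenishes at the start of the period
        take = min(deferred, capacity)  # honor previously deferred flexible agents first
        served += take
        capacity -= take
        deferred -= take
        for agent in arrivals:
            if agent == "I":
                if capacity > 0:        # inflexible: serve immediately or lose the agent
                    served += 1
                    capacity -= 1
            elif capacity > reserve:    # flexible, and enough slack beyond the reserve
                served += 1
                capacity -= 1
            else:                       # flexible, but capacity is tight: ask them to wait
                deferred += 1
    return served

# Small example; the sketch ignores abandonment by agents deferred more than one period.
print(simulate(["FFI", "IIFF", "FI"], capacity_per_period=2, reserve=1))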
Resource Allocation with Non-deterministic Demands and Profits
2013 IEEE 10th International Conference on Mobile Ad-Hoc and Sensor Systems, 2013
Support for intelligent and autonomous resource management is one key factor in the success of modern sensor network systems. The limited resources, such as exhaustible battery life, moderate processing ability, and finite bandwidth, restrict the system's ability to serve multiple users simultaneously. As a result, often only a subset of tasks can be admitted, selected with the goal of maximizing total profit. Moreover, because of uncertain factors such as the unreliable wireless medium or the variable quality of sensor outputs, it is not practical to assume that the demands and profits of tasks are deterministic and known a priori; both may instead be stochastic, following certain distributions.
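As one hypothetical way to picture task selection with non-deterministic demands and profits, the sketch below ranks tasks by Monte Carlo estimates of expected profit per unit of expected demand and fills a resource budget greedily; the task names, samplers, and budget are illustrative assumptions, not the paper's model or method.

# Illustrative sketch, not from the paper: greedy selection of tasks whose demands and
# profits are random, using Monte Carlo estimates of their expectations.
import random

def greedy_select(tasks, budget):
    """tasks: list of (name, demand_sampler, profit_sampler)."""
    def mean(sampler, n=1000):
        return sum(sampler() for _ in range(n)) / n
    ranked = sorted(tasks, key=lambda t: mean(t[2]) / mean(t[1]), reverse=True)
    chosen, used = [], 0.0
    for name, demand, profit in ranked:
        expected_demand = mean(demand)
        if used + expected_demand <= budget:   # admit while the expected budget allows
            chosen.append(name)
            used += expected_demand
    return chosen

tasks = [
    ("track", lambda: random.uniform(2, 4), lambda: random.gauss(10, 2)),
    ("sense", lambda: random.uniform(1, 2), lambda: random.gauss(3, 1)),
    ("relay", lambda: random.uniform(3, 6), lambda: random.gauss(8, 3)),
]
print(greedy_select(tasks, budget=6))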
Simple Policies with (a.s.) Arbitrarily Slow Growing Regret for Sequential Allocation Problems
Consider the problem of sampling sequentially from a finite number of N ≥ 2 populations or 'bandits', where each population i is specified by a sequence of random variables {X^i_k}_{k≥1}, with X^i_k representing the reward received the k-th time population i is sampled. For each i, the {X^i_k}_{k≥1} are taken to be i.i.d. random variables with finite mean. In this paper, for any function g that is unbounded, positive, increasing, concave, differentiable, and sub-linear, we construct two simple adaptive policies that almost surely have arbitrarily slowly growing (e.g., sub-logarithmic) regret. This result depends only on the assumption that, for each i = 1, ..., N, a strong law of large numbers holds. Other contributions of the paper include: i) the derivation of order-g lower and upper bounds on the regret for those policies, establishing the minimal value of the corresponding order constants for each policy, and ii) bounds on the remainder terms of the regret, under the additional assumption that the law of the iterated logarithm holds.
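The following Python sketch conveys the flavor of such a policy rather than the paper's construction: every population is force-explored at a rate controlled by a slowly growing function g, and otherwise the empirically best population is exploited; the particular choice g(t) = sqrt(log t) and the Gaussian rewards are illustrative assumptions.

# Hedged sketch of the flavor of a slow-regret policy (not the paper's exact policies):
# forced exploration at rate g(t)/N, exploitation of the empirical best otherwise.
import math, random

def slow_regret_policy(pulls, horizon, g=lambda t: math.log(max(t, 2)) ** 0.5):
    """pulls[i]() draws a reward from population i; g is unbounded, increasing, sub-linear."""
    n = len(pulls)
    counts = [0] * n
    sums = [0.0] * n
    total = 0.0
    for t in range(1, horizon + 1):
        under_explored = [i for i in range(n) if counts[i] < g(t) / n]
        if under_explored:
            i = random.choice(under_explored)                       # forced exploration
        else:
            i = max(range(n), key=lambda j: sums[j] / counts[j])    # exploit empirical best
        r = pulls[i]()
        counts[i] += 1
        sums[i] += r
        total += r
    return total

pulls = [lambda: random.gauss(0.5, 1.0), lambda: random.gauss(0.7, 1.0)]
print(slow_regret_policy(pulls, horizon=10_000))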
Approximate Dynamic Programming for Stochastic Resource Allocation Problems
IEEE/CAA Journal of Automatica Sinica, 2020
A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed which takes into account resource requests for both instant and future needs. The considered framework can handle two types of reservations (i.e., specified and unspecified time-interval reservation requests) and implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman's backward principle of optimality is exploited in order to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, an inevitable issue of DP for both instant resource requests and future resource reservations, arises here. In particular, an approximate dynamic programming (ADP) technique based on linear function approximations is applied to resolve such scalability issues. Several examples are provided to show the effectiveness of the proposed approach.
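As a toy illustration of the backward Bellman recursion underlying such reservation pricing models (not the paper's full framework), the sketch below computes the exact value table for a single resource type with unit requests; the arrival probability, price, and horizon are assumptions, and the exact table V[t][c] is what a linear-approximation ADP scheme would replace when the state space becomes too large.

# Toy backward dynamic program: at each stage a unit request arrives with probability
# arrival_prob and offers a fixed price; we accept or reject to maximize expected revenue.
def backward_dp(T, capacity, arrival_prob, price):
    V = [[0.0] * (capacity + 1) for _ in range(T + 1)]   # V[T][c] = 0: no value at the end
    for t in range(T - 1, -1, -1):
        for c in range(capacity + 1):
            reject = V[t + 1][c]
            accept = price + V[t + 1][c - 1] if c > 0 else float("-inf")
            # Bellman's backward principle: combine the arrival and no-arrival cases
            V[t][c] = (arrival_prob * max(accept, reject)
                       + (1 - arrival_prob) * V[t + 1][c])
    return V

V = backward_dp(T=10, capacity=3, arrival_prob=0.6, price=5.0)
print(round(V[0][3], 2))   # expected revenue with 3 units and 10 periods to go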
Maximizing Online Utilization with Commitment
Cornell University - arXiv, 2019
We investigate online scheduling with commitment for parallel identical machines. Our objective is to maximize the total processing time of accepted jobs. As soon as a job has been submitted, the commitment constraint forces us to decide immediately whether we accept or reject it. Upon acceptance of a job, we must complete it before its deadline d, which satisfies d ≥ (1 + ε) · p + r, with p and r being the processing time and the submission time of the job, respectively, while ε > 0 is the slack of the system. Since the hard case typically arises for near-tight deadlines, that is ε → 0, we consider ε ≤ 1. We use competitive analysis to evaluate our algorithms. While there are simple online algorithms with optimal competitive ratios for the single machine model, little is known for parallel identical machines. Our first main contribution is a deterministic preemptive online algorithm with an almost tight competitive ratio on any number of machines. For a single machine, the competitive factor matches the optimal bound (1 + ε)/ε of the greedy acceptance policy. Then the competitive ratio improves with an increasing number of machines and approaches (1 + ε) · ln((1 + ε)/ε) as the number of machines converges to infinity. This is an exponential improvement over the greedy acceptance policy for small ε. In the non-preemptive case, we present a deterministic algorithm on m machines with a competitive ratio of
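For intuition, here is a simplified, non-preemptive Python sketch of a greedy acceptance policy on a single machine: accept an arriving job if and only if it can still finish before its deadline given the already committed work. The job tuples and the slack value are illustrative assumptions, and the exact policy analysed in the paper may differ in details such as preemption.

# Simplified greedy acceptance with commitment on one machine: decisions are immediate
# and irrevocable, and an accepted job must finish by its deadline d >= (1 + eps) * p + r.
def greedy_acceptance(jobs):
    """jobs: list of (release r, processing time p, deadline d), sorted by r.
    Returns the total processing time of accepted jobs."""
    machine_free_at = 0.0
    accepted_work = 0.0
    for r, p, d in jobs:
        start = max(machine_free_at, r)
        if start + p <= d:            # commit only if the deadline can still be met
            machine_free_at = start + p
            accepted_work += p
        # otherwise reject immediately
    return accepted_work

eps = 0.5
jobs = [(0.0, 4.0, (1 + eps) * 4.0), (1.0, 2.0, 1.0 + (1 + eps) * 2.0)]
print(greedy_acceptance(jobs))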
Optimal Resource Allocation with Semi-Bandit Feedback
We study a sequential resource allocation problem involving a fixed number of recurring jobs. At each time-step the manager should distribute available resources among the jobs in order to maximise the expected number of completed jobs. Allocating more resources to a given job increases the probability that it completes, but with a cut-off. Specifically, we assume a linear model where the probability increases linearly until it equals one, after which allocating additional resources is wasteful. We assume the difficulty of each job is unknown, present the first algorithm for this problem, and prove upper and lower bounds on its regret. Despite its apparent simplicity, the problem has a rich structure: we show that an appropriate optimistic algorithm can improve its learning speed dramatically beyond the results one normally expects for similar problems as the problem becomes resource-laden.
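The sketch below illustrates the allocation model described above together with a toy optimistic rule (not the paper's algorithm): completion probability is min(a_i/nu_i, 1) for allocation a_i and unknown difficulty nu_i, and the learner allocates as if its optimistic (low) difficulty estimates were correct; the variable names, budget, and numbers are assumptions for the example.

# Toy version of the linear completion model with semi-bandit feedback.
import random

def round_outcome(allocations, difficulties):
    """Semi-bandit feedback: observe, for every job, whether it completed this round."""
    return [random.random() < min(a / nu, 1.0) for a, nu in zip(allocations, difficulties)]

def allocate(budget, nu_hat):
    """Give each job its estimated requirement nu_hat[i], easiest jobs first, until the
    budget is spent (an optimistic learner keeps nu_hat at or below the true nu)."""
    allocations = [0.0] * len(nu_hat)
    remaining = budget
    for i in sorted(range(len(nu_hat)), key=lambda j: nu_hat[j]):
        give = min(nu_hat[i], remaining)
        allocations[i] = give
        remaining -= give
    return allocations

true_difficulty = [0.8, 0.5, 1.5]
estimates = [0.6, 0.4, 1.0]                  # optimistic underestimates of nu_i
a = allocate(budget=1.5, nu_hat=estimates)
print(a, round_outcome(a, true_difficulty))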