Bias Optimality versus Strong 0-Discount Optimality in Markov Control Processes with Unbounded Costs
Related papers
Markov control processes with randomized discounted cost
Mathematical Methods of Operations Research, 2007
In this paper we consider Markov decision processes with a discounted cost and a random discount rate in Borel spaces. We establish the dynamic programming algorithm for both the finite and infinite horizon cases, provide conditions for the existence of measurable selectors, and illustrate the results with a consumption-investment example.
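For orientation, a minimal sketch of the discounted dynamic programming equation with a random discount rate, assuming the rate α is drawn at each stage from a distribution θ on (0,1) independently of the state transitions (the notation c, Q, A(x), θ is illustrative, not necessarily the paper's):

\[
V^*(x) = \min_{a \in A(x)} \left\{ c(x,a) + \int_0^1 \int_S \alpha \, V^*(y)\, Q(dy \mid x,a)\, \theta(d\alpha) \right\}, \qquad x \in S.
\]

The finite-horizon algorithm iterates this operator backward from the terminal stage; the infinite-horizon value is obtained as its fixed point under suitable growth conditions.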
Time and Ratio Expected Average Cost Optimality for Semi-Markov Control Processes on Borel Spaces
Communications in Statistics - Theory and Methods, 2004
We deal with semi-Markov control models with Borel state and control spaces and unbounded cost functions, under the ratio and the time expected average cost criteria. Under suitable growth conditions on the costs and the mean holding times, together with stability conditions on the embedded Markov chains, we show the following facts: (i) the ratio and the time average costs coincide in the class of stationary policies; (ii) there exists a stationary policy which is optimal for both criteria. Moreover, we provide a generalization of the classical Wald's Lemma to semi-Markov processes. These results are obtained by combining the existence of solutions of the average cost optimality equation with the Optional Stopping Theorem.
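For reference, hedged sketches of the two criteria being compared, written for a stationary policy f and assuming costs are incurred lump-sum at the decision epochs (notation illustrative): the ratio-average and the time-average cost are

\[
J_r(f,x) = \limsup_{n\to\infty} \frac{E_x^{f}\big[\sum_{k=0}^{n-1} c(x_k, a_k)\big]}{E_x^{f}\big[\sum_{k=0}^{n-1} \delta_k\big]},
\qquad
J_t(f,x) = \limsup_{t\to\infty} \frac{1}{t}\, E_x^{f}\Big[\sum_{k=0}^{N(t)} c(x_k, a_k)\Big],
\]

where δ_k is the holding time after the k-th decision epoch and N(t) counts the decision epochs up to time t. Result (i) above asserts that these two quantities coincide over stationary policies.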
Discrete-Time Controlled Markov Processes with Average Cost Criterion: A Survey
Siam Journal on Control and Optimization, 1993
This work is a survey of the average cost control problem for discrete-time Markov processes. The authors have attempted to put together a comprehensive account of the considerable research on this problem over the past three decades. The exposition ranges from finite to Borel state and action spaces and includes a variety of methodologies to find and characterize optimal policies. The authors have included a brief historical perspective of the research efforts in this area and have compiled a substantial yet not exhaustive bibliography. The authors have also identified several important questions that are still open to investigation.
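As a quick reference for the criterion the survey is organized around, a hedged sketch of the long-run expected average cost and its optimality equation (notation illustrative; the exact assumptions vary throughout the survey):

\[
J(\pi,x) = \limsup_{n\to\infty} \frac{1}{n}\, E_x^{\pi}\Big[\sum_{t=0}^{n-1} c(x_t, a_t)\Big],
\qquad
\rho^* + h(x) = \min_{a \in A(x)} \Big\{ c(x,a) + \int_S h(y)\, P(dy \mid x,a) \Big\}.
\]

A stationary policy attaining the minimum on the right-hand side is average cost optimal under conditions discussed in the survey.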
Applied Mathematics and Optimization, 2010
This note concerns discrete-time controlled Markov chains with Borel state and action spaces. Given a nonnegative cost function, the performance of a control policy is measured by the superior limit risk-sensitive average criterion associated with a constant and positive risk sensitivity coefficient. Within such a framework, the discounted approach is used (a) to establish the existence of solutions for the corresponding optimality inequality, and (b) to show that, under mild conditions on the cost function, the optimal value functions corresponding to the superior and inferior limit average criteria coincide on a certain subset of the state space. The approach of the paper relies on standard dynamic programming ideas and on a simple analytical derivation of a Tauberian relation.
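For context, a hedged sketch of the superior limit risk-sensitive average criterion mentioned above, for a fixed risk sensitivity coefficient λ > 0 (notation illustrative):

\[
J(\pi,x) = \limsup_{n\to\infty} \frac{1}{\lambda n} \log E_x^{\pi}\Big[\exp\Big(\lambda \sum_{t=0}^{n-1} c(x_t, a_t)\Big)\Big].
\]

The inferior limit criterion replaces the limit superior by the limit inferior; point (b) of the note is that the two optimal value functions agree on a certain subset of the state space.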
Constrained Markov control processes in Borel spaces: the discounted case
Mathematical Methods of Operations Research, 2000
We consider constrained discounted-cost Markov control processes in Borel spaces, with unbounded costs. Conditions are given under which the constrained problem is solvable and equivalent to an equality-constrained (EC) linear program. In addition, it is shown that there is no duality gap between EC and its dual program EC*, and that, under additional assumptions, EC* is also solvable, so that in fact the strong duality condition holds. Finally, a Farkas-like theorem is included, which gives necessary and sufficient conditions for the primal program EC to be consistent.
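A hedged sketch of the constrained problem behind the program EC, assuming one discounted cost c_0 to be minimized and finitely many discounted constraint costs c_1, …, c_q with bounds k_1, …, k_q (notation illustrative):

\[
\min_{\pi} \; V_0(\pi,\nu) = E_\nu^{\pi}\Big[\sum_{t=0}^{\infty} \alpha^t c_0(x_t,a_t)\Big]
\quad\text{subject to}\quad
V_i(\pi,\nu) \le k_i, \quad i=1,\dots,q.
\]

Each V_i is a linear functional of the α-discounted occupation measure induced by π and the initial distribution ν, which is what allows the constrained control problem to be recast as a linear program over measures.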
Proceedings of 1994 33rd IEEE Conference on Decision and Control, 1994
This paper studies discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains (CMC) with denumerable state space and compact action space, and with an infinite planning horizon. Recently, there has been a renewed interest in CMC with a long-run, expected average cost (AC) optimality criterion. A classical approach to studying average optimality consists in formulating the AC case as a limit of the discounted cost (DC) case, as the discount factor increases to 1, i.e., as the discounting effect vanishes. This approach has been rekindled in recent years, with the introduction by Sennott and others of conditions under which AC optimal stationary policies are shown to exist. However, AC optimality is a rather underselective criterion, which completely neglects the finite-time evolution of the controlled process. Our main interest in this paper is to study the relation between the notions of AC optimality and strong average cost (SAC) optimality. The latter criterion is introduced to assess the performance of a policy over long but finite horizons, as well as in the long-run average sense. We show that for bounded one-stage cost functions, Sennott's conditions are sufficient to guarantee that every AC optimal policy is also SAC optimal. On the other hand, a detailed counterexample is given that shows that the latter result does not extend to the case of unbounded cost functions. In this counterexample, Sennott's conditions are verified and a policy is exhibited that is both average and Blackwell optimal and satisfies the average cost inequality.
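For orientation, a hedged sketch of the vanishing discount approach referred to above: writing V_α for the optimal α-discounted value and ρ for the optimal average cost (notation illustrative, and the identities below hold only under conditions of Sennott type),

\[
V_\alpha(x) = \inf_{\pi} E_x^{\pi}\Big[\sum_{t=0}^{\infty} \alpha^t c(x_t, a_t)\Big],
\qquad
\rho = \lim_{\alpha \uparrow 1} (1-\alpha) V_\alpha(x),
\]

with the relative values V_α(x) − V_α(z) remaining bounded as α ↑ 1, so that a limit of discounted-optimal stationary policies satisfies the average cost optimality inequality.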
On the expected total cost with unbounded returns for Markov decision processes
arXiv: Probability, 2017
We consider a discrete-time Markov decision process with Borel state and action spaces. The performance criterion is to maximize the expected total reward, with a reward function that may be unbounded. The existence of optimal strategies is shown under general conditions allowing the reward function to be unbounded both from above and below, and the action sets available to the decision maker at each step to be non-compact. To deal with unbounded reward functions, a new characterization of the weak convergence of probability measures is derived. Our results are illustrated by examples.
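For reference, a hedged sketch of the expected total reward criterion studied in the paper (notation illustrative):

\[
V(\pi,x) = E_x^{\pi}\Big[\sum_{t=0}^{\infty} r(x_t, a_t)\Big],
\]

to be maximized over policies π; since r may be unbounded both from above and below, the series has to be given a meaning through its positive and negative parts, which is where the paper's conditions and the new weak convergence characterization come in.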
Mathematical Methods of Operations Research (ZOR), 2003
This work concerns discrete-time Markov decision processes with finite state space and bounded costs per stage. The decision maker ranks random costs via the expectation of the utility function associated with a constant risk sensitivity coefficient, and the performance of a control policy is measured by the corresponding (long-run) risk-sensitive average cost criterion. The main structural restriction on the system is the following communication assumption: for every pair of states x and y, there exists a policy π, possibly depending on x and y, such that when the system evolves under π starting at x, the probability of reaching y is positive. Within this framework, the paper establishes the existence of solutions to the optimality equation whenever the constant risk sensitivity coefficient does not exceed a certain positive value.
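A hedged sketch of the risk-sensitive average cost optimality equation whose solvability the paper establishes, for a finite state space and risk sensitivity coefficient λ > 0 (notation illustrative):

\[
e^{\lambda (g + h(x))} = \min_{a \in A(x)} \Big\{ e^{\lambda c(x,a)} \sum_{y \in S} p(y \mid x,a)\, e^{\lambda h(y)} \Big\}, \qquad x \in S,
\]

where g is the optimal risk-sensitive average cost and h a relative value function; the communication assumption is what yields a solution (g, h) for every λ not exceeding the positive threshold mentioned above.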