Bias Optimality versus Strong 0-Discount Optimality in Markov Control Processes with Unbounded Costs

Constrained Markov control processes in Borel spaces: the discounted case

Mathematical Methods of Operations Research, 2000

We consider constrained discounted-cost Markov control processes in Borel spaces, with unbounded costs. Conditions are given under which the constrained problem is solvable and equivalent to an equality-constrained (EC) linear program. In addition, it is shown that there is no duality gap between EC and its dual program EC*, and that, under additional assumptions, EC* is also solvable, so that in fact the strong duality condition holds. Finally, a Farkas-like theorem is included, which gives necessary and sufficient conditions for the primal program EC to be consistent.
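The equivalence mentioned above rests on rewriting the control problem as an optimization over occupation measures. A minimal sketch of that kind of linear program is given below; the normalization, the notation, and the use of inequality constraints are assumptions made here for illustration, not the paper's exact formulation of EC.

```latex
% Illustrative sketch (not the paper's exact formulation): a constrained
% discounted MDP written as a linear program over occupation measures.
% c is the cost to be minimized, d_i are constraint costs with bounds k_i,
% alpha is the discount factor, nu the initial distribution, and Q the
% transition kernel on state-action pairs (x, a).
\begin{align*}
\text{(EC)}\quad
 \min_{\mu \ge 0}\ & \int_{\mathbb{X}\times\mathbb{A}} c \, d\mu \\
 \text{s.t.}\ & \mu(B \times \mathbb{A})
   = (1-\alpha)\,\nu(B) + \alpha \int Q(B \mid x,a)\, \mu(d(x,a))
   \qquad \forall\, B \in \mathcal{B}(\mathbb{X}), \\
 & \int d_i \, d\mu \le k_i, \qquad i = 1,\dots,q .
\end{align*}
```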

Denumerable controlled Markov chains with strong average optimality criterion: bounded and unbounded costs

Proceedings of 1994 33rd IEEE Conference on Decision and Control, 1994

This paper studies discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains (CMC) with denumerable state space and compact action space, and with an infinite planning horizon. Recently, there has been a renewed interest in CMC with a long-run, expected average cost (AC) optimality criterion. A classical approach to study average optimality consists in formulating the AC case as a limit of the discounted cost (DC) case, as the discount factor increases to 1, i.e., as the discounting effect vanishes. This approach has been rekindled in recent years, with the introduction by Sennott and others of conditions under which AC optimal stationary policies are shown to exist. However, AC optimality is a rather underselective criterion, which completely neglects the finite-time evolution of the controlled process. Our main interest in this paper is to study the relation between the notions of AC optimality and strong average cost (SAC) optimality. The latter criterion is introduced to assess the performance of a policy over long but finite horizons, as well as in the long-run average sense. We show that for bounded one-stage cost functions, Sennott's conditions are sufficient to guarantee that every AC optimal policy is also SAC optimal. On the other hand, a detailed counterexample is given that shows that the latter result does not extend to the case of unbounded cost functions. In this counterexample, Sennott's conditions are verified and a policy is exhibited that is both average and Blackwell optimal and satisfies the average cost inequality.

Denumerable controlled Markov chains with strong average optimality criterion: Bounded & unbounded costs

Mathematical Methods of Operations Research, 1996

This paper studies discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains (CMC) with denumerable state space and compact action space, and with an infinite planning horizon. Recently, there has been a renewed interest in CMC with a long-run, expected average cost (AC) optimality criterion. A classical approach to study average optimality consists in formulating the AC case as a limit of the discounted cost (DC) case, as the discount factor increases to 1, i.e., as the discounting effect vanishes. This approach has been rekindled in recent years, with the introduction by Sennott and others of conditions under which AC optimal stationary policies are shown to exist. However, AC optimality is a rather underselective criterion, which completely neglects the finite-time evolution of the controlled process. Our main interest in this paper is to study the relation between the notions of AC optimality and strong average cost (SAC) optimality. The latter criterion is introduced to assess the performance of a policy over long but finite horizons, as well as in the long-run average sense. We show that for bounded one-stage cost functions, Sennott's conditions are sufficient to guarantee that every AC optimal policy is also SAC optimal. On the other hand, a detailed counterexample is given that shows that the latter result does not extend to the case of unbounded cost functions. In this counterexample, Sennott's conditions are verified and a policy is exhibited that is both average and Blackwell optimal and satisfies the average cost inequality.
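The vanishing-discount approach referred to in this abstract can be summarized by a standard set of relations; the sketch below uses generic notation and Sennott-type conditions as assumptions, and is not quoted from the paper.

```latex
% Illustrative vanishing-discount relations (generic notation, not the
% paper's exact statements). V_alpha is the optimal alpha-discounted cost,
% J* the optimal long-run expected average cost, and C the one-stage cost.
\begin{align*}
V_\alpha(x) &:= \inf_{\pi}\; E_x^{\pi}\Big[\sum_{t=0}^{\infty} \alpha^{t}\, C(x_t, a_t)\Big],
  \qquad 0 < \alpha < 1, \\
J^{*}(x) &:= \inf_{\pi}\; \limsup_{n \to \infty} \frac{1}{n}\,
  E_x^{\pi}\Big[\sum_{t=0}^{n-1} C(x_t, a_t)\Big], \\
J^{*}(x) &= \lim_{\alpha \uparrow 1}\, (1-\alpha)\, V_\alpha(x)
  \qquad \text{under Sennott-type conditions (the limit is a constant, independent of } x\text{)}.
\end{align*}
```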

On the expected total cost with unbounded returns for Markov decision processes

arXiv: Probability, 2017

We consider a discrete-time Markov decision process with Borel state and action spaces. The performance criterion is the maximization of a total expected cost with an unbounded reward function. The existence of optimal strategies is shown under general conditions that allow the reward function to be unbounded both from above and below, and the action sets available to the decision maker at each step to be not necessarily compact. To deal with unbounded reward functions, a new characterization of the weak convergence of probability measures is derived. Our results are illustrated by examples.

Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space

Mathematical Methods of Operations Research (ZOR), 2003

This work concerns discrete-time Markov decision processes with finite state space and bounded costs per stage. The decision maker ranks random costs via the expectation of the utility function associated to a constant risk-sensitivity coefficient, and the performance of a control policy is measured by the corresponding (long-run) risk-sensitive average cost criterion. The main structural restriction on the system is the following communication assumption: for every pair of states x and y, there exists a policy π, possibly depending on x and y, such that when the system evolves under π starting at x, the probability of reaching y is positive. Within this framework, the paper establishes the existence of solutions to the optimality equation whenever the constant risk-sensitivity coefficient does not exceed a certain positive value.
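For concreteness, a common form of the risk-sensitive average cost optimality equation is sketched below; the symbols and the exact form are assumptions for illustration and may differ from the paper's formulation.

```latex
% A common form of the risk-sensitive average cost optimality equation
% (illustrative; the paper's exact statement may differ). lambda > 0 is the
% risk-sensitivity coefficient, g the optimal risk-sensitive average cost,
% h a relative value function, C the one-stage cost, and P the transition law.
e^{\lambda\,(g + h(x))}
  \;=\;
  \min_{a \in A(x)}\Big\{\, e^{\lambda\, C(x,a)}
    \sum_{y \in X} P(y \mid x, a)\, e^{\lambda\, h(y)} \Big\},
  \qquad x \in X.
```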

On weak conditions and optimality inequality solutions in risk-sensitive controlled Markov processes with average criterion

Proceedings of the 41st IEEE Conference on Decision and Control, 2002.

A standard approach to the problem of finding optimal policies for controlled Markov processes with average cost is based on the existence of solutions to an average optimality equation [1, 9, 12], or an average optimality inequality, see [6, 13]. In the latter, conditions are imposed on the solutions to the inequalities such that if one such solution is found, then optimal policies are obtained for all values of the state. In [10], such conditions are relaxed, at the expense that optimal policies are perhaps characterized only for a proper subset of the state space. Motivated by the work in [10], optimality inequality results were presented in [8] for the risk-sensitive case, purposely trying to emulate in the risk-sensitive case what had been done previously for the risk-neutral case. However, as is illustrated in the sequel, the results in [8] exhibit an acute fragility not present in their risk-neutral counterparts.

1 Introduction. In this paper we study the standard model for a discrete-time controlled Markov process, or chain (CMC), specified by the four-tuple (X, A, P, C), where X, the state space, is a countable or finite set; A, the action space, is a finite set; P is a transition probability kernel from K := X x A to X; and C : K → [0, K], K > 0, is the cost per stage function, see [1, 9, 12]. For this type of model, the probability kernel is specified by means of a set of matrices {P(a) : a ∈ A}, so that P_xy(a) := P(y | x, a) gives the probability of a transition from state x to y under action a. The evolution of the controlled Markov chain {X_t} is as follows. At each time t ∈ {0, 1, ...} the state of the system is observed, say X_t = x ∈ X, and an action A_t = a ∈ A is chosen. Then a cost C(x, a) is incurred and, regardless of the previous states and actions, the state of the system at time t+1 will be X_{t+1} = y ∈ X with probability P(y | x, a). We will restrict attention to stationary deterministic policies, that is, rules for prescribing how to choose actions by means of a decision function f : X → A. Such a policy will be denoted by f^∞, meaning that action f(x) is chosen if the system is in state X_t = x, regardless of the time/epoch t. Following standard notation, we will denote by P^f and E^f respectively the probability measure and the expectation operator induced by the policy f^∞ on the canonical product space [1, 9]. The performance index used here is the exponential average cost (EAC), which is the (exponential utility) risk-sensitive version of the well-known (risk-neutral) average cost (e.g., see [1, 2, 4, 5, 7, 9]). The EAC corresponding to a policy f^∞ is defined as J_f(γ, x) := limsup_{n→∞} (1/(γn)) log E_x^f[exp(γ S_n)], where S_n denotes the total cost accumulated up to time n and γ > 0 is the risk-sensitivity coefficient.
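Because the EAC only involves the expectation E_x^f[exp(γ S_n)], it can be approximated for a small finite chain by iterating a one-step recursion. The Python sketch below uses made-up transition and cost data and a hypothetical horizon; it is an illustration of the definition above, not code from the paper.

```python
import numpy as np

# Illustrative sketch (made-up data): approximate the exponential average cost
#   J_f(gamma, x) = limsup_n (1/(gamma*n)) * log E_x^f[exp(gamma * S_n)]
# of a fixed stationary policy f on a two-state chain, using the recursion
#   u_0(x) = 1,   u_{t+1}(x) = exp(gamma*c(x)) * sum_y P_f[x, y] * u_t(y),
# which gives u_n(x) = E_x^f[exp(gamma * S_n)].

P_f = np.array([[0.9, 0.1],    # transition matrix under the fixed policy f
                [0.5, 0.5]])
c_f = np.array([1.0, 3.0])     # one-stage costs c(x, f(x))
gamma = 0.2                    # risk-sensitivity coefficient
n = 2000                       # horizon used to approximate the limsup

log_u = np.zeros(2)            # iterate log u_t(x) for numerical stability
for _ in range(n):
    m = log_u.max()
    log_u = gamma * c_f + m + np.log(P_f @ np.exp(log_u - m))

eac = log_u / (gamma * n)      # approximate EAC from each initial state
print(eac)
```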

Constrained discounted Markov decision processes with Borel state spaces

Automatica, 2020

We study discrete-time discounted constrained Markov decision processes (CMDPs) with Borel state and action spaces. These CMDPs satisfy either weak (W) continuity conditions, that is, the transition probability is weakly continuous and the reward function is upper semicontinuous in state-action pairs, or setwise (S) continuity conditions, that is, the transition probability is setwise continuous and the reward function is upper semicontinuous in actions. Our main goal is to study models with unbounded reward functions, which are often encountered in applications, e.g., in consumption/investment problems. We provide some general assumptions under which the optimization problems in CMDPs are solvable in the class of randomized stationary policies and in the class of chattering policies introduced in this paper. If the initial distribution and transition probabilities are atomless, then using a general "purification result" of Feinberg and Piunovskiy we show the existence of a deterministic (stationary) optimal policy. Our main results are illustrated by examples.

Nonstationary Continuous-Time Markov Control Processes with Discounted Costs on Infinite Horizon

2001

This paper concerns nonstationary continuous-time Markov control processes on Polish spaces, with the infinite-horizon discounted cost criterion. Necessary and sufficient conditions are given for a control policy to be optimal and asymptotically optimal. In addition, under suitable hypotheses, it is shown that the successive approximation procedure converges in the sense that the sequence of finite-horizon optimal cost functions and the corresponding optimal control policies both converge. Mathematics Subject Classification: 93E20, 49K45.
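As a rough discrete-time, stationary analogue of the successive approximation procedure, the Python sketch below iterates finite-horizon optimal cost functions of a discounted problem until they stabilize; the data are made up, and the paper itself treats the nonstationary continuous-time case, so this is only an illustration of the idea.

```python
import numpy as np

# Discrete-time, stationary analogue of successive approximation (illustrative
# only). V holds the optimal n-horizon discounted cost; under standard
# assumptions it converges to the infinite-horizon optimal cost.

P = np.array([[[0.8, 0.2], [0.3, 0.7]],   # P[a, x, y]: transition law for actions a = 0, 1
              [[0.5, 0.5], [0.1, 0.9]]])
c = np.array([[1.0, 4.0],                  # c[a, x]: one-stage cost of action a in state x
              [2.0, 0.5]])
alpha = 0.95                               # discount factor

V = np.zeros(2)                            # V_0 = 0 (zero terminal cost)
for _ in range(500):
    # V_{n+1}(x) = min_a { c(a, x) + alpha * sum_y P(y | x, a) * V_n(y) }
    V_next = (c + alpha * (P @ V)).min(axis=0)
    if np.max(np.abs(V_next - V)) < 1e-10:
        V = V_next
        break
    V = V_next

policy = (c + alpha * (P @ V)).argmin(axis=0)   # a (near-)optimal stationary policy
print(V, policy)
```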

Markov Decision Processes on Borel Spaces with Total Cost and Random Horizon

Journal of Optimization Theory and Applications, 2013

In this paper, an optimal control problem with the expected total cost as performance criterion is considered. A random horizon and a nonzero terminal cost are included in the performance criterion; the terminal cost depends on the state of the system at the random time at which the process terminates. Under the assumption that the random horizon is independent of the stochastic control process and that its probability distribution has finite support, the dynamic programming equation that solves the proposed problem is obtained. Two examples are included: one is a linear-quadratic control problem and the other is an inventory control problem, both with a random horizon and a nonzero terminal cost.
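One standard way to make such a dynamic programming equation concrete, sketched below under the stated independence and finite-support assumptions, is to fold the random horizon into survival-weighted stage costs; the notation and the particular reduction are choices made here for illustration, not necessarily the form used in the paper.

```latex
% Illustrative reduction (not the paper's exact statement). tau is the random
% horizon, independent of the controlled process and supported on {1, ..., N};
% c is the running cost, G the terminal cost, and P the transition law. The
% minimal value of  E[ sum_{t=0}^{tau-1} c(X_t, A_t) + G(X_tau) ]  equals
% V_0(x_0) in the backward recursion
\begin{align*}
V_N(x) &= P(\tau = N)\, G(x), \\
V_t(x) &= \min_{a \in A(x)}\Big\{\, P(\tau > t)\, c(x,a) + P(\tau = t)\, G(x)
   + \sum_{y} P(y \mid x, a)\, V_{t+1}(y) \Big\},
   \qquad t = N-1, \dots, 0,
\end{align*}
% with the convention P(tau = 0) = 0.
```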