The vanishing discount approach for the average continuous control of piecewise deterministic Markov processes

Journal of Applied Probability, 2009

This work is concerned with the existence of an optimal control strategy for the long-run average continuous control problem of piecewise-deterministic Markov processes (PDMPs). In Costa and Dufour (2008), sufficient conditions were derived to ensure the existence of an optimal control by using the vanishing discount approach. These conditions were mainly expressed in terms of the relative difference of the α-discount value functions. The main goal of this paper is to derive tractable conditions, directly related to the primitive data of the PDMP, that ensure the existence of an optimal control. The present work can be seen as a continuation of the results derived in Costa and Dufour (2008). Our main assumptions are written in terms of some integro-differential inequalities related to the so-called expected growth condition, and geometric convergence of the post-jump location kernel associated with the PDMP. An example based on the capacity expansion problem is presented, illustrating the...
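For orientation, the vanishing discount scheme referred to here can be sketched in generic notation (the symbols below are illustrative, not necessarily the paper's): writing V_\alpha for the \alpha-discount value function and fixing a reference state x_0, one forms

    h_\alpha(x) := V_\alpha(x) - V_\alpha(x_0), \qquad \rho_\alpha := \alpha\, V_\alpha(x_0),

and conditions such as the expected growth condition and the geometric convergence of the post-jump kernel serve to guarantee that, along a sequence \alpha_n \downarrow 0, \rho_{\alpha_n} converges to a constant \rho and h_{\alpha_n} converges in a suitable sense to a function h satisfying an average cost optimality inequality, from which an optimal stationary strategy can be selected.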

Average Continuous Control of Piecewise Deterministic Markov Processes

SIAM Journal on Control and Optimization

This paper deals with the long-run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with compact action space depending on the state variable. The control variable acts on the jump rate and transition measure of the PDMP, and the running and boundary costs are assumed to be positive but not necessarily bounded. Our first main result is an optimality equation for the long-run average cost in terms of a discrete-time optimality equation related to the embedded Markov chain given by the post-jump location of the PDMP. Our second main result guarantees the existence of a feedback measurable selector for the discrete-time optimality equation by establishing a connection between this equation and an integro-differential equation. Our final main result gives sufficient conditions for the existence of a solution to a discrete-time optimality inequality and of an ordinary optimal feedback control for the long-run average cost, using the so-called vanishing discount approach (see [16], page 83).
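Schematically, and again in generic rather than the authors' notation, the discrete-time optimality equation associated with the post-jump chain has the semi-Markov form

    h(x) = \inf_{u} \Big\{ c(x,u) - \rho\, \tau(x,u) + \int_X h(y)\, Q(dy \mid x, u) \Big\},

where c(x,u) is the expected cost accumulated between two consecutive jumps, \tau(x,u) the expected inter-jump time, Q the post-jump location kernel, \rho the optimal long-run average cost, and h a relative value function; the measurable-selector result mentioned above is what turns a minimizer of the right-hand side into a feedback control for the continuous-time process.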

Singular Perturbation for the Discounted Continuous Control of Piecewise Deterministic Markov Processes

Applied Mathematics & Optimization, 2011

This paper deals with the expected discounted continuous control of piecewise deterministic Markov processes (PDMPs) using a singular perturbation approach for dealing with rapidly oscillating parameters. The state space of the PDMP is written as the product of a finite set and a subset of the Euclidean space R^n. The discrete part of the state, called the regime, characterizes the mode of operation of the physical system under consideration, and is supposed to have a fast (associated with a small parameter ε > 0) and a slow behavior. Using an approach similar to that developed in Yin and Zhang (Continuous-Time Markov Chains and Applications: A Singular Perturbation Approach, Applications of Mathematics, vol. 37, Springer, New York, 1998, Chaps. 1 and 3), the idea in this paper is to reduce the number of regimes by considering an averaged model in which the regimes within the same class are aggregated through the quasi-stationary distribution, so that the different states in this class are replaced by a single one. The main goal is to show that the value function of the control problem for the system driven by the perturbed Markov chain converges to the value function of this limit control problem as ε goes to zero. This convergence is obtained by, roughly speaking, showing that the infimum and supremum limits of the value functions satisfy two optimality inequalities as ε goes to zero.
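In rough terms, and borrowing the generic two-time-scale notation of Yin and Zhang rather than the paper's own symbols, the regime process has generator

    Q^\varepsilon = \frac{1}{\varepsilon}\, \widetilde{Q} + \widehat{Q}, \qquad \widetilde{Q} = \mathrm{diag}\big(\widetilde{Q}^1, \ldots, \widetilde{Q}^L\big),

where each irreducible block \widetilde{Q}^k governs the fast transitions inside the k-th class of regimes and \widehat{Q} the slow transitions between classes. If \nu^k denotes the quasi-stationary distribution of \widetilde{Q}^k, the averaged model replaces class k by a single regime, with aggregated rates of the form

    \bar{q}_{kl} = \sum_{i \in \mathcal{S}^k} \nu^k_i \sum_{j \in \mathcal{S}^l} \widehat{q}_{ij}, \qquad k \neq l,

and with costs and dynamics averaged against \nu^k in the same way.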

Optimal Control of Piecewise Deterministic Markov Processes with Finite Time Horizon

2010

In this paper we study controlled piecewise deterministic Markov processes with finite time horizon and unbounded rewards. Using an embedding procedure, we reduce these problems to discrete-time Markov decision processes. Under some continuity and compactness conditions we establish the existence of an optimal policy and show that the value function is the unique solution of the Bellman equation. It is remarkable that this statement is true for unbounded rewards and without any contraction assumptions. Further conditions imply the existence of optimal nonrelaxed controls. We illustrate our findings with two examples from financial mathematics.
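Purely as an illustration of the embedding idea (generic notation, not the authors'), the resulting discrete-time problem is solved by a Bellman recursion over decision epochs,

    J_n(x) = \sup_{a \in A(x)} \Big\{ r(x,a) + \int J_{n-1}(y)\, Q(dy \mid x, a) \Big\}, \qquad J_0 = g,

where Q is the transition kernel of the embedded Markov decision process, r the (possibly unbounded) reward collected up to the next epoch, and g the terminal reward; continuity and compactness conditions of the kind invoked above are what make the value function the unique solution of such an equation and make the suprema attained by measurable selectors.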

The Policy Iteration Algorithm for Average Continuous Control of Piecewise Deterministic Markov Processes

Applied Mathematics & Optimization, 2010

The main goal of this paper is to apply the so-called policy iteration algorithm (PIA) to the long-run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with compact action space depending on the state variable. In order to do that we first derive some important properties of a pseudo-Poisson equation associated with the problem. It is then shown that the convergence of the PIA to a solution satisfying the optimality equation holds under some classical hypotheses, and that this optimal solution yields an optimal control strategy for the average control problem for the continuous-time PDMP in feedback form.
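As a purely illustrative aside, the evaluate-and-improve loop that any policy iteration scheme is built on can be written down for a toy finite-state, finite-action average-cost MDP; the NumPy sketch below does exactly that and nothing more. It is not the PIA of the paper, which evaluates policies through a pseudo-Poisson equation for a PDMP on a Borel space; the unichain assumption and the solvability of the small linear system are assumptions of this toy version.

import numpy as np

def average_cost_policy_iteration(P, c, max_iter=100):
    # P[a, s, t]: transition probability from state s to t under action a
    # c[s, a]:    one-stage cost; the MDP is assumed unichain (toy setting)
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)               # arbitrary initial policy
    for _ in range(max_iter):
        # Policy evaluation: solve the Poisson equation
        #   rho + h(s) = c(s, f(s)) + sum_t P[f(s), s, t] h(t),  with h(0) = 0
        Pf = P[policy, np.arange(S), :]           # S x S transition matrix under f
        cf = c[np.arange(S), policy]              # one-stage cost under f
        M = np.zeros((S + 1, S + 1))
        M[:S, :S] = np.eye(S) - Pf                # coefficients of h
        M[:S, S] = 1.0                            # coefficient of rho
        M[S, 0] = 1.0                             # normalisation h(0) = 0
        sol = np.linalg.solve(M, np.append(cf, 0.0))
        h, rho = sol[:S], sol[S]
        # Policy improvement: minimise the one-step lookahead cost
        q = c + np.einsum('ast,t->sa', P, h)      # q[s, a] = c(s,a) + E[h(next state)]
        new_policy = q.argmin(axis=1)
        if np.array_equal(new_policy, policy):    # stable policy: stop
            break
        policy = new_policy
    return rho, h, policy

For a concrete run one would pass P of shape (A, S, S) with rows summing to one and c of shape (S, A); the returned rho is the long-run average cost of the final policy, which is optimal once the loop stops with a stable policy.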

Time and Ratio Expected Average Cost Optimality for Semi-Markov Control Processes on Borel Spaces

Communications in Statistics - Theory and Methods, 2004

We deal with semi-Markov control models with Borel state and control spaces, and unbounded cost functions under the ratio and the time expected average cost criteria. Under suitable growth conditions on the costs and the mean holding times, together with stability conditions on the embedded Markov chains, we show the following facts: (i) the ratio and the time average costs coincide in the class of stationary policies; (ii) there exists a stationary policy which is optimal for both criteria. Moreover, we provide a generalization of the classical Wald's Lemma to semi-Markov processes. These results are obtained by combining the existence of solutions of the average cost optimality equation with the Optional Stopping Theorem.
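For the record, and writing the two criteria in generic form for a model with a cost rate C (the notation is illustrative, not the authors'),

    J_{\mathrm{ratio}}(\pi, x) = \limsup_{n \to \infty} \frac{E^\pi_x\big[\sum_{k=0}^{n-1} C(x_k, a_k)\, \delta_{k+1}\big]}{E^\pi_x\big[\sum_{k=0}^{n-1} \delta_{k+1}\big]}, \qquad
    J_{\mathrm{time}}(\pi, x) = \limsup_{T \to \infty} \frac{1}{T}\, E^\pi_x\Big[\int_0^T C\big(x(t), a(t)\big)\, dt\Big],

where \delta_{k+1} is the holding time between the k-th and (k+1)-th decision epochs; statement (i) above says that, under the growth and stability conditions, these two limits coincide for stationary policies.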

Bias Optimality versus Strong 0-Discount Optimality in Markov Control Processes with Unbounded Costs

Acta Applicandae Mathematicae, 2003

This paper deals with expected average cost (EAC) and discount-sensitive criteria for discrete-time Markov control processes on Borel spaces, with possibly unbounded costs. Conditions are given under which (a) EAC optimality and strong -1-discount optimality are equivalent; (b) strong 0-discount optimality implies bias optimality; and, conversely, under an additional hypothesis, (c) bias optimality implies strong 0-discount optimality. Thus, in particular, ...
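One common formalization, in generic notation that may differ from the paper's: with discount factor \alpha \in (0,1), \alpha-discounted cost V_\alpha(f,x) of a stationary policy f, and optimal discounted value V^*_\alpha(x), a policy f is called strong n-discount optimal (n = -1, 0, 1, \ldots) if

    \lim_{\alpha \uparrow 1} (1-\alpha)^{-n}\, \big[\, V_\alpha(f,x) - V^*_\alpha(x) \,\big] = 0 \quad \text{for every state } x.

Strong (-1)-discount optimality is then the discount-sensitive counterpart of EAC optimality, and the n = 0 case is the notion compared with bias optimality in this paper.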

On weak conditions and optimality inequality solutions in risk-sensitive controlled Markov processes with average criterion

Proceedings of the 41st IEEE Conference on Decision and Control, 2002.

A standard approach to the problem of finding optimal policies for controlled Markov processes with average cost is based on the existence of solutions to an average optimality equation [1, 9, 12], or an average optimality inequality, see [6, 13]. In the latter, conditions are imposed on the solutions to the inequalities such that, if one such solution is found, then optimal policies are obtained for all values of the state. In [10], such conditions are relaxed, at the expense that optimal policies may be characterized only on a proper subset of the state space. Motivated by the work in [10], optimality inequality results were presented in [8] for the risk-sensitive case, purposely trying to emulate in the risk-sensitive case what had been done previously for the risk-neutral case. However, as is illustrated in the sequel, the results in [8] exhibit an acute fragility not present in their risk-null counterparts.

In this paper we study the standard model for a discrete-time controlled Markov process, or chain (CMC), specified by the four-tuple (X, A, P, C), where X, the state space, is a countable or finite set; A, the action space, is a finite set; P is a transition probability kernel from K := X × A to X; and C : K → [0, B], with B > 0, is the cost-per-stage function, see [1, 9, 12]. For this type of model the probability kernel is specified by means of a set of matrices {P(a) : a ∈ A}, so that P_{xy}(a) := P(y | x, a) gives the probability of a transition from state x to y under action a. The evolution of the controlled Markov chain {X_n} is as follows. At each time t ∈ {0, 1, ...} the state of the system is observed, say X_t = x ∈ X, and an action A_t = a ∈ A is chosen. Then a cost C(x, a) is incurred and, regardless of the previous states and actions, the state of the system at time t + 1 will be X_{t+1} = y ∈ X with probability P(y | x, a). We restrict attention to stationary deterministic policies, that is, rules for prescribing how to choose actions by means of a decision function f : X → A. Such a policy will be denoted by f^∞, meaning that action f(x) is chosen whenever the system is in state X_t = x, regardless of the time epoch t. Following standard notation, we denote by P^f and E^f, respectively, the probability measure and the expectation operator induced by the policy f^∞ on the canonical product space [1, 9]. The performance index used here is the exponential average cost (EAC), which is the (exponential utility) risk-sensitive version of the well-known (risk-neutral) average cost (see, e.g., [1, 2, 4, 5, 7, 9]). The EAC corresponding to a policy f^∞ is defined as

    J_f(\gamma, x) := \limsup_{n \to \infty} \frac{1}{n}\, \frac{1}{\gamma}\, \log E^f_x\big[\exp(\gamma S_n)\big],

where \gamma > 0 is the risk-sensitivity coefficient and S_n denotes the cost accumulated over the first n stages.

Nonstationary Continuous-Time Markov Control Processes with Discounted Costs on Infinite Horizon

2001

This paper concerns nonstationary continuous-time Markov control processes on Polish spaces, with the infinite-horizon discounted cost criterion. Necessary and sufficient conditions are given for a control policy to be optimal and asymptotically optimal. In addition, under suitable hypotheses, it is shown that the successive approximation procedure converges in the sense that the sequence of finite-horizon optimal cost functions and the corresponding optimal control policies both converge. Mathematics Subject Classification: 93E20, 49K45.
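As a point of reference (generic notation, not necessarily the paper's), the criterion in question is of the nonstationary discounted form

    V(\pi, s, x) = E^{\pi}_{s,x}\Big[\int_s^{\infty} e^{-\alpha (t-s)}\, c\big(t, x(t), a(t)\big)\, dt\Big],

with discount rate \alpha > 0, initial time-state pair (s, x), and a running cost c that may depend explicitly on time, which is what makes the problem nonstationary.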

Average optimality for continuous-time Markov decision processes in Polish spaces

The Annals of Applied Probability, 2006

This paper is devoted to studying average optimality in continuous-time Markov decision processes with fairly general state and action spaces. The criterion to be maximized is the expected average reward. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We first provide two optimality inequalities with opposed directions, and also give suitable conditions under which the existence of solutions to the two optimality inequalities is ensured. Then, from the two optimality inequalities we prove the existence of optimal (deterministic) stationary policies by using the Dynkin formula. Moreover, we present a "semimartingale characterization" of an optimal stationary policy. Finally, we use a generalized Potlach process with control to illustrate the difference between our conditions and those in the previous literature, and then further apply our results to average optimal control problems of generalized birth-death systems, upwardly skip-free processes and two queueing systems. The approach developed in this paper is slightly different from the "optimality inequality approach" widely used in the previous literature.

A continuous-time MDP of the kind considered here is specified by four primitive data: a state space S; an action space A with subsets A(x) of admissible actions, which may depend on the current state x ∈ S; transition rates q(·|x, a); and reward (or cost) rates r(x, a). Using these terms, we now briefly describe some existing work on the expected average criterion. When the state space is finite, bounded solutions to the average optimality equation (AOE) and methods for computing optimal stationary policies have been investigated. Since then, most work has focused on the case of a denumerable state space, covering bounded transition and reward rates, bounded transition rates with unbounded reward rates, unbounded transition rates with bounded reward rates [16, 35], and unbounded transition and reward rates [12, 13, 17]. For the case of an arbitrary state space, to the best of our knowledge, only Doshi [5] and Hernández-Lerma [19] have addressed this issue; they ensured the existence of optimal stationary policies. However, the treatments in [5] and [19] are restricted to uniformly bounded reward rates and nonnegative cost rates, respectively, and the AOE plays a key role in the proof of the existence of average optimal policies. Moreover, to establish the AOE, Doshi [5] needed the hypothesis that all admissible action sets are finite and that the relative difference of the optimal discounted value function is equicontinuous, whereas in [19] the existence of a solution to the AOE is assumed. It is also worth mentioning that some of the conditions in that literature are imposed on the family of weak infinitesimal operators deduced from all admissible policies, instead of on the primitive data. In this paper we study a much more general case: the reward rates may have neither upper nor lower bounds, the state and action spaces are fairly general, and the transition rates are allowed to be unbounded. We first provide two optimality inequalities rather than the single one of the "optimality inequality approach" used in the earlier literature. Under suitable assumptions we not only prove the existence of solutions to the two optimality inequalities, but also ensure the existence of optimal stationary policies by using the two inequalities and the Dynkin formula. Also, to verify our assumptions, we give sufficient conditions imposed directly on the primitive data. Moreover, we present a semimartingale characterization of an optimal stationary policy. Finally, we use controlled generalized Potlach processes to show that all conditions in this paper are satisfied whereas the earlier conditions fail to hold, and we further apply our results to average optimal control problems of generalized birth-death systems, upwardly skip-free processes [1], a pair of controlled queues in tandem, and M/M/N/0 queue systems. It should be noted that the optimality inequality approach used in the previous literature (both for continuous-time MDPs and for discrete-time MDPs [20, 21, 31, 34]) does not apply to our case, because in our model the reward rates may have neither upper nor lower bounds.
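To indicate, in generic notation, how the Dynkin formula enters such arguments (this is a sketch of the standard reasoning, not a reproduction of the paper's proof): for a stationary policy f with transition rates q(\cdot \mid x, f(x)) and a sufficiently regular function u,

    E^f_x\big[u(x(T))\big] - u(x) = E^f_x\Big[\int_0^T \int_S u(y)\, q\big(dy \mid x(t), f(x(t))\big)\, dt\Big].

If a constant \rho and a function u satisfy \rho \ge r(x,a) + \int_S u(y)\, q(dy \mid x, a) for all admissible a, then dividing the identity above by T and letting T \to \infty (the growth conditions are what control E^f_x[u(x(T))]/T) shows that \rho bounds the long-run average reward of every stationary policy from above; an inequality in the opposed direction, satisfied along a particular selector, gives the matching lower bound. This is the sense in which a pair of optimality inequalities can replace the optimality equation.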