Sensor Scheduling for Optimal Observability Using Estimation Entropy

The Optimal Observability of Partially Observable Markov Decision Processes: Discrete State Space

2010

Abstract We consider autonomous partially observable Markov decision processes where the control action influences the observation process only. Taking the entropy of the Markov information-state process as the incurred cost, the optimal observability problem is posed as a Markov decision scheduling problem that minimizes the infinite-horizon cost. This scheduling problem is shown to be equivalent to minimizing an entropy measure, called estimation entropy, which is related to the invariant measure of the information state.
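As a rough illustration of the quantity being minimized, the sketch below simulates a two-state hidden Markov model in which the action selects only the sensor (observation matrix) and reports the long-run average entropy of the Bayes-filtered belief, a proxy for the estimation entropy. All matrices and the scheduling rule are invented for illustration and are not taken from the paper.

```python
# Sketch (not from the paper): estimate the long-run entropy of the
# information (belief) state of an HMM under a fixed sensor-scheduling rule.
import numpy as np

rng = np.random.default_rng(0)

T = np.array([[0.9, 0.1],          # state transition matrix (row-stochastic)
              [0.2, 0.8]])
# One observation matrix per sensor; the action changes only the observation model.
O = {0: np.array([[0.95, 0.05], [0.30, 0.70]]),
     1: np.array([[0.60, 0.40], [0.05, 0.95]])}

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def belief_update(b, a, y):
    """One Bayes-filter step: predict with T, correct with sensor a's likelihood of y."""
    pred = b @ T
    post = pred * O[a][:, y]
    return post / post.sum()

def schedule(b):
    # Illustrative rule: pick the sensor that better resolves the likelier state.
    return 0 if b[0] >= 0.5 else 1

b = np.array([0.5, 0.5])
x = rng.choice(2, p=b)
H = []
for _ in range(20000):
    x = rng.choice(2, p=T[x])          # hidden state evolves independently of the action
    a = schedule(b)
    y = rng.choice(2, p=O[a][x])       # observation drawn from the chosen sensor
    b = belief_update(b, a, y)
    H.append(entropy(b))

print("empirical average belief entropy:", np.mean(H))  # proxy for estimation entropy
```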

Computing optimal policies for partially observable decision processes using compact representations

Proceedings of the National Conference on …, 1996

Partially-observable Markov decision processes provide a very general model for decision-theoretic planning problems, allowing the trade-offs between various courses of action to be determined under conditions of uncertainty, and incorporating partial observations made by an agent. Dynamic programming algorithms based on the information or belief state of an agent can be used to construct optimal policies without explicit consideration of past history, but at high computational cost. In this paper, we discuss how structured representations of the system dynamics can be incorporated in classic POMDP solution algorithms. We use Bayesian networks with structured conditional probability matrices to represent POMDPs, and use this representation to structure the belief space for POMDP algorithms. This allows irrelevant distinctions to be ignored. Apart from speeding up optimal policy construction, we suggest that such representations can be exploited to a great extent in the development of useful approximation methods.
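The following toy sketch illustrates the kind of structure such representations exploit: a factored transition model stored as small per-variable conditional probability tables and expanded into the flat joint matrix only when needed. The variables, probabilities, and action name are made up; this is not the paper's algorithm.

```python
# Illustrative sketch: a factored transition model stored as per-variable CPTs,
# flattened to the full joint matrix only on demand. All numbers are invented.
import itertools
import numpy as np

# Two binary state variables: "location" and "battery".
# Under action "move", location flips with prob 0.8 and a charged battery drains
# with prob 0.3, independently given the current state.
def p_next_location(loc, action):
    if action == "move":
        return {1 - loc: 0.8, loc: 0.2}
    return {loc: 1.0}

def p_next_battery(bat, action):
    if action == "move" and bat == 1:
        return {1: 0.7, 0: 0.3}
    return {bat: 1.0}

def flat_transition_matrix(action):
    """Expand the structured CPTs into the 4x4 joint matrix over (loc, bat) states."""
    states = list(itertools.product([0, 1], [0, 1]))
    P = np.zeros((4, 4))
    for i, (loc, bat) in enumerate(states):
        for j, (loc2, bat2) in enumerate(states):
            P[i, j] = (p_next_location(loc, action).get(loc2, 0.0) *
                       p_next_battery(bat, action).get(bat2, 0.0))
    return P

P_move = flat_transition_matrix("move")
print(P_move.sum(axis=1))   # each row sums to 1
```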

Good policies for partially-observable Markov decision processes are hard to find

Optimal policy computation in finite-horizon Markov decision processes is a classical problem in optimization with many practical applications. For stationary policies and infinite horizon it is known to be solvable in polynomial time by linear programming, whereas for the finite horizon it is a longstanding open problem. We consider this problem for a slightly generalized model, namely partially-observable Markov decision processes (POMDPs). We show that it is NP-complete and that, unless P = NP, the optimal policy cannot be polynomial-time ε-approximated for any ε < 1. A similar result is shown for the average policy. The problem of whether the average performance is positive is shown to be PP-complete for stationary policies and PL-complete for time-dependent policies, and the problem of whether the median performance is positive is PP-complete for both types of policies. Furthermore, we systematically investigate the complexity of stationary, time- and history-dependent policy existence problems for POMDPs in different compressed representations.

Partially observable markov decision process approximations for adaptive sensing

2009

Abstract Adaptive sensing involves actively managing sensor resources to achieve a sensing task, such as object detection, classification, and tracking, and represents a promising direction for new applications of discrete event system methods. We describe an approach to adaptive sensing based on approximately solving a partially observable Markov decision process (POMDP) formulation of the problem.
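One widely used approximation in this spirit is a myopic (one-step-lookahead) policy that picks the sensor minimizing the expected entropy of the updated belief. The sketch below illustrates that idea with invented matrices; it is not necessarily the approximation developed in the paper.

```python
# A common myopic approximation for adaptive sensing (a sketch): choose the sensor
# that minimizes the expected entropy of the next belief.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def expected_posterior_entropy(b, T, O_a):
    """E_y[H(b')] for one candidate sensor with observation matrix O_a."""
    pred = b @ T
    total = 0.0
    for y in range(O_a.shape[1]):
        p_y = pred @ O_a[:, y]               # probability of seeing observation y
        if p_y > 0:
            post = pred * O_a[:, y] / p_y    # Bayes-updated belief
            total += p_y * entropy(post)
    return total

def myopic_sensor(b, T, sensors):
    return min(sensors, key=lambda a: expected_posterior_entropy(b, T, sensors[a]))

T = np.array([[0.9, 0.1], [0.2, 0.8]])
sensors = {"radar":  np.array([[0.9, 0.1], [0.4, 0.6]]),
           "camera": np.array([[0.6, 0.4], [0.1, 0.9]])}
print(myopic_sensor(np.array([0.7, 0.3]), T, sensors))
```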

Algorithms for partially observable Markov decision processes

Chapter 1 Introduction: 1.1 Planning; 1.2 Applications; 1.3 Thesis; 1.4 Outline. Chapter 2 POMDP Theory and Algorithms: 2.1 POMDP Model (2.1.1 Model definition; 2.1.2 Belief states, policies and value functions; 2.1.3 Belief space MDP; 2.1.4 Value iteration); 2.2 Properties of Value Functions (2.2.1 Policy tree; 2.2.2 Piecewise linear and convex property; 2.2.3 Parsimonious representations); 2.3 Difficulties in Solving a POMDP; 2.4 Standard Algorithms (2.4.1 Value iteration; 2.4.2 Policy iteration); 2.5 Theoretical Results; 2.6 An Overview of POMDP Algorithms (2.6.1 Decomposing value functions; 2.6.2 Value iteration: superset algorithms; 2.6.3 Value iteration: subset algorithms); 2.7 Current Research Status. Chapter 3 Modified Value Iteration: 3.1 Motivation; 3.2 Uniformly Improvable Value Function; 3.3 Modified Value Iteration: the Algorithm (3.3.1 Backing up on witness points of input vectors; 3.3.2 Retaining uniform improvability; 3.3.3 The algorithm; 3.3.4 Stopping point-based value iteration; 3.3.5 Convergence of modified value iteration; 3.3.6 Computing the Bellman residual); 3.4 Empirical Studies (3.4.1 Effectiveness of point-based improvements; 3.4.2 Variations of point-based DP update); 3.5 Related Work (3.5.1 Point-based and standard DP updates; 3.5.2 Point-based procedure and value function approximation; 3.5.3 Previous work related to modified value iteration); 3.6 Conclusion. Chapter 4 Value Iteration over Subspace.
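The point-based DP update the outline refers to can be illustrated compactly. The sketch below performs repeated point-based backups of alpha-vectors at a fixed set of belief points for a made-up two-state POMDP; it is only a schematic version of the procedures developed in the thesis.

```python
# A minimal point-based DP backup in the spirit of point-based value iteration
# (model numbers and the belief set are illustrative).
import numpy as np

gamma = 0.95
T = np.array([[[0.9, 0.1], [0.2, 0.8]],      # T[a][s, s']
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]],      # O[a][s', o]
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],                    # R[a][s]
              [0.0, 1.0]])

def backup(b, Gamma):
    """Return the best new alpha-vector at belief point b given the current set Gamma."""
    best = None
    for a in range(T.shape[0]):
        alpha_a = R[a].copy()
        for o in range(O.shape[2]):
            # For this (a, o) pair, pick the alpha in Gamma with the largest value at b.
            g = [T[a] @ (O[a][:, o] * alpha) for alpha in Gamma]
            alpha_a += gamma * max(g, key=lambda v: b @ v)
        if best is None or b @ alpha_a > b @ best:
            best = alpha_a
    return best

belief_points = [np.array([0.2, 0.8]), np.array([0.5, 0.5]), np.array([0.9, 0.1])]
Gamma = [np.zeros(2)]
for _ in range(50):                          # repeat point-based backups over the belief set
    Gamma = [backup(b, Gamma) for b in belief_points]
print(max(b @ alpha for b in belief_points for alpha in Gamma))
```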

Optimal sensor selection for discrete-event systems with partial observation

IEEE Transactions on Automatic Control, 2003

For discrete event systems under partial observation, we study the problem of selection of an optimal set of sensors that can provide sufficient yet minimal events observation information needed to accomplish the task at hand such as that of control or estimation. The sufficiency of the observed information is captured as the fulfillment of a desired formal property such as (co-)observability or normality (for control under partial observation), state-observability (for state estimation under partial observation), diagnosability (for failure diagnosis under partial observation), etc. A selection of sensors can be viewed as a selection of an observation mask and also of an equivalence class of events. A sensor set is called optimal if any coarser selection of the corresponding equivalence class of events results in some loss of the events observation information so that the task at hand cannot be accomplished, or equivalently the desired formal property cannot be fulfilled. We study an optimal selection of sensors over the set of general "non-projection" observation masks. We show that this problem is NP-hard in general. For mask-monotonic properties (that are preserved under increasing precision in events observation information, such as (co)-observability, normality, state-observability, diagnosability, etc.), we present a "top-down" and a "bottom-up" algorithm each of polynomial complexity. We show that observer-ness is not mask-monotonic. We show that the computational complexity can be further improved if the property is preserved under the projection via an intermediary observation mask that is an observer. Our results are obtained in a general setting so that they can be adapted for an optimal selection of sensors for a variety of applications in discrete event systems including (co-)observability, normality, diagnosability (single failure as well as repeated failures), state-observability, and invertibility.
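To convey the flavor of the top-down search for a mask-monotonic property, here is a hedged sketch that starts from the finest observation mask and greedily coarsens it while a user-supplied property check keeps passing. The `property_holds` test is a hypothetical placeholder, and the loop structure is illustrative rather than the paper's exact polynomial-time procedure.

```python
# Hedged sketch of a "top-down" coarsening search over observation masks.
from itertools import combinations

def property_holds(partition):
    # Hypothetical stand-in for an observability/diagnosability test:
    # require that events 'a' and 'b' remain distinguishable.
    return not any('a' in block and 'b' in block for block in partition)

def top_down_sensor_selection(events):
    partition = [{e} for e in events]               # finest mask: every event observed distinctly
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(partition)), 2):
            merged = [blk for k, blk in enumerate(partition) if k not in (i, j)]
            merged.append(partition[i] | partition[j])
            if property_holds(merged):              # the coarser mask still suffices: keep the merge
                partition = merged
                changed = True
                break
    return partition

print(top_down_sensor_selection(['a', 'b', 'c', 'd']))
```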

Efficient estimation and control for Markov processes

Proceedings of 1995 34th IEEE Conference on Decision and Control

We consider the problem of sequential control for a finite state and action Markovian Decision Process with incomplete information regarding the transition probabilities P ∈ Θ. Under suitable irreducibility assumptions for Θ, we construct adaptive policies that maximize the rate of convergence of realized rewards to that of the optimal (non-adaptive) policy under complete information. These adaptive policies are specified via an easily computable index function of states, controls, and statistics, so that one takes a control with the largest index value in the current state in every period.
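The sketch below shows the general shape of such an index rule: maintain statistics per state-control pair and always take the control with the largest index. The particular bonus term used here is a generic optimism-style placeholder, not the refined index constructed in the paper.

```python
# Hedged sketch of an index-type adaptive rule ("largest index wins").
import math
import random
from collections import defaultdict

counts = defaultdict(int)          # visits to (state, control)
reward_sum = defaultdict(float)    # accumulated reward for (state, control)

def index(state, control, t):
    n = counts[(state, control)]
    if n == 0:
        return float('inf')                             # force each control to be tried once
    mean = reward_sum[(state, control)] / n
    return mean + math.sqrt(2 * math.log(t + 1) / n)    # estimate + placeholder exploration bonus

def choose_control(state, controls, t):
    return max(controls, key=lambda u: index(state, u, t))

def record(state, control, reward):
    counts[(state, control)] += 1
    reward_sum[(state, control)] += reward

# Toy usage: two controls in a single state with unknown Bernoulli rewards.
random.seed(1)
p = {"u0": 0.4, "u1": 0.6}
for t in range(2000):
    u = choose_control("s", ["u0", "u1"], t)
    record("s", u, 1.0 if random.random() < p[u] else 0.0)
print({u: counts[("s", u)] for u in p})
```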

On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes

Annals of Operations Research, 1991

We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with an uncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem due to the countability of the spaces involved are reflected onto the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties for average cost optimal policies are obtained for a two state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available.
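For orientation, the average cost optimality equation on the belief space has the schematic form below, written in notation chosen here; the paper's precise assumptions and operators should be consulted for the exact statement.

```latex
% Schematic average cost optimality equation on the belief (information-state) space.
% \rho is the optimal average cost, h a relative value (bias) function, p a belief,
% c(p,a) the expected one-step cost, and T(p,a,y) the Bayes-updated belief after
% observing y under action a.
\rho + h(p) \;=\; \min_{a \in A} \Big\{ c(p,a) \;+\; \sum_{y} P(y \mid p, a)\, h\big(T(p,a,y)\big) \Big\}
```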

Optimal sensor scheduling via classification reduction of policy search (CROPS)

The problem of sensor scheduling in multimodal sensing systems is formulated as the sequential choice of experiments problem and solved via reinforcement learning methods. The sequential choice of experiments problem is a partially observed Markov decision problem (POMDP) in which the underlying state of nature is the system's state and the sensors' data are noisy state observations. The goal is to find a policy that, based on past data, sequentially determines the best sensor to deploy so as to maximize a given utility function while minimizing the deployment cost. Several examples are considered in which the exact model of the measurements given the state of nature is unknown but a generative model (a simulation or an experiment) is available. The problem is formulated as a reinforcement learning problem and solved via a reduction to a sequence of supervised classification subproblems. Finally, a simulation and an experiment with real data demonstrate the promise of our approach.
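The reduction can be pictured as follows: sample situations from the generative model, label each with the sensor that performed best in rollouts, and fit a classifier that serves as the policy. The sketch below uses an invented generative model and the simplest possible classifier, so it illustrates the idea only, not the CROPS algorithm itself.

```python
# Sketch of the classification-reduction idea: rollouts label each sample with the
# best sensor, then a classifier maps features to a sensor choice.
import numpy as np

rng = np.random.default_rng(0)

def simulate_utility(feature, sensor):
    """Hypothetical generative model: noisy utility of deploying `sensor` given `feature`."""
    true = feature if sensor == 0 else 1.0 - feature
    return true + 0.1 * rng.normal()

# Step 1: sample situations, label each with the empirically best sensor (rollouts).
X = rng.uniform(0, 1, size=200)
labels = np.array([int(np.argmax([np.mean([simulate_utility(x, s) for _ in range(20)])
                                  for s in (0, 1)])) for x in X])

# Step 2: "policy as classifier" -- here the simplest possible one, 1-nearest neighbour.
def policy(x):
    return labels[np.argmin(np.abs(X - x))]

print(policy(0.9), policy(0.1))   # expected: sensor 0 for large features, sensor 1 for small
```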