Optimal sensor scheduling via classification reduction of policy search (CROPS)

Intelligent Sensing in Dynamic Environments Using Markov Decision Process

Thrishantha Nanayakkara, Malka N. Halgamuge, Prasanna Sridhar, and Asad M. Madni, Sensors, vol. 11, no. 1, pp. 1229-1242, 2011

In a network of low-powered wireless sensors, it is essential to capture as many environmental events as possible while still preserving the battery life of the sensor node. This paper focuses on a real-time learning algorithm that extends the lifetime of a sensor node as it senses and transmits environmental events. A common method in ad-hoc sensor networks is to put the sensor nodes to sleep periodically. The purpose of the learning algorithm is to couple the sensor's sleeping behavior to the natural statistics of the environment so that it stays in harmony with environmental change: the sensor can sleep when the environment is steady and stay awake when it is turbulent. The paper presents theoretical and experimental validation of a reward-based learning algorithm that can be implemented on an embedded sensor. The key contribution is the design and implementation of a reward function that trades off the two mutually contradicting objectives above, together with a linear critic function that approximates the discounted sum of future rewards in order to perform policy learning.
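
A rough illustration of the two ingredients highlighted here (a reward that trades captured events against energy spent awake, and a linear critic over state features) is sketched below as a tiny actor-critic loop. The sleep/sense toy environment, the event-rate feature, and every constant are assumptions made for illustration; this is not the paper's model or parameter set.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9                       # discount factor for the future-reward sum
alpha_w, alpha_th = 0.05, 0.02    # critic / policy learning rates

def features(event_rate):
    # simple state features: the recent event rate and a bias term
    return np.array([event_rate, 1.0])

w = np.zeros(2)        # linear critic: V(s) ~ w . phi(s)
theta = np.zeros(2)    # policy weights: P(sense) = sigmoid(theta . phi(s))

event_rate = 0.1
for t in range(5000):
    phi = features(event_rate)
    p_sense = 1.0 / (1.0 + np.exp(-theta @ phi))
    sense = rng.random() < p_sense

    # environment: events arrive at a slowly drifting rate
    event = rng.random() < event_rate
    event_rate = float(np.clip(event_rate + rng.normal(0, 0.01), 0.0, 1.0))

    # reward trades captured events against the energy spent while awake
    reward = (1.0 if (sense and event) else 0.0) - (0.2 if sense else 0.0)

    phi_next = features(event_rate)
    td_error = reward + gamma * (w @ phi_next) - (w @ phi)
    w += alpha_w * td_error * phi                         # critic (TD) update
    grad_logp = (1.0 - p_sense) * phi if sense else -p_sense * phi
    theta += alpha_th * td_error * grad_logp              # actor update

print("P(sense) in a turbulent vs. steady environment:",
      1.0 / (1.0 + np.exp(-theta @ features(0.9))),
      1.0 / (1.0 + np.exp(-theta @ features(0.05))))
```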

Integrating distributed Bayesian inference and reinforcement learning for sensor management

2009

This paper introduces a sensor management approach that integrates distributed Bayesian inference (DBI) and reinforcement learning (RL). DBI is implemented using distributed perception networks (DPNs), a multiagent approach to performing efficient inference, while RL is used to automatically discover a mapping from the beliefs generated by the DPNs to the actions that enable active sensors to gather the most useful observations. The resulting method is evaluated on a simulation of a chemical leak localization task and the results demonstrate 1) that the integrated approach can learn policies that perform effective sensor management, 2) that inference based on a correct observation model, which the DPNs make feasible, is critical to performance, and 3) that the system scales to larger versions of the task.
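
A minimal sketch of the belief-to-action learning idea, assuming a toy leak-localization task: a Bayesian belief over a few candidate sites is updated from noisy binary readings, and tabular Q-learning maps coarse belief features to the next sensor to query. The detection probabilities and belief discretization are invented here; the paper's DPN-based inference is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3                        # candidate leak locations
P_HIT, P_FALSE = 0.8, 0.2    # detection / false-alarm probability of a local sensor

def belief_update(belief, site, reading):
    # Bayes' rule for a binary reading from the sensor queried at `site`
    like = np.where(np.arange(K) == site, P_HIT, P_FALSE)
    like = like if reading else 1.0 - like
    post = like * belief
    return post / post.sum()

def belief_features(belief):
    # coarse belief features: most likely site and how concentrated the belief is
    return int(np.argmax(belief)), int(belief.max() * 10)

Q = {}                       # tabular Q-values over (belief features, action)
eps, alpha, gamma = 0.1, 0.2, 0.95

for episode in range(3000):
    true_site = rng.integers(K)
    belief = np.full(K, 1.0 / K)
    s = belief_features(belief)
    for t in range(50):
        qs = Q.setdefault(s, np.zeros(K))
        a = int(rng.integers(K)) if rng.random() < eps else int(np.argmax(qs))
        reading = rng.random() < (P_HIT if a == true_site else P_FALSE)
        belief = belief_update(belief, a, reading)
        s2 = belief_features(belief)
        done = belief.max() > 0.95
        target = -1.0 + (0.0 if done else gamma * Q.setdefault(s2, np.zeros(K)).max())
        qs[a] += alpha * (target - qs[a])      # Q-learning update (cost -1 per query)
        s = s2
        if done:
            break
```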

Optimal sensor scheduling via classification reduction of policy search (CROPS)

The problem of sensor scheduling in multimodal sensing systems is formulated as the sequential choice of experiments problem and solved via reinforcement learning methods. The sequential choice of experiments problem is a partially observed Markov decision problem (POMDP) in which the underlying state of nature is the system's state and the sensors' data are noisy state observations. The goal is to find a policy that sequentially determines the best sensor to deploy based on past data, which maximizes a given utility function while minimizing the deployment cost. Several examples are considered in which the exact model of the measurements given the state of nature is unknown but a generative model (a simulation or an experiment) is available. The problem is formulated as a reinforcement learning problem and solved via a reduction to a sequence of supervised classification subproblems. Finally, a simulation and an experiment with real data demonstrate the promise of our approach.
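
The classification reduction can be sketched as follows, under assumptions that are not from the paper: a scalar hidden state tracked by a Kalman variance recursion, two hypothetical sensors (accurate-but-costly vs. cheap-but-noisy), rollouts under a fixed base policy to score each sensor, and scikit-learn's decision tree standing in for whatever classifier is actually used. States sampled from the generative model are labeled with the sensor whose rollout value is highest, and the fitted classifier then acts as the scheduling policy.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

Q_PROC = 0.5                         # process-noise variance of the hidden state
SENSORS = [(0.1, 1.0), (1.0, 0.1)]   # (measurement variance, deployment cost) per sensor

def variance_step(P, sensor):
    # scalar Kalman-filter error-variance recursion for the chosen sensor
    R, cost = SENSORS[sensor]
    P_pred = P + Q_PROC
    P_post = P_pred * R / (P_pred + R)
    return P_post, -(P_post + cost)          # reward: -(error variance + sensor cost)

def rollout_value(P, first_sensor, horizon=5):
    # value of deploying `first_sensor` now and the cheap sensor afterwards
    P_cur, total = variance_step(P, first_sensor)
    for _ in range(horizon - 1):
        P_cur, r = variance_step(P_cur, 1)   # base policy: always the cheap sensor
        total += r
    return total

# classification reduction: sample states, label each with the sensor whose
# rollout value is highest, and train a classifier on those labels
rng = np.random.default_rng(2)
states = rng.uniform(0.05, 5.0, size=500).reshape(-1, 1)      # posterior variances
labels = [int(np.argmax([rollout_value(P, a) for a in range(len(SENSORS))]))
          for P in states[:, 0]]

policy = DecisionTreeClassifier(max_depth=3).fit(states, labels)
print("sensor chosen at low / high uncertainty:",
      policy.predict([[0.1]])[0], policy.predict([[4.0]])[0])
```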

Simulation-based optimal sensor scheduling with application to observer trajectory planning

2007

The sensor scheduling problem can be formulated as a controlled hidden Markov model and this paper solves the problem when the state, observation and action spaces are continuous. This general case is important as it is the natural framework for many applications. The aim is to minimise the variance of the estimation error of the hidden state with respect to the action sequence. We present a novel simulation-based method that uses a stochastic gradient algorithm to find optimal actions.
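
One way such a simulation-based stochastic gradient can look is simultaneous-perturbation (SPSA-style) descent on a Monte Carlo cost estimate; the sketch below optimises a short open-loop observer trajectory for an assumed 1D target whose measurement noise grows with observer-to-target distance. The dynamics, noise model, and step sizes are illustrative assumptions, not the paper's algorithm or example.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5                    # planning horizon: one observer position per time step

def simulated_cost(actions, n_runs=100):
    # Monte Carlo estimate of the squared tracking error for one action sequence;
    # measurement noise grows with the observer-to-target distance, so good
    # observer positions anticipate the target's motion
    total = 0.0
    for _ in range(n_runs):
        x = 0.0
        for t in range(T):
            x += 1.0 + rng.normal(0, 0.3)                 # target motion
            noise_std = 0.1 + 0.2 * abs(x - actions[t])   # distance-dependent noise
            y = x + rng.normal(0, noise_std)              # observation
            total += (y - x) ** 2                         # error of the raw-measurement estimate
    return total / n_runs

theta = np.zeros(T)      # observer trajectory, initialised at the origin
for k in range(400):
    delta = rng.choice([-1.0, 1.0], size=T)               # SPSA perturbation direction
    c, step = 0.2, 0.5 / (1 + k) ** 0.6
    grad = (simulated_cost(theta + c * delta) -
            simulated_cost(theta - c * delta)) / (2 * c) * delta
    theta -= step * grad                                   # stochastic gradient step

print("optimised observer positions:", np.round(theta, 2))
```

Under these assumptions the optimised positions should drift toward the target's expected path, since that is where the simulated measurement noise is smallest.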

Dynamic Sensor Policies

1994

When an agent’s task environment is largely benign and partially predictable (although uncertain), and the goals involve accomplishing tasks, we can make the agent more adaptive by planning to acquire unknown or uncertain information during execution of the task. Of course we pay a price for this flexibility. In this paper we discuss a strategy for measuring this price in a realistic way and reducing it by making rational decisions about how to acquire unknown environmental information with imperfect sensors. Ultimately, we are interested in a general framework for making optimal sensor decisions which will minimize a cost (or set of costs) we expect to incur by employing sensors. In particular, we propose to generate a tree of possible sensing policies offline (using dynamic programming), cache the optimal sensor decisions at each level, and subsequently use actual world states as indexes into this structure to make a rational sensor choice online. Because no states are discarded in ...
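
The offline tree-plus-cache idea can be pictured as backward induction over (stage, world-state) indexes, caching the optimal sensor choice at each index and looking it up online from the actual state. In the toy below the sensing choice does not affect the world dynamics, so it only illustrates the cache-and-lookup structure; the two-state environment, sensor costs, and detection probabilities are invented.

```python
# toy world: two environment states and three sensing options per stage
STATES = ["calm", "active"]
P_EVENT = {"calm": 0.05, "active": 0.6}                  # chance an event occurs
TRANS = {"calm": {"calm": 0.9, "active": 0.1},
         "active": {"calm": 0.3, "active": 0.7}}         # state transition probabilities
SENSORS = {"none": (0.0, 0.0),
           "cheap": (0.2, 0.7),
           "good": (1.0, 0.95)}                          # (cost, P(detect event))
MISS_COST = 3.0                                          # penalty for an undetected event
HORIZON = 6

# offline: backward induction over (stage, state), caching the optimal choice
V = {(HORIZON, s): 0.0 for s in STATES}
policy = {}
for t in range(HORIZON - 1, -1, -1):
    for s in STATES:
        best_name, best_cost = None, float("inf")
        for name, (cost, p_detect) in SENSORS.items():
            expected_miss = P_EVENT[s] * (1.0 - p_detect) * MISS_COST
            future = sum(TRANS[s][s2] * V[(t + 1, s2)] for s2 in STATES)
            total = cost + expected_miss + future
            if total < best_cost:
                best_name, best_cost = name, total
        policy[(t, s)], V[(t, s)] = best_name, best_cost

# online: index the cached table with the actual world state at each stage
print([policy[(t, "active")] for t in range(HORIZON)])
print([policy[(t, "calm")] for t in range(HORIZON)])
```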

On a stochastic sensor selection algorithm with applications in sensor scheduling and sensor coverage

Automatica, 2006

In this note we consider the following problem. Suppose a set of sensors is jointly trying to estimate a process. One sensor takes a measurement at every time step and the measurements are then exchanged among all the sensors. What is the sensor schedule that results in the minimum error covariance? We describe a stochastic sensor selection strategy that is easy to implement and is computationally tractable. The problem described above comes up in many domains out of which we discuss two. In the sensor selection problem, there are multiple sensors that cannot operate simultaneously (e.g., sonars in the same frequency band). Thus measurements need to be scheduled. In the sensor coverage problem, a geographical area needs to be covered by mobile sensors each with limited range. Thus from every position, the sensors obtain a different view-point of the area and the sensors need to optimize their trajectories. The algorithm is applied to these problems and illustrated through simple examples.
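
A minimal sketch of the stochastic selection idea, with an assumed double-integrator system and two hypothetical sensors: at every step one sensor is drawn i.i.d. from a probability vector and the Kalman error-covariance (Riccati) recursion is propagated with that sensor's model, so candidate probability vectors can be compared by their average error covariance. The paper's optimisation over the selection probabilities is not shown.

```python
import numpy as np

rng = np.random.default_rng(4)

# linear system x_{k+1} = A x_k + w; sensor i observes y = C_i x + v_i
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)
SENSORS = [(np.array([[1.0, 0.0]]), np.array([[0.05]])),   # position-only sensor
           (np.array([[0.0, 1.0]]), np.array([[0.05]]))]   # velocity-only sensor

def covariance_step(P, C, R):
    # one predict/update step of the Kalman error-covariance (Riccati) recursion
    P_pred = A @ P @ A.T + Q
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    return (np.eye(2) - K @ C) @ P_pred

def average_error(pi, steps=5000):
    # average error covariance when the sensor used at each step is drawn
    # i.i.d. from the probability vector `pi`
    P, total = np.eye(2), 0.0
    for _ in range(steps):
        i = rng.choice(len(SENSORS), p=pi)
        P = covariance_step(P, *SENSORS[i])
        total += np.trace(P)
    return total / steps

for pi in ([0.5, 0.5], [0.9, 0.1], [0.1, 0.9]):
    print(pi, round(average_error(np.array(pi)), 4))
```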

Rollout strategy for Hidden Markov Model (HMM)-based dynamic sensor scheduling

2007 IEEE International Conference on Systems, Man and Cybernetics, 2007

In this paper, a hidden Markov model (HMM)-based dynamic sensor scheduling problem is formulated, and solved using rollout concepts to overcome the computational intractability of the dynamic programming (DP) recursion. The problem considered here involves dynamically sequencing a set of sensors to minimize the sum of sensor cost and the HMM state estimation error cost. The surveillance task is modeled as a single HMM with multiple emission matrices corresponding to each of the sensors. The rollout information gain (RIG) algorithm proposed herein employs the information gain (IG) heuristic as the base algorithm. The RIG algorithm is illustrated on an intelligence, surveillance, and reconnaissance (ISR) scenario of a village for the presence of weapons and terrorists/refugees. Extension of the RIG strategy to monitor multiple HMMs involves combining the information gain heuristic with the auction algorithm that computes the k-best assignments at each decision epoch of rollout.
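
A compact sketch of the rollout idea on an assumed three-state HMM with per-sensor emission matrices and costs: the base policy greedily picks the sensor with the best information gain minus cost, and the rollout step scores each candidate first sensor by simulating the base policy forward and averaging sensor cost plus an estimation-error proxy (one minus the maximum belief). All numbers and the error proxy are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
T_MAT = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])       # hidden-state transition matrix
# each sensor: an emission matrix (rows: states, cols: observations) and a cost
SENSORS = [(np.array([[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]), 0.3),
           (np.array([[0.5, 0.5], [0.9, 0.1], [0.5, 0.5]]), 0.3),
           (np.array([[0.6, 0.4], [0.5, 0.5], [0.4, 0.6]]), 0.05)]

def predict(b):
    return T_MAT.T @ b                     # belief after one state transition

def update(b, E, z):
    post = E[:, z] * b                     # Bayes update for observation z
    return post / post.sum()

def entropy(b):
    return -np.sum(b * np.log(b + 1e-12))

def info_gain(b, E):
    # expected entropy reduction from querying a sensor with emission matrix E
    bp = predict(b)
    gain = entropy(bp)
    for z in range(E.shape[1]):
        gain -= float(E[:, z] @ bp) * entropy(update(bp, E, z))
    return gain

def base_policy(b):
    # greedy heuristic: information gain traded off against sensor cost
    return int(np.argmax([info_gain(b, E) - c for E, c in SENSORS]))

def rollout(b, first, horizon=4, n_runs=20):
    # average cost of querying `first` now and following the base heuristic after
    total = 0.0
    for _ in range(n_runs):
        bb, a = b.copy(), first
        for _ in range(horizon):
            E, cost = SENSORS[a]
            bp = predict(bb)
            z = rng.choice(E.shape[1], p=E.T @ bp)    # sample an observation
            bb = update(bp, E, z)
            total += cost + (1.0 - bb.max())          # sensor cost + error proxy
            a = base_policy(bb)
    return total / n_runs

belief = np.full(3, 1.0 / 3.0)
scores = [rollout(belief, a) for a in range(len(SENSORS))]
print("rollout-selected first sensor:", int(np.argmin(scores)))
```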