Incorporating External Evidence in Reinforcement Learning via Power Prior Bayesian Analysis

Sample Efficient Bayesian Reinforcement Learning

2020

Artificial Intelligence (AI) has been an active field of research for over a century now. The research field of AI may be grouped into various tasks that are expected from an intelligent agent, two major ones being learning & inference, and planning. The act of storing new knowledge is known as learning, while inference refers to the act of extracting conclusions given the agent's limited knowledge base. The two are tightly knit by the design of the agent's knowledge base. The process of deciding long-term actions or plans given the current knowledge is called planning. Reinforcement Learning (RL) brings together these tasks by posing a seemingly benign question: “How to act optimally in an unknown environment?”. This requires the agent to learn about its environment as well as plan actions given its current knowledge about it. In RL, the environment can be represented by a mathematical model, and we associate an intrinsic value with the actions that the agent may choose. In this thesis, we present ...
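
The abstract is truncated here. For reference, the mathematical model it alludes to is typically a Markov Decision Process, whose optimal value function satisfies the Bellman optimality equation (the textbook form, not text from the thesis):

\[ V^*(s) = \max_{a}\Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s') \Big]. \]

In the Bayesian RL setting the transition law P is unknown, so the agent maintains a belief \beta over models and the Bayes-adaptive value function conditions on that belief:

\[ V^*(s,\beta) = \max_{a}\Big[ r(s,a) + \gamma \sum_{s'} \mathbb{E}_{P\sim\beta}\big[P(s' \mid s,a)\big]\, V^*\big(s', \beta_{(s,a,s')}\big) \Big], \]

where \beta_{(s,a,s')} denotes the belief updated after observing the transition (s,a,s').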

Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning

arXiv: Learning, 2020

Bayesian reinforcement learning (BRL) offers a decision-theoretic solution for reinforcement learning. While "model-based" BRL algorithms have focused either on maintaining a posterior distribution on models or value functions and combining this with approximate dynamic programming or tree search, previous Bayesian "model-free" value function distribution approaches implicitly make strong assumptions or approximations. We describe a novel Bayesian framework, Inferential Induction, for correctly inferring value function distributions from data, which leads to the development of a new class of BRL algorithms. We design an algorithm, Bayesian Backwards Induction, with this framework. We experimentally demonstrate that the proposed algorithm is competitive with respect to the state of the art.
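
The framework targets the posterior distribution over value functions, P(V | D) = \int P(V | \mu)\, dP(\mu | D), where \mu is the unknown MDP. The sketch below is not the authors' Bayesian Backwards Induction algorithm; it is a minimal Monte Carlo illustration of that marginalisation under an assumed Dirichlet-multinomial transition model with known rewards: sample MDPs from the posterior, solve each, and collect the resulting value functions as samples from P(V | D).

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.95

# Hypothetical transition counts gathered from data D (Dirichlet-multinomial model assumed).
counts = rng.integers(1, 10, size=(n_states, n_actions, n_states)).astype(float)
rewards = rng.uniform(0.0, 1.0, size=(n_states, n_actions))  # rewards assumed known

def value_iteration(P, R, gamma, iters=500):
    """Standard value iteration for one sampled MDP."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * np.einsum("sat,t->sa", P, V)
        V = Q.max(axis=1)
    return V

# Marginalise over the model posterior: sample MDPs, solve each, collect value samples.
value_samples = []
for _ in range(200):
    P = np.stack([[rng.dirichlet(counts[s, a]) for a in range(n_actions)]
                  for s in range(n_states)])
    value_samples.append(value_iteration(P, rewards, gamma))

value_samples = np.array(value_samples)  # approximate samples from P(V | D)
print(value_samples.mean(axis=0), value_samples.std(axis=0))
```

The spread of these value samples is the kind of value-function uncertainty that the framework argues should be inferred rather than assumed away.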

Inferential Induction: Joint Bayesian Estimation of MDPs and Value Functions

2020

Bayesian reinforcement learning (BRL) offers a decision-theoretic solution to the problem of reinforcement learning. However, typical model-based BRL algorithms have focused either on maintaining a posterior distribution on models or value functions and combining this with approximate dynamic programming or tree search. This paper describes a novel backwards induction principle for performing joint Bayesian estimation of models and value functions, from which many new BRL algorithms can be obtained. We demonstrate this idea with algorithms and experiments in discrete state spaces.

Bayesian models of nonstationary Markov decision processes

2005

Abstract: Standard reinforcement learning algorithms generate policies that optimize expected future rewards in a priori unknown domains, but they assume that the domain does not change over time. Prior work cast the reinforcement learning problem as a Bayesian estimation problem, using experience data to condition a probability distribution over domains.
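
As a concrete (and deliberately simple) illustration of conditioning a distribution over domains on experience data, the sketch below keeps a Dirichlet-multinomial posterior over transition dynamics and adds exponential forgetting of old counts, a common heuristic for slowly drifting domains. It is illustrative only and is not claimed to be the construction used in the cited paper.

```python
import numpy as np

class DriftingDirichletModel:
    """Dirichlet-multinomial posterior over transition dynamics with exponential
    forgetting of old evidence -- a common heuristic for slowly changing domains.
    (Illustrative only; not the cited paper's construction.)"""

    def __init__(self, n_states, n_actions, prior=1.0, decay=0.99):
        self.counts = np.full((n_states, n_actions, n_states), prior)
        self.prior = prior
        self.decay = decay

    def update(self, s, a, s_next):
        # Shrink old counts toward the prior so the posterior can track drift,
        # then add the new observation.
        self.counts = self.prior + self.decay * (self.counts - self.prior)
        self.counts[s, a, s_next] += 1.0

    def expected_transitions(self, s, a):
        c = self.counts[s, a]
        return c / c.sum()
```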

Bayesian Policy Optimization for Model Uncertainty

2019

Addressing uncertainty is critical for autonomous systems to robustly adapt to the real world. We formulate the problem of model uncertainty as a continuous Bayes-Adaptive Markov Decision Process (BAMDP), where an agent maintains a posterior distribution over latent model parameters given a history of observations and maximizes its expected long-term reward with respect to this belief distribution. Our algorithm, Bayesian Policy Optimization, builds on recent policy optimization algorithms to learn a universal policy that navigates the exploration-exploitation trade-off to maximize the Bayesian value function. To address challenges from discretizing the continuous latent parameter space, we propose a new policy network architecture that encodes the belief distribution independently from the observable state. Our method significantly outperforms algorithms that address model uncertainty without explicitly reasoning about belief distributions and is competitive with state-of-the-art P...
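
The abstract mentions a policy network that encodes the belief distribution independently from the observable state. The sketch below shows one way such an architecture can look; the hidden sizes, names, and the simple concatenation are assumptions, not the BPO paper's exact design.

```python
import torch
import torch.nn as nn

class BeliefConditionedPolicy(nn.Module):
    """Encode the belief over latent model parameters and the observable state
    with separate sub-networks, then combine them to produce action logits.
    Minimal sketch; layer sizes are arbitrary."""

    def __init__(self, state_dim, belief_dim, n_actions, hidden=64):
        super().__init__()
        self.state_enc = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.belief_enc = nn.Sequential(nn.Linear(belief_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, state, belief):
        z = torch.cat([self.state_enc(state), self.belief_enc(belief)], dim=-1)
        return self.head(z)  # action logits; usable with any policy-gradient method

policy = BeliefConditionedPolicy(state_dim=8, belief_dim=16, n_actions=4)
logits = policy(torch.zeros(1, 8), torch.zeros(1, 16))
```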

A Bayesian Approach to Model Learning in Non-Markovian Environments

Most of the reinforcement learning (RL) algorithms assume that the learning processes of embedded agents can be formulated as Markov Decision Processes (MDPs). However, the assumption is not valid for many realistic problems. Therefore, research on RL techniques for non-Markovian environments is gaining more attention recently. We have developed a Bayesian approach to RL in non-Markovian environments, in which the environment is modeled as a history tree model, a stochastic model with variable memory length. In our approach, given a class of history trees, the agent explores the environment and learns the maximum a posteriori (MAP) model on the basis of Bayesian statistics. The optimal policy can be computed by Dynamic Programming, after the agent has learned the environment model. Unlike many other model learning techniques, our approach does not suffer from the problems of noise and overfitting, thanks to the Bayesian framework. We have analyzed the asymptotic behavior of the proposed algorithm and have proved that if the given class contains the exact model of the environment, the model learned by our algorithm converges to it. We also present the results of our experiments in two non-Markovian environments.
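
To give a flavour of a variable-memory-length (history tree) model, the sketch below predicts the next observation from the longest previously seen suffix of the history, with Dirichlet (add-alpha) smoothing. It is a simplified stand-in: the cited approach additionally performs MAP model selection over a given class of history trees, which is not implemented here.

```python
from collections import defaultdict

class HistoryTreeModel:
    """Predict the next symbol from a suffix of the history -- a simplified
    stand-in for a variable-memory-length (history tree) model."""

    def __init__(self, n_symbols, max_depth=3, alpha=1.0):
        self.n_symbols, self.max_depth, self.alpha = n_symbols, max_depth, alpha
        self.counts = defaultdict(lambda: [0.0] * n_symbols)

    def update(self, history, next_symbol):
        # Record the observed symbol under every suffix up to max_depth.
        for d in range(1, min(self.max_depth, len(history)) + 1):
            self.counts[tuple(history[-d:])][next_symbol] += 1.0

    def predict(self, history):
        # Use the longest matching suffix, with add-alpha smoothing.
        for d in range(self.max_depth, 0, -1):
            ctx = tuple(history[-d:])
            if ctx in self.counts:
                c = self.counts[ctx]
                total = sum(c) + self.alpha * self.n_symbols
                return [(ci + self.alpha) / total for ci in c]
        return [1.0 / self.n_symbols] * self.n_symbols  # uniform if nothing matches
```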

Bayesian Reinforcement Learning in Factored POMDPs

2019

Model-based Bayesian Reinforcement Learning (BRL) provides a principled solution to dealing with the exploration-exploitation trade-off, but such methods typically assume a fully observable environment. The few Bayesian RL methods that are applicable in partially observable domains, such as the Bayes-Adaptive POMDP (BA-POMDP), scale poorly. To address this issue, we introduce the Factored BA-POMDP model (FBA-POMDP), a framework that is able to learn a compact model of the dynamics by exploiting the underlying structure of a POMDP. The FBA-POMDP framework casts the problem as a planning task, for which we adapt the Monte-Carlo Tree Search planning algorithm and develop a belief tracking method to approximate the joint posterior over the state and model variables. Our empirical results show that this method outperforms a number of BRL baselines and is able to learn efficiently when the factorization is known, as well as learn both the factorization and the model parameters simultaneously...
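
Belief tracking in a BA-POMDP-style model means maintaining a joint posterior over the hidden state and the unknown model. The sketch below is a generic particle-filter update over (state, model) particles, offered only as an illustration; the FBA-POMDP tracker exploits the factored structure, which this sketch does not.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, weights, action, observation,
                         transition_fn, observation_prob):
    """One belief-update step over joint (state, model) particles.
    transition_fn(state, model, action) -> sampled next state under that model.
    observation_prob(obs, next_state, model, action) -> observation likelihood.
    Generic sketch, not the FBA-POMDP's structured belief tracker."""
    new_particles, new_weights = [], []
    for (state, model), w in zip(particles, weights):
        next_state = transition_fn(state, model, action)
        new_particles.append((next_state, model))
        new_weights.append(w * observation_prob(observation, next_state, model, action))
    new_weights = np.array(new_weights)
    new_weights /= new_weights.sum()
    # Resample to avoid weight degeneracy.
    idx = rng.choice(len(new_particles), size=len(new_particles), p=new_weights)
    resampled = [new_particles[i] for i in idx]
    return resampled, np.full(len(resampled), 1.0 / len(resampled))
```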

PAC-Bayesian Policy Evaluation for Reinforcement Learning

arXiv preprint arXiv:1202.3717, 2012

Abstract: Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on the accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first PAC-Bayesian bound for the batch reinforcement learning problem with function approximation. We show how this ...
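
For reference, one standard form of the classical PAC-Bayesian bound for supervised learning (not the paper's RL-specific bound) reads: with probability at least 1 - \delta over an i.i.d. sample of size n, simultaneously for all posteriors Q over hypotheses,

\[ \mathbb{E}_{h\sim Q}\big[L(h)\big] \;\le\; \mathbb{E}_{h\sim Q}\big[\hat{L}(h)\big] + \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}, \]

where P is the prior, L the true risk, and \hat{L} the empirical risk. The bound holds for any choice of prior P, which is the sense in which PAC-Bayesian guarantees do not depend on the prior being correct.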