Reinforcement Learning via Recurrent Convolutional Neural Networks
Related papers
Solving deep memory POMDPs with recurrent policy gradients
2007
This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method for creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach approximates a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a "Long Short-Term Memory" (LSTM) architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
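As a rough illustration of the idea (not the authors' implementation), the sketch below trains an LSTM policy with a REINFORCE-style update in which each step's log-likelihood is weighted by the discounted return-to-go, so the gradient is backpropagated through the recurrent state over the whole episode; the environment interface (classic Gym-style 4-tuple), network sizes, and discount factor are placeholder assumptions.

```python
# Minimal sketch of a recurrent policy gradient (REINFORCE through time)
# with an LSTM policy. Environment, sizes, and hyperparameters are
# illustrative placeholders, not the paper's setup.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class LSTMPolicy(nn.Module):
    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, state=None):
        # obs: (batch, 1, obs_dim) -- a single observation step
        out, state = self.lstm(obs, state)
        return self.head(out[:, -1]), state

def episode_loss(policy, env, gamma=0.99):
    """Run one episode and return the return-weighted log-likelihood loss.
    Backpropagating this loss unrolls the gradient through the LSTM state."""
    obs, state, log_probs, rewards = env.reset(), None, [], []
    done = False
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32).view(1, 1, -1)
        logits, state = policy(obs_t, state)
        dist = Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())  # Gym-style step
        rewards.append(reward)
    # discounted return-to-go for each step (the "return weighting")
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.tensor(returns)
    return -(torch.stack(log_probs).squeeze() * returns).sum()
```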
Deep Reactive Policies for Planning in Stochastic Nonlinear Domains
Proceedings of the AAAI Conference on Artificial Intelligence, 2019
Recent advances in applying deep learning to planning have shown that Deep Reactive Policies (DRPs) can be powerful for fast decision-making in complex environments. However, an important limitation of current DRP-based approaches is either the need for optimal planners to be used as ground truth in a supervised learning setting, or the sample complexity of high-variance policy gradient estimators, which is particularly troublesome in continuous state-action domains. In order to overcome those limitations, we introduce a framework for training DRPs in continuous stochastic spaces via gradient-based policy search. The general approach is to explicitly encode a parametric policy as a deep neural network, and to formulate the probabilistic planning problem as an optimization task in a stochastic computation graph by exploiting the re-parameterization of the transition probability densities; the optimization is then solved by leveraging gradient descent algorithms that are able to handle...
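A minimal sketch of the general idea, assuming Gaussian transition densities reparameterized as mean plus noise so the trajectory return can be differentiated directly with respect to the policy parameters; the linear dynamics, quadratic reward, horizon, and layer sizes below are illustrative assumptions, not the paper's planning domains.

```python
# Sketch: policy search by differentiating through a reparameterized
# stochastic transition model. Dynamics and reward are toy placeholders.
import torch
import torch.nn as nn

class ReactivePolicy(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim))

    def forward(self, s):
        return self.net(s)

def rollout_return(policy, s0, mean_fn, log_std, reward_fn, horizon=20):
    """Unroll stochastic dynamics s' ~ N(mean_fn(s, a), exp(log_std)) using
    the reparameterization s' = mean + std * eps, so the accumulated return
    is differentiable with respect to the policy parameters."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        eps = torch.randn_like(s)                 # noise sampled outside the graph
        s = mean_fn(s, a) + torch.exp(log_std) * eps
        total = total + reward_fn(s, a)
    return total

# Usage with a toy linear system and quadratic reward:
state_dim, action_dim = 3, 2
policy = ReactivePolicy(state_dim, action_dim)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
A = torch.randn(state_dim, state_dim) * 0.1
B = torch.randn(state_dim, action_dim) * 0.1
mean_fn = lambda s, a: s + s @ A.T + a @ B.T
reward_fn = lambda s, a: -(s.pow(2).sum(-1) + 0.1 * a.pow(2).sum(-1)).mean()
log_std = torch.tensor(-2.0)

s0 = torch.randn(32, state_dim)                   # batch of start states
opt.zero_grad()
loss = -rollout_return(policy, s0, mean_fn, log_std, reward_fn)
loss.backward()
opt.step()
```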
Machine Learning and Knowledge Extraction
The first part of a two-part series of papers provides a survey of recent advances in Deep Reinforcement Learning (DRL) applications for solving partially observable Markov decision process (POMDP) problems. Reinforcement Learning (RL) is an approach to emulating the human's natural learning process, whose key idea is to let the agent learn by interacting with a stochastic environment. Because the agent has only limited access to information about the environment, it can be applied efficiently in most fields that require self-learning. Although efficient algorithms are widely used, an organized investigation seems essential: it allows sound comparisons and the choice of the best structures or algorithms when applying DRL in various applications. In this overview, we introduce Markov Decision Process (MDP) problems and Reinforcement Learning, and applications of DRL for solving POMDP problems in games, robotics, and natural language processing. A follow-up paper will...
Discovering symbolic policies with deep reinforcement learning
2021
Deep reinforcement learning (DRL) has proven successful for many difficult control problems by learning policies represented by neural networks. However, the complexity of neural network-based policies—involving thousands of composed nonlinear operators—can render them problematic to understand, trust, and deploy. In contrast, simple policies comprising short symbolic expressions can facilitate human understanding, while also being transparent and exhibiting predictable behavior. To this end, we propose deep symbolic policy, a novel approach to directly search the space of symbolic policies. We use an autoregressive recurrent neural network to generate control policies represented by tractable mathematical expressions, employing a risk-seeking policy gradient to maximize performance of the generated policies. To scale to environments with multidimensional action spaces, we propose an “anchoring” algorithm that distills pre-trained neural network-based policies into fully symbolic po...
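The risk-seeking policy gradient can be sketched as a REINFORCE update restricted to the top-epsilon fraction of sampled expressions, with the batch reward quantile as baseline; this is a hedged reading of the idea, and the function name and epsilon value below are illustrative rather than taken from the paper.

```python
# Sketch of a risk-seeking policy gradient update: only the top-epsilon
# fraction of sampled expressions (by episodic reward) contributes to the
# gradient, with the batch quantile as baseline. Names are illustrative.
import torch

def risk_seeking_loss(log_probs, rewards, epsilon=0.05):
    """log_probs: (batch,) summed log-likelihoods of each sampled expression
    rewards:   (batch,) episodic returns of the corresponding policies"""
    # reward value at the (1 - epsilon) quantile of the batch
    r_eps = torch.quantile(rewards, 1.0 - epsilon)
    mask = rewards >= r_eps                       # keep only the elite samples
    # REINFORCE on the elite samples, with the quantile as baseline
    advantage = rewards[mask] - r_eps
    return -(advantage.detach() * log_probs[mask]).mean()
```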
Constrained representation learning for recurrent policy optimisation under uncertainty
Adaptive Behavior
Learning to make decisions in partially observable environments is a notorious problem that requires a complex representation of controllers. In most work, the controllers are designed as a non-linear mapping from a sequence of temporal observations to actions. These problems can, in principle, be formulated as a partially observable Markov decision process whose policy can be parameterised through the use of recurrent neural networks. In this paper, we propose an alternative framework that (a) uses the Long Short-Term Memory (LSTM) Encoder-Decoder framework to learn an internal state representation for historical observations and then (b) integrates it into existing recurrent policy models to improve task performance. The LSTM Encoder encodes a history of observations as input into a representation of internal states. The LSTM Decoder can perform two alternative decoding tasks: predicting the same input observation sequence or predicting future observation sequences. The f...
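A minimal sketch of such an encoder-decoder, assuming fixed-length observation vectors and teacher forcing in the decoder; the dimensions, the zero start token, and the reconstruction loss are illustrative choices rather than the paper's configuration.

```python
# Sketch of an LSTM encoder-decoder over observation histories. The encoder's
# final hidden state serves as the internal state fed to a recurrent policy;
# the decoder either reconstructs the input history or predicts future
# observations. Dimensions are illustrative.
import torch
import torch.nn as nn

class HistoryEncoderDecoder(nn.Module):
    def __init__(self, obs_dim, hidden_dim):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, obs_dim)

    def forward(self, history, targets):
        # history: (batch, T, obs_dim) past observations
        # targets: (batch, K, obs_dim) sequence to reconstruct or predict
        _, state = self.encoder(history)          # state = (h, c): internal state
        # teacher forcing: decode conditioned on the encoder's internal state
        dec_in = torch.cat([torch.zeros_like(targets[:, :1]), targets[:, :-1]], dim=1)
        dec_out, _ = self.decoder(dec_in, state)
        return self.out(dec_out), state

model = HistoryEncoderDecoder(obs_dim=8, hidden_dim=32)
history = torch.randn(4, 10, 8)
pred, state = model(history, targets=history)     # reconstruction task
loss = nn.functional.mse_loss(pred, history)
```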
Deep reinforcement learning using compositional representations for performing instructions
2018
Spoken language is one of the most efficient ways to instruct robots to perform domestic tasks. However, the state of the environment has to be considered to plan and execute actions successfully. We propose a system that learns to recognise the user's intention and map it to a goal. A reinforcement learning (RL) system then generates a sequence of actions toward this goal, considering the state of the environment. A novel contribution in this paper is the use of symbolic representations for both the input and output of a neural Deep Q-network (DQN), which enables it to be used in a hybrid system. To show the effectiveness of our approach, the Tell-Me-Dave corpus is used to train an intention detection model; in a second step, an RL agent generates the sequences of actions towards the detected objective, represented by a set of state predicates. We show that the system can successfully recognise command sequences from this corpus as well as train the deep-RL network with symbolic input. We further show that the performance can be significantly increased by exploiting the symbolic representation to generate intermediate rewards.
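As an illustration (not the paper's implementation), the sketch below encodes a set of state predicates as a multi-hot vector for a small Q-network and derives an intermediate reward from goal predicates that become true or false between steps; the predicate vocabulary, goal, action count, and shaping rule are all assumed for the example.

```python
# Sketch: symbolic state predicates as a multi-hot vector for a DQN, plus an
# intermediate (shaping) reward for goal predicates newly satisfied or lost.
# Predicate vocabulary, goal, and network sizes are illustrative assumptions.
import torch
import torch.nn as nn

PREDICATES = ["in(cup, microwave)", "on(microwave)", "holding(cup)", "near(microwave)"]
IDX = {p: i for i, p in enumerate(PREDICATES)}

def encode(state_predicates):
    """Multi-hot encoding of the set of predicates that currently hold."""
    v = torch.zeros(len(PREDICATES))
    for p in state_predicates:
        v[IDX[p]] = 1.0
    return v

def intermediate_reward(prev_state, next_state, goal):
    """+1 for each goal predicate that becomes true, -1 for each one lost."""
    gained = len((next_state - prev_state) & goal)
    lost = len((prev_state - next_state) & goal)
    return float(gained - lost)

q_net = nn.Sequential(nn.Linear(len(PREDICATES), 64), nn.ReLU(),
                      nn.Linear(64, 6))           # 6 = assumed number of actions

goal = {"in(cup, microwave)", "on(microwave)"}
prev = {"holding(cup)", "near(microwave)"}
nxt = {"in(cup, microwave)", "near(microwave)"}
r = intermediate_reward(prev, nxt, goal)          # 1.0: one goal predicate gained
q_values = q_net(encode(nxt))                     # Q-values over the symbolic state
```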
Deep Reinforcement Learning: A Survey
IAEME Publication, 2020
Reinforcement learning (RL) is poised to revolutionize the field of AI, and represents a step toward building autonomous systems with a higher-level understanding of the real world. Currently, Deep Learning (DL) is enabling reinforcement learning to scale to problems that were previously intractable, like learning to play video games directly from pixels. Deep Reinforcement Learning (DRL) algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. The success of RL owes much to its strong mathematical roots in the principles of deep learning, Monte Carlo simulation, function approximation, and Artificial Intelligence (AI). Topics treated in some detail in this survey are: temporal-difference learning, Q-Learning, semi-MDPs, and stochastic games. Many recent advances in DRL, e.g. policy gradients and hierarchical RL, are covered, along with references. Pointers to various examples of applications are provided. Since no presently available technique works in all situations, this paper proposes guidelines for using prior information regarding the characteristics of the control problem at hand to decide on a suitable experience replay strategy.