RLOC: Neurobiologically Inspired Hierarchical Reinforcement Learning Algorithm for Continuous Control of Nonlinear Dynamical Systems

Combining Reinforcement Learning and Optimal Control for the Control of Nonlinear Dynamical Systems

PhD Thesis, 2016

This thesis presents a novel hierarchical learning framework, Reinforcement Learning Optimal Control (RLOC), for controlling nonlinear dynamical systems with continuous states and actions. The approach mimics the neural computations that allow our brain to bridge the divide between symbolic action selection and low-level actuation control by operating at two levels of abstraction. First, current findings demonstrate that at the level of limb coordination, human behaviour is well explained by linear optimal feedback control theory, with cost functions matching the energy and timing constraints of tasks. Second, human learning of cognitive tasks involving symbolic-level action selection is well described by both model-free and model-based reinforcement learning algorithms. We postulate that the ease with which humans learn complex nonlinear tasks arises from combining these two levels of abstraction. The Reinforcement Learning Optimal Control framework learns the local task dynamics from naive experience, using an expectation-maximization algorithm for the estimation of linear dynamical systems, and forms locally optimal Linear Quadratic Regulators that produce continuous low-level control. A high-level reinforcement learning agent uses these controllers as its available actions and learns how to combine them across state space while maximizing a long-term reward. The optimal control costs form the training signals for the high-level symbolic learner. The algorithm demonstrates that a small number of locally optimal linear controllers can be combined intelligently to solve global nonlinear control problems, and it offers a proof of principle for how the brain may bridge the divide between low-level continuous control and high-level symbolic action selection. The framework competes with state-of-the-art control methods in both computational cost and solution quality, as illustrated with solutions to benchmark problems.
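As a rough sketch of the two-level structure described above (a hypothetical reading, not the thesis's actual implementation: the EM system-identification step is omitted, and the class name, tabular high-level learner, and discretisation into symbolic states `s` are illustrative assumptions):

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati equation to a fixed point and
    return the optimal state-feedback gain K (control law u = -K x)."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

class RLOCSketch:
    """Low level: one LQR per fitted local linear model (A_i, B_i).
    High level: tabular Q-learning over discrete symbolic states, where
    each 'action' is the choice of which local controller to engage."""

    def __init__(self, models, n_symbolic_states, Q, R, alpha=0.1, gamma=0.95):
        self.gains = [lqr_gain(A, B, Q, R) for A, B in models]
        self.q = np.zeros((n_symbolic_states, len(models)))
        self.alpha, self.gamma = alpha, gamma

    def act(self, s, x, eps=0.1):
        """Eps-greedy controller choice for symbolic state s; the chosen
        controller then emits the continuous action for plant state x."""
        if np.random.rand() < eps:
            a = np.random.randint(len(self.gains))
        else:
            a = int(np.argmax(self.q[s]))
        return a, -self.gains[a] @ x

    def update(self, s, a, r, s_next):
        """One-step Q-learning update; per the abstract, the reward would
        be derived from the (negative) optimal control cost."""
        target = r + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.alpha * (target - self.q[s, a])
```

Each fitted local model contributes one LQR gain, and the high-level learner treats "which gain to engage" as its discrete action set.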

Efficient reinforcement learning: computational theories, neuroscience and robotics

Current Opinion in Neurobiology, 2007

Reinforcement learning algorithms have provided some of the most influential computational theories for behavioral learning that depends on reward and penalty. After briefly reviewing the supporting experimental data, this paper tackles three difficult theoretical issues that remain to be explored. First, plain reinforcement learning is much too slow to be considered a plausible brain model. Second, although the temporal-difference error has an important role both in theory and in experiments, how to compute it remains an enigma. Third, the function of all brain areas, including the cerebral cortex, cerebellum, brainstem and basal ganglia, seems to necessitate a new computational framework. Computational studies that emphasize meta-parameters, hierarchy, modularity and supervised learning to resolve these issues are reviewed here, together with the related experimental data.
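For reference, the temporal-difference error the review refers to is conventionally defined as δ_t = r_t + γV(s_{t+1}) − V(s_t); a minimal sketch (the function name and default discount are illustrative):

```python
def td_error(r, v_s, v_s_next, gamma=0.99, terminal=False):
    """Canonical temporal-difference error:
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    bootstrap = 0.0 if terminal else gamma * v_s_next
    return r + bootstrap - v_s
```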

A bioinspired hierarchical reinforcement learning architecture for modeling learning of multiple skills with continuous states and actions

2010

Organisms, and especially primates, are able to learn several skills while avoiding catastrophic interference and enhancing generalisation. This paper proposes a novel reinforcement learning (RL) architecture with a number of features that make it suitable for investigating these phenomena. The model instantiates a mixture-of-experts architecture within a neural-network actor-critic system trained with the TD(λ) RL algorithm. The "responsibility signals" provided by the gating network are used both to weight the outputs of the multiple "expert" controllers and to modulate their learning. The model is tested on a simulated dynamic 2D robotic arm that autonomously learns to reach a target under (up to) three different conditions. The results show that the model is able to train the same or different experts to solve the task(s) in the various conditions, depending on the similarity of the sensorimotor mappings they require.
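A minimal sketch of the responsibility-gated scheme described above (not the paper's network; the linear experts, Gaussian exploration, and update rule below are stand-in assumptions):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class MixtureOfExpertsActor:
    """Gating network emits 'responsibility' weights over experts; the
    blended action is the responsibility-weighted sum of expert outputs,
    and each expert's TD-driven update is scaled by its responsibility."""

    def __init__(self, n_experts, state_dim, action_dim, lr=1e-2, sigma=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W_gate = self.rng.normal(0.0, 0.01, (n_experts, state_dim))
        self.W_exp = self.rng.normal(0.0, 0.01, (n_experts, action_dim, state_dim))
        self.lr, self.sigma = lr, sigma

    def act(self, x):
        resp = softmax(self.W_gate @ x)   # responsibility signals
        outs = self.W_exp @ x             # (n_experts, action_dim)
        mean = resp @ outs                # responsibility-weighted blend
        action = mean + self.sigma * self.rng.normal(size=mean.shape)
        return action, mean, resp

    def update(self, x, action, mean, resp, td_err):
        # Responsibility-modulated learning: experts that 'own' the state
        # receive proportionally larger updates toward the explored action.
        for i in range(len(self.W_exp)):
            self.W_exp[i] += self.lr * td_err * resp[i] * np.outer(action - mean, x)
```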

Optimal Control of Nonlinear Systems Using Experience Inference Human-Behavior Learning

IEEE/CAA Journal of Automatica Sinica, 2023

Safety-critical controllers are often trained in a simulated environment to mitigate risk, and the subsequent migration of the resulting biased controller requires further adjustment. In this paper, an experience-inference human-behavior learning approach is proposed to solve the migration problem of optimal controllers applied to real-world nonlinear systems. The approach is inspired by the complementary properties exhibited by the hippocampus, neocortex, and striatum learning systems in the brain. The hippocampus defines a physics-informed reference model of the real-world nonlinear system for experience inference, and the neocortex is the adaptive dynamic programming (ADP) or reinforcement learning (RL) algorithm that ensures optimal performance of the reference model. This optimal performance is transferred to the real-world nonlinear system by means of an adaptive neocortex/striatum control policy that forces the nonlinear system to behave as the reference model does. Stability and convergence of the proposed approach are analyzed using Lyapunov stability theory. Simulation studies are carried out to verify the approach.
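The paper's scheme is ADP-based with a Lyapunov analysis; as a loose scalar illustration of the reference-model-tracking idea only (the control law, adaptation rule, and all parameters below are stand-in assumptions, not the paper's):

```python
def mrac_step(x, x_m, r, th1, th2, a_m=-2.0, b_m=2.0, gamma=0.5, dt=0.01):
    """One Euler step of a textbook scalar model-reference adaptive
    controller. The reference model state x_m plays the 'hippocampus'
    role; the adaptive gains th1, th2 are adjusted so the plant output
    x tracks x_m (assumes positive input gain on the plant)."""
    e = x - x_m                           # tracking error
    u = th1 * r - th2 * x                 # adaptive control policy
    x_m_next = x_m + dt * (a_m * x_m + b_m * r)
    th1_next = th1 - dt * gamma * e * r   # Lyapunov-style adaptation laws
    th2_next = th2 + dt * gamma * e * x
    return u, x_m_next, th1_next, th2_next
```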

A neural model of hierarchical reinforcement learning

Proceedings of the 36th Annual Conference of the Cognitive Science Society, 2014

We present the first model capable of performing hierarchical reinforcement learning in a general, neurally detailed implementation. We show that this model is able to learn a spatial pickup and delivery task more quickly than one without hierarchical abilities. In addition, we show that this model is able to leverage its hierarchical structure to transfer learned knowledge between related tasks. These results point towards the advantages to be gained by using a hierarchical RL framework to understand the brain's powerful learning ability.

On the Reliability and Generalizability of Brain-inspired Reinforcement Learning Algorithms

ArXiv, 2020

Although deep RL models have shown great potential for solving various types of tasks with minimal supervision, several key challenges remain in terms of learning from limited experience, adapting to environmental changes, and generalizing learning from a single task. Recent evidence in decision neuroscience has shown that the human brain has an innate capacity to resolve these issues, leading to optimism regarding the development of neuroscience-inspired solutions toward sample-efficient and generalizable RL algorithms. We show that a computational model combining model-based and model-free control, which we term prefrontal RL, reliably encodes the information of the high-level policy that humans learned, and that this model can generalize the learned policy to a wide range of tasks. First, we trained the prefrontal RL and deep RL algorithms on 82 subjects' data, collected while human participants were performing two-stage Markov decision tasks, in which we manipulated the goa...
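The abstract does not specify how the model-based and model-free systems are combined; one common arbitration scheme in such hybrid models is a reliability-weighted mixture of action values (purely illustrative):

```python
def arbitrated_q(q_mb, q_mf, reliability_mb, reliability_mf):
    """Blend model-based and model-free action values, weighting each
    system by its relative reliability (an illustrative arbitration
    rule in the spirit of prefrontal MB/MF control, not the paper's)."""
    w = reliability_mb / (reliability_mb + reliability_mf)
    return w * q_mb + (1.0 - w) * q_mf
```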

Hierarchical, Heterogeneous Control of Non-Linear Dynamical Systems using Reinforcement Learning

Non-adaptive methods are currently the state of the art in approximating solutions to nonlinear optimal control problems. These carry a large computational cost associated with iterative calculations and have to be solved individually for different start and end points. In addition, they may not scale well to real-world problems and can require considerable tuning to converge. As an alternative, we present a novel hierarchical approach to non-linear control that uses Reinforcement Learning to choose between Heterogeneous Controllers, including localised optimal linear controllers and proportional-integral-derivative (PID) controllers, and we illustrate this with solutions to benchmark problems. We show that our approach (RLHC) competes in terms of computational cost and solution quality with the state-of-the-art control algorithm iLQR, and offers a robust, flexible framework for addressing large-scale non-linear control problems.
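A minimal sketch of the two ingredients named above, assuming a tabular high-level value row `q_row` and a scalar tracking error (the class, function names, and eps-greedy rule are illustrative; the localised LQR construction would mirror the RLOC sketch earlier):

```python
import random

class PID:
    """Textbook PID controller; gains and time step are illustrative."""
    def __init__(self, kp, ki, kd, dt=0.01):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = None

    def control(self, err):
        self.integral += err * self.dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def select_controller(q_row, n_controllers, eps=0.1):
    """High-level eps-greedy choice among heterogeneous low-level
    controllers (e.g., several PIDs and local LQRs), as in the RLHC idea."""
    if random.random() < eps:
        return random.randrange(n_controllers)
    return max(range(n_controllers), key=lambda i: q_row[i])
```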

Reinforcement learning in intelligent control: a biologically-inspired approach to the relearning problem

1998

The increasingly complex demands placed on control systems have resulted in a need for intelligent control, an approach that attempts to meet these demands by emulating the capabilities found in biological systems. The ability to exploit existing knowledge is a desirable feature of any intelligent control system, and this leads to the relearning problem. The problem arises when a control system must learn new knowledge effectively whilst still exploiting useful knowledge from past experience. This thesis describes the adaptive critic system using reinforcement learning, a computational framework that can effectively address many of the demands of intelligent control but is less effective at addressing the relearning problem. The thesis argues that biological mechanisms of reinforcement learning (and relearning) may provide inspiration for developing artificial intelligent control mechanisms that better address the relearning problem. A conceptual model of ...