Hierarchical, Heterogeneous Control of Non-Linear Dynamical Systems using Reinforcement Learning
Related papers
Combining Reinforcement Learning And Optimal Control For The Control Of Nonlinear Dynamical Systems
PhD Thesis, 2016
This thesis presents a novel hierarchical learning framework, Reinforcement Learning Optimal Control, for controlling nonlinear dynamical systems with continuous states and actions. The adopted approach mimics the neural computations that allow our brain to bridge the divide between symbolic action selection and low-level actuation control by operating at two levels of abstraction. First, current findings demonstrate that at the level of limb coordination human behaviour is explained by linear optimal feedback control theory, where cost functions match the energy and timing constraints of tasks. Second, humans learn cognitive tasks involving symbolic-level action selection in a manner consistent with both model-free and model-based reinforcement learning algorithms. We postulate that the ease with which humans learn complex nonlinear tasks arises from combining these two levels of abstraction. The Reinforcement Learning Optimal Control framework learns the local task dynamics from naive experience using an expectation-maximization algorithm for estimation of linear dynamical systems and forms locally optimal Linear Quadratic Regulators, producing continuous low-level control. A high-level reinforcement learning agent uses these available controllers as actions and learns how to combine them in state space while maximizing a long-term reward. The optimal control costs form training signals for the high-level symbolic learner. The algorithm demonstrates that a small number of locally optimal linear controllers can be combined to solve global nonlinear control problems, and it forms a proof of principle for how the brain may bridge the divide between low-level continuous control and high-level symbolic action selection. It competes in terms of computational cost and solution quality with state-of-the-art control, which is illustrated with solutions to benchmark problems.
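The thesis estimates local linear dynamics with an EM algorithm and wraps each estimate in an infinite-horizon LQR. As a rough illustration of that low-level step only, the sketch below fits a local linear model by ordinary least squares (standing in for the EM estimator) and derives a discrete-time LQR gain from it; the toy dynamics, noise level, and cost weights are invented for the example.

```python
# Sketch: fit a local linear model x_{t+1} ~ A x_t + B u_t from trajectory data
# and derive an infinite-horizon discrete-time LQR gain for it.  Least squares
# stands in for the thesis's EM estimator; dynamics, noise and weights are toy.
import numpy as np
from scipy.linalg import solve_discrete_are

def fit_local_linear_model(X, U, X_next):
    """Least-squares fit of x_{t+1} = A x_t + B u_t from stacked samples."""
    Z = np.hstack([X, U])                         # (T, n+m) regressors
    W, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
    n = X.shape[1]
    return W[:n].T, W[n:].T                       # A (n x n), B (n x m)

def lqr_gain(A, B, Q, R):
    """Infinite-horizon LQR gain K for u = -K x minimising sum of x'Qx + u'Ru."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Toy usage on synthetic data gathered around one operating point.
rng = np.random.default_rng(0)
A_true, B_true = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])
X, U = rng.normal(size=(200, 2)), rng.normal(size=(200, 1))
X_next = X @ A_true.T + U @ B_true.T + 0.01 * rng.normal(size=(200, 2))
A_hat, B_hat = fit_local_linear_model(X, U, X_next)
K = lqr_gain(A_hat, B_hat, Q=np.eye(2), R=np.eye(1))
print("estimated local LQR gain:\n", K)
```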
RLOC, 2019
Nonlinear optimal control problems are often solved with numerical methods that require knowledge of the system's dynamics, which may be difficult to infer, and that carry a large computational cost associated with iterative calculations. We present a novel neurobiologically inspired hierarchical learning framework, Reinforcement Learning Optimal Control, which operates on two levels of abstraction and utilises a reduced number of controllers to solve nonlinear systems with unknown dynamics in continuous state and action spaces. Our approach is inspired by research at two levels of abstraction: first, at the level of limb coordination human behaviour is explained by linear optimal feedback control theory. Second, in cognitive tasks involving symbolic-level action selection, humans learn such problems using model-free and model-based reinforcement learning algorithms. We propose that combining these two levels of abstraction leads to a fast global solution of nonlinear control problems using a reduced number of controllers. Our framework learns the local task dynamics from naive experience and forms locally optimal infinite-horizon Linear Quadratic Regulators which produce continuous low-level control. A top-level reinforcement learner uses the controllers as actions and learns how to best combine them in state space while maximising a long-term reward. A single optimal control objective function drives high-level symbolic learning by providing training signals on the desirability of each selected controller. We show that a small number of locally optimal linear controllers are able to solve global nonlinear control problems with unknown dynamics when combined with a reinforcement learner in this hierarchical framework. Our algorithm competes in terms of computational cost and solution quality with sophisticated control algorithms, and we illustrate this with solutions to benchmark problems.
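To illustrate the hierarchical structure described here, the sketch below runs a tabular Q-learning agent whose discrete actions select among a handful of fixed feedback gains, each of which emits the continuous low-level control u = -Kx. The grid discretisation, placeholder dynamics, random gains, and reward are invented for the example and are not the RLOC benchmarks or controllers.

```python
# Sketch of the hierarchical idea: a tabular Q-learning agent whose discrete
# actions pick one of a few pre-computed feedback gains, which then produce the
# continuous low-level control u = -K x.  Dynamics, gains, discretisation and
# reward are placeholders, not the RLOC setup.
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_controllers = 50, 4                  # coarse state grid x controller set
Q = np.zeros((n_cells, n_controllers))          # high-level action values
gains = [rng.normal(size=(1, 2)) for _ in range(n_controllers)]  # placeholder K_i

def discretise(x):                              # map continuous state to a grid cell
    return int(np.clip((x[0] + 5.0) / 10.0 * n_cells, 0, n_cells - 1))

def env_step(x, u):                             # placeholder nonlinear dynamics + cost
    x_next = x + 0.1 * np.array([x[1], -np.sin(x[0]) + u[0]])
    reward = -(x_next @ x_next + 0.01 * float(u @ u))
    return x_next, reward

alpha, gamma, eps = 0.1, 0.95, 0.1
for episode in range(200):
    x = rng.uniform(-3, 3, size=2)
    for t in range(100):
        s = discretise(x)
        a = rng.integers(n_controllers) if rng.random() < eps else int(Q[s].argmax())
        u = -gains[a] @ x                       # continuous low-level control
        x, r = env_step(x, u)
        s_next = discretise(x)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```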
International Journal of Adaptive Control and Signal Processing
A conventional closed-form solution to the optimal control problem via optimal control theory is only available under the assumption that the system dynamics/models are known and described as differential equations. Without such models, reinforcement learning (RL) has been successfully applied as a candidate technique to iteratively solve the optimal control problem for unknown or varying systems. For the optimal tracking control problem, existing RL techniques in the literature assume either the use of a predetermined feedforward input for the tracking control, restrictive assumptions on the reference model dynamics, or discounted tracking costs. Furthermore, by using discounted tracking costs, zero steady-state error cannot be guaranteed by the existing RL methods. This article therefore presents an optimal online RL tracking control framework for discrete-time (DT) systems, which does not impose the restrictive assumptions of existing methods and still guarantees zero steady-state tracking error. This is achieved by augmenting the original system dynamics with the integral of the error between the reference inputs and the tracked outputs for use in the online RL framework. It is further shown that the resulting value function for the DT linear quadratic tracker using the augmented formulation with integral control is also quadratic. This enables the development of Bellman equations, which use only the system measurements to solve the corresponding DT algebraic Riccati equation and obtain the optimal tracking control inputs online. Two RL strategies are then proposed, based on value function approximation and Q-learning, along with bounds on excitation for the convergence of the parameter estimates. Simulation case studies show the effectiveness of the proposed approach. Keywords: adaptive control, adaptive dynamic programming, optimal tracking control, Q-function approximation, reinforcement learning.
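As a rough illustration of the augmentation idea only, the sketch below stacks a toy plant state with the accumulated tracking error and solves the resulting discrete-time LQR with a known model; the article instead obtains the Riccati solution online from measurements via value-function approximation or Q-learning. The plant matrices and weights are illustrative assumptions.

```python
# Sketch of the augmentation idea: stack the plant state with the accumulated
# tracking error z_{k+1} = z_k + (r_k - y_k) and solve a standard DT LQR for
# the augmented system.  A known toy model is used here; the article solves
# the corresponding Riccati equation online from measurements instead.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[0.9, 0.1], [0.0, 0.8]])            # plant x_{k+1} = A x_k + B u_k
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])                        # tracked output y_k = C x_k

# Augmented state [x; z]; the reference r_k enters z as an exogenous input and
# is omitted from the regulator design below.
A_aug = np.block([[A, np.zeros((2, 1))],
                  [-C, np.ones((1, 1))]])
B_aug = np.vstack([B, np.zeros((1, 1))])

Q = np.diag([1.0, 0.1, 10.0])                     # weight the error integral heavily
R = np.array([[0.1]])
P = solve_discrete_are(A_aug, B_aug, Q, R)
K = np.linalg.solve(R + B_aug.T @ P @ B_aug, B_aug.T @ P @ A_aug)
print("augmented LQ tracking gain:", K)
```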
Gaussian Based Non-linear Function Approximation for Reinforcement Learning
SN Computer Science, 2021
Reinforcement learning (RL) problems with continuous states and discrete actions (CSDA) can be found in classic examples such as Cart Pole and Puck World, as well as real-world applications such as Market Making. Solutions to CSDA problems typically involve a function approximation (FA) of the mapping from states to actions and can be linear or nonlinear. Linear FAs such as tile coding (Sutton and Barto in Reinforcement learning, 2nd ed, 2009) suffer from state information loss due to state discretization, whilst non-linear FAs such as DQN (Mnih et al. in Playing Atari with deep reinforcement learning, https://arxiv.org/abs/1312.5602, 2013) are practically infeasible in infinitely large state spaces due to their cubic time complexity (O(n^3)). In this paper, we propose a novel, general solution to CSDA problems, called Gaussian distribution based non-linear function approximation (GBNLFA). Experimentation on three CSDA RL problems (Cart Pole, Puck World, Market Making) demo...
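The abstract does not spell out the GBNLFA construction, so the snippet below is only a loose illustration of the general idea of using per-action Gaussian densities over the continuous state to score discrete actions without discretising the state. The class GaussianActionScorer, its fitting rule, and the synthetic labels are all invented for the example and are not the authors' method.

```python
# Loose illustration (not the paper's GBNLFA): fit one Gaussian over the
# continuous state per discrete action from experience tagged as favourable
# for that action, then act greedily w.r.t. the per-action log-density.
import numpy as np
from scipy.stats import multivariate_normal

class GaussianActionScorer:
    def __init__(self, n_actions, state_dim):
        self.means = np.zeros((n_actions, state_dim))
        self.covs = np.stack([np.eye(state_dim)] * n_actions)

    def fit(self, states, actions):
        """states: (T, d) array; actions: (T,) array of the action judged best."""
        for a in range(len(self.means)):
            S = states[actions == a]
            if len(S) > 1:
                self.means[a] = S.mean(axis=0)
                self.covs[a] = np.cov(S.T) + 1e-3 * np.eye(S.shape[1])

    def act(self, state):
        scores = [multivariate_normal.logpdf(state, m, c)
                  for m, c in zip(self.means, self.covs)]
        return int(np.argmax(scores))

# Toy usage: action 0 "belongs" to the left half of the state space, action 1
# to the right (synthetic labels purely for illustration).
rng = np.random.default_rng(2)
states = rng.normal(size=(500, 2))
actions = (states[:, 0] > 0).astype(int)
scorer = GaussianActionScorer(n_actions=2, state_dim=2)
scorer.fit(states, actions)
print(scorer.act(np.array([1.5, 0.0])))           # expected: 1
```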
Reinforcement learning applied to linear quadratic regulation
Advances in neural information processing systems, 1993
Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for problems with continuous state and action spaces, or for systems involving non-linear function approximators (such as multilayer perceptrons). This paper presents research applying DP-based reinforcement learning theory to Linear Quadratic Regulation (LQR), an important class of control problems involving continuous state and action spaces and requiring a simple type of non-linear function approximator. We describe an algorithm based on Q-learning that is proven to converge to the optimal controller for a large class of LQR problems. We also describe a slightly different algorithm that is only locally convergent to the optimal Q-function, demonstrating one of the possible pitfalls of using a non-linear function approximator with DP-based learning.
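The key observation exploited here is that the LQR Q-function is quadratic, Q(x, u) = [x; u]^T H [x; u], so H can be identified from transition data and the greedy controller read off in closed form. The sketch below shows a batch least-squares version of that idea; exploration noise, the recursive updates, and the convergence conditions analysed in the paper are omitted, so this is only a simplified stand-in.

```python
# Sketch: for LQR the Q-function is quadratic, Q(x, u) = [x; u]' H [x; u], so
# H can be estimated by least squares from observed transitions and the greedy
# controller recovered in closed form as u = -inv(H_uu) @ H_ux @ x.
import numpy as np

def quad_features(x, u):
    z = np.concatenate([x, u])
    # Upper-triangular monomials z_i * z_j parameterise a symmetric H.
    return np.array([z[i] * z[j] for i in range(len(z)) for j in range(i, len(z))])

def policy_evaluation(X, U, X_next, K, Q, R, gamma=1.0):
    """Estimate H for the current policy u = -K x from transition data."""
    n, m = X.shape[1], U.shape[1]
    Phi, targets = [], []
    for x, u, xn in zip(X, U, X_next):
        un = -K @ xn                              # action the policy would take next
        Phi.append(quad_features(x, u) - gamma * quad_features(xn, un))
        targets.append(x @ Q @ x + u @ R @ u)     # one-step cost
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
    d = n + m                                     # rebuild symmetric H from theta
    H, idx = np.zeros((d, d)), 0
    for i in range(d):
        for j in range(i, d):
            H[i, j] = H[j, i] = theta[idx] / (1.0 if i == j else 2.0)
            idx += 1
    return H

def policy_improvement(H, n):
    """Greedy gain from the quadratic Q-function: u = -inv(H_uu) H_ux x."""
    return np.linalg.solve(H[n:, n:], H[n:, :n])
```

Alternating policy_evaluation and policy_improvement on sufficiently exciting data gives a policy-iteration style scheme; the paper's Q-learning algorithm instead updates the estimate of H incrementally.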
2019
Integral reinforcement learning (IRL) was proposed in the literature to obviate the requirement of drift dynamics in the adaptive dynamic programming framework. Most online IRL schemes in the literature require two sets of neural networks (NNs), known as actor and critic NNs, and an initial stabilizing controller. Recently, for RL-based robust tracking, the requirement of an initial stabilizing controller and a dual-approximator structure was obviated by using a modified gradient-descent-based update law containing a stabilizing term within a critic-only structure. To the best of the authors' knowledge, there has been no study on leveraging such a stabilizing term in the IRL algorithm framework to solve optimal trajectory tracking problems for continuous-time nonlinear systems with actuator constraints. To this end, a novel update law leveraging the stabilizing term along with variable-gain gradient descent in the IRL framework is presented in this paper. With these modifications, the IRL tracking control...
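For context, the integral reinforcement learning relation that removes the drift dynamics from the Bellman equation can be written, in generic notation, as below; the paper's constrained-input, critic-only variant adds a stabilizing term and a variable-gain update on top of this.

```latex
% Generic IRL Bellman relation over a reinforcement interval T: the drift
% dynamics do not appear, only measured trajectories and the running cost r.
V\bigl(x(t)\bigr) = \int_{t}^{t+T} r\bigl(x(\tau), u(\tau)\bigr)\,\mathrm{d}\tau + V\bigl(x(t+T)\bigr)
```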
Combining Local and Global Direct Derivative-free Optimization for Reinforcement Learning
2012
We consider the problem of optimization in policy space for reinforcement learning. While a plethora of methods have been applied to this problem, only a narrow category of them has proved feasible in robotics. We consider the peculiar characteristics of reinforcement learning in robotics and devise a combination of two algorithms from the literature of derivative-free optimization.
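As a loose illustration of the global-plus-local structure (not the specific pair of algorithms the paper combines), the sketch below seeds a local derivative-free refinement (Nelder-Mead) with the best of a global random search over policy parameters; the toy plant, linear policy, and return estimate are placeholders.

```python
# Illustrative two-stage derivative-free policy search: a global phase
# (uniform random candidates) followed by a local phase (Nelder-Mead) on a
# deterministic placeholder return.  Not the algorithm pair of the paper.
import numpy as np
from scipy.optimize import minimize

def negative_return(theta, rollouts=5, horizon=50):
    """Placeholder episodic objective: linear policy u = theta . x on a toy plant."""
    rng = np.random.default_rng(3)                # fixed seed -> deterministic objective
    total = 0.0
    for _ in range(rollouts):
        x = rng.uniform(-1, 1, size=2)
        for _ in range(horizon):
            u = float(theta @ x)
            x = np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * (-x[0] + u)])
            total -= x @ x + 0.01 * u * u
    return -total / rollouts                      # minimise average cost

rng = np.random.default_rng(4)
candidates = [rng.uniform(-2, 2, size=2) for _ in range(20)]     # global phase
best = min(candidates, key=negative_return)
result = minimize(negative_return, best, method="Nelder-Mead")   # local phase
print("refined policy parameters:", result.x)
```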
American Control Conference (ACC), 2023
We develop a policy-iteration-based model-free reinforcement learning (RL) control for nonlinear systems with a single input. First, Carleman linearization, a commonly used linearization technique in the Hilbert space, is applied to express the nonlinear system as an infinite-dimensional Carleman state-space model, followed by the derivation of an online state-feedback RL controller using state and input data in this infinite-dimensional space. Next, the practicality of using any finite-order truncation of this controller, and the corresponding closed-loop stability of the nonlinear plant, is established. Steps for online implementation of the nonlinear RL controller using both on-policy and off-policy algorithms are derived. Results are validated using two numerical examples, where we show how our proposed method provides solutions close to the optimal control resulting from the model-based Carleman controllers. We also compare our controller to alternative data-driven methods and show its relative advantage in terms of shorter learning time.
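To make the lifting step concrete, the sketch below builds a truncated Carleman representation for the scalar polynomial system dx/dt = a*x + b*x^2 + u; the lifted model is linear in the monomial states up to a bilinear input coupling. The coefficients, truncation order, and test point are arbitrary, and the paper's RL controller design and stability analysis are not reproduced here.

```python
# Sketch of the lifting step only: a truncated Carleman representation of the
# scalar system dx/dt = a*x + b*x**2 + u with monomial states z_k = x**k,
# k = 1..N.  Differentiating z_k yields a linear part, a constant input vector
# and a bilinear (u * z) coupling; terms in x**(N+1) are dropped by truncation.
import numpy as np

def carleman_matrices(a, b, N):
    """Return (A, B, Nb) with dz/dt ~ A z + B u + (Nb z) u for z = (x, ..., x^N)."""
    A, B, Nb = np.zeros((N, N)), np.zeros(N), np.zeros((N, N))
    for k in range(1, N + 1):
        A[k - 1, k - 1] = k * a                   # k*a*x^k term
        if k < N:
            A[k - 1, k] = k * b                   # k*b*x^(k+1), kept while k+1 <= N
        if k == 1:
            B[0] = 1.0                            # k*u*x^(k-1) with k = 1 is just u
        else:
            Nb[k - 1, k - 2] = k                  # k*u*x^(k-1) = k*u*z_{k-1}
    return A, B, Nb

# Quick check against the exact derivative at a sample point.
a, b, N = -1.0, 0.5, 4
A, B, Nb = carleman_matrices(a, b, N)
x, u = 0.3, 0.2
z = np.array([x ** k for k in range(1, N + 1)])
zdot_lifted = A @ z + B * u + (Nb @ z) * u
zdot_exact = np.array([k * x ** (k - 1) * (a * x + b * x ** 2 + u)
                       for k in range(1, N + 1)])
print(np.round(zdot_lifted - zdot_exact, 6))      # only the last (truncated) row differs
```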