Optimal Control of Nonlinear Systems Using Experience Inference Human-Behavior Learning
Related papers
Combining Reinforcement Learning And Optimal Control For The Control Of Nonlinear Dynamical Systems
PhD Thesis, 2016
This thesis presents a novel hierarchical learning framework, Reinforcement Learning Optimal Control, for controlling nonlinear dynamical systems with continuous states and actions. The approach mimics the neural computations that allow the brain to bridge the divide between symbolic action selection and low-level actuation control by operating at two levels of abstraction. First, current findings demonstrate that at the level of limb coordination human behaviour is explained by linear optimal feedback control theory, where cost functions match the energy and timing constraints of tasks. Second, humans learn cognitive tasks involving symbolic-level action selection using both model-free and model-based reinforcement learning algorithms. We postulate that the ease with which humans learn complex nonlinear tasks arises from combining these two levels of abstraction. The Reinforcement Learning Optimal Control framework learns the local task dynamics from naive experience using an expectation-maximization algorithm for estimating linear dynamical systems and forms locally optimal Linear Quadratic Regulators, producing continuous low-level control. A high-level reinforcement learning agent uses these controllers as actions and learns how to combine them in state space while maximizing a long-term reward. The optimal control costs form the training signals for the high-level symbolic learner. The algorithm demonstrates that a small number of locally optimal linear controllers can be combined intelligently to solve global nonlinear control problems, and it offers a proof of principle of how the brain may bridge the divide between low-level continuous control and high-level symbolic action selection. It competes in terms of computational cost and solution quality with state-of-the-art control methods, which is illustrated with solutions to benchmark problems.
Using natural decision methods to design optimal adaptive controllers
This article describes the use of principles of reinforcement learning to design feedback controllers for discrete- and continuous-time dynamical systems that combine features of adaptive control and optimal control. Adaptive control [1] and optimal control [3] represent different philosophies for designing feedback controllers. Optimal controllers are normally designed offline by solving Hamilton-Jacobi-Bellman (HJB) equations, for example the Riccati equation, using complete knowledge of the system dynamics. Determining optimal control policies for nonlinear systems requires the offline solution of nonlinear HJB equations, which are often difficult or impossible to solve. By contrast, adaptive controllers learn online to control unknown systems using data measured in real time along the system trajectories. Adaptive controllers are not usually designed to be optimal in the sense of minimizing user-prescribed performance functions. Indirect adaptive controllers use system identification techniques to first identify the system parameters and then use the obtained model to solve optimal design equations [1]. Adaptive controllers may satisfy certain inverse optimality conditions.
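In the linear quadratic special case, the offline design route contrasted with adaptive control above can be made explicit (generic notation, not drawn from the article): for dynamics \(\dot{x} = Ax + Bu\) and cost \(\int_0^\infty (x^\top Q x + u^\top R u)\,dt\), the HJB equation

\[
0 = \min_{u}\left[\, x^\top Q x + u^\top R u + \nabla V(x)^\top (Ax + Bu) \,\right]
\]

admits the quadratic solution \(V(x) = x^\top P x\), giving the optimal feedback \(u^{*} = -R^{-1} B^\top P x\) with \(P\) the solution of the algebraic Riccati equation

\[
A^\top P + P A - P B R^{-1} B^\top P + Q = 0,
\]

whose offline solution requires complete knowledge of \(A\) and \(B\).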
RLOC, 2019
Nonlinear optimal control problems are often solved with numerical methods that require knowledge of the system's dynamics, which may be difficult to infer, and that carry a large computational cost associated with iterative calculations. We present a novel neurobiologically inspired hierarchical learning framework, Reinforcement Learning Optimal Control, which operates on two levels of abstraction and utilises a reduced number of controllers to solve nonlinear systems with unknown dynamics in continuous state and action spaces. Our approach is inspired by research at two levels of abstraction: first, at the level of limb coordination, human behaviour is explained by linear optimal feedback control theory; second, in cognitive tasks involving symbolic-level action selection, humans learn using model-free and model-based reinforcement learning algorithms. We propose that combining these two levels of abstraction leads to a fast global solution of nonlinear control problems using a reduced number of controllers. Our framework learns the local task dynamics from naive experience and forms locally optimal infinite-horizon Linear Quadratic Regulators which produce continuous low-level control. A top-level reinforcement learner uses the controllers as actions and learns how to best combine them in state space while maximising a long-term reward. A single optimal control objective function drives high-level symbolic learning by providing training signals on the desirability of each selected controller. We show that a small number of locally optimal linear controllers are able to solve global nonlinear control problems with unknown dynamics when combined with a reinforcement learner in this hierarchical framework. Our algorithm competes in terms of computational cost and solution quality with sophisticated control algorithms, and we illustrate this with solutions to benchmark problems.
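As an illustration only, the sketch below mimics the two-level structure the abstract describes on a pendulum swing-up task: a handful of infinite-horizon LQR controllers serve as the discrete actions of a tabular Q-learning agent, and the accumulated quadratic control cost supplies the reward. It is not the authors' implementation; the local models are obtained here by analytic linearisation rather than learned from naive experience, and all parameter values (weights, bins, learning rates, torque limit) are placeholders.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Pendulum parameters (placeholders) and a simple Euler-integrated simulator.
g, l, m, dt = 9.81, 1.0, 1.0, 0.02

def step(x, u):
    th, thd = x
    thdd = (g / l) * np.sin(th) + u / (m * l ** 2)
    return np.array([th + dt * thd, thd + dt * thdd])

def make_lqr(theta0):
    # Local linearisation of the pendulum about (theta0, 0) and its infinite-horizon LQR gain.
    A = np.array([[0.0, 1.0], [(g / l) * np.cos(theta0), 0.0]])
    B = np.array([[0.0], [1.0 / (m * l ** 2)]])
    Q, R = np.diag([10.0, 1.0]), np.array([[0.1]])
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P), theta0      # gain K and set point

# A small repository of local controllers: the "actions" of the high-level learner.
controllers = [make_lqr(t) for t in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]

def discretise(x, n_th=12, n_thd=9):
    # Coarse state bins for the tabular high-level learner.
    i = int((x[0] % (2 * np.pi)) / (2 * np.pi) * n_th)
    j = int(np.clip((x[1] + 8.0) / 16.0 * n_thd, 0, n_thd - 1))
    return i * n_thd + j

Qtab = np.zeros((12 * 9, len(controllers)))
alpha, gamma, eps, hold = 0.1, 0.98, 0.2, 10        # hold = low-level steps per high-level action
rng = np.random.default_rng(0)

for episode in range(500):
    x = np.array([np.pi, 0.0])                      # start hanging down; upright is theta = 0
    for _ in range(100):
        s = discretise(x)
        a = rng.integers(len(controllers)) if rng.random() < eps else int(np.argmax(Qtab[s]))
        K, th_ref = controllers[a]
        cost = 0.0
        for _ in range(hold):                       # run the chosen local LQR for a few steps
            err = np.array([np.arctan2(np.sin(x[0] - th_ref), np.cos(x[0] - th_ref)), x[1]])
            u = float(np.clip(-(K @ err)[0], -5.0, 5.0))
            cost += (err @ np.diag([10.0, 1.0]) @ err + 0.1 * u ** 2) * dt
            x = step(x, u)
        s_next = discretise(x)
        # The optimal-control cost is the RL training signal for the symbolic learner.
        Qtab[s, a] += alpha * (-cost + gamma * np.max(Qtab[s_next]) - Qtab[s, a])
```

The essential design choice this captures is that the low level produces continuous control while the high level only decides which local controller to trust in which region of the state space.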
Adaptive dynamic programming approach to experience-based systems identification and control
Neural Networks, 2009
Humans have the ability to make use of experience while selecting their control actions for distinct and changing situations, and this process speeds up and becomes more effective as more experience is gained. In contrast, current technological implementations slow down as more knowledge is stored. A novel way of employing Approximate (or Adaptive) Dynamic Programming (ADP) is described that shifts the underlying Adaptive Critic type of Reinforcement Learning method "up a level", away from designing individual (optimal) controllers to that of developing on-line algorithms that efficiently and effectively select designs from a repository of existing controller solutions (perhaps previously developed via application of ADP methods). The resulting approach is called the Higher-Level Learning Algorithm. The approach and its rationale are described, and some examples of its application are given. The notions of context and context discernment are important to understanding the human abilities noted above. These are first defined in a manner appropriate to controls and system identification; then, as a foundation relating to the application arena, a historical view of the phases in the development of the controls field is given, organized by how the notion of 'context' was, or was not, involved in each phase.
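A hypothetical sketch of the repository-selection idea (assumed models and gains, not from the paper): here "context discernment" is reduced to choosing, online, the stored model that best predicts recent plant data, after which the matching pre-designed gain is deployed.

```python
import numpy as np

# Hypothetical repository: one linear model and pre-designed feedback gain per operating context.
repository = {
    "nominal":  {"A": np.array([[0.95, 0.10], [0.00, 0.90]]), "B": np.array([[0.0], [0.10]]),
                 "K": np.array([[2.0, 1.5]])},
    "low_gain": {"A": np.array([[0.95, 0.10], [0.00, 0.90]]), "B": np.array([[0.0], [0.05]]),
                 "K": np.array([[4.0, 3.0]])},
}

def discern_context(history):
    """Return the context whose stored model has the smallest one-step prediction error."""
    def err(entry):
        return sum(float(np.sum((xn - (entry["A"] @ x + entry["B"] @ u)) ** 2))
                   for x, u, xn in history)
    return min(repository, key=lambda name: err(repository[name]))

# Usage: the true plant happens to match the "low_gain" context; the selector should find it.
A_true, B_true = repository["low_gain"]["A"], repository["low_gain"]["B"]
rng = np.random.default_rng(1)
x, history = np.array([1.0, 0.0]), []
for _ in range(20):
    u = 0.5 * rng.standard_normal(1)                # probing input to expose the context
    x_next = A_true @ x + B_true @ u
    history.append((x, u, x_next))
    x = x_next
active = discern_context(history)
K = repository[active]["K"]                         # deploy the matching pre-designed gain
print(active)
```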
Reinforcement learning and optimal adaptive control: An overview and implementation examples
Annual Reviews in Control, 2012
This paper provides an overview of the reinforcement learning and optimal adaptive control literature and its application to robotics. Reinforcement learning bridges the gap between traditional optimal control, adaptive control and bio-inspired learning techniques borrowed from animals. This work highlights some of the key techniques presented by well-known researchers from the combined areas of reinforcement learning and optimal control theory. At the end, an implementation example of a novel model-free Q-learning-based discrete optimal adaptive controller for a humanoid robot arm is presented. The controller uses a novel adaptive dynamic programming (ADP) reinforcement learning (RL) approach to develop an optimal policy on-line. The RL joint space tracking controller was implemented for two links (shoulder flexion and elbow flexion joints) of the arm of the humanoid Bristol-Elumotion-Robotic-Torso II (BERT II) torso. The constrained case (joint limits) of the RL scheme was tested for a single link (elbow flexion) of the BERT II arm by modifying the cost function to deal with the extra nonlinearity due to the joint constraints.
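For the simplest setting such a controller builds on, discrete-time LQR with unknown dynamics, the Q-learning idea can be sketched as follows: a quadratic Q-function is fitted to measured transitions by least squares and the policy is improved from the fitted Q-function, so the plant matrices never enter the learning update. This is a generic illustration under an assumed plant, weights and initial gain, not the BERT II arm controller from the paper.

```python
import numpy as np

# Plant used only to generate data; the learning updates never touch A or B directly.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.array([[1.0]])
n, m = 2, 1

def phi(z):
    # Independent terms of the quadratic form z^T H z (upper-triangular parameterisation).
    i, j = np.triu_indices(len(z))
    return np.where(i == j, 1.0, 2.0) * np.outer(z, z)[i, j]

K = np.array([[1.0, 1.0]])                   # assumed stabilising initial policy
rng = np.random.default_rng(0)
for _ in range(10):                          # policy iteration on the Q-function
    Phi, y = [], []
    x = np.array([1.0, 0.0])
    for k in range(200):                     # policy evaluation from data (least squares)
        u = -K @ x + 0.1 * rng.standard_normal(m)    # exploration noise for excitation
        x_next = A @ x + B @ u
        z = np.concatenate([x, u])
        z_next = np.concatenate([x_next, -K @ x_next])
        Phi.append(phi(z) - phi(z_next))     # Bellman equation: Q(z) - Q(z_next) = stage cost
        y.append(x @ Qc @ x + u @ Rc @ u)
        x = x_next if np.linalg.norm(x_next) < 10.0 else np.array([1.0, 0.0])
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    U = np.zeros((n + m, n + m))
    U[np.triu_indices(n + m)] = theta
    H = U + U.T - np.diag(np.diag(U))        # recovered Q-function matrix
    K = np.linalg.solve(H[n:, n:], H[n:, :n])  # policy improvement: u = -H_uu^{-1} H_ux x

print(K)
```

In this deterministic setting the printed gain should approach the LQR gain that a model-based design would give, which is the sense in which the adaptive scheme is optimal.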
1998
The increasingly complex demands placed on control systems have resulted in a need for intelligent control, an approach that attempts to meet these demands by emulating the capabilities found in biological systems. The ability to exploit existing knowledge is a desirable feature of any intelligent control system, and this leads to the relearning problem. The problem arises when a control system is required to effectively learn new knowledge whilst exploiting still-useful knowledge from past experiences. This thesis describes the adaptive critic system using reinforcement learning, a computational framework that can effectively address many of the demands in intelligent control, but is less effective when it comes to addressing the relearning problem. The thesis argues that biological mechanisms of reinforcement learning (and relearning) may provide inspiration for developing artificial intelligent control mechanisms that can better address the relearning problem. A conceptual model of ...
Reinforcement Learning Behavioral Control for Nonlinear Autonomous System
IEEE/CAA Journal of Automatica Sinica, 2022
Behavior-based autonomous systems rely on human intelligence to resolve multi-mission conflicts by designing mission priority rules and nonlinear controllers. In this work, a novel two-layer reinforcement learning behavioral control (RLBC) method is proposed to reduce such dependence through trial-and-error learning. Specifically, in the upper layer, a reinforcement learning mission supervisor (RLMS) is designed to learn the optimal mission priority. Compared with existing mission supervisors, the RLMS improves the dynamic performance of mission priority adjustment by maximizing cumulative rewards and reduces hardware storage demand when using neural networks. In the lower layer, a reinforcement learning controller (RLC) is designed to learn the optimal control policy. Compared with existing behavioral controllers, the RLC reduces the control cost of mission priority adjustment by balancing control performance and consumption. All error signals are proved to be semi-globally uniformly ultimately bounded (SGUUB). Simulation results show that the number of mission priority adjustments and the control cost are significantly reduced compared to some existing mission supervisors and behavioral controllers, respectively.
Training the human adaptive controller
Proceedings of the Institution of Electrical Engineers, 1968
The training of human operators for skilled tasks may be regarded as the synthesis of a specific controller from a general-purpose adaptive device, by influencing its adaptation through selection and variation of the learning environment. Selection of environments to maximise the rate of learning is itself a control problem, and an automatic feedback training system is proposed which feeds back information about the operator's performance to control the parameters of his environment. The stability and performance of the trainer have been investigated both theoretically and experimentally, and its utility has been tested in a fairly realistic training situation using as trainees both human operators and computer-simulated learning machines.
Stochastic Control Strategies and Adaptive Critic Methods
Adaptive critic methods have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Since they approximate the dynamic programming solutions, they are potentially suitable for learning in noisy, nonlinear and nonstationary environments. In this study, a novel probabilistic dual heuristic programming (DHP) based adaptive critic controller is proposed. In contrast to current approaches, the proposed probabilistic DHP adaptive critic method takes uncertainties of the forward model and inverse controller into consideration. It is therefore suitable for deterministic and stochastic control problems characterized by functional uncertainty. The theoretical development of the proposed method is validated by analytically evaluating the correct value of the cost function, which satisfies the Bellman equation, in a linear quadratic control problem. The target value of the critic network is then calculated and shown to be equal to the analytically derived correct value.
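For reference, the analytic quantities such a validation relies on are standard in the discrete-time linear quadratic case (generic notation, not the paper's): with \(x_{k+1} = A x_k + B u_k\) and stage cost \(x_k^\top Q x_k + u_k^\top R u_k\), the value function \(V(x) = x^\top P x\) satisfies the Bellman equation

\[
x^\top P x = \min_{u}\left[\, x^\top Q x + u^\top R u + (Ax + Bu)^\top P (Ax + Bu) \,\right],
\]

so \(P\) solves the discrete algebraic Riccati equation

\[
P = A^\top P A - A^\top P B \left(R + B^\top P B\right)^{-1} B^\top P A + Q,
\]

and a DHP critic is trained to reproduce the costate \(\lambda(x_k) = \partial V / \partial x_k = 2 P x_k\) rather than the scalar cost itself.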
Evolution of adaptive learning for nonlinear dynamic systems: a systematic survey
Intelligence & Robotics, 2022
The extreme nonlinearity of robotic systems makes the control design step harder. The consideration of adaptive control in robotic manipulation started in the 1970s. However, in the presence of bounded disturbances, the limitations of adaptive control become considerably more pronounced, which led researchers to exploit some "algorithm modifications". Unfortunately, these modifications often require a priori knowledge of bounds on the parameters, the perturbations and the noise. In the 1990s, the field of Artificial Neural Networks was investigated extensively, both in general and for the control of dynamical systems in particular. Several types of Neural Networks (NNs) appear to be promising candidates for control system applications. In robotics, it all boils down to making the actuator perform the desired action. While purely control-based robots use the system model to define their input-output relations, Artificial Intelligence (AI)-based robots may or may not use the system model, instead manipulating the robot based on the experience they gain with the system during training, possibly enhancing it in real time as well. In this paper, after discussing the drawbacks of adaptive control with bounded disturbances and the proposed modifications to overcome these limitations, we focus on presenting the work that implemented AI in nonlinear dynamical systems and particularly in robotics. We cite some work that targeted the inverted pendulum control problem using NNs. Finally, we highlight previous research on RL- and Deep RL-based control problems and their implementation in robotic manipulation, while pointing out some of their major drawbacks in the field.