Learning Stable Task Sequences from Demonstration with Linear Parameter Varying Systems and Hidden Markov Models
Related papers
Learning Sequential Tasks Interactively from Demonstrations and Own Experience
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2013
Deploying robots to our day-to-day life requires them to have the ability to learn from their environment in order to acquire new task knowledge and to flexibly adapt existing skills to various situations. For typical real-world tasks, it is not sufficient to endow robots with a set of primitive actions. Rather, they need to learn how to sequence these in order to achieve a desired effect on their environment. In this paper, we propose an intuitive learning method for a robot to acquire sequences of motions by combining learning from human demonstrations and reinforcement learning. In every situation, our approach treats both ways of learning as alternative control flows to optimally exploit their strengths without inheriting their shortcomings. Using a Gaussian Process approximation of the state-action sequence value function, our approach generalizes values observed from demonstrated and autonomously generated action sequences to unknown inputs. This approximation is based on a kernel we designed to account for different representations of tasks and action sequences as well as inputs of variable length. From the expected deviation of value estimates, we devise a greedy exploration policy following a Bayesian optimization criterion that quickly converges learning to promising action sequences while protecting the robot from sequences with unpredictable outcome. We demonstrate the ability of our approach to efficiently learn appropriate action sequences in various situations on a manipulation task involving stacked boxes.
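The selection mechanism described above — a Gaussian Process over sequence values plus a greedy, uncertainty-aware criterion — can be sketched in miniature. The sketch below is an illustrative simplification, not the paper's method: it assumes fixed-length sequence encodings with a plain squared-exponential kernel (the paper designs a special kernel for variable-length inputs and mixed task representations) and uses a generic upper-confidence-bound rule in place of the paper's Bayesian optimization criterion. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel over fixed-length sequence encodings.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(X_train, y_train, X_test, noise=1e-3):
    # Standard GP regression: posterior mean and variance at test inputs.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test)
    Kss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - (v * v).sum(0)
    return mean, np.maximum(var, 0.0)

def select_sequence(X_train, y_train, candidates, kappa=1.0):
    # Greedy UCB-style choice: exploit high predicted value,
    # explore where the value estimate is uncertain.
    mean, var = gp_posterior(X_train, y_train, candidates)
    return int(np.argmax(mean + kappa * np.sqrt(var)))
```

With `kappa = 0` the rule is purely exploitative; larger values trade predicted value against the deviation of the estimate, which is the general shape of the exploration/safety trade-off the abstract describes.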
Learning and generalization of complex tasks from unstructured demonstrations
2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012
We present a novel method for segmenting demonstrations, recognizing repeated skills, and generalizing complex tasks from unstructured demonstrations. This method combines many of the advantages of recent automatic segmentation methods for learning from demonstration into a single principled, integrated framework. Specifically, we use the Beta Process Autoregressive Hidden Markov Model and Dynamic Movement Primitives to learn and generalize a multi-step task on the PR2 mobile manipulator and to demonstrate the potential of our framework to learn a large library of skills over time.
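The movement-primitive half of this pipeline is concrete enough to sketch. Below is a minimal discrete Dynamic Movement Primitive rollout — an illustration of the standard DMP formulation, not the authors' implementation. The gains and basis-function parameters are arbitrary choices, and fitting the forcing-term weights to a demonstration is omitted.

```python
import numpy as np

def rollout_dmp(x0, g, weights, centers, widths, tau=1.0, dt=0.01, T=1.0,
                alpha=25.0, beta=6.25, alpha_s=4.0):
    # Discrete DMP: a spring-damper pulled toward goal g, shaped by a
    # learned forcing term evaluated along a decaying phase variable s.
    x, v, s = x0, 0.0, 1.0
    traj = [x]
    for _ in range(int(T / dt)):
        psi = np.exp(-widths * (s - centers) ** 2)        # RBF basis
        f = s * (g - x0) * (psi @ weights) / (psi.sum() + 1e-10)
        a = alpha * (beta * (g - x) - v) + f              # transformation system
        v += a * dt / tau
        x += v * dt / tau
        s += -alpha_s * s * dt / tau                      # canonical system
        traj.append(x)
    return np.array(traj)
```

Because the forcing term vanishes as the phase decays, the rollout converges to the goal regardless of the learned weights — the property that makes DMPs convenient building blocks for the multi-step tasks described above.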
Learning Variable-Length Markov Models of Behavior
Computer Vision and Image Understanding, 2001
In recent years there has been an increased interest in the modelling and recognition of human activities involving highly structured and semantically rich behaviour such as dance, aerobics, and sign language. A novel approach is presented for automatically acquiring stochastic models of the high-level structure of an activity without the assumption of any prior knowledge. The process involves temporal segmentation into plausible atomic behaviour components and the use of variable-length Markov models for the efficient representation of behaviours. Experimental results are presented which demonstrate the synthesis of realistic sample behaviours and the performance of models for long-term temporal prediction.
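The core idea of a variable-length Markov model can be approximated with a simple context-counting predictor: store next-symbol counts for every context up to a maximum order and, at prediction time, back off from the longest matching context to shorter ones. This is a toy sketch of that idea (it omits the context pruning that makes real VLMMs compact); the names are hypothetical.

```python
from collections import defaultdict

def train_vlmm(sequence, max_order=3):
    # Count next-symbol frequencies for every context up to max_order.
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(sequence)):
        for k in range(max_order + 1):
            if i - k < 0:
                break
            context = tuple(sequence[i - k:i])
            counts[context][sequence[i]] += 1
    return counts

def predict(counts, history, max_order=3):
    # Back off from the longest matching context to shorter ones.
    for k in range(min(max_order, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        if context in counts:
            nxt = counts[context]
            return max(nxt, key=nxt.get)
    return None
```

Sampling from the context-conditional counts instead of taking the argmax yields the kind of "realistic sample behaviours" the abstract mentions.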
Learning Continuous State/Action Models for Humanoid Robots
The Florida AI Research Society, 2016
Reinforcement learning (RL) is a popular choice for solving robotic control problems. However, applying RL techniques to controlling humanoid robots with high degrees of freedom remains problematic due to the difficulty of acquiring sufficient training data. The problem is compounded by the fact that most real-world problems involve continuous states and actions. In order for RL to be scalable to these situations it is crucial that the algorithm be sample efficient. Model-based methods tend to be more data efficient than model-free approaches and have the added advantage that a single model can generalize to multiple control problems. This paper proposes a model approximation algorithm for continuous states and actions that integrates case-based reasoning (CBR) and Hidden Markov Models (HMM) to generalize from a small set of state instances. The paper demonstrates that the performance of the learned model is close to that of the system dynamics it approximates, where performance is measured in terms of sampling error.
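The HMM half of the proposed CBR/HMM combination rests on standard machinery. As a point of reference — a generic sketch, not the paper's continuous-state formulation — the forward algorithm that scores an observation sequence under a discrete HMM is just a few lines:

```python
import numpy as np

def hmm_forward(pi, A, B, obs):
    # Forward algorithm: probability of an observation sequence under a
    # discrete HMM (pi: initial state distribution, A: transition matrix,
    # B: emission matrix, obs: list of observation indices).
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()
```

For long sequences the recursion is normally carried out in log space or with per-step normalization to avoid underflow; the bare version above is kept minimal for clarity.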
Learning multiple behaviors from unlabeled demonstrations in a latent controller space
In this paper we introduce a method to learn multiple behaviors, in the form of motor primitives, from an unlabeled dataset. One difficulty of this problem is that behaviors can be heavily mixed in the measurement space even though a latent representation exists in which they are easily separated. We propose a mixture model based on a Dirichlet Process (DP) to simultaneously cluster the observed time series and recover a sparse representation of the behaviors, using a Laplacian prior as the base measure of the DP. We show that for linear models, e.g., potential functions generated by linear combinations of a large number of features, it is possible to analytically compute the marginal of the observations and derive an efficient sampler. The method is evaluated on robot behaviors and real human-motion data and compared to other techniques.
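The clustering behaviour of the Dirichlet process driving this mixture model can be illustrated with its Chinese-restaurant-process predictive rule: each new time series joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration parameter alpha. The following is a toy sketch of the prior's assignment rule only — it ignores the Laplacian base measure and the data likelihood that the paper's sampler conditions on.

```python
import random

def crp_assignments(n, alpha=1.0, seed=0):
    # Sequentially seat n items: join cluster k with probability
    # proportional to counts[k], or open a new cluster with
    # probability proportional to alpha.
    rng = random.Random(seed)
    counts = []
    z = []
    for _ in range(n):
        weights = counts + [alpha]
        r = rng.random() * sum(weights)
        k = 0
        while r >= weights[k]:
            r -= weights[k]
            k += 1
        if k == len(counts):
            counts.append(1)   # new cluster
        else:
            counts[k] += 1
        z.append(k)
    return z
```

The "rich get richer" dynamics of this rule are what let the DP infer the number of behaviors from data rather than fixing it in advance.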
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2000
The main objective of this paper is to develop an efficient method for learning and reproduction of complex trajectories for robot programming by demonstration. Encoding of the demonstrated trajectories is performed with hidden Markov model, and generation of a generalized trajectory is achieved by using the concept of key points. Identification of the key points is based on significant changes in position and velocity in the demonstrated trajectories. The resulting sequences of trajectory key points are temporally aligned using the multidimensional dynamic time warping algorithm, and a generalized trajectory is obtained by smoothing spline interpolation of the clustered key points. The principal advantage of our proposed approach is utilization of the trajectory key points from all demonstrations for generation of a generalized trajectory. In addition, variability of the key points' clusters across the demonstrated set is employed for assigning weighting coefficients, resulting in a generalization procedure which accounts for the relevance of reproduction of different parts of the trajectories. The approach is verified experimentally for trajectories with two different levels of complexity.
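Two ingredients of this pipeline — key-point identification from velocity changes, and temporal alignment — can be sketched directly. The code below is an illustrative simplification for 1-D trajectories, not the authors' method: the key-point detector flags only near-zero-velocity samples (the paper also uses significant position changes and works in multiple dimensions), and the DTW routine returns only the alignment cost rather than the warping path used for clustering.

```python
import numpy as np

def key_points(traj, rel_thresh=0.1, min_gap=5):
    # Flag samples whose velocity is near zero (candidate "significant
    # change" points), keeping endpoints and suppressing near-duplicates.
    v = np.gradient(traj)
    thresh = rel_thresh * np.max(np.abs(v))
    idx = [0]
    for i in range(1, len(traj) - 1):
        if abs(v[i]) < thresh and i - idx[-1] > min_gap:
            idx.append(i)
    idx.append(len(traj) - 1)
    return idx

def dtw_align(a, b):
    # Classic dynamic time warping cost between two 1-D sequences.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = abs(a[i - 1] - b[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

In the full approach, the aligned key points from all demonstrations are clustered and smoothed with spline interpolation, with cluster variability setting the weighting coefficients.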
CSR-08-02: Learning Goal-Based Motion Sequences for Object Manipulation
Recognition of object manipulation requires reasoning about the agent's intentions and how the environmental context and object features relate to the way an object is manipulated. As such, we argue that simply encoding motion patterns with hidden Markov models (HMM) is not sufficient. We propose a factorized-HMM that realistically models how the agent's intention evolves and how the movements are adapted to the evolving goals and surrounding situations.
IEEE transactions on cybernetics, 2015
This paper presents an approach for learning robust models of humanoid robot trajectories from demonstration. In this formulation, a model of the joint space trajectory is represented as a sequence of motion primitives where a nonlinear dynamical system is learned by constructing a hidden Markov model (HMM) predicting the probability of residing in each motion primitive. With a coordinated mixture of factor analyzers as the emission probability density of the HMM, we are able to synthesize motion from a dynamic system acting along a manifold shared by both demonstrator and robot. This provides significant advantages in model complexity for kinematically redundant robots and can reduce the number of corresponding observations required for further learning. A stability analysis shows that the system is robust to deviations from the expected trajectory as well as transitional motion between manifolds. This approach is demonstrated experimentally by recording human motion with inertial ...
Interactive Learning of Temporal Features for Control
2020
The ongoing industry revolution is demanding more flexible products, including robots in household environments or medium scale factories. Such robots should be able to adapt to new conditions and environments, and to be programmed with ease. As an example, let us suppose that there are robot manipulators working in an industrial production line that need to perform a new task. If these robots were hard coded, it could take days to adapt them to the new settings, which would stop the production of the factory. Easily programmable robots by non-expert humans would speed up this process considerably. In this regard, we present a framework in which robots are capable to quickly learn new control policies and state representations, by using occasional corrective human feedback. To achieve this, we focus on interactively learning these policies from non-expert humans that act as teachers. We present a Neural Network (NN) architecture, along with an Interactive Imitation Learning (IIL) me...