A sample efficient model-based deep reinforcement learning algorithm with experience replay for robot manipulation
Related papers
A Survey on Deep Reinforcement Learning Algorithms for Robotic Manipulation
Sensors
Robotic manipulation challenges, such as grasping and object manipulation, have been tackled successfully with the help of deep reinforcement learning systems. This review gives an overview of recent advances in deep reinforcement learning algorithms for robotic manipulation tasks. We begin by outlining the fundamental ideas of reinforcement learning and the components of a reinforcement learning system. We then cover the main deep reinforcement learning algorithms that have been proposed for robotic manipulation, including value-based methods, policy-based methods, and actor–critic approaches. We also examine the issues that arise when applying these algorithms to robotics tasks, as well as the solutions that have been put forth to address them. Finally, we highlight several open research questions and discuss possible future directions for the field.
Australasian Conference on Robotics and Automation, 2023
Applications using Model-Free Reinforcement Learning (MFRL) have grown exponentially and have shown remarkable results in the last decade. The application of MFRL to robots shows significant promise for its capability to solve complex control problems, at least virtually or in simulation. Due to the practical challenges of training in a real-world environment, there is limited work bridging the gap to real physical robots. This article benchmarks state-of-the-art MFRL algorithms trained on an open-source robotic manipulation testbed consisting of a fully actuated, 4-Degrees of Freedom (DoF), two-fingered robot gripper to understand the limitations and challenges involved in real-world applications. Experimental analysis using two different state-space representations is presented to understand their impact on executing a dexterous manipulation task. The source code, the CAD files of the robotic manipulation testbed, and a handy guide on how to approach MFRL's application to the real world are provided to facilitate replication of the results and further experimentation by other research groups.
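A minimal sketch of the kind of comparison the abstract describes: training one MFRL algorithm under two different state-space representations of the same gripper task. The environment name `GripperEnv`, its observation modes, and its placeholder dynamics are illustrative assumptions, not the testbed's actual API; only the Gymnasium and Stable-Baselines3 calls are real.

```python
# Hypothetical sketch: benchmark one MFRL algorithm (SAC) under two
# state-space representations of the same 4-DoF gripper task.
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import SAC


class GripperEnv(gym.Env):
    """Toy 4-DoF gripper environment with a switchable observation space."""

    def __init__(self, obs_mode="joint"):
        super().__init__()
        self.obs_mode = obs_mode
        self.action_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        # "joint": raw joint angles; "task": fingertip poses relative to the object.
        dim = 4 if obs_mode == "joint" else 6
        self.observation_space = spaces.Box(-10.0, 10.0, shape=(dim,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        obs = self.observation_space.sample()  # placeholder dynamics for the sketch
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}


# Train the same algorithm under both representations and compare learning curves.
for mode in ("joint", "task"):
    model = SAC("MlpPolicy", GripperEnv(obs_mode=mode), verbose=0)
    model.learn(total_timesteps=10_000)
```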
Robotic Arm Control and Task Training through Deep Reinforcement Learning
ArXiv, 2020
This paper proposes a detailed and extensive comparison of Trust Region Policy Optimization and Deep Q-Network with Normalized Advantage Functions against other state-of-the-art algorithms, namely Deep Deterministic Policy Gradient and Vanilla Policy Gradient. The comparisons demonstrate that the former have better performance than the latter when robotic arms are asked to accomplish manipulation tasks such as reaching a random target pose and picking and placing an object. Both simulated and real-world experiments are provided. Simulation lets us show the procedures we adopted to precisely estimate the algorithms' hyper-parameters and to correctly design good policies. Real-world experiments show that our policies, if correctly trained in simulation, can be transferred and executed in a real environment with almost no changes.
2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020
Recent advances in deep reinforcement learning (RL) have demonstrated its potential to learn complex robotic manipulation tasks. However, RL still requires the robot to collect a large amount of real-world experience. To address this problem, recent works have proposed learning from expert demonstrations (LfD), particularly via inverse reinforcement learning (IRL), given its ability to achieve robust performance with only a small number of expert demonstrations. Nevertheless, deploying IRL on real robots is still challenging due to the large number of robot experiences it requires. This paper aims to address this scalability challenge with a robust, sample-efficient, and general meta-IRL algorithm, SQUIRL, that performs a new but related long-horizon task robustly given only a single video demonstration. First, this algorithm bootstraps the learning of a task encoder and a task-conditioned policy using behavioral cloning (BC). It then collects real-robot experiences and bypasses reward learning by directly recovering a Q-function from the combined robot and expert trajectories. Next, this algorithm uses the Q-function to re-evaluate all cumulative experiences collected by the robot to improve the policy quickly. In the end, the policy performs more robustly (90%+ success) than BC on new tasks while requiring no trial and error at test time. Finally, our real-robot and simulated experiments demonstrate our algorithm's generality across different state spaces, action spaces, and vision-based manipulation tasks, e.g., pick-pour-place and pick-carry-drop.
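A schematic sketch of the training stages the abstract describes for SQUIRL. The helper callables (`behavioral_cloning`, `rollout`, `fit_q_function`, `improve_policy`) are hypothetical placeholders passed in by the caller, not the authors' released code; only the overall flow follows the abstract.

```python
# Hedged sketch of the SQUIRL-style stages described above; all helpers are
# placeholders supplied by the caller, not an actual implementation.
def train_squirl(expert_video, robot_env, behavioral_cloning, rollout,
                 fit_q_function, improve_policy, num_rounds=10):
    # Stage 1: bootstrap a task encoder and task-conditioned policy with BC
    # on the single expert video demonstration.
    encoder, policy = behavioral_cloning(expert_video)
    task_embedding = encoder(expert_video)

    robot_trajectories = []
    for _ in range(num_rounds):
        # Stage 2: collect real-robot experience with the current policy.
        robot_trajectories += rollout(policy, robot_env, task_embedding)

        # Stage 3: bypass explicit reward learning by recovering a Q-function
        # directly from the combined expert and robot trajectories.
        q_function = fit_q_function(expert_video, robot_trajectories)

        # Stage 4: re-evaluate all cumulative experience with the Q-function
        # to quickly improve the task-conditioned policy.
        policy = improve_policy(policy, robot_trajectories, q_function)

    return policy
```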
arXiv, 2022
Agents trained with reinforcement learning (RL) base their decisions on a reward function. The values selected for the learning algorithm's parameters can, nevertheless, have a substantial impact on the overall learning process. In order to discover values for the learning parameters that are close to optimal, we extended our previously proposed genetic-algorithm-based Deep Deterministic Policy Gradient and Hindsight Experience Replay approach (referred to as GA+DDPG+HER) in this study. We applied the GA+DDPG+HER methodology to the robotic manipulation tasks FetchReach, FetchSlide, FetchPush, FetchPick&Place, and DoorOpening. Our technique was also used in the AuboReach environment with a few adjustments. Our experimental analysis demonstrates that our method produces noticeably better performance and learns faster than the original algorithm. We also offer evidence that GA+DDPG+HER outperforms the existing approaches. The final results support our assertion and offer sufficient proof that automating the parameter-tuning procedure is crucial and can cut learning time by as much as 57%.
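A minimal sketch of the general idea behind GA+DDPG+HER: a genetic search over DDPG+HER learning parameters, where each genome's fitness is the success rate of an agent trained with those parameters. The parameter ranges, population size, and the `fitness` callable are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative genetic search over DDPG+HER hyper-parameters; the fitness
# callable (train DDPG+HER with a genome, return success rate) is supplied
# by the caller and is an assumption of this sketch.
import random


def random_genome():
    return {
        "actor_lr": 10 ** random.uniform(-5, -3),
        "critic_lr": 10 ** random.uniform(-5, -3),
        "gamma": random.uniform(0.9, 0.999),
        "tau": random.uniform(0.001, 0.05),
    }


def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in a}


def mutate(genome, rate=0.2):
    return {k: (v * random.uniform(0.8, 1.2) if random.random() < rate else v)
            for k, v in genome.items()}


def genetic_search(fitness, generations=10, pop_size=8, elite=2):
    """Evolve hyper-parameter genomes; keep the elite, breed the rest."""
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:elite]
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - elite)]
        population = parents + children
    return max(population, key=fitness)
```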
Deep Reinforcement Learning for Contact-Rich Skills Using Compliant Movement Primitives
2020
In recent years, industrial robots have been installed in various industries to handle advanced manufacturing and high-precision tasks. However, further integration of industrial robots is hampered by their limited flexibility, adaptability, and decision-making skills compared to human operators. Assembly tasks are especially challenging for robots since they are contact-rich and sensitive to even small uncertainties. While reinforcement learning (RL) offers a promising framework to learn contact-rich control policies from scratch, its applicability to high-dimensional continuous state-action spaces remains rather limited due to high brittleness and sample complexity. To address those issues, we propose different pruning methods that facilitate convergence and generalization. In particular, we divide the task into free and contact-rich sub-tasks, perform the control in Cartesian rather than joint space, and parameterize the control policy. Those pruning methods are naturally implemen...
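An illustrative sketch of two of the pruning ideas named in the abstract: commanding the policy in Cartesian rather than joint space, and routing control between a free-space and a contact-rich sub-policy. The robot, IK solver, and force-sensing interfaces are placeholder assumptions, not the paper's implementation.

```python
# Hedged sketch of Cartesian-space actions and free/contact sub-task switching;
# the robot, ik_solver, and policy objects are placeholders.
import numpy as np


class CartesianActionWrapper:
    """Maps a small Cartesian displacement from the policy to joint targets."""

    def __init__(self, robot, ik_solver, max_step=0.005):
        self.robot, self.ik, self.max_step = robot, ik_solver, max_step

    def apply(self, cartesian_action):
        delta = np.clip(cartesian_action, -1.0, 1.0) * self.max_step
        target_pose = self.robot.end_effector_pose() + delta
        joint_targets = self.ik.solve(target_pose)
        self.robot.command_joints(joint_targets)


def select_subpolicy(robot, free_policy, contact_policy, force_threshold=5.0):
    """Route control to the contact-rich sub-policy once contact forces appear."""
    in_contact = np.linalg.norm(robot.wrist_force()) > force_threshold
    return contact_policy if in_contact else free_policy
```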
Deep Reinforcement Learning for Large Scale Robotic Simulations
2019
Deep reinforcement learning enables algorithms to learn complex behavior, deal with continuous action spaces, and find good strategies in environments with high-dimensional state spaces. With deep reinforcement learning being an active area of research with many concurrent developments, we decided to focus on a simple robotic task to evaluate a set of ideas that might help to solve recent reinforcement learning problems. The focus is on enabling a distributed setup to execute and run experiments in the least amount of time and to benefit from the available computational power. Another focus is transferability between different physics engines, where we experiment with using an agent trained in one environment in a different environment with a different physics engine. The purpose of this thesis is to unify the differences between reinforcement learning environments by implementing simple abstract classes between the selected environments, which can be extended to support more environments. With this, the focus was on enabling distributed training to reduce experiment time. We select two state-of-the-art reinforcement learning methods to train, evaluate, and test distribution and transferability. The goal of this strategy is to reduce training time and eventually help the algorithms scale, collect experiences, and train the agents effectively. The concluding evaluation and results demonstrate the general applicability of the described concepts by testing them in the selected environments. In our experiments, the effect of distribution was shown in the training time across experiments. Furthermore, in the last experiment we demonstrate how to use transfer learning and trained agents in a new learning environment. These concepts might be reused for future experiments.
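A minimal sketch of the kind of abstract environment interface the abstract describes, used to hide differences between physics engines so the same agent can be trained or transferred across them. The class and method names are illustrative, not taken from the thesis.

```python
# Hedged sketch of a common environment interface across physics engines;
# class names and engine wrappers are assumptions.
from abc import ABC, abstractmethod


class AbstractRobotEnv(ABC):
    """Common interface so agents can be trained or transferred across engines."""

    @abstractmethod
    def reset(self):
        """Return the initial observation."""

    @abstractmethod
    def step(self, action):
        """Return (observation, reward, done, info)."""

    @abstractmethod
    def render(self):
        """Optional visualization hook."""


class MuJoCoArmEnv(AbstractRobotEnv):
    ...  # would wrap a MuJoCo simulation of the task


class PyBulletArmEnv(AbstractRobotEnv):
    ...  # would wrap a PyBullet simulation of the same task
```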
Sim–Real Mapping of an Image-Based Robot Arm Controller Using Deep Reinforcement Learning
Applied Sciences
Models trained with Deep Reinforcement Learning (DRL) have been deployed in various areas of robotics with varying degrees of success. To overcome the limitations of data gathering in the real world, DRL training utilizes simulated environments and transfers the learned policy to real-world scenarios, i.e., sim–real transfer. Simulators fail to accurately capture the entire dynamics of the real world, so simulation-trained policies often fail when applied to reality, a problem termed the reality gap (RG). In this paper, we propose a search (mapping) algorithm that takes real-world observations (images) and maps them to the policy-equivalent images in the simulated environment using a convolutional neural network (CNN) model. The two-step training process, a DRL policy and a mapping model, overcomes the RG problem with simulated data only. We evaluated the proposed system on a gripping task with a custom-made robot arm in the real world and compared the performance against a conventional DRL sim–re...
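A hedged sketch of the two-step pipeline the abstract describes: a CNN maps a real camera image to its policy-equivalent simulated image, and the simulation-trained DRL policy then acts on the mapped image. The network architecture and names are assumptions, not the paper's model.

```python
# Illustrative real-to-sim mapping followed by a frozen sim-trained policy;
# architecture and shapes are assumptions of this sketch.
import torch
import torch.nn as nn


class RealToSimMapper(nn.Module):
    """CNN that translates a real-world image into the simulator's image domain."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, real_image):
        return self.net(real_image)


def act(real_image, mapper, policy):
    """Map the real image into sim image space, then query the sim-trained policy."""
    with torch.no_grad():
        sim_like_image = mapper(real_image)
        return policy(sim_like_image)
```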
Applied Sciences
While working side by side, humans and robots complement each other nowadays, and we may say that they work hand in hand. This study aims to evolve the grasping task by reaching the intended object based on deep reinforcement learning. Thereby, in this paper, we propose a deep deterministic policy gradient approach that can be applied to a robotic arm with numerous degrees of freedom for autonomous object grasping according to the object's classification and a given task. In this study, this approach is realized by a five-degrees-of-freedom robotic arm that reaches the targeted object using the inverse kinematics method. You Only Look Once v5 is employed for object detection, and backward projection is used to detect the three-dimensional position of the target. After computing the joint angles at the detected position by inverse kinematics, the robot's arm is moved towards the target object's location thanks to the algorithm. Our approach provides a neural inverse kinematics solu...
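An illustrative sketch of the perception-to-motion pipeline the abstract describes: YOLOv5 detects the target, backward projection recovers its 3-D position, inverse kinematics yields joint angles, and the arm is commanded there. All callables and the arm API are placeholders supplied by the caller, not the paper's components.

```python
# Hedged sketch of the detection -> backward projection -> IK -> motion pipeline;
# every helper here is a caller-supplied placeholder.
def grasp_target(image, camera_params, arm, detect_with_yolov5, back_project,
                 inverse_kinematics):
    """Detect the target, recover its 3-D position, solve IK, and move the arm."""
    detection = detect_with_yolov5(image)                 # 2-D bounding box of the object
    target_xyz = back_project(detection, camera_params)   # 3-D position via backward projection
    joint_angles = inverse_kinematics(target_xyz)         # joint angles for the 5-DoF arm
    arm.move_to(joint_angles)                             # move to the object's location
    arm.close_gripper()
```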
Reinforcement learning based on movement primitives for contact tasks
Robotics and Computer-Integrated Manufacturing, 2020
Recently, robot learning through deep reinforcement learning has incorporated various robot tasks through deep neural networks, without using specific control or recognition algorithms. However, this learning method is difficult to apply to the contact tasks of a robot, due to the exertion of excessive force from the random search process of reinforcement learning. Therefore, when applying reinforcement learning to contact tasks, solving the contact problem using an existing force controller is necessary. A neural-network-based movement primitive (NNMP) that generates a continuous trajectory which can be transmitted to the force controller and learned through a deep deterministic policy gradient (DDPG) algorithm is proposed for this study. In addition, an imitation learning algorithm suitable for NNMP is proposed such that the trajectories similar to the demonstration trajectory are stably generated. The performance of the proposed algorithms was verified using a square peg-in-hole assembly task with a tolerance of 0.1 mm. The results confirm that the complicated assembly trajectory can be learned stably through NNMP by the proposed imitation learning algorithm, and that the assembly trajectory is improved by learning the proposed NNMP through the DDPG algorithm.