Bipedal walking energy minimization by reinforcement learning with evolving policy parameterization
Related papers
Generalized Learning to Create an Energy Efficient ZMP-Based Walking
Lecture Notes in Computer Science, 2015
In biped locomotion, the energy minimization problem is a challenging topic. This problem cannot be solved analytically since modeling the whole robot dynamics is intractable. Using the inverted pendulum model, researchers have defined the Zero Moment Point (ZMP) target trajectory and derived the corresponding Center of Mass (CoM) motion trajectory, which enables a robot to walk stably. A changing vertical CoM position has proved to be a crucial factor in reducing mechanical energy costs and generating an energy-efficient walk [1]. This paper investigates the use of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) on a Fourier basis representation, which models the vertical CoM trajectory, to achieve an energy-efficient walk with a specific step length and period. The results show that different step lengths and step periods lead to different learned energy-efficient vertical CoM trajectories. For the first time, a generalization approach based on a programmable Central Pattern Generator (CPG) is applied to the learned results. Online modulation of the trajectory is performed while the robot changes its walking speed using the CPG dynamics. This approach is implemented and evaluated on the simulated and real NAO robot.
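As a rough sketch of the optimization loop this abstract describes (not the authors' code), the snippet below pairs a truncated Fourier parameterization of the vertical CoM trajectory with the ask/tell interface of the `cma` Python package; `simulate_walk_energy`, the harmonic count, and all numeric values are illustrative assumptions.

```python
# Sketch: CMA-ES over the Fourier coefficients of the vertical CoM trajectory.
# simulate_walk_energy() is a hypothetical stand-in for the physics simulation.
import numpy as np
import cma  # pip install cma

N_HARMONICS = 4      # illustrative number of Fourier terms
STEP_PERIOD = 0.5    # step period in seconds (example value)
Z0 = 0.26            # nominal CoM height, roughly NAO-sized (assumed)

def vertical_com(coeffs, t):
    """Vertical CoM height as a truncated Fourier series, periodic in the step."""
    a, b = coeffs[:N_HARMONICS], coeffs[N_HARMONICS:]
    w = 2.0 * np.pi / STEP_PERIOD
    z = Z0 * np.ones_like(t)
    for k in range(N_HARMONICS):
        z += a[k] * np.cos((k + 1) * w * t) + b[k] * np.sin((k + 1) * w * t)
    return z

def cost(coeffs):
    t = np.linspace(0.0, STEP_PERIOD, 100)
    return simulate_walk_energy(vertical_com(coeffs, t))  # hypothetical call

es = cma.CMAEvolutionStrategy(np.zeros(2 * N_HARMONICS), 0.01)
while not es.stop():
    candidates = es.ask()                        # sample coefficient vectors
    es.tell(candidates, [cost(c) for c in candidates])
best_coeffs = es.result.xbest
```

Each generation samples a batch of coefficient vectors, scores each by the simulated energy cost of the resulting walk, and adapts the search distribution toward cheaper trajectories; a fresh search would be run per (step length, step period) pair, matching the paper's observation that each pair yields a different learned trajectory.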
Using a controller based on reinforcement learning for a passive dynamic walking robot
2005
One of the difficulties with passive dynamic walking is the stability of walking. In our robot, small uneven or tilted parts of the floor disturb the locomotion and must be dealt with by the feedback controller of the hip actuation mechanism. This paper presents a solution to this problem in the form of a controller based on reinforcement learning. The control mechanism is studied using a simulation model based on a mechanical prototype of a passive dynamic walking robot with a conventional feedback controller. The successful walking of our simulated robot with a reinforcement learning controller shows that, in addition to retaining the prime principle of our mechanical prototype, new possibilities open up, such as optimization towards various goals (e.g., maximum speed or minimal cost of transport) and quick adaptation to unknown situations.
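A minimal sketch of such a controller, assuming a tabular Q-learning formulation over a discretized gait state; `env_reset` and `env_step` are hypothetical hooks into the walker simulation, returning a discretized state index, a reward (e.g. forward progress minus a fall penalty), and a termination flag.

```python
# Sketch: tabular Q-learning of the hip actuation for a simulated passive
# dynamic walker. Discretization sizes and hyperparameters are illustrative.
import numpy as np

N_STATES  = 200    # bins over (inter-leg angle, angular velocity), assumed
N_ACTIONS = 5      # discrete hip torque levels, assumed
ALPHA, GAMMA, EPS = 0.1, 0.98, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

for episode in range(5000):
    s = env_reset()                              # hypothetical simulator hook
    done = False
    while not done:
        # Epsilon-greedy action selection over hip torque levels.
        a = rng.integers(N_ACTIONS) if rng.random() < EPS else int(np.argmax(Q[s]))
        s_next, reward, done = env_step(a)       # hypothetical simulator hook
        target = reward + (0.0 if done else GAMMA * np.max(Q[s_next]))
        Q[s, a] += ALPHA * (target - Q[s, a])    # standard Q-learning update
        s = s_next
```

Swapping the reward term (speed, cost of transport, disturbance recovery) is what makes the "optimization towards various goals" mentioned above cheap to explore.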
A learning approach to optimize walking cycle of a passivity-based biped robot
… Automation Robotics & …
A learning mechanism based on Powell's optimization algorithm is proposed to optimize the walking behavior of a passivity-based biped robot. To this end, a passivity-based biped robot has been simulated in MSC ADAMS, and a control policy inspired by humanoid walking is adopted for stable walking of the robot. Linear controllers control the joints of the robot in each walking phase to implement the gait proposed by the control policy. Powell's optimization algorithm is employed to adjust the control parameters so that the robot enters an optimal limit cycle in finite time. A fitness function is defined to evaluate the robot's optimal behavior. The results are verified by simulations in SIMULINK+ADAMS.
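Since SciPy ships Powell's method, the outer optimization loop of such a scheme can be sketched in a few lines; `simulate_walk` and the initial gain vector are illustrative placeholders for the SIMULINK+ADAMS co-simulation and the per-phase linear controller gains.

```python
# Sketch: tuning per-phase linear controller gains with Powell's method.
import numpy as np
from scipy.optimize import minimize

def walking_fitness(gains):
    """Run one simulated walk and return a scalar cost, e.g. distance from the
    desired limit cycle plus a time penalty (assumed fitness definition)."""
    return simulate_walk(gains)   # hypothetical interface to the simulation

x0 = np.array([10.0, 1.0, 10.0, 1.0])   # illustrative initial PD gains per phase
res = minimize(walking_fitness, x0, method="Powell",
               options={"xtol": 1e-3, "maxiter": 200})
print("optimized gains:", res.x, "final cost:", res.fun)
```

Powell's method is derivative-free, which suits a black-box multibody simulation where gradients of the fitness with respect to the gains are unavailable.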
2007
The biped Lucy, powered by pleated pneumatic artificial muscles, has been built and controlled and is able to walk at speeds up to 0.15 m/s. The pressures inside the muscles are regulated by a joint trajectory tracking controller to follow the desired joint trajectories calculated by a trajectory generator; however, the actuators are set to a fixed stiffness value. In this paper a compliance controller is proposed which can be added to the control architecture to reduce energy consumption by exploiting the natural dynamics. The goal of this research is to preserve the versatility of actively controlled humanoids while reducing their energy consumption. A mathematical formulation has been developed to find an optimal stiffness setting depending on the desired trajectory and the physical properties of the system, and the proposed strategy has been validated on a pendulum structure powered by artificial muscles. The strategy has not been implemented on the real robot because its current walking speed is too slow to benefit from compliance control.
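The core idea, choosing a fixed stiffness that minimizes the actuator effort needed to track a desired periodic trajectory, can be illustrated on a pendulum; all physical constants and the desired motion below are assumed values, not the paper's.

```python
# Sketch: pick the passive stiffness k that minimizes the RMS actuator torque
# needed to track a sinusoidal pendulum swing. All numbers are illustrative.
import numpy as np

I, M, G, L = 0.05, 1.0, 9.81, 0.3        # inertia, mass, gravity, arm length
AMP, W = 0.3, 2.0 * np.pi * 2.0          # desired amplitude [rad], frequency [rad/s]
t = np.linspace(0.0, 1.0, 1000)
q   = AMP * np.sin(W * t)                # desired trajectory
qdd = -AMP * W**2 * np.sin(W * t)

def rms_actuator_torque(k):
    # With a spring restoring toward q = 0, the actuator must supply
    # tau = I*qdd + M*G*L*sin(q) + k*q to realize the trajectory.
    tau = I * qdd + M * G * L * np.sin(q) + k * q
    return np.sqrt(np.mean(tau**2))

ks = np.linspace(0.0, 10.0, 500)
k_opt = ks[int(np.argmin([rms_actuator_torque(k) for k in ks]))]
print(f"near-optimal stiffness ~ {k_opt:.2f} N*m/rad")
```

The minimum lands where the passive stiffness (plus gravity) matches the inertial torque of the desired motion, i.e. where the natural dynamics do most of the work, which is exactly the effect the compliance controller exploits.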
Power Usage Reduction of Humanoid Standing Process Using Q-Learning
Lecture Notes in Computer Science, 2015
An important area of research in humanoid robots is energy consumption, as it limits autonomy and can harm task performance. This work focuses on power-aware motion planning. Its principal aim is to find joint trajectories that allow a humanoid robot to go from a crouch to a standing position while minimizing power consumption. Q-Learning (QL) is used to search for optimal joint paths subject to angular position and torque restrictions. A planar model of the humanoid, which interacts with QL during a simulated offline learning phase, is used. The best joint trajectories found during learning are then executed by a physical humanoid robot, the Aldebaran NAO. Position, velocity, acceleration, and current of the humanoid system are measured to evaluate energy, mechanical power, and Center of Mass (CoM) in order to assess the performance of the new trajectory, which yields a considerable reduction in power consumption.
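Two pieces of such a Q-learning setup lend themselves to a short sketch: a reward that penalizes instantaneous mechanical power, and an action mask that enforces the angular position and torque restrictions. All limits and discretizations below are illustrative, and `all_actions`, `torque_of`, and `next_angle` are hypothetical helpers around the planar model.

```python
# Sketch: power-penalizing reward and constraint masking for crouch-to-stand
# Q-learning. Limits and helper functions are illustrative assumptions.
import numpy as np

ANGLE_BINS = np.linspace(-2.0, 2.0, 41)   # discretized joint angles [rad]
TORQUE_LIMIT = 5.0                         # per-joint torque restriction [N*m]
DT = 0.02                                  # control period [s]

def reward(tau, qd):
    """Penalize instantaneous mechanical power |tau * qd|; a terminal bonus for
    reaching the stand posture would be added on top (not shown)."""
    return -abs(tau * qd) * DT

def valid_actions(state):
    """Keep only actions whose torque and resulting angle satisfy the limits."""
    return [a for a in all_actions(state)                      # hypothetical
            if abs(torque_of(state, a)) <= TORQUE_LIMIT        # hypothetical
            and ANGLE_BINS[0] <= next_angle(state, a) <= ANGLE_BINS[-1]]
```

Restricting the argmax and exploration steps to `valid_actions(state)` keeps the learned trajectory within what the physical NAO joints can actually execute.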
Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot
The International Journal of Robotics Research, 2008
In this paper we describe a learning framework for a central pattern generator (CPG)-based biped locomotion controller using a policy gradient method. Our goals in this study are to achieve CPG-based biped walking with a 3D hardware humanoid and to develop an efficient learning algorithm with the CPG by reducing the dimensionality of the state space used for learning. We demonstrate that an appropriate feedback controller can be acquired within a few thousand trials in numerical simulation, and that the controller obtained in simulation achieves stable walking with a physical robot in the real world. Walking velocity and stability are evaluated in numerical simulations and hardware experiments. The results suggest that the learning algorithm is capable of adapting to environmental changes. Furthermore, we present an online learning scheme with an initial policy that improves the controller on a hardware robot within 200 iterations.
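A reduced sketch of the episodic policy-gradient idea, here as a simple parameter-perturbation (REINFORCE-style) update of linear feedback gains feeding the CPG; this is not the paper's exact estimator, and `rollout` is a hypothetical function returning the return of one simulated walking trial.

```python
# Sketch: likelihood-ratio policy gradient over linear CPG feedback gains.
import numpy as np

N_STATE, SIGMA, LR = 4, 0.05, 1e-3   # state dim, exploration std, step size
w = np.zeros(N_STATE)                # feedback gains mapping state -> CPG input
rng = np.random.default_rng(0)

for trial in range(2000):
    eps = rng.normal(0.0, SIGMA, size=N_STATE)   # Gaussian parameter perturbation
    ret = rollout(w + eps)           # hypothetical: return of one walking trial
    base = rollout(w)                # baseline rollout to reduce gradient variance
    grad = (ret - base) * eps / SIGMA**2         # likelihood-ratio gradient estimate
    w += LR * grad
```

Learning in the low-dimensional gain space, rather than over raw joint states, is what keeps the trial count in the "few thousand" range the abstract reports.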
Machine Optimisation of Dynamic Gait Parameters for Bipedal Walking
This paper describes a compact gait generator that runs on the on-board ATmega microcontroller of the Robotis Bioloid robot. An overview of the parameters that affect dynamic gait is included, along with a discussion of how these parameters are implemented in a servo skeleton. The paper reports on the optimisation of the gait parameters using a variant of the natural actor-critic method and demonstrates that this learning technique is applicable in a real context. The quality of a parameter set is evaluated by the time taken for the robot to travel one meter, which is to be minimised. The learned gait parameters result in a faster robot than the hand-optimised parameters when the learning algorithm has a reasonable initialisation.
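A compressed sketch of an episodic natural actor-critic style update over the gait parameter vector: returns are regressed on the parameter perturbations, and the regression slope gives a (scaled) natural-gradient direction. `time_to_one_meter` and `load_hand_tuned_gait` are hypothetical stand-ins for the on-robot evaluation and the hand-optimised initialisation.

```python
# Sketch: eNAC-style episodic update of gait parameters; reward is negative
# time to travel one meter. Batch size and step sizes are illustrative.
import numpy as np

N_PARAMS, SIGMA, LR, BATCH = 8, 0.02, 0.1, 12
theta = load_hand_tuned_gait()       # hypothetical: reasonable initial parameters
rng = np.random.default_rng(0)

for update in range(100):
    eps = rng.normal(0.0, SIGMA, size=(BATCH, N_PARAMS))
    returns = np.array([-time_to_one_meter(theta + e) for e in eps])  # hypothetical
    # Regress returns on perturbations; slope ~ natural gradient, intercept ~ baseline.
    X = np.hstack([eps, np.ones((BATCH, 1))])
    slope_and_bias, *_ = np.linalg.lstsq(X, returns, rcond=None)
    theta += LR * slope_and_bias[:N_PARAMS]
```

The abstract's point about initialisation shows up here directly: with a poor `theta` the robot falls before covering the meter, and the returns carry almost no gradient signal.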
Robust biped locomotion using deep reinforcement learning on top of an analytical control approach
Robotics and Autonomous Systems, 2021
This paper proposes a modular framework to generate robust biped locomotion using a tight coupling between an analytical walking approach and deep reinforcement learning. The framework is composed of six main modules which are hierarchically connected to reduce the overall complexity and increase flexibility. Its core is a specific dynamics model which abstracts the humanoid's dynamics into two masses, modeling the upper and lower body. This dynamics model is used to design an adaptive reference trajectory planner and an optimal controller, both fully parametric. Furthermore, a learning framework based on a Genetic Algorithm (GA) and Proximal Policy Optimization (PPO) is developed to find the optimum parameters and to learn how to improve the stability of the robot by moving the arms and changing its center of mass (COM) height. A set of simulations is performed using the official RoboCup 3D League simulation environment. The results validate the performance of the framework, not only in creating a fast and stable gait but also in learning to improve upper-body efficiency.
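The GA half of the pipeline is easy to sketch in isolation; `evaluate_gait` is a hypothetical rollout score from the RoboCup 3D simulator, and the PPO stage that learns arm and COM-height corrections on top of the tuned gait is omitted here.

```python
# Sketch: simple real-valued GA over the parametric planner/controller parameters.
# Population size, mutation scale, and parameter count are illustrative.
import numpy as np

POP, N_PARAMS, GENS, MUT_STD = 30, 10, 50, 0.05
rng = np.random.default_rng(0)
pop = rng.normal(0.0, 0.1, size=(POP, N_PARAMS))

for gen in range(GENS):
    fitness = np.array([evaluate_gait(p) for p in pop])   # hypothetical rollout
    elite = pop[np.argsort(fitness)[-POP // 2:]]          # keep the better half
    parents = elite[rng.integers(len(elite), size=(POP, 2))]
    children = parents.mean(axis=1)                       # arithmetic crossover
    pop = children + rng.normal(0.0, MUT_STD, size=children.shape)
    pop[0] = elite[-1]                                    # elitism: keep the best
print("best parameters found:", pop[0])
```

Splitting the problem this way, GA for the low-dimensional analytic-controller parameters and PPO for the residual balancing policy, keeps each learner in a search space it handles well.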
Dynamic Control Algorithm for Biped Walking Based on Policy Gradient Fuzzy Reinforcement Learning
Proceedings of the 17th IFAC World Congress, 2008
This paper presents a novel dynamic control approach for acquiring biped walking in humanoid robots, focused on policy gradient reinforcement learning with fuzzy evaluative feedback. The proposed controller structure involves two feedback loops: a conventional computed-torque controller, including an impact-force controller, and a reinforcement learning computed-torque controller. The reinforcement learning part incorporates fuzzy information about Zero-Moment Point errors. To demonstrate the effectiveness of our method, we apply it in simulation to the learning of biped walking.
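The fuzzy evaluative feedback can be illustrated by a small reward function that maps the ZMP tracking error through triangular membership functions and a Sugeno-style weighted average; the breakpoints and penalty levels below are assumptions, not the paper's rule base.

```python
# Sketch: fuzzy evaluative reward over the ZMP tracking error.
# Membership breakpoints and penalties are illustrative assumptions.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(0.0, np.minimum((x - a) / (b - a), (c - x) / (c - b)))

def fuzzy_zmp_reward(zmp_error):
    """Map |ZMP error| [m] to a reinforcement in [-1, 0]:
    SMALL -> no penalty, MEDIUM -> mild penalty, LARGE -> strong penalty."""
    e = abs(zmp_error)
    mu_small  = tri(e, -0.01, 0.0, 0.02)
    mu_medium = tri(e, 0.01, 0.03, 0.05)
    mu_large  = np.clip((e - 0.04) / 0.02, 0.0, 1.0)     # saturating ramp
    num = mu_small * 0.0 + mu_medium * (-0.5) + mu_large * (-1.0)
    den = mu_small + mu_medium + mu_large + 1e-9
    return num / den                                     # weighted defuzzification
```

Compared with a hard threshold on the ZMP error, the fuzzy grading gives the policy-gradient learner a smooth reinforcement signal near the stability boundary.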