An advanced reinforcement learning control method for quadruped robots in typical urban terrains
Abstract
Quadruped robots, with their exceptional flexibility and stable structure, are well suited to traversing the complex unstructured terrains found in urban environments. However, the flexibility and stability of current reinforcement-learning-based quadruped controllers remain unsatisfactory on such terrains. To address this limitation, an end-to-end teacher-student learning framework built on large-scale parallel simulation is proposed, in which a Gated Recurrent Unit produces a latent estimate of the terrain heights surrounding the robot. In addition, an omnidirectional terrain learning curriculum enables the robot to move in any commanded direction while producing smooth motor joint-angle outputs and tracking. Using a state machine, the model trained in simulation is deployed on a Unitree Go1 robot via zero-shot transfer. Simulation and real-world experiments demonstrate that this approach significantly enhances the robot's adaptability and mobility across urban terrains such as gravel, grass, slopes, and steps.
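To make the estimator idea in the abstract concrete, the following is a minimal NumPy sketch of a GRU that maps a stream of proprioceptive observations to an estimate of the heights around the robot's base. It is an illustration only: the class name, the 45-dimensional observation, the 187 height samples, and the single linear readout are assumptions chosen for the example, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUHeightEstimator:
    """Illustrative GRU that turns a proprioceptive observation history
    into a latent state, then decodes surrounding terrain heights."""

    def __init__(self, obs_dim, hidden_dim, n_height_points, seed=0):
        rng = np.random.default_rng(seed)
        w = lambda *shape: rng.normal(0.0, 0.1, shape)
        # Update (z), reset (r), and candidate (h) gate parameters
        self.Wz, self.Uz, self.bz = w(obs_dim, hidden_dim), w(hidden_dim, hidden_dim), np.zeros(hidden_dim)
        self.Wr, self.Ur, self.br = w(obs_dim, hidden_dim), w(hidden_dim, hidden_dim), np.zeros(hidden_dim)
        self.Wh, self.Uh, self.bh = w(obs_dim, hidden_dim), w(hidden_dim, hidden_dim), np.zeros(hidden_dim)
        # Linear head: hidden state -> height samples around the base
        self.Wo, self.bo = w(hidden_dim, n_height_points), np.zeros(n_height_points)
        self.h = np.zeros(hidden_dim)

    def step(self, obs):
        z = sigmoid(obs @ self.Wz + self.h @ self.Uz + self.bz)
        r = sigmoid(obs @ self.Wr + self.h @ self.Ur + self.br)
        h_cand = np.tanh(obs @ self.Wh + (r * self.h) @ self.Uh + self.bh)
        self.h = (1.0 - z) * self.h + z * h_cand   # standard GRU state update
        return self.h @ self.Wo + self.bo           # decoded height estimate
```

In a teacher-student setup of this kind, the teacher policy typically observes privileged terrain heights directly in simulation, while the student must recover an equivalent latent from proprioception alone; a recurrent estimator like the one above is one common way to structure that student-side inference.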
Data availability
Partial code is publicly available at https://github.com/dstx123/unitree_rl. Demonstration videos showcasing the results can be accessed at https://dstx123.github.io/RL-control/. These resources are freely available for further research purposes.
Change history
30 January 2025
The original online version of this article was revised due to change in order of affiliation.
18 February 2025
A Correction to this paper has been published: https://doi.org/10.1007/s13042-025-02564-6
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. U2013601), in part by the Anhui Province Natural Science Funds for Distinguished Young Scholars (Grant No. 2308085J02), and in part by the Innovation Leading Talent of Anhui Province (TeZhi) plan.
Author information
Author notes
- Chi Yan and Ning Wang: These authors contributed equally to this work.
Authors and Affiliations
- School of Information Science and Technology, University of Science and Technology of China, No. 96 Jinzhai Road, Hefei, 230026, Anhui, China
  Chi Yan, Hongbo Gao, Xinmiao Wang, Chao Tang & Lin Zhou
- School of Information and Security, Chongqing College of Mobile Communication, No. 36 Dengying Avenue, Qijiang District, Chongqing, 401520, China
  Ning Wang
- Institute of Advanced Technology, University of Science and Technology of China, No. 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
  Hongbo Gao
- School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
  Hongbo Gao
- Zhejiang Lab, Kechuang Avenue, Zhongtai Sub-District, Hangzhou, 311121, Zhejiang, China
  Yuehua Li
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
  Yue Wang
Authors
- Chi Yan
- Ning Wang
- Hongbo Gao
- Xinmiao Wang
- Chao Tang
- Lin Zhou
- Yuehua Li
- Yue Wang
Corresponding author
Correspondence to Hongbo Gao.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yan, C., Wang, N., Gao, H. et al. An advanced reinforcement learning control method for quadruped robots in typical urban terrains. Int. J. Mach. Learn. & Cyber. 16, 3747–3757 (2025). https://doi.org/10.1007/s13042-024-02478-9
- Received: 09 April 2024
- Accepted: 21 November 2024
- Published: 03 December 2024
- Version of record: 03 December 2024
- Issue date: June 2025
- DOI: https://doi.org/10.1007/s13042-024-02478-9