An advanced reinforcement learning control method for quadruped robots in typical urban terrains

Abstract

Quadruped robots, with their exceptional flexibility and stable structure, are well suited to traversing the complex unstructured terrains found in urban environments. However, reinforcement-learning-based quadruped controllers still fall short in flexibility and stability on such terrains. To address this limitation, an end-to-end teacher-student learning framework built on large-scale parallel simulation is proposed, in which a Gated Recurrent Unit produces a latent estimate of the terrain heights surrounding the robot. In addition, an omnidirectional terrain learning curriculum enables the robot to move in any commanded direction while producing smooth motor joint-angle outputs and tracking. Using a state machine, the model trained in simulation is deployed on the Unitree Go1 robot via zero-shot transfer. Simulation and real-world experiments demonstrate that this approach significantly enhances the robot's adaptability and mobility across typical urban terrains such as gravel, grass, slopes, and steps.
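The abstract's core idea — a recurrent unit that turns a history of proprioceptive observations into a latent estimate of the surrounding terrain heights — can be sketched as follows. This is a minimal illustration only: the observation size, hidden size, number of height-map points, and the plain NumPy GRU cell are all hypothetical choices, not the paper's actual network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUHeightEstimator:
    """Minimal GRU mapping a proprioceptive observation history to an
    estimate of terrain heights around the robot (hypothetical sizes)."""

    def __init__(self, obs_dim=45, hidden_dim=64, n_height_points=187, seed=0):
        rng = np.random.default_rng(seed)
        d = obs_dim + hidden_dim
        # GRU weights: update gate, reset gate, candidate state
        self.Wz = rng.normal(0, 0.1, (hidden_dim, d))
        self.Wr = rng.normal(0, 0.1, (hidden_dim, d))
        self.Wh = rng.normal(0, 0.1, (hidden_dim, d))
        # linear head decoding the hidden state into height estimates
        self.Wo = rng.normal(0, 0.1, (n_height_points, hidden_dim))
        self.hidden_dim = hidden_dim

    def step(self, obs, h):
        xh = np.concatenate([obs, h])
        z = sigmoid(self.Wz @ xh)                          # update gate
        r = sigmoid(self.Wr @ xh)                          # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([obs, r * h]))
        return (1 - z) * h + z * h_tilde                   # new hidden state

    def estimate(self, obs_history):
        h = np.zeros(self.hidden_dim)
        for obs in obs_history:                            # unroll over history
            h = self.step(obs, h)
        return self.Wo @ h                                 # surrounding heights

est = GRUHeightEstimator()
history = [np.zeros(45) for _ in range(10)]  # dummy proprioceptive history
heights = est.estimate(history)
print(heights.shape)  # (187,)
```

In a teacher-student setup of this kind, the teacher is typically trained with privileged access to the true height map, and the student's recurrent estimator is supervised to reproduce it from proprioception alone; the sketch above shows only the student-side inference path.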


Data availability

Partial code is publicly available at https://github.com/dstx123/unitree_rl. Some demonstration videos showcasing the results can be accessed at https://dstx123.github.io/RL-control/. These resources are freely available for further research purposes.

Change history

The original online version of this article was revised due to a change in the order of affiliations.

A Correction to this paper has been published: https://doi.org/10.1007/s13042-025-02564-6

References

  1. Peng XB, Coumans E, Zhang T, Lee T-WE, Tan J, Levine S (2020) Learning agile robotic locomotion skills by imitating animals. In: Robotics: Science and Systems
  2. Nahrendra I, Oh M, Yu B, Lim H, Myung H (2023) Robust recovery motion control for quadrupedal robots via learned terrain imagination. arXiv preprint arXiv:2306.12712
  3. Bledt G, Powell MJ, Katz B, Di Carlo J, Wensing PM, Kim S (2018) MIT Cheetah 3: design and control of a robust, dynamic quadruped robot. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2245–2252
  4. Di Carlo J, Wensing PM, Katz B, Bledt G, Kim S (2018) Dynamic locomotion in the MIT Cheetah 3 through convex model-predictive control. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 1–9
  5. Kim D, Di Carlo J, Katz B, Bledt G, Kim S (2019) Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control. arXiv preprint arXiv:1909.06586
  6. Katz B, Di Carlo J, Kim S (2019) Mini Cheetah: a platform for pushing the limits of dynamic quadruped control. In: 2019 International Conference on Robotics and Automation (ICRA), IEEE, 6295–6301
  7. Hoeller D, Rudin N, Sako D, Hutter M (2024) ANYmal parkour: learning agile navigation for quadrupedal robots. Sci Robot 9(88):7566
  8. Schneider L, Frey J, Miki T, Hutter M (2023) Learning risk-aware quadrupedal locomotion using distributional reinforcement learning. arXiv preprint arXiv:2309.14246
  9. Vollenweider E, Bjelonic M, Klemm V, Rudin N, Lee J, Hutter M (2023) Advanced skills through multiple adversarial motion priors in reinforcement learning. In: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 5120–5126
  10. Margolis GB, Yang G, Paigwar K, Chen T, Agrawal P (2022) Rapid locomotion via reinforcement learning. Int J Robot Res 43:572–587
  11. Yang R, Zhang M, Hansen N, Xu H, Wang X (2021) Learning vision-guided quadrupedal locomotion end-to-end with cross-modal transformers. arXiv preprint arXiv:2107.03996
  12. Miki T, Lee J, Hwangbo J, Wellhausen L, Koltun V, Hutter M (2022) Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci Robot 7(62):2822
  13. Loquercio A, Kumar A, Malik J (2023) Learning visual locomotion with cross-modal supervision. In: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 7295–7302
  14. Cheng X, Shi K, Agarwal A, Pathak D (2023) Extreme parkour with legged robots. arXiv preprint arXiv:2309.14341
  15. Lee J, Hwangbo J, Wellhausen L, Koltun V, Hutter M (2020) Learning quadrupedal locomotion over challenging terrain. Sci Robot 5(47):5986
  16. Wu P, Escontrela A, Hafner D, Abbeel P, Goldberg K (2023) Daydreamer: World models for physical robot learning. In: Conference on Robot Learning, PMLR, 2226–2240
  17. Smith L, Kostrikov I, Levine S (2022) A walk in the park: Learning to walk in 20 minutes with model-free reinforcement learning. arXiv preprint arXiv:2208.07860
  18. Nahrendra IMA, Yu B, Myung H (2023) Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning. In: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 5078–5084
  19. Kumar A, Fu Z, Pathak D, Malik J (2021) RMA: rapid motor adaptation for legged robots. arXiv preprint arXiv:2107.04034
  20. Margolis GB, Agrawal P (2023) Walk these ways: tuning robot control for generalization with multiplicity of behavior. In: Conference on Robot Learning, PMLR, 22–31
  21. Wu J, Xin G, Qi C, Xue Y (2023) Learning robust and agile legged locomotion using adversarial motion priors. IEEE Robot Autom Lett 8:4975
  22. Rudin N, Hoeller D, Reist P, Hutter M (2022) Learning to walk in minutes using massively parallel deep reinforcement learning. In: Conference on Robot Learning, PMLR, 91–100
  23. Yu W, Yang C, McGreavy C, Triantafyllidis E, Bellegarda G, Shafiee M, Ijspeert AJ, Li Z (2023) Identifying important sensory feedback for learning locomotion skills. Nat Mach Intell 5(8):919–932
  24. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
  25. Long J, Wang Z, Li Q, Gao J, Cao L, Pang J (2023) Hybrid internal model: learning agile legged locomotion with simulated robot response. arXiv preprint


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant No. U2013601), in part by Anhui Province Natural Science Funds for Distinguished Young Scholar (Grant No. 2308085J02), and in part by Innovation Leading Talent of Anhui Province TeZhi plan.

Author information

Author notes

  1. Chi Yan and Ning Wang: These authors contributed equally to this work.

Authors and Affiliations

  1. School of Information Science and Technology, University of Science and Technology of China, No. 96 Jinzhai Road, Hefei, 230026, Anhui, China
    Chi Yan, Hongbo Gao, Xinmiao Wang, Chao Tang & Lin Zhou
  2. School of Information and Security, Chongqing College of Mobile Communication, No. 36 Dengying Avenue, Qijiang District, Chongqing, 401520, China
    Ning Wang
  3. Institute of Advanced Technology, University of Science and Technology of China, No. 5089 Wangjiang West Road, Hefei, 230088, Anhui, China
    Hongbo Gao
  4. School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
    Hongbo Gao
  5. Zhejiang Lab, Kechuang Avenue, Zhongtai Sub-District, Hangzhou, 311121, Zhejiang, China
    Yuehua Li
  6. College of Control Science and Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
    Yue Wang

Authors

  1. Chi Yan
  2. Ning Wang
  3. Hongbo Gao
  4. Xinmiao Wang
  5. Chao Tang
  6. Lin Zhou
  7. Yuehua Li
  8. Yue Wang

Corresponding author

Correspondence to Hongbo Gao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Yan, C., Wang, N., Gao, H. et al. An advanced reinforcement learning control method for quadruped robots in typical urban terrains. Int. J. Mach. Learn. & Cyber. 16, 3747–3757 (2025). https://doi.org/10.1007/s13042-024-02478-9
