A Comparison of PPO, TD3 and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation

DOI: 10.4236/jilsa.2023.151003, PP. 36-56

Keywords: Reinforcement Learning, Machine Learning, Markov Decision Process, Domain Randomization


Deep reinforcement learning (deep RL) has the potential to replace classic robotic controllers. State-of-the-art deep RL algorithms, including Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC), have been investigated for training robots to walk. However, the literature reports conflicting performance results for these algorithms. In this work, we analyze the performance of these three algorithms on a constant-velocity walking task for a quadruped. The analysis is carried out in simulation, with the simulated quadruped equipped with a range of sensors present on a physical quadruped robot. Each algorithm is simulated across a range of sensor inputs and with domain randomization. We discuss the strengths and weaknesses of each algorithm for the given task and identify the set of sensors that contributes to the best performance of each algorithm.
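
As a rough illustration of the domain randomization mentioned above, the sketch below draws a fresh set of simulator parameters for each training episode. It is not the authors' setup: the parameter names (`friction`, `payload_kg`, `motor_strength`), their ranges, and the helper `sample_params` are all hypothetical and chosen only to show the idea.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical physical parameters randomized per episode."""
    friction: float        # ground friction coefficient
    payload_kg: float      # extra mass carried by the robot
    motor_strength: float  # multiplier on nominal motor torque


# Illustrative ranges only (assumptions, not values from the paper).
RANGES = {
    "friction": (0.5, 1.25),
    "payload_kg": (0.0, 2.0),
    "motor_strength": (0.8, 1.2),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw each parameter uniformly from its range."""
    values = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    return SimParams(**values)


# One parameter set per episode; a seeded generator keeps runs reproducible.
rng = random.Random(0)
episodes = [sample_params(rng) for _ in range(5)]
```

In a training loop, each sampled `SimParams` would be applied to the simulator before the episode starts, so the policy never sees exactly the same dynamics twice.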



