Deep reinforcement learning (deep RL) has the potential to replace classic robotic controllers. State-of-the-art deep RL algorithms such as Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC), to mention a few, have been investigated for training robots to walk. However, conflicting performance results for these algorithms have been reported in the literature. In this work, we present a performance analysis of the three state-of-the-art deep RL algorithms above for a constant-velocity walking task on a quadruped. The performance is analyzed by simulating the walking task of a quadruped equipped with a range of sensors present on a physical quadruped robot. Simulations of the three algorithms across a range of sensor inputs and with domain randomization are performed. The strengths and weaknesses of each algorithm for the given task are discussed, and we identify the set of sensors that contributes to the best performance of each algorithm.
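The abstract describes the comparison only at a high level. As a minimal sketch of such a three-algorithm benchmark, the snippet below trains and evaluates PPO, TD3, and SAC with Stable-Baselines3; the Gymnasium `Ant-v4` environment stands in for the paper's quadruped model, and the timestep budget and evaluation settings are illustrative assumptions, not the paper's actual configuration (whose sensor subsets and domain-randomization ranges are not reproduced here).

```python
# Hedged sketch of a PPO/TD3/SAC comparison for quadruped walking.
# Assumptions: Stable-Baselines3 and gymnasium[mujoco] are installed;
# "Ant-v4" is a stand-in for the paper's quadruped simulation, and all
# hyperparameters are illustrative defaults.
import gymnasium as gym
from stable_baselines3 import PPO, SAC, TD3
from stable_baselines3.common.evaluation import evaluate_policy

# The three algorithms compared in this work.
ALGORITHMS = {"PPO": PPO, "TD3": TD3, "SAC": SAC}


def train_and_evaluate(algo_name: str, total_timesteps: int = 200_000) -> None:
    """Train one algorithm on the stand-in quadruped task and report
    its mean episodic return over 10 evaluation episodes."""
    env = gym.make("Ant-v4")  # stand-in for the simulated quadruped
    model = ALGORITHMS[algo_name]("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=total_timesteps)
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
    print(f"{algo_name}: {mean_reward:.1f} +/- {std_reward:.1f}")
    env.close()


if __name__ == "__main__":
    for name in ALGORITHMS:
        train_and_evaluate(name)
```

In a study like the one described, this loop would additionally be repeated over each candidate sensor subset (by masking observation components) and under randomized dynamics parameters to probe sim-to-real robustness.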