Deep reinforcement learning (deep RL) has the potential to replace classic robotic controllers. State-of-the-art deep RL algorithms such as Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC), to mention a few, have been investigated for training robots to walk. However, conflicting performance results for these algorithms have been reported in the literature. In this work, we present a performance analysis of the three state-of-the-art deep RL algorithms above for a constant-velocity walking task on a quadruped. The performance is analyzed by simulating the walking task of a quadruped equipped with a range of sensors present on a physical quadruped robot. Simulations of the three algorithms across a range of sensor inputs and with domain randomization are performed. The strengths and weaknesses of each algorithm for the given task are discussed, and we identify the set of sensors that contributes to the best performance of each algorithm.
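The abstract describes the comparison only at a high level. As a minimal sketch of such a three-algorithm benchmark, the snippet below trains and evaluates PPO, TD3, and SAC with Stable-Baselines3; the Gymnasium `Ant-v4` environment stands in for the paper's quadruped model, and the timestep budget and evaluation settings are illustrative assumptions, not the paper's actual configuration (whose sensor subsets and domain-randomization ranges are not reproduced here).

```python
# Hedged sketch of a PPO/TD3/SAC comparison for quadruped walking.
# Assumptions: Stable-Baselines3 and gymnasium[mujoco] are installed;
# "Ant-v4" is a stand-in for the paper's quadruped simulation, and all
# hyperparameters are illustrative defaults.
import gymnasium as gym
from stable_baselines3 import PPO, SAC, TD3
from stable_baselines3.common.evaluation import evaluate_policy

# The three algorithms compared in this work.
ALGORITHMS = {"PPO": PPO, "TD3": TD3, "SAC": SAC}


def train_and_evaluate(algo_name: str, total_timesteps: int = 200_000) -> None:
    """Train one algorithm on the stand-in quadruped task and report
    its mean episodic return over 10 evaluation episodes."""
    env = gym.make("Ant-v4")  # stand-in for the simulated quadruped
    model = ALGORITHMS[algo_name]("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=total_timesteps)
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
    print(f"{algo_name}: {mean_reward:.1f} +/- {std_reward:.1f}")
    env.close()


if __name__ == "__main__":
    for name in ALGORITHMS:
        train_and_evaluate(name)
```

In a study like the one described, this loop would additionally be repeated over each candidate sensor subset (by masking observation components) and under randomized dynamics parameters to probe sim-to-real robustness.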