Key-State-Conditioned Diffusion Models for Trajectory Planning
Abstract:
In trajectory planning for offline reinforcement learning, conventional autoregressive planning methods suffer from performance limitations due to step-by-step error accumulation. Diffusion models have recently been introduced to this domain to mitigate error accumulation through their strong distribution-modeling capabilities, yet existing approaches still underperform when generating long-horizon trajectories in high-dimensional action spaces. To address this, we propose a key-state-conditioned diffusion model for trajectory planning. Our method extracts key-state features from the original trajectories and combines them with a conditional diffusion generative model, recasting the traditional autoregressive planning paradigm as a key-state-conditioned generation problem. This preserves the temporal continuity of generated trajectories while improving planning performance. Experiments on multiple D4RL benchmark environments, including Gym-MuJoCo, Maze2d, AntMaze, and Adroit, show that our method outperforms existing approaches in both trajectory planning performance and algorithmic robustness.
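To make the key-state conditioning idea concrete, the following is a minimal, hypothetical sketch rather than the paper's implementation: key states are extracted here by uniform subsampling of a reference trajectory, a placeholder noise predictor stands in for a trained denoising network, and the DDPM reverse process clamps the key-state rows of the plan at every step, which is one common way to condition diffusion-based planners on fixed states. All function and variable names are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): key-state-conditioned trajectory
# generation via inpainting-style diffusion sampling.
import numpy as np

H, D = 32, 6          # planning horizon, state dimension (toy values)
T = 50                # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def extract_key_states(trajectory, stride=8):
    """Pick key states by uniform subsampling (placeholder heuristic)."""
    idx = np.arange(0, len(trajectory), stride)
    return idx, trajectory[idx]

def denoise_eps(x_t, t):
    """Placeholder noise predictor; a trained network eps_theta(x_t, t) would go here."""
    return 0.1 * x_t

def sample_conditioned(key_idx, key_states):
    """Reverse diffusion with key-state rows clamped at every step."""
    x = np.random.randn(H, D)
    for t in reversed(range(T)):
        eps = denoise_eps(x, t)
        # Standard DDPM posterior mean
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = np.random.randn(H, D) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
        # Inpainting-style conditioning: overwrite the known key states
        x[key_idx] = key_states
    return x

if __name__ == "__main__":
    reference = np.random.randn(H, D)      # stand-in for a dataset trajectory
    idx, keys = extract_key_states(reference)
    plan = sample_conditioned(idx, keys)
    print(plan.shape, np.allclose(plan[idx], keys))  # (32, 6) True
```

In this sketch the conditioning is enforced by projection (clamping) after each denoising step; classifier-free guidance or a learned conditioning network are alternative ways to inject the key states.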