|
自动化学报 2012
Learning Control of Dynamical Systems Based on Markov Decision Processes: Research Frontiers and Outlooks
|
Abstract:
Learning control of dynamical systems based on Markov decision processes (MDPs) is an interdisciplinary research area of machine learning, control theory, and operations research. The main objective in this research area is to realize data-driven multi-stage optimal control for complex or uncertain dynamical systems. This paper presents a comprehensive survey on the theory, algorithms, and applications of MDP-based learning control of dynamical systems. Emphases are put on recent advances in the theory and methods of reinforcement learning (RL) and adaptive/approximate dynamic programming (ADP), including temporal-difference learning theory, value function approximation for continuous state and action spaces, direct policy search, approximate policy iteration, and adaptive critic designs. Applications and the trends for future research and developments in related fields are also discussed.