%0 Journal Article
%T Learning Control of Dynamical Systems Based on Markov Decision Processes: Research Frontiers and Outlooks
%A XU Xin
%A SHEN Dong
%A GAO Yan-Qing
%A WANG Kai
%A 徐昕
%A 沈栋
%A 高岩青
%A 王凯
%J Acta Automatica Sinica (自动化学报)
%D 2012
%X Learning control of dynamical systems based on Markov decision processes (MDPs) is an interdisciplinary research area spanning machine learning, control theory, and operations research. Its main objective is to realize data-driven, multi-stage optimal control of complex or uncertain dynamical systems. This paper presents a comprehensive survey of the theory, algorithms, and applications of MDP-based learning control of dynamical systems. Emphasis is placed on recent advances in the theory and methods of reinforcement learning (RL) and adaptive/approximate dynamic programming (ADP), including temporal-difference learning theory, value function approximation for continuous state and action spaces, direct policy search, approximate policy iteration, and adaptive critic designs. Applications and trends for future research and development in related fields are also discussed.
%K Learning control
%K Markov decision processes (MDP)
%K reinforcement learning (RL)
%K approximate dynamic programming (ADP)
%K machine learning
%K adaptive control
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=E76622685B64B2AA896A7F777B64EB3A&aid=1393146D229514ED79A4DC99EB53D08D&yid=99E9153A83D4CB11&vid=16D8618C6164A3ED&iid=94C357A881DFC066&sid=B28C697BC3A1BA62&eid=F4C2D192FB73A21F&journal_id=0254-4156&journal_name=自动化学报&referenced_num=0&reference_num=143