%0 Journal Article
%T On-line optimization algorithm for Markov control processes based on a single sample path
%T Markov控制过程基于单个样本轨道的在线优化算法
%A TANG Hao
%A XI Hong-sheng
%A YIN Bao-qun
%A 唐昊
%A 奚宏生
%A 殷保群
%J 控制理论与应用 (Control Theory & Applications)
%D 2002
%X Based on the theory of Markov performance potentials, this paper studies a performance optimization algorithm for Markov control processes. Unlike traditional computation-based approaches, the algorithm estimates the gradients of the performance measure with respect to the policy parameters by simulating a single sample path, and searches for an optimal (or suboptimal) randomized stationary policy. Because its parameters can be chosen to suit the properties of the system at hand, the algorithm can meet the on-line optimization needs of many real-world engineering systems. Finally, the convergence of the algorithm with probability one on an infinite sample path is considered, and a numerical example for a three-state controlled Markov chain is provided.
%K Markov control processes
%K Markov performance potentials
%K randomized stationary policies
%K on-line optimization
%K Markov控制过程
%K Markov性能势
%K 随机平稳策略
%K 在线优化
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=970898A57DFC021F93AB51667BAED7F7&aid=F91FDAD4A07E1AB7&yid=C3ACC247184A22C1&vid=2A8D03AD8076A2E3&iid=B31275AF3241DB2D&sid=461E94ABCF58C63F&eid=1F8584045E0BED57&journal_id=1000-8152&journal_name=控制理论与应用&referenced_num=4&reference_num=11