%0 Journal Article
%T Performance Potential-Based Neuro-Dynamic Programming for SMDPs
%A 唐昊
%A 袁继彬
%A 陆阳
%A 程文娟
%J Acta Automatica Sinica
%P 642-645
%D 2005
%X An alpha-uniformized Markov chain is defined by the concept of equivalent infinitesimal generator for a semi-Markov decision process (SMDP) with both average- and discounted-criteria. According to the relations of their performance measures and performance potentials, the optimization of an SMDP can be realized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and the performance error bound is shown as there are approximate error and improvement error in each iteration step. The obtained results may be extended to Markov systems, and have much applicability. Finally, a numerical example is provided.
%K Semi-Markov decision processes
%K performance potentials
%K neuro-dynamic programming
%U http://www.aas.net.cn/CN/abstract/abstract16011.shtml