%0 Journal Article
%T Performance Potential-Based Neuro-Dynamic Programming for SMDPs
%A 唐昊
%A 袁继彬
%A 陆阳
%A 程文娟
%J 自动化学报
%P 642-645
%D 2005
%X An alpha-uniformized Markov chain is defined, via the concept of an equivalent infinitesimal generator, for a semi-Markov decision process (SMDP) under both the average and discounted criteria. Based on the relations between the performance measures and performance potentials of the two processes, an SMDP can be optimized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a performance error bound is given for the case in which each iteration step incurs an approximation error and an improvement error. The results may be extended to Markov systems and are widely applicable. Finally, a numerical example is provided.
%K Semi-Markov decision processes
%K performance potentials
%K neuro-dynamic programming
%U http://www.aas.net.cn/CN/abstract/abstract16011.shtml