Acta Automatica Sinica (自动化学报), 2005
Neuro-Dynamic Programming Based on Performance Potentials for SMDPs, pp. 642-645
Keywords: semi-Markov decision processes, performance potentials, neuro-dynamic programming
Abstract: For a semi-Markov decision process (SMDP) under both the average and discounted criteria, an alpha-uniformized Markov chain is defined using the concept of an equivalent infinitesimal generator. Based on the relations between the performance measures and performance potentials of the two processes, an SMDP can be optimized by simulating this chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and a bound on the performance error is derived for the case where each iteration step incurs both an approximation error and an improvement error. The results can be extended to Markov systems and are widely applicable. Finally, a numerical example is provided.
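The abstract does not give the paper's definitions, so the following is only a minimal sketch of the general idea, under stated assumptions. It takes a hypothetical infinitesimal generator `A` as given (the paper's equivalent infinitesimal generator and its alpha-dependent form are not reproduced here), applies standard uniformization `P = I + A / rate`, and computes average-criterion performance potentials exactly from the Poisson equation `(I - P + e*pi) g = f`. The function names, the reward vector `f`, and the two-state example are illustrative assumptions, and the exact linear solve stands in for the simulation and neural-network critic described in the abstract.

```python
import numpy as np


def uniformize(A, rate=None):
    """Return the transition matrix P = I + A / rate of the uniformized chain.

    A is an infinitesimal generator (rows sum to zero). By default the rate is
    the largest exit rate max_i(-A[i, i]), which keeps every entry of P >= 0.
    """
    A = np.asarray(A, dtype=float)
    if rate is None:
        rate = np.max(-np.diag(A))
    return np.eye(A.shape[0]) + A / rate


def stationary_distribution(P):
    """Solve pi (I - P) = 0 together with sum(pi) = 1 for an ergodic chain."""
    n = P.shape[0]
    M = np.vstack([(np.eye(n) - P).T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, b, rcond=None)
    return pi


def performance_potentials(P, f):
    """Return (g, eta, pi) with (I - P + e*pi) g = f and eta = pi @ f."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    eta = pi @ f  # long-run average reward
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return g, eta, pi


# Hypothetical two-state example (not taken from the paper).
A = np.array([[-2.0, 2.0],
              [1.0, -1.0]])
f = np.array([1.0, 3.0])

P = uniformize(A)
g, eta, pi = performance_potentials(P, f)
```

With this normalization the potentials satisfy `(I - P) g + eta = f` and `pi @ g = eta`, which gives a quick consistency check on any approximate critic trained to estimate `g`.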