
Computer Science 2015

Emphatic TD Bellman Operator is a Contraction

Assaf Hallak,Aviv Tamar,Shie Mannor


Abstract:

Recently, Sutton et al. (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation underlying ETD involves a contraction operator with contraction modulus $\sqrt{\gamma}$, where $\gamma$ is the discount factor. This allows us to bound the approximation error of ETD. To our knowledge, these are the first error bounds for an off-policy evaluation algorithm under general target and behavior policies.
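
To make the step from a contraction modulus to an error bound concrete, the following is a sketch of the standard argument for projected fixed-point equations, not a reproduction of the paper's proof. The symbols are assumptions introduced for illustration: $T$ is the Bellman-type operator with fixed point $V$, $\Pi$ is a projection that is orthogonal in a weighted norm $\|\cdot\|_f$, $\Pi T$ is a $\beta$-contraction in that same norm, and $\hat{V}$ is the fixed point of $\Pi T$.

```latex
% Assumptions (illustrative, not taken from the abstract):
%   T V = V,  \Pi T \hat{V} = \hat{V},
%   \Pi is an orthogonal projection w.r.t. \|\cdot\|_f,
%   \|\Pi T x - \Pi T y\|_f \le \beta \|x - y\|_f  with  0 \le \beta < 1.
\begin{align*}
\|V - \hat{V}\|_f^2
  &= \|V - \Pi V\|_f^2 + \|\Pi V - \hat{V}\|_f^2
     && \text{(Pythagoras; $\Pi V - \hat{V}$ lies in the range of $\Pi$)} \\
  &= \|V - \Pi V\|_f^2 + \|\Pi T V - \Pi T \hat{V}\|_f^2
     && \text{(both are fixed points)} \\
  &\le \|V - \Pi V\|_f^2 + \beta^2 \|V - \hat{V}\|_f^2 ,
\end{align*}
% Rearranging gives the bound:
\[
\|V - \hat{V}\|_f \;\le\; \frac{1}{\sqrt{1 - \beta^2}}\, \|V - \Pi V\|_f .
\]
% With the modulus stated in the abstract, \beta = \sqrt{\gamma}:
\[
\|V - \hat{V}\|_f \;\le\; \frac{1}{\sqrt{1 - \gamma}}\, \|V - \Pi V\|_f .
\]
```

Under these assumptions, the error of the projected solution is at most a factor $1/\sqrt{1-\gamma}$ larger than the best error achievable within the approximation subspace. Whether the paper's bound takes exactly this form depends on the norm and projection it uses.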
