Acta Automatica Sinica, 1964
AN OPTIMAL POLICY FOR CONTROLLING THE CONTROLLABLE MARKOV CHAINS
Abstract:
This paper is concerned with one class of optimal control problems for Markov controlled systems. The controlled system is described by a Markov chain whose transition probabilities depend on the sequence of decisions, which we call a policy. There exists an absorbing objective state: once the system reaches it, the system remains there forever. Our purpose is to choose a policy that simultaneously maximizes, for every initial state, the probability that the system ever reaches the objective state. We first give a policy-iteration method for obtaining an optimal policy over the set of stable policies, and then prove that such a policy is also optimal over the set of all policies, stable and unstable.
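The abstract does not give the iteration in detail. As a rough illustration only, the Python sketch below applies textbook policy iteration to the same objective on a small finite chain: each stationary policy (one fixed action per state) is evaluated by solving a linear system for its reach probabilities, and each state then switches to an action with a strictly larger one-step expected value. This is a generic technique, not a reconstruction of the paper's method; the chain encoding, the function names (`can_reach`, `solve`, `evaluate`, `policy_iteration`), the strict-improvement rule, and the example chain are all assumptions of the sketch.

```python
from fractions import Fraction
from typing import List, Set, Tuple

# Hypothetical encoding of a finite controlled Markov chain on states 0..n-1:
# actions[i][a][j] is the probability of moving from state i to state j
# when action a is chosen in state i.
Row = List[Fraction]
Chain = List[List[Row]]


def can_reach(rows: List[Row], target: int) -> Set[int]:
    """States from which `target` is hit with positive probability under
    the fixed transition matrix `rows` (backward graph search)."""
    found = {target}
    changed = True
    while changed:
        changed = False
        for i, row in enumerate(rows):
            if i not in found and any(row[j] > 0 for j in found):
                found.add(i)
                changed = True
    return found


def solve(a: List[Row], b: Row) -> Row:
    """Exact Gauss-Jordan elimination for a nonsingular system a x = b."""
    m = len(b)
    aug = [[Fraction(x) for x in row] + [Fraction(b[k])] for k, row in enumerate(a)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[k][m] / aug[k][k] for k in range(m)]


def evaluate(actions: Chain, policy: List[int], target: int) -> Row:
    """Probability of ever reaching `target` from each state under a
    stationary policy (one fixed action per state)."""
    rows = [actions[i][policy[i]] for i in range(len(actions))]
    live = sorted(can_reach(rows, target) - {target})
    v = [Fraction(0)] * len(actions)  # states that cannot reach target: 0
    v[target] = Fraction(1)
    if live:
        # Restricted system (I - P) x = P[., target] over the live states.
        # It is nonsingular because every live state leaves the live set
        # (toward the target) with positive probability.
        a = [[int(i == j) - rows[i][j] for j in live] for i in live]
        b = [rows[i][target] for i in live]
        for s, x in zip(live, solve(a, b)):
            v[s] = x
    return v


def policy_iteration(actions: Chain, target: int,
                     policy: List[int]) -> Tuple[List[int], Row]:
    """Alternate evaluation and improvement until no state has an action
    that is strictly better than its current one."""
    while True:
        v = evaluate(actions, policy, target)
        new_policy = policy[:]
        for i, choices in enumerate(actions):
            q = [sum(p * vj for p, vj in zip(row, v)) for row in choices]
            best = max(range(len(q)), key=q.__getitem__)
            if q[best] > q[policy[i]]:  # switch only on strict improvement
                new_policy[i] = best
        if new_policy == policy:
            return policy, v
        policy = new_policy


if __name__ == "__main__":
    F = Fraction
    # State 0: absorbing objective state; state 3: absorbing failure state.
    chain = [
        [[1, 0, 0, 0]],
        [[0, 0, 1, 0], [F(1, 2), 0, 0, F(1, 2)]],  # 1: go to 2, or gamble
        [[0, 1, 0, 0], [F(3, 4), 0, 0, F(1, 4)]],  # 2: go to 1, or gamble
        [[0, 0, 0, 1]],
    ]
    policy, values = policy_iteration(chain, target=0, policy=[0, 0, 0, 0])
    print(policy, [str(x) for x in values])  # [0, 0, 1, 0] ['1', '3/4', '3/4', '0']
```

Starting from the policy that cycles between states 1 and 2 (reach probability 0 for both), the sketch first switches both states to their direct gambles, then routes state 1 through state 2, and stops with reach probability 3/4 from both. Exact `Fraction` arithmetic is used so that the strict-improvement test is not disturbed by rounding in tied cases.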