If a policy yields an action for a state, how come a 3-state MDP with 2 possible actions, i.e. $S = \{Hot, Mild, Cold\}$, $A = \{East, West\}$, has 8 possible policies? Shouldn't it be 6, since there are 2 possible actions for every state?
1 Answer
It looks like you are a bit confused by the notion of an MDP policy. There's a detailed discussion with lots of examples in this question.
A policy is a complete strategy for a given environment: it specifies which action to take in every state, not just in one of them. Example: "go $East$ in every state" is a valid policy in your MDP (though maybe not optimal), as is "go $West$ in every state".
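So there are 3 states, and each one can independently be assigned either of the 2 actions, hence $|A|^{|S|} = 2^3 = 8$ possible (deterministic) policies: $EEE$, $EEW$, ..., $WWW$. The number $6 = |S| \cdot |A|$ counts state-action pairs, not policies.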
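To make the count concrete, here is a minimal Python sketch that enumerates every deterministic policy for this MDP. The helper name `enumerate_policies` and the representation of a policy as a dict from state to action are my own choices for illustration, not anything standard.

```python
from itertools import product

states = ["Hot", "Mild", "Cold"]
actions = ["East", "West"]


def enumerate_policies(states, actions):
    """Return every deterministic policy as a dict mapping state -> action.

    Hypothetical helper for illustration: each policy picks one action
    per state, so the result is the Cartesian product of `actions`
    taken len(states) times.
    """
    return [
        dict(zip(states, choice))
        for choice in product(actions, repeat=len(states))
    ]


policies = enumerate_policies(states, actions)
for policy in policies:
    print(policy)

# Number of policies is |A| ** |S|, not |S| * |A|.
print(len(policies), len(actions) ** len(states), len(states) * len(actions))
```

Each printed dict is one policy, e.g. `{'Hot': 'East', 'Mild': 'East', 'Cold': 'East'}` corresponds to $EEE$. The final line compares the number of policies with $|A|^{|S|}$ and with the number of state-action pairs $|S| \cdot |A|$.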
Intrastellar Explorer
Maxim