
If a policy yields an action for each state, why does a 3-state MDP with 2 possible actions, i.e. $S = \{Hot, Mild, Cold\}$, $A = \{East, West\}$, have 8 possible policies? Shouldn't it be 6, since there are 2 possible actions for every state?

Free
daftpunk99

1 Answer


It looks like you are a bit confused about the notion of an MDP policy. There's a detailed discussion with lots of examples in this question.

A (deterministic) policy is a complete strategy for a given environment: it assigns an action to every state, not just to one. For example, "go $East$ in any state" is a valid policy in your MDP (though maybe not optimal), and so is "go $West$ in any state".

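There are 3 states and 2 possible actions for each, and the action is chosen independently in every state, so the counts multiply rather than add: $|A|^{|S|} = 2^3 = 8$ possible policies, namely $EEE$, $EEW$, ..., $WWW$. The 6 you computed is $|S| \cdot |A|$, which is the number of state-action pairs, not the number of policies.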
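If it helps to see the count concretely, here is a minimal Python sketch that enumerates every deterministic policy for this MDP. The variable names (`states`, `actions`, `policies`) are my own choice for illustration; the only assumption is that a policy is represented as a dict mapping each state to one action.

```python
from itertools import product

states = ["Hot", "Mild", "Cold"]
actions = ["East", "West"]

# A deterministic policy picks one action for every state, so each policy
# corresponds to one tuple of 3 actions (one entry per state).
policies = [
    dict(zip(states, choice))
    for choice in product(actions, repeat=len(states))
]

for policy in policies:
    print(policy)

print(len(policies))  # 8, i.e. len(actions) ** len(states)
```

The first policy printed is "East everywhere" and the last is "West everywhere"; the six in between mix the two actions across states.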

Maxim