0

I have gone through this answer regarding the difference between Q-value and state value. My specific question is:

If Q-value calculates immediate reward after taking a particular action and then calculates the rest by following a policy, how will the expected reward change if value function also follows the same policy? For example, in the image below, how will Q-value and state value change depending on the policy?

enter image description here

bkshi
  • 2,303
  • 2
  • 14
  • 23
kanishka
  • 1
  • 1

0 Answers0