Illegal action reward strategy for reinforcement learning : reward shaping and termination / truncation

Question

I have some questions about how to handle illegal actions in reinforcement learning (Stable Baselines 3, SAC algorithm). The first is about reward shaping; the second is about terminating or truncating the episode when an illegal action is performed.

What is good practice for the illegal action penalty? The game accumulates a reward of 1 for a win and -1 for a loss. I apply a -100 penalty for an illegal action because evaluation is done over 100 episodes, so an illegal action cannot be compensated by wins. Is this a good approach? Should I use illegal_rew = -100 or illegal_rew = -100 - episode_cumulated_rew?
Should I stop the game when the agent performs an illegal action? If the answer is "yes", should I terminate or truncate the episode?

score -1 · Accepted Answer · answered Feb 22 '24 at 20:29

Penalty for Illegal Actions:

In your case, applying a -100 penalty for illegal actions is a reasonable approach. Evaluation is done over 100 episodes and each game contributes at most 1, so a penalty of this size ensures that an illegal action cannot be compensated by wins. Setting the illegal action penalty as illegal_rew = -100 therefore seems like a suitable choice, in my opinion.
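To make the two options from the question concrete, here is a minimal sketch comparing them. It assumes the illegal action ends the episode, so the episode return is the reward collected so far plus the penalty. The function names are hypothetical and not part of Stable Baselines 3.

```python
ILLEGAL_PENALTY = -100.0  # value taken from the question


def illegal_reward_fixed(episode_cumulated_rew: float) -> float:
    """Option 1: constant penalty, independent of the reward collected so far."""
    return ILLEGAL_PENALTY


def illegal_reward_reset(episode_cumulated_rew: float) -> float:
    """Option 2: penalty that also cancels the reward collected so far."""
    return ILLEGAL_PENALTY - episode_cumulated_rew


def episode_return(episode_cumulated_rew: float, illegal_rew: float) -> float:
    """Return of an episode that ends on the illegal action (assumption)."""
    return episode_cumulated_rew + illegal_rew


# Example: the agent had collected 0.5 before acting illegally.
collected = 0.5
fixed_return = episode_return(collected, illegal_reward_fixed(collected))
reset_return = episode_return(collected, illegal_reward_reset(collected))
print(fixed_return, reset_return)
```

With option 1 the episode return depends on what was collected before the illegal action (-99.5 here), while option 2 always yields exactly -100. Since the game's rewards are bounded by 1 in magnitude, the difference between the two is small relative to the penalty.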

Stopping the Game for Illegal Actions:

Regarding whether to stop the game when an illegal action is performed, it's generally advisable to terminate (rather than truncate) the episode. An illegal action is an outcome of the environment's own rules, whereas truncation is meant for external cutoffs such as a time limit. Terminating ensures that the agent learns the consequences of illegal actions and stops exploring the illegal part of the action space. It is a common way in reinforcement learning to enforce adherence to the rules of the environment.
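As a rough illustration, here is a dependency-free sketch of a step function that follows the Gymnasium return convention (observation, reward, terminated, truncated, info). The ToyGame class, its legality rule, and the MAX_STEPS limit are all hypothetical; a real environment would subclass gymnasium.Env and define its observation and action spaces.

```python
from typing import Any, Dict, Tuple

ILLEGAL_PENALTY = -100.0  # value taken from the question
MAX_STEPS = 50            # hypothetical time limit, for illustration only


class ToyGame:
    """Hypothetical toy environment mirroring the Gymnasium step signature."""

    def __init__(self) -> None:
        self.steps = 0

    def reset(self) -> Tuple[float, Dict[str, Any]]:
        self.steps = 0
        return 0.0, {}

    def is_legal(self, action: float) -> bool:
        # Hypothetical rule: only actions in [-1, 1] are legal.
        return -1.0 <= action <= 1.0

    def step(self, action: float) -> Tuple[float, float, bool, bool, Dict[str, Any]]:
        self.steps += 1
        if not self.is_legal(action):
            # Illegal action: penalize and terminate (terminated=True, truncated=False).
            return 0.0, ILLEGAL_PENALTY, True, False, {"illegal_action": True}
        # Legal action: no reward in this toy game; truncate only at the time limit.
        truncated = self.steps >= MAX_STEPS
        return 0.0, 0.0, False, truncated, {"illegal_action": False}


env = ToyGame()
env.reset()
obs, reward, terminated, truncated, info = env.step(2.0)  # 2.0 is outside [-1, 1]
print(reward, terminated, truncated)
```

The illegal step sets terminated and leaves truncated false, so the penalty is treated as the final outcome of the episode rather than as an interruption of a game that could have continued.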

Read this tips and tricks page if you haven't already for more info.

Hope this answers your question! :)
