Questions tagged [reward]

A reward is the feedback signal an agent receives from its environment in a reinforcement-learning setting. A reward function describes how an agent is rewarded for its actions in a given state.

When an agent takes a step, the feedback it receives from the environment is known as the reward.
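
As a rough illustration of the definition above, here is a minimal sketch of a reward function in a toy one-dimensional environment. Everything in it is an assumption made for illustration: the `GOAL` cell, the `step` and `run_episode` helpers, and the reward values (+1.0 for reaching the goal, -0.1 per step otherwise) are hypothetical and do not come from any particular library or question.

```python
from typing import Callable, Tuple

GOAL = 4  # hypothetical goal cell on a line of cells 0..4


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Apply an action (-1 or +1) and return (next_state, reward, done)."""
    # Clamp the new position so the agent stays inside [0, GOAL].
    next_state = max(0, min(GOAL, state + action))
    done = next_state == GOAL
    # Assumed reward function: +1.0 on reaching the goal,
    # a small step penalty of -0.1 everywhere else.
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def run_episode(policy: Callable[[int], int], start: int = 0,
                max_steps: int = 20) -> float:
    """Run one episode under the given policy and return the summed reward."""
    state, total = start, 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    # A policy that always moves right reaches the goal in four steps.
    print(run_episode(lambda s: 1))
```

The step penalty is one possible design choice: it nudges the agent toward shorter paths, whereas a reward of zero on non-goal steps would leave path length unpenalized.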

5 questions
1 vote · 1 answer

What is a good reward function when the objective is to minimize the average along with the variance?

I am trying to formulate a problem where we want to minimize the average resource allocated to different users. Due to some inherent properties of the environment, the allocation is easy to minimize for some users while it is difficult for other users due to…
user3656142
1 vote · 1 answer

Train Reward Model using Llama2:

This is the code I use to train a reward model: import os import torch from datasets import load_dataset,Dataset from transformers import ( AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline, …
0 votes · 0 answers

How to find the optimal features and rewards to get a deep learning AI based on the cross entropy method to learn well?

I am a beginner in programming, but managed to get a little Pong game done. For my studies I had to understand an AI that solved the LunarLander-v2 environment of the Gymnasium API. To do so, it used deep learning and the cross-entropy method. It…
0 votes · 1 answer

Illegal action reward strategy for reinforcement learning: reward shaping and termination / truncation

I have some questions about the strategy to adopt for handling illegal actions in reinforcement learning (Stable Baselines 3 / SAC algorithm). The first is about reward shaping; the second is about terminating / truncating the episode when performing an illegal…
GerardL
0 votes · 1 answer

How to write a reward function that optimizes for profit and revenue?

I want to write a reward function for a reinforcement learning model that picks products to display to a customer. Each product has a profit margin (%). Higher-priced products will have a higher profit margin but a lower probability of being…
JimDoe