The DQN (Deep Q-Network) algorithm was developed by DeepMind. By combining reinforcement learning and deep neural networks at scale, it was able to solve a wide range of Atari games, some at a superhuman level.
DQN overcomes unstable learning mainly through four techniques:
- Experience Replay
- Target Network
- Clipping Rewards
- Skipping Frames
Experience Replay:
Experience Replay was originally proposed in "Reinforcement Learning for Robots Using Neural Networks" (1993). A deep neural network easily overfits to the current episodes, and once it has overfitted, it is hard for it to produce varied experiences. To solve this problem, Experience Replay stores experiences, including state transitions, rewards, and actions, which are the data necessary to perform Q-learning, and draws mini-batches from them to update the neural network. This technique offers the following merits (see the sketch after this list):
- reduces correlation between experiences when updating the DNN
- increases learning speed with mini-batches
- reuses past transitions to avoid catastrophic forgetting
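
As a rough illustration, here is a minimal sketch of a uniform replay buffer. The class name `ReplayBuffer`, the default capacity, and the batch size are illustrative assumptions, not details taken from the DQN paper.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions.
    Names and sizes here are illustrative, not from the original DQN paper."""

    def __init__(self, capacity=100_000):
        # deque with maxlen drops the oldest transitions once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one transition observed while interacting with the environment
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

During training, the agent would push each transition into the buffer and, once enough transitions have accumulated, repeatedly sample mini-batches from it to compute Q-learning updates, rather than learning only from the most recent experience.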