Questions tagged [backpropagation]
Use for questions about backpropagation, which is commonly used in training neural networks in conjunction with an optimization method such as gradient descent.
302 questions
111 votes · 6 answers
Backprop Through Max-Pooling Layers?
This is a small conceptual question that's been nagging me for a while: How can we back-propagate through a max-pooling layer in a neural network?
I came across max-pooling layers while going through this tutorial for Torch 7's nn library. The…
shinvu (1,240 · 2 · 9 · 7)
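For context, the usual answer is that max pooling acts as a selection: in the backward pass, the gradient of each pooled output is routed to the single input position that produced the maximum, and every other position receives zero. A minimal NumPy sketch of that routing (helper names are illustrative; this is not Torch's nn implementation and handles a single channel only):

```python
import numpy as np

def maxpool_forward(x, size=2, stride=2):
    """2x2 max pooling over one feature map; records which position won each window."""
    h_out, w_out = x.shape[0] // stride, x.shape[1] // stride
    out = np.zeros((h_out, w_out))
    argmax = {}
    for i in range(h_out):
        for j in range(w_out):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            out[i, j] = window[r, c]
            argmax[(i, j)] = (i*stride + r, j*stride + c)
    return out, argmax

def maxpool_backward(grad_out, argmax, input_shape):
    """Route each output gradient to the input position that was the max; zeros elsewhere."""
    grad_in = np.zeros(input_shape)
    for (i, j), (r, c) in argmax.items():
        grad_in[r, c] += grad_out[i, j]
    return grad_in

x = np.arange(16, dtype=float).reshape(4, 4)
out, argmax = maxpool_forward(x)
grad_in = maxpool_backward(np.ones_like(out), argmax, x.shape)
print(grad_in)   # ones at the four max positions, zeros everywhere else
```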
44 votes · 4 answers
Guidelines for selecting an optimizer for training neural networks
I have been using neural networks for a while now. However, one thing that I constantly struggle with is the selection of an optimizer for training the network (using backprop). What I usually do is just start with one (e.g. standard SGD) and then…
mplappert (541 · 1 · 4 · 4)
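A small PyTorch sketch of the mechanics behind that workflow: backpropagation (loss.backward()) is identical no matter which optimizer is chosen; the optimizer is the swappable piece that decides how the gradients become updates. The model, data, and learning rates here are placeholders, not taken from the question.

```python
import torch

model = torch.nn.Linear(10, 1)              # placeholder model
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

# "Start with one optimizer, then try another" only changes this line;
# the backward pass that computes the gradients stays the same.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()        # backprop: fills p.grad for every parameter
    optimizer.step()       # optimizer: turns those gradients into weight updates
```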
29 votes · 5 answers
Gradients for bias terms in backpropagation
I was trying to implement a neural network from scratch to understand the maths behind it. My problem is completely related to backpropagation (when we take the derivative with respect to the bias), and I derived all the equations used in backpropagation. Now…
user34042 (425 · 1 · 4 · 7)
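For readers working through the same derivation: for a dense layer $z = xW + b$, the derivative of $z$ with respect to $b$ is the identity, so the bias gradient is just the upstream delta, summed over the batch. A NumPy sketch with illustrative shapes:

```python
import numpy as np

batch, n_in, n_out = 4, 5, 3
x = np.random.randn(batch, n_in)       # input to this layer
delta = np.random.randn(batch, n_out)  # dL/dz arriving from the layer above

# z = x @ W + b, so dz/dW needs the inputs, while dz/db is the identity:
grad_W = x.T @ delta                   # shape (5, 3)
grad_b = delta.sum(axis=0)             # shape (3,): just the summed deltas
```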
26 votes · 1 answer
back propagation in CNN
I have the following CNN:
I start with an input image of size 5x5
Then I apply a convolution with a 2x2 kernel and stride = 1, which produces a feature map of size 4x4.
Then I apply 2x2 max-pooling with stride = 2, which reduces the feature map to size 2x2.…
koryakinp (436 · 2 · 5 · 14)
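A quick PyTorch shape check of the architecture described in the question (one input channel and one filter are assumed, since the question does not state the channel count):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)                      # 5x5 input image
conv = nn.Conv2d(1, 1, kernel_size=2, stride=1)  # 2x2 kernel, stride 1
pool = nn.MaxPool2d(kernel_size=2, stride=2)     # 2x2 max pooling, stride 2

feat = conv(x)
print(feat.shape)        # torch.Size([1, 1, 4, 4])
pooled = pool(feat)
print(pooled.shape)      # torch.Size([1, 1, 2, 2])

# Backpropagating from the pooled map sends gradient only to the max positions
# of each pooling window, and from there into the convolution weights.
pooled.sum().backward()
print(conv.weight.grad.shape)   # torch.Size([1, 1, 2, 2])
```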
25 votes · 1 answer
Deep Neural Network - Backpropagation with ReLU
I'm having some difficulty in deriving back propagation with ReLU, and I did some work, but I'm not sure if I'm on the right track.
Cost Function: $\frac{1}{2}(y-\hat y)^2$ where $y$ is the real value, and $\hat y$ is a predicted value. Also assume…
user1157751 (709 · 1 · 8 · 22)
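A numeric check of the chain-rule step the question sets up, assuming the output unit is $\hat y = \mathrm{ReLU}(z)$ and the cost is $\frac{1}{2}(y-\hat y)^2$ (the numbers are arbitrary):

```python
# Forward: y_hat = relu(z); cost = 0.5 * (y - y_hat)**2
z, y = 1.5, 2.0
y_hat = max(z, 0.0)                     # ReLU forward

dL_dyhat = y_hat - y                    # d/dy_hat of 0.5*(y - y_hat)^2
dyhat_dz = 1.0 if z > 0 else 0.0        # ReLU derivative (0 for z <= 0)
dL_dz = dL_dyhat * dyhat_dz             # chain rule

print(dL_dz)   # -0.5
```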
24 votes · 1 answer
How do Gradient Descent and Backpropagation work together?
Please forgive me as I am new to this. I have attached a diagram trying to model my understanding of neural networks and back-propagation. From videos on Coursera and resources online I formed the following understanding of how a neural network…
Mohamed Mahyoub (355 · 1 · 2 · 5)
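The short version of the usual answer: backpropagation computes the gradient of the loss with respect to each weight, and gradient descent is the update rule $w \leftarrow w - \eta \nabla L$ that uses that gradient. A one-parameter sketch (illustrative model $\hat y = wx$ with squared-error loss, not from the question):

```python
w, lr = 0.0, 0.1
x, y = 2.0, 4.0

for step in range(5):
    y_hat = w * x
    loss = 0.5 * (y - y_hat) ** 2
    grad = (y_hat - y) * x        # backpropagation: chain rule gives dL/dw
    w = w - lr * grad             # gradient descent: apply the gradient to update w
    print(step, round(loss, 4), round(w, 4))
```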
24 votes · 2 answers
Sliding window leads to overfitting in LSTM?
Will I overfit my LSTM if I train it via the sliding-window approach? Why do people not seem to use it for LSTMs?
For a simplified example, assume that we have to predict the sequence of characters:
A B C D E F G H I J K L M N O P Q R S T U V W X Y…
Kari (2,756 · 2 · 21 · 51)
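For readers unfamiliar with the term, "sliding window" here means cutting the sequence into heavily overlapping (input window, next symbol) pairs. A small sketch using the alphabet sequence from the example (the window length of 4 is an arbitrary choice):

```python
seq = "ABCDEFGHIJ"
window = 4

samples = []
for i in range(len(seq) - window):
    samples.append((seq[i:i + window], seq[i + window]))  # (input window, next char)

print(samples[:3])
# [('ABCD', 'E'), ('BCDE', 'F'), ('CDEF', 'G')]
# Consecutive windows overlap almost entirely, which is why the question worries
# about the model effectively seeing near-duplicate training examples.
```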
22 votes · 1 answer
Understanding Timesteps and Batch Size of Keras LSTMs considering Hidden States and TBPTT
What I'm trying to do
What I am trying to do is predict the next data point $x_t$ for each point in the time series $[x_0, x_1, x_2, ..., x_T]$ in the context of a real-time data stream; in theory the series is infinite. If a new value $x$ is…
KenMarsu
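A rough tf.keras sketch of the knobs the question is about: the input tensor has shape (batch_size, timesteps, features), backpropagation through time is truncated at the `timesteps` boundary of each batch, and `stateful=True` carries the hidden state forward between successive batches so a long stream can be fed chunk by chunk. The sine-wave stream and layer sizes are placeholders, not from the question.

```python
import numpy as np
import tensorflow as tf

timesteps, features = 10, 1

# stateful=True keeps the LSTM's hidden state across batches; gradients still
# only flow back over the `timesteps` inside one batch (truncated BPTT).
model = tf.keras.Sequential([
    tf.keras.Input(batch_shape=(1, timesteps, features)),
    tf.keras.layers.LSTM(32, stateful=True),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

stream = np.sin(np.linspace(0, 20, 200)).astype("float32")
for start in range(0, len(stream) - timesteps - 1, timesteps):
    x = stream[start:start + timesteps].reshape(1, timesteps, features)
    y = stream[start + timesteps].reshape(1, 1)
    model.train_on_batch(x, y)   # hidden state carries over between these calls
```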
21 votes · 1 answer
What do "compile", "fit", and "predict" do in Keras sequential models?
I am a little confused about these parts of the Keras Sequential model's functions. Could someone explain what exactly the job of each one is? I mean, does compile do the forward pass and calculate the cost function, and then pass it through fit to do the backward…
user3486308 (1,310 · 5 · 19 · 29)
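A short sketch of the division of labour, assuming tf.keras and toy data: compile only attaches the optimizer and loss, fit runs the training loop (forward pass, loss, backpropagation, weight update), and predict is a forward pass only.

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(100, 4).astype("float32")
y = np.random.rand(100, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

# compile: nothing is computed on the data yet; it wires up optimizer and loss.
model.compile(optimizer="sgd", loss="mse")

# fit: the training loop; every batch is forward pass -> loss -> backpropagation
# -> optimizer update of the weights.
model.fit(x, y, epochs=2, batch_size=10, verbose=0)

# predict: a forward pass only, with no loss and no weight updates.
preds = model.predict(x[:5], verbose=0)
print(preds.shape)   # (5, 1)
```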
18 votes · 4 answers
Question about bias in Convolutional Networks
I am trying to figure out how many weights and biases are needed for a CNN.
Say I have a (3, 32, 32) image and want to apply a (32, 5, 5) filter.
For each feature map I have 5x5 weights, so I should have 3 x (5x5) x 32 parameters. Now I need to add…
user (2,023 · 7 · 23 · 38)
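The arithmetic for the setup in the question, using the standard convention that each convolutional filter spans all input channels and each output feature map gets one shared bias:

```python
in_channels, filters, k = 3, 32, 5         # (3, 32, 32) input, 32 filters of size 5x5

weights = filters * in_channels * k * k    # 32 * 3 * 5 * 5 = 2400
biases = filters                           # one shared bias per output feature map
print(weights, biases, weights + biases)   # 2400 32 2432
```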
16 votes · 1 answer
Back-propagation through max pooling layers
I have a small sub-question to this question.
I understand that when back-propagating through a max pooling layer the gradient is routed back in a way that the neuron in the previous layer which was selected as max gets all the gradient. What I'm…
Majster (263 · 2 · 8)
14 votes · 1 answer
Differences between gradients calculated by different reduction methods in PyTorch
I'm playing with different reduction methods provided in built-in loss functions. In particular, I would like to compare the following.
The averaged gradient obtained by performing a backward pass for each loss value calculated with reduction="none"
The…
Zhuoran Liu (141 · 1 · 3)
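A toy PyTorch sketch of that comparison (the tensors and linear model are placeholders, not from the post): by linearity of the gradient, averaging the per-sample gradients from reduction="none" matches the single gradient from reduction="mean".

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
x = torch.randn(5, 3)
y = torch.randn(5)

# Case 1: per-sample losses (reduction="none"); backward on each, then average.
loss_none = torch.nn.functional.mse_loss(x @ w, y, reduction="none")
grads = []
for l in loss_none:
    g, = torch.autograd.grad(l, w, retain_graph=True)
    grads.append(g)
avg_of_per_sample = torch.stack(grads).mean(dim=0)

# Case 2: the default reduction="mean", a single backward pass.
loss_mean = torch.nn.functional.mse_loss(x @ w, y, reduction="mean")
grad_mean, = torch.autograd.grad(loss_mean, w)

print(torch.allclose(avg_of_per_sample, grad_mean))   # True: they coincide here
```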
14 votes · 3 answers
Creating neural net for xor function
It is a well-known fact that a 1-layer network cannot predict the XOR function, since it is not linearly separable. I attempted to create a 2-layer network, using the logistic sigmoid function and backprop, to predict XOR. My network has 2 neurons…
user (2,023 · 7 · 23 · 38)
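A compact NumPy version of the setup described (2 inputs, 2 hidden sigmoid units, 1 output, plain backprop on squared error). This is a sketch, not the asker's code; with only 2 hidden units the training can occasionally settle in a poor local minimum, which is a common failure mode for this setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 2))
b1 = np.zeros((1, 2))
W2 = rng.normal(size=(2, 1))
b2 = np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass (squared-error loss), chain rule layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2).ravel())   # typically close to [0, 1, 1, 0]
```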
12 votes · 2 answers
How does backpropagation work through a Max Pooling layer when doing a batch?
Let's assume that we are using a batch size of 100 samples for learning.
So in every batch, the weight of every neuron (and bias, etc.) is updated by adding minus the learning rate times the average error value that we found using the 100…
Nathan G (241 · 1 · 2 · 5)
11 votes · 1 answer
Synthetic Gradients: good number of layers & neurons
I would like to train my LSTM with a "synthetic gradients" Decoupled Neural Interface (DNI).
How do I decide on the number of layers and neurons for my DNI?
Searching for them by trial and error or, what's worse, by a genetic algorithm, which would…
Kari (2,756 · 2 · 21 · 51)