
I am working on CNNs, and I have some doubts. Let's assume I only want one feature map, just to make things easier. And let's suppose my image is grayscale, to make things even easier. So, say my image is (32, 32) (grayscale, hence just one channel, which we don't need to write explicitly) and my filter is (3, 3) (again, one feature map, so I won't bother writing the 1). I understand this will map to a (30, 30) layer.

How many parameters will I have? If I understand it correctly, I will have 9 weights and one bias, so a total of 10, because we map each (3,3) subregion using the same weights. Back-propagation will determine the best values for those weights and that will give me one feature map, or a filter.
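The parameter count and output size described above can be sketched in a few lines (variable names here are my own, not from any particular library):

```python
# Parameter count and output size for a single 3x3 filter on a
# 32x32 grayscale image (no padding, stride 1).
kernel_h, kernel_w = 3, 3
image_h, image_w = 32, 32

n_weights = kernel_h * kernel_w      # 9 shared weights
n_params = n_weights + 1             # plus 1 shared bias -> 10 total

out_h = image_h - kernel_h + 1       # 30
out_w = image_w - kernel_w + 1       # 30

print(n_params, (out_h, out_w))      # 10 (30, 30)
```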

So far, so good. What I don't understand is how the training works. I need to keep the same weights and bias when moving across the image (that's why I only have 10 parameters), but won't those change when I do back-propagation? How can I apply back-propagation and keep the same values for the weights regardless of the subregion they are applied to?

Ethan

1 Answer


You are right, there are just 10 parameters in your example.

For determining gradients, you just add up all the deltas from backpropagation at each location. That is, you run backpropagation 30x30 = 900 times, once for each position the 3x3 kernel is used, for every example in your batch (or just for one example if you are running the simplest online stochastic gradient descent). At each position you add the delta values into a suitably-sized buffer (10 values for weight deltas, or 9 values for previous-layer activation deltas). You end up with one set of summed deltas matching your single 3x3 filter (plus a delta for the bias term). You then apply the summed version to update the weights of your single filter + bias.
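As a minimal sketch of that summing step (my own toy code, assuming `image` is the layer's input and `delta_out` holds the backpropagated deltas at each of the 30x30 output positions):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))      # layer input (stand-in values)
delta_out = rng.standard_normal((30, 30))  # delta at each output position

grad_w = np.zeros((3, 3))   # one summed delta per shared weight
grad_b = 0.0                # one summed delta for the shared bias

for i in range(30):
    for j in range(30):
        patch = image[i:i + 3, j:j + 3]    # the 3x3 subregion at (i, j)
        grad_w += delta_out[i, j] * patch  # same 10 params everywhere: deltas add
        grad_b += delta_out[i, j]

# grad_w and grad_b now update the single shared filter and bias,
# e.g. w -= learning_rate * grad_w
```

Real frameworks vectorize this double loop, but the accumulation is the same: every position contributes its delta into the one shared buffer.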

Note this is a general rule you can apply whenever multiple gradient sources from backpropagation affect the same parameter: they just add. This occurs in RNNs too, and in any structure where a parameter is reused in several places.
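A toy illustration of that rule (my own construction): one parameter `w` used in two places, with the total gradient equal to the sum of the per-use gradients.

```python
# Loss L(w) = w*x1 + w*x2, so w is used twice.
w = 2.0
x1, x2 = 3.0, 5.0

g1 = x1                 # gradient contribution from the first use of w
g2 = x2                 # gradient contribution from the second use of w
grad_w = g1 + g2        # shared-parameter gradients just add

# Sanity check against a finite-difference estimate of dL/dw:
eps = 1e-6
numeric = (((w + eps) * x1 + (w + eps) * x2) - (w * x1 + w * x2)) / eps
```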

Neil Slater