
I am trying to figure out how many weights and biases are needed for a CNN.

Say I have a (3, 32, 32)-image and want to apply a (32, 5, 5)-filter. For each feature map I have 5x5 weights, so I should have 3 x (5x5) x 32 parameters. Now I need to add the bias. I believe I only have (3 x (5x5) + 1) x 32 parameters, so is the bias the same across all colors (RGB)?

Is this correct? Do I keep the same bias for each image across its depth (in this case 3) while I use different weights? Why is that?

Ethan

4 Answers


Bias operates per virtual neuron, so there is no value in having multiple bias inputs for a single output - that would be equivalent to just adding up the separate bias weights into a single bias.

In the feature maps that are the output of the first hidden layer, the colours are no longer kept separate*. Effectively each feature map is a "channel" in the next layer, although they are usually visualised separately where the input is visualised with channels combined. Another way of thinking about this is that the separate RGB channels in the original image are 3 "feature maps" in the input.

It doesn't matter how many channels or feature maps the previous layer has: each position in an output feature map is a single value. One output value corresponds to a single virtual neuron, which needs one bias weight.

In a CNN, as you explain in the question, the same weights (including bias weight) are shared at each point in the output feature map. So each feature map has its own bias weight as well as previous_layer_num_features x kernel_width x kernel_height connection weights.

So yes, your example resulting in (3 x (5x5) + 1) x 32 weights total for the first layer is correct for a CNN with first hidden layer processing RGB input into 32 separate feature maps.
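As a quick sanity check, the count can be computed directly (a minimal sketch; the helper name `conv2d_params` is my own, not from any library):

```python
def conv2d_params(in_channels, out_channels, kernel_h, kernel_w):
    """Trainable parameters of one conv layer: shared kernel weights
    plus a single bias weight per output feature map."""
    weights = in_channels * kernel_h * kernel_w * out_channels
    biases = out_channels  # one bias per feature map, shared spatially
    return weights + biases

# RGB input, 32 feature maps, 5x5 kernels -> (3 * 5 * 5 + 1) * 32
print(conv2d_params(3, 32, 5, 5))  # 2432
```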


* You may be getting confused by seeing visualisation of CNN weights which can be separated into the colour channels that they operate on.

Neil Slater

Here is an easy example:

If your input is I x I x C, e.g. a picture with I x I = 10 x 10 pixels in grayscale, then you have only one channel (C = 1), so the input is 10 x 10 x 1. If your filter is F x F and you apply K filters, e.g. ten 3 x 3 filters, then the number of parameters is (F x F x C + 1) x K, where the +1 means one bias per filter: (3 x 3 x 1 + 1) x 10 = 100 parameters to train. If you have RGB input, you get (3 x 3 x 3 + 1) x 10 = 280 parameters. Every filter has one F x F set of weights per input channel, and every filter has one bias.

To check this again after reading:

Input I x I x C
Filter F x F (x K) // K times applied
Parameters (F x F x C + 1) x K // where +1 bias per filter, and K is the number of filters
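The two example counts above can be verified in a couple of lines (a small sketch; `n_params` is a made-up helper name):

```python
def n_params(F, C, K):
    # (F x F x C + 1) x K, where the +1 is one bias per filter
    return (F * F * C + 1) * K

print(n_params(3, 1, 10))  # grayscale example: 100
print(n_params(3, 3, 10))  # RGB example: 280
```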

Now check this awesome cheat sheet from Stanford University here

Khan

In a CNN, one filter detects one feature. We introduce a variable b to incorporate the bias from that particular filter, so each filter takes its own bias into account. The bias is not attached to the individual filter weights but to the whole filter itself. I hope that answers your question.


It is a property of CNNs that they use shared weights and biases (the same weights and bias for all the hidden neurons in a feature map) in order to detect the same feature anywhere in the input. This makes deeper learning practical compared to simple neural networks, because far fewer parameters are needed. You can read these out as references:

http://deeplearning.net/tutorial/lenet.html
http://neuralnetworksanddeeplearning.com/chap6.html#introducing_convolutional_networks
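To see how much weight sharing saves, compare a conv layer against a fully connected layer producing the same outputs (a rough illustration using the question's 32x32 RGB example; the 28x28 output size assumes "valid" convolution with no padding):

```python
# Conv layer: 32 feature maps of 5x5 filters over 3 channels, weights shared spatially.
conv_params = (3 * 5 * 5 + 1) * 32  # 2432

# Fully connected layer producing the same number of outputs
# (32 maps of 28x28 values for a valid 5x5 convolution), no sharing:
outputs = 32 * 28 * 28
fully_connected_params = (3 * 32 * 32 + 1) * outputs  # about 77 million

print(conv_params, fully_connected_params)
```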

enterML