
[Image: the lab's CNN model, with Conv2D layers of 16, 32, 64, 64, and 64 filters]

I'm looking at an example lab from a Coursera course titled Intro to Tensorflow. In this CNN model, they're gradually increasing the number of filters from 16 to 32 and then 64. Why don't we increase it to 128 and more? I'm curious why the last three conv2D layers are all 64 and if these are redundant in terms of the model's performance.

Thanks!

2 Answers


Increasing filters in CNNs follows a "Goldilocks" principle: not too few, not too many. While going from 16 to 32 to 64 filters lets the network capture progressively more complex features, jumping to 128+ often yields diminishing returns and risks overfitting. It's like the bias-variance tradeoff: with too few filters you underfit; with too many you overfit. The repeated 64-filter layers aren't redundant; they let the network learn hierarchical features and increase the receptive field. Ultimately, the best architecture depends on your specific problem and dataset. Experimentation is key!
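The receptive-field point can be made concrete with a little arithmetic. A minimal sketch (assuming 3×3 kernels with stride 1, which is the usual choice in this kind of lab): stacking the three 64-filter layers widens how much of the input each output unit "sees", even though the filter count stays flat.

```python
def receptive_field(layers):
    """Receptive field of a stack of layers.

    layers: list of (kernel_size, stride) tuples, in input -> output order.
    """
    rf, jump = 1, 1  # jump = effective stride accumulated so far
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# One 3x3 conv vs. three stacked 3x3 convs (stride 1):
print(receptive_field([(3, 1)]))      # -> 3
print(receptive_field([(3, 1)] * 3))  # -> 7
```

So each extra 64-filter layer grows the region of the input that influences a given activation, which is one reason repeating the same filter count is not wasted capacity.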

Somewhat tangential but useful: https://botcampus.ai/understanding-the-bias-variance-tradeoff-in-machine-learning/

Ansh Tandon

Why don't we increase it to 128 and more?

You can. The number of filters in each convolutional layer can be seen as a hyperparameter: you can change it and see whether the performance gets better or worse.
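One cheap way to reason about such a change before training anything is to count parameters. A sketch (the candidate filter configurations and the 3-channel RGB input are assumptions, not the lab's actual settings): doubling the last layers to 128 filters roughly multiplies the convolutional parameter count, which is where the overfitting risk comes from.

```python
def conv_params(in_ch, out_ch, k=3):
    """Parameters of a k x k Conv2D layer: weights + biases."""
    return k * k * in_ch * out_ch + out_ch

def total_conv_params(filters, in_ch=3):
    """Total conv parameters for a stack of 3x3 layers with the given filter counts."""
    total, ch = 0, in_ch
    for f in filters:
        total += conv_params(ch, f)
        ch = f
    return total

# Hypothetical configurations to compare:
for cfg in [(16, 32, 64, 64, 64), (16, 32, 64, 128, 128)]:
    print(cfg, total_conv_params(cfg))
# (16, 32, 64, 64, 64)   ->  97440 parameters
# (16, 32, 64, 128, 128) -> 245024 parameters
```

Whether the extra capacity helps or hurts still has to be checked empirically on your dataset, as the answer says.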

What's the point of having multiple filters in each convolutional layer? Each filter will hopefully compute a different representation of the input, which the following layer can combine into different, higher-level representations of the input (see also this answer).
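The "different representations" idea can be illustrated without any framework: two hand-crafted 3×3 filters (Sobel-style edge detectors, chosen here purely for illustration) applied to the same tiny input respond to completely different features.

```python
def conv2d(img, kernel):
    """Valid 2D cross-correlation of a 2D list with a square kernel."""
    h, w, k = len(img), len(img[0]), len(kernel)
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            row.append(sum(img[i + di][j + dj] * kernel[di][dj]
                           for di in range(k) for dj in range(k)))
        out.append(row)
    return out

# A 5x5 image containing a vertical edge (0s on the left, 1s on the right):
img = [[0, 0, 1, 1, 1]] * 5
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
sobel_y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges

print(conv2d(img, sobel_x))  # strong response at the edge: [[4, 4, 0], ...]
print(conv2d(img, sobel_y))  # no horizontal edges: all zeros
```

A conv layer with 64 filters is doing this 64 times in parallel, except that the filter weights are learned rather than hand-picked.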

I'm curious why the last three conv2D layers are all 64

Why are those 64? Because the creator of the network chose them to be 64, according to some criterion (their knowledge, some experimentation, a neural architecture search, ...).

and if these are redundant in terms of the model's performance.

Generally speaking, deeper models can compute a more detailed representation of the input and achieve better performance if trained correctly (see, for example, VGG, which includes an extensive examination of CNN depth). Whether the repeated layers are redundant on a specific dataset should be evaluated by experimentation.

Ciodar