
Many works use neural networks with two hidden layers to classify the MNIST handwritten digits dataset.

To improve accuracy, other techniques (dropout, ReLU, etc.) have been used without increasing the number of hidden layers.

Is there any reason not to use more than three hidden layers? For example, overfitting?

S.Lim

1 Answer


Empirically, performance on MNIST does not increase much for a fully-connected network when you add layers, but you can probably still improve networks with 3+ hidden layers using other techniques, such as data augmentation (e.g. translating every input by ±0–2 pixels in x and y, which yields roughly 25 times the original data size, as a start).
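
Here is a minimal sketch of that kind of augmentation, assuming the images are 28x28 NumPy arrays; the ±2 pixel grid gives 5 × 5 = 25 variants per image (the original is the (0, 0) case), matching the "roughly 25 times" figure:

```python
import numpy as np
from scipy.ndimage import shift

def augment_translations(images, labels, max_shift=2):
    """Create translated copies of each image for every (dx, dy) offset
    in [-max_shift, max_shift]^2: 25 variants per image for max_shift=2."""
    aug_images, aug_labels = [], []
    offsets = range(-max_shift, max_shift + 1)
    for img, label in zip(images, labels):
        for dy in offsets:
            for dx in offsets:
                # Vacated pixels are padded with zeros (black background).
                aug_images.append(shift(img, (dy, dx), cval=0.0))
                aug_labels.append(label)
    return np.array(aug_images), np.array(aug_labels)
```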

I don't think this idea is pursued very far in practice, because CNNs offer a much better performance increase for the effort required. With a basic MLP you hit the point of diminishing returns at around 96–97% accuracy, well short of the roughly 99% a CNN reaches easily.
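
For comparison, a small CNN along these lines typically reaches ~99% test accuracy on MNIST after a few epochs. This is a sketch using the Keras API; the exact architecture and hyperparameters are illustrative assumptions, not a prescription:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Small convolutional network for 28x28 grayscale MNIST inputs.
model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```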

The theoretical basis for this difference is not obvious to me, but very likely it is related to over-fitting. Weight sharing and feature pooling in a CNN are a very effective way of processing image data for classification tasks: they reduce the number of parameters, which limits over-fitting, while re-using those parameters in a way that makes very good sense given the nature of the inputs.
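
To make the parameter reduction from weight sharing concrete (the layer sizes here are arbitrary examples, not from the question):

```python
# Parameter counts for a single layer on a 28x28 grayscale input (biases included).
dense_params = 28 * 28 * 128 + 128   # fully-connected layer to 128 units: 100,480
conv_params  = 3 * 3 * 1 * 32 + 32   # 3x3 conv layer with 32 filters: 320
print(dense_params, conv_params)     # 100480 320
```

The conv layer's 320 parameters are re-used at every spatial position, whereas the dense layer learns a separate weight for every pixel-unit pair.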

Neil Slater