
So I was working on a classification task using a neural network (NN). The data set was normalised, the weights were initialised randomly between 0 and 1, and all the activations were the sigmoid function.

Now, when I used a model with 2 hidden layers the accuracy was 50%, whereas when I used a model with 1 hidden layer the accuracy was 99%. Isn't this contrary to the intuitive understanding of NNs? I thought more layers meant better fitting, even over-fitting, but apparently something different is happening here (maybe the values output by the second hidden layer are too small for the output layer to discern). So what exactly am I missing?

Green Falcon
DuttaA

3 Answers


Maybe you are making a mistake; please post your code here. Without seeing your code, these are the possible points:

  • Vanishing gradient problem. I don't think you have this problem, because your network is very shallow, but you can change your activation function to ReLU to avoid it.
  • Covariate shift. Similar to the input features, which have to be normalised, the inputs of the deeper layers have to be normalised too. The idea is that the inputs to each layer should keep a stable distribution that does not change during training. You can use batch normalisation to avoid that problem; a short sketch after this list shows both the ReLU and batch-normalisation fixes.
  • Bug in the code. You may have fed the activations of each layer to the next layer in a wrong way, or you may not have updated the weights simultaneously if you have not used vectorisation. There are also numerous other reasons that can lead to bugs in your code.

  • The number of neurons in each layer may not be enough; try increasing it. To understand the effect of increasing the number of layers and the number of neurons per layer, you can take a look here.
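A minimal Keras sketch of the first two fixes, assuming a binary classification task; the input size, layer widths, optimizer and loss are placeholders, not taken from the question:

```python
# Sketch of the ReLU + batch-normalisation suggestions above (placeholder sizes).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    tf.keras.Input(shape=(20,)),           # assumed number of input features
    layers.Dense(64),
    layers.BatchNormalization(),           # keeps the inputs to the next layer normalised
    layers.Activation('relu'),             # ReLU avoids the saturation of sigmoid
    layers.Dense(64),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dense(1, activation='sigmoid')  # sigmoid only at the output for binary classification
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```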

Green Falcon

Run 'gradient checking' to locate where the error occurs, because this looks like a typical coding mistake. You shouldn't experience any numerical-precision issues with such a shallow network.
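A minimal sketch of gradient checking with central finite differences in NumPy; `loss_fn`, `theta` and `analytic_grad` are placeholders for your own loss, flattened parameter vector, and backprop gradient:

```python
import numpy as np

def gradient_check(loss_fn, theta, analytic_grad, eps=1e-7):
    # Numerically estimate the gradient of loss_fn at theta and compare it to backprop.
    numeric_grad = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus, theta_minus = theta.copy(), theta.copy()
        theta_plus[i] += eps
        theta_minus[i] -= eps
        numeric_grad[i] = (loss_fn(theta_plus) - loss_fn(theta_minus)) / (2 * eps)
    # Relative difference; values around 1e-7 or smaller usually mean backprop is correct.
    diff = np.linalg.norm(numeric_grad - analytic_grad) / (
        np.linalg.norm(numeric_grad) + np.linalg.norm(analytic_grad))
    return diff
```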

However, numerical-precision issues might pop up during the gradient checking itself. Have a look at this to be warned: link

Also, your initial weights might be too large; read about Xavier initialisation.
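For comparison, a minimal NumPy sketch of Xavier/Glorot initialisation instead of drawing weights uniformly from [0, 1]:

```python
import numpy as np

def xavier_init(n_in, n_out):
    # Glorot uniform: keeps the variance of activations roughly constant across layers.
    limit = np.sqrt(6.0 / (n_in + n_out))
    return np.random.uniform(-limit, limit, size=(n_in, n_out))
```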

Kari

Training a NN is not always easy. If your training accuracy is only 50%, it means your network is not really learning. Many problems may arise, but with such a simple task I would bet you are facing a vanishing gradient problem. If you activate your hidden layers with ReLUs, you might solve it. I think it is worth a try.
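To see why stacked sigmoids can stall learning: the sigmoid derivative never exceeds 0.25 and is close to 0 away from the origin, so gradients shrink as they are multiplied back through layers, whereas ReLU passes a gradient of 1 for positive inputs. A quick NumPy illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6, 6, 5)
d_sigmoid = sigmoid(x) * (1 - sigmoid(x))  # peaks at 0.25, nearly 0 for large |x|
d_relu = (x > 0).astype(float)             # 1 for positive inputs, 0 otherwise
print(d_sigmoid)   # roughly [0.0025 0.045 0.25 0.045 0.0025]
print(d_relu)      # [0. 0. 0. 1. 1.]
```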

If you are running your NN with TensorFlow, one way to check whether your gradients are 0 is to use TensorBoard.
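A minimal sketch of what that could look like with TF 2.x and `tf.GradientTape` (the model, loss, optimizer and log directory are placeholders), logging gradient histograms so you can see whether they collapse to zero:

```python
import tensorflow as tf

writer = tf.summary.create_file_writer('logs/grad_check')  # assumed log directory

def train_step(model, loss_fn, optimizer, x, y, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    with writer.as_default():
        for var, grad in zip(model.trainable_variables, grads):
            # Histograms stuck near zero for the earlier layers indicate vanishing gradients.
            tf.summary.histogram(var.name + '/gradient', grad, step=step)
    return loss
```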

David Masip