
This may seem like a silly question, but I wonder why the MAE doesn't decrease to values close to 0.

[Plot: MAE over training]

It's the result of an MLP with 2 hidden layers and 6 neurons per hidden layer, trying to estimate one output value from three input values.

Why is the NN (a simple feedforward network trained with backprop, nothing special) not able to at least overfit and match the desired training values?

Cost function: $0.5\,(\text{Target} - \text{Model output})^2$
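
For reference, a minimal sketch (with hypothetical function names) of this squared-error cost and its gradient with respect to the model output, which is what backprop propagates into the network:

```python
import numpy as np

def cost(target, output):
    # 0.5 * (Target - Model output)^2, as stated above
    return 0.5 * (target - output) ** 2

def cost_gradient(target, output):
    # d(cost)/d(output) = -(target - output)
    return output - target
```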

EDIT:

Indeed, I found an inconsistency in the input data.

[Plot: the input data]

Already cheering, I was hoping to see a better result after fixing the issue with the input data. But what I got is this:

[Plot: MAE after fixing the input data]

I'm using mini-batch SGD and now I think it might be getting trapped in a local minimum. I read about the Levenberg-Marquardt algorithm, which is said to be stable and fast. Is it a better algorithm for finding the global minimum?


1 Answer


There can be other reasons related to the model, but the simplest explanation is that the data contains contradicting patterns: if identical feature values correspond to different target values, there is no way for any model to achieve perfect performance.

Let me illustrate with a toy example:

x   y
0   4
0   4
1   3
1   4
1   3
1   3
2   8
2   7
2   7 
2   6

The best a model can do with this data is to associate x=0 with y=4, x=1 with y=3, and x=2 with y=7. Applying this best model to the training set would still leave 3 errors (a total absolute error of 3 over 10 samples), so the MAE would be 0.3.
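
A quick sketch to verify this with NumPy (for MAE, the best constant prediction per x-value is the median of its targets):

```python
import numpy as np

# Toy data from the table above
x = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
y = np.array([4, 4, 3, 4, 3, 3, 8, 7, 7, 6])

# Best per-x prediction under MAE: the median of the targets for that x
preds = np.array([np.median(y[x == xi]) for xi in x])

print(preds)                      # [4. 4. 3. 3. 3. 3. 7. 7. 7. 7.]
print(np.mean(np.abs(y - preds))) # 0.3
```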
