
I'm a little bit new to machine learning.

I am using a neural network to classify images. There are two possible classes. I am using a sigmoid activation at the last layer, so the scores of images are between 0 and 1.

I expected the scores to sometimes be close to 0.5 when the neural net is not sure about the class of the image, but all scores are either 1.0000000e+00 (due to rounding, I guess) or very close to zero (for example 2.68440009e-15). In general, is that a good or a bad thing? How can this behaviour be avoided?

In my use case I wanted to optimize for recall by setting a lower threshold, but this has no impact because of what I described above.

More generally, how can I minimize the number of false negatives when, during training, the neural net only cares about a loss that is not tailored to this goal? I am OK with decreasing accuracy a little bit to increase recall.

Louis

4 Answers


To answer the last question: suppose you have a binary classification problem. It is customary to label the class as positive if the output of the sigmoid is more than 0.5 and negative if it is less than 0.5. To increase recall you can lower this threshold, e.g. to 0.2. For tasks where you want better precision you can instead raise the threshold above 0.5.
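
For illustration, a minimal sketch of moving the decision threshold, assuming the model outputs sigmoid scores (the score values below are made up):

import numpy as np
scores = np.array([0.97, 0.40, 0.08, 0.55])   # hypothetical sigmoid outputs from model.predict(x)
labels_default = (scores > 0.5).astype(int)   # standard threshold -> [1 0 0 1]
labels_recall = (scores > 0.2).astype(int)    # lower threshold favours recall -> [1 1 0 1]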

About the first part of your question: it highly depends on your data and its feature space. There are problems where the data is linearly separable in a higher-dimensional space, which means you can classify it with a single neuron, i.e. a single hyperplane. If you happen to get such good accuracy, you cannot conclude anything until you look at the cross-validation error. By comparing the error on the training data with the error on cross-validation (or test) data, you can figure out whether your classifier performs well or is overfitting.
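
A rough sketch of checking that gap in Keras (x_train and y_train are placeholders for your training arrays):

history = model.fit(x_train, y_train, epochs=10, validation_split=0.2)
print(history.history["loss"][-1], history.history["val_loss"][-1])  # a large gap suggests overfitting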

Green Falcon

Train to avoid false negatives

What your network learns depends on the loss function you pass it. By choosing this function you can emphasize different goals: overall accuracy, avoiding false negatives, avoiding false positives, etc.

In your case you probably use a cross-entropy loss in combination with a softmax classifier. While softmax squashes the predictions so that they sum to 1 across all classes, the cross-entropy loss penalises the distance between the ground truth and the prediction. In this calculation it does not take into account what the values of the "false" predictions are. In other words: the loss function only cares about the correct class and its predicted probability, not about the values of all other classes.
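
A tiny numeric illustration of that point (the values are made up):

import numpy as np
probs = np.array([0.7, 0.2, 0.1])  # hypothetical softmax output over three classes
loss = -np.log(probs[0])           # cross entropy with true class 0 uses only probs[0]
# how the remaining 0.3 is split between the other classes does not change the loss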

Since you want to avoid false negatives, this behaviour is probably exactly what you need. But if you also care about the distance between the actual class and the false predictions, another loss function that takes the false values into account might serve you better. Given your high accuracy, this poses the risk that your overall performance will drop.

What to do then?

Making the wrong prediction and being very sure about it is not uncommon. There are millions of things you could look at, so your best bet is probably to investigate the errors. For example, you could use a confusion matrix to recognize which classes are confused with which. If there is structure, you might need more samples of a certain class, or there may be labelling errors in your training data.
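
One way to compute it with scikit-learn (the labels here are made up; in practice use your ground truth and thresholded predictions):

from sklearn.metrics import confusion_matrix
y_true = [1, 1, 0, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 0, 1]  # hypothetical thresholded predictions
print(confusion_matrix(y_true, y_pred))  # rows = true classes, columns = predicted classes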

Another way to go ahead would be to manually look at all (or some) of the errors. Something as basic as listing the errors in a table and trying to find common characteristics can guide you towards what you need to do. For example, it would be understandable if your network usually gets the "difficult" examples wrong. But maybe there is some other clear systematic pattern your network has not picked up yet due to a lack of data?
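
A minimal sketch of such a table with pandas (scores and labels are made up; in practice take them from your validation set):

import numpy as np
import pandas as pd
scores = np.array([0.96, 0.31, 0.04, 0.62])  # hypothetical sigmoid outputs
y_true = np.array([1, 1, 0, 0])
y_pred = (scores > 0.5).astype(int)
wrong = np.where(y_pred != y_true)[0]
print(pd.DataFrame({"index": wrong, "true": y_true[wrong], "score": scores[wrong]}))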

Gegenwind

One thing I want to mention here: what kind of loss function are you using? From your results, I deduce that you are using cross entropy with the parameter from_logits = True (that would explain the phenomenon you mention). If you are using Keras and have from_logits = True, set it to False. I also recommend using label_smoothing = 0.1 or more (depending on what you need). I leave you the link to the TensorFlow cross entropy documentation if this is your case.
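
A minimal sketch of that setup, assuming Keras (from_logits=False because the last layer already applies a sigmoid; the example tensors are made up):

import tensorflow as tf
bce = tf.keras.losses.BinaryCrossentropy(from_logits=False, label_smoothing=0.1)
y_true = tf.constant([[1.0], [0.0]])
y_pred = tf.constant([[0.99], [0.01]])
print(bce(y_true, y_pred).numpy())  # label smoothing keeps the loss nonzero even for near-perfect predictions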

Shayan Shafiq

You can track recall instead of accuracy while training the network (note that metrics only monitor performance; they do not change the loss being optimized).

from tensorflow.keras.metrics import Recall
# recall is reported during training; loss and optimizer shown for completeness
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=[Recall()])

You can also increase the weight of the positive class so that false negatives cost more during training.

model.fit(x_train, y_train, class_weight={0: 1., 1: 3.})  # weight class 0 once and class 1 three times
Ferro