
I am trying to understand the purpose of a 3rd split in the form of a validation dataset. I am not necessarily talking about cross-validation here.


In the scenario below, it would appear that the model is overfit to the training dataset.

Train dataset {acc: 97%, loss: 0.07}
Test dataset {acc: 90%, loss: 8.02}

However, in this scenario it appears much more balanced.

Train dataset {acc: 95%, loss: 1.14}
Test dataset {acc: 93%, loss: 1.83}

Do I need validation data if my train and test accuracy/loss are consistent? Is the purpose of setting a validation split of 10% to ensure this kind of balance before evaluating the model on the test set? What does it prove?

LayneSadler

2 Answers


You don't always need 3 separate datasets. You usually split a dataset into 3 if you are doing some parameter or hyperparameter tuning before choosing a final model. Tuning leaks information from the 2nd dataset into your model, so its measured performance on that dataset is no longer trustworthy. For instance:

  1. You are manually tuning a model over several iterations and using the results from the 2nd dataset to find the optimal parameters. By doing so, you build information from the 2nd dataset into your model, so the 2nd dataset is no longer a good, unbiased benchmark for your final model. You will therefore want a 3rd, untouched dataset to give you an unbiased final performance measurement (see the sketch at the end of this answer).

  2. Some models use a validation dataset internally while the model is being built to evaluate loss, etc. This introduces the same kind of bias into the model. Example:

    model.fit(
        train_features,
        train_labels,
        batch_size=20,
        epochs=20,
        # evaluated at the end of each epoch; these results typically guide
        # further tuning, so this data is no longer an unbiased benchmark
        validation_data=(val_features, val_labels), # <- here
        verbose=0)
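
In both cases the fix is the same: hold out a final test set that is never consulted while tuning. Below is a minimal sketch of that workflow, using scikit-learn purely for illustration (the dataset, split sizes, and the C values being searched are arbitrary placeholder choices):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)

    # Split off an untouched test set first, then carve a validation set
    # out of what remains (roughly 60/20/20 overall).
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

    # Tune a hyperparameter against the validation set only.
    best_C, best_val_acc = None, 0.0
    for C in [0.01, 0.1, 1.0, 10.0]:
        candidate = LogisticRegression(C=C, max_iter=5000).fit(X_train, y_train)
        val_acc = candidate.score(X_val, y_val)
        if val_acc > best_val_acc:
            best_C, best_val_acc = C, val_acc

    # The validation score is biased by the search above; the untouched
    # test set gives the final, unbiased estimate.
    final_model = LogisticRegression(C=best_C, max_iter=5000).fit(X_train, y_train)
    print("validation accuracy:", best_val_acc)
    print("test accuracy:", final_model.score(X_test, y_test))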
    
Donald S

The only reason you keep a separate set is to test the model on unseen data.

"Seeing" the data is not only about using it in training. If you use it for testing and then tweak your parameters based on the results, that hit-and-trial is effectively fitting your model to the test data.

The crux is: the last set must be treated as new data and consulted only a few times.
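
As a rough sketch of what that hit-and-trial looks like (the dataset and classifier here are arbitrary, chosen only to make the leak concrete):

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Anti-pattern: picking k by peeking at the test score on every try.
    # Each peek leaks a little test-set information into the model choice,
    # so the best score found this way is an optimistic estimate.
    best_k, best_acc = None, 0.0
    for k in range(1, 21):
        acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)
        if acc > best_acc:
            best_k, best_acc = k, acc

    print("chosen k:", best_k, "test accuracy (no longer unbiased):", best_acc)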

10xAI