14

I have read in several answers here and elsewhere on the Internet that cross-validation helps to indicate whether the model will generalize well, and to detect overfitting.

But I am confused about which two accuracies/errors among training/validation/test I should compare to see whether the model is overfitting or not.

For example:

I divide my data into 70% training and 30% test.

When I run 10-fold cross-validation, I get 10 accuracies that I can average. Should I call this mean the validation accuracy?

Afterward, I test the model on the 30% test data and get the test accuracy.

In this case, what is the training accuracy? And which two accuracies should I compare to see if the model is overfitting or not?

A.B

2 Answers

11

Which two accuracies should I compare to see if the model is overfitting or not?

You should compare the training and test accuracies to identify over-fitting. A training accuracy that is substantially higher than the test accuracy (how much higher is a judgment call) indicates over-fitting.

Here, "accuracy" is used in a broad sense; it can be replaced with F1, AUC, or error (where "higher" becomes "lower" and an increase becomes a decrease), etc.
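For instance, here is a minimal sketch of that comparison on a 70/30 split, using sklearn's iris data and a linear SVC purely as placeholders:

from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)  # accuracy on the 70% training split
test_acc = clf.score(X_test, y_test)     # accuracy on the 30% held-out split

# A training accuracy far above the test accuracy suggests over-fitting
print('train: %.3f, test: %.3f, gap: %.3f' % (train_acc, test_acc, train_acc - test_acc))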

I suggest the "Bias and Variance" and "Learning curves" parts of Machine Learning Yearning by Andrew Ng. It presents plots and interpretations for all the cases with a clear narration.

When I run 10-fold cross-validation, I get 10 accuracies that I can average. Should I call this mean the validation accuracy?

No. It is an estimate of the test accuracy.
The difference between validation and test sets (and their corresponding accuracies) is that a validation set is used to build/select a better model, meaning it affects the final model. However, since 10-fold CV always tests an already-built model on its 10% hold-out, and it is not used here to select between models, that 10% hold-out acts as a test set, not a validation set.

Afterward, I test the model on the 30% test data and get the test accuracy.

If you don't use the K-fold CV to select between multiple models, this part is not needed: run K-fold on 100% of the data to get the estimated test accuracy. Otherwise, you should keep this test set, since the result of the K-fold would then be a validation accuracy.
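For example, a minimal sketch of running K-fold on 100% of the data (iris and a linear SVC are, again, just placeholders):

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# 10-fold CV on all of the data; each score is the accuracy on one 10% hold-out
scores = cross_val_score(clf, iris.data, iris.target, cv=10)
print('Estimated test accuracy: %.3f (+/- %.3f)' % (scores.mean(), scores.std()))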

In this case, what is the training accuracy?

From each of the 10 folds you can get a test accuracy on 10% of the data and a training accuracy on 90% of the data. In Python, sklearn's cross_val_score only calculates the test scores. Here is how to calculate both:

from sklearn import model_selection
from sklearn import datasets
from sklearn import svm

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# return_train_score=True adds the per-fold training scores to the result
scores = model_selection.cross_validate(clf, iris.data, iris.target, cv=5,
                                        return_train_score=True)
print('Train scores:')
print(scores['train_score'])
print('Test scores:')
print(scores['test_score'])

Set return_estimator = True to get the trained models too.
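As a small follow-up (continuing the snippet above, with numpy as an extra assumption), you can average the per-fold scores and, with return_estimator=True, inspect the fitted models:

import numpy as np

scores = model_selection.cross_validate(clf, iris.data, iris.target, cv=5,
                                        return_train_score=True,
                                        return_estimator=True)
# Mean over the 5 folds; a large train/test gap again points to over-fitting
print('Mean train accuracy:', np.mean(scores['train_score']))
print('Mean test accuracy:', np.mean(scores['test_score']))
# One fitted SVC per fold
print(scores['estimator'])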

More on the validation set

A validation set shows up in two general cases: (1) building a model, and (2) selecting between multiple models.

  1. Two examples for building a model: we (a) stop training a neural network, or (b) stop pruning a decision tree, when the accuracy of the model on the validation set starts to decrease. Then we test the final model on a held-out set to get the test accuracy.

  2. Two examples for selecting between multiple models:

    a. We do K-fold CV on one neural network with 3 layers and one with 5 layers (obtaining K models for each), then we select the NN with the highest validation accuracy averaged over the K models; suppose it is the 5-layer NN. Finally, we train the 5-layer NN on an 80% train, 20% validation split of the combined K folds, and then test it on a held-out set to get the test accuracy (a sketch of this case follows the list).

    b. We apply two already-built models, an SVM and a decision tree, to a validation set, then we select the one with the higher validation accuracy. Finally, we test the selected model on a held-out set to get the test accuracy.
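Here is a rough sketch of case 2a, with sklearn's MLPClassifier standing in for the two neural networks and iris as placeholder data (simplified: the winner is refit on all development data rather than a further 80/20 split):

from sklearn import datasets
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier

iris = datasets.load_iris()
X_dev, X_test, y_dev, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

candidates = {
    'small NN': MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
    'large NN': MLPClassifier(hidden_layer_sizes=(10, 10, 10), max_iter=2000, random_state=0),
}

# Validation accuracy: mean of the K per-fold scores, used only to pick a model
cv_means = {name: cross_val_score(model, X_dev, y_dev, cv=5).mean()
            for name, model in candidates.items()}
best_name = max(cv_means, key=cv_means.get)

# Retrain the selected model, then test it once on the held-out set
best = candidates[best_name].fit(X_dev, y_dev)
print(best_name, 'test accuracy:', best.score(X_test, y_test))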

Esmailian
6

Cross-validation splits your data into K folds; in each iteration, one fold is held out as the test set and the remaining K-1 folds form the training set. You are correct that you get K different error rates that you then average. These error rates come from the held-out test fold of each of the K iterations. If you want the training error rate, calculate the error rate on the training portion of each of the K iterations and then take the average (see the sketch below).
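A minimal sketch of that calculation, using sklearn's KFold with iris data and a linear SVC as placeholders:

import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import KFold

iris = datasets.load_iris()
X, y = iris.data, iris.target

train_errors, test_errors = [], []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    clf = svm.SVC(kernel='linear', C=1).fit(X[train_idx], y[train_idx])
    train_errors.append(1 - clf.score(X[train_idx], y[train_idx]))  # error on the training part
    test_errors.append(1 - clf.score(X[test_idx], y[test_idx]))     # error on the held-out fold

print('Mean training error:', np.mean(train_errors))
print('Mean test error:', np.mean(test_errors))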

astel