
I read two articles by the same author in which he uses the whole dataset for hyperparameter optimisation with CV and then evaluates the model with the best hyperparameters using leave-one-out CV on the same dataset.

This seems fishy to me: from what I know, by tuning the model on the whole dataset and then evaluating on that same dataset, he will be overfitting and getting an overly optimistic result.

However, he managed to publish two articles with this same methodology (one in a Q1 journal), so I'm wondering if I'm the one missing something.
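
To make the setup concrete, here is a minimal sketch of what I understand the protocol to be (scikit-learn with made-up data; the SVC and grid are just placeholders, not the models from the articles):

    from sklearn.base import clone
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=60, n_features=10, random_state=0)

    # Step 1: hyperparameter optimisation with CV on the *whole* dataset
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
    search.fit(X, y)

    # Step 2: leave-one-out evaluation of the tuned model on the *same* dataset
    loo_scores = cross_val_score(clone(search.best_estimator_), X, y, cv=LeaveOneOut())
    print(loo_scores.mean())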

3 Answers


Yes, you're right. Using the whole dataset for hyperparameter tuning and then evaluating with LOOCV on the same data causes data leakage and overfitting.

This setup gives overly optimistic results because the model has already seen patterns in every data point during tuning.

Rule of thumb: the final evaluation must be performed on data that was not used in any part of model selection.
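
A standard way to respect that rule is nested cross-validation, where the hyperparameter search is repeated inside each outer training fold and the outer test folds are never touched during tuning. A minimal sketch (scikit-learn, synthetic data, placeholder estimator and grid):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=60, n_features=10, random_state=0)

    # Inner CV: hyperparameter tuning, rerun on each outer training fold
    inner_search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)

    # Outer CV: performance estimation on folds never seen during tuning
    outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    nested_scores = cross_val_score(inner_search, X, y, cv=outer_cv)
    print(nested_scores.mean())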

Guna

Yes, this will lead to an optimistic bias in the performance estimate; see the paper that Mrs Marsupial and I wrote on exactly this topic:

G. C. Cawley and N. L. C. Talbot, "On over-fitting in model selection and subsequent selection bias in performance evaluation", Journal of Machine Learning Research, 11, 2079-2107, 2010 (pdf)

Particularly section 5.3, which looks at using cross-validation for hyper-parameter tuning and then using cross-validation with a different partitioning for performance evaluation, which is a fairly similar situation to the one in the question.
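
For concreteness, that setup looks roughly like the following sketch (scikit-learn with synthetic data; an illustration of the protocol only, not code from the paper):

    from sklearn.base import clone
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=100, n_features=10, random_state=0)

    # Hyper-parameter tuning: CV over the whole dataset with one partitioning
    tune_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=tune_cv).fit(X, y)

    # "Evaluation": CV over the same data with a different partitioning
    eval_cv = KFold(n_splits=5, shuffle=True, random_state=1)
    scores = cross_val_score(clone(search.best_estimator_), X, y, cv=eval_cv)
    print(scores.mean())  # still optimistically biased despite the new partitioning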

Dikran Marsupial

I have also seen variations of this phenomenon.

For example, authors perform n-fold hyperparameter tuning on all available data and then, as the test result, report the combined performance of the fold models on their respective evaluation folds.
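
If I understand that variation correctly, it amounts to reporting the tuning CV's own combined out-of-fold score as the test performance, roughly like this (scikit-learn sketch with placeholder data and grid):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=100, n_features=10, random_state=0)

    # n-fold hyperparameter tuning on all available data
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=10)
    search.fit(X, y)

    # The combined out-of-fold result of the fold models is reported as "testing",
    # but best_score_ was maximised over the grid, so it is optimistic
    print(search.best_score_)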

petr