
I read two articles by the same author in which he uses the whole dataset for hyperparameter optimisation with CV and then evaluates the model with the best hyperparameters using leave-one-out CV on the same dataset.

This seems fishy to me: from what I know, by tuning the model on the whole dataset and then evaluating on that same dataset, he will be overfitting and getting an overly optimistic result.

However, he managed to publish two articles with this same methodology (one in a Q1 journal), so I'm wondering if I'm the one missing something.
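
To make the setup concrete, here is a minimal sketch of what I understand the protocol to be (scikit-learn with made-up data; the SVC and grid are just placeholders, not the models from the articles):

    from sklearn.base import clone
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=60, n_features=10, random_state=0)

    # Step 1: hyperparameter optimisation with CV on the *whole* dataset
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
    search.fit(X, y)

    # Step 2: leave-one-out evaluation of the tuned model on the *same* dataset
    loo_scores = cross_val_score(clone(search.best_estimator_), X, y, cv=LeaveOneOut())
    print(loo_scores.mean())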

3 Answers


Yes, you're right. Using the whole dataset for hyperparameter tuning and then evaluating with LOOCV on the same data causes data leakage and overfitting.

This setup gives overly optimistic results because the model has already seen patterns in every data point during tuning.

Rule of thumb: the final evaluation must be performed on data that was not used in any part of model selection.
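
A standard way to respect that rule is nested cross-validation, where the hyperparameter search is repeated inside each outer training fold and the outer test folds are never touched during tuning. A minimal sketch (scikit-learn, synthetic data, placeholder estimator and grid):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=60, n_features=10, random_state=0)

    # Inner CV: hyperparameter tuning, rerun on each outer training fold
    inner_search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)

    # Outer CV: performance estimation on folds never seen during tuning
    outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    nested_scores = cross_val_score(inner_search, X, y, cv=outer_cv)
    print(nested_scores.mean())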

Guna

Yes, this will lead to an optimistic bias in the performance estimate; see the paper that Mrs Marsupial and I wrote on exactly this topic:

G. C. Cawley and N. L. C. Talbot, "On over-fitting in model selection and subsequent selection bias in performance evaluation", Journal of Machine Learning Research, 11, 2079-2107, 2010 (pdf)

Particularly section 5.3, which looks at using cross-validation for hyper-parameter tuning and then using cross-validation with a different partitioning for performance evaluation, which is a fairly similar situation to the one in the question.
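
For concreteness, that setup looks roughly like the following sketch (scikit-learn with synthetic data; an illustration of the protocol only, not code from the paper):

    from sklearn.base import clone
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=100, n_features=10, random_state=0)

    # Hyper-parameter tuning: CV over the whole dataset with one partitioning
    tune_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=tune_cv).fit(X, y)

    # "Evaluation": CV over the same data with a different partitioning
    eval_cv = KFold(n_splits=5, shuffle=True, random_state=1)
    scores = cross_val_score(clone(search.best_estimator_), X, y, cv=eval_cv)
    print(scores.mean())  # still optimistically biased despite the new partitioning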

Dikran Marsupial

I have also seen variations of this phenomenon.

For example, authors perform n-fold hyperparameter tuning on all available data and then, as the test result, report the combined performance of the fold models on their respective evaluation folds.
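
If I understand that variation correctly, it amounts to reporting the tuning CV's own combined out-of-fold score as the test performance, roughly like this (scikit-learn sketch with placeholder data and grid):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=100, n_features=10, random_state=0)

    # n-fold hyperparameter tuning on all available data
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=10)
    search.fit(X, y)

    # The combined out-of-fold result of the fold models is reported as "testing",
    # but best_score_ was maximised over the grid, so it is optimistic
    print(search.best_score_)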

petr