
I am confused about how to choose the number of folds (in $k$-fold CV) when I apply cross-validation to evaluate a model. Does it depend on the data size or on other parameters?

Taimur Islam

2 Answers


The number of folds is usually determined by the number of instances in your dataset. For example, if you have only 10 instances, 10-fold cross-validation wouldn't make sense. $k$-fold cross-validation is used for two main purposes: to tune hyperparameters and to better evaluate the performance of a model.

In both of these cases selecting $k$ depends on the same thing: you must ensure that the training set and the testing set are drawn from the same distribution, and that both sets contain enough variation that the underlying distribution is represented. In 10-fold cross-validation with only 10 instances, there would be only 1 instance in the testing set, and a single instance cannot properly represent the variation of the underlying distribution.
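
One practical way to keep each fold representative in a classification setting is stratified splitting, which preserves the class proportions in every fold. Below is a minimal sketch using scikit-learn's `StratifiedKFold`; the toy dataset and class imbalance are illustrative assumptions, not part of the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced dataset standing in for your own data (illustrative only).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

# StratifiedKFold keeps the 80/20 class ratio in every train/test split,
# so each fold reflects the underlying class distribution.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    pos_rate = y[test_idx].mean()
    print(f"fold {fold}: {len(test_idx)} test instances, "
          f"positive rate {pos_rate:.2f}")
```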

That being said, selecting $k$ is not an exact science because it is hard to estimate how well a fold represents your overall dataset. I usually use 5-fold cross-validation, which means 20% of the data is used for testing; this is usually enough for a reliable estimate. However, if your dataset grows dramatically, say to over 100,000 instances, even 10-fold cross-validation gives folds of 10,000 instances, which should be sufficient to reliably test your model.

In short, yes, the number of folds depends on the data size. I usually stick with 4- or 5-fold. Make sure to shuffle your data so that your folds do not contain inherent bias.
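
As a concrete illustration of this setup, here is a minimal sketch of shuffled 5-fold cross-validation with scikit-learn; the dataset and estimator are placeholders standing in for your own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Placeholder data and model; substitute your own dataset and estimator.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5 folds => each test fold holds 20% of the data; shuffling removes any
# ordering bias before the folds are formed.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print("fold accuracies:", scores.round(3))
print("mean accuracy:", scores.mean().round(3))
```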

JahKnows

It also depends on how much CPU time you are willing to spend. A lower K means less variance and thus more bias, while a higher K means more variance and thus lower bias.

Also, keep in mind the computational cost of the different values: a high K means more folds, and therefore more training runs and longer computation time, and vice versa. So one needs to find a sweet spot between the quality of the estimate and the runtime, treating K itself as something to tune.
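
To make the cost trade-off concrete, the sketch below times cross-validation at two values of K; the dataset, model, and the particular values of K are illustrative assumptions, not part of the original answer.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data and model; runtime grows with the number of folds
# because each extra fold means one more training run.
X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
model = LogisticRegression(max_iter=1000)

for k in (5, 20):
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=k)
    elapsed = time.perf_counter() - start
    print(f"K={k:2d}: mean accuracy {scores.mean():.3f}, time {elapsed:.2f}s")
```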

Also, keep the size of your data in mind. If your dataset is very small, even k-fold cross-validation may not make sense, and you might want to use leave-one-out CV (LOOCV) instead.
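
For a very small dataset, leave-one-out CV trains on all but one instance at a time. The following is a minimal sketch with scikit-learn's `LeaveOneOut`; the tiny dataset and model are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# A deliberately tiny dataset where k-fold splits would leave too few
# instances per fold; LOOCV uses n_samples folds of size 1 instead.
X, y = make_classification(n_samples=25, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("number of folds:", len(scores))          # one fold per instance
print("LOOCV accuracy:", scores.mean().round(3))
```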

Dawny33