Refers to general procedures that attempt to determine the generalizability of a statistical result. Cross-validation arises frequently in the context of assessing how well a particular model fit predicts future observations. Methods for cross-validation usually involve withholding a random subset of the data during model fitting, quantifying how accurately the withheld data are predicted, and repeating this process to obtain a measure of prediction accuracy.
Questions tagged [cross-validation]
646 questions
206 votes, 18 answers
Train/Test/Validation Set Splitting in Sklearn
How could I randomly split a data matrix and the corresponding label vector into X_train, X_test, X_val, y_train, y_test, y_val with scikit-learn?
As far as I know, sklearn.model_selection.train_test_split is only capable of splitting into two not…
Hendrik
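Not from the thread itself, but a minimal sketch of one common pattern: call train_test_split twice, first to carve off the test set and then to split the remainder into train and validation. The toy X, y and the 60:20:20 sizes here are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the question's data matrix and label vector.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# First carve off the test set, then split the remainder into train/val.
# The sizes below give a 60:20:20 split and are purely illustrative.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)  # 0.25 of 80% = 20%
```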
59 votes, 4 answers
What is the difference between bootstrapping and cross-validation?
I used to apply K-fold cross-validation for robust evaluation of my machine learning models, but I'm aware that bootstrapping can serve this purpose as well. However, I cannot see the main difference between them in terms of…
Fredrik
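A hedged sketch of how the two resampling schemes differ in code, using a toy dataset and classifier that are assumptions rather than anything from the question: K-fold holds each observation out exactly once, while the bootstrap samples rows with replacement and scores on the out-of-bag rows.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# K-fold CV: every observation is held out exactly once across the k folds.
cv_scores = cross_val_score(model, X, y, cv=5)

# Bootstrap: draw n rows with replacement, score on the out-of-bag rows
# that were never drawn, and repeat to build a distribution of scores.
rng = np.random.default_rng(0)
boot_scores = []
for _ in range(100):
    idx = rng.integers(0, len(X), len(X))        # bootstrap sample indices
    oob = np.setdiff1d(np.arange(len(X)), idx)   # out-of-bag indices
    model.fit(X[idx], y[idx])
    boot_scores.append(model.score(X[oob], y[oob]))

print(cv_scores.mean(), np.mean(boot_scores))
```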
48 votes, 2 answers
How does the validation_split parameter of Keras' fit function work?
validation_split in the Keras Sequential model's fit function is documented as follows at https://keras.io/models/sequential/:
validation_split: Float between 0 and 1. Fraction of the training data
to be used as validation data. The model will set…
rnso
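A minimal sketch of the behaviour the documentation describes, with an illustrative model and random data (none of it from the question): Keras holds out the last fraction of the arrays, taken before any shuffling, and reports validation metrics on it each epoch.

```python
import numpy as np
from tensorflow import keras

# Illustrative data and model; validation_split=0.2 holds out the *last*
# 20% of the arrays (taken before shuffling) and never trains on it.
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
# history.history now contains val_loss / val_accuracy alongside the
# training metrics, one entry per epoch.
```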
40 votes, 3 answers
Why use both validation set and test set?
Consider a neural network:
For a given set of data, we divide it into training, validation, and test sets. Suppose we do it in the classic 60:20:20 ratio; then we prevent overfitting by validating the network, checking it on the validation set. Then…
user1825567
37 votes, 2 answers
How to use the output of GridSearch?
I'm currently working with Python and scikit-learn for classification purposes, and after doing some reading on GridSearch I thought this was a great way to optimise my estimator parameters and get the best results.
My methodology is this:
Split my…
Dee Carter
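A sketch of one way to consume a GridSearchCV result, with an illustrative SVC grid that is not from the question: with the default refit=True, the winning configuration is already refit on the full training set and exposed as best_estimator_.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)   # winning parameter combination
print(search.best_score_)    # its mean cross-validated score
# With refit=True (the default), best_estimator_ has already been refit on
# all of X_train, so it (or `search` itself) can be used on new data.
print(search.best_estimator_.score(X_test, y_test))
```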
35 votes, 3 answers
Does modeling with Random Forests require cross-validation?
As far as I've seen, opinions tend to differ about this. Best practice would certainly dictate using cross-validation (especially if comparing RFs with other algorithms on the same dataset). On the other hand, the original source states that the…
neuron
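Part of why opinions differ is the forest's built-in out-of-bag estimate. A small sketch (dataset and settings are illustrative, not from the question) of how oob_score=True yields a generalization estimate without an explicit cross-validation loop:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each tree sees a bootstrap sample, so roughly a third of the rows are
# "out of bag" for that tree; oob_score_ aggregates predictions from only
# those trees, giving a built-in estimate of generalization performance.
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)
```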
35 votes, 6 answers
Merging multiple data frames row-wise in PySpark
I have 10 data frames of type pyspark.sql.dataframe.DataFrame, obtained from randomSplit as (td1, td2, td3, td4, td5, td6, td7, td8, td9, td10) = td.randomSplit([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1], seed=100). Now I want to join 9 td's into a single…
krishna Prasad
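A sketch of one common row-wise merge, assuming Spark 2.x or later where DataFrame.union is available (older versions use unionAll); the toy td below just stands in for the question's data.

```python
from functools import reduce
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for td, split into ten random pieces as in the question.
td = spark.range(1000).withColumnRenamed("id", "value")
splits = td.randomSplit([0.1] * 10, seed=100)

# Row-wise merge: union requires matching schemas, which holds here since
# every piece came from the same randomSplit call.
merged = reduce(DataFrame.union, splits[:9])   # first nine pieces
held_out = splits[9]
print(merged.count(), held_out.count())
```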
33 votes, 2 answers
How to calculate the fold number (k-fold) in cross validation?
I am confused about how to choose the number of folds (in k-fold CV) when I apply cross-validation to check the model. Does it depend on the data size or on other parameters?
Taimur Islam
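A small sketch, with an illustrative model and dataset, of how the choice of k can be probed empirically by comparing the mean and spread of cross-validated scores for several values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# k = 5 or 10 are the usual defaults: larger k gives each model more
# training data per fold but costs more fits and tends to produce
# noisier fold-to-fold estimates.
for k in (3, 5, 10):
    scores = cross_val_score(model, X, y, cv=k)
    print(f"k={k}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```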
28 votes, 4 answers
Cross validation Vs. Train Validate Test
I have a question regarding the cross-validation approach and the train-validation-test approach.
I was told that I can split a dataset into 3 parts:
Train: we train the model.
Validation: we validate and adjust model parameters.
Test: never seen before…
NaveganTeX
19 votes, 3 answers
What is the proper way to use early stopping with cross-validation?
I am not sure what the proper way is to use early stopping with cross-validation for a gradient boosting algorithm. For a simple train/validation split, we can use the validation set as the evaluation set for early stopping, and when refitting we…
amine456
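One possible pattern, as a sketch under assumptions rather than a definitive answer (it uses scikit-learn's HistGradientBoostingClassifier instead of whatever library the question had in mind): let early stopping use an internal slice of each training fold, keep the outer fold purely for scoring, and reuse the average stopping iteration when refitting on all the data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import KFold

X, y = load_breast_cancer(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores, stop_iters = [], []
for train_idx, valid_idx in kf.split(X):
    # Early stopping monitors a slice of the *training* fold
    # (validation_fraction), so the outer valid fold stays untouched.
    gbm = HistGradientBoostingClassifier(
        max_iter=1000, early_stopping=True,
        validation_fraction=0.1, n_iter_no_change=20, random_state=0)
    gbm.fit(X[train_idx], y[train_idx])
    scores.append(gbm.score(X[valid_idx], y[valid_idx]))
    stop_iters.append(gbm.n_iter_)

# One common heuristic when refitting on all data: reuse the average
# stopping iteration found during cross-validation.
final = HistGradientBoostingClassifier(
    max_iter=int(np.mean(stop_iters)), early_stopping=False, random_state=0)
final.fit(X, y)
print(np.mean(scores), int(np.mean(stop_iters)))
```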
15 votes, 1 answer
Stratify on regression
I have worked on classification problems, and stratified cross-validation is one of the most useful and simple techniques I've found. In that case, what it means is to build a training and validation set that have the same proportions of classes of…
David Masip
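A minimal sketch of the usual workaround, with an illustrative bundled dataset: discretize the continuous target into quantile bins and pass the bin labels to StratifiedKFold in place of the raw target.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import StratifiedKFold

X, y = load_diabetes(return_X_y=True)

# StratifiedKFold needs discrete labels, so bin the continuous target
# (here into quintiles) and stratify on the bin index instead of y itself.
cut_points = np.quantile(y, [0.2, 0.4, 0.6, 0.8])
y_binned = np.digitize(y, cut_points)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, valid_idx in skf.split(X, y_binned):
    # Each fold now has a roughly similar distribution of target values.
    print(round(y[valid_idx].mean(), 1))
```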
15 votes, 2 answers
Can overfitting occur even with validation loss still dropping?
I have a convolutional + LSTM model in Keras, similar to this (ref 1), that I am using for a Kaggle contest. Architecture is shown below. I have trained it on my labeled set of 11000 samples (two classes, initial prevalence is ~9:1, so I upsampled…
DeusXMachina
15 votes, 3 answers
How to choose a classifier after cross-validation?
When we do k-fold cross validation, should we just use the classifier that has the highest test accuracy? What is generally the best approach in getting a classifier from cross validation?
Armon Safai
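A sketch of the usual recommendation, with illustrative candidate models that are not from the question: cross-validation scores a configuration rather than any single fold's fitted classifier, so the best-scoring configuration is refit on all of the training data afterwards.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Cross-validation scores each *configuration*, not one fold's fitted model.
candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "svm": SVC(gamma="scale"),
}
cv_means = {name: cross_val_score(m, X, y, cv=5).mean()
            for name, m in candidates.items()}

# Pick the configuration with the best mean CV score, then refit it on the
# full training data rather than reusing any single fold's classifier.
best_name = max(cv_means, key=cv_means.get)
final_model = candidates[best_name].fit(X, y)
print(best_name, cv_means)
```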
14 votes, 2 answers
Validation vs. test vs. training accuracy. Which one should I compare for claiming overfit?
I have read in several answers here and on the Internet that cross-validation helps to indicate whether the model will generalize well or not, and about overfitting.
But I am confused about which two accuracies/errors among…
A.B
13 votes, 2 answers
Cross-validation: K-fold vs Repeated random sub-sampling
I wonder which type of model cross-validation to choose for a classification problem: K-fold or random sub-sampling (bootstrap sampling)?
My best guess is to use 2/3 of the data set (which is ~1000 items) for training and 1/3 for validation.
In this…
IgorS
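A small sketch of the two schemes side by side, using an illustrative dataset and model rather than the question's data: KFold partitions the data into disjoint folds so every sample is validated exactly once, while ShuffleSplit draws a fresh 2/3 train / 1/3 validation split on every repetition, so samples may be reused or skipped.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, ShuffleSplit, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# K-fold: every sample is validated exactly once across k disjoint folds.
kfold = KFold(n_splits=3, shuffle=True, random_state=0)

# Repeated random sub-sampling: a fresh 2/3 train / 1/3 validation split on
# every repetition; samples may appear in several validation sets or none.
shuffle = ShuffleSplit(n_splits=10, test_size=1 / 3, random_state=0)

print(cross_val_score(model, X, y, cv=kfold).mean())
print(cross_val_score(model, X, y, cv=shuffle).mean())
```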