
In stacked generalization, if I understand correctly, we split the training set into a train set and a hold-out set. We use the train set to train M models, then make predictions on the hold-out set. We then use those predictions as the input of a new model, so the new training set has M features corresponding to the M models' predictions. Finally, we use this last model to make the final predictions. First, is my understanding correct? If so, how is it possible to use the last model to make predictions on new data, since the new data has different features?

Spider

1 Answer


You have created a model pipeline, so to make a prediction on new data with the stack you must run all of the trained models, "lower level" ones first, and feed their outputs into the final model.
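As a rough sketch of that pipeline ordering (model choices and variable names here are arbitrary, not from the question): the base models are fit first, their predictions are stacked column-wise into the level-2 feature matrix, and prediction on new data repeats the same two steps in order. Note this simple version trains the meta-model on in-sample predictions, which the caveat about cross-validation below addresses.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Toy data standing in for a real problem.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_new, y_train, _ = train_test_split(X, y, random_state=0)

# M = 2 "level 1" models, fit on the training data.
base_models = [LogisticRegression(max_iter=1000),
               DecisionTreeClassifier(random_state=0)]
for m in base_models:
    m.fit(X_train, y_train)

# Level-2 training set: one column per base model's predictions.
meta_X_train = np.column_stack([m.predict(X_train) for m in base_models])
meta_model = LogisticRegression().fit(meta_X_train, y_train)

# Predicting on new data: run the base models first, then the meta-model.
meta_X_new = np.column_stack([m.predict(X_new) for m in base_models])
final_pred = meta_model.predict(meta_X_new)
```

This answers the question's last point: the meta-model never sees the original features directly; new data is first transformed into M prediction columns by the base models.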

With the test data set it is slightly easier, since you can store the predictions from the "level 1" models when testing them, and then only need to run the final model across this stored data.

In addition to your brief description: to avoid bias from re-using training data, you would usually use k-fold cross-validation or a similar mechanism, and the training data for the final model should be the cross-validation predictions from each model. You do not want to use the in-sample training predictions from those models, because they are likely to be overfit whilst "level 1" test and production predictions will not be, and this would introduce population differences between train and test data in your "level 2" model.
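A minimal sketch of this, assuming scikit-learn, where `cross_val_predict` produces out-of-fold predictions (each row is predicted by a model that never saw it during fitting), and the base models are then refit on the full training data for use at prediction time:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(n_estimators=50, random_state=0)]

# Out-of-fold predictions: unbiased inputs for the "level 2" model.
oof = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in models])

# Refit each base model on all training data for production use,
# but train the meta-model only on the out-of-fold predictions.
for m in models:
    m.fit(X, y)
meta_model = LogisticRegression().fit(oof, y)
```

The key point is the asymmetry: the base models are ultimately fit on everything, but the meta-model only ever sees predictions made on held-out rows, matching the distribution it will see at test time.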

It is also quite a common variation to use the M new features from your "level 1" models alongside some or all of the original features. This gives the meta-model more data to base its decision on when weighting the first-stage models (assuming this top-level model is non-linear).
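For what it's worth, scikit-learn's built-in `StackingClassifier` supports exactly this variation via its `passthrough` parameter, which appends the original features to the base-model predictions before the final estimator sees them (estimator choices here are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=50, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    passthrough=True,  # meta-model sees original features + base predictions
    cv=5,              # base predictions for meta training come from CV folds
)
stack.fit(X, y)
preds = stack.predict(X[:5])
```

With `passthrough=False` (the default) the final estimator would see only the M prediction columns, matching the basic scheme described in the question.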

Neil Slater