From what I understand about model stacking: the meta-estimator is trained to combine the predictions of the N base models so as to fit the ground truth. Once trained, it combines the first-level outputs to approximate the ground truth.
The meta-estimator is thus a model of the form: $(y_{pred1}, y_{pred2}, y_{pred3}) \rightarrow y_{pred\text{-}stack}$
So the combination is based only on the values of the first-level predictions. However, each row of the stacking data is also linked to other attributes: "Brand", "Model", "Power". Why not feed those attributes to the meta-estimator as well, so it can determine the optimal combination? For example, if model 1 is the best whenever "Brand" is missing (NaN), the meta-estimator could learn this and defer to model 1 for every row with a missing brand.
The meta-estimator I propose is therefore of the form: $(y_{pred1}, y_{pred2}, y_{pred3}, \textit{brandIsNull}) \rightarrow y_{pred\text{-}stack}$
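To make the proposal concrete, here is a minimal hand-rolled sketch of it in scikit-learn. The data, feature names, and model choices are all illustrative assumptions, not from a real dataset: three base regressors produce out-of-fold predictions, and the meta-estimator is fit on those predictions concatenated with the `brand_is_null` indicator.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_predict

# Toy data standing in for the car dataset: 3 numeric features
# (think "Brand", "Model", "Power" after encoding) plus a brandIsNull flag.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
brand_is_null = rng.integers(0, 2, size=(200, 1)).astype(float)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

base_models = [Ridge(),
               Lasso(alpha=0.01),
               DecisionTreeRegressor(max_depth=4, random_state=0)]

# Out-of-fold level-1 predictions, so the meta-estimator
# is not trained on predictions that saw their own targets.
preds = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_models])

# Proposed meta input: (y_pred1, y_pred2, y_pred3, brandIsNull)
meta_X = np.hstack([preds, brand_is_null])
meta = LinearRegression().fit(meta_X, y)

# At prediction time the base models are refit on the full training data,
# and their predictions are again augmented with the indicator column.
for m in base_models:
    m.fit(X, y)
test_preds = np.column_stack([m.predict(X[:5]) for m in base_models])
print(meta.predict(np.hstack([test_preds, brand_is_null[:5]])))
```

With a more expressive meta-estimator (e.g. a tree) in place of the linear one, this setup could in principle learn rules like "trust model 1 when the brand is missing".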
- Does this approach exist?
- If not, would it be a good or bad idea?