It is probably fairly rare to need fit without fit_transform for a sklearn transformer. It nevertheless makes sense to keep the methods separate: fitting a transformer means learning the relevant information about the data, while transforming produces an altered dataset. A plain fit also makes sense for sklearn predictors, and only some of those (in particular clusterers and outlier detectors) provide a combined fit_predict.
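To illustrate the split, here is a minimal toy transformer in the sklearn style (not real sklearn code; the class and attribute names are made up for the sketch). The point is that fit learns state from one dataset, and transform can then be applied to a *different* dataset, e.g. a test set:

```python
class MeanCenterer:
    """Toy sklearn-style transformer: fit learns state, transform applies it."""

    def fit(self, X, y=None):
        # learn the relevant information about the data
        self.mean_ = sum(X) / len(X)
        return self  # sklearn convention: fit returns self, enabling chaining

    def transform(self, X):
        # produce an altered dataset using the learned state
        return [x - self.mean_ for x in X]


# fit on training data, then transform *new* data with the learned mean
centerer = MeanCenterer().fit([1.0, 2.0, 3.0])  # learns mean_ = 2.0
print(centerer.transform([10.0, 12.0]))          # [8.0, 10.0]
```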
I can think of at least one instance where a transformer gets fitted but does not (immediately) transform data, but it is internal. In KBinsDiscretizer, if encode='onehot', then an internal instance of OneHotEncoder is created, and at fit time for the discretizer, the encoder is fitted (to dummy data) just to prepare it to transform future data. Transforming the data given to KBinsDiscretizer.fit would be wasteful at this point.
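A rough sketch of that pattern (again toy code, not the actual KBinsDiscretizer implementation; the class names and binning logic here are simplified stand-ins): the discretizer's fit also fits an internal one-hot encoder on dummy bin indices, so the encoder is ready for later calls to transform, but nothing is transformed during fit itself.

```python
class OneHotSketch:
    """Toy one-hot encoder standing in for OneHotEncoder."""

    def fit(self, X):
        self.categories_ = sorted(set(X))
        return self

    def transform(self, X):
        return [[1 if x == c else 0 for c in self.categories_] for x in X]


class DiscretizerSketch:
    """Toy equal-width discretizer mimicking the KBinsDiscretizer pattern."""

    def __init__(self, n_bins=3):
        self.n_bins = n_bins

    def fit(self, X):
        lo, hi = min(X), max(X)
        width = (hi - lo) / self.n_bins
        self.edges_ = [lo + width * i for i in range(1, self.n_bins)]
        # Fit the internal encoder on dummy data (the possible bin indices),
        # purely to prepare it for future transforms -- the training data
        # itself is NOT transformed here.
        self._encoder = OneHotSketch().fit(range(self.n_bins))
        return self

    def transform(self, X):
        indices = [sum(x >= e for e in self.edges_) for x in X]
        return self._encoder.transform(indices)


disc = DiscretizerSketch(n_bins=3).fit([0.0, 1.0, 2.0, 3.0])
print(disc.transform([0.1, 2.5]))  # [[1, 0, 0], [0, 0, 1]]
```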
Finally, one comment on your post:
> we have fit_transform which is much faster than using fit and transform separately
In most (but not all) cases, fit_transform is literally the same as fit(X, y).transform(X), so this should not be faster.
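That default is easy to demonstrate with a toy class (a sketch of the behavior sklearn's TransformerMixin provides; the names here are invented for illustration). The generic fit_transform just chains the two calls, so it does exactly the same work:

```python
class MaxScalerSketch:
    """Toy transformer whose fit_transform is the generic fit-then-transform."""

    def fit(self, X, y=None):
        self.max_ = max(X)
        return self

    def transform(self, X):
        return [x / self.max_ for x in X]

    def fit_transform(self, X, y=None):
        # the generic default: identical work to calling the two methods
        return self.fit(X, y).transform(X)


X = [1.0, 2.0, 4.0]
print(MaxScalerSketch().fit_transform(X))            # [0.25, 0.5, 1.0]
print(MaxScalerSketch().fit(X).transform(X))         # same result, same cost
```

The exceptions are transformers that override fit_transform with a genuinely cheaper combined computation (for example, decompositions that can reuse intermediate results); for the rest, fit_transform is just a convenience.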