I am new to ML and am learning how to fill in missing data in a dataset using Imputer. These are the lines of code I came across:

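# Note: Imputer was removed in scikit-learn 0.22; sklearn.impute.SimpleImputer replaces it.
from sklearn.preprocessing import Imputer
# Describe how to fill gaps: replace NaN with the column mean (axis=0 means per column).
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
# Learn the means of columns 1 and 2 of X, then fill the NaNs in those columns.
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])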

I don't understand the roles of the fit and transform functions. It would be great if someone could help. Thank you.

Turing101

2 Answers

fit learns how to perform the transformation on array X. It does not fill in any missing values; it only works out how to do so (with strategy = 'mean', it computes the mean of each column). All of this information is stored inside the Imputer object.

To apply what fit learned and actually transform the data, we call transform on the fitted imputer.
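
To make the split concrete, here is a minimal hand-rolled sketch of a column-mean imputer in plain Python. The class name MeanImputer, the attribute means_, and the toy data are hypothetical and only for illustration; this is not scikit-learn's implementation, and it assumes every column has at least one non-missing value.

```python
import math


class MeanImputer:
    """Hand-rolled column-mean imputer (illustrative; not scikit-learn's code)."""

    def fit(self, rows):
        # Learn one mean per column, skipping NaN entries. No data is changed here.
        n_cols = len(rows[0])
        self.means_ = []
        for j in range(n_cols):
            present = [row[j] for row in rows if not math.isnan(row[j])]
            self.means_.append(sum(present) / len(present))
        return self

    def transform(self, rows):
        # Apply the stored means: return a new table with each NaN replaced
        # by the mean of its column.
        return [
            [self.means_[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in rows
        ]


# Toy data (made up): two columns, one missing value in each.
X = [[1.0, 10.0], [math.nan, 20.0], [3.0, math.nan]]

imputer = MeanImputer().fit(X)
print(imputer.means_)       # what fit learned
print(math.isnan(X[1][0]))  # True: fit alone did not fill anything in

X_filled = imputer.transform(X)
print(X_filled)
```

Keeping the two steps separate also means the means learned from one dataset can later be applied to another, for example to new rows that arrive after fitting.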

fuwiak

In very simple words, Imputer() defines how you want to fill in the missing values: for example, which strategy to use and along which axis. This line doesn't touch your data, because you have only created an object.

Then you fit this imputer object so that it learns from your dataset, and when you call the transform method it fills in the missing values accordingly.
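
The three stages (create, fit, transform) can be sketched as follows. Since Imputer was removed in scikit-learn 0.22, this sketch assumes a recent scikit-learn and uses sklearn.impute.SimpleImputer instead, which always works column-wise and has no axis parameter. The toy array and the variable has_stats_before_fit are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (made up): two columns, one missing value in each.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

# 1. Create: only records the settings; nothing has been learned yet.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
has_stats_before_fit = hasattr(imputer, "statistics_")
print(has_stats_before_fit)

# 2. Fit: computes the mean of each column and stores it on the object.
imputer.fit(X)
print(imputer.statistics_)

# 3. Transform: returns a copy of X with each NaN replaced by its column mean.
X_filled = imputer.transform(X)
print(X_filled)
```

If the data you fit on is the same data you want to fill, fit_transform(X) does steps 2 and 3 in one call.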