
I often read the recommendation to use StandardScaler for normally distributed data and MinMax scaling otherwise. For example, the answers here.

I'm curious to know the reasoning/maths behind it.

I get that the idea is to bring the distribution to standard normal, but why can't we do that for other distributions? And why does MinMax scaling work well for everything except the normal distribution?

PS: Please correct me if I have misunderstood something.

1 Answer


Stating that one preprocessing method is better than another, without any information about the model that is used afterwards, is not meaningful.


Spoiler: for example, tree-based models, which split along one feature direction at a time, are not influenced at all by this kind of preprocessing (see the sketch below).

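A minimal sketch of that invariance, not from the original answer and using synthetic scikit-learn data as an assumption: a decision tree fitted on raw features and on scaled features produces the same predictions, since per-feature monotonic scaling does not change the ordering that the split thresholds rely on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

preds = {}
for name, scaler in [("raw", None),
                     ("standard", StandardScaler()),
                     ("minmax", MinMaxScaler())]:
    Xtr = X_train if scaler is None else scaler.fit_transform(X_train)
    Xte = X_test if scaler is None else scaler.transform(X_test)
    tree = DecisionTreeClassifier(random_state=0).fit(Xtr, y_train)
    preds[name] = tree.predict(Xte)

# The tree's predictions typically match exactly, regardless of scaling
print(np.array_equal(preds["raw"], preds["standard"]))  # typically True
print(np.array_equal(preds["raw"], preds["minmax"]))    # typically True
```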

My guess at what the original author might have been hinting at is something along these lines:
say you have a feature $X_i \sim N(\mu, \sigma^2)$; then there is a non-zero probability of drawing a sample $x_i$ that is arbitrarily far from $\mu$. If you then apply a min-max scaler, the observed minimum or maximum can lie very far from the bulk of the data because of such "outliers", and everything else gets squeezed into a narrow range because of them.

For some families of models (say, neural networks) this can be a problem, because the model then has to become very sensitive to tiny variations in that squeezed feature. The sketch below illustrates the effect.
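
A minimal sketch of the "squeezing" effect, again with synthetic data as an assumption: one extreme sample drawn from a normal feature stretches the min-max range, so the bulk of the data ends up compressed into a narrow band, while standardization keeps the bulk spread out.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=(1000, 1))
x[0] = 50.0  # an "outlier": rare but possible under a normal distribution

x_minmax = MinMaxScaler().fit_transform(x)
x_std = StandardScaler().fit_transform(x)

def bulk_spread(a):
    # width of the interval covering the central 98% of samples
    lo, hi = np.percentile(a, [1, 99])
    return hi - lo

print(bulk_spread(x_minmax))  # tiny, e.g. ~0.09: the bulk is squeezed near 0
print(bulk_spread(x_std))     # a few units, e.g. ~2.5: the bulk keeps its spread
```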