
I often read the recommendation to use StandardScaler for normally distributed data and MinMax scaling otherwise. For example, the answers here.

I'm curious to know the reasoning/maths behind it.

I get that the idea is to bring the distribution to standard normal, but why can't we do that for other distributions? And why does MinMax scaling work well for everything except the normal distribution?

PS: Please correct me if I have misunderstood something.

1 Answer


Stating that one preprocessing method is better than another, without any information about the model that is used afterwards, is not meaningful.


Spoiler: for example, tree-based models, which split along one feature direction at a time, are not influenced at all by this kind of preprocessing (see the sketch below).

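A minimal sketch of that invariance, not from the original answer and using synthetic scikit-learn data as an assumption: a decision tree fitted on raw features and on scaled features produces the same predictions, since per-feature monotonic scaling does not change the ordering that the split thresholds rely on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

preds = {}
for name, scaler in [("raw", None),
                     ("standard", StandardScaler()),
                     ("minmax", MinMaxScaler())]:
    Xtr = X_train if scaler is None else scaler.fit_transform(X_train)
    Xte = X_test if scaler is None else scaler.transform(X_test)
    tree = DecisionTreeClassifier(random_state=0).fit(Xtr, y_train)
    preds[name] = tree.predict(Xte)

# The tree's predictions typically match exactly, regardless of scaling
print(np.array_equal(preds["raw"], preds["standard"]))  # typically True
print(np.array_equal(preds["raw"], preds["minmax"]))    # typically True
```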

My guess at what the original author might have been hinting at is something along these lines:
say you have a feature $X_i \sim N(\mu, \sigma^2)$; then there is a non-zero probability of drawing a sample $x_i$ that is arbitrarily far from $\mu$. If you then apply a min-max scaler, the observed minimum or maximum can lie very far from the bulk of the data because of such "outliers", and everything else gets squeezed into a narrow range because of them.

For some families of models (say, neural networks) this can be a problem, because the model then has to become very sensitive to tiny variations in that squeezed feature. The sketch below illustrates the effect.
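
A minimal sketch of the "squeezing" effect, again with synthetic data as an assumption: one extreme sample drawn from a normal feature stretches the min-max range, so the bulk of the data ends up compressed into a narrow band, while standardization keeps the bulk spread out.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=(1000, 1))
x[0] = 50.0  # an "outlier": rare but possible under a normal distribution

x_minmax = MinMaxScaler().fit_transform(x)
x_std = StandardScaler().fit_transform(x)

def bulk_spread(a):
    # width of the interval covering the central 98% of samples
    lo, hi = np.percentile(a, [1, 99])
    return hi - lo

print(bulk_spread(x_minmax))  # tiny, e.g. ~0.09: the bulk is squeezed near 0
print(bulk_spread(x_std))     # a few units, e.g. ~2.5: the bulk keeps its spread
```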