
Feature scaling can be crucial when using distance-, variance-, or gradient-based methods (KNN, PCA, neural networks, ...), because, depending on the case, it can improve the quality of the results or reduce the computational effort.
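As a quick illustration of the distance-based case, here is a minimal sketch (the "age" and "income" features and their scales are made up, and scikit-learn is assumed) showing how the feature with the largest scale dominates Euclidean distances until the data is standardized:

```python
# Minimal sketch: without scaling, Euclidean distance is driven almost entirely
# by the feature with the largest numeric range.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features: "age" (tens) and "income" (tens of thousands).
X = np.column_stack([rng.normal(40, 10, 100), rng.normal(50_000, 15_000, 100)])

a, b = X[0], X[1]
print("raw distance:   ", np.linalg.norm(a - b))      # dominated by the income difference

X_scaled = StandardScaler().fit_transform(X)
a_s, b_s = X_scaled[0], X_scaled[1]
print("scaled distance:", np.linalg.norm(a_s - b_s))  # both features contribute comparably
```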

In some cases (tree-based models in particular), scaling has no impact on performance.

There are many discussions out there about when one should scale their features and why. Apart from interpretability (which is not a problem as long as the scaling can be reverted), I'm wondering about the opposite: are there cases where scaling is a bad idea, i.e., where it can have a negative impact on model quality? Or, less importantly, on computation time?

Romain Reboulleau

6 Answers


Scaling often assumes you know the true min/max or mean/standard deviation, so directly scaling features for which this information is not really known can be a bad idea.

For example, clipped signals may hide this information, so scaling them can have a negative effect because you may distort their true values.

For an illustration of the difference between (1) a signal that can safely be scaled and (2) a clipped signal that should not be scaled, see https://mackie.com/blog/what-clipping.
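A minimal sketch with a synthetic sine wave (the saturation level of 1.0 and the true amplitude of 3.0 are invented for illustration) showing how scaling a clipped signal by its observed range distorts its true values:

```python
# Minimal sketch: scaling uses the *observed* extremes, but clipping hides the
# true ones, so the rescaled clipped samples misrepresent the real amplitude.
import numpy as np

t = np.linspace(0, 1, 500)
true_signal = 3.0 * np.sin(2 * np.pi * 5 * t)   # true peak amplitude: 3.0
clipped = np.clip(true_signal, -1.0, 1.0)       # recording saturates at +/- 1.0

by_observed_range = clipped / np.abs(clipped).max()   # treats the saturation level as the peak
by_true_range = clipped / np.abs(true_signal).max()   # what the clipped samples really represent

print(by_observed_range.max())   # 1.0
print(by_true_range.max())       # ~0.33: observed-range scaling exaggerates the clipped samples
```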

Bruno Lubascher

The example that comes to mind is images; I've never heard of scaling pixel intensities feature by feature before processing with a CNN. Presumably it's useful to maintain mean differences between the features, e.g., it could be a signal that the top-right corner is usually less red, etc.
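A small sketch with synthetic 8x8 "images" (the systematically darker bottom-right corner is invented for illustration) showing that standardizing each pixel position independently removes exactly this kind of regional mean difference:

```python
# Minimal sketch: per-feature standardization forces every pixel position to
# zero mean, erasing systematic brightness differences between image regions.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(1000, 8, 8))
images[:, 4:, 4:] *= 0.3                     # bottom-right corner is usually darker

flat = images.reshape(1000, -1)
print(flat.mean(axis=0).reshape(8, 8).round())            # corner means are clearly lower

standardized = StandardScaler().fit_transform(flat)
print(standardized.mean(axis=0).reshape(8, 8).round(6))   # all ~0: the regional signal is gone
```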

Robert

If the features are correlated, don't scale them: you can damage your data by applying scaling to each feature separately. It depends on your data, your problem, and the operator you'll be applying.
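One way to see this, sketched below with synthetic data: when two correlated features live on different scales, standardizing each one independently rotates the principal direction of their joint distribution, so any downstream operator that relies on that geometry sees different data:

```python
# Minimal sketch: per-feature standardization changes the joint geometry of
# correlated features (the leading eigenvector of the covariance matrix moves).
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
x1 = rng.normal(0, 10.0, 2000)
x2 = 0.5 * x1 + rng.normal(0, 1.0, 2000)     # correlated with x1, on a smaller scale
X = np.column_stack([x1, x2])

def leading_direction(data):
    eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
    return eigvecs[:, np.argmax(eigvals)]

print(leading_direction(X))                                   # roughly [0.89, 0.45] (up to sign)
print(leading_direction(StandardScaler().fit_transform(X)))   # roughly [0.71, 0.71] (up to sign)
```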

Piotr Rarus

An immediate example is standard-scaling or whitening data before a PCA. By normalizing each feature's variance, these scalings erase the relative magnitudes that the eigenvalues of the covariance matrix would otherwise reflect, and hence defeat the purpose of the PCA.
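A minimal sketch of this effect with scikit-learn (the 100:1:1 scale ratio is an arbitrary choice for illustration):

```python
# Minimal sketch: on raw data, PCA reports that one direction carries almost all
# the variance; after standardization every feature has unit variance, so that
# information is gone and the explained-variance ratios are flattened.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([100.0, 1.0, 1.0])   # one dominant feature

print(PCA().fit(X).explained_variance_ratio_)
# ~[0.9999, ...]: the first component captures nearly everything

print(PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_)
# ~[0.34, 0.33, 0.33]: the dominance has been erased
```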

Learning is a mess

The majority of features, especially in the physical sciences, have names, definitions, values, and units (s, m, kg, etc.), not just names and values. Knowing this, it is easy to manually or even automatically create new features based on the units. It makes no sense to add meters to seconds, but (x1^2 + x2^2 + x3^2)^0.5, where x1, x2, x3 are space coordinates in the same unit, is potentially a very valuable feature (a distance). Scaling before this creative feature-engineering stage destroys these (often hidden) dataset properties and decreases the chance of finding valuable new features.
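A minimal sketch of this point (the coordinate columns and their scales are made up): the distance formula yields a meaningful feature on the raw coordinates, but not on their standardized versions, because standardization re-centers and re-scales each axis independently and discards the shared unit:

```python
# Minimal sketch: engineering a distance feature before vs. after scaling.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical spatial coordinates x1, x2, x3, all in meters.
X = rng.normal(loc=[10.0, -5.0, 2.0], scale=[1.0, 20.0, 5.0], size=(1000, 3))

distance = np.sqrt((X ** 2).sum(axis=1))    # (x1^2 + x2^2 + x3^2)^0.5: a real distance in meters

X_std = StandardScaler().fit_transform(X)
pseudo = np.sqrt((X_std ** 2).sum(axis=1))  # same formula, but each axis now has its own
                                            # origin and scale, so this is no longer a distance

print(distance[:3].round(2), pseudo[:3].round(2))
```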


In a regression problem, depending on the algorithm of your choice (such as multiple linear regression or symbolic regression), you don't need to scale your data. In several problems I examined, scaling actually hurt the model fit. However, for SVMs and ANNs you may need to scale your data.

There is another case where one needs to select the right scaling method: a dataset with both categorical and numerical variables.

If one uses the min/max method, the model may not be able to tell whether a value of 1 belongs to a numerical or to a categorical feature (discrete/continuous), especially if one wants to do clustering! So the right method may be standardization (I am working on such a problem now).
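A toy sketch of this ambiguity (the "is_smoker" and "income" columns are invented): after min/max scaling, both the one-hot column and the numerical column contain exactly 0 and 1, whereas standardization avoids that overlap for this data:

```python
# Minimal sketch: min/max scaling makes a one-hot column and a numerical column
# share the same [0, 1] range, so a value of 1 is ambiguous to a distance-based
# clusterer; standardization keeps the columns on different value patterns here.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

is_smoker = np.array([1, 0, 1, 0, 1], dtype=float)                         # one-hot / binary feature
income = np.array([20_000, 35_000, 50_000, 80_000, 120_000], dtype=float)  # numerical feature
X = np.column_stack([is_smoker, income])

print(MinMaxScaler().fit_transform(X))
# both columns now contain exactly 0.0 and 1.0

print(StandardScaler().fit_transform(X).round(2))
# for this toy data, neither standardized column contains the value 1.0
```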

Mehdi