7

I am working my way through data normalization (data transformation) and am curious about four methods:

  1. min-max normalization, 2. z-score normalization, 3. z-score normalization using the mean absolute deviation, and 4. decimal scaling.

I am learning this from a book, so it is difficult to follow, but it seems to me that the first three normalization methods output values in the range 0 to 1, and the last one outputs values in the range -1 to 1.

Am I understanding this correctly, or are the ranges different?

Reference: Data Mining: Concepts and Techniques. The book says:

To help avoid dependence on the choice of measurement units, the data should be normalized. This involves transforming the data to fall within a smaller or common range such as [-1,1] or [0.0-1.0].

As you can see, it says "common range", so I am not sure whether that means what I mentioned above for the different methods, or whether the range can actually be anything.

Kairan
  • 173
  • Maybe you can add a reference to which book you mean. Also, the question is hard to follow as currently written, maybe you can add details about what normalization you mean. Regards – Amzoti Apr 16 '13 at 00:49
  • @Amzoti I have done so. – Kairan Apr 16 '13 at 00:55

2 Answers

7

Min-max scaling means that one linearly transforms real data values such that the minimum and the maximum of the transformed data take certain values, frequently 0 and 1 or -1 and 1, depending on the context. For example, the formula

$ x^\prime := (x-x_{\min})/(x_{\max} -x_{\min} ) $

does the job for the values 0 and 1. Here $x_{\min}$ is the smallest data value that appears and $x_{\max}$ is the largest.
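As a rough illustration of the formula above, here is a minimal Python sketch. The function name `min_max_normalize`, the parameters `new_min` and `new_max`, and the sample data are made up for this example; the extra rescaling step to an arbitrary target interval is the usual generalization, not something stated in the formula itself.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values so that min(values) -> new_min and max(values) -> new_max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # The formula divides by (x_max - x_min), so constant data cannot be scaled.
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [
        new_min + (x - x_min) / (x_max - x_min) * (new_max - new_min)
        for x in values
    ]


# Hypothetical sample data, chosen only for illustration.
data = [200.0, 300.0, 400.0, 600.0, 1000.0]

unit_scaled = min_max_normalize(data)                # target interval [0, 1]
symmetric_scaled = min_max_normalize(data, -1.0, 1.0)  # target interval [-1, 1]

print(unit_scaled)       # [0.0, 0.125, 0.25, 0.5, 1.0]
print(symmetric_scaled)  # [-1.0, -0.75, -0.5, 0.0, 1.0]
```

The smallest and largest inputs always land exactly on the ends of the chosen interval, which is why the output range of this method is whatever target interval one picks.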

The z-score linearly transforms the data in such a way that the mean of the transformed data equals 0 and their standard deviation equals 1. The transformed values themselves are not confined to a particular interval such as [0,1]. The transformation formula is:

$ x^\prime := (x-\overline{x})/s $

where $\overline{x}$ denotes the mean value of the data and $s$ its standard deviation.
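A small Python sketch may make the point about the range concrete. It assumes the population standard deviation (`statistics.pstdev`); the answer does not say whether the population or the sample version is meant, and the function name `z_score` and the sample data are invented for the example.

```python
import statistics


def z_score(values):
    """Return (x - mean) / s for each x, with s the population standard deviation."""
    mean = statistics.fmean(values)
    s = statistics.pstdev(values)  # assumption: population (not sample) standard deviation
    if s == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(x - mean) / s for x in values]


# Hypothetical sample data, chosen only for illustration.
data = [200.0, 300.0, 400.0, 600.0, 1000.0]
z = z_score(data)

print(z)
print(statistics.fmean(z))   # approximately 0
print(statistics.pstdev(z))  # approximately 1
```

For this data the largest z-score is above 1 and the smallest is below 0, so the result does not fit into [0,1].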

Hagen Knaf
  • 9,387
1

I am working on this problem as well for my data mining class.

The range for min-max is [new min, new max], commonly [0.0, 1.0] or [-1.0, 1.0].

The range for the z-score using the standard deviation is (-infinity, infinity), although extreme values are very unlikely.

The range for the z-score using the mean absolute deviation should be the same as for the other z-score.

The range for decimal scaling is (-1, 1): the values are divided by a power of 10 chosen so that every absolute value is below 1.
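To check the last two ranges numerically, here is a hedged Python sketch of both methods. The function names `z_score_mad` and `decimal_scale` and the sample data are hypothetical. The sketch assumes the mean absolute deviation is the average of |x - mean|, and that decimal scaling divides by 10^j with j the smallest non-negative integer that makes every absolute value less than 1.

```python
import statistics


def z_score_mad(values):
    """z-score variant that divides by the mean absolute deviation instead of the std dev."""
    mean = statistics.fmean(values)
    mad = statistics.fmean([abs(x - mean) for x in values])
    if mad == 0:
        raise ValueError("mean absolute deviation is zero; scores are undefined")
    return [(x - mean) / mad for x in values]


def decimal_scale(values):
    """Divide by 10**j, with j the smallest non-negative integer giving max |x| / 10**j < 1."""
    max_abs = max(abs(x) for x in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [x / 10 ** j for x in values], j


# Hypothetical sample data, chosen only for illustration.
z_mad = z_score_mad([200.0, 300.0, 400.0, 600.0, 1000.0])
scaled, j = decimal_scale([-986, 917])

print(z_mad)      # values fall both below 0 and above 1
print(scaled, j)  # every value lies strictly between -1 and 1
```

With this data the mean-absolute-deviation scores again leave [0, 1], while decimal scaling uses j = 3 and keeps everything strictly inside (-1, 1).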

This is what I answered, and I think I got it right, but I would have trouble proving it for the z-score.

sami
  • 11