I have a dataset of cars with many features, including 'acceleration', 'horsepower', and 'mpg'.

I am supposed to check which of these features is most similar to a normal distribution, so I made a histogram of each feature. Acceleration was definitely the most visually similar.

[Image: histograms of the features]

But I am also supposed to support my answer with a quantitative measure.

First I tried using skewness, but can it indicate which feature is "most normal" if all these features measure different things?

I also considered the Shapiro-Wilk test, where acceleration's p-value came closest to 0.05. But is this really an indication that it is the most similar to a normal distribution?

The following are the measures I got for each feature:

acceleration: Skewness = 0.2788, Shapiro-Wilk Test - Statistic = 0.9924, p-value = 0.0399

horsepower: Skewness = 1.1062, Shapiro-Wilk Test - Statistic = 0.9024, p-value = 0.0000

mpg: Skewness = 0.4571, Shapiro-Wilk Test - Statistic = 0.9680, p-value = 0.0000
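
For reference, measures like the ones above can be computed roughly as follows. This is only a minimal sketch: it assumes NumPy and SciPy are available, the helper name `normality_summary` is made up for illustration, and the data are synthetic stand-ins rather than the actual car dataset.

```python
import numpy as np
from scipy import stats


def normality_summary(values):
    """Return skewness and Shapiro-Wilk results for one numeric feature.

    Hypothetical helper, not part of any library.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing values before testing
    w_stat, p_value = stats.shapiro(x)
    return {
        "skewness": float(stats.skew(x)),
        "shapiro_w": float(w_stat),
        "shapiro_p": float(p_value),
    }


# Synthetic stand-in data (assumption: NOT the real car features).
rng = np.random.default_rng(0)
features = {
    "roughly_normal": rng.normal(loc=15.0, scale=2.5, size=400),
    "right_skewed": rng.lognormal(mean=4.5, sigma=0.4, size=400),
}

summaries = {name: normality_summary(vals) for name, vals in features.items()}

for name, s in summaries.items():
    print(
        f"{name}: Skewness = {s['skewness']:.4f}, "
        f"Shapiro-Wilk Test - Statistic = {s['shapiro_w']:.4f}, "
        f"p-value = {s['shapiro_p']:.4f}"
    )
```

Both skewness and the Shapiro-Wilk statistic are dimensionless, so the numbers do not depend on the units each feature is measured in.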

Subhash C. Davar

2 Answers

I can recommend trying the Epps-Pulley test. It compares the empirical characteristic function of the sample with the theoretical characteristic function of the normal distribution.

Gromov

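See "Choosing a normality test" in the GraphPad Prism 10 Statistics Guide. The Shapiro-Wilk test checks whether the data are compatible with a normal distribution; it does not measure how close to normal they are. Hence, its p-value is not really an indication of which feature is most similar to a normal distribution.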
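
If a descriptive number is wanted instead of a p-value, one option (my suggestion, not something taken from the guide above) is the correlation coefficient of the normal Q-Q plot: the closer it is to 1, the closer the sample quantiles follow a straight line against normal quantiles. A rough sketch, assuming SciPy is available; `qq_correlation` is a hypothetical helper name and the data are synthetic, not the car dataset:

```python
import numpy as np
from scipy import stats


def qq_correlation(values):
    """Correlation between sample quantiles and normal quantiles.

    Hypothetical helper built on scipy.stats.probplot.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing values
    # probplot returns (theoretical, ordered) quantiles and (slope, intercept, r)
    (_, _), (_, _, r) = stats.probplot(x, dist="norm")
    return float(r)


# Synthetic stand-in data (assumption: NOT the real car features).
rng = np.random.default_rng(1)
normal_like = rng.normal(loc=15.0, scale=2.5, size=400)
skewed = rng.lognormal(mean=4.5, sigma=0.4, size=400)

r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(skewed)

# Changing units (a positive linear rescaling) leaves the measure unchanged,
# so features measured on different scales can be compared directly.
r_rescaled = qq_correlation(normal_like * 1000.0 + 5.0)

print(f"normal-like: r = {r_normal:.4f}")
print(f"skewed:      r = {r_skewed:.4f}")
print(f"rescaled:    r = {r_rescaled:.4f}")
```

Like skewness, this is a description of shape rather than a test, so it can be used to rank features without relying on a significance threshold.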

Subhash C. Davar