4

I'm writing a data science report, and I want to find an existing distribution that fits my sample. I got a good-looking fit for the CDF and PDF, but when I use a KS test to check the model, I get a low p-value, 1.2e-4, so apparently I should reject the model.

I mean, whatever distribution/model you use to fit the sample, you cannot expect a perfect result, especially when working with a huge amount of data. So what is the KS test doing in a data science report? Does it mean the model is correct only if we get a high p-value in the KS test?
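For concreteness, here is a minimal sketch of the kind of workflow I mean (synthetic data and an assumed numpy/scipy setup, not my actual code): a t-distributed sample looks very close to a normal distribution by eye, yet with a large n the KS test against the fitted normal still returns a very small p-value.

```python
# Illustrative sketch only: fit a distribution to a large sample and run a KS test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.standard_t(df=10, size=100_000)   # stand-in for my real sample

mu, sigma = stats.norm.fit(sample)             # fit a candidate distribution
stat, p_value = stats.kstest(sample, "norm", args=(mu, sigma))
print(f"KS statistic = {stat:.4f}, p-value = {p_value:.2e}")
```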

Carl

3 Answers

2

I am surprised that both answers, this and this, contain errors concerning the interpretation and meaning of p-values in a frequentist hypothesis testing framework.

Firstly, p-values are conditional probabilities. Thus, any interpretation lacking a condition is inherently flawed. Here are common mistakes, taken directly from Haller & Krauss (2002):

  • "The improbability of observed results being due to error.
  • "The probability that an observed difference is real."
  • "If the probability is low, the null hypothesis is improbable."
  • "The statistical confidence ... with odds of 95 out of 100 that the observed difference will hold up in investigations."
  • "The degree to which experimental results are taken 'seriously'."
  • "The danger of accepting a statistical result as real when it is actually due only to error."
  • "The degree of faith that can be placed in the reality of the finding”
  • "The investigator can have 95 percent confidence that the sample mean actually differs from the population mean."

Although these mistakes are understandable, they reflect questions that frequentist methods don’t directly answer, such as:

  • What is the probability that the null hypothesis is true?
  • What is the probability that the result is....?
  • The p-value proves that .....

The frequentist framework does not confirm any of these statements.

As already mentioned, p-values are conditional probabilities. So any statement / definition / interpretation that does not contain wording such as:

  • ".....given that ..", or
  • "....provided that......", or
  • "....assuming that......",

or something like that, is wrong. Even if someone does have the correct understanding, it is very easy to get tripped up by the wording. For that reason I think it is far better to work with some simple algebra instead. So, let's do a little bit of maths.

The general setup is that we have some observed data, let's call it $\mathcal{D}$. In reality, it is a test statistic computed from the data, rather than the data themselves, that we work with. Then we need a null hypothesis, let's call that $\mathcal{H_0}$, which could be something like: the difference between the means of two populations is zero. Then, the p-value is:

$$\mathcal{P}(\mathcal{D} \mid \mathcal{H_0})$$

which represents the probability of obtaining data at least as extreme as what we observed, assuming the null hypothesis $\mathcal{H_0}$ is true. Let's just dwell on that for a few moments. This means that we assume that the null hypothesis is true, and under that condition the p-value is the probability of observing data at least as extreme as that which we did observe. And that is pretty much it. We can, of course, come up with wording that might be useful for a non-technical audience, something along the lines of the comment on the OP by @Dave:

"The p-value, loosely speaking, is the probability of getting the observations you got if the null hypothesis is true. Thus, the low p-value is evidence against the null hypothesis."

However, I would advise caution, because it is very easy to get tripped up by wording like this, which can be misleading.
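To make the conditional nature concrete, here is a small simulation sketch (my own illustration, assuming a Python/numpy/scipy setup; the null model and the test statistic are arbitrary choices). It estimates a p-value by repeatedly generating data under $\mathcal{H_0}$ and recording how often the statistic is at least as extreme as the one actually observed.

```python
# Monte Carlo illustration of P(D | H0): generate data assuming H0 is true and ask
# how often the test statistic is at least as extreme as the observed one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# "Observed" data (in practice, whatever was actually measured).
observed = rng.normal(loc=0.2, scale=1.0, size=200)

# H0: the data come from a standard normal distribution.
# Test statistic: the one-sample KS distance to the N(0, 1) CDF.
def ks_stat(x):
    return stats.kstest(x, "norm", args=(0.0, 1.0)).statistic

t_obs = ks_stat(observed)

# Simulate many datasets under H0 and compare their statistics to the observed one.
n_sim = 5_000
t_null = np.array([ks_stat(rng.normal(0.0, 1.0, size=observed.size))
                   for _ in range(n_sim)])
p_value = np.mean(t_null >= t_obs)
print(f"observed statistic = {t_obs:.3f}, Monte Carlo p-value = {p_value:.4f}")
```

Note that nothing in this calculation says how probable $\mathcal{H_0}$ itself is; every quantity is computed under the assumption that $\mathcal{H_0}$ is true.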

For further reading, I highly recommend several threads over at CrossValidated:

ASA discusses limitations of p-values - what are the alternatives?

Interpretation of p-value in hypothesis testing

Understand a statement about P value

How much do we know about p-hacking "in the wild"?

Is it wrong to refer to results as being "highly significant"?

When to use Fisher versus Neyman-Pearson framework?

Are smaller p-values more convincing?

Frequentist properties of p-values in relation to type I error

Is the exact value of a 'p-value' meaningless?

Who first used/invented p-values?

What about the "p-value" of the non-null hypothesis?

Is p-value essentially useless and dangerous to use?

References:

Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers. Methods of psychological research, 7(1), 1-20.

Robert Long
0

In your case, the null hypothesis $H_0$ is that your sample follows the distribution that your model has learned. The alternative hypothesis $H_1$ is that it follows some other distribution. Assuming you have fixed your significance level $\alpha$ at $0.05$ (the most common choice for $\alpha$, but up to you if you want to go lower), getting a p-value lower than that means you should reject the null hypothesis.

The p-value can be interpreted as the probability of a type I error, in other words a false positive: the probability that you reject the null hypothesis when it is in fact true. In your case, rejecting the hypothesis means stating that there is statistically significant evidence that the distribution your model has learned is not the underlying distribution of the sample. So yes, you would like as large a p-value as possible.

In this case you are using a Kolmogorov-Smirnov test to compare your sample to a reference distribution, so it's a one-sample KS test. The way I would put it is that getting a high p-value means that "it is highly unlikely that your model has learned a wrong distribution". In other words, it is highly likely it has learned a pretty good approximation of the underlying distribution. However, nothing is certain when doing statistical hypothesis testing!
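For illustration, here is a minimal sketch of how such a one-sample KS test and decision might look (hypothetical distributions, assuming scipy; adapt it to whatever distribution your model has learned):

```python
# Hypothetical one-sample KS test of a sample against a fitted reference distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=3.0, size=1_000)   # stand-in for your data

# Suppose the model learned a gamma distribution; test the sample against that fit.
a, loc, scale = stats.gamma.fit(sample)
stat, p_value = stats.kstest(sample, "gamma", args=(a, loc, scale))

alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.3g} < {alpha}: reject H0, the fit looks inadequate")
else:
    print(f"p = {p_value:.3g} >= {alpha}: fail to reject H0")
```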

I'm not sure what you're showing on your plots, though, since there doesn't seem to be an empirical cumulative distribution function on them (the lines look smooth).
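If it helps, here is a rough sketch of how one could overlay the empirical CDF on a fitted CDF (illustrative only, assuming numpy/scipy/matplotlib; the fitted normal is just a placeholder for your model):

```python
# Sketch: empirical CDF (step function) overlaid on a fitted CDF (smooth curve).
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
sample = rng.normal(5.0, 2.0, size=500)      # placeholder for the real sample

mu, sigma = stats.norm.fit(sample)           # placeholder fitted model

x = np.sort(sample)
ecdf = np.arange(1, x.size + 1) / x.size     # empirical CDF at the sorted points

plt.step(x, ecdf, where="post", label="empirical CDF")
plt.plot(x, stats.norm.cdf(x, mu, sigma), label="fitted CDF")
plt.xlabel("x")
plt.ylabel("F(x)")
plt.legend()
plt.show()
```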

David Cian
-1

The p-value is interpreted as the probability of a type I error, in other words a false positive: the probability that you reject the null hypothesis when it is in fact true.

You are invoking a Kolmogorov-Smirnov test.

" when I use KS-test to test the model, I got a low p-value,1.2e-4, definitely I should reject the model." Answer - Your low p-value does not indicate that the proposed model should be rejected. p value simply indicates the chance for comiting type - 1 error which is quite low in your case. The low value of p i.e. alpha implies that your model predicts very well. In nutshell, the test confirms correctness of your model.

Does it mean the model is correct only if we get a high p-value in the KS test? - "No".

Subhash C. Davar