
I have a set of 1000 samples. Each sample represents the mean response time of X transactions.

Now I have a running transaction. I know its current response time, but I want to know whether this particular transaction's elapsed time is normal relative to all my previously collected samples.

Is there a statistical method that is a good fit for this case?

deb

2 Answers


You can't, from the information listed.

You say in the comments that the distribution of transaction times is Gaussian. A Gaussian distribution has two parameters: $\mu$ (its mean) and $\sigma^2$ (its variance). Based on the 1000 observed values, you can make a good estimate of the underlying value of $\mu$. However, there is no way to infer $\sigma^2$, the variance of an individual transaction's time: each of your values is the mean of $X$ transactions, and averaging shrinks the spread (the mean of $X$ transactions has variance $\sigma^2/X$), so without knowing $X$, the spread of your samples tells you nothing about the spread of single transactions. Consequently, when you observe a new transaction time, you can't tell whether it is normal or abnormal.

Suppose $\mu = 100$ and you observe a transaction whose response time is $105$. Is that abnormal? There's no way to know. If the variance of the Gaussian distribution is $0.1$, it is highly abnormal. If the variance is $10$, it is entirely typical/normal.
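To make this concrete, here is a small sketch (using the hypothetical numbers from the example above) showing how the same observation yields wildly different z-scores depending on which variance you assume:

```python
import math

mu = 100.0  # assumed population mean (hypothetical, from the example)
x = 105.0   # newly observed transaction response time

for variance in (0.1, 10.0):
    sigma = math.sqrt(variance)
    z = (x - mu) / sigma  # number of standard deviations away from the mean
    print(f"variance={variance}: z-score = {z:.2f}")

# variance=0.1:  z-score = 15.81  -> extremely abnormal
# variance=10.0: z-score = 1.58   -> entirely typical
```

The observation is identical in both cases; only the assumed variance changes the verdict, which is exactly why the variance cannot be left unknown.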

D.W.

Generally, data gathered from natural phenomena follows a normal distribution, so I think your data should follow one as well. The more data you have, the better your estimates and predictions can be. Wikipedia says:

In statistics, normality tests are used to determine if a data set is well-modeled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed.

There are many methods for performing such a test.

A simple back-of-the-envelope test takes the sample maximum and minimum and computes their z-scores, or more properly t-statistics (the number of sample standard deviations by which a sample lies above or below the sample mean), and compares them to the 68–95–99.7 rule: if you see a 3σ event (properly, a 3s event) with substantially fewer than 300 samples, or a 4s event with substantially fewer than 15,000 samples, then a normal distribution will understate the maximum magnitude of deviations in the sample data.
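The back-of-the-envelope check above could be sketched as follows (function names are illustrative, and the sample-count thresholds are the rough ones from the rule of thumb, not exact quantiles):

```python
import statistics

def extreme_t(samples):
    """t-statistic of the most extreme observation: how many sample
    standard deviations the farthest point lies from the sample mean."""
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation
    return max(abs(x - mean) for x in samples) / s

def looks_non_normal(samples):
    """Rough 68-95-99.7 heuristic: a 3s event should occur about once in
    ~300 normal samples, a 4s event about once in ~15,000. Seeing one in a
    much smaller sample suggests heavier-than-normal tails."""
    t = extreme_t(samples)
    n = len(samples)
    if t >= 4 and n < 15_000:
        return True
    if t >= 3 and n < 300:
        return True
    return False
```

For example, ten identical values plus one huge outlier trip the heuristic, while a small spread of nearby values does not. This is only a crude screen; a proper normality test (e.g. Shapiro–Wilk) is the rigorous alternative.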

But if your sample data is fixed (I mean, does not change over time), then you can precompute the necessary statistics and later compare and draw a conclusion in $O(1)$ time. I would still update the sample data once a day or hour, check it visually (for example, to detect outliers), and recompute the statistics on a regular basis.
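A minimal sketch of that precompute-then-check idea (the class name and 3-sigma threshold are my choices, not from the answer; note also the caveat from the other answer that the spread of sample means understates the spread of individual transactions):

```python
import statistics

class ResponseTimeMonitor:
    """Precompute mean and standard deviation from a fixed set of
    historical samples once, then flag new observations in O(1)."""

    def __init__(self, samples, threshold=3.0):
        self.mean = statistics.mean(samples)
        self.stdev = statistics.stdev(samples)  # sample standard deviation
        self.threshold = threshold  # z-score cutoff for "abnormal"

    def is_abnormal(self, response_time):
        z = abs(response_time - self.mean) / self.stdev
        return z > self.threshold

# Usage: build once from the 1000 historical samples, query per transaction.
monitor = ResponseTimeMonitor([100, 102, 98, 101, 99])
print(monitor.is_abnormal(105))  # far outside the historical spread
print(monitor.is_abnormal(101))  # well within it
```

Rebuilding the monitor on a schedule (hourly or daily, as suggested above) keeps the precomputed statistics in step with drifting data.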

fade2black