
Recently, I have been studying the Multinomial Probability Distribution.

Suppose you go to a casino and there is a game that involves rolling a six-sided die. However, you are not told the probability that this die lands on any one of its sides - this raises your suspicions and leads you to believe that perhaps the die might not be fair, and therefore that the game might not be worth playing. While you are still deciding whether it's worth playing, you find out that the casino has a large screen television that displays the last $100$ numbers rolled with this die. Since you know that the counts of the faces follow a Multinomial Distribution, you can use this fact to estimate the probability of the die showing any given number, as well as the "spread" (i.e. variance) of each of these probability estimates.

  • Using Maximum Likelihood Estimation, I have been trying to derive formulae for the parameters of the Multinomial Probability Distribution. In short, given an event $i$ (e.g. the number $2$ on a die), the (very obvious) estimate for the probability $p_i$ of this event is $$\hat{p}_{i,\text{MLE}} = \frac{n_{i}}{N}$$ where $n_{i}$ is the number of times that event $i$ appears and $N$ is the total number of events that were recorded. As always, probabilities are only defined between $0$ and $1$ - therefore these individual estimates of $p_{i}$ can never be greater than $1$ or less than $0$.

  • Next, using the equivalence between the inverse Fisher Information and the asymptotic variance of the MLE, I was able to work out a formula for the "variance of these probabilities". In short, the variance of $\hat{p}_{i}$ is given by $$\text{var}(\hat{p}_{i,\text{MLE}}) = \frac{p_{i}^{2}}{n_{i}}$$

  • Finally, using the theory of Asymptotic Normality of the MLE, we can derive Confidence Intervals for these parameter estimates (i.e. for each individual value of $p_{i}$). That is, you might have estimated that the probability of rolling a $2$ on this die is $0.31$ - but there is also a $95\%$ chance that the probability of rolling a $2$ might be anywhere between $(0.28, 0.33)$. We can construct a $95\%$ Confidence Interval for any of these probabilities as $$p_{i} \pm 1.96 \cdot \sqrt{\frac{p_{i}^{2}}{n_{i}}}$$ (A short numerical sketch of these three steps is given right after this list.)
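
For concreteness, here is a minimal Python sketch of the three steps above, applied to a hypothetical set of $100$ displayed rolls. The counts are made up purely for illustration, and the variance formula used is the one derived above (which, as the comments below suggest, may not be the standard one).

```python
import numpy as np

# Hypothetical counts of faces 1..6 among the N = 100 displayed rolls
# (made-up numbers, purely for illustration).
counts = np.array([10, 31, 14, 12, 17, 16])
N = counts.sum()

p_hat = counts / N                    # MLE: n_i / N
var_hat = p_hat**2 / counts           # the variance formula from the post: p_i^2 / n_i
half_width = 1.96 * np.sqrt(var_hat)  # 95% normal-approximation half-width

for face, (p, h) in enumerate(zip(p_hat, half_width), start=1):
    print(f"face {face}: p_hat = {p:.2f}, 95% CI = ({p - h:.2f}, {p + h:.2f})")
```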

Question: I am worried that for certain values of $p_{i}$ and $n_{i}$, this expression $$p_{i} \pm 1.96 \cdot \left( \sqrt{\frac{p_{i}^{2}}{n_{i}}} \right)$$ might be greater than $1$ or less than $0$.

As an example, if $p_{i} = 0.9$ and $n_{i} = 16$, the resulting range estimate for the probability exceeds $1$: $$0.9 + 1.96 \cdot \sqrt{\frac{0.9^2}{16}} = 0.9 + 1.96 \cdot 0.225 = 1.341 > 1$$
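
A quick numerical check of this example (Python, just plugging into the formula as stated above):

```python
import math

p_i, n_i = 0.9, 16
upper = p_i + 1.96 * math.sqrt(p_i**2 / n_i)
print(upper)  # 1.341 -> the upper confidence limit exceeds 1
```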

Have I done this correctly? Is it really possible for a probability value to be outside the range $[0,1]$?

Thanks!

Note: I obviously think I have done something wrong, because even though I don't know much math, one of the few things I do know is that probabilities can never be outside the range $[0,1]$.

FD_bfa
stats_noob
  • Please, please take the time to incorporate MathJax in your posts. You've been here long enough to know this. https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference – Sean Roberson Dec 17 '22 at 07:05
  • Short answer to the title question: no. And I think you may have done your confidence interval incorrectly. – Sean Roberson Dec 17 '22 at 07:06
  • @Sean: I am learning how to do this (MathJax), I struggle a lot with this - please see the updates. – stats_noob Dec 17 '22 at 07:07
  • The definition of a probability measure implies that it is nonnegative and its maximum value is 1. – Gribouillis Dec 17 '22 at 07:18
  • Your problem is that you have approximated a binomial distribution by a normal distribution but forgot that it is just an approximation. The CDF of $B(n,p)$ is rather close to the CDF of $N(np, np(1-p))$ but is not the same, and the fact that the former is exactly $0$ "below" zero and exactly $1$ "above" $n$, but the latter is not, is the root cause of your confusion. – Dec 17 '22 at 08:01
  • Also, call me a frequentist, but "there is a $95\%$ chance that the probability .. might be ... between $(0.28,0.33)$" is not correct. It actually means "the probability might still be anywhere in $[0,1]$, but if it is outside of $(0.28,0.33)$, then what I observed was a rare event, i.e. rarer than $5\%$". – Dec 17 '22 at 08:11
  • @Gribouillis: thank you for pointing this out! – stats_noob Dec 17 '22 at 19:01
  • @Stinking Bishop: Thank you for your reply! Is there a better approximation I should be using instead? – stats_noob Dec 17 '22 at 19:02
  • @Stinking Bishop: Great comic/cartoon! :) – stats_noob Dec 17 '22 at 19:02
  • The estimated probabilities are not in fact normally distributed; this is an approximation only. Because of the nature of a normal distribution, it always assigns nonzero probability to the complement of $[0,1]$. The fact that it's telling you that negative probabilities are in your 95% confidence interval should tell you something about how accurate the approximation is in that example. – Dark Malthorp Dec 29 '22 at 05:14

2 Answers


A probability space is a triple $(\Omega, \mathscr{A}, \mathbb{P})$, where $\Omega$ is a state space with $\omega \in \Omega$; $\mathscr{A}$ is a collection of events of interest, called a $\sigma$-algebra, with $A \in \mathscr{A}$; and $\mathbb{P}$ is a probability measure such that, for any pair of events $A, B \in \mathscr{A}$:

  • $\mathbb{P}(\Omega) = 1$
  • $\mathbb{P}(\emptyset) = 0$
  • $\mathbb{P}(A^C) = 1- \mathbb{P}(A)$
  • $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A\cap B)$

Importantly, the probability measure assigns, to any event $A$, a number between zero and one, that is, $\mathbb{P}(A)\in [0,1], \quad \forall \ A \in \mathscr{A}$.
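
As a concrete (and deliberately simple) illustration, here is a small Python sketch of a probability measure on the faces of a fair six-sided die; the fairness assumption is only for illustration, and the checks mirror the properties listed above.

```python
from itertools import chain, combinations

# Probability measure for a fair six-sided die: P(A) = |A| / 6 for any event A.
omega = frozenset(range(1, 7))

def P(A):
    return len(A) / len(omega)

# Enumerate every event (every subset of omega) and check the listed properties.
events = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

assert P(frozenset()) == 0 and P(omega) == 1
assert all(0 <= P(A) <= 1 for A in events)                          # P(A) is always in [0, 1]
assert all(abs(P(omega - A) - (1 - P(A))) < 1e-12 for A in events)  # complement rule

A, B = frozenset({1, 2}), frozenset({2, 3})
assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12             # inclusion-exclusion
```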


Direct Answer

There are two notions to consider here: measures and probability measures. A probability measure is a specific type of measure. I will define both below and then give some commentary in relation to your question.

A measure is a function $\mu: \mathscr{F} \rightarrow [0, \infty]$ on a measurable space $(\Omega, \mathscr{F})$ which satisfies:

  1. $\mu (\emptyset) = 0$

  2. $\mu (\cup _{n=1}^{\infty} A_n) = \sum _{n=1}^{\infty} \mu (A_n)$ for every sequence $(A_n)_{n \in \mathbb{N}} \subseteq \mathscr{F}$ of pairwise disjoint sets

Measures take sets in a $\sigma$-algebra and assign a value to them, giving each set some notion of "size" or "measure". These values can be anything between $0$ and $\infty$; they are not restricted to lie between $0$ and $1$.

One example of a measure is the counting measure, which assigns to each set its cardinality. For example, the measure of the set $\{A,B,C\}$ would be $3$, since it has $3$ elements.

A probability measure is a function $\mathbb{P}$ on the measurable space $(\Omega , \mathscr{F})$ which satisfies the above properties of a measure and also satisfies the additional requirement that $\mathbb{P}(\Omega) = 1$.

Therefore, the values assigned by a probability measure always lie between $0$ and $1$: for any event $A \in \mathscr{F}$, additivity gives $\mathbb{P}(A) \le \mathbb{P}(A) + \mathbb{P}(A^{C}) = \mathbb{P}(\Omega) = 1$, since the measure of the entire sample space is $1$ (by definition).

Therefore, whilst it is true that a measure can take values outside of the interval $[0,1]$, this is not the case when we are dealing specifically with probability measures.
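
As a minimal (purely illustrative) sketch contrasting the two notions on a small finite set:

```python
# Counting measure vs. a probability measure on the same finite space.
omega = {"A", "B", "C"}

def counting_measure(S):
    return len(S)               # values in [0, infinity]; not restricted to [0, 1]

def prob_measure(S):
    return len(S) / len(omega)  # normalised so that P(omega) = 1

print(counting_measure({"A", "B", "C"}))  # 3   -> a measure can exceed 1
print(prob_measure({"A", "B", "C"}))      # 1.0 -> a probability measure never does
```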

Clarifying the confusion

Now that this is clarified, we can look specifically at the source of the confusion in your post.

Here you have made an approximation (the normal approximation to the distribution of the MLE), and therefore you cannot expect it to necessarily produce values in the $[0,1]$ range. The fact that this approximation falls outside this range is, however, an indication that it is not a very good approximation in this case.

Unlike a probability measure, as defined above, this approximation is not subject to the same constraints, and therefore there is no reason to expect that the values it suggests will lie in the desired interval.
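
As a rough illustration of this point (assuming SciPy is available, and using made-up numbers), one can check how much probability the normal approximation $N(np, np(1-p))$ to a $B(n,p)$ count places on values that the count can never take:

```python
from scipy.stats import norm

# Normal approximation to a binomial count X ~ B(n, p): N(n*p, n*p*(1-p)).
# Illustrative numbers only.
n, p = 16, 0.9
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

# Probability the normal approximation assigns to impossible counts (X > n):
print(1 - norm.cdf(n, loc=mu, scale=sigma))  # ~0.09, yet B(16, 0.9) never exceeds 16
```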

FD_bfa