9

Let's say I have two normal distributions with means $\mu_1$, $\mu_2$ and standard deviations $\sigma_1$, $\sigma_2$ (which I know). I am handed a random variate from one of the distributions (I don't know which). What is the likelihood that my variate belongs to distribution 1 and not distribution 2?

UPDATE: a concrete example. Machine one generates normally-distributed variates with mean 1053 and standard deviation 59. Machine two generates normally-distributed variates with mean 1187 and standard deviation 73. One of them is picked at random, the handle is turned (unseen by me) and the number 1162.4 comes out. What is the likelihood that number was generated by machine 1 as opposed to machine 2?

David G
  • 365
  • Do you have a prior or something on the likelihood of the observations? – Batman Jun 08 '14 at 04:21
  • You can probably use a Bayesian approach assuming that there is equal probability for each set prior to the observation. – Avraham Jun 08 '14 at 05:01

1 Answer

9

You can use a Bayesian approach and compute the posterior odds. Let $\Theta_1 = (\mu_1, \sigma_1) = (1053, 59)$ and similarly $\Theta_2 = (\mu_2, \sigma_2) = (1187, 73)$. Before seeing the data, we assume that the variate is equally likely to come from either distribution, so the prior odds are $$ \frac{P(\Theta_1)}{P(\Theta_2)} = 1 $$

What we can calculate is the posterior odds $$ \frac{P(\Theta_1|D)}{P(\Theta_2|D)} $$ where $D$ is the new data point. By Bayes' theorem:

$$ \begin{align} P(\Theta_1|D) &= \frac{P(D|\Theta_1)\;P(\Theta_1)}{P(D)}\\ P(\Theta_2|D) &= \frac{P(D|\Theta_2)\;P(\Theta_2)}{P(D)}\\ \frac{P(\Theta_1|D)}{P(\Theta_2|D)} &= \frac{P(D|\Theta_1)\;P(\Theta_1)}{P(D)}\cdot\frac{P(D)}{P(D|\Theta_2)\;P(\Theta_2)}\\ &= \frac{P(D|\Theta_1)\;P(\Theta_1)}{P(D|\Theta_2)\;P(\Theta_2)} \end{align} $$

Plugging in the numbers, where $P(D|\Theta_i)$ is the normal density of machine $i$ evaluated at $D = 1162.4$, we get:

$$ \begin{align} P(D|\Theta_1) &= \frac{1}{59\sqrt{2\pi}}e^{-\frac{\left(1162.4 - 1053\right)^2}{2\cdot 59^2}} \approx 0.00121189\\ P(D|\Theta_2) &= \frac{1}{73\sqrt{2\pi}}e^{-\frac{\left(1162.4 - 1187\right)^2}{2\cdot 73^2}} \approx 0.005163308\\ P(\Theta_1) &= P(\Theta_2) = \frac{1}{2}\;\textrm{so they cancel}\\ \frac{P(\Theta_1|D)}{P(\Theta_2|D)} &\approx \frac{0.00121189}{0.005163308} \approx 0.2347 \end{align} $$

So the odds are no longer $1:1$ but about $1:4.26$, meaning it is a bit more than 4 times as likely that the second machine was used. Equivalently, the probability that the number came from machine 1 is about $1/(1 + 4.26) \approx 0.19$.
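To check the arithmetic, here is a minimal Python sketch of the same calculation using only the standard library. The helper name `normal_pdf` and the variable names are my own choices for illustration. The equal priors are the assumption stated in the answer, not something the data tells us.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

x = 1162.4                                  # observed value from the question

like_1 = normal_pdf(x, 1053.0, 59.0)        # P(D | Theta_1)
like_2 = normal_pdf(x, 1187.0, 73.0)        # P(D | Theta_2)

# Assumption: each machine is equally likely to have been picked.
prior_1 = prior_2 = 0.5

# Posterior odds of machine 1 versus machine 2.
posterior_odds = (like_1 * prior_1) / (like_2 * prior_2)

# Posterior probability of machine 1 (normalise over both machines).
posterior_1 = like_1 * prior_1 / (like_1 * prior_1 + like_2 * prior_2)

print(f"P(D|Theta_1) = {like_1:.8f}")
print(f"P(D|Theta_2) = {like_2:.8f}")
print(f"odds 1 : {1.0 / posterior_odds:.2f}")
print(f"P(machine 1 | D) = {posterior_1:.3f}")
```

If the machines were not picked with equal probability, changing `prior_1` and `prior_2` is the only adjustment needed; the priors then no longer cancel in the ratio.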

Avraham
  • 3,335
  • 2
    So in other words, it's simply the ratio of the PDFs of the distributions at the point in question. The Bayesian proof is a nice confirmation. – David G Jun 08 '14 at 06:15
  • Thank you for the answer. I wonder what would happen if there were more than 2 machines? How would you then find the most likely one? – Dmitry Kamenetsky Nov 29 '18 at 23:41
  • 1
    @DmitryKamenetsky , in that case, pick the distribution with the highest PDF value at the point in question. – Sam Mar 25 '20 at 02:48
  • Hi, I can't understand where the π and e come from, could you explain? – Jonas Palačionis Jun 27 '22 at 07:13
  • 1
    They come from the probability density function of the normal distribution, which is what each likelihood is evaluated with: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ – Avraham Jun 27 '22 at 08:24
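For the case of more than two machines raised in the comments, the same idea can be sketched as follows: evaluate each machine's density at the observed point, weight it by its prior, and normalise. This is only an illustration. The function names are hypothetical, and the third machine (mean 1300, standard deviation 80) is invented for the example and does not appear in the question.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_probabilities(x, machines, priors=None):
    """Posterior probability of each (mu, sigma) machine given the value x.

    If priors is None, every machine is assumed equally likely a priori.
    """
    if priors is None:
        priors = [1.0 / len(machines)] * len(machines)
    weighted = [normal_pdf(x, mu, sigma) * p
                for (mu, sigma), p in zip(machines, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

machines = [
    (1053.0, 59.0),   # machine 1, from the question
    (1187.0, 73.0),   # machine 2, from the question
    (1300.0, 80.0),   # machine 3: HYPOTHETICAL, made up for this example
]

posteriors = posterior_probabilities(1162.4, machines)

# With equal priors, the most likely machine is the one with the highest density.
best = max(range(len(machines)), key=lambda i: posteriors[i])
print("posteriors:", [round(p, 3) for p in posteriors])
print("most likely machine:", best + 1)
```

With equal priors this agrees with the rule of picking the highest PDF value; with unequal priors, the prior-weighted density is what should be compared.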