
Suppose there is a hat with $m$ red balls and $n$ blue balls. Each turn, we randomly pick a ball (each ball has equal probability of being selected) and then put it back, as well as add another ball of the same color.

I am trying to learn how to derive the probability distribution for the contents of the hat after the $t^{th}$ turn. (This is the Pólya urn model: https://en.wikipedia.org/wiki/P%C3%B3lya_urn_model)

First, I tried to manually expand possibilities at different turns. I am using the notation $(m,n)$ to denote the number of balls, i.e. $(m,n)$ means that there are $m$ red balls and $n$ blue balls. I treated this like a "probability tree", i.e. multiply each event by the probabilities and number of ways to reach that event.

Turn 1:

  1. $$(m+1, n) \text{ with probability } \frac{m}{m+n}$$
  2. $$(m, n+1) \text{ with probability } \frac{n}{m+n}$$

Turn 2:

  1. $$(m+2, n) \text{ with probability } \frac{m}{m+n} \cdot \frac{m+1}{m+n+1}$$
  2. $$(m+1, n+1) \text{ with probability } \frac{m}{m+n} \cdot \frac{n}{m+n+1} + \frac{n}{m+n} \cdot \frac{m}{m+n+1}$$
  3. $$(m, n+2) \text{ with probability } \frac{n}{m+n} \cdot \frac{n+1}{m+n+1}$$

Turn 3:

  1. $$(m+3, n) \text{ with probability } \frac{m}{m+n} \cdot \frac{m+1}{m+n+1} \cdot \frac{m+2}{m+n+2}$$

  2. $$(m+2, n+1) \text{ with probability } \frac{m}{m+n} \cdot \frac{m+1}{m+n+1} \cdot \frac{n}{m+n+2} + \frac{m}{m+n} \cdot \frac{n}{m+n+1} \cdot \frac{m+1}{m+n+2} + \frac{n}{m+n} \cdot \frac{m}{m+n+1} \cdot \frac{m+1}{m+n+2}$$

  3. $$(m+1, n+2) \text{ with probability } \frac{m}{m+n} \cdot \frac{n}{m+n+1} \cdot \frac{n+1}{m+n+2} + \frac{n}{m+n} \cdot \frac{m}{m+n+1} \cdot \frac{n+1}{m+n+2} + \frac{n}{m+n} \cdot \frac{n+1}{m+n+1} \cdot \frac{m}{m+n+2}$$

  4. $$(m, n+3) \text{ with probability } \frac{n}{m+n} \cdot \frac{n+1}{m+n+1} \cdot \frac{n+2}{m+n+2}$$
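The tree expansion above can also be carried out mechanically. The following is a minimal Python sketch that tracks every reachable state with exact fractions, assuming $m, n \ge 1$. The function name `urn_distribution` and the example values $m=2$, $n=3$ are illustrative choices, not part of the original working.

```python
from collections import defaultdict
from fractions import Fraction


def urn_distribution(m, n, t):
    """Exact distribution of (red, blue) counts after t turns.

    Expands the probability tree one turn at a time and merges
    branches that end in the same state.
    """
    dist = {(m, n): Fraction(1)}
    for _ in range(t):
        nxt = defaultdict(Fraction)  # missing states start at probability 0
        for (red, blue), p in dist.items():
            total = red + blue
            nxt[(red + 1, blue)] += p * Fraction(red, total)   # drew red
            nxt[(red, blue + 1)] += p * Fraction(blue, total)  # drew blue
        dist = dict(nxt)
    return dist


# Hypothetical example: start with 2 red and 3 blue balls, play 3 turns.
for state, p in sorted(urn_distribution(2, 3, 3).items()):
    print(state, p)
```

Comparing the printed values with the hand-expanded expressions for Turn 3 is one way to catch a slip in the manual work.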

Based on this, I tried to recognize a general formula for $t$ number of turns using the binomial distribution. I defined

  • $m$: initial number of red balls
  • $n$: initial number of blue balls
  • $t$: number of turns
  • $k$: number of red balls drawn from the first turn to the $t$-th turn, i.e. $0 \le k \le t$
  1. First, I tried to consider the probability of a specific sequence of draws. For example, if we draw k red balls in a row and then (t-k) blue balls in a row the probability would be:

    $$P(\text{specific sequence}) = \frac{m}{m+n} \cdot \frac{m+1}{m+n+1} \cdot ... \cdot \frac{m+k-1}{m+n+k-1} \cdot \frac{n}{m+n+k} \cdot \frac{n+1}{m+n+k+1} \cdot ... \cdot \frac{n+t-k-1}{m+n+t-1}$$

  2. We can write this more compactly using product notation:

    $$P(\text{specific sequence}) = \prod_{i=0}^{k-1}\frac{m+i}{m+n+i} \cdot \prod_{j=0}^{t-k-1}\frac{n+j}{m+n+k+j}$$

  3. However, we're not interested in a specific sequence, but in the probability of ending up with k red balls drawn after t turns, regardless of the order. The number of ways to choose k items from t items is given by the binomial coefficient:

    $$\binom{t}{k} = \frac{t!}{k!(t-k)!}$$

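  4. Every ordering of k red and (t-k) blue draws has the same probability as the specific sequence above, because reordering the draws only permutes the numerators while the denominators stay the same. Therefore, the probability of drawing k red balls in any order out of t draws is: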

    $$P(k \text{ red balls in } t \text{ draws}) = \binom{t}{k} \cdot \prod_{i=0}^{k-1}\frac{m+i}{m+n+i} \cdot \prod_{j=0}^{t-k-1}\frac{n+j}{m+n+k+j}$$

  5. This gives us the probability of ending up with (m+k, n+(t-k)) balls after t turns.

  6. To cover all possibilities for t turns, we sum this probability over all possible values of k, from 0 to t. Since exactly one value of k occurs, the sum should equal 1:

    $$P(\text{all possibilities}) = \sum_{k=0}^t \binom{t}{k} \cdot \prod_{i=0}^{k-1}\frac{m+i}{m+n+i} \cdot \prod_{j=0}^{t-k-1}\frac{n+j}{m+n+k+j} = 1$$

  • $\sum_{k=0}^t$ : Summing over all possible numbers of red balls drawn, from 0 to t.
  • $\binom{t}{k}$ : This is the number of ways to choose k red balls out of t draws.
  • $\prod_{i=0}^{k-1}\frac{m+i}{m+n+i}$ : This is the probability that the first k draws are all red.
  • $\prod_{j=0}^{t-k-1}\frac{n+j}{m+n+k+j}$ : This is the probability that the remaining (t-k) draws are all blue, given that k red balls have already been added.
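The formula built up in the steps above translates almost term by term into code. This is a small sketch under the assumption $m, n \ge 1$. The name `polya_pmf` and the example values are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def polya_pmf(m, n, t, k):
    """P(k red draws in t turns): binomial coefficient times the two products."""
    p = Fraction(comb(t, k))
    for i in range(k):            # k red draws: (m+i) / (m+n+i)
        p *= Fraction(m + i, m + n + i)
    for j in range(t - k):        # t-k blue draws: (n+j) / (m+n+k+j)
        p *= Fraction(n + j, m + n + k + j)
    return p


# Hypothetical example: 2 red, 3 blue, 5 turns.
m, n, t = 2, 3, 5
probs = [polya_pmf(m, n, t, k) for k in range(t + 1)]
print(probs)
print(sum(probs))  # summing over every k should recover the total probability
```

Using `Fraction` keeps the arithmetic exact, so the sum over $k$ can be compared with 1 without floating-point tolerance.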

The formula above can be used to answer questions such as "What is the probability that $k$ red balls have been drawn by the $t$-th turn?" or "What is the probability that fewer than $k$ red balls have been drawn by the $t$-th turn?". Writing $R_t$ for the number of red balls in the hat after $t$ turns:

$$ P(R_t = m + k) = \binom{t}{k} \cdot \prod_{i=0}^{k-1} \frac{m + i}{m + n + i} \cdot \prod_{j=0}^{t-k-1} \frac{n + j}{m + n + k + j} $$

$$ P(R_t < m + k) = \sum_{r=0}^{k-1} \left[ \binom{t}{r} \cdot \prod_{i=0}^{r-1} \frac{m + i}{m + n + i} \cdot \prod_{j=0}^{t-r-1} \frac{n + j}{m + n + r + j} \right] $$
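One way to sanity-check both expressions is to compare them with a simulation of the urn. The sketch below is only an illustration: the helper names (`pmf`, `prob_fewer_than`, `simulate_red_draws`), the seed, and the example parameters are all assumptions made here, and a Monte Carlo estimate only agrees with the exact value up to sampling noise.

```python
import random
from fractions import Fraction
from math import comb, prod


def pmf(m, n, t, k):
    """P(R_t = m + k), using the product formula from the text."""
    red = prod(Fraction(m + i, m + n + i) for i in range(k))
    blue = prod(Fraction(n + j, m + n + k + j) for j in range(t - k))
    return comb(t, k) * red * blue


def prob_fewer_than(m, n, t, k):
    """P(R_t < m + k): sum the pmf over r = 0, ..., k-1."""
    return sum(pmf(m, n, t, r) for r in range(k))


def simulate_red_draws(m, n, t, rng):
    """Play t turns of the urn and return how many red balls were drawn."""
    red, blue, drawn = m, n, 0
    for _ in range(t):
        if rng.random() < red / (red + blue):
            red += 1
            drawn += 1
        else:
            blue += 1
    return drawn


# Hypothetical parameters for the comparison.
rng = random.Random(0)
m, n, t, k, trials = 2, 3, 10, 4, 50_000
hits = sum(simulate_red_draws(m, n, t, rng) < k for _ in range(trials))
print("formula:   ", float(prob_fewer_than(m, n, t, k)))
print("simulation:", hits / trials)
```

If the two printed numbers differ by much more than about $1/\sqrt{\text{trials}}$, that would suggest an error in either the formula or the simulation.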

This concludes my work.

Is it correct?

  • PS: I also started reading about the "rising factorial" as it comes up in these types of problems.

A regular factorial n! is the product of all positive integers less than or equal to n.

$$n! = n \times (n-1) \times (n-2) \times ... \times 2 \times 1$$

On the other hand, the rising factorial (https://en.wikipedia.org/wiki/Falling_and_rising_factorials), denoted by $x^{(n)}$ or $(x)_n$, is the product of $n$ consecutive integers starting from $x$.

$$x^{(n)} = x \times (x+1) \times (x+2) \times ... \times (x+n-1)$$

For example: $$3^{(4)} = 3 \times 4 \times 5 \times 6 = 360$$

Here, the product runs from $3$ to $(3+4)-1$, i.e. from 3 to 6.

Going back to the original formula, we can write:

$$ \frac{(m)_k}{(m+n)_k} = \frac{m (m + 1) (m + 2) \cdots (m + k - 1)}{(m + n) (m + n + 1) (m + n + 2) \cdots (m + n + k - 1)} $$

$$ \frac{(n)_{t-k}}{(m+n+k)_{t-k}} = \frac{n (n + 1) (n + 2) \cdots (n + (t - k) - 1)}{(m + n + k) (m + n + k + 1) (m + n + k + 2) \cdots (m + n + k + (t - k) - 1)} $$

And putting everything together:

$$P(\text{all possibilities}) = \sum_{k=0}^t \binom{t}{k} \cdot \frac{(m)_k}{(m+n)_k} \cdot \frac{(n)_{t-k}}{(m+n+k)_{t-k}}$$
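The rising-factorial version can be checked numerically as well. In this sketch, `rising` and `pmf_rising` are hypothetical helper names introduced for illustration, and the example values are arbitrary. The code assumes integer arguments with $m, n \ge 1$.

```python
from fractions import Fraction
from math import comb


def rising(x, n):
    """Rising factorial x (x+1) ... (x+n-1); the empty product (n = 0) is 1."""
    result = 1
    for i in range(n):
        result *= x + i
    return result


def pmf_rising(m, n, t, k):
    """P(k red draws in t turns), written with rising factorials."""
    return (comb(t, k)
            * Fraction(rising(m, k), rising(m + n, k))
            * Fraction(rising(n, t - k), rising(m + n + k, t - k)))


print(rising(3, 4))  # the worked example 3 * 4 * 5 * 6
print(sum(pmf_rising(2, 3, 5, k) for k in range(6)))
```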

konofoso
  • 681

1 Answer


I have not checked it all but what you have written seems sensible. I think there may be a faster and tidier way:

  1. Drawing a red ball followed by a blue ball has the same probability as drawing a blue ball followed by a red ball since $\frac{m}{m+n}\times \frac{n}{(m+1)+n} = \frac{n}{m+n}\times \frac{m}{m+(n+1)}$

  2. So any particular order of drawing $k$ red balls and $t-k$ blue balls has the same probability as any other order, and each order has probability $\frac{m \times (m+1)\times \cdots \times (m+k-1) \times n \times (n+1) \times \cdots \times (n+t-k-1)}{(m+n) \times (m+n+1) \times \cdots \times (m+n+t-1)}$ $=\dfrac{\frac{(m+k-1)!}{(m-1)!} \times \frac{(n+t-k-1)!}{(n-1)!}}{\frac{(m+n+t-1)!}{(m+n-1)!}}$ [you might see rising factorials in there but I am going to end up with combinations]

  3. There are ${t\choose k}=\frac{t!}{k!\,(t-k)!}$ equally probable orders of drawing $k$ red balls and $t-k$ blue balls

  4. So the overall probability of drawing $k$ red balls and $t-k$ blue balls can be found by multiplying the values in 2 and 3 together, giving $$\frac{{m+k-1 \choose k}{n+t-k-1 \choose t-k}}{{m+n+t-1 \choose t}}.$$

  5. Summing these over $k$ with $k \in \{0,1,\ldots,t\}$ will give $1$ since that would cover all possibilities.
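The claim in points 1 and 2, that the order of the draws does not matter, can be checked by brute force for a small case. This is a sketch only: `sequence_probability` and the choice of $m=2$, $n=3$ with draws "RRBB" are assumptions made here for illustration.

```python
from fractions import Fraction
from itertools import permutations


def sequence_probability(m, n, draws):
    """Probability of one specific draw sequence, e.g. "RBR" (R = red, B = blue)."""
    red, blue, p = m, n, Fraction(1)
    for colour in draws:
        if colour == "R":
            p *= Fraction(red, red + blue)
            red += 1
        else:
            p *= Fraction(blue, red + blue)
            blue += 1
    return p


# Every distinct ordering of two red and two blue draws, starting from (2, 3).
orders = sorted(set(permutations("RRBB")))
probs = {"".join(o): sequence_probability(2, 3, o) for o in orders}
for order, p in probs.items():
    print(order, p)
```

All printed probabilities should coincide, and the number of lines printed should match ${t \choose k}$ from point 3.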

As a check, if you substitute $m=n=1$, the probability in 4 becomes $\frac1{t+1}$, which does not vary with $k$. This is the surprising but well-known solution to the original Pólya urn problem.
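The $m=n=1$ check is easy to reproduce. A minimal sketch follows, with the hypothetical name `pmf_binomial_form` for the expression in point 4, assuming $m, n \ge 1$:

```python
from fractions import Fraction
from math import comb


def pmf_binomial_form(m, n, t, k):
    """Probability of k red and t-k blue draws, written with binomial coefficients."""
    numerator = comb(m + k - 1, k) * comb(n + t - k - 1, t - k)
    return Fraction(numerator, comb(m + n + t - 1, t))


# With one ball of each colour, every k should be equally likely.
t = 6
print([pmf_binomial_form(1, 1, t, k) for k in range(t + 1)])

# For other starting contents the probabilities should still sum to 1.
print(sum(pmf_binomial_form(2, 3, 5, k) for k in range(6)))
```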

Henry
  • 169,616