Suppose there is a hat with $m$ red balls and $n$ blue balls. Each turn, we randomly pick a ball (each ball has equal probability of being selected), put it back, and add another ball of the same color.
I am trying to learn how to derive the probability distribution for the contents of the hat after the $t$-th turn (this is the Pólya urn model: https://en.wikipedia.org/wiki/P%C3%B3lya_urn_model).
First, I tried to manually expand the possibilities at different turns. I am using the notation $(m,n)$ to mean that there are $m$ red balls and $n$ blue balls in the hat. I treated this like a "probability tree", i.e. multiply the probabilities along each path and add up the paths that reach the same state.
Turn 1:
- $$(m+1, n) \text{ with probability } \frac{m}{m+n}$$
- $$(m, n+1) \text{ with probability } \frac{n}{m+n}$$
Turn 2:
- $$(m+2, n) \text{ with probability } \frac{m}{m+n} \cdot \frac{m+1}{m+n+1}$$
- $$(m+1, n+1) \text{ with probability } \frac{m}{m+n} \cdot \frac{n}{m+n+1} + \frac{n}{m+n} \cdot \frac{m}{m+n+1}$$
- $$(m, n+2) \text{ with probability } \frac{n}{m+n} \cdot \frac{n+1}{m+n+1}$$
Turn 3:
- $$(m+3, n) \text{ with probability } \frac{m}{m+n} \cdot \frac{m+1}{m+n+1} \cdot \frac{m+2}{m+n+2}$$
- $$(m+2, n+1) \text{ with probability } \frac{m}{m+n} \cdot \frac{m+1}{m+n+1} \cdot \frac{n}{m+n+2} + \frac{m}{m+n} \cdot \frac{n}{m+n+1} \cdot \frac{m+1}{m+n+2} + \frac{n}{m+n} \cdot \frac{m}{m+n+1} \cdot \frac{m+1}{m+n+2}$$
- $$(m+1, n+2) \text{ with probability } \frac{m}{m+n} \cdot \frac{n}{m+n+1} \cdot \frac{n+1}{m+n+2} + \frac{n}{m+n} \cdot \frac{m}{m+n+1} \cdot \frac{n+1}{m+n+2} + \frac{n}{m+n} \cdot \frac{n+1}{m+n+1} \cdot \frac{m}{m+n+2}$$
- $$(m, n+3) \text{ with probability } \frac{n}{m+n} \cdot \frac{n+1}{m+n+1} \cdot \frac{n+2}{m+n+2}$$
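The hand expansion above can be checked mechanically. Here is a minimal sketch that expands the same probability tree with exact fractions; the function name `urn_distribution` and the starting values $m=2$, $n=3$ are illustrative choices only, and it assumes $m+n>0$.

```python
from fractions import Fraction

def urn_distribution(m, n, t):
    """Exact distribution of hat contents (red, blue) after t turns."""
    dist = {(m, n): Fraction(1)}
    for _ in range(t):
        nxt = {}
        for (red, blue), p in dist.items():
            total = red + blue
            # Draw a red ball: put it back and add one more red ball.
            key = (red + 1, blue)
            nxt[key] = nxt.get(key, Fraction(0)) + p * Fraction(red, total)
            # Draw a blue ball: put it back and add one more blue ball.
            key = (red, blue + 1)
            nxt[key] = nxt.get(key, Fraction(0)) + p * Fraction(blue, total)
        dist = nxt
    return dist

# Illustrative starting hat: 2 red, 3 blue, expanded for 3 turns.
for state, prob in sorted(urn_distribution(2, 3, 3).items()):
    print(state, prob)
```

Paths that reach the same state are merged by adding their probabilities, which is what the sums in the Turn 2 and Turn 3 lists do by hand.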
Based on this, I tried to recognize a general formula for $t$ turns, using the binomial coefficient to count the orderings. I defined:
- $m$: initial number of red balls
- $n$: initial number of blue balls
- $t$: number of turns
- $k$: number of red balls drawn from the first turn to the $t$-th turn, i.e. $0 \le k \le t$

First, I tried to consider the probability of a specific sequence of draws. For example, if we draw $k$ red balls in a row and then $t-k$ blue balls in a row, the probability would be:
$$P(\text{specific sequence}) = \frac{m}{m+n} \cdot \frac{m+1}{m+n+1} \cdot ... \cdot \frac{m+k-1}{m+n+k-1} \cdot \frac{n}{m+n+k} \cdot \frac{n+1}{m+n+k+1} \cdot ... \cdot \frac{n+t-k-1}{m+n+t-1}$$
We can write this more compactly using product notation:
$$P(\text{specific sequence}) = \prod_{i=0}^{k-1}\frac{m+i}{m+n+i} \cdot \prod_{j=0}^{t-k-1}\frac{n+j}{m+n+k+j}$$
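However, we're not interested in a specific sequence, but in the probability of ending up with $k$ red balls drawn after $t$ turns, regardless of the order. Every ordering with $k$ red and $t-k$ blue draws has the same probability as the sequence above, because reordering the draws only permutes the numerators while the denominators $m+n, m+n+1, \dots, m+n+t-1$ stay the same (the Turn 3 terms above show this). So it is enough to count the orderings. The number of ways to choose $k$ positions out of $t$ is given by the binomial coefficient: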
$$\binom{t}{k} = \frac{t!}{k!(t-k)!}$$
Therefore, the probability of drawing $k$ red balls in any order out of $t$ draws is:
$$P(k \text{ red balls in } t \text{ draws}) = \binom{t}{k} \cdot \prod_{i=0}^{k-1}\frac{m+i}{m+n+i} \cdot \prod_{j=0}^{t-k-1}\frac{n+j}{m+n+k+j}$$
This gives us the probability of ending up with $(m+k,\ n+t-k)$ balls after $t$ turns.
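To make the formula concrete, here is a small sketch that evaluates it with exact fractions. The name `prob_k_red` and the example values $m=2$, $n=3$, $t=3$ are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb

def prob_k_red(m, n, t, k):
    """P(k red draws in t turns): binomial coefficient times the two products."""
    p = Fraction(comb(t, k))
    for i in range(k):          # k red draws: (m+i)/(m+n+i)
        p *= Fraction(m + i, m + n + i)
    for j in range(t - k):      # t-k blue draws: (n+j)/(m+n+k+j)
        p *= Fraction(n + j, m + n + k + j)
    return p

# Illustrative values: 2 red and 3 blue balls initially, 3 turns.
print([prob_k_red(2, 3, 3, k) for k in range(4)])
```

For these values the result for $k=3$ is $\frac{2}{5}\cdot\frac{3}{6}\cdot\frac{4}{7} = \frac{4}{35}$, matching the $(m+3, n)$ entry of the Turn 3 expansion.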
To cover all possibilities for $t$ turns, we sum this probability over all possible values of $k$, from 0 to $t$. Since these outcomes are mutually exclusive and exhaustive, the sum should equal 1:
$$P(\text{all possibilities}) = \sum_{k=0}^t \binom{t}{k} \cdot \prod_{i=0}^{k-1}\frac{m+i}{m+n+i} \cdot \prod_{j=0}^{t-k-1}\frac{n+j}{m+n+k+j} = 1$$
- $\sum_{k=0}^t$ : sums over all possible numbers of red balls drawn, from 0 to $t$.
- $\binom{t}{k}$ : the number of orderings of $t$ draws that contain exactly $k$ red draws.
- $\prod_{i=0}^{k-1}\frac{m+i}{m+n+i}$ : the probability that the first $k$ draws are all red.
- $\prod_{j=0}^{t-k-1}\frac{n+j}{m+n+k+j}$ : the probability that the remaining $t-k$ draws are all blue, given that the first $k$ were red.
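A quick sanity check on the sum: since $k$ ranges over every possible outcome, the terms should add up to exactly 1. The sketch below tests this for a few arbitrary parameter choices; `polya_pmf` is a hypothetical helper name, and the parameter triples are illustrative only.

```python
from fractions import Fraction
from math import comb

def polya_pmf(m, n, t):
    """List of P(k red draws in t turns) for k = 0..t, as exact fractions."""
    pmf = []
    for k in range(t + 1):
        p = Fraction(comb(t, k))
        for i in range(k):
            p *= Fraction(m + i, m + n + i)
        for j in range(t - k):
            p *= Fraction(n + j, m + n + k + j)
        pmf.append(p)
    return pmf

# Arbitrary (m, n, t) triples; each total should print as 1.
for m, n, t in [(1, 1, 5), (2, 3, 3), (4, 7, 10)]:
    print((m, n, t), sum(polya_pmf(m, n, t)))
```

As a side check, starting from one red and one blue ball, every value of $k$ comes out equally likely, with probability $\frac{1}{t+1}$.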
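The formula above can be used to answer questions such as "What is the probability that exactly $k$ red balls have been drawn by the $t$-th turn?" or "What is the probability that fewer than $k$ red balls have been drawn by the $t$-th turn?". Writing $R_t$ for the number of red balls in the hat after $t$ turns: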
$$ P(R_t = m + k) = \binom{t}{k} \cdot \prod_{i=0}^{k-1} \frac{m + i}{m + n + i} \cdot \prod_{j=0}^{t-k-1} \frac{n + j}{m + n + k + j} $$
$$ P(R_t < m + k) = \sum_{r=0}^{k-1} \left[ \binom{t}{r} \cdot \prod_{i=0}^{r-1} \frac{m + i}{m + n + i} \cdot \prod_{j=0}^{t-r-1} \frac{n + j}{m + n + r + j} \right] $$
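As a rough cross-check of the cumulative formula, one can compare it with a direct simulation of the hat. This is only a sketch: the function names, the fixed seed, the number of trials, and the values $m=2$, $n=3$, $t=3$, $k=2$ are all illustrative assumptions.

```python
import random
from fractions import Fraction
from math import comb

def prob_k_red(m, n, t, k):
    """P(R_t = m + k), i.e. exactly k red draws in t turns."""
    p = Fraction(comb(t, k))
    for i in range(k):
        p *= Fraction(m + i, m + n + i)
    for j in range(t - k):
        p *= Fraction(n + j, m + n + k + j)
    return p

def prob_fewer_than_k_red(m, n, t, k):
    """P(R_t < m + k): sum of the point probabilities for r = 0..k-1."""
    return sum((prob_k_red(m, n, t, r) for r in range(k)), Fraction(0))

def simulate_red_draws(m, n, t, rng):
    """Play t turns once and return how many red balls were drawn."""
    red, blue, drawn = m, n, 0
    for _ in range(t):
        if rng.random() < red / (red + blue):
            red += 1
            drawn += 1
        else:
            blue += 1
    return drawn

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 20_000
hits = sum(simulate_red_draws(2, 3, 3, rng) < 2 for _ in range(trials))
exact = prob_fewer_than_k_red(2, 3, 3, 2)
print("formula:", float(exact), "simulation:", hits / trials)
```

The two numbers should agree to within sampling noise; a large gap would point to an error in either the formula or the simulation.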
This concludes my work.
Is it correct?
- PS: I also started reading about the "rising factorial" as it comes up in these types of problems.
A regular factorial $n!$ is the product of all positive integers less than or equal to $n$.
$$n! = n \times (n-1) \times (n-2) \times ... \times 2 \times 1$$
On the other hand, the rising factorial (https://en.wikipedia.org/wiki/Falling_and_rising_factorials), denoted by $x^{(n)}$ or $(x)_n$, is the product of $n$ consecutive integers starting from $x$.
$$x^{(n)} = x \times (x+1) \times (x+2) \times ... \times (x+n-1)$$
For example: $$3^{(4)} = 3 \times 4 \times 5 \times 6 = 360$$
Here, the product runs over the integers from $3$ to $3+4-1 = 6$.
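The definition translates directly into a short loop. This is a minimal sketch for integer arguments; the function name `rising_factorial` is my own choice, and the empty product for $n=0$ is taken to be 1.

```python
def rising_factorial(x, n):
    """x * (x+1) * ... * (x+n-1); the empty product (n = 0) is 1."""
    result = 1
    for i in range(n):
        result *= x + i
    return result

print(rising_factorial(3, 4))  # 3 * 4 * 5 * 6 = 360
print(rising_factorial(1, 5))  # starting from 1 gives the ordinary factorial 5! = 120
```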
Going back to the original formula, we can write:
$$ \frac{(m)_k}{(m+n)_k} = \frac{m (m + 1) (m + 2) \cdots (m + k - 1)}{(m + n) (m + n + 1) (m + n + 2) \cdots (m + n + k - 1)} $$
$$ \frac{(n)_{t-k}}{(m+n+k)_{t-k}} = \frac{n (n + 1) (n + 2) \cdots (n + (t - k) - 1)}{(m + n + k) (m + n + k + 1) (m + n + k + 2) \cdots (m + n + k + (t - k) - 1)} $$
And putting everything together:
$$P(\text{all possibilities}) = \sum_{k=0}^t \binom{t}{k} \cdot \frac{(m)_k}{(m+n)_k} \cdot \frac{(n)_{t-k}}{(m+n+k)_{t-k}}$$
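The rising-factorial form can be evaluated directly. The sketch below also checks one simplification that follows from the definition: the two denominators run over consecutive integers, so $(m+n)_k \cdot (m+n+k)_{t-k} = (m+n)_t$. The function names and test values are illustrative assumptions, and the code assumes $m+n>0$.

```python
from fractions import Fraction
from math import comb

def rising(x, n):
    """Rising factorial (x)_n = x * (x+1) * ... * (x+n-1)."""
    result = 1
    for i in range(n):
        result *= x + i
    return result

def pmf_rising(m, n, t, k):
    """Term of the sum, written with the two separate denominators."""
    numerator = comb(t, k) * rising(m, k) * rising(n, t - k)
    return Fraction(numerator, rising(m + n, k) * rising(m + n + k, t - k))

def pmf_collapsed(m, n, t, k):
    """Same term with the denominators merged into (m+n)_t."""
    numerator = comb(t, k) * rising(m, k) * rising(n, t - k)
    return Fraction(numerator, rising(m + n, t))

m, n, t = 2, 3, 3  # illustrative values
terms = [pmf_rising(m, n, t, k) for k in range(t + 1)]
print(terms, sum(terms))
print(all(pmf_rising(m, n, t, k) == pmf_collapsed(m, n, t, k) for k in range(t + 1)))
```

With the merged denominator, each term reads $\binom{t}{k} \frac{(m)_k (n)_{t-k}}{(m+n)_t}$, which is the form usually quoted for the beta-binomial distribution.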