
This summer, I posted some questions about better understanding the Pólya urn problem (https://en.wikipedia.org/wiki/P%C3%B3lya_urn_model) in probability theory.

The basic Polya Urn problem is as follows:

Start with an urn containing $r_0$ red balls and $b_0$ blue balls. At each step $n \geq 1$:

  • Draw a ball uniformly at random
  • Return the ball to the urn
  • Add one more ball of the same color

Let $(R_n, B_n)$ be the numbers of red and blue balls after $n$ steps. Then:

$$P(R_{n+1} = R_n + 1 \mid R_n, B_n) = \frac{R_n}{R_n + B_n}$$

$$P(B_{n+1} = B_n + 1 \mid R_n, B_n) = \frac{B_n}{R_n + B_n}$$

Supposedly, if we start with one red and one blue ball, the long-term fraction of red balls follows a Uniform distribution on $[0, 1]$. More generally, if we start with $r_0$ red and $b_0$ blue balls, the long-term fraction of red balls follows a Beta$(r_0, b_0)$ distribution, which is Uniform only when $r_0 = b_0 = 1$.
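To make the two-color claim concrete, here is a minimal simulation sketch in Python (standard library only). The function names, step count, run count, and seed are assumptions chosen for illustration. It repeats the urn experiment many times and reports how the final red fractions spread over four equal-width bins, which should look roughly flat for $r_0 = b_0 = 1$.

```python
import random


def red_fraction_after(r0, b0, steps, rng):
    """Run a two-color urn for `steps` draws and return the final red fraction."""
    red, blue = r0, b0
    for _ in range(steps):
        # Draw red with probability red / (red + blue); the drawn ball is
        # returned and one extra ball of the same color is added.
        if rng.random() < red / (red + blue):
            red += 1
        else:
            blue += 1
    return red / (red + blue)


def sample_red_fractions(r0, b0, steps=500, runs=2000, seed=0):
    """Repeat the urn experiment `runs` times with a seeded generator."""
    rng = random.Random(seed)
    return [red_fraction_after(r0, b0, steps, rng) for _ in range(runs)]


def bin_shares(fractions, bins=4):
    """Share of runs falling into each of `bins` equal-width bins on [0, 1]."""
    counts = [0] * bins
    for f in fractions:
        counts[min(int(f * bins), bins - 1)] += 1
    return [c / len(fractions) for c in counts]


if __name__ == "__main__":
    print(bin_shares(sample_red_fractions(1, 1)))  # roughly equal shares expected
    print(bin_shares(sample_red_fractions(5, 2)))  # skewed toward red expected
```

A finite number of steps only approximates the long-term fraction, so this is a sanity check rather than a proof.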

I am interested in knowing what happens if there are multiple colors. For example:

Start with an urn containing $c_i^{(0)}$ balls of color $i$ for $i = 1,\ldots,n$. At each step $k \geq 1$:

  1. Draw a ball uniformly at random
  2. Return the ball to the urn
  3. Add one more ball of the same color

Mathematically, let $c_i^{(k)}$ be the number of balls of color $i$ after $k$ steps. Then:

$$P(c_i^{(k+1)} = c_i^{(k)} + 1 \mid c_1^{(k)},\ldots,c_n^{(k)}) = \frac{c_i^{(k)}}{\sum_{j=1}^n c_j^{(k)}}$$
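The transition rule above can be sketched directly in Python. This is a minimal illustration, not a reference implementation: the names `polya_urn_step` and `simulate_polya_urn` are hypothetical, and the weighted draw uses the standard-library `random.choices`.

```python
import random


def polya_urn_step(counts, rng):
    """Draw color i with probability counts[i] / sum(counts), then add one ball of it."""
    color = rng.choices(range(len(counts)), weights=counts, k=1)[0]
    counts[color] += 1  # the drawn ball is returned, plus one more of the same color
    return color


def simulate_polya_urn(initial_counts, steps, seed=None):
    """Run `steps` draws from the given initial counts; return the final proportions."""
    rng = random.Random(seed)
    counts = list(initial_counts)  # copy so the caller's list is left untouched
    for _ in range(steps):
        polya_urn_step(counts, rng)
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Three colors, one ball each, 1000 draws (all values chosen for illustration).
    print(simulate_polya_urn([1, 1, 1], steps=1000, seed=42))
```

Running it with different seeds gives very different final proportions, which is the run-to-run variability the question is about.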

I saw some questions posted on this topic (e.g. "Establishing bounds on Pólya's Urn for three colors and two draws after $N$ draws", "Generalized Polya's Urn", "Probability... Urn problem with multiple colors, multiple draws and a given condition"), but I was looking for some more conclusive results on this.

For example, if there are more than 2 colors, it seems hard to define a single ratio of colors. In this case, I thought of looking at the distribution of the most popular color versus the least popular color over a number of simulations. I tried to do this in R (starting with an equal number of balls of each color):

[Figure: R simulation results for the most popular vs. least popular color]
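A minimal sketch of this kind of experiment, written in Python rather than the R used for the plot: it starts with one ball of each color and records the largest and smallest final proportion in each run. The function names, number of colors, step count, and run count are assumptions for illustration, not the settings behind the figure.

```python
import random


def final_proportions(num_colors, steps, rng):
    """Start with one ball per color, run `steps` draws, return the final proportions."""
    counts = [1] * num_colors
    for _ in range(steps):
        color = rng.choices(range(num_colors), weights=counts, k=1)[0]
        counts[color] += 1
    total = sum(counts)
    return [c / total for c in counts]


def max_min_pairs(num_colors=3, steps=300, runs=500, seed=0):
    """For each run, record (largest proportion, smallest proportion)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(runs):
        props = final_proportions(num_colors, steps, rng)
        pairs.append((max(props), min(props)))
    return pairs


if __name__ == "__main__":
    pairs = max_min_pairs()
    mean_max = sum(hi for hi, _ in pairs) / len(pairs)
    mean_min = sum(lo for _, lo in pairs) / len(pairs)
    print(f"mean largest share: {mean_max:.3f}, mean smallest share: {mean_min:.3f}")
```

The list of pairs can be passed to any plotting tool to draw the two empirical distributions side by side.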

While a bit unorthodox, I then tried some machine-learning-based dimensionality reduction techniques to visualize the results from a different perspective (this uses the actual final proportions of all colors in each simulation):

[Figure: dimensionality-reduction view of the final color proportions across simulations]

In general, are there any mathematical results on this problem? For example, is the limiting distribution of the color proportions for $n$ colors Dirichlet or Multinomial?

konofoso
  • I believe that in "A generalized Pólya urn model and related multivariate distributions" by Inoue & Aki, the authors seem to establish that it follows what they call a "generalized trinomial distribution" (remark 4.1). They obtain an explicit formula for the (multivariate) generating function (proposition 4.1). – Kolakoski54 Nov 22 '24 at 07:18
  • thanks for the feedback! – konofoso Nov 23 '24 at 03:32

0 Answers