
How do you show that the expected value of a hypergeometric random variable $X$ with parameters $r$, $w$, and $n$ (a box contains $r$ red balls and $w$ white balls, and $n$ balls are drawn without replacement) is $$\mathbf{E}(X)=\frac{rn}{r+w}?$$

I was able to reduce this to $$ \mathbf{E}(X)=\sum _{k=1}^{\min(n,r)} k\,{r\choose k}{w\choose n-k}\Big/{r+w\choose n} =\frac{n!\,(r+w-n)!\,r!\,w!}{(r+w)!} \sum _{k=1}^{\min(n,r)} \frac{k}{k!\,(r-k)!\,(n-k)!\,(w-n+k)!}. $$
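A quick numerical sanity check of this reduction, in case it helps: a minimal Python sketch with arbitrarily chosen small parameters (not part of any proof).

```python
from math import comb, factorial

r, w, n = 5, 7, 4  # arbitrary small example

# Left-hand side: sum of k * P(X = k) over the support of X.
lhs = sum(k * comb(r, k) * comb(w, n - k) / comb(r + w, n)
          for k in range(1, min(n, r) + 1))

# Right-hand side: the reduced factorial form.
c = (factorial(n) * factorial(r + w - n) * factorial(r) * factorial(w)
     / factorial(r + w))
rhs = c * sum(k / (factorial(k) * factorial(r - k)
                   * factorial(n - k) * factorial(w - n + k))
              for k in range(1, min(n, r) + 1))

print(lhs, rhs, r * n / (r + w))  # all three agree: 1.666...
```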

No text I have seen yet proves this (and I think I know why mine doesn't either!). Any suggestions or references??

D. Chen

3 Answers


Linearity of expectation is the simplest way to approach this problem. It is a very powerful technique that enables us to find the expectation of many random variables $X$ even when it is extremely difficult to find the distribution of $X$.

But if you really want to avoid using the linearity of expectation, it can be done in this case. The calculation will take a while: its length can be considered a proof of the fact that we should use the linearity of expectation! By the general formula for expectation in the discrete case, when the distribution of $X$ is known, the expectation of the number of red balls $X$ is $$\sum_{k=0}^r kP(X=k).$$
The probability that $X=k$ is, as you know, $$\frac{\binom{r}{k}\binom{w}{n-k}}{\binom{r+w}{n}},$$ and therefore $$E(X)=\sum_{k=1}^r k \frac{\binom{r}{k}\binom{w}{n-k}}{\binom{r+w}{n}}. \tag {$\ast$} $$ We are summing from $k=1$ on because the $k=0$ term makes no contribution to the expectation, and could cause some headaches later.
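(Before wading through the algebra, the sum $(\ast)$ can be checked in exact rational arithmetic; a minimal sketch with arbitrarily chosen parameters:)

```python
from fractions import Fraction
from math import comb

r, w, n = 6, 9, 5  # arbitrary small example

# E(X) computed directly from (*), in exact arithmetic.
E = sum(Fraction(k * comb(r, k) * comb(w, n - k), comb(r + w, n))
        for k in range(1, min(n, r) + 1))

print(E, Fraction(r * n, r + w))  # both are 2: E(X) = rn/(r+w) exactly
```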

We use the following result:

Lemma: The binomial coefficient $\binom{q}{p}$ is equal to $\frac{q}{p}\binom{q-1}{p-1}$.

The lemma is easy to prove, either combinatorially or by manipulation. For a manipulational proof, note that $\tfrac{q}{p}\tbinom{q-1}{p-1}=\tfrac{q}{p}\tfrac{(q-1)!}{(p-1)!(q-p)!}=\tfrac{q!}{p!(q-p)!}=\tbinom{q}{p}. \hspace{1cm}\Box$
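For instance, with $q=5$ and $p=2$: $\binom{5}{2}=10$ and $\frac{5}{2}\binom{4}{1}=\frac{5}{2}\cdot 4=10$.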

Using the lemma, we can see that $$\binom{r+w}{n}=\frac{r+w}{n}\binom{r+w-1}{n-1}.\tag {$\ast\ast$}$$ We also need some information about $k\binom{r}{k}$. By the lemma, or otherwise, $$k\binom{r}{k}=r\binom{r-1}{k-1}.\tag {$\ast\ast\ast$}$$ Substituting the values obtained in $(\ast\ast)$ and $(\ast\ast\ast)$ for the terms in the formula $(\ast)$ for the expectation of $X$, we obtain $$E(X)=\frac{rn}{r+w}\sum_{k=1}^r \frac{\binom{r-1}{k-1}\binom{w}{n-k}}{\binom{r+w-1}{n-1}}.$$ Make the change of variable $j=k-1$. Then the above formula for $E(X)$ becomes$$E(X)=\frac{rn}{r+w}\sum_{j=0}^{r-1} \frac{\binom{r-1}{j}\binom{w}{n-j-1}}{\binom{r+w-1}{n-1}}.$$

Note that $\frac{\binom{r-1}{j}\binom{w}{n-j-1}}{\binom{r+w-1}{n-1}}$ is the probability that when you draw $n-1$ balls from an urn that contains $r-1$ red and $w$ white, you will get exactly $j$ red balls. When we sum this from $j=0$ to $r-1$, we are adding up all the probabilities, so the complicated-looking sum is equal to $1$. We conclude that $$E(X)=\frac{rn}{r+w}\sum_{j=0}^{r-1} \frac{\binom{r-1}{j}\binom{w}{n-j-1}}{\binom{r+w-1}{n-1}}=\frac{rn}{r+w}. \hspace{1cm} \Box$$
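(Here too a quick check may be reassuring: a small sketch confirming that the reindexed probabilities sum to $1$, with arbitrarily chosen parameters.)

```python
from fractions import Fraction
from math import comb

r, w, n = 6, 9, 5  # arbitrary small example

# P(j reds) when drawing n-1 balls from r-1 red and w white;
# terms with j > n-1 vanish, so we stop the sum at min(r-1, n-1)
# to avoid a negative lower index in comb().
total = sum(Fraction(comb(r - 1, j) * comb(w, n - 1 - j), comb(r + w - 1, n - 1))
            for j in range(0, min(r - 1, n - 1) + 1))

print(total)  # 1
```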

Remark: Although the linearity approach is the smoothest, there are other properties of expectation that one can use for a proof. For example, let $E(n,x,y)$ be the expected number of red balls when we draw $n$ balls from $x$ red and $y$ white. On the first pick, we get a red with probability $\frac{x}{x+y}$, and a white with probability $\frac{y}{x+y}$. If we get a red on the first pick, then our expected number of reds is $1$ plus the expected number of reds from the remaining picks. If we get a white, then our expected number of reds is simply the expected number of reds from the remaining picks. So we obtain $$E(n,r,w)=\frac{r}{r+w}(1+E(n-1,r-1,w))+\frac{w}{r+w}E(n-1,r,w-1).$$ Using this formula, and a simple induction on $n$, we can prove that $E(n,x,y)=\frac{nx}{x+y}$.
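(If it helps, here is a minimal sketch of this recursion in exact arithmetic, checked against the closed form; parameters are arbitrary.)

```python
from fractions import Fraction
from functools import lru_cache

# The recursion from the remark, with exact rational arithmetic.
@lru_cache(maxsize=None)
def E(n, x, y):
    if n == 0:
        return Fraction(0)
    exp = Fraction(0)
    if x > 0:  # first pick is red: 1 + expected reds among the rest
        exp += Fraction(x, x + y) * (1 + E(n - 1, x - 1, y))
    if y > 0:  # first pick is white: expected reds among the rest
        exp += Fraction(y, x + y) * E(n - 1, x, y - 1)
    return exp

# The recursion agrees with nx/(x+y) for a few parameter choices.
for n, x, y in [(1, 3, 4), (4, 5, 7), (6, 2, 9)]:
    assert E(n, x, y) == Fraction(n * x, x + y)
print("recursion matches nx/(x+y)")
```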

André Nicolas

Use linearity of expectation. By symmetry, each of the $r+w$ balls is equally likely to be among the $n$ drawn, so for each red ball the probability of being drawn, and thus the expected number of successes produced by that ball, is $n/(r+w)$. By linearity of expectation, the expected number of successes is just the sum of the contributions from all $r$ red balls, i.e. $r\cdot n/(r+w)$.
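(A quick Monte Carlo sketch of both claims, with arbitrarily chosen parameters and sample size, in case a numerical illustration helps:)

```python
import random

r, w, n = 5, 7, 4   # arbitrary example; balls 0..r-1 are red
trials = 200_000

hits_ball0 = 0   # how often one fixed ball is drawn
total_reds = 0   # total reds drawn, summed over trials

for _ in range(trials):
    draw = random.sample(range(r + w), n)
    hits_ball0 += 0 in draw
    total_reds += sum(1 for b in draw if b < r)

print(hits_ball0 / trials, n / (r + w))      # each ball: ~ 1/3
print(total_reds / trials, r * n / (r + w))  # E(X): ~ 5/3
```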

joriki

Denote by $\mathbb{E}_{(n,N,M)}[X]$ the expectation of a hypergeometric random variable with parameters $(n,N,M)$: $n$ balls drawn without replacement from $N$ red and $M$ white (so $N$ and $M$ play the roles of $r$ and $w$ above). We show that $\mathbb{E}_{(n,N,M)}[X]=n\frac{N}{N+M}$ by induction on $n$:

  • Base case ($n=1$):

    We have $\mathbb{E}_{(1,N,M)}[X]=0\mathbb{P}(X=0)+1\mathbb{P}(X=1)=\frac{N}{N+M}=n\frac{N}{N+M}$.

  • Inductive step ($n-1\to n$):

    Note that $$\mathbb{E}_{(n,N,M)}[X]=\frac{N}{N+M}(1+\mathbb{E}_{(n-1,N-1,M)}[X])+ \frac{M}{N+M}(\mathbb{E}_{(n-1,N,M-1)}[X]).$$ The inductive hypothesis implies that $$\mathbb{E}_{(n-1,N-1,M)}[X]=(n-1)\frac{N-1}{N+M-1},$$and$$\mathbb{E}_{(n-1,N,M-1)}[X]=(n-1)\frac{N}{N+M-1}.$$ Substitute these two formulas into the previous expression to obtain \begin{align*} \mathbb{E}_{(n,N,M)}[X]&=\frac{N}{N+M}\left(1+(n-1)\frac{N-1}{N+M-1}\right)+\frac{M}{N+M}(n-1)\frac{N}{N+M-1}\notag \\ &=\frac{N}{N+M}\left(\frac{(N+M-1)+(n-1)(N-1)}{N+M-1}\right)+\frac{M(n-1)N}{(N+M)(N+M-1)}\notag \\ &=\frac{N(N+M-1)+N(n-1)(N-1)+M(n-1)N}{(N+M)(N+M-1)}\notag \\ &=\frac{N(N+M-1)+N(n-1)(N-1+M)}{(N+M)(N+M-1)}\notag \\ &=\frac{N+N(n-1)}{(N+M)}=n\frac{N}{N+M}.\notag \end{align*}
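(The algebra in the inductive step can also be verified symbolically; a minimal sketch, assuming sympy is available:)

```python
from sympy import symbols, simplify

n, N, M = symbols('n N M', positive=True)

# Right-hand side of the inductive step, with the inductive
# hypothesis substituted for the two (n-1)-expectations.
step = (N / (N + M)) * (1 + (n - 1) * (N - 1) / (N + M - 1)) \
     + (M / (N + M)) * (n - 1) * N / (N + M - 1)

print(simplify(step - n * N / (N + M)))  # 0, confirming the algebra
```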