3

Let $X,Y$ be two independent random variables with binomial distribution: $B[4n,p]$ with $p=0.5$. Let $M=\min(X,Y)$. What is the expectation of $M$?

This question seems related but it has no answer: Expectation of Minimum of Multinomially Distributed Random Variables

This question also seems related but it talks about uniformly distributed continuous variables: Expectation of the min of two independent random variables?

I tried to explicitly calculate the probability distribution of $M$, but it turned out too complicated.

Here is my current attempt; looking forward for your comments.)


Each of $X$ and $Y$ is the number of successes in $4n$ independent trials. Instead of $8n$ different trials, let's do $4n$ trials such that each trial has 4 possible outcomes: 00, 01, 10, 11, each outcome occurs with probability $0.25$. Define the random variables $Z_{00}, Z_{01}, Z_{10}, Z_{11}$ as the number of occurances of each outcome. So we have:

  • $X = Z_{11}+Z_{10}$
  • $Y = Z_{11}+Z_{01}$
  • $M = Z_{11}+\min{(Z_{10},Z_{01})}$

$E[Z_{11}]=4n/4=n$; it remains to calculate the expectation of: $\min{(Z_{10},Z_{01})}$.

If we assume that there are $2n$ trials with a single 1 (i.e. $Z_{10}+Z_{01}=2n$, which is their expected value), then $Z_{10}$ and $Z_{01}$ can be seen as related to a single random variable. I.e., there is a random variable $Z_{10}$ with distribution $B(2n,p)$, and the other variable is its complement: $Z_{01}=2n-Z_{10}$. So we have to calculate:

$$E[\min(Z_{10},2n-Z_{10})]$$

$$= \sum_{k=0}^{n} k \binom{2n}{k} p^k q^{2n-k} + \sum_{k=0}^{n-1} k \binom{2n}{2n-k} p^{2n-k} q^{k}$$

Since $p=q=0.5$, the first sum is $\frac{n}{2}$ (see Expectation of half of a binomial distribution) and the second sum is almost identical except for the $n$-th element. So we get:

$$\frac{n}{2} + \frac{n}{2} - n\binom{2n}{n}2^{-2n} = n(1-\binom{2n}{n}2^{-2n})$$

And the total expectation of $M$ is this plus $n$:

$$E[M] = n(2-\binom{2n}{n}2^{-2n})$$

By Stirling's formula, this can be simplified to:

$$E[M] \approx n(2-\frac{1}{\sqrt{\pi n}}) = 2n-\sqrt\frac{n}{\pi}$$

As a sanity check, this is indeed smaller than the expectation of $X$ and $Y$, which is $2n$.

Does this make any sense?


I now found this question in Quora: What is the expected value of the maximum of n iid normal variables? The first answer says that for two variables with distribution $N[0,1]$ the expected value is $1/\sqrt \pi$, so by symmetry the expected value of the minimum is $-1/\sqrt \pi$. A binomial variable $B[4n,p]$ can be approximated as a normal variable $N[4np,\sqrt{4npq}]$. The minimum of two such variables, by scaling, is $4np-\sqrt{4npq}/\sqrt \pi$, which happens to be exactly:

$$2n - \sqrt\frac{n}{\pi}$$

I used two very approximate and incorrect methods, and got at the same result... this implies that this result cannot be too far from the truth. But what is the truth?

1 Answers1

1

OP asks:

I used two very approximate and incorrect methods, and got at the same result... this implies that this result cannot be too far from the truth. But what is the truth?

One can easily find exact solutions with a computer algebra system ... which is helpful for the comparison you seek. In particular, if $p = \frac12$, then $X$ and $Y$ have joint pmf say $f(x,y)$:

       f = (2^-t Binomial[t, x]) ( 2^-t Binomial[t, y]);
domain[f] = {{x, 0, t}, {y, 0, t}}  &&  {t > 0}  &&  {Discrete};

... where, in your set up, $t = 4n$. One can then find exact solutions for any given value of $t$, or your $n$. In particular, $E[min(X,Y)]$ for $n$ = 1 to 10 can be calculated exactly as:

sol = Table[  Expect[Min[x, y], f /. t -> 4 n], {n, 1, 10}]

which returns:

$$\left\{\frac{93}{64},\frac{26333}{8192},\frac{10554795}{2097152},\frac{1846943453}{268435456},\frac{1202081373695}{137438953472},\frac{186920529770667}{17592186044416},\frac{56357790507521559}{4503599627370496},\frac{8307059966383480541}{576460752303423488},\frac{9629671370833818977607}{590295810358705651712},\frac{1376773263601616247805695}{75557863725914323419136}\right\}$$

where I am using the Expect function from the mathStatica add-on to Mathematica.

To a few decimal places, the above exact solution is:

{1.45313, 3.21448, 5.03292, 6.8804, 8.74629, 10.6252, 12.5139, 14.4105, 16.3133, 18.2214}

... whereas your approximate solution of $2 n-\sqrt{\frac{n}{\pi }}$ yields:

{1.43581, 3.20212, 5.02279, 6.87162, 8.73843, 10.618, 12.5073, 14.4042, 16.3074, 18.2159}

In summary: your approximate solution is not only elegant, but appears to work remarkably well! However, your stated exact solution $E[M]$ appears to be incorrect.


If you would like to solve yourself, $E[ min(X,Y)]$ can also be expressed as the following sum:

$$ \sum_{y=0}^t \text{expr} $$

where:

expr = 2^(-1-2t)Binomial[t,y] (2^t t-2 Binomial[t, 1+y] Hypergeometric2F1[1, 1-t+y, 2+y,-1] - 2 Binomial[t, 2+y] Hypergeometric2F1[2, 2-t+y, 3+y, -1])

Note:

This answer uses the mathStatica package for Mathematica. As disclosure, I should add that I am one of the authors.

wolfies
  • 5,244
  • 1
    And you forgot to mention that you are one of the developers of the mathStatica software add-on for Mathematica. – Did Feb 07 '15 at 14:22
  • I don't like to unnecessarily float my boat - but thanks for the attribution. – wolfies Feb 07 '15 at 14:38
  • 1
    You are welcome. How come I had to add the mention myself? Let me suggest that you keep the (good) habit you came to (after one or two nudges), of making explicit your interest in the marketing of the mathStatica package when you post m.se answers using it. – Did Feb 07 '15 at 17:11
  • I think it's a question of balance and contribution. If mathStatica is doing all the work, or substantively producing the answer, then I think it deserves a proper mention/attribution. By contrast, the answer here is really just a comment - but too long for a comment - and thus a somewhat reluctant answer. There is no result per se - just an affirmation or test of the quality of the OP's approximation. The contribution is minor. It is very kind of you to provide formal attribution, but I feel the latter might tend towards over-indulgence in this instance. Either way, it is appreciated. – wolfies Feb 07 '15 at 18:09
  • No, much more simply than the convoluted arguments you try to put forward would let one believe, this is a matter of honesty. – Did Feb 07 '15 at 18:13
  • 1
    @Did wolfies' relation to mathStatica is clearly stated in his profile, just one click away from the answer, so... I don't think there is any honesty issue here. – Erel Segal-Halevi Feb 08 '15 at 07:28
  • @wolfies: thanks! Is it possible to find a symbolic calculation? I found some symbolic examples in mathStatica's documentation. – Erel Segal-Halevi Feb 08 '15 at 07:43
  • @ErelSegalHalevi Hi. For given values of $n$, such as $n= 1$ or $n=2$, the calculation above yields the EXACT symbolic solution. No random sampling or Monte Carlo. For a general solution as a function of parameter $n$, the answer obtained was a mess ... many lines long with linear difference equations etc. It gives the same result for the same values of $t$ or $n$. For your checking purposes, the approach given here does the job much more simply. – wolfies Feb 08 '15 at 07:58
  • @ErelSegalHalevi See addendum at end. – wolfies Feb 08 '15 at 08:15
  • @ErelSegalHalevi You are late at a "party" going on from some time now. But no, disclosure tucked away in the author's profile, on another page, is not the same as in the answer itself. – Did Feb 08 '15 at 08:46