7

I am trying to find an approximating function for my problem. Consider a coin-tossing scenario as described in this article: https://www.jstor.org/stable/2690002?seq=1

Basically, we toss $n$ coins until all are heads and ask for the average number of tries (please see the comments for a simple explanation).

We did this experiment ourselves, and our equation agrees with the findings in the above-mentioned article (with slight discrepancies due to the domain specifics of our problem).

The final equation is: $h(n) = 1+ \frac{1+\sum_{k=1}^{n-1}\binom{n-1}{k-1}h(k)}{2^{n-1}-1}$

Now this equation turns out to be computationally heavy. We have managed to calculate approximately the first 8000 values.
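For concreteness, here is a minimal sketch of the recursion in Python (exact rational arithmetic; the function name and structure are illustrative, not the original implementation). The quadratic cost in $n$, with ever-growing numerators and denominators, is what makes large tables expensive:

```python
from fractions import Fraction
from math import comb

def h_table(N):
    """h(1)..h(N) from h(n) = 1 + (1 + sum_{k=1}^{n-1} C(n-1, k-1) h(k)) / (2^(n-1) - 1),
    with h(1) = 0, in exact fractions; each value needs O(n) big-number terms,
    so the whole table costs O(N^2) operations."""
    h = [Fraction(0)]                      # h[0] stores h(1) = 0
    for n in range(2, N + 1):
        s = sum(comb(n - 1, k - 1) * h[k - 1] for k in range(1, n))
        h.append(1 + (1 + s) / (2 ** (n - 1) - 1))
    return h

for n, value in enumerate(h_table(6), start=1):
    print(n, value, float(value))          # e.g. 4 -> 22/7, 5 -> 368/105
```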

The values go like this:

$$ \begin{array}{|c|c|c|} \hline n & \text{dec} \approx & \text{frac} \\ \hline 1 & 0 & \frac{0}{1} \\ 2 & 2 & \frac{2}{1} \\ 3 & 2.667 & \frac{8}{3} \\ 4 & 3.14286 & \frac{22}{7} \\ 5 & 3.505 & \frac{368}{105} \\ 6 & 3.794 & \frac{2470}{651} \\ 7 & 4.0348 & \frac{7880}{1953} \\ 8 & 4.24 & \frac{150266}{35433} \\ 9 & 4.4210 & \frac{13315424}{3011805} \\ 10 & 4.581 & \frac{2350261538}{513010785} \\ 11 & 4.725 & \frac{1777792792}{376207909} \\ 12 & 4.856 & \frac{340013628538}{70008871793} \\ ... & ... & ... \\ 1024 & 11.332 & ... \\ 2048 & 12.3324 & ... \\ 4096 & 13.3326 & ... \\ 8192 & 14.3327 & ... \\ ... & ... & ... \\ \hline \end{array} $$

Now, for the sake of larger $n$ and also for computational speed, we need to find some approximation of this function.

Using the least squares method, we think this function could behave like:

$$ \lim_{n \rightarrow \infty} \bigl(h(n) - \log_{2}n\bigr) \le C \approx 1.333 $$

We need to prove that this difference does not go to infinity (i.e., that there is a constant $C$ that is an upper bound).

Can you point us to some method/book so we know what kind of problem we are facing here?


We also know that the numerators alone and the denominators alone appear in the On-Line Encyclopedia of Integer Sequences, as separate numerator and denominator sequences.

Gary
chazecka
  • How can the average number of tries be $0$ with $n=1$? – Christophe Boilley Jan 15 '25 at 10:14
  • @ChristopheBoilley great question! In our example we have a reference coin (let's say Heads) and we ask: how many times do we have to toss until all are different from the reference (HTT)? That's why the $+1$. When there is only the one reference coin, it is automatically different from the rest, so the $+1$. – chazecka Jan 15 '25 at 11:46
  • I don’t understand. Do you toss this reference coin the first time? – Christophe Boilley Jan 15 '25 at 11:54
  • Yes. Suppose you have 5 coins. You mark one as a reference and toss all five of them. You then check the reference one (let's say it was Heads), remove all the Tails (e.g. 2), and go again with the remaining coins (3 now in total). And I ask: how many times do I toss (on average) until only the reference coin is left (the rest differ from the reference)? The answer is $h(5)$. – chazecka Jan 15 '25 at 12:03
  • So $h(1)$ means I am left with only the reference coin; I don't have to toss at all. That's why $h(1) = 0$. – chazecka Jan 15 '25 at 12:05
  • You should clarify it in your question. – Christophe Boilley Jan 15 '25 at 12:08

4 Answers

6

Using $$ h(n)=1+\sum _{k=1}^n \frac{(-1)^{k+1}}{2^k-1}\binom{n}{k} $$ from Claude Leibovici's answer (this $h$ is shifted by one index relative to the question's table, as noted there; the shift does not affect the asymptotics below) and substituting $$ \frac{1}{2^k-1}=\sum_{m=0}^\infty 2^{-k (m+1)}, $$ we can interchange the order of summation to obtain $$ h(n) = 1 + \sum_{m = 0}^\infty \left[ 1 - \left( 1 - \frac{1}{2^{m + 1} } \right)^n \right]. $$

From the monotonicity of the summands, we can infer that \begin{align*} 1 + \int_0^{ + \infty } {\left[ {1 - \left( {1 - \frac{1}{{2^{x + 1} }}} \right)^n } \right]{\rm d}x} \le h(n) \le 1 &+ \int_0^{ + \infty } {\left[ {1 - \left( {1 - \frac{1}{{2^{x + 1} }}} \right)^n } \right]{\rm d}x} \\ &+ \left[ {1 - \left( {1 - \frac{1}{{2^{0 + 1} }}} \right)^n } \right]. \end{align*} Hence, $$ h(n) = \int_0^{ + \infty } {\left[ {1 - \left( {1 - \frac{1}{{2^{x + 1} }}} \right)^n } \right]{\rm d}x} + \mathcal{O}(1). $$

With the substitution $t = 1 - \frac{1}{{2^{x + 1} }}$, we can rewrite the integral as $$ h(n) = \frac{1}{{\log 2}}\int_{1/2}^1 {\frac{{t^n - 1}}{{t - 1}}{\rm d}t} + \mathcal{O}(1) = \frac{1}{{\log 2}}\int_0^1 {\frac{{t^n - 1}}{{t - 1}}{\rm d}t} + \mathcal{O}(1). $$ The second integral is exactly the $n^{\text{th}}$ harmonic number, so we find $$ h(n) = \frac{1}{{\log 2}}\sum\limits_{k = 1}^n {\frac{1}{k}} + \mathcal{O}(1) = \frac{{\log n}}{{\log 2}} + \mathcal{O}(1) = \log _2 \!n + \mathcal{O}(1) $$ as $n\to+\infty$.

Remark. Using the Euler–Maclaurin formula along with a more accurate approximation for the harmonic numbers, and the observation $$ \frac{1}{{\log 2}}\int_0^{1/2} {\frac{{t^n - 1}}{{t - 1}}{\rm d}t} = 1 + \frac{1}{{\log 2}}\int_0^{1/2} {\frac{{t^n }}{{t - 1}}{\rm d}t} = 1 + \mathcal{O}\!\left( {\frac{1}{n}} \right), $$ we obtain the asymptotic expression $$ h(n) = \log _2 \!n + \frac{1}{2} + \frac{\gamma }{{\log 2}} + \mathcal{O}\!\left( {\frac{1}{n}} \right) $$ as $n\to+\infty$, where $\gamma$ denotes the Euler–Mascheroni constant. The value of the constant is $1.332746177\ldots\,$.
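For a numeric cross-check of this constant, a minimal Python sketch (names are ours; the 300-term cutoff is arbitrary but ample, since the summands decay geometrically once $2^{m+1} \gg n$):

```python
import math

def h_series(n, terms=300):
    """h(n) = 1 + sum_{m>=0} [1 - (1 - 2^{-(m+1)})^n], the interchanged sum above."""
    return 1.0 + sum(1.0 - (1.0 - 2.0 ** -(m + 1)) ** n for m in range(terms))

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    # h(n) - log2(n) should approach 1/2 + gamma/log(2) = 1.3327...
    print(n, h_series(n) - math.log2(n), 0.5 + GAMMA / math.log(2))
```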

Addendum. Other answers may be found here.

Gary
  • +1. I guess we can directly provide the expression for $h(n)$ as follows. If I understood OP's comment right, then given natural $n$, for each nonnegative integer $k$ the probability $P_k$ that $k$ tosses suffice (i.e., that we need at most $k$ of them) is $\left(1-2^{-k}\right)^{n-1}$ (except in the case $k=n-1=0$, when $P_k=1$). Then the required expectation is $$h(n)=\sum_{k=1}^\infty k(P_k-P_{k-1})=\lim_{m\to\infty}\left(m P_m-\sum_{k=0}^{m-1} P_k\right).$$ These values of $h(n)$ seem to fit OP's table. – Alex Ravsky Jan 17 '25 at 15:27
4

This is not an answer but (I hope) it could help.

Using your definition of $h_n$, it could be simpler and faster to compute it as $$h_{n}=\sum _{k=1}^n (-1)^{k+1}\,\frac{2^k}{2^k-1}\binom{n}{k}=1+\sum _{k=1}^n (-1)^{k+1}\,\frac{1}{2^k-1}\binom{n}{k}.$$ (Note that this sum is shifted by one index relative to your table: starting at $n=1$ it gives $2,\frac{8}{3},\frac{22}{7},\dots$, i.e. your $h(n+1)$; the shift is immaterial for the asymptotics.) It is probably also useful that $$\frac{1}{2^k-1}=\sum_{m=0}^\infty 2^{-k (m+1)}$$
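A minimal sketch of this sum in Python (exact rational arithmetic; names are ours). Evaluating the alternating binomial sum in double precision suffers catastrophic cancellation for large $n$, so exact fractions are used:

```python
from fractions import Fraction
from math import comb

def h_alt(n):
    """Alternating sum h_n = sum_{k=1}^{n} (-1)^(k+1) * 2^k/(2^k - 1) * C(n, k),
    in exact arithmetic. Per the index shift noted above, h_alt(n) equals
    the question's table value h(n+1)."""
    return sum(Fraction((-1) ** (k + 1) * comb(n, k) * 2 ** k, 2 ** k - 1)
               for k in range(1, n + 1))

print([h_alt(n) for n in range(1, 5)])  # 2, 8/3, 22/7, 368/105 -- the table from h(2) on
```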

What looks interesting is that $$n\big(h_n-\log_2(n)\big) \sim a + b\,n$$

$$\begin{array}{l|lll} \text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\ \hline a & 0.7188497 & 2.0856\times 10^{-4} & \{0.7184405,0.7192590\} \\ b & 1.3327493 & 3.6097\times 10^{-7} & \{1.3327485,1.3327500\} \\ \end{array}$$ and $$n\big(h_n-\log_2(n)\big) < (2n+1)\log(2)$$
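For reproducibility, here is a sketch of this kind of fit (Python with NumPy; the sample range and the plain `polyfit` are illustrative choices, not necessarily what was used originally):

```python
import numpy as np
from fractions import Fraction
from math import comb

def h_alt(n):
    """The closed form above, evaluated exactly and then converted to float."""
    return float(sum(Fraction((-1) ** (k + 1) * comb(n, k) * 2 ** k, 2 ** k - 1)
                     for k in range(1, n + 1)))

ns = np.arange(1, 200)
y = np.array([m * (h_alt(int(m)) - np.log2(m)) for m in ns])
b, a = np.polyfit(ns, y, 1)  # least squares line y = a + b*n
print(a, b)                  # intercept near 0.719, slope near 1.33275
```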

Edit (for art's sake)

After @Gary's answer, it is possible to go further with respect to the proposed lower bound \begin{align*} J_n=1 + \int_0^{ + \infty } {\left[ {1 - \left( {1 - \frac{1}{{2^{x + 1} }}} \right)^n } \right]\,dx} \le h(n) \end{align*} $$J_n=\frac{H_n}{\log (2)}+\frac{2^{-(n+1)}}{\log (2)}\Bigg(\frac {1}{n+1}+\frac{1}{2} \Phi\left(\frac{1}{2},1,n+2\right) \Bigg)$$ where $\Phi(\cdot)$ is the Hurwitz–Lerch transcendent function. Asymptotically $$\frac{1}{n+1}+\frac{1}{2} \Phi\left(\frac{1}{2},1,n+2\right) = \frac{2}{n+1}-\frac{2}{(n+1)^2}+\frac{6}{(n+1)^3}+O\left(\frac{1}{n^4}\right)$$

$\frac{H_n}{\log (2)}$ represents more than $99.9$% of $J_n$ as soon as $n>5$.

  • This is perfect, everything seems to work for us. Can you provide some math theory/book/wiki so we can replicate? – chazecka Jan 16 '25 at 10:33
  • @chazecka. Replicate what? By the way, this is a nice problem and (+1). Cheers :-) – Claude Leibovici Jan 16 '25 at 11:13
  • I appreciate your interest! I can see that the first equation comes from the original article (here I am unsure whether this equation would be less computationally heavy). And in the third equation you say it approximately equals $a + bn$; where do these $a$, $b$ and their estimates come from? – chazecka Jan 16 '25 at 12:21
  • @chazecka. A plot and a quick-and-dirty linear regression with $R^2=0.99999999\cdots$ and approximate numbers. The calculations are much faster. I did not access the paper (56 € for a PDF is expensive). Do you have a link for free access to it? – Claude Leibovici Jan 16 '25 at 12:40
  • @chazecka. It is incredibly faster. – Claude Leibovici Jan 16 '25 at 12:55
  • A non-simplified version can also be efficient when utilizing caching. However, when attempting to implement your version in R and Julia, we encountered difficulties for values of n greater than 1000. The challenge arises from the fact that regression does not constitute a formal proof. – chazecka Jan 16 '25 at 14:27
  • @chazecka. I never claimed that this was a proof. – Claude Leibovici Jan 16 '25 at 14:37
  • @ClaudeLeibovici The precise asymptotics is $$h(n) = \log _2 \!n + \frac{1}{2} + \frac{\gamma }{\log 2} + \mathcal{O}\!\left( \frac{1}{n}\right).$$ See my answer. – Gary Jan 17 '25 at 14:17
0

The following is not a complete answer, but might help you with precise control of errors.

The sum $\sum_{k=1}^{n-1}\binom{n-1}{k-1}\log_2(k)$ (the weighted sum of your recursion with $h(k)$ replaced by its conjectured $\log_2 k$ growth) is less than $2^{n-1}\mathbf E(\log_2(X+1))$, where $X$ is binomial with parameters $n-1$ and $\frac 12$. Take an approximation by a normal distribution with mean $\frac{n-1}{2}$ and variance $\frac{n-1}{4}$ in order to compute $\displaystyle\int_0^{+\infty}\log_2(x+1)\sqrt{\frac{2}{\pi(n-1)}}\exp\left(\frac{-2(x-\frac{n-1}2)^2}{n-1}\right)\mathrm dx.$

Shift the variable $x$ by $\frac{n-1}2$ to get the upper bound $\log_2(n)-1+\displaystyle\int_{\frac{1-n}{2}}^{+\infty}\log_2\left(1+\frac{2x+1}n\right)\sqrt{\frac{2}{\pi(n-1)}}\exp\left(\frac{-2x^2}{n-1}\right)\mathrm dx$.

You can then discard the additive $1$ at the beginning of your formula and get the logarithmic part of your upper bound (the constant part is straightforward). You still have to control the error of the normal approximation and of the last integral.
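As a quick numeric illustration of the heuristic, a minimal Python sketch (names are ours) comparing the exact binomial expectation with its $\log_2(n)-1$ leading term; the raw binomial coefficients stay within double-precision range up to roughly $n\approx 1000$:

```python
from math import comb, log2

def mean_log2(n):
    """E[log2(X+1)] for X ~ Binomial(n-1, 1/2), straight from the pmf."""
    m = n - 1
    return sum(comb(m, j) * log2(j + 1) for j in range(m + 1)) / 2 ** m

for n in (10, 100, 1000):
    print(n, mean_log2(n), log2(n) - 1)  # the gap closes like O(1/n)
```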

0

Gary's answer is complete and very nice; I just wanted to add an intuitive reason why this should be so (not a proof).

First, I want to clarify that we want to find the average number of tosses until each of the $n$ coins has come up heads at least once. (We also get rid of the "off by one" shift.)

Now, using the law of large numbers, (roughly) half of the coins will be tails on a flip and must be tossed again, leading to $$ h(n) \approx 1 + h \left( \frac{n}{2} \right) \approx k + h \left( \frac{n}{2^k} \right) = \log_2 (n) + h(1). $$ And we can calculate quite simply (as you did) that $$ h(1) = \sum_{i=1}^\infty{\frac{i}{2^i}} = 2 $$ Hence $h(n) \approx \log_2(n) + 2$.
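A Monte Carlo sketch of this shift-free variant (Python; function name and trial count are arbitrary choices) shows that the heuristic constant $2$ slightly overshoots the true constant $\approx 1.3327$ from Gary's answer:

```python
import random
from math import log2

def rounds_until_all_heads(n):
    """Toss n fair coins; coins showing heads are done; re-toss the rest."""
    rounds = 0
    while n > 0:
        heads = bin(random.getrandbits(n)).count("1")  # one random bit per coin
        n -= heads
        rounds += 1
    return rounds

n, trials = 1024, 20_000
avg = sum(rounds_until_all_heads(n) for _ in range(trials)) / trials
print(avg, log2(n) + 2, log2(n) + 1.3327)  # simulated vs. heuristic vs. true constant
```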