11

In a recent video the legendary Matt Parker claimed he kept flipping a (fair) two-sided coin until he scored a sequence of ten consecutive 'switch flips', i.e. letting $T$ denote a tail and $H$ a head, a sequence of ten switch flips is defined to be either $THTHTHTHTH$ or $HTHTHTHTHT$. He set up a contest and allowed each viewer to guess once at the exact number of flips he needed to obtain such a sequence. The ten viewers with the ten closest answers would be awarded a prize.

The contest is over, so there is no incentive to keep a solution to the following problem to yourself. What is the best number to bet? Of course this somehow depends on how other viewers answer: you are more likely to win if your bet is not close to many other bets, so if a large number of viewers are mathematically inclined and bet the same number - say 1023 - then it is no longer the best (profit-maximizing) bet. I've therefore simplified to the following question: let $X$ be the stochastic variable representing the number of flips needed until a sequence of ten consecutive switch flips is obtained; for which number $a \in \mathbb{N}$ does the expected value of the (absolute value of the) error $$ \mathbb{E}[\vert X - a \vert] $$ reach its minimal value? It is well known that $a$ is the median of (the distribution of) $X$, but how can one compute it? Numerical approximations are welcome, theoretical (generalizable) results are preferred.
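As a small illustration of the fact that the median minimizes the expected absolute error (my own toy distribution, nothing to do with the coin problem):

```python
# Toy sanity check (hypothetical pmf of my choosing) that the minimizer of
# a -> E|X - a| is the median of X.
pmf = {1: 0.10, 2: 0.20, 3: 0.15, 4: 0.25, 5: 0.30}

def abs_error(a):
    # E|X - a| under this pmf
    return sum(p * abs(x - a) for x, p in pmf.items())

# median: least a with P(X <= a) >= 1/2
cum = 0.0
for median in sorted(pmf):
    cum += pmf[median]
    if cum >= 0.5:
        break

best = min(pmf, key=abs_error)
print(median, best)  # 4 4
```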

I found a way to compute the expected value of $X$ itself for general $n$ (i.e. the total number of coin flips needed to get a sequence of the form $THTHTHTH...$ or $HTHTHTHT...$ of length $n$). Let $\mathbb{E}_i$ denote the expected number of coin flips needed to get a desired sequence of length $n$, assuming we already have a sequence of length $i \in \mathbb{N}$. We immediately find $$ \mathbb{E}_0 = 1 + \mathbb{E}_1 $$ since we are certain to have a sequence of length $1$ after one flip. Furthermore, for $1 \leq i \leq n-1$ $$ \mathbb{E}_i = \frac{1}{2}\left(\mathbb{E}_1 + 1\right) + \frac{1}{2}\left(\mathbb{E}_{i+1} + 1\right) $$ since, given a sequence of $i$ flips which ends, say, on a tail, we have a $\frac{1}{2}$ chance to increase this to a sequence of $i+1$ flips (if we get, say, a head) and a $\frac{1}{2}$ chance to get back where we started, at $1$ flip. Using that $\mathbb{E}_n = 0$, the above gives us a system of $n$ equations in the $n$ variables $\mathbb{E}_0, \ldots, \mathbb{E}_{n-1}$. One can easily check that the unique solution is given by $$ \mathbb{E}_i = 2^{n} - 2^{i} \quad 0 \leq i \leq n $$ Since Matt Parker started at $0$ and wanted to get $10$ flips, the expected value of the number of flips needed is $2^{10} - 1 = 1023$ and this should be a reasonable bet.
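A quick sanity check of this closed form for $n = 10$ (my own sketch):

```python
# Verify that E_i = 2^n - 2^i satisfies the system of equations above for n = 10.
n = 10
E = [2**n - 2**i for i in range(n + 1)]

assert E[n] == 0                      # E_n = 0: the sequence is already complete
assert E[0] == 1 + E[1]               # the first flip always yields a length-1 sequence
for i in range(1, n):
    # half the time the streak extends to i+1, half the time we fall back to length 1
    assert E[i] == 0.5 * (E[1] + 1) + 0.5 * (E[i + 1] + 1)

print(E[0])  # 1023
```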

Does anyone know how to find the distribution of $X$ (or directly the median of $X$)? Like I said, analytical solutions are of course preferred, but any kind of method - even requiring numerical computations, but preferably not Monte Carlo simulations - would be interesting to me.

EDIT: I found out that the problem can be reduced to a combinatorial problem. Indeed, we have that $$ P(X \leq k) = \frac{\# \lbrace \text{sequences of length $k$ which contain a desired subsequence of length $n$}\rbrace}{2^k} $$ where $2^k$ is the total number of sequences of length $k$, since every sequence of length $k$ is equally likely to occur. Let $S_k$ be the set of sequences of length $k$ of $0$'s and $1$'s (we identify tails with $0$ and heads with $1$). We have a map $$ f: S_k \to S_{k-1} $$ where, for any sequence $s \in S_k$, the $i$th element of $f(s)$ is $1$ if $s(i) \neq s(i+1)$ and $0$ if $s(i) = s(i+1)$. This map makes the desired sequences in $S_k$ correspond bijectively with the sequences in $S_{k-1}$ which contain $n-1$ zeroes in a row. Hence, it suffices to count the number of sequences of a given length $k-1$ which contain $n-1$ zeroes in a row. Any ideas on how to continue?
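One numerical way to continue (a sketch of mine, not a closed form): a dynamic program over the number of trailing zeros counts exactly the length-$(k-1)$ sequences that *avoid* a run of $n-1 = 9$ zeros; the complement gives $P(X \leq k)$, and scanning over $k$ yields the median:

```python
# Count, by DP over the current trailing-zero run, the 0-1 sequences of length
# k-1 with no run of 9 zeros; 2^(k-1) minus that count is the number of
# sequences containing a desired run, i.e. the numerator of P(X <= k).
r = 9                        # required zero-run length, n - 1 for n = 10
state = [0] * r              # state[j] = # sequences with j trailing zeros, no 9-run yet
state[0] = 1                 # the empty difference sequence
k = 1                        # k coin flips <-> difference sequence of length k - 1
while 2 * (2 ** (k - 1) - sum(state)) < 2 ** (k - 1):   # i.e. P(X <= k) < 1/2
    new = [0] * r
    new[0] = sum(state)      # appending a 1 resets the zero run
    for j in range(1, r):
        new[j] = state[j - 1]  # appending a 0 extends the run (must stay < 9)
    state = new
    k += 1
print(k)  # 712
```

The arithmetic is exact (Python integers), so no floating-point issues arise even for $k$ in the hundreds.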

Bib-lost
  • Thanks for the bounty -- I'd written most of my answer this morning before you set the bounty, I just hadn't gotten around to finishing and posting it yet :-) – joriki Apr 12 '16 at 13:04
  • My fault for being so impatient ;-) – Bib-lost Apr 12 '16 at 13:34

3 Answers

4

The probability to get $10$ consecutive switch flips can be modelled as a Markov chain, with the states corresponding to the number of consecutive switch flips in the immediate past. The state in which $10$ consecutive switch flips have been encountered is absorbing. The stationary distribution (the eigenvector with eigenvalue $1$) has probability $1$ at this absorbing state. There is a quasi-stationary distribution in which each state has half the probability of the one with one less switch flip, and probability "leaks out" into the absorbing state at a rate of roughly $2^{-10}$ per step. The median is the least integer $a$ for which $\textsf P(X\le a)\gt\frac12$. The time it takes for $\frac12$ to leak out is roughly given by $a\cdot2^{-10}=\log2$, or $a=2^{10}\log2\approx 710$.

Here's Java code that follows the evolution of the distribution of the process until the probability for $10$ switch flips exceeds $\frac12$. The median turns out to be $712$. The precision of the above estimate is in part gratuitous, since the process takes $10$ steps to equilibrate before probability starts leaking into the absorbing state. Nevertheless, the agreement shows that the quasi-stationary model is quite good, so $P(X\le t)\approx1-\left(1-2^{-10}\right)^t\approx1-\mathrm e^{-2^{-10}t}$ should be a good approximation.
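Since the Java code is only linked, the same evolution can be sketched in a few lines of Python (my reconstruction of the iteration described in the comments below, not the actual linked code):

```python
# Evolve the distribution over streak lengths: state i = number of consecutive
# switch flips in the immediate past, state 10 absorbing.
p = [0.0] * 11
p[1] = 1.0                    # after the first flip the streak has length 1
t = 1
while p[10] <= 0.5:           # stop at the least t with P(X <= t) > 1/2
    new = [0.0] * 11
    new[10] = p[10] + 0.5 * p[9]     # a 10th switch flip: absorbed
    new[1] = 0.5 * sum(p[1:10])      # a repeated face restarts the streak at 1
    for i in range(2, 10):
        new[i] = 0.5 * p[i - 1]      # a switch flip extends the streak
    p = new
    t += 1
print(t)  # 712
```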

Regarding the game-theoretic aspect of the problem, this is related to the ice-cream vendor problem, where the flip numbers are mapped to the beach using the cumulative distribution function $P(X\le x)$. However, the conclusion in the continuous case that there is no equilibrium for $3$ players doesn't go through in the discrete case.

joriki
  • I'm not familiar with Markov chains (I know what they are but that's about it). I'll take some time this week to look up some concepts to make sure I get what you're saying. – Bib-lost Apr 12 '16 at 13:32
  • @Bib-lost: You don't need any of the advanced concepts for Markov chains here (recurrence / transience / reducibility / ...) -- just think of it as a vector of probabilities that's transformed into a new vector in each step, with half the probability in each entry advancing to the next entry and the other half going back to the first entry, except the last ($10$-th) entry, which represents the "success probability" and retains its probability (and in addition gets half of the probability of the previous entry). – joriki Apr 12 '16 at 13:39
  • Okay, so in every step we're basically multiplying our vector by some $10\times10$ matrix and we want to know after how many steps we get a $1/2$ in the last entry. I get that, got the matrix too. Still don't know what you mean with (quasi-)stationary distribution though. But I'll look it up this week. – Bib-lost Apr 12 '16 at 14:22
  • @Bib-lost: I'm not sure you can look that up, it's a bit informal :-) What I mean is that after $10$ steps, the vector settles into a form where roughly each entry is half the previous one, and this is "almost" a stationary distribution -- it would be stationary if it weren't for the fact that in each step a bit of probability is syphoned off by the absorbing state (the last one). That part that leaks out is about $2^{-10}$ of the total probability left in the non-absorbing states, so the remaining probability in those states roughly decreases by a factor of $1-2^{-10}$ in each step. – joriki Apr 12 '16 at 14:39
  • @Bib-lost: If you can run Java programs, you can run the code I linked to (with some added print statements) to see how the vector develops -- the pattern of the "quasi-stationary" distribution will become quite clear then. – joriki Apr 12 '16 at 14:40
  • Only got Matlab here, but I guess it's easy enough to convert it. I'll have some time tomorrow. – Bib-lost Apr 12 '16 at 14:45
  • Converted the code to Matlab and got a median of 712 as well. Interesting approach, I must say. I still wonder how you got the estimate that about $2^{-10}$ of the total probability leaks out of the non-absorbing states, even though it indeed seems to be quite precise. – Bib-lost Apr 13 '16 at 17:45
  • @Bib-lost: In the quasi-stationary distribution, each state has roughly half the probability of the previous one, so expressed in multiples of the probability in the last non-absorbing state, the total probability in all the non-absorbing states is $\sum_{i=0}^82^i=2^9-1$. Thus roughly $2^{-9}$ of this total probability is in the last non-absorbing state, and half of that, i.e. roughly $2^{-10}$, leaks into the absorbing state in each step. – joriki Apr 13 '16 at 20:12
  • It's clear now, thanks a lot for your help! (I chose zhoraster's answer as accepted because their approach was closer to my own and because the approximation they used was better justified IMHO. Your exact solutions seem numerically more or less equivalent. If I could have split the bounty, I would have.) – Bib-lost Apr 15 '16 at 08:15
2

Let $a^k_{n}$ denote the number of zero-one sequences of length $n$ whose longest zero run does not exceed $k$, and let $a^k_{n,m}$ denote the number of such sequences with $m$ trailing zeros, $m=0,\dots,k$.

Then $$ a^{k}_{n,m} = a^{k}_{n-m,0},\quad m = 1,\dots,k,\ n\ge m, $$ and $a^k_{n,0} = a^k_{n-1}$, $n\ge 1$. Therefore, $$ a_n^k = \sum_{m=0}^k a^k_{n,m} = a^k_{n-1} + \sum_{m=1}^{k} a^k_{n-m,0} = \sum_{j=1}^{k+1} a^k_{n-j} \tag{1} $$ for $n\ge k+1$. (In other words, take a sequence and remove its trailing zeroes together with the one preceding them; you end up with a shorter sequence whose longest zero run still does not exceed $k$.)
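Recurrence (1) is easy to cross-check by brute-force enumeration for a small $k$ (my own sketch):

```python
from itertools import product

# Cross-check recurrence (1) against brute-force enumeration for k = 3, n <= 12.
def longest_zero_run_at_most(s, k):
    run = best = 0
    for b in s:
        run = run + 1 if b == 0 else 0
        best = max(best, run)
    return best <= k

k, N = 3, 12
# recurrence (1), with initial values a_n^k = 2^n for n <= k
a = [2 ** n for n in range(k + 1)]
for n in range(k + 1, N + 1):
    a.append(sum(a[n - j] for j in range(1, k + 2)))

for n in range(N + 1):
    brute = sum(1 for s in product((0, 1), repeat=n)
                if longest_zero_run_at_most(s, k))
    assert a[n] == brute
print(a)  # [1, 2, 4, 8, 15, 29, 56, 108, 208, 401, 773, 1490, 2872]
```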

The initial values are $a_n^k = 2^n$ for $n\le k$. Therefore, it is easy to see that the generating function $A^k(z) = \sum_{n=0}^\infty a_n^k z^n$ is equal to $$ A^k(z) = \frac{\sum_{j=0}^k z^j}{1-\sum_{j=1}^{k+1} z^j} = \frac{1-z^{k+1}}{1-2z+z^{k+2}}. $$

Writing the partial fraction expansion for $A^k(z)$, it is possible to derive an explicit formula for $a_n^k$.

Now, denoting by $T_m$ the number of throws needed to get $m$ consecutive heads, we have $$ P(T_m>n) = \frac{1}{2^n}P(\text{at most $m-1$ consecutive heads in the first $n$ throws}) \cdot 2^n = \frac{a^{m-1}_n}{2^n}. $$

However, for $m=9$ (which corresponds to $10$ alternating flips, as you've explained) there is no nice closed form, as the roots can be computed only approximately. Nevertheless, the recurrence (1) gives an efficient way to compute the required number of sequences.

A good rough approximation is $a_n^{8} \sim C_\tau \tau^{-n-1}$, where $\tau = 0.500493118286\dots$ is the unique positive root of the polynomial $f(z) = \sum_{j=1}^{9}z^{j} -1$ (equivalently, the root of $z^{10}-2z+1$ inside $(0,1)$), and $$ C_\tau = \frac{\sum_{j=0}^{8} \tau^j}{f'(\tau)} = 0.503980275733\dots $$ Indeed, since $A^8(z)$ is rational and $f$ does not have double roots, it has a partial fraction expansion $A^8(z) = \sum_{\zeta: f(\zeta) = 0} \frac{C_\zeta}{\zeta - z}$. Therefore, the sequence is of the form $a_n^8 = \sum_{\zeta: f(\zeta) = 0} C_\zeta \zeta^{-n-1}$. In particular, since $\tau$ is the root with the smallest absolute value, $a_n^8\sim C_\tau \tau^{-n-1}$. Moreover, the norms of the other roots are bigger than $1.11$, so the relative error of this approximation is of order $r^{-n}$ with $r>2$. This is especially good for our problem, since we will be interested in very large values of $n$.

So the probability we need is approximately $$ P(T_9>n)\sim \frac{C_\tau}{\tau (2\tau)^n}, $$ and this is very sharp (up to an exponentially small relative error). In order to find the median, one should find the smallest $n$ such that $P(T_9>n)< 1/2$. This gives the value $$n= \lceil \log(4C_\tau)/\log(2\tau)\rceil-1 = 711$$ for the median of $T_9$; since the number of actual coin flips is one more than the length of the switch sequence, this corresponds to a median of $712$ for $X$, in agreement with joriki's calculations.
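The numbers above can be reproduced with simple bisection (a sketch of mine; the median formula is applied with a positive numerator, $\lceil\log(4C_\tau)/\log(2\tau)\rceil-1$, which is what reproduces $711$):

```python
import math

# Bisection for the unique positive root tau of f(z) = z + z^2 + ... + z^9 - 1,
# then C_tau and the median of T_9 via the approximation P(T_9 > n) ~ C/(tau (2 tau)^n).
def f(z):
    return sum(z ** j for j in range(1, 10)) - 1

lo, hi = 0.0, 1.0             # f(0) = -1 < 0 < f(1) = 8, and f increases on [0, 1]
for _ in range(200):
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
tau = (lo + hi) / 2           # ~ 0.500493118286

fprime = sum(j * tau ** (j - 1) for j in range(1, 10))
C = sum(tau ** j for j in range(9)) / fprime     # ~ 0.503980275733

# smallest n with C / (tau * (2 tau)^n) < 1/2
median_T9 = math.ceil(math.log(4 * C) / math.log(2 * tau)) - 1
print(median_T9)  # 711
```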

zhoraster
  • I'm working through your post. What do you mean by trailing zeroes? What's the number of trailing zeroes in, for example, 00110? Or 110001? Or 0011000? Also, what is $A$? – Bib-lost Apr 13 '16 at 16:23
  • 00110 - 1 trailing zero. 110001 - 0. 0011000 - 3. Sorry, I've carelessly deleted the definition of $A$ while editing, will add it back. – zhoraster Apr 13 '16 at 20:00
  • Ok, it's clear now up to "compute the required number of sequences". Could you explain why $C_\tau \tau^{-n-1}$ is a good approximation for $a^8_n$? I presume it might be some property of Taylor series that I'm overlooking? – Bib-lost Apr 14 '16 at 14:47
  • @Bib-lost, I've added a short explanation. Feel free to ask if something is still unclear. – zhoraster Apr 14 '16 at 15:26
  • All clear now. I assume that "is the unique positive root of", "does not have double roots" and "are bigger than $1.11$" are all observations which stem from the numerical computation of the roots, rather than from some kind of theoretical derivation? – Bib-lost Apr 14 '16 at 19:28
  • @Bib-lost, the absence of double roots is routinely checked by inspecting that $f$ and its derivative are coprime. The uniqueness of a positive root is obvious since $f$ increases on $[0,\infty)$. Everything else is done numerically. – zhoraster Apr 15 '16 at 01:56
  • Thanks a lot for your help. Most of the techniques you used were known to me, but to see them used together so smoothly was refreshing. I had never thought complex analysis would come in handy in this problem. I learned a lot. – Bib-lost Apr 15 '16 at 08:11
0

Intuitively it seems like you should bet on this event occurring in the first 10 flips.

Reasoning: if you pick any particular window of ten consecutive flips, you have a $\frac{1}{2^9}$ probability that those ten flips form a sequence of interest.

However, we specifically want the first such sequence, which means that there will be constraints on the earlier flips (to avoid an earlier sequence of this type). Therefore the probability of a given window of ten flips being the first such sequence is less than or equal to $\frac{1}{2^9}$, with equality only for the first ten flips.
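This can be verified exactly with a short state recursion over the current streak length (my own sketch, not part of the original answer): $k=10$ is indeed the unique maximizer of $P(X=k)$.

```python
# Iterate the distribution of the current switch-flip streak and record
# P(X = k): the probability that the FIRST ten-switch run ends exactly at flip k.
p = [0.0] * 10          # p[i], i = 1..9: prob. the current streak has length i
p[1] = 1.0              # after flip 1
pmf = {}
for k in range(2, 101):
    pmf[k] = 0.5 * p[9]             # one more switch completes the run at flip k
    new = [0.0] * 10
    new[1] = 0.5 * sum(p[1:])       # the failing half of every state restarts at 1
    for i in range(2, 10):
        new[i] = 0.5 * p[i - 1]     # the succeeding half extends the streak
    p = new

mode = max(pmf, key=pmf.get)
print(mode, pmf[mode] == 2 ** -9)  # 10 True
```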

paw88789
  • But we're not trying to maximise the probability of being exactly right, but of being closest. (I didn't downvote.) – joriki Apr 12 '16 at 15:58
  • I indeed intended the question as @joriki interpreted it. I'm sorry if something's unclear. – Bib-lost Apr 12 '16 at 20:02
  • To maximize the chance of being closest, it seems to me that you need to know additional information, such as how many total guessers there are, and how their guesses may be distributed. If there are a lot of guessers, it is not unlikely that you have to guess exactly in order to win. – paw88789 Apr 14 '16 at 20:19