
This question occurred to me while I was studying some basic probability and statistics:

Problem

What is the probability of encountering $5$ consecutive equal rolls in $100$ dice rolls?

Attempt 1

First, I reasoned that a single roll of a die has $6$ outcomes, so a run of $5$ consecutive equal faces should have probability $1/6^5$. However, there are $100$ rolls in total, which means the block of $5$ consecutive faces can be placed at $96$ different positions within that string of $100$. So I get $96/6^5$.

But then I realized that the 1st roll of the block can be any of the faces $1,2,3,4,5,6$; only the 2nd roll (and each roll after it) is restricted, matching with probability $1/6$, because the outcome of the 1st roll determines what the 2nd roll must be. That makes it $96/6^4$.
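A side note on why Attempt 1 comes out too high (this reading is mine, not part of the original attempt): $96/6^4$ is exactly the expected number of 5-roll windows whose faces are all equal, by linearity of expectation. Overlapping windows are counted more than once (a run of $6$ equal rolls contains two such windows), so this number only bounds the probability from above. The Python sketch below checks this by brute force on tiny cases; `expected_windows` and `brute_force` are made-up helper names.

```python
from fractions import Fraction
from itertools import product


def expected_windows(n, q, l):
    """Attempt 1's count: (n - l + 1) windows, each all-equal with prob. 1/q**(l-1)."""
    return Fraction(n - l + 1, q ** (l - 1))


def brute_force(n, q, l):
    """Enumerate all q**n sequences (feasible only for tiny n).

    Returns (average number of all-equal length-l windows,
             probability that at least one such window occurs).
    """
    total_windows = 0
    sequences_with_run = 0
    for seq in product(range(q), repeat=n):
        count = sum(1 for i in range(n - l + 1) if len(set(seq[i:i + l])) == 1)
        total_windows += count
        if count > 0:
            sequences_with_run += 1
    return Fraction(total_windows, q ** n), Fraction(sequences_with_run, q ** n)


print(expected_windows(100, 6, 5))                       # Attempt 1: 96/6^4
print(expected_windows(6, 2, 3), brute_force(6, 2, 3))  # 6 coin flips, streak of 3
```

For $6$ coin flips and streaks of $3$, the expected number of windows is exactly $1$, while the probability of seeing at least one streak is $38/64 = 19/32$.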

Unsatisfied with my answer, I posted the question online: Reddit Post (inaccurate answer), Facebook Post.

Attempt 2 and sanity check

This relies on a recursive approach. See my own answer to the question.

Question

Is this correct? Is there a better way to approach the problem?

  • I checked with Markov chains that you're correct! Congratulations on finding the dynamic programming (recursion) approach. Wolfram Alpha link – Benjamin Wang Sep 20 '24 at 09:19
  • Is there a better way to approach this problem? I don't even know how I did it. I just combined my basic knowledge and came up with this approach. – Gio Tungul Sep 20 '24 at 10:03
  • Arguably your way is computationally efficient and uses the least amount of theory. If you want to learn the Markov chain way, you should study it up to "absorption probabilities" (Chapter 3). Also, if you clean up your question with some MathJax (which I've started), it will be a useful resource for everyone else. It's the most detailed question I've ever seen from a new user. You can even cut-and-paste your 2nd attempt and post it as your own answer. – Benjamin Wang Sep 20 '24 at 11:28

2 Answers


Let $A_{n,k}$, for $k=1,2,3,4$, be the number of "bad" sequences (no $5$ consecutive equal rolls) of length $n$ whose trailing run has length $k$ (that is, the last $k$ faces are equal, but the last $k+1$ are not).

Let $B_{n}= \sum_{k=1}^4 A_{n,k}$ be the total number of "bad" sequences of length $n$.

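Then, for example, $A_{21,3} = A_{20,2}$ (a bad sequence of length $21$ ending in a run of exactly $3$ is a bad sequence of length $20$ ending in a run of exactly $2$, followed by a repeat of its last face). And $A_{21,1} = 5 \times (A_{20,1}+A_{20,2}+A_{20,3}+A_{20,4}) = 5 \, B_{20}$ (any bad sequence of length $20$, followed by one of the $5$ faces that differ from its last face).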

That is

$$A_{n,k} = \begin{cases} A_{n-1,k-1} & k>1 \\ 5 \sum_{j=1}^4 A_{n-1,j} = 5 B_{n-1}& k=1 \end{cases} \tag 1 $$
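Recursion $(1)$ can be run directly. A minimal Python sketch, assuming a fair die with $q=6$ faces and forbidden run length $l=5$ (the parameter names `q`, `l` and the helper names are mine, for illustration): it keeps the vector $(A_{n,1},\dots,A_{n,4})$, advances it one roll at a time, and compares $B_n=\sum_k A_{n,k}$ with brute-force enumeration where that is feasible.

```python
from itertools import product


def run_length_counts(n, q=6, l=5):
    """[A_{n,1}, ..., A_{n,l-1}]: bad length-n sequences by length of the trailing run."""
    a = [q] + [0] * (l - 2)          # n = 1: each single roll is a run of length 1
    for _ in range(n - 1):
        b = sum(a)                   # B_{n-1}
        # k = 1: the new face differs from the last one ((q - 1) choices);
        # k > 1: the new face repeats the last one, extending a run of length k - 1.
        # A run of length l - 1 cannot be extended, so the last entry is dropped.
        a = [(q - 1) * b] + a[:-1]
    return a


def brute_force_bad(n, q=6, l=5):
    """Count length-n sequences with no l consecutive equal faces, by enumeration."""
    return sum(
        1
        for s in product(range(q), repeat=n)
        if all(len(set(s[i:i + l])) > 1 for i in range(n - l + 1))
    )


B = {n: sum(run_length_counts(n)) for n in range(1, 12)}
print(B[5], B[6], B[7])
```

The printed values are $B_5$, $B_6$, $B_7$; they coincide with $a_5$, $a_6$, $a_7$ in the other answer, and the sequence satisfies $(2)$ from $n=6$ on.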

Summing $(1)$ over $k$ we get

$$B_n = 5 B_{n-1} + B_{n-1} -A_{n-1,4}$$

Also, $A_{n-1,4}=A_{n-2,3}=A_{n-3,2}=A_{n-4,1}= 5 B_{n-5}$. Hence

$$B_n = 6 B_{n-1} - 5 B_{n-5} \tag2$$

which agrees with your recursion. The initial conditions are $B_n = 6^n$ for $n=1,\dots,4$ and $B_5= 6^5-6$, and $(2)$ holds for $n \ge 6$.

The recursion $(2)$ is a linear difference equation. Its characteristic equation $x^5=6 x^4 -5$ has five roots; the dominant one is $\lambda = 5.99613201072218$. Hence, asymptotically we should have $B_n \approx \alpha \lambda^n$ with $\alpha \approx 1$. And, indeed, under this approximation the probability of avoiding a run of $5$ is $p = B_{100}/6^{100} \approx (\lambda/6)^{100} \approx 0.937548043$.
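The dominant root and the asymptotic estimate can be reproduced numerically. A minimal Python sketch using plain bisection on $f(x)=x^5-6x^4+5$ (my choice of root finder; any would do). Since $f'(x)=x^3(5x-24)>0$ for $x>24/5$ and $f$ changes sign on $[5.9,6]$, that bracket contains exactly one root.

```python
def f(x):
    """Characteristic polynomial of B_n = 6 B_{n-1} - 5 B_{n-5}."""
    return x**5 - 6 * x**4 + 5


def bisection(lo, hi, iters=200):
    """Plain bisection; assumes f(lo) < 0 < f(hi) and a single root in between."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lam = bisection(5.9, 6.0)
p_asym = (lam / 6) ** 100   # B_100 / 6^100 under the approximation B_n ≈ λ^n
print(lam, p_asym)
```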

Alternatively, we can write $(2)$ as a first-order multidimensional linear recurrence $C_n= A C_{n-1}$ with

$$ A = \pmatrix{6 & 0& 0& 0& -5\\ 1 &0&0&0&0 \\ 0 &1&0&0&0 \\ 0 &0&1&0&0 \\ 0 &0&0&1&0 \\ }$$

and $C_n =(B_n, B_{n-1}, \cdots B_{n-4})^t$. Then $C_{100}=A C_{99} = A^{95} C_{5}$. This solution can also be obtained using Markov chains.

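In Matlab/Octave:

```octave
output_precision(7)                          % Octave: print 7 significant digits
A = [6,0,0,0,-5; 1,0,0,0,0; 0,1,0,0,0; 0,0,1,0,0; 0,0,0,1,0];
C100 = A^95 * [6^5-6, 6^4, 6^3, 6^2, 6]';    % C_100 = A^95 C_5
C100(1) / 6^100                              % B_100 / 6^100
```

gives $0.9398522$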
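The Octave computation uses floating point. Since every entry of $A^{95}$ and $C_5$ is an integer, the same product can also be computed exactly. Here is a sketch in Python (my choice of language, not the answer's), with hypothetical helpers `mat_mul` and `mat_pow` doing matrix exponentiation by repeated squaring:

```python
from fractions import Fraction

A = [
    [6, 0, 0, 0, -5],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]


def mat_mul(X, Y):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def mat_pow(M, e):
    """M**e by repeated squaring, using exact Python integers."""
    result = [[int(i == j) for j in range(len(M))] for i in range(len(M))]
    while e:
        if e & 1:
            result = mat_mul(result, M)
        M = mat_mul(M, M)
        e >>= 1
    return result


C5 = [6**5 - 6, 6**4, 6**3, 6**2, 6]           # (B_5, B_4, B_3, B_2, B_1)
A95 = mat_pow(A, 95)
B100 = sum(a * c for a, c in zip(A95[0], C5))  # first entry of A^95 C_5
p = Fraction(B100, 6**100)
print(B100)
print(float(p))
```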

A different (quite quick and rough) approximation. The probability of the event $E_i \equiv$ "the five rolls starting at $i$ are not all equal" is $P(E_i)=1-6/6^5 = 0.9992283951$. We are interested in $p =P(E_1, E_2 \cdots E_{96}) \approx P(E_i)^{96} = 0.928576355$. This is only an approximation, because the events are not independent.
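The rough estimate is a one-liner. A Python sketch with exact fractions, comparing it against the value $0.9398522$ obtained above (hard-coded here, not recomputed):

```python
from fractions import Fraction

p_window = 1 - Fraction(6, 6**5)   # P(E_i) = 1 - 1/1296, exact
rough = p_window ** 96              # pretends E_1, ..., E_96 are independent
exact_quoted = 0.9398522            # value from the matrix computation above

print(float(p_window), float(rough), exact_quoted - float(rough))
```

The rough value lands a little over one percentage point below the exact one.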

leonbloy

Solution

I wondered: what if, instead of finding the probability of getting $5$ consecutive equal faces in $100$ rolls, I calculated the probability of not getting $5$ consecutive equal faces in those $100$ rolls? So I set up restrictions and cases.

Let $a_n$ be the number of ways to avoid $5$ consecutive equal rolls in $n$ rolls. Then our answer is $1-a_{100}/6^{100}$.

In the first $4$ positions, there is nothing to avoid, so $a_n=6^n$ for $n\le 4$, i.e. $(6, 36, 216, 1296)$.

For $a_5$, there are $6^5$ outcomes in total, but this is where we exclude the streaks $(1,1,1,1,1)$, $(2,2,2,2,2)$, $(3,3,3,3,3)$, $(4,4,4,4,4)$, $(5,5,5,5,5)$ and $(6,6,6,6,6)$. Taking these restrictions into account,

$$a_5 = 6^5-6=7770.$$

For $a_6$, this is where the recursion starts. First, multiply $a_5$ by $6$, the number of outcomes of a die roll; then subtract the product of $a_1$ and $5$, which accounts for the $5$ consecutive faces we are trying to exclude.

$$a_6=6a_5-5a_1=(7,770\times 6)-(6\times5)=46,620-30=46,590$$

For $a_7$, we repeat the previous step: multiply $a_6$ by $6$, then subtract the product of $a_2$ and $5$, which is the restriction.

$$a_7=6a_6-5a_2=(46,590\times6)-(36\times5) =(279,540)-(180)=279,360$$

And now you get the pattern

$$a_n=6a_{n-1}-5a_{n-5} \ \ \text{for} \ \ n\ge 6.$$
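The same recursion can be run with exact integers instead of a spreadsheet. A minimal Python sketch (the function name `count_avoiding` is made up); it reproduces the hand-computed values $a_5$, $a_6$, $a_7$ and the final probability:

```python
from fractions import Fraction


def count_avoiding(n):
    """a_n: number of sequences of n die rolls with no 5 consecutive equal faces."""
    a = [None]  # dummy entry so that a[k] holds a_k
    for k in range(1, n + 1):
        if k <= 4:
            a.append(6**k)                        # nothing to avoid yet
        elif k == 5:
            a.append(6**5 - 6)                    # drop the 6 all-equal sequences
        else:
            a.append(6 * a[k - 1] - 5 * a[k - 5])
    return a[n]


print([count_avoiding(k) for k in range(1, 8)])
# [6, 36, 216, 1296, 7770, 46590, 279360]

X = Fraction(count_avoiding(100), 6**100)
print(float(X), float(1 - X))
```

Using `Fraction` keeps the ratio exact until the final conversion to `float`.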

The Google Sheets calculation gives $a_{100}\approx 6.14023 \times 10^{77}$ ways to avoid $5$ consecutive equal rolls in $100$ rolls. So

$$X := \frac{a_{100}}{6^{100}} \approx \frac{6.14023\times 10^{77}}{6.53319\times 10^{77}} \approx 0.9398522064 \approx 93.99\%$$

So the answer is $1-X\approx 0.06014779357\approx6.01\%$.

Sanity check

I was surprised by the result because I thought $X$ was too big. So I used the same method on an easier problem that can be checked by hand:

What is the probability of encountering $3$ consecutive equal flips in $6$ coin flips?

The number of possible outcomes is only $2^6=64$ outcomes. Let $a_n$ be the number of ways to avoid $3$ consecutive flips in $n$ flips. Then our answer is $1-a_{6}/2^6$.

In the first $2$ positions, there is nothing to avoid, so $a_n=2^n$ for $n\le 2$, i.e. $(2, 4)$.

For the third position there are $2^3$ outcomes, but since we exclude the streaks $(0,0,0)$ and $(1,1,1)$, we get

$$a_3 = 2^3-2=6.$$

For the fourth position, just as with the dice, this is where the recursion starts: multiply $a_3$ by $2$, since a coin flip has $2$ outcomes, then subtract the product of $a_1$ and $1$, which accounts for the $3$ consecutive faces we are trying to exclude.

$$a_4 =2a_3-a_1=(6\times2)-(2\times1)=10$$

Just repeat for fifth and sixth position.

$$a_5 =2a_4-a_2=(10\times2)-(4\times1)=16$$ $$a_6 =2a_5-a_3=(16\times2)-(6\times1)=26$$

So I get $a_{6}=26$ ways to avoid 3 consecutive flips in 6 flips. So the answer is

$$ 1-\frac{a_6}{2^6} =1-\frac{26}{64} =0.59375 \approx 59.38\%$$

That also means there are $64-26=38$ ways of encountering $3$ consecutive equal flips among the $64$ possible outcomes.

Here's the manual check on Google Sheets, which confirms the answer.
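The manual check can also be scripted: list all $2^6=64$ flip sequences and count those containing three equal flips in a row. A short Python sketch using `itertools.product` (the helper `has_streak` is a made-up name), standing in for the spreadsheet:

```python
from itertools import product


def has_streak(seq, l):
    """True if seq contains l consecutive equal entries."""
    return any(len(set(seq[i:i + l])) == 1 for i in range(len(seq) - l + 1))


sequences = list(product((0, 1), repeat=6))
with_streak = [s for s in sequences if has_streak(s, 3)]
print(len(sequences), len(with_streak), len(with_streak) / len(sequences))
```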

This suggests that my solution to the original problem is correct.

Clarification for the subtraction part of the recursion

The subtraction is the part of the recurrence relation that corrects for invalid sequences containing a streak. I want to count the valid sequences (those without a streak), but some of the extended sequences contain streaks that we don't want. The subtraction removes those invalid sequences.

As an example, let's use coin flips.

Let's say we want to avoid a streak of $3$ consecutive heads $(0,0,0)$ or tails $(1,1,1)$. Once the recursion starts, a streak of $3$ consecutive faces can overlap with a streak ending at the previous position; that case has already been accounted for in $a_{n-1}\times q$, because $a_{n-1}$ only counts valid sequences.

To make this clear: the streak $(0,0,0,0,0,0)$ at the $6$th position contains the streak $(0,0,0,0,0)$ at the $5$th position, which in turn contains the streaks at the earlier positions.

The recurrence relation for coin flips is:

$$a_n=a_{n-1}\times 2-a_{n-3}\times 1$$

$a_{n-1}\times 2$ counts all sequences of length $n$ obtained by appending one flip to a valid sequence of length $n-1$.

$a_{n-1}$ is the number of valid sequences of length $n-1$.

$2$ is the number of outcomes (heads and tails).

$-a_{n-3}\times 1$ removes the invalid sequences that end with a streak of $3$ identical outcomes.

For example, if the last $3$ flips are $(0,0,0)$ or $(1,1,1)$, these are the streaks I want to exclude; streaks ending at earlier positions were already handled at those positions, as I explained above.

$a_{n-3}$ represents the number of valid sequences of length $n-3$.

If I append $(0,0,0)$ or $(1,1,1)$ to these sequences, I will create an invalid sequence of length $n$ with a streak of $3$ consecutive outcomes.

$1$: there is only $1$ streak for heads, $(0,0,0)$, and $1$ streak for tails, $(1,1,1)$, so we subtract $1$ copy of each possible streak, hence multiplying by $1$.
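The size of the subtracted term can be checked directly on small cases. The Python sketch below (helper names are hypothetical) extends every valid sequence of length $n-1$ by one more outcome, counts the extensions that become invalid, and compares that count with $(q-1)\,a_{n-l}$. The underlying reasoning is the one in Benjamin Wang's comment: a newly invalid extension must end in a run of exactly $l$, i.e. a valid prefix of length $n-l$ followed by $l$ copies of a face that differs from the prefix's last face.

```python
from itertools import product


def has_streak(seq, l):
    """True if seq contains l consecutive equal entries."""
    return any(len(set(seq[i:i + l])) == 1 for i in range(len(seq) - l + 1))


def valid(n, q, l):
    """All length-n sequences over range(q) with no streak of length l."""
    return [s for s in product(range(q), repeat=n) if not has_streak(s, l)]


def newly_invalid(n, q, l):
    """Number of one-step extensions of valid length-(n-1) sequences that contain a streak."""
    return sum(
        1 for s in valid(n - 1, q, l) for x in range(q) if has_streak(s + (x,), l)
    )


for q, l, n in [(2, 3, 4), (2, 3, 6), (3, 3, 7), (6, 5, 6)]:
    print(q, l, n, newly_invalid(n, q, l), (q - 1) * len(valid(n - l, q, l)))
```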

Here's a visual representation of why we subtract $a_{n-l}\times(q-1)$.

Breakdown of the formula

$$a_n = \begin{cases} q^n & \text{if } n < l \\ q^l - q & \text{if } n = l \\ a_{n-1} \times q - a_{n-l} \times (q-1) & \text{if } n > l \end{cases}$$

$n$ = number of positions (rolls or flips)

$q$ = number of possible outcomes (e.g., $2$ for coin flips, $6$ for dice rolls)

$l$ = length of the streak of consecutive equal faces to avoid (e.g., $3$ in the coin example, $5$ in the dice problem)
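The general formula is straightforward to code for any $q$ and $l$. A minimal Python sketch (the function name `a` mirrors the notation above; the brute-force comparison `a_brute` is an added check, feasible only for small $n$):

```python
from itertools import product


def a(n, q, l):
    """Number of length-n sequences of q outcomes with no l consecutive equal ones."""
    vals = {}
    for k in range(1, n + 1):
        if k < l:
            vals[k] = q**k
        elif k == l:
            vals[k] = q**l - q
        else:
            vals[k] = vals[k - 1] * q - vals[k - l] * (q - 1)
    return vals[n]


def a_brute(n, q, l):
    """Same count by exhaustive enumeration of all q**n sequences."""
    return sum(
        1
        for s in product(range(q), repeat=n)
        if all(len(set(s[i:i + l])) > 1 for i in range(n - l + 1))
    )


for q, l, n in [(2, 3, 6), (3, 2, 5), (3, 4, 8), (6, 5, 6)]:
    print(q, l, n, a(n, q, l), a_brute(n, q, l))
```

For the original problem, `a(100, 6, 5) / 6**100` gives the probability of avoiding a run of $5$.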

  • Ideally I'd like to see more justification about why you're subtracting $5a_{n-5}$. Currently, the explanation "subtract it with the product of $a_1$ and $5$, which is the $5$ consecutive faces we are trying to limit" is a little unclear. I think you understand it internally (since you correctly applied it to the sanity check), but can you try to explain it more precisely? – Benjamin Wang Sep 21 '24 at 04:35
  • My Reasoning for Subtracting

    My thinking about the subtraction part is this: it corrects for the invalid sequences where the last 5 rolls form 5 consecutive equal faces.

    $a_{n-5}$ is the number of valid sequences of length $n-5$, and if I append a streak of 5 identical rolls to such a sequence, I'll create an invalid sequence.

    There are 5 possible outcomes that can form such a streak; we only subtract streaks of 5 consecutive identical rolls. This is why I multiply by $5$.

    – Gio Tungul Sep 21 '24 at 06:32
  • Same goes for coin flips: $a_{n-3}$ is the number of valid sequences of length $n-3$. If I append the identical outcomes $(0,0,0)$ or $(1,1,1)$, I'll create an invalid sequence.

    There are 2 possible streaks of 3 consecutive faces for a coin: one for heads and one for tails. But since you're only looking to subtract the streak of identical faces, you're subtracting 1 (because there's only one type of streak for each possible outcome).

    This is why I multiply by 1, as I am only accounting for the single streak per sequence of 3 rolls.

    – Gio Tungul Sep 21 '24 at 06:34
  • My Simplest Reasoning for Subtraction

    I just thought that since there are 6 outcomes for a die, there should be 5 invalid sequences for whichever of the 6 possible runs of 5 consecutive faces might occur. Same goes for coin flips: there are 2 outcomes for a coin, so there should be 1 invalid sequence for whichever of the 2 possible runs of 3 consecutive faces might occur.

    Basically, total outcome minus 1.

    I hope I explained it well.

    – Gio Tungul Sep 21 '24 at 06:36
  • Your math is right, but the justification (to me only -- maybe others find this totally clear) seems to miss the mark. Let $s_i$ be the outcome of roll $i$. I'd reason as follows: at step $n$ you're removing the sequences that have just become invalid, in particular those with $s_n = s_{n-1}=s_{n-2}=s_{n-3}=s_{n-4}$ but $s_{n-4}\neq s_{n-5}$. (If you don't enforce this constraint, then you're removing sequences you've already removed in earlier steps.) So the $5$ comes from the restriction $s_{n-4}\neq s_{n-5}$ (e.g. if $s_{n-5}=2$, then remove those with $s_{n-4}=1,3,4,5,6$, so $5$ options.) – Benjamin Wang Sep 21 '24 at 09:37
  • Hello Sir, I can't reply with my answer here since the number of characters is limited. I've added my response at the bottom of my answer, where I can edit it. I hope it answers your question about the subtraction part. Thank you – Gio Tungul Sep 21 '24 at 13:37