No overlaps. We are counting runs of at least 5, whereby for example a run of 6 does not count as 2 runs of 5.
I have received an answer to this question from someone with a PhD in Statistics, yet their theoretical answer does not agree with my code simulation.
Theoretically, the answer would be
$\frac{96}{32} - \frac{95}{64} + \frac{94}{128} - \frac{93}{256} + \frac{92}{512} - \cdots$
since we expect $\frac{96}{32}=3$ instances of 5 heads in a row (if we allow overlap), and if we apply the Inclusion-Exclusion Principle, we can correct for double counts of runs of 6, 7, 8, etc.
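As a quick numerical check of the series above, here is a minimal sketch that sums it term by term. It assumes the pattern simply continues (numerators decreasing by 1, denominators doubling) until the numerator reaches 1; the function name `alternating_series` and its parameters are illustrative, not part of the original answer.

```python
from fractions import Fraction

def alternating_series(n=100, run=5):
    """Sum 96/32 - 95/64 + 94/128 - ... under the assumed continuation."""
    first = n - run + 1  # 96 possible starting positions for 5 heads in a row
    total = Fraction(0)
    for k in range(first):
        # k-th term: alternating sign, numerator 96 - k, denominator 2^(5 + k)
        total += (-1) ** k * Fraction(first - k, 2 ** (run + k))
    return total

print(float(alternating_series()))  # approximately 2.007
```

Exact fractions are used so that rounding error cannot be blamed for the gap between this value and the simulated one.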
The problem is that this answer is approximately 2, but my coded simulation always results in about 1.5:
    import numpy as np
    import pandas as pd

    def random_list(length):
        random_list = np.zeros(length) #list of "length" zeros
        for i in range(len(random_list)):
            random_value = np.random.random() #instantiate random value for each i
            if random_value > 0.5: #for approximately half of the random values
                random_list[i] = 1
            else:
                random_list[i] = 0
        return random_list #random_list is now a random list of zeros and ones

    def count_ones(array):
        runs = 0
        i = 0
        while i < (len(array) - 1): #iterate over each index in list
            if array[i] == 1: #find a value of one
                j = i + 1 #the next value is j
                while array[j] == 1: #iterate over indices until we hit a zero or end of array
                    if j == (len(array)-1):
                        break #break out of loop if we are at the end of the list
                    j += 1
                k = j #we now have either the first zero after a list of ones, or we are at the end of the list
                ones = k - i #how many ones in a row
                if ones >= 5:
                    runs += 1 #count this as a run of 5
            else: #if array[i] == 0
                k = i # necessary so that code after if/else conditional runs
            i = k + 1 # loop will iterate over index after k
        return runs

    def average_runs(trials):
        results = np.zeros(trials)
        for i in range(trials): #do it a large number of times
            array = random_list(100)
            runs = count_ones(array)
            results[i] = runs
        average = sum(results)/len(results) #take the average
        return average

    average_runs(1000)
Can anyone explain why the simulation and theory do not agree?
I took my results to a Cambridge PhD in number theory, who said 'Yes… that's what most of my students find.'
My own trial started not with mere binary coin tosses, but roulette spins… though in either case, runs of 12 or 13 in a row were not uncommon.
If your particular simulation doesn't agree with whichever theory you're following, what does that suggest?
Almost separately, why did you need such complex code for such a simple problem?
– Robbie Goodwin May 28 '25 at 20:56
`count_ones` won't count a run of exactly $5$ heads right at the end. Therefore the number your code is estimating is actually $1.5$ :). Unsolicited advice: if you find yourself writing a `for` loop that iterates over a numpy array, either there is a better way to do what you want to achieve or you shouldn't be using numpy. If you're interested in some faster/more compact implementations, I compared a few to your approach here. (@TylerW you may be interested too) – Izaak van Dongen Jun 04 '25 at 15:57
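The end-of-array case raised in the last comment can be checked with a short vectorized sketch. This is not the implementation linked in that comment; it is one possible approach, and the names `count_runs`, `expected_runs` and `simulate` are hypothetical. It pads the tosses with a zero on each side so that a run touching either end still has a start and an end, then counts maximal runs of at least 5. For comparison, linearity of expectation gives a reference value: a qualifying run starts at toss 1 with probability $\frac{1}{32}$, and at each of tosses 2 to 96 with probability $\frac{1}{64}$ (a tail followed by 5 heads), for a total of $\frac{1}{32} + \frac{95}{64} = \frac{97}{64} \approx 1.516$. Dropping the run of exactly 5 at the very end (probability $\frac{1}{64}$) leaves $\frac{96}{64} = 1.5$.

```python
from fractions import Fraction

import numpy as np

def count_runs(tosses, min_len=5):
    """Count maximal runs of 1s with length >= min_len, including a run ending at the last toss."""
    # Pad with a 0 on each side so every run has a 0 -> 1 start and a 1 -> 0 end.
    padded = np.concatenate(([0], np.asarray(tosses, dtype=int), [0]))
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)   # index where each run begins
    ends = np.flatnonzero(changes == -1)    # index just past where each run ends
    return int(np.sum(ends - starts >= min_len))

def expected_runs(n=100, min_len=5):
    """Expected number of maximal runs of length >= min_len for a fair coin."""
    # Start at toss 1: min_len heads. Start at toss 2..n-min_len+1: a tail, then min_len heads.
    return Fraction(1, 2 ** min_len) + Fraction(n - min_len, 2 ** (min_len + 1))

def simulate(trials, n=100, seed=None):
    """Average count_runs over `trials` simulated sequences of n fair tosses."""
    rng = np.random.default_rng(seed)
    tosses = rng.integers(0, 2, size=(trials, n))  # each row is one sequence of 0s and 1s
    return sum(count_runs(row) for row in tosses) / trials

print(count_runs([0] * 95 + [1] * 5))  # 1: a run of exactly 5 at the end is counted
print(float(expected_runs()))          # 1.515625
print(simulate(2000, seed=0))          # simulated average over 2000 trials
```

With enough trials, the simulated average should settle near $\frac{97}{64}$ rather than 1.5, which is consistent with the comment's explanation of the missing end-of-array run.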