Probability of Success runs in Bernoulli trials

Question

I want to calculate the probability of getting a sequence of (at least) $r$ consecutive successes in $n \ge r$ Bernoulli trials (probability = $p$). To make my definition of $r$ consecutive successes clear, consider the following sequence:

$$S\, S\, S\, S\, F\, S\, S\, S\, S\, S\, S\, F$$

Assuming $r=3$, in the above sequence, there is no consecutive sequence of 3 successes, but there is a sequence of 4 successes and a sequence of 6 successes. More precisely, a sequence of $r$ consecutive successes can have $3$ forms:

$$\underbrace{S\cdots S}_{r}\,F \qquad F\,\underbrace{S\cdots S}_{r}\,F \qquad F\,\underbrace{S\cdots S}_{r}.$$

Note that in certain definitions the sequence $F\, S\, S\, S\, S\, S\, S\, F$ could contain two sequences of $3$ successes and three of $2$ successes, but not in my case. I am making this distinction so as not to "count" the same sequence more than once.

The probability I want to calculate is the sum of the probabilities of having a consecutive sequence of sizes $r, r+1, r+2, \cdots, n$. That's why I used the term "at least".

First I thought of using the formula presented in Feller on p. 325 of An Introduction to Probability Theory and Its Applications, equation 7.11, which can also be found in this answer. Looking at the definition he uses of sequence of successes, presented on page 305, I don't know if it fits what I'm looking for.

I also thought about Muselli's article Simple Expressions for Success Run Distributions in Bernoulli Trials. At first I thought it calculated the same probability as Feller's equation, but in some cases the two give different values, and the article's formula also behaves strangely for different values of $p$.

Taking $p = 0.5$, $n=5$, $k=2$ and $x=1$ in the article's equation, we obtain a probability of $0.5625$, whereas the book gives $1-q_5 = 0.59375$. So, besides a reference for calculating this probability, I would like to know: what is the difference between the article and the book?
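For small $n$, both numbers can be checked by brute-force enumeration. The sketch below counts maximal success runs of length at least $k$ in every one of the $2^n$ outcomes; the function names `success_runs` and `run_count_distribution` are made up for illustration and come from neither reference.

```python
from itertools import product

def success_runs(seq, k):
    """Count maximal runs of successes (1s) of length >= k in seq."""
    runs, current = 0, 0
    for trial in seq:
        if trial:
            current += 1
        else:
            if current >= k:
                runs += 1
            current = 0
    if current >= k:  # a run may end at the last trial
        runs += 1
    return runs

def run_count_distribution(n, k, p):
    """Return {x: P(exactly x maximal success runs of length >= k)}."""
    q = 1.0 - p
    dist = {}
    for seq in product((0, 1), repeat=n):
        s = sum(seq)  # number of successes in this outcome
        x = success_runs(seq, k)
        dist[x] = dist.get(x, 0.0) + p**s * q**(n - s)
    return dist

dist = run_count_distribution(5, 2, 0.5)
print(dist[1])        # exactly one run: 0.5625
print(1.0 - dist[0])  # at least one run: 0.59375
```

With this counting rule, the 12-trial sequence displayed earlier contains two runs for $r=3$ (the run of 4 and the run of 6), each counted once.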

I am confused by your definition. You add "at least" twice, yet claim that your sequence does have a sequence of four and of six. So, you can't mean "at least", as then having six would imply having four and having three. However, your next display with $m \ge r$ suggests that you really do mean "at least". Can you clarify? — Sam OT, Sep 30 '24 at 15:31
@SamOT Sorry, it's actually a bit difficult to make it clear. As I said, I want to calculate all the possibilities of obtaining a sequence of $r, r+1, r+2, \dots$ consecutive successes. But in Feller's book, in the sequence I presented, he will count the sequence $F, S, S, S, S, S, S, F$ several times: if $r=2$, he counts 3 sequences of two successes and 2 sequences of 3 successes. But I would like to consider this as being only one sequence of $6$ successes. — Mrcrg, Sep 30 '24 at 15:45
But the final probability I want to calculate takes into account all cases greater than or equal to $r$. I think the way Feller counts the sequences may increase the probability, but maybe in the end our ideas are the same, I don't know. — Mrcrg, Sep 30 '24 at 15:46
A sequence of $r,r+1,r+2,\cdots$ consecutive successes. Do you mean like $F^+SSSF^+SSSSF^+SSSSSF^+SSSSSSF^+$ ? — , Sep 30 '24 at 19:00
@YvesDaoust No, given a sequence of 100 successes and failures. I want to know the probability of having a sequence of at least $r$ consecutive successes. This includes the possibility of $r+1$ consecutive successes, also $r+2$ and so on. — Mrcrg, Sep 30 '24 at 20:36
A possible line of attack is to compute the histogram of the lengths of the $S$-sequences incrementally. Assume you have the histogram for the length $n$. Then in half of the cases, you will append an $F$, and the histogram does not change. In the other half, you will lengthen the final $S$-sequence by one unit (it could be empty). So you also need the histogram of the lengths of the final $S$-sequence. — , Sep 30 '24 at 21:04
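
A rough sketch of this incremental idea, simplified to track only the length of the trailing $S$-sequence rather than the full histogram, which is enough for the "at least one run of length $r$" probability. The simplification, the function name `prob_run_at_least`, and the use of a general $p$ instead of one half are assumptions made here, not part of the comment above.

```python
def prob_run_at_least(n, r, p):
    """P(at least one run of >= r consecutive successes in n trials), r >= 1."""
    q = 1.0 - p
    # state[j] = P(no run of length r yet, trailing run of successes has length j)
    state = [0.0] * r
    state[0] = 1.0
    hit = 0.0  # probability that a run of length r has already occurred
    for _ in range(n):
        new = [0.0] * r
        new[0] = q * sum(state)        # a failure resets the trailing run
        for j in range(r - 1):
            new[j + 1] = p * state[j]  # a success lengthens the trailing run
        hit += p * state[r - 1]        # a success completes a run of length r
        state = new
    return hit

print(prob_run_at_least(5, 2, 0.5))  # 0.59375
```

For $p = 0.5$, $n = 5$, $r = 2$ this reproduces the value $1 - q_5 = 0.59375$ quoted in the question.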

user2661923 · Answer 1 · 2024-10-01T20:23:52.020

Edited to correct typos.

To the best of my knowledge, there are several means of attack. I reject Inclusion-Exclusion as being too unwieldy here. Generating functions may offer a solution. However, since I am totally ignorant of generating functions, I will have to leave that approach to someone else.

I suspect that recursion may be do-able. However, the fact that this is a probability problem rather than merely an enumeration problem seems to also make recursion problematic.

In this answer, I will use Stars and Bars.

For Stars and Bars theory, see this article and this article.

Throughout my analysis, I will assume that $~q = (1-p).$

Throughout this answer, I will adopt the convention that

$$\binom{a}{b} = 0 ~: ~a < b.$$

For illustrative purposes, first assume that $~n = 20, r = 7.~$

Then, the desired computation is

$$\sum_{k=7}^{20} \left\{ ~p^k q^{20-k} \left[ ~\binom{20}{k} - f(20,7,k) ~\right] ~\right\},$$

where $~f(n,r,k)~$ denotes the number of arrangements of exactly $~k~$ successes among $~n~$ trials in which no run of $~r~$ consecutive successes occurs.

Consider the following tableau, which is based on $~k = 7.$

- F - F - F - F - F - F - F - F - F - F - F - F - F -

The $~(20 - k) = (20 - 7) = 13~$ failures create $~(13 + 1) = 14~$ islands. Reading these islands from left to right, let $~x_1, ~x_2, ~\cdots, ~x_{14}~$ denote the respective sizes of these islands. Then, $~f(20,7,7)~$ represents the number of solutions to the following enumeration problem:

- $x_1 + x_2 + \cdots + x_{14} = k = 7.$
- $x_1, ~x_2, ~\cdots, ~x_{14} \in \Bbb{Z_{\geq 0}}.$
- $x_1, ~x_2, ~\cdots, ~x_{14} \in \Bbb{Z_{\leq (7-1)}}.$

The idea behind the third bullet point above is that there will be no occurrence of $~7~$ consecutive successes if and only if each of the $~14~$ variables is $~\leq 6.$

To enumerate the above problem, I will follow the model in this answer.

Keeping in mind the convention that $\displaystyle \binom{a}{b} = 0 ~: ~a < b,$ you have that

$$f(20,7,7) = \sum_{w=0}^{14} (-1)^w T_w,$$

where

$$T_w = \binom{14}{w} \times \binom{20 - [7w]}{13}.$$
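
As a worked check of this sum (my own arithmetic, using only the convention above): only $~w = 0~$ and $~w = 1~$ contribute, because $~20 - 7w < 13~$ for $~w \geq 2.~$ Therefore

$$f(20,7,7) = \binom{14}{0}\binom{20}{13} - \binom{14}{1}\binom{13}{13} = 77520 - 14 = 77506.$$

The $~14~$ excluded arrangements are exactly those in which all $~7~$ successes fall into a single island.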

The analysis in the specific case of $~n = 20, r = 7,~$ easily generalizes. The desired computation is

$$\sum_{k=r}^{n} \left\{ ~p^k q^{n-k} \left[ ~\binom{n}{k} - f(n,r,k) ~\right] ~\right\}.$$

Here, $~f(n,r,k)~$ is the number of solutions to

- $x_1 + x_2 + \cdots + x_{(n+1-k)} = k.$
- $x_1, ~x_2, ~\cdots, ~x_{(n+1-k)} \in \Bbb{Z_{\geq 0}}.$
- $x_1, ~x_2, ~\cdots, ~x_{(n+1-k)} \in \Bbb{Z_{\leq (r-1)}}.$

Then,

$$f(n,r,k) = \sum_{w=0}^{n+1-k} (-1)^w T_w,$$

where

$$T_w = \binom{n+1-k}{w} \times \binom{n - [rw]}{n - k}.$$
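
A minimal Python sketch of the two formulas above, for anyone who wants to test them numerically. The name `f` mirrors $~f(n,r,k),~$ while `prob_run_stars_bars` is a name made up for this illustration. The early `break` implements the convention that the binomial coefficient is $~0~$ when its top entry is smaller than its bottom entry.

```python
from math import comb

def f(n, r, k):
    """Arrangements of k successes in n trials with no run of r successes."""
    islands = n + 1 - k
    total = 0
    for w in range(islands + 1):
        top = n - r * w
        if top < n - k:  # binom(top, n - k) = 0 from here on
            break
        total += (-1) ** w * comb(islands, w) * comb(top, n - k)
    return total

def prob_run_stars_bars(n, r, p):
    """P(at least one run of >= r consecutive successes in n trials)."""
    q = 1.0 - p
    return sum(
        p**k * q ** (n - k) * (comb(n, k) - f(n, r, k))
        for k in range(r, n + 1)
    )

print(f(20, 7, 7))                     # 77506
print(prob_run_stars_bars(5, 2, 0.5))  # 0.59375
```

For $~n = 5, ~r = 1~$ the function `f` returns $~0~$ for every $~k \geq 1,~$ so the probability reduces to $~1 - q^5,~$ as it should when a single success already counts as a run.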

I may be wrong, but your equation seems to have the same strange behavior that I noticed in Muselli's article. For example, $n = 5, r = 1$ and $p = 0.9$ I get a probability of $0.2053$, but if I change $p$ to $0.5$ the probability is $0.8125$, which seems strange to me. — Mrcrg, Sep 30 '24 at 21:48
@Mrcrg If $~n = 5, ~r = 1, ~p = 0.9,~$ then the probability of $~0.2053,~$ can't be right. For example, the probability that the first two trials are successes is $~.9^2 = 0.81,~$ so the overall probability must be $~> 0.81.$ — user2661923, Sep 30 '24 at 22:27
Looking at your reasoning, I didn't find any error, but I wrote some code in Python and found this answer. I just did the math again by hand and found the same result. My values for the function $f$ were, $f(5,1,1) =0, f(5,1,2)=0, f(5,1,3)=1, f(5,1,4)=3, f(5,1,5)=1$, and the final sum was $0.00045 + 0.0081 + 0.06561 + 0.13122 + 0 =0.2053$. — Mrcrg, Sep 30 '24 at 23:29
@Mrcrg If it makes a difference, I just found and corrected typos in my formula. See, for example, the last line of my answer. — user2661923, Oct 01 '24 at 20:25

Frank M Stalone · Answer 2 · 2025-02-25T04:02:07.043

Muselli gives the probability of exactly $x$ runs. For your example, you most likely want the probability of at least one run, which is $0.59375$. Muselli's Eqn 13 can also produce that value by adding $f(1)+f(2)$, but better yet, we can tweak the equation: change $\binom{m}{x}$ to $\binom{m-1}{x-1}$.

Edit: for those without access to the paper cited in the question, Muselli's equation is:

Let $M_n^{(k)}$ be the number of success runs with length $k$ or more in $n$ Bernoulli trials.

$$P(M_n^{(k)}=x)=\sum_{m=x}^{\lfloor\frac{n+1}{k+1}\rfloor}(-1)^{m-x}\binom{m}{x}p^{mk}q^{m-1}\left(\binom{n-mk}{m-1}+q\binom{n-mk}{m}\right)$$

Replacing $\binom{m}{x}$ with $\binom{m-1}{x-1}$ gives $P(M_n^{(k)}\ge x)$ as dictated by the inclusion-exclusion principle.
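
A short Python sketch that transcribes the formula as quoted above, together with the tweaked version. The function names `muselli_exact` and `muselli_at_least` are made up for illustration, and the code assumes $x \ge 1$ and $k \ge 1$.

```python
from math import comb

def _muselli_sum(n, k, x, p, at_least):
    q = 1.0 - p
    total = 0.0
    for m in range(x, (n + 1) // (k + 1) + 1):
        rest = n - m * k
        inner = comb(rest, m - 1) + q * comb(rest, m)
        # binom(m, x) gives "exactly x"; binom(m-1, x-1) gives "at least x"
        weight = comb(m - 1, x - 1) if at_least else comb(m, x)
        total += (-1) ** (m - x) * weight * p ** (m * k) * q ** (m - 1) * inner
    return total

def muselli_exact(n, k, x, p):
    """P(exactly x success runs of length >= k in n trials), x >= 1."""
    return _muselli_sum(n, k, x, p, at_least=False)

def muselli_at_least(n, k, x, p):
    """P(at least x success runs of length >= k in n trials), x >= 1."""
    return _muselli_sum(n, k, x, p, at_least=True)

print(muselli_exact(5, 2, 1, 0.5))     # 0.5625
print(muselli_exact(5, 2, 2, 0.5))     # 0.03125
print(muselli_at_least(5, 2, 1, 0.5))  # 0.59375
```

The first and third values are the two numbers from the question, and the first two add up to the third.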

edited Feb 25 '25 at 04:02

answered Feb 25 '25 at 01:26

This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review – Liding Yao Feb 25 '25 at 02:28
@liding-yao OP asked, "what would be the difference between the article and the book?" I explained that the book gives the probability of at least one run, whereas the article gives that of exactly one, thus directly answering OP's question. The more general question is already answered by the very article OP referenced, but I've now edited my answer so that readers other than OP can also benefit. – Frank M Stalone Feb 25 '25 at 04:09
