
I want to calculate the probability of getting a sequence of (at least) $r$ consecutive successes in $n \ge r$ Bernoulli trials, each with success probability $p$. To make my definition of $r$ consecutive successes clear, consider the following sequence:

$$S\, S\, S\, S\, F\, S\, S\, S\, S\, S\, S\, F$$

Assuming $r=3$, in the above sequence, there is no consecutive sequence of 3 successes, but there is a sequence of 4 successes and a sequence of 6 successes. More precisely, a sequence of $r$ consecutive successes can have $3$ forms:

$$\underbrace{S\cdots S}_{r}\,F \qquad F\,\underbrace{S\cdots S}_{r}\,F \qquad F\,\underbrace{S\cdots S}_{r}.$$

Note that in certain definitions the sequence $F\, S\, S\, S\, S\, S\, S\, F$ could contain two sequences of $3$ successes and three of $2$ successes, but not in my case. I am making this distinction so as not to "count" the same sequence more than once.

The probability I want to calculate is the sum of the probabilities of having a consecutive sequence of sizes $r, r+1, r+2, \cdots, n$. That's why I used the term "at least".
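
To make the counting convention concrete, here is a minimal sketch (the function names `maximal_success_runs` and `has_run_at_least` are my own, chosen only for illustration). It splits a sequence into its maximal blocks of successes, so each block is counted once, and then asks whether any block has length at least $r$.

```python
from itertools import groupby


def maximal_success_runs(trials: str) -> list[int]:
    """Lengths of the maximal blocks of consecutive 'S' in a string of 'S'/'F'."""
    # groupby yields one group per maximal block of equal characters,
    # so a block of six S's is reported once, as a single run of length 6.
    return [len(list(group)) for key, group in groupby(trials) if key == "S"]


def has_run_at_least(trials: str, r: int) -> bool:
    """True if some maximal success block has length r or more."""
    return any(length >= r for length in maximal_success_runs(trials))


example = "SSSSFSSSSSSF"
print(maximal_success_runs(example))  # [4, 6]
print(has_run_at_least(example, 3))   # True
```

With this convention the example sequence contains one run of 4 and one run of 6, and both qualify as "at least $3$".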

First I thought of using the formula presented in Feller on p. 325 of An Introduction to Probability Theory and Its Applications, equation 7.11, which can also be found in this answer. However, I am not sure whether his definition of a sequence of successes, given on p. 305, matches what I am looking for.

I also thought about Muselli's article Simple Expressions for Success Run Distributions in Bernoulli Trials, which at first I thought calculated the same probability as Feller's equation. However, in some cases the two give different values, and the article's formula also behaves strangely for different values of $p$.

Taking $p = 0.5, n=5, k=2$ and $x=1$ in the article's equation, we obtain a probability of $0.5625$, whereas the book gives $1-q_5 = 0.59375$. So, besides a reference for calculating this probability, I would like to know: what is the difference between the article and the book?
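
As a sanity check on the book's value, here is a brute-force sketch (the function name `prob_run_at_least` is my own). It enumerates all $2^n$ outcomes and adds up the probability of those containing $r$ consecutive successes somewhere, which is the same event as having a maximal run of length at least $r$. It is only practical for small $n$.

```python
from itertools import product


def prob_run_at_least(n: int, r: int, p: float) -> float:
    """P(some run of >= r consecutive successes in n trials), by enumeration."""
    q = 1 - p
    total = 0.0
    for outcome in product("SF", repeat=n):
        trials = "".join(outcome)
        # A maximal run of length >= r exists exactly when "S" * r is a substring.
        if "S" * r in trials:
            k = trials.count("S")
            total += p**k * q ** (n - k)
    return total


print(prob_run_at_least(5, 2, 0.5))  # 0.59375
```

For $p = 0.5$, $n = 5$, $r = 2$ this counts $19$ of the $32$ outcomes, which matches the value $0.59375$ from the book.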

Mrcrg
  • I am confused by your definition. You add "at least" twice, yet claim that your sequence does have a sequence of four and of six. So, you can't mean "at least", as then having six would imply having four and having three. However, your next display with $m \ge r$ suggests that you really do mean "at least". Can you clarify? – Sam OT Sep 30 '24 at 15:31
  • @SamOT Sorry, it's actually a bit difficult to make it clear. As I said, I want to calculate all the possibilities of obtaining a sequence of $r, r+1, r+2...$ consecutive successes. But in Feller's book, in the sequence I presented, he will count the sequence $F, S, S, S, S, S, S, F$ several times: if $r=2$, he counts 3 sequences of two successes and 2 sequences of 3 successes. But I would like to consider this as being only a sequence of $6$ successes. – Mrcrg Sep 30 '24 at 15:45
  • But the final probability I want to calculate takes into account all cases greater than or equal to $r$. I think the way Feller counts the sequences may increase the probability, but maybe in the end our ideas are the same, I don't know. – Mrcrg Sep 30 '24 at 15:46
  • A sequence of $r,r+1,r+2\cdots$ consecutive successes. Do you mean like $F^+SSSF^+SSSSF^+SSSSSF^+SSSSSSF^+$ ? –  Sep 30 '24 at 19:00
  • @YvesDaoust No, given a sequence of 100 successes and failures. I want to know the probability of having a sequence of at least $r$ consecutive successes. This includes the possibility of $r+1$ consecutive successes, also $r+2$ and so on. – Mrcrg Sep 30 '24 at 20:36
  • Your second paragraph is misleading, then. –  Sep 30 '24 at 20:49
  • A possible line of attack is to compute the histogram of the lengths of the $S$-sequences incrementally. Assume you have the histogram for the length $n$. Then in half of the cases, you will append an $F$, and the histogram does not change. In the other half, you will lengthen the final $S$-sequence by one unit (it could be empty). So you also need the histogram of the lengths of the final $S$-sequence. –  Sep 30 '24 at 21:04

2 Answers


Edited to correct typos.


To the best of my knowledge, there are several means of attack. I reject Inclusion-Exclusion as being too unwieldy here. Generating functions may offer a solution. However, since I am totally ignorant of generating functions, I will have to leave that approach to someone else.

I suspect that recursion may be do-able. However, the fact that this is a probability problem rather than merely an enumeration problem seems to also make recursion problematic.
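
For what it is worth, one recursion that does carry the probabilities along is to track the length of the trailing run of successes. The following is only a sketch of that standard dynamic-programming idea, not the Stars and Bars method used in the rest of this answer, and the function name `prob_run_at_least_dp` is hypothetical.

```python
def prob_run_at_least_dp(n: int, r: int, p: float) -> float:
    """P(some run of >= r consecutive successes in n trials), via a run-length DP."""
    q = 1 - p
    # state[j] = P(no run of length r so far, and the trailing success run has length j)
    state = [0.0] * r
    state[0] = 1.0
    hit = 0.0  # probability that a run of length r has already been completed
    for _ in range(n):
        new_state = [0.0] * r
        new_state[0] = q * sum(state)      # a failure resets the trailing run
        for j in range(r - 1):
            new_state[j + 1] = p * state[j]  # a success extends the trailing run
        hit += p * state[r - 1]            # a success completes a run of length r
        state = new_state
    return hit


print(prob_run_at_least_dp(5, 2, 0.5))  # 0.59375
```

This takes on the order of $n \cdot r$ arithmetic operations, so it may be useful as a cross-check for larger $n$.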

In this answer, I will use Stars and Bars.

For Stars and Bars theory, see this article and this article.

Throughout my analysis, I will assume that $~q = (1-p).$

Throughout this answer, I will adopt the convention that

$$\binom{a}{b} = 0 ~: ~a < b.$$

For illustrative purposes, first assume that $~n = 20, r = 7.~$

Then, the desired computation is

$$\sum_{k=7}^{20} \left\{ ~p^k q^{20-k} \left[ ~\binom{20}{k} - f(20,7,k) ~\right] ~\right\},$$

where $~f(n,r,k)~$ represents the number of arrangements of exactly $~k~$ successes, out of $~n~$ trials, in which no run of $~r~$ consecutive successes occurs.

Consider the following tableau, which is based on $~k = 7.$

- F - F - F - F - F - F - F - F - F - F - F - F - F -

The $~(20 - k) = (20 - 7) = 13~$ failures create $~(13 + 1) = 14~$ islands. Reading these islands from left to right, let $~x_1, ~x_2, ~\cdots, ~x_{14}~$ denote the respective sizes of these islands. Then, $~f(20,7,7)~$ represents the number of solutions to the following enumeration problem:

  • $x_1 + x_2 + \cdots + x_{14} = k = 7.$

  • $x_1, ~x_2, ~\cdots, ~x_{14} \in \Bbb{Z_{\geq 0}}.$

  • $x_1, ~x_2, ~\cdots, ~x_{14} \in \Bbb{Z_{\leq (7-1)}}.$

The idea behind the third bullet point above is that there will be no occurrence of $~7~$ consecutive successes if and only if each of the $~14~$ variables is $~\leq 6.$

To enumerate the above problem, I will follow the model in this answer.

Keeping in mind the convention that $\displaystyle \binom{a}{b} = 0 ~: ~a < b,$ you have that

$$f(20,7,7) = \sum_{w=0}^{14} (-1)^w T_w,$$

where

$$T_w = \binom{14}{w} \times \binom{20 - [7w]}{13}.$$
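
As a quick check of the arithmetic (my own evaluation of the sum above, so worth re-verifying): for $~w \geq 2,~$ you have that $~20 - 7w \leq 6 < 13,~$ so by the convention those terms vanish, and only $~w = 0~$ and $~w = 1~$ contribute:

$$f(20,7,7) = \binom{14}{0}\binom{20}{13} - \binom{14}{1}\binom{13}{13} = 77520 - 14 = 77506.$$

This agrees with a direct count: of the $~\binom{20}{13} = 77520~$ unrestricted solutions, the only ones violating the third bullet point are the $~14~$ solutions that place all $~7~$ successes in a single island.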

The analysis in the specific case of $~n = 20, r = 7,~$ easily generalizes. The desired computation is

$$\sum_{k=r}^{n} \left\{ ~p^k q^{n-k} \left[ ~\binom{n}{k} - f(n,r,k) ~\right] ~\right\}.$$

Then, $~f(n,r,k)~$ represents the enumeration of the number of solutions to

  • $x_1 + x_2 + \cdots + x_{(n+1-k)} = k.$

  • $x_1, ~x_2, ~\cdots, ~x_{(n+1-k)} \in \Bbb{Z_{\geq 0}}.$

  • $x_1, ~x_2, ~\cdots, ~x_{(n+1-k)} \in \Bbb{Z_{\leq (r-1)}}.$

Then,

$$f(n,r,k) = \sum_{w=0}^{n+1-k} (-1)^w T_w,$$

where

$$T_w = \binom{n+1-k}{w} \times \binom{n - [rw]}{n - k}.$$
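
The general formula can be sketched in code as follows. This is a minimal sketch under the conventions stated above; the helper name `binom` and the function names `f` and `prob_run_at_least` are chosen here for illustration. The wrapper is needed because Python's `math.comb` raises an error on negative arguments, whereas the convention in this answer treats such binomial coefficients as $~0.$

```python
from math import comb


def binom(a: int, b: int) -> int:
    """Binomial coefficient with the convention binom(a, b) = 0 when a < b (or a < 0)."""
    if a < 0 or b < 0 or a < b:
        return 0
    return comb(a, b)


def f(n: int, r: int, k: int) -> int:
    """Arrangements of k successes in n trials with every success island of size <= r - 1."""
    islands = n + 1 - k  # the n - k failures create n - k + 1 islands
    return sum(
        (-1) ** w * binom(islands, w) * binom(n - r * w, n - k)
        for w in range(islands + 1)
    )


def prob_run_at_least(n: int, r: int, p: float) -> float:
    """Sum over k of p^k q^(n-k) [C(n, k) - f(n, r, k)]."""
    q = 1 - p
    return sum(
        p**k * q ** (n - k) * (binom(n, k) - f(n, r, k)) for k in range(r, n + 1)
    )


print(f(20, 7, 7))                   # 77506
print(prob_run_at_least(5, 2, 0.5))  # 0.59375
```

For $~r = 1,~$ every $~f(n,1,k)~$ with $~k \geq 1~$ is $~0,~$ so the sum collapses to $~1 - q^n,~$ which is what one would expect for "at least one success".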

user2661923
  • I may be wrong, but your equation seems to have the same strange behavior that I noticed in Muselli's article. For example, $n = 5, r = 1$ and $p = 0.9$ I get a probability of $0.2053$, but if I change $p$ to $0.5$ the probability is $0.8125$, which seems strange to me. – Mrcrg Sep 30 '24 at 21:48
  • @Mrcrg Did you find an analytical flaw in my answer? – user2661923 Sep 30 '24 at 22:24
  • @Mrcrg If $~n = 5, ~r = 1, ~p = 0.9,~$ then the probability of $~0.2053,~$ can't be right. For example, the probability that the first two trials are successes is $~.9^2 = 0.81,~$ so the overall probability must be $~> 0.81.$ – user2661923 Sep 30 '24 at 22:27
  • Looking at your reasoning, I didn't find any error, but I wrote some code in Python and found this answer. I just did the math again by hand and found the same result. My values for the function $f$ were, $f(5,1,1) =0, f(5,1,2)=0, f(5,1,3)=1, f(5,1,4)=3, f(5,1,5)=1$, and the final sum was $0.00045 + 0.0081 + 0.06561 + 0.13122 + 0 =0.2053$. – Mrcrg Sep 30 '24 at 23:29
  • Your equation has this behavior – Mrcrg Oct 01 '24 at 00:29
  • @Mrcrg If it makes a difference, I just found and corrected typos in my formula. See, for example, the last line of my answer. – user2661923 Oct 01 '24 at 20:25

Muselli gives the probability of exactly $x$ runs. Regarding your example, you most likely want the probability of at least one run, which is $0.59375$. Muselli's Eqn 13 can also get that by adding the probabilities of exactly one and exactly two runs, $f(1)+f(2) = 0.5625 + 0.03125$, but better yet, we can tweak the equation: change $\binom{m}{x}$ to $\binom{m-1}{x-1}$.

Edit: for those without access to the paper cited in the question, Muselli's equation is:

Let $M_n^{(k)}$ be the number of success runs with length $k$ or more in $n$ Bernoulli trials.

$$P(M_n^{(k)}=x)=\sum_{m=x}^{\lfloor\frac{n+1}{k+1}\rfloor}(-1)^{m-x}\binom{m}{x}p^{mk}q^{m-1}\left(\binom{n-mk}{m-1}+q\binom{n-mk}{m}\right)$$

Replacing $\binom{m}{x}$ with $\binom{m-1}{x-1}$ gives $P(M_n^{(k)}\ge x)$ as dictated by the inclusion-exclusion principle.
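
A minimal sketch of both variants, assuming $x \ge 1$ (the function name `muselli` and the flag `at_least` are my own, not from the paper):

```python
from math import comb


def muselli(n: int, k: int, x: int, p: float, at_least: bool = False) -> float:
    """P(M = x) from the equation above, or P(M >= x) with the modified binomial.

    M is the number of success runs of length k or more in n trials; requires x >= 1.
    """
    if x < 1:
        raise ValueError("this sketch only handles x >= 1")
    q = 1 - p
    total = 0.0
    for m in range(x, (n + 1) // (k + 1) + 1):
        # Swapping C(m, x) for C(m - 1, x - 1) turns "exactly x" into "at least x".
        weight = comb(m - 1, x - 1) if at_least else comb(m, x)
        total += (
            (-1) ** (m - x)
            * weight
            * p ** (m * k)
            * q ** (m - 1)
            * (comb(n - m * k, m - 1) + q * comb(n - m * k, m))
        )
    return total


print(muselli(5, 2, 1, 0.5))                 # 0.5625
print(muselli(5, 2, 1, 0.5, at_least=True))  # 0.59375
```

The first value is the one quoted from the article in the question and the second is the value from the book, so for this example the two references agree once "exactly one run" is replaced by "at least one run".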