
I have a string of length n that can consist of 3 different characters: a, b, and c.

I need a formula for the number of such strings that contain at least 3 consecutive c's (e.g., ccccb). So far, my formula is: \[ (n-2) \cdot 3^{(n-3)} \]

This formula attempts to place the 3 consecutive c's in the string (e.g., cccxx, xcccx, xxccc), and then fill the remaining positions with a, b, or c.

So far:

  • (n-2) possible positions for the "ccc" block within the string.
  • 3^{(n-3)} ways to fill the remaining n-3 positions with any of the three characters (a, b, c).

However, I realize I am overcounting because:

  1. Strings like ccccc are counted multiple times.
  2. Strings like xcccc are also counted multiple times.
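To make the overcounting concrete, here is a minimal brute-force sketch for small n. The function names `brute_force` and `naive_formula` are illustrative only; the check simply enumerates all 3^n strings, so it is practical only for small n.

```python
from itertools import product

def brute_force(n):
    """Count length-n strings over {a, b, c} that contain 'ccc'."""
    return sum("ccc" in "".join(s) for s in product("abc", repeat=n))

def naive_formula(n):
    """The (n - 2) * 3^(n - 3) estimate from the question."""
    return (n - 2) * 3 ** (n - 3)

for n in range(3, 8):
    # The two columns agree at n = 3 and diverge from n = 4 onward.
    print(n, brute_force(n), naive_formula(n))
```

For n = 4 the true count is 5 (cccx, xccc, with cccc shared), while the formula gives 6 because cccc is counted once for each placement of the block.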

Could someone help me adjust my formula to correctly account for these overcounting issues for any given n?

Thank you!

RobPratt
  • When working with a consecutive block it can be useful to replace the block with a single character. So here replace $ccc$ with $x$ then ask how many 3 letter strings you can create that contain exactly one $x$ and no $c$. – CyclotomicField May 26 '24 at 15:24
  • I would work recursively. A good string can only begin in a few ways: $x$, where $x$ denotes either $a$ or $b$, $cx, ccx$, in each case followed by a good word of shorter length, or it can begin $cccx$ followed by word with no instance of $ccc$. Work out a recursion based on that, and see if your formula satisfies it. If it does, then just matching a few terms completes the proof. – lulu May 26 '24 at 15:58
  • https://oeis.org/A231430 – RobPratt May 26 '24 at 17:44

1 Answer


This problem can be attacked through the very convoluted use of Inclusion-Exclusion theory and Stars and Bars theory. Personally, I regard the approach that I will describe as inferior to attacking the problem with recursion. The reason is that although this approach will allow a closed form expression for the computation, as a function of $~n,~$ it will generally require the use of computer assistance to perform the computation. So, I regard any advantage of this approach over recursion to be very iffy.
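For comparison, the recursive alternative can be sketched as follows. This is not the approach developed in this answer; it assumes the standard decomposition of a ccc-free string as one of the prefixes x, cx, ccx (with x in {a, b}) followed by a shorter ccc-free string. The function names are illustrative.

```python
def count_without_ccc(n):
    """Number of length-n strings over {a, b, c} with no 'ccc' substring."""
    # a[k] counts ccc-free strings of length k; lengths 0, 1, 2 are unrestricted.
    a = [1, 3, 9]
    for k in range(3, n + 1):
        # Prefix x, cx, or ccx (2 choices of x each), then a shorter ccc-free string.
        a.append(2 * (a[k - 1] + a[k - 2] + a[k - 3]))
    return a[n]

def count_with_ccc(n):
    """Strings containing 'ccc' = all strings minus the ccc-free ones."""
    return 3 ** n - count_without_ccc(n)

print([count_with_ccc(n) for n in range(3, 8)])
```

This runs in time linear in n, which is the practical advantage of recursion referred to above.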

See this article for an introduction to Inclusion-Exclusion. Then, see this answer for an explanation of and justification for the Inclusion-Exclusion formula.

For Stars and Bars theory, see this article and this article.

As an illustrative example, I will set $~n = 20,~$ and work out two of the terms that enter the computation, namely $~T_2~$ and $~T_{12}.~$ At the end of my answer, I will then describe the overall closed form expression for any value of $~n.~$


Following the syntax in the 2nd Inclusion-Exclusion link above, let $~S~$ denote the collection of all $~20~$ character strings, without any regard for whether the string contains at least one occurrence of ccc.

Then, for $~k \in \{1,2,\cdots,18\},~$ let $~S_k~$ denote the subset of $~S~$ consisting of the strings that contain the ccc substring starting in position $~k.~$ So, for example, an element of $~S_1~$ contains ccc starting in position 1, and may or may not also contain ccc elsewhere in the string.

So, the desired computation is

$$| ~S_1 \cup S_2 \cup \cdots \cup S_{18} ~|. \tag1 $$

The standard Inclusion-Exclusion approach to enumerating the expression in (1) above is:

  • Let $~T_1~$ denote $~\displaystyle \sum_{1 \leq i_1 \leq 18} | ~S_{i_1} ~|.$

    That is, $~T_1~$ represents the sum of $~\displaystyle \binom{18}{1}~$ terms.

  • For $~r \in \{2,3,\cdots,18\},~$

    let $~T_r~$ denote $~\displaystyle \sum_{1 \leq i_1 < i_2 < \cdots < i_r \leq 18} | ~S_{i_1} \cap S_{i_2} \cap \cdots \cap S_{i_r} ~|.$

    That is, $~T_r~$ represents the sum of $~\displaystyle \binom{18}{r}~$ terms.

Then, in accordance with Inclusion-Exclusion theory, the computation in (1) above is equivalent to

$$\sum_{r=1}^{18} (-1)^{r+1} T_r. \tag2 $$
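As a sanity check on (2), the terms $~T_r~$ can be computed by direct enumeration for small $~n,~$ and the alternating sum compared against a brute-force count. The sketch below is illustrative only (the names `T`, `inclusion_exclusion`, and `brute_force` are not from any library), and it relies on the observation that an intersection forces every covered position to be c, so its size is 3 raised to the number of uncovered positions.

```python
from itertools import combinations, product

def T(n, r):
    """Sum of |S_{i_1} ∩ ... ∩ S_{i_r}| over all r-subsets of start positions."""
    total = 0
    for starts in combinations(range(1, n - 1), r):  # start positions 1..n-2
        covered = set()
        for i in starts:
            covered.update((i, i + 1, i + 2))  # positions forced to be 'c'
        total += 3 ** (n - len(covered))       # remaining positions are free
    return total

def inclusion_exclusion(n):
    """Alternating sum of T_r for r = 1..n-2."""
    return sum((-1) ** (r + 1) * T(n, r) for r in range(1, n - 1))

def brute_force(n):
    """Count length-n strings over {a, b, c} that contain 'ccc'."""
    return sum("ccc" in "".join(s) for s in product("abc", repeat=n))

for n in range(3, 9):
    print(n, inclusion_exclusion(n), brute_force(n))
```

The enumeration visits every subset of start positions, so it is only meant for small n; the rest of the answer is about avoiding that enumeration.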

By considerations of symmetry, you have that
$~3^{17} = |~S_1~| = | ~S_2 ~| = \cdots = | ~S_{18} ~|.$
Therefore, $~T_1 = 18 \times 3^{17}.$

The difficulty is that symmetrical considerations break down when computing $~T_r ~: ~r \geq 2.$

To attack this (general) problem, I will

  • Illustrate the manual way that $~T_2~$ can be computed, assuming that $~n=20$.

  • Illustrate the analytical way of computing $~T_2,~$ assuming that $~n=20$.

  • Create a helper function $~f(r,z,o).$

  • Create a helper function $~g(n,r,z,o).$

  • Illustrate the analytical way of computing $~T_{12},~$ assuming that $~n=20$.

  • Provide a closed form expression of $~\displaystyle \sum_{r=1}^{n-2} (-1)^{r+1}T_r.$


$\underline{\text{Manual Computation of} ~T_2}$

In this section, it will be assumed that each intersection is of form $~S_{i_1} \cap S_{i_2} ~: ~1 \leq i_1 < i_2 \leq 18.$

Each of the $~\displaystyle \binom{18}{2} = 153 ~$ intersections will fall into one of three categories:

Category-1
An intersection like $~S_1 \cap S_4.~$
This intersection has ccc starting in positions 1 and 4. Therefore, there are $~14~$ unspecified positions. Therefore, $~|~S_1 \cap S_4 ~| = 3^{14}.$

In Category-1, as $~i_1~$ goes from $~1~$ through $~15,~$ there are $~18 - (i_1+2) = 16-i_1 ~$ possible values for $~i_2.~$
Therefore, Category-1 has $~\displaystyle \sum_{i_1=1}^{15} (16 - i_1) = 120~$ separate terms, each of which equals $~3^{14}.$
Therefore, when computing $~T_2,~$ the Category-1 subtotal is
$120 \times 3^{14}.$

Category-2
An intersection like $~S_1 \cap S_3.~$
This intersection has ccc starting in positions 1 and 3. Therefore, there are $~15~$ unspecified positions. Therefore, $~|~S_1 \cap S_3 ~| = 3^{15}.$

In Category-2, as $~i_1~$ goes from $~1~$ through $~16,~$ there is exactly 1 possible value for $~i_2.~$
Therefore, Category-2 has $~16~$ separate terms, each of which equals $~3^{15}.$
Therefore, when computing $~T_2,~$ the Category-2 subtotal is
$16 \times 3^{15}.$

Category-3
An intersection like $~S_1 \cap S_2.~$
This intersection has ccc starting in positions 1 and 2. Therefore, there are $~16~$ unspecified positions. Therefore, $~|~S_1 \cap S_2 ~| = 3^{16}.$

In Category-3, as $~i_1~$ goes from $~1~$ through $~17,~$ there is exactly 1 possible value for $~i_2.~$
Therefore, Category-3 has $~17~$ separate terms, each of which equals $~3^{16}.$
Therefore, when computing $~T_2,~$ the Category-3 subtotal is
$17 \times 3^{16}.$

So,

$$T_2 = \left( ~120 \times 3^{14} ~\right) + \left( ~16 \times 3^{15} ~\right) + \left( ~17 \times 3^{16} ~\right).$$
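The three category counts can be double-checked by enumerating all 153 pairs directly. This is a minimal sketch with illustrative variable names; it groups the pairs by the number of unspecified positions.

```python
from collections import Counter
from itertools import combinations

n = 20
terms_by_free_positions = Counter()
for i1, i2 in combinations(range(1, n - 1), 2):  # 1 <= i1 < i2 <= 18
    # Positions forced to be 'c' by ccc starting at i1 and at i2.
    covered = set(range(i1, i1 + 3)) | set(range(i2, i2 + 3))
    terms_by_free_positions[n - len(covered)] += 1

T2 = sum(count * 3 ** free for free, count in terms_by_free_positions.items())
print(dict(sorted(terms_by_free_positions.items())))  # {14: 120, 15: 16, 16: 17}
print(T2 == 120 * 3**14 + 16 * 3**15 + 17 * 3**16)    # True
```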


$\underline{\text{Analytical Computation of} ~T_2}$

Consider the following tableau:

- - - - i-1 - - - - - i-2 - - - - - - - 

The above tableau, which has $~[ ~20 - ( ~3-1 ~) ~] = 18~$ positions, assigns two of these positions to $~i_1~$ and $~i_2~$ (shown as i-1 and i-2 in the tableau). Here, the position assigned to $~i_1~$ is to the left of the position assigned to $~i_2.$

The positioning of $~i_1~$ and $~i_2~$ creates $~(2 + 1) = 3~$ islands. Examining the islands from left to right, let $~x_1, ~x_2, ~x_3~$ denote the sizes of these islands.

In order to categorize the nature of the $~S_{i_1} \cap S_{i_2}~$ intersection, you can ignore the values of $~x_1~$ and $~x_3,~$ and focus exclusively on $~x_2.~$

Either $~x_2 = 0, ~x_2 = 1, ~$ or $~x_2 > 1.$

$\mathbf{x_2 = 0}$
This represents what I describe as a fully compressed intersection. Because it is fully compressed, instead of there being $~[ ~20 - ( ~2 \times 3 ~) ~] = 14~$ unspecified positions, there are $~[ ~20 - ( ~2 \times 3 ~) ~] + 2 = 16~$ unspecified positions.

The number of such intersections is the same as the number of solutions to

  • $x_1 + x_2 + x_3 = [ ~20 - (3-1) ~] - 2 = 16.$

  • $~x_2 = 0.$

  • $x_1, x_3 \in \Bbb{Z_{\geq 0}}.$

By Stars and Bars theory, with the variable $~x_2 = 0~$ ignored, there are $~\displaystyle \binom{16 + [2 - 1]}{[2 - 1]} = 17~$ such solutions.

So, analytically, the first partial sum when computing $~T_2~$ is $~\displaystyle \left( ~17 \times 3^{16} ~\right).$

$\mathbf{x_2 = 1}$
Similarly, when computing the second partial sum of $~T_2,~$ it will be $~3^{15}~$ times the number of solutions to

  • $x_1 + x_2 + x_3 = [ ~20 - (3-1) ~] - 2 = 16.$

  • $~x_2 = 1.$

  • $x_1, x_3 \in \Bbb{Z_{\geq 0}}.$

By Stars and Bars theory, with the variable $~x_2 = 1~$ ignored, and the $~x_1 + x_3~$ sum adjusted to $~15,~$ there are $~\displaystyle \binom{15 + [2 - 1]}{[2 - 1]} = 16~$ such solutions.

So, analytically, the second partial sum when computing $~T_2~$ is $~\displaystyle \left( ~16 \times 3^{15} ~\right).$

$\mathbf{x_2 > 1}$
Finally, when computing the third partial sum of $~T_2,~$ it will be $~3^{14}~$ times the number of solutions to

  • $x_1 + x_2 + x_3 = [ ~20 - (3-1) ~] - 2 = 16.$

  • $~x_2 \geq 2.$

  • $x_1, x_3 \in \Bbb{Z_{\geq 0}}.$

By Stars and Bars theory, with the change of variable $~y_2 = x_2 - 2 \implies y_2 \in \Bbb{Z_{\geq 0}},~$ and the $~x_1 + y_2 + x_3~$ sum equal to $~(16 - 2), ~$ there are $~\displaystyle \binom{14 + [3 - 1]}{[3 - 1]} = \binom{16}{2} = 120~$ such solutions.

So, analytically, the third partial sum when computing $~T_2~$ is $~\displaystyle \left( ~120 \times 3^{14} ~\right).$
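The three Stars and Bars counts can be confirmed by listing the solutions of $~x_1 + x_2 + x_3 = 16~$ directly. The helper name `count_solutions` below is illustrative, and the sketch is only a check on the binomial coefficients, not a replacement for the argument.

```python
from math import comb

def count_solutions(total, x2_ok):
    """Count non-negative (x1, x2, x3) with x1 + x2 + x3 = total and x2_ok(x2) true."""
    return sum(
        1
        for x1 in range(total + 1)
        for x2 in range(total - x1 + 1)  # x3 = total - x1 - x2 is then determined
        if x2_ok(x2)
    )

print(count_solutions(16, lambda x2: x2 == 0), comb(17, 1))  # 17 17
print(count_solutions(16, lambda x2: x2 == 1), comb(16, 1))  # 16 16
print(count_solutions(16, lambda x2: x2 >= 2), comb(16, 2))  # 120 120
```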


$\underline{\text{Helper Function} ~f(r,z,o)}$

This section and the next section will be used to create helper functions that will facilitate expressing the general closed form expression. In this section, assuming that you have $~(r+1)~$ variables, how many ways are there to choose, from the $~(r-1)~$ variables $~\{x_2,x_3,\cdots,x_r\},~$ exactly $~z~$ variables to equal $~0~$ and exactly $~o~$ variables to equal $~1~?$

For the function $~f(r,z,o),~$ the allowable range for the variables $~r, ~z, ~$ and $~o,~$ will be as specified, in the next section, for the function $~g(n,r,z,o).$

Given that, I specify that

$$f(r,z,o) = \binom{r-1}{z} \times \binom{r-1-z}{o}.$$
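A small sketch of $~f~$ in Python, with a brute-force cross-check that labels each of the $~(r-1)~$ interior variables as 0, 1, or at least 2. The name `f_brute` and the label strings are illustrative.

```python
from itertools import product
from math import comb

def f(r, z, o):
    """Ways to pick which z interior variables are 0 and which o are 1."""
    return comb(r - 1, z) * comb(r - 1 - z, o)

def f_brute(r, z, o):
    """Same count, by labelling each of x_2..x_r as '0', '1', or '2+'."""
    return sum(
        1
        for labels in product(("0", "1", "2+"), repeat=r - 1)
        if labels.count("0") == z and labels.count("1") == o
    )

print(f(4, 1, 1), f_brute(4, 1, 1))  # 6 6
```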


$\underline{\text{Helper Function} ~g(n,r,z,o)}$

In this section, assume that you have

  • The specific intersection represented by
    $~(x_1,x_2,\cdots,x_r,x_{r+1}).~$

  • $x_1 + x_2 + \cdots + x_r + x_{r+1} = (n-2-r).$

  • $x_1,x_{r+1} \in \Bbb{Z_{\geq 0}}.$

  • $x_2,x_3,\cdots,x_{z+1}~$ are all equal to $~0.$

  • $x_{z+2}, x_{z+3}, \cdots, x_{(z+1+o)}~$ are all equal to $~1.~$

  • $x_{(z+1+o + 1)}, \cdots, x_r, ~$ are all in $~\Bbb{Z_{\geq 2}}.~$

Then, what is the product of the number of solutions possible times $~3^p,~$ where $~p~$ equals the number of unspecified characters in each solution?

Clarification
The helper function in this section is intended to dovetail with the helper function in the previous section. $~f(r,z,o)~$ is concerned with how many ways that there are of choosing $~z~$ variables to equal $~0,~$ and then choosing $~o~$ variables to equal $~1,~$ from the set $~\{x_2,\cdots,x_r\}.$

$g(n,r,z,o)~$ then assumes that starting from $~x_2,~$ and considering the variables in ascending order by variable index, the first $~z~$ variables are equal to $~0,~$ and then the next $~o~$ variables are equal to $~1.$

Range of Variables
Before specifying the formula for $~g(n,r,z,o),~$ it is important to establish upper and lower bounds (where appropriate) for $~n, ~r, ~z,~$ and $~o.$

For simplicity, I will require that $~n,~$ which represents the length of the string, is an element in $~\Bbb{Z_{\geq 6}}.$

Considering the significance of the variable $~T_r,~$ which requires that $~r \leq (n-2),~$ I will require that $~r \in \{2,3,\cdots,n-2\}.$

In considering the upper and lower bounds for $~z~$ and $~o,~$ note that both $~x_1~$ and $~x_{r+1}~$ are permitted to be any non-negative integers. Therefore, you never have to be concerned that the sum represented by $~x_2 + x_3 + \cdots + x_r~$ is too low. However, you must not have the sum $~x_2 + x_3 + \cdots + x_r > (n-2-r).~$

With this in mind, you can always have $~z~$ be as large as $~(r-1).$
Suppose however, that $~z < r-1.~$ Then, the minimum sum of $~x_2 + x_3 + \cdots + x_r~$ will be achieved when the $~(r-1-z)~$ remaining variables from $~\{x_2,\cdots, x_r\},~$ are each equal to $~1.~$ So, the minimum sum, which will be $~(r - 1 - z),~$ must not exceed $~(n-2-r).~$

Therefore, you must have that

$$(r - 1 - z) \leq (n - 2 - r) \implies z \geq (2r + 1 - n).$$

So, the allowable values for $~z~$ are

$$z \in \Bbb{Z}, ~z \geq \max\left[ ~0, ~(2r + 1 - n) ~\right], z \leq (r-1).$$

Note that $~r \leq (n-2) \implies (2r + 1 - n) \leq (r - 1).$

To determine the allowable values for $~o,~$ assume that $~z~$ is some element in the range shown immediately above. Then, at first glance, you must have $~0 \leq o \leq (r - 1 - z).~$ Further, for given values of $~z~$ and $~o,~$ the minimum value for $~x_2 + \cdots + x_r~$ will be

$$~o + 2(r - 1 - z - o) = 2r - 2 - 2z - o,$$

which must not exceed $~(n - r - 2).$

Therefore,

$$(2r - 2 - 2z - o) \leq (n - r - 2) \implies o \geq 3r - 2z - n.$$

So, for the variable $~o,~$ you have the lower bounds of $~0,~$ and $~(3r - 2z - n),~$ and the upper bound of $~(r - 1 - z).$

Also, since $~z \geq 2r + 1 - n,~$ you can never have $~(3r - 2z - n) > (r - 1 - z).$

Therefore, the allowable values for $~o~$ are

$$o \in \Bbb{Z}, ~o \geq \max[ ~0, ~(3r - 2z - n) ~], ~o \leq (r - 1 - z).$$
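The two ranges can be written down as Python `range` objects and compared against a direct feasibility search. This is a sketch with illustrative names; the search simply tests whether the minimum interior sum $~o + 2(r-1-z-o)~$ fits within $~(n-2-r).$

```python
def z_range(n, r):
    """Allowable z: max(0, 2r + 1 - n) <= z <= r - 1."""
    return range(max(0, 2 * r + 1 - n), r)

def o_range(n, r, z):
    """Allowable o for a given z: max(0, 3r - 2z - n) <= o <= r - 1 - z."""
    return range(max(0, 3 * r - 2 * z - n), r - z)

def feasible_pairs(n, r):
    """All (z, o) whose minimum interior sum fits, found by direct search."""
    pairs = []
    for z in range(r):
        for o in range(r - z):
            min_interior_sum = o + 2 * (r - 1 - z - o)
            if min_interior_sum <= n - 2 - r:
                pairs.append((z, o))
    return pairs

n, r = 20, 12
from_ranges = [(z, o) for z in z_range(n, r) for o in o_range(n, r, z)]
print(from_ranges == feasible_pairs(n, r))  # True
print(list(z_range(n, r)))                  # [5, 6, 7, 8, 9, 10, 11]
```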

Computation of $~\mathbf{g(n,r,z,o)}$

Assuming that the variables $~n, ~r, ~z, ~$ and $~o~$ are all in range, Stars and Bars theory provides the computation of the number of solutions.

The original sum is $~(n-2-r),~$ and the original number of variables is $~(r + 1).~$ Assuming that the $~z~$ variables are ignored, the sum is still $~(n-2-r),~$ and the number of variables is now $~(r + 1 - z).$

Next, you employ the change of variables $~a_i = x_i - 1,~$ for each of the $~o~$ variables exactly equal to $~1.~$ So, this reduces the sum from $~(n-2-r)~$ to $~(n-2-r-o),~$ and reduces the number of variables from $~(r + 1 - z),~$ to $~(r + 1 - z - o).$

The last thing to deal with is that while the two variables $~x_1,~$ and $~x_{r+1}~$ are permitted to be in $~\Bbb{Z_{\geq 0}},~$ the $~(r - 1 - z - o)~$ other variables must each be in $~\Bbb{Z_{\geq 2}}.$

Therefore, you must employ the change of variables $~y_i = x_i - 2,~$ on these $~(r - 1 - z - o)~$ variables, which reduces the sum to

$$(n - 2 - r - o) - [ ~2 \times (r - 1 - z - o) ~] = n - 3r + 2z + o.$$

So, the number of solutions must correspond to the number of solutions to

  • $x_1 + x_2 + \cdots + x_k = M.$
  • $x_1, \cdots, x_k \in \Bbb{Z_{\geq 0}}.$
  • $k = (r + 1 - z - o), ~M = n - 3r + 2z + o.$

By Stars and Bars theory, the number of solutions is

$$\binom{M + [k-1]}{k-1} = \binom{[n - 3r + 2z + o] + [r - z - o]}{r - z - o}$$

$$= \binom{n - 2r + z}{r - z - o}.$$

To compute the number of unspecified characters, first assume that $~z = 0 = o.$ Then, you will have $~n - 3r~$ unspecified characters. For each variable in $~\{x_2,\cdots,x_r\}~$ that is equal to $~0,~$ you are freeing up two character positions, because the two adjacent ccc blocks then occupy $~4~$ character positions, rather than $~6.~$ Similarly, for each variable in $~\{x_2,\cdots,x_r\}~$ that is equal to $~1,~$ you are freeing up one character position.

Therefore, the number of unspecified character positions is

$$n - 3r + 2z + o.$$

Note that since $~o \geq ( ~3r - 2z - n),~$
you have that $~(2z + o) \geq (~3r - n).$

Therefore,

$$g(n,r,z,o) = \binom{n - 2r + z}{r - z - o} \times 3^{( ~n - 3r + 2z + o ~)}.$$
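A direct transcription of $~g~$ into Python, checked against the three partial sums of $~T_2~$ obtained earlier for $~n = 20.~$ This is a minimal sketch; it assumes the arguments are already in the allowable ranges, since `math.comb` rejects negative arguments.

```python
from math import comb

def g(n, r, z, o):
    """Number of gap solutions times 3^(unspecified positions).

    Assumes z and o are within the allowable ranges for the given n and r.
    """
    solutions = comb(n - 2 * r + z, r - z - o)
    free_positions = n - 3 * r + 2 * z + o
    return solutions * 3 ** free_positions

# The three partial sums of T_2 for n = 20 (f equals 1 in each case when r = 2).
print(g(20, 2, 1, 0) == 17 * 3**16)   # x_2 = 0
print(g(20, 2, 0, 1) == 16 * 3**15)   # x_2 = 1
print(g(20, 2, 0, 0) == 120 * 3**14)  # x_2 >= 2
```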


$\underline{\text{Analytical Computation of} ~T_{12}}$

To utilize the helper functions, you simply have to establish the viable values of $~(z,o),~$ for $~r = 12, n = 20,~$ and then plug in the formulas.

The ranges are as follows:

  • Since $~(2r + 1 - n) = 5, ~$ you have that $~z \in \{5,6,\cdots,11\}.$

  • Since $~(3r - n) = 16,~$ the lower bound of $~o~$ is $~16 - 2z~$ when $~z \leq 8,~$ and $~0~$ when $~z \geq 9,~$ while the upper bound for $~o~$ is $~(11-z).$

Therefore, for $~n = 20,~$ you have that

$$T_{12} = \sum_{z=5}^{8} \left[ ~\sum_{o = 16 - 2z}^{11-z} f(12,z,o) \times g(20,12,z,o) ~\right] $$

$$+ \sum_{z=9}^{11} \left[ ~\sum_{o = 0}^{11-z} f(12,z,o) \times g(20,12,z,o) ~\right]. $$
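The double sum for $~T_{12}~$ can be evaluated with a few lines of Python and compared against a direct enumeration of all $~\binom{18}{12} = 18564~$ intersections. The names `T_formula` and `T_enumerated` are illustrative; the enumeration is included only as an independent check.

```python
from itertools import combinations
from math import comb

def f(r, z, o):
    return comb(r - 1, z) * comb(r - 1 - z, o)

def g(n, r, z, o):
    return comb(n - 2 * r + z, r - z - o) * 3 ** (n - 3 * r + 2 * z + o)

def T_formula(n, r):
    """T_r via the helper functions, summing over the allowable (z, o)."""
    total = 0
    for z in range(max(0, 2 * r + 1 - n), r):
        for o in range(max(0, 3 * r - 2 * z - n), r - z):
            total += f(r, z, o) * g(n, r, z, o)
    return total

def T_enumerated(n, r):
    """T_r by listing every r-subset of start positions."""
    total = 0
    for starts in combinations(range(1, n - 1), r):
        covered = {p for i in starts for p in (i, i + 1, i + 2)}
        total += 3 ** (n - len(covered))
    return total

print(T_formula(20, 12) == T_enumerated(20, 12))
```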


$\underline{\text{Closed form expression of} ~\displaystyle \sum_{r=1}^{n-2} (-1)^{r+1}T_r}$

For simplicity, it is assumed that $~n \in \Bbb{Z_{\geq 6}}.$

$$T_1 = (n-2) \times 3^{n-3}.$$

The remainder of this section assumes that $~r \in \Bbb{Z}, ~2 \leq r \leq (n-2).$

The allowable range for the variable $~z~$ is

$$z \in \Bbb{Z}, ~z \geq \max\left[ ~0, ~(2r + 1 - n) ~\right], z \leq (r-1).$$

The allowable range for the variable $~o~$ is

$$o \in \Bbb{Z}, ~o \geq \max[ ~0, ~(3r - 2z - n) ~], ~o \leq (r - 1 - z).$$

With $~z~$ and $~o~$ in range, the helper functions are

$$f(r,z,o) = \binom{r-1}{z} \times \binom{r-1-z}{o}$$

and

$$g(n,r,z,o) = \binom{n - 2r + z}{r - z - o} \times 3^{( ~n - 3r + 2z + o ~)}.$$

Then,

$$T_r = \sum_{z ~\text{in range}} \left[ ~\sum_{o ~\text{in range}} f(r,z,o) \times g(n,r,z,o) ~\right].$$
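Putting the pieces together, the whole expression can be sketched in Python as follows. The function names are illustrative, and `count_by_recursion` is an independent cross-check (based on counting ccc-free strings) rather than part of the formula above.

```python
from math import comb

def f(r, z, o):
    return comb(r - 1, z) * comb(r - 1 - z, o)

def g(n, r, z, o):
    return comb(n - 2 * r + z, r - z - o) * 3 ** (n - 3 * r + 2 * z + o)

def T(n, r):
    """T_r for 1 <= r <= n - 2, with T_1 handled separately as in the text."""
    if r == 1:
        return (n - 2) * 3 ** (n - 3)
    total = 0
    for z in range(max(0, 2 * r + 1 - n), r):              # z <= r - 1
        for o in range(max(0, 3 * r - 2 * z - n), r - z):  # o <= r - 1 - z
            total += f(r, z, o) * g(n, r, z, o)
    return total

def count_with_ccc(n):
    """Alternating sum of T_r, assuming n >= 6 as in the text."""
    return sum((-1) ** (r + 1) * T(n, r) for r in range(1, n - 1))

def count_by_recursion(n):
    """Cross-check: 3^n minus the number of ccc-free strings."""
    a = [1, 3, 9]
    for _ in range(3, n + 1):
        a.append(2 * (a[-1] + a[-2] + a[-3]))
    return 3 ** n - a[n]

for n in (6, 10, 20):
    print(n, count_with_ccc(n), count_with_ccc(n) == count_by_recursion(n))
```

For $~n = 6,~$ the terms are $~T_1 = 108, ~T_2 = 34, ~T_3 = 8, ~T_4 = 1,~$ giving $~108 - 34 + 8 - 1 = 81.$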

user2661923