Given the regular expression $(1 + \varepsilon + 0 )(1 + \varepsilon + 0 )(1 + \varepsilon + 0 )(1 + \varepsilon + 0 )$, how many distinct strings are in the language? How do you determine this from looking at the regular expression? Do I have to generate a table of all possible combinations or is there a more straightforward way?
In your example, think of the result as having filled four slots: _ _ _ _, each of which can take one of three substrings, namely 0, 1, or the empty string. Ignoring the empty strings, it's clear that there are sixteen possible results: 0000, 0001, 0010, ... , 1111.
With the empty strings allowed, though, the same string can arise in several ways: we could make 10 by $(\epsilon)(\epsilon)(1)(0)$, or by $(\epsilon)(1)(\epsilon)(0)$, or by four other arrangements. What now? Well, once we realize that these six arrangements all produce the same string 10, we've got it: the language consists of all strings over $\{0,1\}$ of length zero, one, two, three, and four, namely
$\epsilon$ (1 of them)
$0, 1$ (2 of these)
$00, 01, 10, 11$ (4)
$000, 001, 010, \dots, 111$ (8)
$0000, 0001, 0010, \dots, 1111$ (16)
For a total of $1+2+4+8+16=31 = 2^5-1$. This is a particularly nice example; in general it'll be far more complicated, for example with the expression $(1 + 01)1(1(0+\epsilon) + (101(10+101)))$. Sad to say, there's no simple rule governing all cases.
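If you do want to let a computer build the table, a brute-force sketch is short. It assumes the expression is star-free and has already been flattened into a concatenation of finite unions of literal strings (the helper name `string_counts` is hypothetical, and `""` stands for $\epsilon$). It enumerates every slot choice and tallies which string each produces, so it shows both the 81 choices and the 31 distinct strings, along with the six ways to build 10. For the second expression, the last factor $1(0+\epsilon) + 101(10+101)$ was expanded by hand into its four strings.

```python
from collections import Counter
from itertools import product

def string_counts(factors):
    """Map each produced string to how many slot choices produce it.

    Each factor is a list of literal alternatives; "" stands for epsilon.
    Only handles star-free expressions already flattened into a
    concatenation of finite unions.
    """
    return Counter("".join(choice) for choice in product(*factors))

# (1 + eps + 0)^4: four copies of the same slot
slot = ["1", "", "0"]
counts = string_counts([slot] * 4)
print(sum(counts.values()))  # 81 slot choices (3**4)
print(len(counts))           # 31 distinct strings
print(counts["10"])          # 6 ways to build "10"

# (1 + 01) 1 (1(0 + eps) + 101(10 + 101)), last factor expanded by hand
other = string_counts([["1", "01"], ["1"], ["1", "10", "10110", "101101"]])
print(len(other))            # 8 distinct strings
```

This takes time proportional to the product of the factor sizes, so it is only a sanity check for small expressions, not a general counting method.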
By the way, welcome to the site!
Your regular expression can also be written as $(0+1+\epsilon)^4$. It is not hard to check that it generates exactly the strings over $\{0,1\}$ of length at most 4. You can count these in various ways.
As an exercise, try to answer the question for $(0+1+\epsilon)^n$.
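To check your answer to the exercise for small $n$, here is a brute-force sketch (the function name `count_power` is hypothetical). It enumerates all $3^n$ choices of $0$, $1$ or $\epsilon$ (written `""`) for the $n$ factors and counts the distinct concatenations with a set:

```python
from itertools import product

def count_power(n):
    """Count distinct strings generated by (0 + 1 + eps)^n by brute force.

    Enumerates all 3**n factor choices ("" plays the role of eps) and
    deduplicates the resulting strings with a set.
    """
    return len({"".join(choice) for choice in product(["0", "1", ""], repeat=n)})

for n in range(7):
    print(n, count_power(n))
```

Compare the printed values with the closed form you derive.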
More generally, counting the number of strings in the language of a regular expression can be difficult. Indeed, suppose that you are given a CNF $C_1 \land \cdots \land C_m$ over the $n$ variables $x_1,\ldots,x_n$. For each $i \in [m]$, you can write a regular expression $r_i$ of length $O(n)$ for the set of all truth assignments which falsify $C_i$. For example, if $C_i = x_1 \lor x_2 \lor x_3$ then the regular expression is $000(0+1)^{n-3}$. Let $r = r_1 + \cdots + r_m$, whose length is $O(mn)$. Since $L[r]$ consists of exactly those assignments that falsify at least one clause, the number of strings in $L[r]$ is $2^n$ iff the CNF is unsatisfiable. So even deciding whether $L[r]$ has $2^n$ strings is coNP-hard.
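The reduction above can be sketched in Python for small instances. This is an illustration under several assumptions: clauses are given as lists of nonzero integers (DIMACS-style, `j` for $x_j$ and `-j` for $\neg x_j$), the function names are hypothetical, and Python's `re` syntax stands in for the math notation (`|` for $+$, `[01]` for $(0+1)$). Counting is done by brute force over all $2^n$ assignments, which is exactly the exponential work the hardness argument says we cannot expect to avoid in general.

```python
import itertools
import re

def clause_regex(clause, n):
    """Regex for the assignments in {0,1}^n that falsify one clause.

    A positive literal x_j is falsified by bit j = 0, a negative one by 1;
    unconstrained positions get [01]. Returns None for a tautological
    clause (x_j and not x_j both present), which nothing falsifies.
    """
    slots = ["[01]"] * n
    for lit in clause:
        v = abs(lit) - 1
        bit = "0" if lit > 0 else "1"
        if slots[v] not in ("[01]", bit):
            return None
        slots[v] = bit
    return "".join(slots)

def cnf_regex(clauses, n):
    """Union r = r_1 + ... + r_m, written with Python's | operator."""
    parts = [r for r in (clause_regex(c, n) for c in clauses) if r is not None]
    return "|".join(f"(?:{p})" for p in parts)

def count_language(pattern, n):
    """Count length-n binary strings fully matched by pattern (brute force)."""
    if not pattern:
        return 0
    compiled = re.compile(pattern)
    return sum(1 for bits in itertools.product("01", repeat=n)
               if compiled.fullmatch("".join(bits)))

def is_unsatisfiable(clauses, n):
    return count_language(cnf_regex(clauses, n), n) == 2 ** n

# Unsatisfiable: x1 and (not x1), n = 1
print(count_language(cnf_regex([[1], [-1]], 1), 1))      # 2 == 2**1
# Satisfiable: (x1 or x2) and (not x1), n = 2
print(count_language(cnf_regex([[1, 2], [-1]], 2), 2))   # 3 < 2**2
```

Every $r_i$ only matches strings of length $n$, so counting length-$n$ matches gives the full size of $L[r]$.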