
I have a language in which every string has an equal number of $0$'s and $1$'s (e.g., $0101$, $1010$, $1100$, $0011$, and $10$ are all in the language). I was hoping to define a context-free grammar that describes this language. After defining the grammar, I want to formally prove that it describes exactly this language.

I've come up with the following production rules: $$ \begin{align*} &S\to0S1S \\ &S\to1S0S \\ &S\to\epsilon \end{align*} $$ Is this a correct context-free grammar for this language?

I'm kind of stumped on the proof part. I'm guessing I will need some sort of induction?

Raphael
Andrew Reynolds

2 Answers


Your grammar does work, and you are correct in assuming that induction is a worthwhile proof technique. As a hint, you can use the following induction hypothesis:

IH: All the words of length at most $n$ that can be derived from $S$ have an equal number of $0$'s and $1$'s.

Can you fill in the anchor and step from here?


As D.W. pointed out, the above is only enough to prove that the grammar will not produce any wrong words, or $L(G) \subseteq L$. In order to complete the proof, you also need to show that $L \subseteq L(G)$, i.e. all the words in the language are generated by the grammar.

For this part you can try an induction with the hypothesis

IH: All the words of length at most $n$ that have an equal number of $0$'s and $1$'s can be derived from $S$.

(Of course, you can combine these two inductions, but keeping them separate may be easier to handle if you are inexperienced.)
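Before writing out the two inductions, it can be reassuring to check both inclusions by brute force on short words. The following is only a sanity check under stated assumptions, not a proof: the helper names `generated` and `balanced` are made up for this sketch, and the bound of 8 is arbitrary. It relies on the observation that every word derived from $S$ is either $\epsilon$ or has the form $0x1y$ or $1x0y$, where $x$ and $y$ are shorter words derived from $S$.

```python
from itertools import product

def generated(max_len):
    """Map each length n <= max_len to the set of words of length n derivable from S."""
    by_len = {0: {""}}                      # S -> epsilon
    for n in range(1, max_len + 1):
        words = set()
        # S -> 0S1S and S -> 1S0S: two terminals plus two shorter derived words
        for i in range(0, n - 1):           # i = len(x), j = len(y), i + j = n - 2
            j = n - 2 - i
            for x in by_len[i]:
                for y in by_len[j]:
                    words.add("0" + x + "1" + y)
                    words.add("1" + x + "0" + y)
        by_len[n] = words
    return by_len

def balanced(n):
    """All binary words of length n with as many 0s as 1s."""
    return {"".join(w) for w in product("01", repeat=n)
            if w.count("0") == w.count("1")}

if __name__ == "__main__":
    table = generated(8)
    for n in range(9):
        # Equality means both L(G) ⊆ L and L ⊆ L(G) hold at this length.
        print(n, table[n] == balanced(n))
```

Agreement at small lengths says nothing about longer words, which is exactly what the induction is for.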

FrankW

As FrankW mentions, your grammar does generate the language. As D.W. mentions, the induction argument that FrankW suggests easily shows that all the words that your grammar generates have an equal number of 0s and 1s. It is more difficult to prove the other direction, that is, that every word with an equal number of 0s and 1s can be generated by the grammar.

The idea is as follows. Suppose without loss of generality that we are trying to generate a word $w$ which starts with 0. Our goal is to factor $w$ as $w = 0x1y$ in such a way that $x,y$ each have an equal number of 0s and 1s (why is that enough?). If $x$ has an equal number of 0s and 1s then $y$ also does (why?), so it is enough to figure out how to find the $0x1$ part.

The idea here is to trace the word from left to right, keeping count of the difference between the number of 0s and the number of 1s. For example, for the word $w = 0010110110$, the sequence is
$$ \begin{array}{l|c} \text{Subword} & \#_0-\#_1 \\\hline \epsilon & 0 \\ 0 & 1 \\ 00 & 2 \\ 001 & 1 \\ 0010 & 2 \\ 00101 & 1 \\ 001011 & 0 \\ 0010110 & 1 \\ 00101101 & 0 \\ 001011011 & -1 \\ 0010110110 & 0 \end{array} $$
This shows that we can break $w$ as $0|0101|1|0110$ (so $x = 0101$ and $y = 0110$). Another possibility is $w = 0|010110|1|10$. Both of these correspond to $0$ entries in the table, reached from a $1$ entry by reading a 1.

See if you can come up with an argument showing that such a $0$ entry must always exist. Don't forget to account for cases like $0|1|\cdots$ ($x = \epsilon$) and $0|\cdots|1$ ($y = \epsilon$).
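The left-to-right scan can be sketched in code. This is a minimal illustration, assuming the input is a string over `0` and `1`; the names `split`, `leftmost_rules`, `replay` and the rule labels `A`, `B`, `E` are invented for this sketch and do not come from the question. It always cuts at the first return to zero, which is only one of the valid choices, and it handles words starting with 1 by counting occurrences of the first symbol against the other symbol.

```python
# Hypothetical labels for the three productions of the grammar.
RULES = {"A": "0S1S", "B": "1S0S", "E": ""}

def split(w):
    """Factor a nonempty balanced word w as a + x + b + y with x, y balanced."""
    a = w[0]
    balance = 0                              # (#symbols equal to a) - (#other symbols)
    for k, c in enumerate(w):
        balance += 1 if c == a else -1
        if balance == 0:
            # First return to zero: the count just dropped from 1, so w[k] != a,
            # and the part strictly between position 0 and position k is balanced.
            return a, w[1:k], w[k], w[k + 1:]
    raise ValueError("word is empty or not balanced")

def leftmost_rules(w):
    """List the productions of a leftmost derivation of the balanced word w."""
    if w == "":
        return ["E"]
    a, x, _, y = split(w)
    rule = "A" if a == "0" else "B"
    return [rule] + leftmost_rules(x) + leftmost_rules(y)

def replay(rules):
    """Apply the productions to the leftmost S, starting from the start symbol."""
    form = "S"
    for r in rules:
        form = form.replace("S", RULES[r], 1)
    return form

if __name__ == "__main__":
    w = "0010110110"
    print(split(w))                          # factorization at the first zero entry
    print(replay(leftmost_rules(w)) == w)    # the derivation yields w again
```

The recursion in `leftmost_rules` mirrors the induction: `x` and `y` are strictly shorter balanced words, so the induction hypothesis applies to each of them.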

Yuval Filmus