Is this counting language context free?

Question

Let $\Sigma = \left\{ 0,\,1,\,2\right\}$. I want to look at the following language: $L=\left\{ xyz \, | \, |x|_0 + |z|_0 = |x|_2 +|z|_2 \wedge y \in \left\{ 1 \right\} ^{*} \right\}$.

I would like to prove or disprove that $L$ is context-free.

I'm having a very hard time constructing a context-free grammar for $L$, so I attempted to build a nondeterministic PDA instead.

My attempt goes in this direction: instead of summing the zeros of $x$ and $z$ and then subtracting from this sum the number of $2$s in both $x$ and $z$, I can do it differently:

I can count the number of $0$s from $x$ by inserting a #, then remove one # for every $2$ I count, if negative I'll push @ . Then I will do the same with $z$.

The points are:

I'm not sure if this idea will work?
If this language is CFL, I have no idea how to begin constructing a CFL grammar for it?

score 3 · Answer 1 · edited Jun 16 '20 at 10:30

Your idea is correct. After all, $|x|_0 + |z|_0 = |x|_2 + |z|_2$ if and only if $|x|_0 + |z|_0 - (|x|_2 + |z|_2) = |xz|_0 - |xz|_2 = 0$. Note, however, that you must (ab)use nondeterminism correctly to guess where $y$ is, so that you know what belongs to $x$ and what belongs to $z$.
Producing a CFG from a PDA is, unfortunately, not easy in general. For some CFLs it is (much) easier to construct a grammar than a PDA, and for others it is the other way around. Experience with this kind of exercise might give you a sudden flash of inspiration; failing that, you can always resort to the standard construction.

In this particular case, your "flash of inspiration" could have come from taking a closer look at the equation relating the number of zeros and twos. The idea is that the equation holds for the empty word and that we can produce symbols while preserving it; for example, if you generate a $0$ in $x$ (i.e., $|x|_0$ increases by one), then you also generate a $2$ somewhere so that the equation still holds. Hence, a CFG with the following set of productions suffices:

$$\begin{align*} S &\to X_0 S X_2 \mid X_2 S X_0 \mid X S X \mid Y \mid \varepsilon, \\ X_0 &\to X 0 \mid 0 X, \\ X_2 &\to X 2 \mid 2 X, \\ X &\to 0 X 2 \mid 2 X 0 \mid 1 X \mid X 1 \mid \varepsilon, \\ Y &\to 1 Y \mid \varepsilon \end{align*}$$

A brief argument that every word $w = x y z \in L$ is generated: The ones in $x$ and $z$ are generated by $X$, so we can exclude them. Further, any neighboring pair $02$ or $20$ may be generated by $X$ (and we include $X$ everywhere it is possible for such a pair to appear). After removing all such pairs, the only possible word left is of the form $x = 0^n$ and $z = 2^n$, or $x = 2^n$ and $z = 0^n$, and we can generate $w$ by using the appropriate rules for $S$ containing $X_0$ and $X_2$.
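As a quick sanity check, here is one possible derivation in this grammar of the word $0112$, read as $x = 0$, $y = 11$, $z = 2$ (the word is chosen only for illustration):

$$\begin{align*} S &\Rightarrow X_0\, S\, X_2 \Rightarrow 0X\, S\, X_2 \Rightarrow 0\, S\, X_2 \Rightarrow 0\, Y\, X_2 \\ &\Rightarrow 0\,1Y\, X_2 \Rightarrow 0\,11Y\, X_2 \Rightarrow 011\, X_2 \Rightarrow 011\, X2 \Rightarrow 0112 \end{align*}$$

Each step applies exactly one of the productions listed above; the $0$ produced by $X_0$ is matched by the $2$ produced by $X_2$, which is how the equation is preserved.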

John L. · Answer 2 · 2019-02-01T07:30:13.703

I can count the number of 0s from $x$ by inserting a #, then remove one # for every 2 I count, if negative I'll push @ . Then I will do the same with $z$.

This PDA is pretty good. You can certainly go on to convert it to a context-free grammar using a general algorithm, such as the one given in the standard proof that the language of every PDA is context-free, although that procedure is usually rather heavy and not very enlightening.

If this language is CFL, I have no idea how to begin constructing a CFL grammar for it?

I agree that it might not be easy. If you have not done so yet, I recommend reading the answer by Hendrik Jan on how to produce a context-free grammar.

Here is a very simple context-free grammar for $L$: $$S \to 0S2S \mid 2S0S \mid 1S \mid \epsilon$$
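As an illustration, one possible derivation of the word $21002$ (chosen here only as an example) with these productions is:

$$S \Rightarrow 2S0S \Rightarrow 21S0S \Rightarrow 210S \Rightarrow 2100S2S \Rightarrow 21002S \Rightarrow 21002$$

The first step pairs the leading $2$ with the first $0$; the second $0$ is then paired with the final $2$ by a fresh application of $S \to 0S2S$.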

Note that $y \in \{1\}^*$ contributes no 0s or 2s, and $x$ and $z$ are arbitrary words over $\Sigma$, so $L=\{ xz \mid |x|_0 + |z|_0 = |x|_2 +|z|_2\}=\{ w \mid |w|_0 = |w|_2 \}$. That is, $L$ is the language of words with an equal number of 0s and 2s, as you have observed.
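With this description of $L$, the stack idea from the question can be simulated directly. The following is only a minimal sketch, assuming the markers `#` and `@` from the question; the function name `accepts` is made up for illustration, and the final emptiness test stands in for the bottom-of-stack marker a real PDA would use.

```python
def accepts(word: str) -> bool:
    """Simulate the '#'/'@' stack idea on a word over {0, 1, 2}.

    The stack only ever holds one kind of marker at a time:
    '#' for surplus 0s, '@' for surplus 2s.
    """
    stack = []
    for symbol in word:
        if symbol == "1":
            continue  # 1s do not affect the balance of 0s and 2s
        if symbol == "0":
            own, opposite = "#", "@"
        elif symbol == "2":
            own, opposite = "@", "#"
        else:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
        if stack and stack[-1] == opposite:
            stack.pop()        # cancel against a surplus of the other kind
        else:
            stack.append(own)  # record one more surplus symbol
    # Accept exactly when every 0 has been cancelled by a 2 and vice versa.
    return not stack
```

For example, `accepts("21002")` holds, while `accepts("2010")` does not, because one `0` remains unmatched. No guess about the position of $y$ is needed in this sketch, since 1s are simply skipped wherever they occur.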

Here is a brief reasoning that shows the language of above grammar is $L$.

It is immediate that all words generated by $S$ have an equal number of $0$'s and $2$'s.
Conversely, suppose $1^kw\in L$ for some $k\ge0$ and some nonempty $w$ that does not start with 1.
- Suppose $w$ starts with 0. Reading $w$ from left to right and comparing the number of 0s with the number of 2s read so far, there is a first position at which the number of 2s catches up with the number of 0s. The symbol at that position is a 2, so $w=0w_12w_2$ for some $w_1, w_2\in L$.
- Suppose $w$ starts with 2. Similarly, $w=2w_10w_2$ for some $w_1, w_2\in L$.
In both cases $w_1$ and $w_2$ are shorter than $w$. Combining the above two cases with the base case $1^k$ for $k\ge0$, which can be generated, we see by induction on the length of the word that all words in $L$ can be generated.
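The induction can also be checked mechanically on short words. The sketch below is not a proof, only a brute-force comparison; the names `derives`, `balanced` and `check` are hypothetical helpers introduced here. `derives` mirrors the productions $S \to 0S2S \mid 2S0S \mid 1S \mid \epsilon$ by trying every position for the matching symbol.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(word: str) -> bool:
    """True if S =>* word for S -> 0S2S | 2S0S | 1S | epsilon."""
    if word == "":
        return True                      # S -> epsilon
    head, rest = word[0], word[1:]
    if head == "1":
        return derives(rest)             # S -> 1S
    partner = {"0": "2", "2": "0"}.get(head)
    if partner is None:
        return False                     # symbol outside {0, 1, 2}
    # S -> head S partner S: try every occurrence of the partner symbol.
    return any(
        rest[i] == partner and derives(rest[:i]) and derives(rest[i + 1:])
        for i in range(len(rest))
    )


def balanced(word: str) -> bool:
    """Membership in L: equally many 0s and 2s."""
    return word.count("0") == word.count("2")


def check(max_len: int) -> bool:
    """Compare the grammar with L on all words up to length max_len."""
    return all(
        derives("".join(w)) == balanced("".join(w))
        for n in range(max_len + 1)
        for w in product("012", repeat=n)
    )
```

Calling `check(7)` compares the two definitions on every word of length at most 7 over $\{0,1,2\}$. Memoization via `lru_cache` keeps the repeated substring queries cheap.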
