3

I'm trying to solve this problem:

Let $L$ be an infinite language. Show that there exists a sublanguage of $L$ that is not regular.

But can this be correct? If I have the language $\{a\}^*$ for example, that's infinite but you can make a DFA for any sub-language of it, right?

There's a hint that this can be proved using diagonalization, but I think I must be misunderstanding the question.

Raphael
jtht

5 Answers

20

The powerset of the language contains all the sublanguages. Since the language is infinite, its powerset is uncountably infinite, but there are only countably many regular languages (for a particular, finite alphabet). So some sublanguage is not regular.
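
To spell out the counting, here is a sketch under two assumptions: the alphabet $\Sigma$ is finite (so $L \subseteq \Sigma^*$ is countably infinite), and every regular language is described by at least one regular expression, which is a finite string over a finite symbol set. Then

$$|\mathcal{P}(L)| = 2^{\aleph_0} > \aleph_0 \ge |\{R \subseteq \Sigma^* : R \text{ is regular}\}|.$$

The sublanguages of $L$ outnumber the regular languages, so at least one sublanguage of $L$ is not regular.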

panofsteel
6

Hint. An infinite language $\mathcal{L}$ doesn't necessarily contain words of all possible lengths, but it must contain words of infinitely many different lengths, since over a finite alphabet there are only finitely many words of each length. You can use this to get a non-regular subset of the original language (in fact, even an undecidable subset). A sketch of how to continue follows.

Let the set of lengths of words in $\mathcal{L}$ be $\{\ell_1, \ell_2, \dots\}$ and let $U$ be any undecidable set of natural numbers. The language $\{w\in \mathcal{L}\mid |w| = \ell_u \text{ for some } u\in U\}$ is a sublanguage of $\mathcal{L}$ and is undecidable – so it's certainly not regular.
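
The selection step can be illustrated on a finite sample. This is only a sketch: an undecidable $U$ cannot be implemented, so the predicate `in_u` below is a decidable stand-in, and the function name `select_by_length_index` and the choice to number the lengths in increasing order are assumptions made for the illustration.

```python
from typing import Callable, Iterable, List


def select_by_length_index(words: Iterable[str],
                           in_u: Callable[[int], bool]) -> List[str]:
    """Keep the words whose length is l_u for some u with in_u(u).

    Lengths are numbered l_1 < l_2 < ... in increasing order (an assumption;
    the sketch above only needs some fixed enumeration of the lengths).
    """
    words = list(words)
    # Distinct lengths occurring in the sample, sorted: l_1, l_2, ...
    lengths = sorted({len(w) for w in words})
    index_of = {length: u for u, length in enumerate(lengths, start=1)}
    return [w for w in words if in_u(index_of[len(w)])]


# Finite sample of a hypothetical language; distinct lengths are 1, 2, 4, 6, 8.
sample = ["ab", "abab", "ababab", "b", "abababab"]
# Stand-in for U: the even numbers, so lengths l_2 = 2 and l_4 = 6 are kept.
selected = select_by_length_index(sample, lambda u: u % 2 == 0)
print(selected)
```

With the real construction, `in_u` would be membership in an undecidable set, which no program can compute; the code only shows how the index set picks out words by length.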

David Richerby
5

If $L$ itself is not regular, we are done. Otherwise, $L$ satisfies the pumping lemma: there is an integer $p \ge 1$ such that for any $w \in L$ with $|w| \ge p$, there exist $x$, $y$ and $z$ such that $w = x y z$, $|y| \ge 1$, $|x y| \le p$ and $\forall i \ge 0, x y^i z \in L$. Since $L$ is infinite, there exists a word $w \in L$ that satisfies $|w| \ge p$; let $x$, $y$ and $z$ be the corresponding words given by the pumping lemma. The language $L' = \{x y^i z \mid i \in \mathbb{N}\}$ is a subset of $L$. Let $L'' = \{x y^i z \mid i \text{ is prime}\}$: this is a subset of $L$ which is not regular. One way to see that this language isn't regular is that it doesn't satisfy the pumping lemma. Another way is to use the classification of word lengths of regular languages.
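
As a small illustration of $L''$, the following sketch enumerates the words $x y^i z$ for prime $i$ up to a bound. The decomposition `x = "a"`, `y = "b"`, `z = "c"` (as if $L$ were the language of $a b^* c$) and the function names are hypothetical choices for the example, not part of the argument.

```python
from typing import List


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_pumped_words(x: str, y: str, z: str, max_i: int) -> List[str]:
    """Return the words x y^i z for every prime i <= max_i."""
    return [x + y * i + z for i in range(max_i + 1) if is_prime(i)]


# Hypothetical decomposition: x = "a", y = "b", z = "c".
words = prime_pumped_words("a", "b", "c", 7)
print(words)                     # words for i = 2, 3, 5, 7
print([len(w) for w in words])   # lengths |xz| + i * |y| for prime i
```

The lengths of the words grow as $|xz| + i|y|$ with $i$ prime, so the gaps between consecutive lengths follow the prime gaps rather than settling into a fixed period.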

There's a stronger result that any infinite language has a subset that is not decidable. Diagonalization is a reasonable way of proving this. I don't think diagonalization would be particularly useful to prove the existence of a non-regular subset.

Gilles 'SO- stop being evil'
1

You can prove this using diagonalization. Let $R_1,R_2,\ldots$ be an enumeration of the infinite regular languages. We will construct a descending sequence $L = L_0 \supseteq L_1 \supseteq L_2 \supseteq \cdots$ and an increasing sequence $X_0 = \emptyset \subseteq X_1 \subseteq X_2 \subseteq \cdots$ such that $R_n \setminus L_n \neq \emptyset$, and furthermore $L \setminus L_n$ and $X_n$ are finite for all $n$, $X' = \bigcup_n X_n$ and $L' = \bigcap_n L_n$ are infinite, and $L' \supseteq X'$.

At stage $t$, we construct $L_t$ and $X_t$. If $R_t \setminus L_{t-1} \neq \emptyset$, we set $L_t = L_{t-1}$. Otherwise, $R_t \subseteq L_{t-1}$. The set $\Delta = L_{t-1} \setminus R_t$ is infinite, since otherwise $L = R_t \cup \Delta \cup (L \setminus L_{t-1})$ would be regular, and in particular we can choose some $x \in \Delta \setminus X_{t-1}$ and set $L_t = L_{t-1} - x$. In both cases we set $X_t = X_{t-1} + \min L_{t-1}$.

By construction $R_n \setminus L' \supseteq R_n \setminus L_n \neq \emptyset$, and so $L'$ is not equal to any infinite regular language, and so to no regular language.


This argument works if we replace the class $\mathcal{L}$ of regular languages with any other class which is closed under word addition (i.e., $L \in \mathcal{L}$ implies $L+w \in \mathcal{L}$). The argument is constructive in the sense that the $t$th smallest word is either in $X_t$ or in $L\setminus L_t$. When $L$ is itself regular and $\mathcal{L}$ is the class of regular languages, this makes the resulting language decidable, thus improving on the cardinality argument given by Nathan Dunn. While in this particular case there are more direct solutions, diagonalization could work in more general situations.

Yuval Filmus
-3

Consider the language $L = \{a^n \mid n \text{ is prime}\}$. $L$ is a subset of the regular language $\{a\}^*$, but $L$ itself is not regular, so it has no DFA.
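
One way to check the non-regularity claim is the standard pumping-lemma argument, sketched here. Suppose $L$ were regular with pumping length $p$. Pick a prime $q \ge p$, so $a^q \in L$, and write $a^q = x y z$ with $|y| = k \ge 1$ and $x y^i z \in L$ for all $i \ge 0$. Taking $i = q + 1$ gives

$$|x y^{q+1} z| = q + q k = q(1 + k),$$

which is composite because $q \ge 2$ and $1 + k \ge 2$. So $x y^{q+1} z \notin L$, a contradiction.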

David Richerby
muradin