When I am presented with the problem of deciding whether a given set is countable, I cannot figure out how to determine or prove it. The general approach is to compare it with $\mathbb{N}$, but how can one prove that the set of all syntactically correct C programs is countable, while the set of all non-regular languages over $\{0,1\}$ is not?
Some common approaches to prove that some set is countable:
- Give an enumeration, i.e. a list that contains all of the elements of the set. It's fine if the list contains duplicates.
- Show that it is a subset of a countable set.
- Show that it is the countable union of countable sets.
- Show that there exists an injection from this set to a countable set, or a surjection from a countable set to this set.
Some common approaches to prove that some set is uncountable are:
- Show that it is a superset of an uncountable set.
- Show that it is the power set of an infinite set (countable or uncountable, it doesn't matter).
- Show that there is an injection from an uncountable set to this set, or a surjection from this set to an uncountable set.
- Show that the set is of the form $A \setminus B$ where $A$ is uncountable and $B$ is countable.
- Show that the set cannot be enumerated, using a diagonal argument.
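The enumeration and countable-union approaches above can be made concrete with the Cantor pairing function, a bijection between $\mathbb{N}\times\mathbb{N}$ and $\mathbb{N}$ (here with $0 \in \mathbb{N}$). The Python sketch below is illustrative only; the names `pair`, `unpair` and `enumerate_union` are not from any library. It lists a countable union of countable sets by walking the diagonals of the grid $(i, j)$. As an example it uses the positive rationals, viewed as the union over denominators $q$ of $\{p/q : p \geq 1\}$, and, as the first bullet allows, the resulting list contains duplicates.

```python
from fractions import Fraction
from itertools import count, islice
from math import isqrt
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def pair(x: int, y: int) -> int:
    """Cantor pairing: a bijection from N x N to N (N includes 0 here)."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(z: int) -> tuple[int, int]:
    """Inverse of pair, using an exact integer square root."""
    w = (isqrt(8 * z + 1) - 1) // 2  # index of the diagonal containing z
    t = w * (w + 1) // 2             # code of the first cell on that diagonal
    y = z - t
    return w - y, y


def enumerate_union(family: Callable[[int], Callable[[int], T]]) -> Iterator[T]:
    """Enumerate the union of A_0, A_1, ..., where family(i)(j) is the j-th
    element of A_i. Every element of every A_i appears at some finite position."""
    for z in count():
        i, j = unpair(z)
        yield family(i)(j)


# Example: A_i = {p / (i + 1) : p >= 1}; the union is all positive rationals.
def rationals_with_denominator(i: int) -> Callable[[int], Fraction]:
    return lambda j: Fraction(j + 1, i + 1)


first = list(islice(enumerate_union(rationals_with_denominator), 6))
print(first)
# → [Fraction(1, 1), Fraction(1, 2), Fraction(2, 1), Fraction(1, 3), Fraction(1, 1), Fraction(3, 1)]
```

The duplicate `Fraction(1, 1)` comes from $2/2$; removing duplicates is never needed to conclude countability.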
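In the case of all syntactically correct C programs, the set is a subset of the set of all finite strings over the ASCII characters (i.e. some subset of the language $\{0,\ldots,127\}^{*}$), which is countable. There are a number of ways to show that $\{0,\ldots,127\}^{*}$ is countable, but the easiest is to enumerate all of its strings in len-lex (shortlex) order: first the empty string, then all strings of length 1 in lexicographic order, then all strings of length 2, and so on. Every string appears at some finite position, because there are only finitely many strings of each length.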
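A minimal Python sketch of the len-lex enumeration, plus filtering it down to a subset. It assumes membership in the subset is decidable (true for C syntax, which a parser can check). The predicate `is_balanced` is a hypothetical stand-in for a real C parser: it checks balanced parentheses over the two-letter alphabet `(`, `)`. All function names are illustrative.

```python
from itertools import count, islice, product
from typing import Callable, Iterator, Sequence


def shortlex(alphabet: Sequence[str]) -> Iterator[str]:
    """Yield every finite string over `alphabet` in length-lexicographic order:
    the empty string, then all strings of length 1, then length 2, ...
    Each string appears exactly once, at a finite position."""
    symbols = sorted(alphabet)
    for n in count():
        for letters in product(symbols, repeat=n):
            yield "".join(letters)


def enumerate_subset(alphabet: Sequence[str],
                     member: Callable[[str], bool]) -> Iterator[str]:
    """Enumerate the strings accepted by `member`, in shortlex order."""
    return (s for s in shortlex(alphabet) if member(s))


# Hypothetical stand-in for "is a syntactically correct C program".
def is_balanced(s: str) -> bool:
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


print(list(islice(shortlex("01"), 7)))
# → ['', '0', '1', '00', '01', '10', '11']
print(list(islice(enumerate_subset("()", is_balanced), 4)))
# → ['', '()', '(())', '()()']
```

Filtering a list never adds elements, which is the "subset of a countable set" rule in executable form.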
In the case of all non-regular languages over $\{0,1\}$, note that it is equal to $\mathscr{P}\left(\{0,1\}^{*}\right) \setminus R$ where $R$ is the set of all regular languages.
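Since $\{0,1\}^{*}$ is infinite, its power set is uncountable. The set of all regular languages $R$ is, however, countable. To see why, note that the set of all regular expressions over $\{0,1\}$ is countable: every regular expression is a finite string over a finite alphabet (the letters $0$ and $1$ plus a few operator and bracket symbols), so this set is a subset of a countable set. Every regular language is denoted by at least one regular expression, so the map sending a regular expression to its language is a surjection from a countable set onto $R$, and hence $R$ is countable. Therefore, the set of all non-regular languages, an uncountable set minus a countable one, is uncountable.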
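The final step, an instance of the $A \setminus B$ rule listed above, can be spelled out as a short proof by contradiction:

$$
\begin{aligned}
&\text{Suppose } \mathscr{P}\left(\{0,1\}^{*}\right) \setminus R \text{ were countable. Since } R \subseteq \mathscr{P}\left(\{0,1\}^{*}\right),\\
&\qquad \mathscr{P}\left(\{0,1\}^{*}\right) = \left(\mathscr{P}\left(\{0,1\}^{*}\right) \setminus R\right) \cup R,\\
&\text{a union of two countable sets, which is countable. This contradicts}\\
&\qquad \left|\mathscr{P}\left(\{0,1\}^{*}\right)\right| = 2^{\aleph_0} > \aleph_0 .
\end{aligned}
$$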
EDIT h/t to Yves Daoust for spotting a mistake.
Consider an infinite set $A$. Then $A$ is countable (or countably infinite) iff there is a bijection $f$ from $\mathbb{N} = \{1, 2, 3, \ldots\}$ to $A$. This is why we use the term "countable": intuitively, we can count the elements of $A$, with $f(1)$ the first element, $f(2)$ the second, and so on. Since $f$ is a bijection, this counting covers every element $a$ of $A$ exactly once; formally, for each $a \in A$ there is exactly one natural number $i \geq 1$ with $f(i) = a$.
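As a concrete instance of the definition, the integers $\mathbb{Z}$ are countable even though they extend infinitely in both directions: counting them as $0, 1, -1, 2, -2, \ldots$ is a bijection from $\mathbb{N} = \{1, 2, 3, \ldots\}$ to $\mathbb{Z}$. A small Python sketch of such an $f$ and its inverse (the names `f` and `f_inv` are just for illustration):

```python
def f(i: int) -> int:
    """Bijection from {1, 2, 3, ...} to the integers: 1->0, 2->1, 3->-1, 4->2, ..."""
    if i < 1:
        raise ValueError("f is defined on naturals i >= 1")
    return i // 2 if i % 2 == 0 else -(i // 2)


def f_inv(a: int) -> int:
    """The unique natural i >= 1 with f(i) == a."""
    return 2 * a if a > 0 else -2 * a + 1


print([f(i) for i in range(1, 8)])  # → [0, 1, -1, 2, -2, 3, -3]
```

Having an explicit inverse is exactly the "single natural $i$ with $f(i) = a$" condition.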
As already presented in the other answer, there are several approaches to showing countability or uncountability, but it is important to understand that it all comes down to whether you can count the elements, that is, whether you can exhibit a bijection to $\mathbb{N}$.
how can one prove that the set of all syntactically correct C programs is countable
One way to do this is to find another representation of a C program as an object that we already know how to count. For example, one can show that the set of all DFAs over a fixed alphabet $\Sigma$ is countable by showing that there is a finite alphabet $\Sigma'$ such that any DFA over $\Sigma$ can be encoded as a word over $\Sigma'$. This way, we can count DFAs over $\Sigma$, since we can count words. The formal underlying argument is that there is an injective function from the set of DFAs over $\Sigma$ to the set of finite words over $\Sigma'$. Note that two DFAs that are identical up to the names of their states are considered the same DFA under this mapping. Note also that, since every regular language is accepted by some DFA, this shows that there are only countably many regular languages over $\Sigma$.
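Here is one possible encoding, sketched in Python. Everything concrete in it is an assumption made for illustration: the input alphabet `SIGMA = ("a", "b")`, the target alphabet $\Sigma' = \{0, 1, \#\}$, and the field layout (number of states, start state, accepting bitmask, then the transition targets in a fixed order, all in binary and separated by `#`). States are named $0, \ldots, n-1$, which is how DFAs that differ only in the names of their states get identified. The function `decode` inverts `encode`, and that is what makes `encode` injective.

```python
from dataclasses import dataclass

SIGMA = ("a", "b")  # hypothetical fixed input alphabet, in a fixed order


@dataclass(frozen=True)
class DFA:
    """A DFA over SIGMA whose states are named 0, 1, ..., num_states - 1."""
    num_states: int
    start: int
    accepting: frozenset[int]
    delta: tuple[tuple[int, ...], ...]  # delta[q][k] = next state on SIGMA[k]


def encode(d: DFA) -> str:
    """Encode d as a word over the finite alphabet {'0', '1', '#'}."""
    fields = [format(d.num_states, "b"), format(d.start, "b"),
              "".join("1" if q in d.accepting else "0" for q in range(d.num_states))]
    for q in range(d.num_states):
        for k in range(len(SIGMA)):
            fields.append(format(d.delta[q][k], "b"))
    return "#".join(fields)


def decode(word: str) -> DFA:
    """Inverse of encode: recovers the DFA, so encode is injective."""
    fields = word.split("#")
    n, start = int(fields[0], 2), int(fields[1], 2)
    accepting = frozenset(q for q, bit in enumerate(fields[2]) if bit == "1")
    targets = [int(t, 2) for t in fields[3:]]
    k = len(SIGMA)
    delta = tuple(tuple(targets[q * k:(q + 1) * k]) for q in range(n))
    return DFA(n, start, accepting, delta)


# Example: the 2-state DFA accepting words over {a, b} with an even number of a's.
even_as = DFA(2, 0, frozenset({0}), ((1, 0), (0, 1)))
print(encode(even_as))  # → 10#0#10#1#0#0#1
```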
A different, more playful way to prove the above claim is to say that any DFA over $\Sigma$ can be drawn on an iPad whiteboard screen, and two different DFAs cannot induce the same drawing. Bingo: we have an injection from the set of DFAs over $\Sigma$ to the set of binary images (if you draw in black on a white background, you get a binary image). Why is the set of binary images countable? Well, you can represent an image as an $n\times n$ binary matrix for some $n$; for each fixed $n$ there are only finitely many ($2^{n^2}$) such matrices, and, as written in the other answer, a countable union of countable sets is countable. Note that although we can draw the same DFA in different ways, we can map it to the smallest matrix under some fixed order on binary matrices. So what I want you to take away is that it is all about counting, no matter how weirdly you wish to count.
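The counting of binary images can be made explicit. In the Python sketch below (function names are illustrative), an image is modelled as a square 0/1 matrix, an assumption that matches the $n\times n$ representation above. Matrices are listed by size and then lexicographically, and `index_of` computes a matrix's position in that list, so "the smallest matrix under some order" can be read as the one with the smallest index.

```python
from itertools import count, islice, product
from typing import Iterator

Matrix = tuple[tuple[int, ...], ...]


def matrices_of_size(n: int) -> Iterator[Matrix]:
    """All 2**(n*n) binary n x n matrices, in lexicographic order of their cells."""
    for cells in product((0, 1), repeat=n * n):
        yield tuple(cells[r * n:(r + 1) * n] for r in range(n))


def all_square_binary_matrices() -> Iterator[Matrix]:
    """Every square binary matrix: size 1 first, then size 2, and so on.
    Each size class is finite, so every matrix is reached after finitely many steps."""
    for n in count(1):
        yield from matrices_of_size(n)


def index_of(m: Matrix) -> int:
    """Position of m in the enumeration above (an injection into N)."""
    n = len(m)
    before = sum(2 ** (k * k) for k in range(1, n))   # matrices of smaller size
    bits = "".join(str(b) for row in m for b in row)  # row-major cell string
    return before + int(bits, 2)


print(index_of(((0, 1), (1, 0))))  # → 8
```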
What about TMs, or graphs, or C programs, as you asked? Well, the idea is the same: every C program is ultimately stored (as its source file) as a word over $\{0, 1\}$, and the set $\{0, 1\}^*$ is countable. The most straightforward way to see it is that any C program is a long string over some finite alphabet that includes all the symbols/characters you can use to write C code (for new lines and spaces, you can use special letters, etc.).
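To make "a C program is a word over a finite alphabet" concrete, one can treat the source file as a sequence of bytes (an assumption about how the file is stored; any fixed finite character set works the same way). The Python sketch below, with illustrative names `rank` and `unrank`, maps every byte string to a distinct natural number using bijective base-256 numeration, which orders strings first by length and then lexicographically by byte value. Restricting `rank` to syntactically correct programs gives an injection into $\mathbb{N}$.

```python
def rank(source: bytes) -> int:
    """Map a byte string (e.g. the bytes of a .c file) to a natural number.
    Bijective base-256: byte value b becomes digit b + 1, so distinct byte
    strings get distinct numbers, and every n >= 0 is hit exactly once."""
    n = 0
    for b in source:
        n = n * 256 + (b + 1)
    return n


def unrank(n: int) -> bytes:
    """Inverse of rank."""
    out = bytearray()
    while n > 0:
        n, r = divmod(n - 1, 256)
        out.append(r)
    return bytes(reversed(out))


hello = b"int main(void) { return 0; }\n"
print(rank(b""), rank(b"\x00"), rank(b"\xff"), rank(b"\x00\x00"))  # → 0 1 256 257
print(unrank(rank(hello)) == hello)  # → True
```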
The set of all non-regular languages over $\{0,1\}$ is not countable.
Here you can use a counting argument based on Cantor's theorem, which states that there is no surjective function from a set $A$ onto its power set $P(A)$. Specifically, there are $2^{\aleph_0}$ languages over $\{0, 1\}$ (that is, subsets of $\{0, 1\}^*$), yet, as argued before, only countably many DFAs over $\{0, 1\}$. Therefore, there is a non-regular language over $\{0, 1\}$. In fact, the set of non-regular languages must be uncountable: every language is either regular or non-regular, so if there were only countably many non-regular languages, the set of all languages would be a union of two countable sets and hence countable. The formal argument underlying Cantor's theorem shows that $|\{0, 1\}^*| < |P(\{0, 1\}^*)|$; in particular, $\aleph_0 < 2^{\aleph_0}$. Note that Cantor's theorem, which is proved by the diagonalization method, implies that if $A$ is countably infinite then $P(A)$ is not countable. So again, all these points or tricks to show countability or uncountability simply boil down to whether you can or cannot define a bijection to $\mathbb{N}$.
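The diagonal argument behind Cantor's theorem can be sketched for languages over $\{0,1\}$. Below, as a simplifying assumption, a language is represented by a Python membership predicate, and a claimed enumeration of languages by a function `languages(i)` returning the $i$-th one; all names, including the example list `family`, are illustrative. The constructed language $D$ contains the $i$-th binary word $w_i$ exactly when $w_i \notin L_i$, so $D$ differs from every $L_i$ and no such list can contain all languages. (Python predicates can only describe decidable languages; the construction itself works for arbitrary sets of words.)

```python
from itertools import count, islice, product
from typing import Callable, Iterator

Language = Callable[[str], bool]  # a language over {0,1}, given by its membership test


def binary_words() -> Iterator[str]:
    """All words over {0,1} in length-lexicographic order: '', '0', '1', '00', ..."""
    for n in count():
        for letters in product("01", repeat=n):
            yield "".join(letters)


def diagonal(languages: Callable[[int], Language]) -> Language:
    """Given any list L_0, L_1, L_2, ... of languages, build D with
    w_i in D  <=>  w_i not in L_i, where w_i is the i-th binary word."""
    def index_of(word: str) -> int:
        # the 2**len(word) - 1 shorter words come first, then lexicographic rank
        return (2 ** len(word) - 1) + (int(word, 2) if word else 0)

    def member(word: str) -> bool:
        return not languages(index_of(word))(word)

    return member


def family(i: int) -> Language:
    """Hypothetical list: L_i = words whose length is congruent to i mod 3."""
    return lambda w: len(w) % 3 == i % 3


D = diagonal(family)
words = list(islice(binary_words(), 7))
print([w for w in words if D(w)])  # → ['1', '00', '01', '11']
```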