24

Per recommendation I am reposting this from Stack Overflow.

Recently I have been thinking about the following issue.

Consider the code for a standard "Hello world!" program:

#include <stdio.h>

int main(void)
{
    printf("Hello World");
    return 0;
}

Now almost any change in this code will make it completely useless; in fact, almost every change will prevent the code from compiling. For example:

#include <stdio.h>

int main(void5
{
    printf("Hello World");
    return 0;
}

Now to the actual question. Is there a programming language where every possible combination of symbols - that is, every expression - makes sense? I tried thinking about some sort of solution and came up with two:

  1. Postfix with a limited number of variables. Essentially all variables are already defined before you write any code and you have to work just with them. Theoretically you can then perform an arbitrary number of operations by forming a chain of many simple programs, each one of them feeding results to the others. Code could be written as a series of characters in postfix notation;

  2. "Postfix" with a stack of variables. Variables are stored on a stack; every operation takes two variables from the top and puts the result in their place. The program ends when it reaches the last operation or variable.

Personally I hate both of these. Not only are they limited, they are also inelegant. They are not even real solutions, just workarounds that essentially "offshore" some of the work to an external process.

Does anyone have any other idea how to solve this problem?

user1561358

8 Answers

33

Redcode, the assembly language behind Core War, was explicitly designed to have very few halting instructions, because the code often gets mangled before it finally gives out, and the more opportunities it has to halt, the less interesting the game is.

You see very few such languages in practice because we don't just want a program to run; we want it to run in the way we expect. If you can make a typo that changes the way the program runs, the new behavior must be acceptably close to the originally expected behavior, or the programmers will seethe in frustration.

There is some precedent for such things in the use of natural languages rather than formal languages, but it's not what I would call a large field when you compare it to the use of formal languages. If you're interested in such programming languages, the natural language processing community is where I'd look.

Another field you could look at is genetics. There are remarkably few genetic sequences which are simply invalid. Plenty of them aren't very effective at reproduction, but very few are outright invalid.

Cort Ammon
17

The idea of a universal Turing machine uses just such a "programming language": a coding of Turing machines as natural numbers, represented for example in binary, such that every natural number denotes a Turing machine, i.e., program. In this language, every string of zeroes and ones is a program.

If you're worried that some numbers might code invalid programs, this can be side-stepped as follows. Imagine writing out all strings in the character set of your programming language (say, Java), in lexicographic order, starting with strings of length one, then two, then three, ... Then, make a new programming language by letting the number $n$ stand for the $n$th string in the list that's a valid Java program. In the new programming language, programs are just natural numbers and every natural number is a valid program.
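
That construction is easy to sketch directly. In the toy Python version below, the alphabet and the validity test are stand-ins of my own (a real version would use Java's character set and a Java parser), but the enumeration scheme is exactly the one described above:

from itertools import count, islice, product

ALPHABET = "(){};ab"  # toy stand-in for the real character set

def is_valid(s: str) -> bool:
    """Toy stand-in for 'is s a valid Java program?': here we just
    require parentheses and braces to be balanced."""
    closers = {"(": ")", "{": "}"}
    stack = []
    for ch in s:
        if ch in closers:
            stack.append(closers[ch])
        elif ch in ")}":
            if not stack or stack.pop() != ch:
                return False
    return not stack

def all_strings():
    """All strings over ALPHABET: length 1, then 2, then 3, ..."""
    for length in count(1):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars)

def nth_program(n: int) -> str:
    """In the new language, the number n (1-indexed) *is* the nth
    valid string, so every natural number is a valid program."""
    valid = (s for s in all_strings() if is_valid(s))
    return next(islice(valid, n - 1, None))

print([nth_program(n) for n in range(1, 8)])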

I'm sure there are also esoteric programming languages where every string is a program; however, if you're just asking for a list of those, I think your question is off-topic here.

David Richerby
14

Check out Jot, a Turing-complete language based on combinatory logic, where every sequence of 0s and 1s (including an empty sequence) is a valid program.
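
An interpreter makes this concrete. Below is a minimal sketch based on Jot's published semantics as I understand them (the empty program denotes the identity; appending 0 takes $[w]$ to $[w]SK$; appending 1 takes $[w]$ to $\lambda x y.\,[w](x\,y)$), with the combinators modelled as Python closures:

# The S and K combinators as curried closures, plus the identity.
S = lambda f: lambda g: lambda x: f(x)(g(x))
K = lambda x: lambda y: x
I = lambda x: x

def jot(bits: str):
    """Evaluate a string of 0s and 1s as a Jot program.
    Every bit string (including the empty one) denotes a function."""
    f = I
    for b in bits:
        f = f(S)(K) if b == "0" else S(K(f))
    return f

# No string is rejected; each one evaluates to *some* function.
for prog in ["", "0", "1", "11100", "10101010"]:
    print(repr(prog), "->", jot(prog))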

Petr
12

Extending a programming language so that every expression makes sense is always possible, but not interesting. For example, you can just assign the meaning “do nothing” to any expression that the original language rejects.

Designing a programming language where every expression makes sense in a “you can execute it” way is not particularly useful. A good programming language is not just one where a monkey can type on a keyboard and write a valid program, but one where a programmer can easily write the program that they intended to write. Writing valid programs is not the difficult part of programming: the difficult part is writing a program that performs what was expected of it. Rejecting obviously incorrect programs is very helpful in this respect.

Another way to tackle this is to fully define the semantics of all possible inputs, including specifying what compile-time, load-time or run-time error should be generated for each input if any. That is, “abort the program after printing Syntax error at line 42 on the standard error stream” is part of the defined semantics of the language. Every expression “makes sense” in that it has a defined meaning. Is that a useful meaning? Maybe — after all, if the program is obviously wrong, rejecting it is useful.
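
Both readings, "do nothing" and "abort with a defined diagnostic", are easy to demonstrate with a thin wrapper. The sketch below is my own illustration using Python's built-in compile; it gives every input string a defined meaning:

import sys

def run_total(source: str, on_reject: str = "diagnose") -> None:
    """Give *every* string a defined meaning.

    Valid programs behave as usual; rejected strings either mean
    "do nothing" or mean "print a defined diagnostic and stop",
    depending on on_reject.
    """
    try:
        code = compile(source, "<program>", "exec")
    except SyntaxError as err:
        if on_reject == "diagnose":
            # The diagnostic itself is part of the language's semantics.
            print(f"Syntax error at line {err.lineno}", file=sys.stderr)
        return  # "do nothing" is the other defined meaning
    exec(code, {})

run_total("print('hello')")  # runs normally
run_total("main(5 {")        # defined meaning: a diagnostic, then stop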

Gilles 'SO- stop being evil'
8

One nice example is Whitespace. In the language proper, any combination of operators is valid. The operators are space, tab and newline (specifically "\n"). All other characters are treated as comments.

This answer, and indeed your question (as well as this entire web page), are examples of valid Whitespace programs (though they may not do anything particularly interesting).
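
A two-line filter (my own illustration) shows exactly which part of any text a Whitespace interpreter would actually execute:

def whitespace_view(text: str) -> str:
    """Everything except space, tab and newline is a comment."""
    return "".join(ch for ch in text if ch in " \t\n")

# Any text, including this question's broken C snippet, is a program:
print(repr(whitespace_view('main(5\n{\n    printf("Hello World");\n}')))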

slebetman
7

I would like to address the idea that many posters have given, that such a language would be "useless". Perhaps it would be useless for humans to write, manually, with the intention of solving some particular task. However, despite being a majority use-case for programming languages, that's certainly not the only use-case. Several use-cases come to mind where such a language is useful, and we can look to those fields for examples of such languages.

Firstly, Cort Ammon's allusion to genetics is spot on: the program transformation in the question (replacing ")" with "5") can be seen as a mutation. This kind of manipulation is common in the field of evolutionary computation; in particular, genetic algorithms perform such transformations on strings, whilst genetic programming transforms programs. In either case, we usually want to assign meaning to every possibility, since that will produce the most compact search space.

Genetic algorithms rely on some sort of evaluation function for strings; if we use a programming language interpreter as our evaluation function, then we have a scenario where a programming language which assigns meaning to all possible strings is useful. In genetic programming, it is assumed that our evaluation function is a programming language interpreter, but we may choose various representations for our programs; for example, many systems operate on abstract syntax trees. If we choose strings as our representation, then we recover the same scenario as with genetic algorithms.
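
To make that concrete, here is a toy (1+1) evolutionary loop of my own devising. The "language", fitness function and mutation scheme are all invented for the example, but the key property matches the text above: the interpreter assigns a meaning to every string, so no mutant is ever rejected as malformed:

import random

def interpret(program: str) -> int:
    """A total toy language: start from 0; each digit adds its
    value, '*' doubles the accumulator, anything else is a no-op."""
    acc = 0
    for ch in program:
        if ch.isdigit():
            acc += int(ch)
        elif ch == "*":
            acc *= 2
    return acc

def fitness(program: str, target: int = 42) -> int:
    return -abs(interpret(program) - target)  # higher is better

ALPHABET = "0123456789*xyz"  # 'x', 'y', 'z' are harmless junk

def mutate(program: str) -> str:
    i = random.randrange(len(program))
    return program[:i] + random.choice(ALPHABET) + program[i + 1:]

# (1+1) evolution: keep a mutant whenever it is no worse.
random.seed(0)
best = "".join(random.choice(ALPHABET) for _ in range(8))
for _ in range(2000):
    child = mutate(best)
    if fitness(child) >= fitness(best):
        best = child

print(best, "->", interpret(best))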

Another situation where we may want every string to be a valid program is when enumerating programs. This is related to the bijection mentioned by CodesInChaos, but we may prefer to operate on strings rather than natural numbers for several reasons:

  • If there is some structure in the language, e.g. we can assign meaning to sub-strings, this may be lost when translating to natural numbers. In this case we may prefer to use strings, in order to reason about and transform sub-strings locally, rather than representing the whole program as a number. This is analogous to how we might prefer to use bitwise operations on an int rather than arithmetic expressions, when each bit has an individual meaning. This is basically a generalisation of the evolutionary scenario.
  • We may want to generate programs on demand; for example, we might begin executing a program which is completely undetermined, and only generate (e.g. randomly) the individual instructions (e.g. characters) when/if the instruction pointer reaches them. This is common in algorithmic information theory, where the program is a Turing machine tape, and the aim is to characterise the behaviour of randomly-generated programs. For example, we can formulate the Solomonoff prior over arbitrary strings as the probability that a universal Turing machine with a random tape will output that string (one standard formulation is given below).
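
For reference, the Solomonoff prior of a string $x$ is standardly written as

$$M(x) = \sum_{p \,:\, U(p) \text{ begins with } x} 2^{-|p|},$$

i.e. the probability that a universal monotone machine $U$, fed uniformly random bits as its program $p$, produces output beginning with $x$. The sum only makes sense because every bit string is a legitimate program (or program prefix).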

In terms of example languages, many evolutionary computation systems are based on stack languages like the Push family. These tend to allow arbitrary streams of tokens (which we could represent as individual characters). Sometimes (like with BrainSlugs83's Brainfuck example) there are restrictions such as balanced brackets; however, we can relate this to self-delimiting programs, in that a string like [ may not be a valid program, but it is a valid program prefix. If we imagine a compiler/interpreter reading source code from stdin, then it won't reject a string like [; it will simply wait for more input before continuing.
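
That distinction (not-a-program versus program prefix) is easy to make precise. The sketch below, my own illustration for Brainfuck's bracket rule, classifies any string as complete, a prefix awaiting more input, or unsalvageable:

def classify(source: str) -> str:
    """Classify a Brainfuck string by its bracket structure:
    'complete' if brackets balance, 'prefix' if more input could
    still balance them, 'invalid' if no continuation can."""
    depth = 0
    for ch in source:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return "invalid"  # unmatched ']' can never be fixed
    return "complete" if depth == 0 else "prefix"

for s in ["+[>+<-]", "[", "]", "abc"]:
    print(repr(s), "->", classify(s))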

Languages like Binary Combinatory Logic and Binary Lambda Calculus arose directly out of work on algorithmic information theory; see e.g. http://tromp.github.io/cl/cl.html:

This design of a minimalistic universal computer was motivated by my desire to come up with a concrete definition of Kolmogorov Complexity, which studies randomness of individual objects.

Warbo
2

Real programming languages exist to convey meaning to people, not computers. As the plenty of fun texts with almost randomly shuffled letters floating around the 'net show, people can read gibberish and make sense out of it, even without overtly noticing the mangling. Just think back to how hard it is to find typos and other such errors in texts.

A programming language like the one you ask for would have people understanding what they want to read, not what is actually written down. Debugging is already hard enough in languages with a limited set of legal statements, where not much ambiguity is possible. Good languages reduce the possible misinterpretations due to, e.g., transposed symbols or typos. Natural languages are also notorious for their redundancy, for the same sort of reason.

vonbrand
1

In the Brainfuck programming language, almost every possible sequence of characters can be interpreted as a program. That is, you could take a perfectly good program, type a bunch of garbage into it, and it would still compile / be interpretable without any issues.

(Edit: It turns out you have to match opening and closing square brackets, but otherwise, the above is true.)

It achieves this via these two simple methods:

  1. All of the commands it understands are a single byte each (in BF they are all single ASCII characters, for example*).

  2. All of the characters it does not understand, it discards as comments.

The programming language is Turing complete (meaning it can compute anything that any other language can compute).

*: It turns out not all BF commands are effectively a single ASCII byte -- i.e. brackets MUST be matched -- so, as is, that language doesn't meet the first criterion. But any language that does meet both criteria would fulfill what the OP is asking for.
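
A minimal interpreter along these lines (my own sketch) makes both methods visible: every byte outside the eight commands is skipped as a comment, and the one remaining way a string can fail, the bracket matching noted in the footnote above, is checked up front:

def bf(code: str, inp: str = "") -> str:
    """Minimal Brainfuck interpreter: the eight command bytes do
    their usual job, every other byte is a comment. No bounds
    checking on the tape pointer; this is only a sketch."""
    # Pre-match brackets; this is the one way a string can be invalid.
    stack, jump = [], {}
    for i, ch in enumerate(code):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            j = stack.pop()  # raises on an unmatched ']'
            jump[i], jump[j] = j, i
    assert not stack, "unmatched '['"

    tape, ptr, pc, out, it = [0] * 30000, 0, 0, [], iter(inp)
    while pc < len(code):
        ch = code[pc]
        if ch == ">": ptr += 1
        elif ch == "<": ptr -= 1
        elif ch == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".": out.append(chr(tape[ptr]))
        elif ch == ",": tape[ptr] = ord(next(it, "\0"))
        elif ch == "[" and tape[ptr] == 0: pc = jump[pc]
        elif ch == "]" and tape[ptr] != 0: pc = jump[pc]
        pc += 1
    return "".join(out)

# Garbage around the commands changes nothing:
print(repr(bf("++GARBAGE++[->+<] IGNORED >.")))  # '\x04'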

BrainSlugs83