63

Safe programming languages (PL) are gaining popularity. What is the formal definition of a safe PL? For example, C is not safe, but Java is safe. I suspect that the property “safe” should be applied to a PL implementation rather than to the PL itself. If so, how do we define what a safe PL implementation is?

beroal

10 Answers

100

There is no formal definition of "safe programming language"; it's an informal notion. Rather, languages that claim to provide safety usually provide a precise formal statement of what kind of safety is being claimed/guaranteed/provided. For instance, the language might provide type safety, memory safety, or some other similar guarantee.

D.W.
42

If you can get your hands on a copy of Benjamin Pierce's Types and Programming Languages, its introduction has a good overview of various perspectives on the term "safe language".

One proposed interpretation of the term that you might find interesting is:

“A safe language is completely defined by its programmer’s manual.” Let the definition of a language be the set of things the programmer needs to understand in order to predict the behavior of every program in the language. Then the manual for a language like C does not constitute a definition, since the behavior of some programs (e.g., ones involving unchecked array accesses or pointer arithmetic) cannot be predicted without knowing the details of how a particular C compiler lays out structures in memory, etc., and the same program may have quite different behaviors when executed by different compilers.

So, I would be hesitant to use the term "unsafe" to refer to a programming language implementation. If a construct that the language leaves undefined produces different behavior in different implementations, one implementation's behavior may be closer to what the programmer expects, but I wouldn't call it "safe".

martin
32

When we call a language “safe” in some respect, that formally means that there’s a proof that no well-formed program in the language can do something we consider dangerous. The word “safe” is also used less formally, but that’s what people here understand your question to mean. There are many different definitions of properties we want a “safe” language to have.

A few important ones are:

  • Andrew Wright and Matthias Felleisen’s definition of “type soundness”, which is cited in many places (including Wikipedia) as an accepted definition of “type safety,” and their 1994 proof that a subset of ML meets it.

  • Michael Hicks lists several definitions of “memory safety” here. Some are lists of types of errors that cannot occur, and some are based on treating pointers as capabilities. Java guarantees that none of those errors are possible (unless you explicitly use a feature marked unsafe) by having a garbage collector manage all allocations and deallocations. Rust makes the same guarantee (again, unless you explicitly mark code as unsafe) through its affine type system of ownership and borrowing, under which each value has a single owner, can be moved at most once, and can otherwise only be reached through borrows that the compiler checks.

  • Similarly, thread-safe code is usually defined as code that cannot exhibit certain kinds of bugs involving threads and shared memory, including data races and deadlocks. These properties are often enforced at the language level: Rust guarantees in its type system that data races cannot occur; C++ guarantees that its std::shared_ptr smart pointers to the same object in multiple threads will neither delete the object prematurely nor fail to delete it when the last reference is destroyed; C and C++ additionally have atomic variables built into the language, with atomic operations guaranteed to enforce certain kinds of memory consistency if used correctly; MPI restricts interprocess communication to explicit messages; and OpenMP has syntax to ensure that access to variables from different threads is safe.

  • The property that memory will never leak is often called safe-for-space. Automatic garbage collection is one language feature to ensure this.

  • Many languages guarantee that their operations will have well-defined results and their programs will be well-behaved. As supercat illustrates in another answer on this page, C does this for unsigned arithmetic (guaranteed to wrap around safely) but not for signed arithmetic (where overflow is allowed to cause arbitrary bugs, because C needed to support CPUs that do wildly different things when signed arithmetic overflows), and yet the language sometimes silently converts unsigned quantities to signed ones. A short C sketch after this list makes the contrast concrete.

  • Functional languages have a large number of invariants that any well-formed program is guaranteed to maintain, for example, that pure functions cannot cause side-effects. These may or may not be described as “safe.”

  • Some languages, such as SPARK or OCaml, are designed to facilitate proving program correctness. This may or may not be described as “safe” from bugs.

  • Proofs that a system cannot violate a formal security model (hence the quip, “Any system that’s provably secure probably isn’t.”)
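
To make the signed/unsigned point in the arithmetic bullet above concrete, here is a minimal C sketch (the function and variable names are made up purely for illustration):

 #include <limits.h>

 void overflow_demo(void)
 {
   unsigned int u = UINT_MAX;
   u = u + 1;                /* well-defined: unsigned arithmetic wraps, u is now 0 */

   int i = INT_MAX;
   i = i + 1;                /* undefined behaviour: the compiler may assume this never happens */

   unsigned int big = 3000000000u;
   int s = (int)big;         /* out-of-range conversion to signed: implementation-defined result */
   (void)s;                  /* silence unused-variable warnings */
 }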

Davislor
21

Safety is not binary; it's a continuum.

Informally speaking, safety is defined in opposition to bugs, the two most often mentioned (each illustrated in the short C sketch after this list) being:

  • Memory Safety: the language and its implementation prevent a variety of memory-related errors such as use-after-free, double-free, out-of-bounds access, ...
  • Type Safety: the language and its implementation prevent a variety of type-related errors such as unchecked casts, ...
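
As a minimal C sketch of those two classes (the names are invented for this example, and nothing beyond the standard library is assumed):

 #include <stdio.h>
 #include <stdlib.h>

 void unsafe_demo(void)
 {
   /* memory-related error: use-after-free */
   int *p = malloc(sizeof *p);
   free(p);
   *p = 42;                  /* undefined behaviour: p no longer points to live memory */

   /* type-related error: an unchecked cast reinterprets the bits of a double */
   double d = 1.0;
   int *q = (int *)&d;
   printf("%d\n", *q);       /* undefined behaviour: the object is read through an incompatible type */
 }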

Those are not the sole classes of bugs that languages prevent: data-race freedom and deadlock freedom are rather desirable, proofs of correctness are pretty sweet, etc.

Simply incorrect programs, however, are rarely considered "unsafe" (only buggy); the term safety is generally reserved for guarantees affecting our ability to reason about a program. Thus C, C++, or Go, having Undefined Behavior, are unsafe.

And of course, there are languages with unsafe subsets (Java, Rust, ...) which purposefully delineate areas where the developer is responsible for maintaining the language guarantees and the compiler is in "hands-off" mode. The languages are still generally dubbed safe, despite this escape hatch, a pragmatic definition.

Matthieu M.
15

While I don't disagree with D.W.'s answer, I think it leaves one part of "safe" unaddressed.

As noted, there are multiple types of safety being promoted, and I believe it's good to understand why. Each notion is associated with the idea that programs suffer especially from a certain class of bugs, and that programmers would be unable to introduce that specific kind of bug if the language blocked them from doing so.

It should be noted that these different notions therefore correspond to different classes of bugs, and that these classes are neither mutually exclusive nor do they cover all forms of bugs. Just to take D.W.'s two examples, the question of whether a certain memory location holds a certain object is a question of both type safety and memory safety.

A further criticism of "safe languages" follows from the observation that banning certain constructs deemed dangerous leaves the programmer with the need to come up with alternatives. Empirically, safety is better achieved by good libraries: using code that's already field-tested saves you from introducing new bugs.

MSalters
8

A fundamental difference between C and Java is that if one avoids certain easily-identifiable features of Java (e.g. those in the Unsafe namespace), every possible action one may attempt--including "erroneous" ones--will have a limited range of possible outcomes. While this limits what one can do in Java--at least without using the Unsafe namespace--it also makes it possible to limit the damage that can be caused by an erroneous program, or--more importantly--by a program which would correctly process valid files but is not particularly guarded against erroneous ones.

Traditionally, C compilers would process many actions in Standard-defined fashion in "normal" cases, while processing many corner cases "in a manner characteristic of the environment". If one were using a CPU which would short out and catch fire if numerical overflow occurred and wanted to avoid having the CPU catch fire, one would need to write code to avoid numerical overflow. If, however, one were using a CPU which would perfectly happily truncate values in two's-complement fashion, one didn't have to avoid overflows in cases where such truncation would result in acceptable behavior.

Modern C takes things a step further: even if one is targeting a platform which would naturally define a behavior for something like numerical overflow where the Standard would impose no requirements, overflow in one portion of a program may affect the behavior of other parts of the program in arbitrary fashion not bound by the laws of time and causality. For example, consider something like:

 uint32_t test(uint16_t x)
 {
   if (x < 50000) foo(x);
   return x*x; // Note x will promote to "int" if that type is >16 bits.
 }

A "modern" C compiler given something like the above might conclude that since the computation of x*x would overflow if x is greater than 46340, it can perform the call to "foo" unconditionally. Note that even if it would be acceptable to have a program abnormally terminate if x is out of range, or have the function return any value whatsoever in such cases, calling foo() with an out-of-range x might cause damage far beyond either of those possibilities. Traditional C wouldn't provide any safety gear beyond what the programmer and underlying platform supplied, but would allow safety gear to limit the damage from unexpected situations. Modern C will bypass any safety gear that isn't 100% effective at keeping everything under control.

supercat
6

There are several layers of correctness in a language. In order of increasing abstraction:

  • Few programs are error free (only those for which correctness can be proven). Others have already mentioned that error containment is therefore the most concrete safety aspect. Languages which run in a virtual machine, like Java and .NET, are generally safer in this respect: program errors are normally intercepted and handled in a defined way.1
  • At the next level, errors detected at compile time instead of at run time make a language safer. A syntactically correct program should also be semantically right as far as possible. Of course the compiler cannot know the big picture, so this concerns the detail level. Strong and expressive data types are one aspect of safety on this level. One could say the language should make it hard to make certain kinds of errors (type errors, out-of-bounds access, uninitialized variables, etc.). Run-time type information, like arrays which carry length information, avoids errors. I programmed Ada 83 in college and found that a compiling Ada program typically contained perhaps an order of magnitude fewer errors than the corresponding C program. Just take Ada's ability to let the user define integer types which are not assignable without explicit conversion: whole space ships have crashed because feet and meters were confused, which one could trivially avoid with Ada (a rough C approximation of this idea is sketched after this list).

  • At the next level, the language should provide means to avoid boilerplate code. If you have to write your own containers, their sorting, or their concatenation, or if you must write your own string::trim(), you'll make mistakes. Since the abstraction level rises, this criterion involves the language proper as well as the language's standard library.

  • These days the language should provide means for concurrent programming on the language level. Concurrency is hard to get right and perhaps impossible to do correctly without language support.

  • The language should provide means for modularization and collaboration. The strong, elaborate, user-defined types from above help create expressive APIs.
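
As a rough C approximation of the Ada idea in the second bullet (the type and function names are invented purely for illustration), wrapping each unit in its own struct already makes accidental mixing a compile-time error:

 #include <stdio.h>

 typedef struct { double value; } meters;  /* distinct wrapper types for each unit */
 typedef struct { double value; } feet;

 static meters meters_from_feet(feet f)
 {
   meters m = { f.value * 0.3048 };        /* the explicit conversion lives in exactly one place */
   return m;
 }

 static void set_altitude(meters m)
 {
   printf("altitude: %f m\n", m.value);
 }

 int main(void)
 {
   feet f = { 100.0 };
   /* set_altitude(f);  would not compile: feet and meters are incompatible types */
   set_altitude(meters_from_feet(f));
   return 0;
 }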

Somewhat orthogonally the language definition should be intelligible; the language and libraries should be well documented. Bad or missing documentation leads to bad and wrong programs.


1 But because the correctness of the virtual machine itself usually cannot be proven, such languages may, somewhat paradoxically, not be suitable for very strict safety requirements.

2

“Please, do not say that every PL has unsafe commands. We can always take a safe subset.”

Every language I know of has ways of writing illegal programs that can be (compiled and) run. And every language that I know of has a safe subset. So, what's your question really?


Safety is multidimensional and subjective.

Some languages have lots of operations that are "unsafe". Others have fewer such operations. In some languages, the default way of doing something is inherently unsafe. In others, the default way is safe. In some languages, there is an explicit "unsafe" subset. In other languages, there is no such subset at all.

In some languages, "safety" refers exclusively to memory safety — a service offered by the standard library and/or the runtime where memory access violations are made difficult or impossible. In other languages, "safety" explicitly includes thread safety. In other languages, "safety" refers to the guarantee that a program will not crash (a requirement that includes not allowing uncaught exceptions of any kind). Finally, in many languages "safety" refers to type safety — if the type system is consistent in certain ways, it is said to be "sound" (incidentally, Java and C# do not have completely sound type systems).

And in some languages, all the different meanings of "safety" are considered subsets of type safety (e.g. Rust and Pony achieve thread safety through properties of the type system).

0

Take this C code:

int* p;
*p = 1;

C is a very unsafe language. p is uninitialised, so it could point anywhere. Likely effects are an immediate crash, overwriting memory somewhere with no bad side effect, or overwriting something important, and then Things Go Wrong. Now we change it to

int* p = NULL;
*p = 1;

On many implementations this will crash, because a store to NULL traps on those implementations. But you may have an optimising compiler that throws this and any following code away!

Now, in a safe language, the first example would not compile: the compiler must prove that every variable has been assigned a value before it is used. In the second example, assigning NULL would only be allowed if p were a nullable variable, and assigning through such a pointer would only be allowed after a successful test that it is not null, so the invalid assignment could not happen.

A safe language also needs to handle the case that a pointer becomes invalid. Examples:

int* p = malloc(100);
int* q = p;
free(p);

Now both p and q are invalid and assignment through either will be unsafe; a safe language prevents this. Or

int* p;
{ int x; p = &x; }
*p = 2;

At the time of the assignment p has changed from a valid pointer to an invalid one because x doesn’t exist anymore. A safe language will prevent this.

gnasher729
-2

All languages are converted to the same instruction set for execution, so from a very low-level view no language can be more or less inherently safe than another.

The words "safe" and "safety" have in recent decades been mutilated by certain politically oriented parts of English-speaking society, such that the common usage has almost no definition. However, for technical matters I define "safety" and "safe" as a device(physical or procedural) that prevents the unintentional use of something or that makes accidental use substantially more difficult; and the state of being under the protection of such a device.

So, a safe language has some device to limit a particular class of bugs. Limits by their nature often come with some other inconvenience or outright lack of ability. This inconvenience or inability to do certain things may or may not be an acceptable tradeoff, and must be judged for each situation.

This also should not be taken to imply that "unsafe" languages will result in bugs. For example, I don't have safety corks on my forks and yet I have, without much effort, managed for decades to avoid stabbing my eye while eating. It is certainly less effort to use bare forks with a bit of care and skill than the effort that would be needed to use the fork-cork. (The fork-cork is an old obscure pop culture reference.) The saved effort frees up resources I can use to improve other parts of my life.

Max Power