
In programming languages, closures are a popular and often desired feature. Wikipedia says:

In computer science, a closure (...) is a function together with a referencing environment for the non-local variables of that function. A closure allows a function to access variables outside its immediate lexical scope.

So a closure is essentially an (anonymous?) function value that can use variables outside of its own scope. In my experience, this means it can access variables that are in scope at its definition point.

In practice, though, the concept seems to be diverging, at least outside of functional programming. Different languages implement different semantics; there even seem to be wars of opinion about which semantics is the right one. Many programmers do not seem to know what closures are, viewing them as little more than anonymous functions.

Also, there seem to be major hurdles in implementing closures. Most notably, Java 7 was supposed to include them, but the feature was pushed back to a future release.

Why are closures so hard (to understand and) to realise? This is too broad and vague a question, so let me focus it more with these interconnected questions:

  • Are there problems with expressing closures in common semantic formalisms (small-step, big-step, ...)?
  • Are existing type systems ill-suited for closures, and can they not be extended easily?
  • Is it problematic to bring closures in line with a traditional, stack-based procedure translation?

Note that the question relates mostly to procedural, object-oriented and scripting languages in general. As far as I know, functional languages do not have any problems.

Raphael

3 Answers


May I direct you to the Wikipedia page on the funarg problem? At least, this is how compiler people used to refer to the problem of implementing closures.

So a closure is essentially an (anonymous?) function value that can use variables outside of its own scope. In my experience, this means it can access variables that are in scope at its definition point.

While this definition makes sense, it does not help describe the problem of implementing first-class functions in a language with a traditional runtime stack. When it comes to implementation issues, first-class functions can be roughly divided into two classes:

  • Local variables captured by a function value are never used after the enclosing function returns.
  • Local variables captured by a function value can be used after the enclosing function returns.

The first case (downwards funargs) is not that hard to implement and can be found even in older procedural languages like Algol, C and Pascal. C mostly skirts the issue, since it does not allow nested functions, but Algol and Pascal do the necessary bookkeeping to allow inner functions to reference the stack variables of an outer function.
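
Here is a minimal sketch of the first case in C# (hypothetical names; any language with first-class functions would do). The lambda never escapes the call it is passed into, so in principle its captured variable could remain on the stack.

using System;

class DownwardsFunarg {
    // The lambda is passed *down* into Array.ForEach and is only called
    // while Sum is still executing, so `total` could in principle stay
    // in Sum's stack frame.
    static int Sum(int[] xs) {
        int total = 0;
        Array.ForEach(xs, n => total += n);
        return total;
    }

    static void Main() {
        Console.WriteLine(Sum(new[] { 1, 2, 3 })); // prints 6
    }
}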

The second case (upwards funargs), on the other hand, requires activation records to be saved outside the stack, in the heap. This means that it is very easy to leak memory resources unless the language runtime includes a garbage collector. While almost everything is garbage collected today, requiring one is still a significant design decision and was even more so some time ago.
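
A minimal sketch of the second case, again in C# with hypothetical names: the returned closure outlives the call that created it, so its captured variable must be moved to the heap (and is later reclaimed by the garbage collector).

using System;

class UpwardsFunarg {
    // The returned closure outlives MakeCounter, so `count` cannot live in
    // MakeCounter's stack frame; the compiler hoists it into a heap-allocated
    // object that the garbage collector eventually reclaims.
    static Func<int> MakeCounter() {
        int count = 0;
        return () => ++count;
    }

    static void Main() {
        var next = MakeCounter();
        Console.WriteLine(next()); // prints 1
        Console.WriteLine(next()); // prints 2 -- same heap-allocated count
    }
}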


As for the particular example of Java, if I remember correctly, the main issue was not actually being able to implement closures, but how to introduce them to the language in a way that was not redundant with existing features (like anonymous inner classes) and did not clash with them (like checked exceptions, a problem that is not trivial to solve and that most people don't think of at first).

I can also think of other things that make first-class functions less trivial to implement, such as deciding what to do with "magical" variables like this, self or super, and how closures interact with existing control-flow operators such as break and return (do we want to allow non-local returns or not?). But in the end, the recent popularity of first-class functions seems to indicate that languages lacking them mostly do so for historical reasons or due to some significant design decision early on.
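
For concreteness, here is a small C# sketch of how one language resolves both of those questions (hypothetical names; other languages answer them differently):

using System;

class Counter {
    private int total;

    // `this` is captured implicitly: the lambda writes to the enclosing
    // instance's field, which also keeps the Counter object alive.
    public Action MakeAdder() {
        return () => total++;
    }

    // `return` inside the lambda returns from the lambda only, not from
    // TryReturn: C# chose not to allow non-local returns.
    public void TryReturn() {
        Action a = () => { return; };
        a();
        Console.WriteLine("still in TryReturn");
    }
}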

hugomg

We can look at how closures are implemented in C#. The scale of the transformation the C# compiler performs makes it clear that implementing closures this way is quite a lot of work. There may be easier ways to implement closures, but I'd expect the C# compiler team to be aware of them.

Consider the following pseudo-C# (I cut out a bit of C#-specific stuff):

int x = 1;
function f = function() { x++; };
for (int i = 1; i < 10; i++) {
    f();
}
print x; // prints 10: x started at 1 and f ran 9 times

The compiler transforms this into something like this:

class FunctionStuff {
   int x;
   void theFunction() {
       x++;
   }
}

FunctionStuff theClosureObject = new FunctionStuff();
theClosureObject.x = 1;
for (int i = 1; i < 10; i++) {
    theClosureObject.theFunction();
}
print theClosureObject.x; // prints 10, as before

(In reality, the variable f will still be created, where f is a 'delegate' (roughly, a function pointer), but this delegate is still associated with the theClosureObject object. I left this part out for clarity for those who aren't familiar with C#.)
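
For completeness, here is a compilable sketch of the transformed program with the delegate kept in (assuming System.Action; same identifiers as above):

using System;

class FunctionStuff {
    public int x;

    public void theFunction() {
        x++;
    }
}

class Demo {
    static void Main() {
        var theClosureObject = new FunctionStuff { x = 1 };
        Action f = theClosureObject.theFunction; // delegate bound to the closure object
        for (int i = 1; i < 10; i++) {
            f();
        }
        Console.WriteLine(theClosureObject.x); // prints 10
    }
}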

This transformation is quite extensive and tricky: consider closures inside closures (sketched below) and the interplay of closures with the rest of the C# language features. I can imagine why the feature was pushed back for Java, as Java 7 already has quite a lot of new features.
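
To see why nesting is tricky, consider this sketch (hypothetical names): the inner lambda captures variables from two different scopes, so the compiler has to generate two closure classes and chain their objects together.

using System;

class Nested {
    // z => x + y + z captures `y` from the outer lambda and `x` from the
    // method, so the compiler links the inner closure object to the outer one.
    static Func<int, Func<int, int>> Adder() {
        int x = 10;
        return y => z => x + y + z;
    }

    static void Main() {
        Console.WriteLine(Adder()(20)(3)); // prints 33
    }
}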

Alex ten Brink

To answer part of your question: the formalism described by Morrisett and Harper covers big-step and small-step semantics of higher-order polymorphic languages containing closures. There are earlier papers providing the kinds of semantics you are looking for; look, for example, at the SECD machine. Adding mutable references or mutable locals to these semantics is straightforward. I don't see any technical problems in providing such semantics.
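
For a flavour of how routine this is, here is one common big-step formulation (the notation is generic, not necessarily Morrisett and Harper's): a lambda evaluates to a closure pairing the term with its defining environment, and application evaluates the body in the captured environment rather than the caller's.

\[
\frac{}{\rho \vdash \lambda x.\,e \Downarrow \langle \lambda x.\,e,\ \rho \rangle}
\qquad
\frac{\rho \vdash e_1 \Downarrow \langle \lambda x.\,e,\ \rho' \rangle
      \qquad \rho \vdash e_2 \Downarrow v_2
      \qquad \rho'[x \mapsto v_2] \vdash e \Downarrow v}
     {\rho \vdash e_1\ e_2 \Downarrow v}
\]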

Dave Clarke