32

Looking for insight into decisions around garbage collected language design. Perhaps a language expert could enlighten me? I come from a C++ background, so this area is baffling to me.

It seems nearly all modern garbage collected languages with OOPy object support like Ruby, Javascript/ES6/ES7, Actionscript, Lua, etc. completely omit the destructor/finalize paradigm. Python seems to be the only one with its class __del__() method. Why is this? Are there functional/theoretical limitations within languages with automatic garbage collection which prevent effective implementations of a destructor/finalize method on objects?

I find it extremely lacking that these languages consider memory as the only resource worth managing. What about sockets, file handles, application states? Without the ability to implement custom logic to clean up non-memory resources and states on object finalization, I'm required to litter my application with custom myObject.destroy() style calls, placing the cleanup logic outside my "class", breaking attempted encapsulation, and leaving my application prone to resource leaks caused by human error rather than having cleanup handled automatically by the GC.

What are the language design decisions which lead to these languages not having any way to execute custom logic on object disposal? I have to imagine there is a good reason. I'd like to better understand the technical and theoretical decisions that resulted in these languages not having support for object destruction/finalization.

Update:

Perhaps a better way to phrase my question:

Why would a language have the built-in concept of object instances with class or class-like structures along with custom instantiation (constructors), yet completely omit the destruction/finalize functionality? Languages which offer automatic garbage collection seem to be prime candidates to support object destruction/finalization as they know with 100% certainty when an object is no longer in use. Yet most of those languages do not support it.

I don't think it's a case where the destructor may never get called, as that would be a core memory leak, which GCs are designed to avoid. I could see a possible argument being that the destructor/finalizer may not get called until some indeterminate time in the future, but that didn't stop Java or Python from supporting the functionality.

What are the core language design reasons to not support any form of object finalization?

Gilles 'SO- stop being evil'
dbcb

8 Answers

10

The pattern you're talking about, where objects know how to clean their resources up, falls into three relevant categories. Let's not conflate destructors with finalizers - only one is related to garbage collection:

  • The finalizer pattern: cleanup method declared automatically, defined by programmer, called automatically.

    Finalizers are called automatically before deallocation by a garbage collector. The term applies if the garbage collection algorithm employed can determine object life cycles.

  • The destructor pattern: cleanup method declared automatically, defined by programmer, called automatically only sometimes.

    Destructors can be called automatically for stack-allocated objects (because object lifetime is deterministic), but must be explicitly called on all possible execution paths for heap-allocated objects (because object lifetime is nondeterministic).

  • The disposer pattern: cleanup method declared, defined, and called by programmer.

    Programmers make a disposal method and call it themselves - this is where your custom myObject.destroy() method falls. If disposal is absolutely required, then disposers must be called on all possible execution paths (see the Python sketch after this list).
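As a rough Python illustration of the difference (only the finalizer and disposer patterns map onto a garbage-collected language; the Connection class and its methods are hypothetical):

class Connection:
    # Finalizer pattern: declared by the language, defined by the programmer,
    # called (maybe, eventually) by the garbage collector.
    def __del__(self):
        print("finalizer ran")

    # Disposer pattern: declared, defined, and called by the programmer.
    def destroy(self):
        print("disposer ran")

c = Connection()
try:
    pass              # ... use c ...
finally:
    c.destroy()       # the programmer must reach this on every execution path
del c                 # whether and when __del__ runs is up to the GC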

Finalizers are the droids you're looking for.

The finalizer pattern (the pattern your question is asking about) is the mechanism for associating objects with system resources (sockets, file descriptors, etc.) for mutual reclamation by a garbage collector. But, finalizers are fundamentally at the mercy of the garbage collection algorithm in use.
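(Concretely, in Python such an association can be made either with a __del__ method or with weakref.finalize, which registers a cleanup callback to run when the GC reclaims the object or at interpreter exit. A minimal sketch, with illustrative names:)

import weakref

class SocketWrapper:
    pass                                   # stands in for an object owning a socket

def release(resource_name):
    print("releasing", resource_name)      # e.g. close the underlying descriptor

w = SocketWrapper()
weakref.finalize(w, release, "socket-42")  # cleanup tied to w's collection
del w                                      # in CPython the refcount hits zero here,
                                           # so the callback runs immediately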

Consider this assumption of yours:

Languages which offer automatic garbage collection ... know with 100% certainty when an object is no longer in use.

Technically false (thank you, @babou). Garbage collection is fundamentally about memory, not objects. If or when a collection algorithm realizes an object's memory is no longer in use depends on the algorithm and (possibly) how your objects refer to each other. Let's talk about two types of runtime garbage collectors. There are lots of ways to alter and augment these two basic techniques:

  1. Tracing GC. These trace memory, not objects. Unless augmented to do so, they don't maintain back references to objects from memory. Unless augmented, these GCs won't know when an object can be finalized, even if they know when its memory is unreachable. Hence, finalizer calls aren't guaranteed.

  2. Reference Counting GC. These use objects to track memory. They model object reachability with a directed graph of references. If there is a cycle in your object reference graph, then all objects in the cycle will never have their finalizer called (until program termination, obviously). Again, finalizer calls are not guaranteed.
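To see item 2 concretely, here is a small CPython sketch. CPython layers a tracing cycle detector on top of its reference counting, so the cycle below is eventually found (and, since PEP 442, its __del__ methods are run); under pure reference counting these finalizers would never fire before program exit:

import gc

class Node:
    def __del__(self):
        print("finalizing", id(self))

a, b = Node(), Node()
a.partner, b.partner = b, a   # a reference cycle: each node keeps the other alive

del a, b                      # refcounts never reach zero, so no finalizer runs yet
gc.collect()                  # the cycle detector finds the garbage cycle and
                              # only then runs both __del__ methods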

TLDR

Garbage collection is difficult and diverse. A finalizer call cannot be guaranteed before program termination.

kdbanman
5

In a nutshell

Finalization is not a simple matter for garbage collectors to handle. It is easy to do with a reference counting GC, but this family of GC is often incomplete, and its memory leaks must be compensated for by explicitly triggering the destruction and finalization of some objects and structures. Tracing garbage collectors are much more effective, but they make it much harder to identify the objects to be finalized and destroyed, as opposed to just identifying the unused memory, thus requiring more complex management, with a cost in time and space and in the complexity of the implementation.

Introduction

I assume that what you are asking is why garbage collected languages do not automatically handle destruction/finalization within the garbage collection process, as indicated by the remark:

I find it extremely lacking that these languages consider memory as the only resource worth managing. What about sockets, file handles, application states?

I disagree with the accepted answer given by kdbanman. While the facts stated there are mostly correct, though strongly biased towards reference counting, I do not believe they properly explain the situation complained about in the question.

I do not believe that the terminology developed in that answer is much help, and it is more likely to confuse things. Indeed, as presented, the terminology is mostly determined by the way the procedures are activated rather than by what they do. The point is that in all cases there is a need to finalize an object that is no longer needed with some cleanup process and to free whatever resources it has been using, memory being just one of them. Ideally, all of this should be done automatically when the object is no longer to be used, by means of a garbage collector. In practice, the GC may be missing or have deficiencies, and this is compensated for by the program explicitly triggering finalization and reclamation.

Explicit triggering by the program is a problem, since it allows for hard-to-analyze programming errors when an object that is still in use is explicitly terminated.

Hence it is much better to rely on automatic garbage collection to reclaim resources. But there are two issues:

  • some garbage collection techniques allow memory leaks that prevent full reclamation of resources. This is well known for reference counting GC, but can also appear with other GC techniques when some data organizations are used without care (a point not discussed here).

  • while a GC technique may be good at identifying memory that is no longer used, finalizing the objects contained therein may not be simple, and that complicates the problem of reclaiming the other resources used by these objects, which is often the purpose of finalization.

Finally, an important point often forgotten is that GC cycles can be triggered by anything, not just memory shortage, if the proper hooks are provided and if the cost of a GC cycle is considered worth it. Hence it is perfectly OK to initiate a GC when any kind of resource is missing, in the hope of freeing some.

Reference counting garbage collectors

Reference counting is a weak garbage collection technique that will not handle cycles properly. It is indeed weak at destroying obsolete structures, and at reclaiming other resources, simply because it is weak at reclaiming memory. But finalizers are easiest to use with a reference counting garbage collector (GC), since a ref-count GC reclaims a structure when its ref count goes down to 0, at which time its address is known together with its type, either statically or dynamically. Hence it is possible to reclaim the memory precisely after applying the proper finalizer, and to apply the process recursively to all pointed-to objects (possibly via the finalizing procedure).
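To make this concrete, here is a toy Python sketch of the idea (the class is illustrative, not how a real ref-counting runtime is built):

class RefCounted:
    def __init__(self, payload, finalizer):
        self.payload = payload
        self.finalizer = finalizer
        self.count = 1                   # the creator holds the first reference

    def incref(self):
        self.count += 1

    def decref(self):
        self.count -= 1
        if self.count == 0:
            # The object's identity and type are known right here, so the
            # proper finalizer can run just before the memory is reclaimed.
            self.finalizer(self.payload)
            self.payload = None          # "reclaim" the storage

handle = RefCounted(open("data.txt", "w"), finalizer=lambda f: f.close())
handle.incref()                          # a second owner appears
handle.decref()                          # first owner done: count is 1, nothing happens
handle.decref()                          # last owner done: the file is closed immediately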

In a nutshell, finalization is easy to implement with a ref-counting GC, but it suffers from the "incompleteness" of that GC, due to circular structures, to precisely the same extent that memory reclamation suffers. In other words, with reference counting, memory is managed exactly as poorly as other resources such as sockets, file handles, etc.

Indeed, a ref-count GC's inability to reclaim looping structures (in general) may be seen as a memory leak. You cannot expect all GCs to avoid memory leaks. It depends on the GC algorithm and on the type/structure information dynamically available (for example, in a conservative GC).

Tracing garbage collectors

The more powerful family of GCs, without such leaks, is the tracing family, which explores the live parts of memory starting from well-identified root pointers. All parts of memory not visited in this tracing process (which can actually be decomposed in various ways, but I have to simplify) are unused parts of memory that can thus be reclaimed [1]. These collectors reclaim all memory parts that can no longer be accessed by the program, no matter what it does. Tracing does reclaim circular structures, and the more advanced GCs are based on some variation of this paradigm, sometimes highly sophisticated. It can be combined with reference counting in some cases, to compensate for the latter's weaknesses.

A problem is that your statement (at the end of the question):

Languages which offer automatic garbage collection seem to be prime candidates to support object destruction/finalization as they know with 100% certainty when an object is no longer in use.

is technically incorrect for tracing collectors.

What is known with 100% certainty is what parts of memory are no longer in use. (More precisely, it should be said that they are no longer accessible, because some parts that can no longer be used according to the logic of the program are still considered in use if there is still a useless pointer to them in the program data.) But further processing and appropriate structures are needed to know what unused objects may have been stored in these now unused parts of the memory. This cannot be determined from what is known of the program, since the program is no longer connected to these parts of the memory.

Thus, after a pass of garbage collection, you are left with fragments of memory that contain objects which are no longer in use, but there is a priori no way to know what these objects are so as to apply the correct finalization. Furthermore, if the tracing collector is of the mark-and-sweep type, some of the fragments may contain objects that were already finalized in a previous GC pass but have not been reused since, for fragmentation reasons. However, this can be dealt with using extended explicit typing.

While a simple collector would just reclaim these fragments of memory without further ado, finalization requires a specific pass to explore that unused memory, identify the objects contained there, and apply the finalization procedures. But such an exploration requires determining the type of the objects that were stored there, and type determination is also needed to apply the proper finalization, if any.
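A toy Python sketch of such a collector may help. It traces an explicit object graph, so the "identify the object and its type" step is trivial here; in a real collector, that is exactly the extra information that has to be maintained:

class Obj:
    def __init__(self, name, finalizer=None):
        self.name = name
        self.finalizer = finalizer       # cleanup for non-memory resources, or None
        self.refs = []                   # outgoing references

def mark(roots):
    live, stack = set(), list(roots)
    while stack:
        o = stack.pop()
        if o not in live:
            live.add(o)
            stack.extend(o.refs)
    return live

def sweep(heap, live):
    survivors = []
    for o in heap:
        if o in live:
            survivors.append(o)
        elif o.finalizer is not None:
            o.finalizer(o)               # the extra finalization pass
    return survivors

sock = Obj("socket", finalizer=lambda o: print("closing", o.name))
root = Obj("root"); root.refs.append(sock)
heap = [root, sock]

root.refs.clear()                        # the socket becomes unreachable
heap = sweep(heap, mark([root]))         # prints "closing socket"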

So that implies extra costs in GC time (the extra pass) and possibly extra memory costs to make proper type information available during that pass by diverse techniques. These costs may be significant as one will often want to finalize only a few objects, while the time and space overhead could concern all objects.

Another point is that the time and space overhead may concern program code execution, and not just the GC execution.

I cannot give a more precise answer, pointing at specific issues, because I do not know the specifics of many of the languages you list. In the case of C, typing is a very difficult issue that led to the development of conservative collectors. My guess would be that this also affects C++, but I am no expert on C++. This seems to be confirmed by Hans Boehm, who did much of the research on conservative GC. A conservative GC cannot systematically reclaim all unused memory, precisely because it may lack precise type information about the data. For the same reason, it would not be able to systematically apply finalizing procedures.

So, it is possible to do what you are asking, as you know from some languages. But it does not come for free. Depending on the language and its implementation, it may entail a cost even when you do not use the feature. Various techniques and trade-offs can be considered to address these issues, but that is beyond the scope of a reasonably sized answer.

[1] This is an abstract presentation of tracing collection (encompassing both copying and mark-and-sweep GC); things vary according to the type of tracing collector, and exploring the unused part of memory differs depending on whether copying or mark-and-sweep is used.

babou
4

The object destructor pattern is fundamental to error handling in systems programming, but has nothing to do with garbage collection. Rather, it has to do with matching object lifetime to a scope, and can be implemented/used in any language that has first class functions.

Example (pseudocode). Suppose you have a "raw file" type, like the Posix file descriptor type. There are four fundamental operations, open(), close(), read(), write(). You would like to implement a "safe" file type that always cleans up after itself. (I.e., that has an automatic constructor and destructor.)

I'll assume our language has exception handling with throw, try and finally (in languages without exception handling you can set up a discipline where the user of your type returns a special value to indicate an error.)

You set up a function that accepts a function that does the work. The worker function accepts one argument (a handle to the "safe" file).

with_file_opened_for_read (string:   filename,
                           function: worker_function(safe_file f)):
  raw_file rf = open(filename, O_RDONLY)
  if rf == error:
    throw File_Open_Error

  try:
    worker_function(safe_file(rf))   # wrap the raw handle in the safe type
  finally:
    close(rf)

You also provide implementations of read() and write() for safe_file (that just call the raw_file read() and write()). Now the user uses the safe_file type like this:

...
with_file_opened_for_read ("myfile.txt",
                           anonymous_function(safe_file f):
                             mytext = read(f)
                             ... (including perhaps throwing an error)
                          )

A C++ destructor is really just syntactic sugar for a try-finally block. Pretty much all I've done here is convert what a C++ safe_file class with a constructor and destructor would compile into. Note that C++ doesn't have finally for its exceptions, specifically because Stroustrup felt that using an explicit destructor was better syntactically (and he introduced it into the language before the language had anonymous functions).
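For readers who want something runnable, here is roughly the same pattern in Python (a sketch; Python's built-in file object stands in for both raw_file and safe_file):

def with_file_opened_for_read(filename, worker_function):
    f = open(filename, "r")        # raises OSError if the open fails
    try:
        worker_function(f)
    finally:
        f.close()                  # runs on normal exit and when worker_function throws

with_file_opened_for_read(
    "myfile.txt",
    lambda f: print(f.read()),     # may itself raise; the file is still closed
)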

(This is a simplification of one of the ways that people have been doing error handling in Lisp-like languages for many years. I think I first ran into it in the late 1980s or early 1990s, but I don't remember where.)

Wandering Logic
2

This isn't a full answer to the question, but I wanted to add a couple of observations that haven't been covered in the other answers or comments.

  1. The question implicitly assumes that we're talking about a Simula-style object oriented language, which is itself limiting. In most languages, even those with objects, not everything is an object. The machinery to implement destructors would impose a cost which not every language implementor is willing to pay.

  2. C++ has some implicit guarantees about destruction order. If you have a tree-like data structure, for example, the children will be destroyed before the parent. This is not the case in GC'd languages, so hierarchical resources may be released in an unpredictable order. For non-memory resources, this can matter.

Pseudonym
2

When the two most popular GC frameworks (Java and .NET) were being designed, I think the authors expected that finalization would work well enough to avoid the need for other forms of resource management. Many aspects of language and framework design can be greatly simplified if there's no need for all the features necessary to accommodate 100% reliable and deterministic resource management. In C++, it's necessary to distinguish between the concepts of:

  1. Pointer/reference that identifies an object which is exclusively owned by the holder of the reference, and which is not identified by any pointers/references the owner doesn't know about.

  2. Pointer/reference that identifies a sharable object which isn't exclusively owned by anyone.

  3. Pointer/reference that identifies an object which is exclusively owned by the holder of the reference, but which may be accessible through "views" the owner has no way of tracking.

  4. Pointer/reference that identifies an object which provides a view of an object owned by someone else.

If a GC language/framework doesn't have to worry about resource management, all of the above can be replaced by a single kind of reference.

I would find naïve the idea that finalization would eliminate the need for other forms of resource management, but whether or not such an expectation was reasonable at the time, history has since shown that there are many cases that require more precise resource management than finalization provides. I happen to think that the rewards of recognizing ownership at the language/framework level would be sufficient to justify the cost (the complexity has to exist somewhere, and moving it to the language/framework would simplify user code), but I do recognize that there are significant design benefits to having a single "kind" of reference, something which only works if the language/framework is agnostic to issues of resource cleanup.

supercat
2

Why is the object destructor paradigm in garbage collected languages pervasively absent?

I come from a C++ background, so this area is baffling to me.

The destructor in C++ actually does two things combined. It frees RAM and it frees resource ids.

Other languages separate these concerns by having the GC be in charge of freeing RAM while another language feature takes charge of freeing resource ids.

I find it extremely lacking that these languages consider memory as the only resource worth managing.

That's what GCs are all about. They do only one thing: ensure that you don't run out of memory. If RAM were infinite, all GCs would be retired, as there would no longer be any real reason for them to exist.

What about sockets, file handles, application states?

Languages can provide different ways of freeing resource ids by:

  • manual .CloseOrDispose() scattered across code

  • manual .CloseOrDispose() scattered within manual "finally block"

  • manual "resource id blocks" (i.e. using, with, try-with-resources, etc) which automates .CloseOrDispose() after the block is done

  • guaranteed "resource id blocks" which automates .CloseOrDispose() after the block is done

Many languages use manual (as opposed to guaranteed) mechanisms, which creates an opportunity for resource mismanagement. Take this simple NodeJS code:

require('fs').openSync('file1.txt', 'w');
// forget to .closeSync the opened file

..where the programmer has forgotten to close the opened file.

For as long as the program keeps running, the opened file is stuck in limbo. This is easy to verify by trying to open the file in HxD and seeing that it can't be done.


Freeing resource ids within C++ destructors is also not guaranteed. You might think RAII operates like a guaranteed "resource id block", yet unlike "resource id blocks", the C++ language does not stop the object providing the RAII cleanup from being leaked, so that cleanup may never run.


It seems nearly all modern garbage collected languages with OOPy object support like Ruby, Javascript/ES6/ES7, Actionscript, Lua, etc. completely omit the destructor/finalize paradigm. Python seems to be the only one with its class __del__() method. Why is this?

Because they manage resource ids in other ways, as mentioned above.

What are the language design decisions which lead to these languages not having any way to execute custom logic on object disposal?

Because they manage resource ids in other ways, as mentioned above.

Why would a language have the built-in concept of object instances with class or class-like structures along with custom instantiation (constructors), yet completely omit the destruction/finalize functionality?

Because they manage resource ids in other ways, as mentioned above.

I could see a possible argument being that the destructor/finalizer may not get called until some indeterminate time in the future, but that didn't stop Java or Python from supporting the functionality.

Java doesn't have destructors.

The Java docs mention:

the usual purpose of finalize, however, is to perform cleanup actions before the object is irrevocably discarded. For example, the finalize method for an object that represents an input/output connection might perform explicit I/O transactions to break the connection before the object is permanently discarded.

..but putting resource-id management code within Object.finalize() is largely regarded as an anti-pattern (cf.). That code should instead be written at the call site.

For people who use the anti-pattern, their justification is that they might have forgotten to release the resource-ids at the call site. Thus, they do it again in the finalizer, just in case.
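In Python terms, that "just in case" arrangement looks roughly like this (illustrative class):

class TempResource:
    def __init__(self):
        self.closed = False

    def close(self):               # the disposer callers are supposed to use
        if not self.closed:
            self.closed = True
            print("resource released")

    def __del__(self):             # safety net, in case the caller forgot close()
        if not self.closed:
            print("caller forgot close(); finalizer releasing it late")
            self.close()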

What are the core language design reasons to not support any form of object finalization?

There are not many use cases for finalizers, as they are for running a piece of code between the time when there are no longer any strong references to the object and the time when its memory is reclaimed by the GC.

A possible use case is when you'd like to log the period between the time when there are no longer any strong references to the object and the time when it is collected by the GC, like so:

@Override
protected void finalize() {
    // Log and TimeNow stand in for whatever logging and clock helpers you use.
    Log(TimeNow() + ". Obj " + toString() + " is going to be memory-collected soon!"); // "soon"
}
Pacerier
0

The RAII pattern in C++ is very powerful when you have deterministic object lifetimes (stack objects go out of scope, heap objects are deleted); it is widely used to manage both memory and other, usually more precious, resources (file handles, sockets, locks, etc.).

However, in GC languages where objects have nondeterministic lifetimes (GC only happens when there is a memory shortage, or may not happen at all if the program finishes quickly), it becomes necessary to treat memory management and the management of more precious resources separately. In these GC languages, those precious resources are usually managed through explicit method calls (close(), unlock(), etc.), sometimes with help from other language constructs like try...finally (Java) or with ... as ... (Python).
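A minimal Python sketch of that separation (the Lock class here is hypothetical; its memory is left to the GC, while the precious resource is released deterministically by the with block):

class Lock:
    def acquire(self): print("lock acquired")
    def release(self): print("lock released")

    # Context-manager protocol, so "with" can release the lock deterministically.
    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False               # do not swallow exceptions

with Lock():                       # acquire() + try: ... finally: release()
    print("critical section")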

-1

I found a reference on this in Dr. Dobb's, written about C++ but with more general ideas, which argues that destructors are problematic in any language that implements them. A rough idea here is that a main purpose of destructors is to handle memory deallocation, and that is hard to accomplish correctly: memory is allocated piecewise, but different objects get connected, and then the deallocation responsibilities and boundaries are not so clear.

So the garbage collector evolved years ago as a solution to this. But garbage collection is not based on objects disappearing from scope at method exits (a conceptual idea that is hard to implement); it is based on a collector running periodically, somewhat nondeterministically, when the app experiences "memory pressure" (i.e. is running out of memory).

In other words, the human concept of a "newly unused object" is in some ways a misleading abstraction, in the sense that no object can "instantaneously" become unused. Unused objects can only be "discovered" by running a garbage collection algorithm that traverses the object reference graph, and the best-performing algorithms run intermittently.

It is possible that a better garbage collection algorithm lies waiting to be discovered, one that can near-instantaneously identify unused objects and could then lead to consistent destructor-calling code, but none has been found after many years of research in the area.

The solution for resource management areas such as files or connections seems to be to have object "managers" that attempt to handle their use.

vzn