
Garbage collectors have to visit all objects that are alive in order to find the memory that can be reclaimed. (Having many generations just delays this a bit.)

All things being equal, it is clearly better to first visit the objects that are already paged into RAM, before paging other blocks in and thereby paging out some objects.

Another possibility is that when the OS wishes to take a page of RAM away from the process, the GC is first asked whether it has a page that can be given up without needing to be paged out. The GC may be mostly done with moving objects out of a page, so it can clear that page within the time limit the OS has for needing a page.

Yet I cannot recall any garbage collector that integrates with the OS paging system to drive the order in which the GC works.
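To make the idea concrete, here is a minimal sketch of how a GC might answer such a request. Everything in it is hypothetical: the `Page` record, the `pick_page_to_release` function, and the assumption that the cost of emptying a page is proportional to the live bytes still on it. No real OS or collector interface is being described.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    live_bytes: int        # bytes still occupied by live objects (assumed known to the GC)
    resident: bool = True  # whether the page is currently in RAM


def pick_page_to_release(pages, copy_rate_bytes_per_ms, budget_ms):
    """Return the resident page that is cheapest to empty within the budget, or None.

    Assumption: evacuating a page costs time proportional to its live bytes.
    """
    max_bytes = copy_rate_bytes_per_ms * budget_ms
    candidates = [p for p in pages if p.resident and p.live_bytes <= max_bytes]
    if not candidates:
        return None  # nothing can be emptied in time; the OS must page something out
    return min(candidates, key=lambda p: p.live_bytes)
```

With pages holding 4096, 128 and 0 (non-resident) live bytes, a budget that allows copying 200 bytes would select the 128-byte page, since the non-resident page gives the OS nothing to reclaim.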

Gilles 'SO- stop being evil'
Ian Ringrose

2 Answers


As I recall, copy collectors are supposed to be paging friendly, as tracing by copying tends to improve the locality of pointer references. This has a positive effect on the program (mutator), which will cause fewer page faults when following links, and it will also improve the next collection cycle, as tracing will cause fewer page faults too. The tracing agenda (which pointers should be processed first) can have an impact on how effectively data locality is improved. This may be refined by measuring statistics on the number of accesses to different pointers in different types of cells.
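As a small illustration of why the agenda matters, the sketch below models only the order in which a copying collector would lay objects out in to-space, depending on whether the agenda is used as a stack (depth first) or as a queue (breadth first). The function name `copy_order` and the dictionary encoding of the object graph are assumptions made for the example; this is a model of the visiting order, not an actual copying collector.

```python
def copy_order(roots, children, depth_first=True):
    """Return the order in which reachable objects would be placed in to-space.

    `children` maps an object name to the list of objects it points to.
    """
    order, seen = [], set()
    agenda = list(roots)
    while agenda:
        # Stack discipline gives depth-first order, queue discipline breadth-first.
        obj = agenda.pop() if depth_first else agenda.pop(0)
        if obj in seen:
            continue
        seen.add(obj)
        order.append(obj)
        kids = children.get(obj, [])
        # Reverse when pushing on a stack so the first child is visited first.
        agenda.extend(reversed(kids) if depth_first else kids)
    return order


graph = {"root": ["l", "r"], "l": ["l1", "l2"], "r": ["r1", "r2"]}
print(copy_order(["root"], graph, depth_first=True))
print(copy_order(["root"], graph, depth_first=False))
```

In the depth-first layout each parent ends up next to its own children, while the breadth-first layout places siblings together and pushes children further away. Which one gives fewer page faults depends on how the mutator actually follows the links.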

Now, if you consider a tracing collector in general, you must usually maintain a structure that keeps track of the pointers that have not been traced yet. It may be possible to organize this structure so that all waiting pointers pointing into the same page are kept together (though that may take more space in some cases, depending on the techniques available for keeping the list of such pointers). A possible policy is then to trace pointers into pages already in memory first and, once none of those are left, to always trace the largest set of waiting pointers pointing into the same page.
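A minimal sketch of such a structure, under stated assumptions: addresses are plain integers, pages are 4096 bytes, and the collector has some way to ask whether a page is in memory. That query is represented here by a hypothetical `is_resident` callback supplied by the caller; how a real collector would obtain the information from the OS is left open.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size for the example


class PageGroupedWorklist:
    """Pending pointers grouped by the page they point into."""

    def __init__(self, is_resident):
        self._is_resident = is_resident    # hypothetical callback: page number -> bool
        self._by_page = defaultdict(list)  # page number -> waiting addresses

    def push(self, address):
        self._by_page[address // PAGE_SIZE].append(address)

    def __bool__(self):
        return bool(self._by_page)

    def pop_batch(self):
        """Return (page, addresses): a resident page first, else the largest waiting set."""
        if not self._by_page:
            raise IndexError("worklist is empty")
        resident = [p for p in self._by_page if self._is_resident(p)]
        if resident:
            page = resident[0]
        else:
            # No pointer into memory is left: pay one page fault for the most work.
            page = max(self._by_page, key=lambda p: len(self._by_page[p]))
        return page, self._by_page.pop(page)
```

The tracing loop would call `pop_batch` repeatedly, scan every object in the returned batch, and `push` any untraced pointers it finds. The extra space mentioned above shows up here as one list per page with waiting pointers.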

Regarding the question in the third paragraph, which was added after I answered, copy collection is again an answer. The OS may reduce the number of allocated physical pages at collection time, since the evacuated pages are completely freed. With a mark-and-sweep collector, a full page becoming free is probably much rarer, and thus not worth a specific mechanism.

This kind of idea is natural, and is probably described in some of the papers, but I do not recall which off hand. I think the early papers on Lisp GC contain some of these ideas (such as: should car or cdr be followed first?).

The good news regarding copy collection is also that paging is friendly to it, since paging increases the available storage space. Recall that a copy collector requires in principle twice as much space as is used for actual data storage. Now, the effect of paging also depends on the address space of the machine and on the physical memory available. In older computers, physical memory was much smaller than the available address space, so that paging was really a space bonus, allowing policies such as copy GC. Even when physical space is as big as the address space, one might want to share it, so that the process using a GC would have less address space without paging (see paging). These remarks are somewhat superseded by the use of generational collectors. They generally use copy collection for the young generation precisely because of these qualities, and because the young generation is mostly short-lived.

Then you have all the interactions of generational GC with the cache system, which have been discussed in a previous question: Are generational garbage collectors inherently cache-friendly?

For more information on these issues, I would search the web with, for example, the keywords garbage collection and locality.

babou

Emery Berger, Matthew Hertz & Yi Feng did some work on this.

Garbage collection offers numerous software engineering advantages, but interacts poorly with virtual memory managers. Existing garbage collectors require far more pages than the application's working set and touch pages without regard to which ones are in memory, especially during full-heap garbage collection. The resulting paging can cause throughput to plummet and pause times to spike up to seconds or even minutes.

I present a garbage collector that avoids paging. This bookmarking collector cooperates with the virtual memory manager to guide its eviction decisions.

This is a video of Emery’s talk on it, and he also wrote a paper, Garbage Collection Without Paging.

For some reason there does not seem to be much later work on it, or any “real world” usage. At the end of the paper it says “We are developing a concurrent variant of the bookmarking collection algorithm”, but I can’t track it down.

CRAMM: Virtual Memory Support for Garbage-Collected Applications looks at changing the OS so that GC causes less paging.

Using Page Residency to Balance Tradeoffs in Tracing Garbage Collection

We introduce an extension of mostly copying collection that uses page residency to determine when to relocate objects. Our collector promotes pages with high residency in place, avoiding unnecessary work and wasted space. It predicts the residency of each page, but when its predictions prove to be inaccurate, our collector reclaims unoccupied space by using it to satisfy allocation requests. Using residency allows our collector to dynamically balance the tradeoffs of copying and non-copying collection. Our technique requires less space than a pure copying collector and supports object pinning without otherwise sacrificing the ability to relocate objects. Unlike other hybrids, our collector does not depend on application-specific configuration and can quickly respond to changing application behavior. Our measurements show that our hybrid performs well under a variety of conditions; it prefers copying collection when there is ample heap space but falls back on non-copying collection when space becomes limited.
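A rough sketch of the kind of decision the abstract describes, not the paper's actual algorithm. It assumes that residency means the fraction of a page occupied by live objects, and it uses a fixed cut-off. The `plan_pages` function, the 0.75 threshold and the page names are all made up for illustration.

```python
def plan_pages(predicted_residency, threshold=0.75):
    """Split pages into those promoted in place and those whose objects are copied out.

    `predicted_residency` maps a page id to the predicted live fraction (0.0 to 1.0).
    The threshold is an arbitrary illustrative value.
    """
    in_place, evacuate = [], []
    for page, residency in predicted_residency.items():
        if residency >= threshold:
            in_place.append(page)   # mostly live: copying would be wasted work
        else:
            evacuate.append(page)   # mostly dead: copy survivors, reuse the page
    return in_place, evacuate


print(plan_pages({"p0": 0.9, "p1": 0.1, "p2": 0.75}))
```

A collector along these lines would still need a fallback for wrong predictions, which the abstract says is handled by allocating into the unoccupied space of pages that were promoted in place.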

Ian Ringrose