Wednesday, 30 May 2007

Garbage collecting in .NET

The .NET framework have automatic garbage collection, and uses an algorithm called “mark and compact” which is an variant of the “mark and sweep” algorithm, where the heap is compacted after each garbage collection. This makes the allocation very fast, since the heap is always compacted and the next free slot is just the bottom of the heap (the heap grows downwards)

Research has shown that you can split objects on the heap into two groups, the young ones which will only survive until the first garbage collection and the old ones which will stay in the heap through out the execution. The .NET garbage collector uses this by only garbage collecting the young generation each time, and the old only when its needed. [For more information read the article, or ask me]
Even though the garbage collector is automatic it doesn’t mean that you can make programs which will suffocate the GB. I the following is a list of some of the problems which can cause long GB breaks.


• Too many allocations: Its very easy to create many temporary elements which will make the GB pause long. Don’t use f.x. String.Split, creates a new String object for each split
• Don’t make too large allocations
Its very cheap to make new allocations, and too big allocations will trigger the GB much more often, which is expensive
• Too many pointers
Structures with many pointers takes time to for the GB, because it has to run through all of them. For long lived structures this is not a problem, but if these structures will be made on a transitory basis. Even with a long lived structure can give problems, if the structure is changed over time, this can endup being distributed through out the heap and thereby make it harder for the GB to “compact” the heap
• Too many roots
Too many roots will make the mark process too long. Making deep recursive methods with many object pointers, is also a no no.
• Too many object writes.
Write to an object, and its pointers have to be checked by the GB, because the object will be registered in the card table
• Too many Almost-Long-Life objects
The biggest pitfall of them all. To avoid these kinds of objects, your best lines of defense go like this:
1. Allocate as few objects as possible, with due attention to the amount of temporary space you are using.
2. Keep the longer-lived object sizes to a minimum.
3. Keep as few object pointers on your stack as possible (those are roots).
• Use only Finalization when its necessary.
If an object is dead but has a finalizer it will be kept alive until the GB have time to run the metod body, and also all the referenced objects inside the object. Therefore use only a finalizer when its really needed. Move resources which need finalizing to a root object, to minimize the size of elements in inferno.
In many cases it is much better to implement the IDisposable interface.

To get the best out of the allocator you should consider practices such as the following:
• Allocate all of the memory (or as much as possible) to be used with a given data structure at the same time.
• Remove temporary allocations that can be avoided with little penalty in complexity.
• Minimize the number of times object pointers get written, especially those writes made to older objects.
• Reduce the density of pointers in your data structures.
• Make limited use of finalizers, and then only on "leaf" objects, as much as possible. Break objects if necessary to help with this.
Personally I think that these requirements above is a bit to restrict, because you will find other perspectives which will contradict these requirements. But I added them here as a guideline.

[GB1] Mariani, Rico: Garbage Collector Basics and Performance Hints, 2003 (Link)

No comments: