Static: The compiler, at compile time (or at the program's startup; either approach can work), knows the desired allocations and reserves space for them within the process's address space. These locations are fixed throughout the lifetime of the program, and cannot be expanded or shrunk--they are literally static for the entirety of the application's life. This is how C/C++ handle static-modified variable declarations, for example, and most C/C++ globals fall into this category.
Stack: The stack is a region of memory created for each thread your application runs. It works in a LIFO (Last In, First Out) manner, meaning that the most recently allocated – or “pushed” – chunk of memory will be the first to be deallocated – or “popped.” Every time a function declares a new variable, it is “pushed” onto the stack, and when that variable falls out of scope (such as when the function returns), it is deallocated from the stack automatically. Once a stack variable is freed, that region of memory becomes available for other stack variables.
Due to the pushing and popping nature of the stack, memory management is very logical and can be handled almost entirely by the CPU; this makes it very quick, especially since each byte in the stack tends to be reused frequently, which means it tends to stay mapped into the processor’s cache. However, there are some cons to this form of strict management. The size of the stack is fixed when the thread is created, and allocating more onto the stack than it can hold will result in a stack overflow. Each variable occupies a fixed amount of space determined by its data type; this keeps simple types such as integers at a fixed size, and forces more complex data types such as arrays to specify their size prior to runtime, since the stack won’t let them be resized. Variables allocated on the stack are also always local in nature, because they are always next in line to be popped (unless more variables are pushed prior to the popping of earlier variables).
Heap: The heap is a region of memory set aside for runtime/dynamic memory allocation. Allocation is handled at runtime, and therefore it is up to the program to determine structure, means, etc. Once you allocate space, that space can be accessed at any point in time, not just throughout the thread but throughout the application’s entire life. Deallocation must occur at some point if the space is to be reused. Most operating systems will, once an application ends, reclaim that space (usually by tearing down the process entirely--we're long past the days when programs obtained exclusive access to physical addresses).
Interaction with the heap is usually through some form of reference; in languages that support direct access, these are ‘pointers,’ which are variables whose value is the address of another location in memory. By creating a pointer, you ‘point’ at a location on the heap, which marks the initial location of your variable and tells the program where to access the value. Due to the dynamic nature of the heap, it is completely unmanaged by the CPU aside from initial allocation and heap resizing; in non-garbage-collected languages such as C and C++, this requires you as the developer to manage memory and to manually free locations when they are no longer needed. Failing to do so can create memory leaks and leave the heap fragmented, which makes reads from the heap take longer and makes it difficult to continuously allocate more memory.
Automated memory management (GC) almost always refers to heap management; I've never heard it applied to any other (static or stack) form of memory allocation and reclamation. Several strategies are possible (and numerous hybrid approaches):
Allocate without reclamation: Technically, this is a legal automatic memory management technique, though it is obviously optimized for allocation and reclamation performance (i.e., zero time spent reclaiming) rather than longevity. The JVM has one such allocator, called the Epsilon GC, which is typically used for specific purposes (such as benchmarking or diagnostics).
Compile-time: Compile-time garbage collection is a form of static analysis allowing memory to be reused and reclaimed based on invariants known during compilation.
This form of garbage collection has been studied in the Mercury programming language, and it saw greater usage with the introduction of LLVM's automatic reference counting (ARC) into Apple's ecosystem (iOS and OS X) in 2011.
Reference counting: Each allocated object has a count associated with it indicating how many references are currently pointing to it. When the reference count drops to zero, the object is eligible for reclamation. Reference counts can either be managed automatically (as in, the language/runtime manage it without programmer intervention required) or manually (programmers must ensure they call some kind of
release method or function to indicate a finished state of use). Reference counting is highly vulnerable to mutually-referencing objects (cyclic object graphs) as a source of memory leaks.
Microsoft COM was the first "mainstream" platform to really embrace reference counting as a part of its formal semantics (the IUnknown interface had three methods, two of which--AddRef and Release--managed the reference count of the allocated component), with very mixed results. When COM went distributed (DCOM), reference counts jumped in severity, since now the garbage collection was distributed, and a missed or dropped call could keep an object alive forever.
CORBA used reference counting as well, with much the same result.
Java RMI chose to use a different form of distributed garbage collection, using a "heartbeat" to keep an object alive, and if no such heartbeat came through every so often, the object was assumed to be unreachable and therefore eligible for reclamation.
Objective-C used reference counting, then later moved to "automatic reference counting" (ARC), which was then pushed under the covers by the Swift language.
Some additional ideas/concepts around ref-counting:
Deferred reference counting: In deferred reference counts, local updates to the reference count (that is, those arising from assignments to and from local variables) are "deferred", as long as it is known that the reference count will remain positive when it should be positive. For example, the code to swap the (reference-counted) values contained in two variables a and b naively performs the following operations:
incrc(a);   tmp := a;
incrc(b);   decrc(a);   a := b;
incrc(tmp); decrc(b);   b := tmp;
decrc(tmp);             (tmp goes out of scope)
After all the dust has cleared, after six reference count operations, the reference counts of the objects referred to by a and b (now by b and a) are unchanged. It is more efficient to recognize this and leave the reference counts unchanged.
Lazy Freeing: Note that naive reference counting can take arbitrarily long to return from a decrement, because freeing one object may recursively trigger the freeing of other objects whose counts drop to zero. Instead, an object whose reference count falls to zero may be placed on a queue/list/stack of objects that are no longer in use, but not yet processed onto the free list. Alternatively, when an object is allocated off the free list, the pointers to objects within it may be scanned. This is also known as "Weizenbauming", at least in some circles.
One-bit reference counting: William Stoye's trick in his combinator machine. Reference counts are frequently one, and noting this in the pointer, rather than the object, can save space, cut memory references, and allow the easy recycling of vast amounts of garbage. In a combinator machine, the actual primitives can be parameterized by the reference counts of their operands to recycle memory in place. Note, however, that a combinator machine generates vast amounts of garbage to begin with; this technique is unlikely to work as well in general use.
Smart pointers: Most smart pointers are small "wrappers" around native pointers, carrying a reference count that tries to be as automatic as possible. Most popular in C++ implementations, though some languages (Rust) look to incorporate smart-pointer-type semantics directly into the language. These generally try to then utilize stack-based concepts to help automagically manage reference counts for allocated objects, but circular reference counts are still possible (and probable). In languages like Java or C#, one might imagine that all object references are smart, in that they are known to the platform. (Note: the CLR specifically has object reference pointers--what the CLI Spec calls "managed references"--as well as "native pointers", which are raw locations in memory, without the additional CLR-backed support, for interop with native code.)
Reachability: When moving beyond pointer/reference-based schemes, most automated memory systems need to know which objects are eligible for reclamation, and which aren't--in essence, which might still be used by code. (An automatic memory system should never deallocate an object that is in use.) This analysis is usually known as "reachability" analysis, that is, finding which objects are "reachable" by user code, and therefore unsafe to reclaim. Many systems have multiple possibilities of reachability: in JVM terms, an object can be strongly reachable (via ordinary references), softly reachable (only via a SoftReference), weakly reachable (only via a WeakReference), phantom-reachable (only via a PhantomReference), or unreachable; objects pending finalization are sometimes described as f-reachable.
One example is the JVM's ReferenceQueue mechanic. If you make a weak reference to an object and register it with a reference queue, your program can be notified when that object is no longer strongly reachable. Upon receiving this notification, the program can perform any required cleanup of the associated data object. This makes weak references useful for object-pooling kinds of behavior.
Note: This is using the JVM terminology; the CLR does not have soft or phantom references, and I've never heard of Python or Ruby having anything beyond weak references, either. Any GC language has reachable/unreachable, and any language which has some kind of concept of "destructor" or "finalizer" has to have f-reachable (or it goes to great lengths to invoke those destructors/finalizers the moment the last reference is dropped, which usually implies either some very sophisticated compiler analysis or a reference count buried under the surface, as in Swift/Obj-C's automatic reference counting (ARC).)
Mark-Sweep: A garbage collection pass occurs in two phases: a "mark" phase, to identify the difference between collectible objects and non-collectible objects, and a "sweep" phase, in which the collectible objects are finalized and collected. Pros: simple. Cons: tends to lead to major fragmentation of the heap; the mark phase may require all activity in the program to cease (so as to avoid marking an object collectible while it is concurrently becoming reachable again--but that kind of pause isn't acceptable for a lot of systems; see the concurrent mark-and-sweep variations below); and as the heap gets larger, the time taken to crawl the heap grows ever greater.
Some variations on mark-sweep:
Tricolor marking: Because of these performance problems, most modern tracing garbage collectors implement some variant of the tri-color marking abstraction, but simple collectors (such as the mark-and-sweep collector) often do not make this abstraction explicit. Tri-color marking works as described below.
Three sets are created – white, black and gray:
In many algorithms, the black set initially starts empty, the gray set is the set of objects directly referenced from the roots, and the white set includes all other objects. Every object in memory is at all times in exactly one of the three sets. The algorithm proceeds as follows: pick an object from the gray set; move each white object it references into the gray set; then move the object itself into the black set. Repeat until the gray set is empty; at that point the scan is complete, and the black objects are reachable from the roots while the white objects are not and can be garbage-collected.
Since all objects not immediately reachable from the roots are added to the white set, and objects can only move from white to gray and from gray to black, the algorithm preserves an important invariant – no black objects reference white objects. This ensures that the white objects can be freed once the gray set is empty. This is called the tri-color invariant. Some variations on the algorithm do not preserve this invariant but use a modified form for which all the important properties hold.
The tri-color method has an important advantage – it can be performed "on-the-fly", without halting the system for significant periods of time. This is accomplished by marking objects as they are allocated and during mutation, maintaining the various sets. By monitoring the size of the sets, the system can perform garbage collection periodically, rather than as needed. Also, the need to touch the entire working set on each cycle is avoided.
Mark-and-don't-sweep: This might also be regarded as "lazy sweep" or "incremental sweep". Whenever memory is needed, the allocator scans objects until it discovers one that is marked "unused" and is also large enough for the request. All objects scanned, whether large enough or not, are marked "used". When the scan pointer reaches the end of memory, the sense of the mark bits is reversed (that is, if the current assignment is 0==used and 1==unused, then the new assignment is 0==unused, 1==used). Temporarily, all objects on the heap are now marked "unused". Next, a mark phase is run, which tags all reachable objects as "used" so that they will not be reallocated later.
Snapshot mark-and-sweep: Snapshot mark-and-sweep uses the observation that the set of unreachable objects does not shrink. It is possible (using, for instance, copy-on-write page mapping) to quickly make a copy of the address space, process it concurrently to determine what is garbage, and send that information back to the running process. More garbage may have been generated in the interim, but that is ok, because it will be found in the next collection.
Mark-Sweep-Compact: A mark-sweep algorithm that also takes the time after the sweep to rearrange the objects in memory so as to put available spaces next to one another, reducing fragmentation and making it easier to satisfy future requests. Takes longer, and requires objects in memory to be movable (either by adding a layer of indirection on access, or by allowing pointers to be "fixed up" later to point to the right location).
Generational: A form of copying collector. Based on the observation that most objects have short lifetimes, it is useful to restrict garbage collection to the most recently allocated objects. A generational collector maintains several "generations" of objects. Newly created objects are all put in the "youngest" generation, and when the space allocated for that generation is full, the collector will use the root set (and any pointers from older generations to the youngest generation -- see below) to reclaim dead objects from the youngest generation only, leaving the "older" generations untouched. Objects that survive after several (perhaps just one) collections of the youngest generation are "promoted" to the next older generation, and when that generation becomes full, it and all the generations younger than it are collected.
The difficulty with generational collectors is the identification of pointers from older generations into younger ones. An observation is that such references are uncommon in most programs: new objects typically point to older objects; this is clearly true in pure functional languages where assignment is prohibited, but is common for many programs and languages. Nonetheless, such references do exist, and a collector must handle them. When a pointer to a younger generation object is written into an old object, the pointer (or just the old object) is recorded so it can be scanned when the younger generation is collected.
Conservative Collection: Conservative garbage collection makes use of the observation that if you are not relocating (copying) objects, then you need not be quite so certain about exactly what is a pointer. It suffices to scan the root set (and objects) for any pointer-like bit patterns, and treat those pointer-like bit patterns as actual pointers while marking. The result is that the "true" reachable objects are all found, along with a few others. This works surprisingly well in practice (if a few additional tricks are added) and often permits the use of garbage collection with oblivious source code and compilers.
Conservative collection is very important because it permits the use of garbage collection with programs that were not written with garbage collection in mind. That is, it simplifies the use of garbage collection with existing (non-garbage-collected) libraries of code.
Ironically, many of these strategies could be used in custom allocation/deallocation schemes in C/C++, but typically aren't, due to the complexity of memory management--specifically, C++'s inability to allow objects to be "moved" as part of the process. This inability to move objects around in memory (that is, to track where an object ends up and "backport" the new location into the already-existing pointers pointing to it) leads to excessive fragmentation over time, and that in turn can lead to an inability to satisfy an allocation request--no one "hole" in the memory space is large enough to accommodate the request, despite there being enough total free memory to do so.
Consider a simple generational stop-and-collect collector, such as the one which SML/NJ used to have. "Generational" means that data is partitioned into old and new. This partition is useful to the GC for two reasons: (a) because data tends to die young, collecting just new data will probably free a lot of space, and (b) because pointers tend to point from new objects to old objects, and not vice versa, it is cheap to find all the pointers to new objects.
Property (b) is only true if you can tell when a pointer to a new object has been written into an old object. Otherwise you have to scan all the old objects to find pointers to new objects, which loses one of the main advantages of generational GC. So you put the old data behind a write barrier, and record those writes. When you come to GC the new data, you know the only pointers from old to new are those which you have recorded.
Consider a tracing GC (see note below) which is incremental or concurrent, i.e. the user's program (the 'mutator') can run before the GC is complete. Now there is an invariant: black objects do not point to white objects. If the mutator writes a white pointer into a black object, this invariant is broken and the GC can fail. There are two basic solutions: prevent the mutator from seeing white objects ("read barriers") or prevent the mutator from writing white pointers into black objects ("write barriers"). The write barrier solution puts the black objects behind a write barrier. When a white-on-black write takes place there are various fixes: incrementally grey the white object, regrey the black object, &c.
(note) For a tracing collector (marking or copying), one conceptually colours the data white (not yet seen by the collector), black (alive and scanned by the collector) and grey (alive but not yet scanned by the collector). The collector proceeds by scanning grey objects for pointers to white objects. The white objects found are turned grey, and the grey objects scanned are turned black. When there are no more grey objects, the collection is complete and all the white objects can be recycled.
The overhead of write barriers is more likely to be noticeable in an imperative-style program which frequently writes pointers into existing data structures than in a functional-style program which constructs data only once and never changes them.
remembered set: A list of all the interesting references between two sets of objects maintained separately, so you don't have to find them by scanning. In generational garbage collection, this technique is used for references from an older generation to a younger one. Typically, remembered sets are maintained by write barriers (q.v.).
Last modified 06 April 2022