Garbage collection (computer science)


In computer science, garbage collection is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.
Garbage collection relieves the programmer from performing manual memory management where the programmer specifies what objects to deallocate and return to the memory system and when to do so. Other similar techniques include stack allocation, region inference, memory ownership, and combinations of multiple techniques. Garbage collection may take a significant proportion of total processing time in a program and, as a result, can have significant influence on performance.
Resources other than memory, such as network sockets, database handles, user interaction windows, file and device descriptors, are not typically handled by garbage collection. Methods used to manage such resources, particularly destructors, may suffice to manage memory as well, leaving no need for GC. Some GC systems allow such other resources to be associated with a region of memory that, when collected, causes the work of reclaiming these resources.

Principles

The basic principles of garbage collection are to find data objects in a program that cannot be accessed in the future, and to reclaim the resources used by those objects.
Many programming languages require garbage collection, either as part of the language specification or effectively for practical implementation ; these are said to be garbage collected languages. Other languages were designed for use with manual memory management, but have garbage-collected implementations available. Some languages, like Ada, Modula-3, and C++/CLI, allow both garbage collection and manual memory management to co-exist in the same application by using separate heaps for collected and manually managed objects; others, like D, are garbage-collected but allow the user to manually delete objects and also entirely disable garbage collection when speed is required.
While integrating garbage collection into the language's compiler and runtime system enables a much wider choice of methods, post-hoc GC systems exist, such as Automatic Reference Counting, including some that do not require recompilation. The garbage collector will almost always be closely integrated with the memory allocator.

Advantages

Garbage collection frees the programmer from manually dealing with memory deallocation. As a result, certain categories of bugs are eliminated or substantially reduced:
Some of the bugs addressed by garbage collection have security implications.

Disadvantages

Typically, garbage collection has certain disadvantages, including consuming additional resources, performance impacts, possible stalls in program execution, and incompatibility with manual resource management.
Garbage collection consumes computing resources in deciding which memory to free, even though the programmer may have already known this information. The penalty for the convenience of not annotating object lifetime manually in the source code is overhead, which can lead to decreased or uneven performance. A peer-reviewed paper from 2005 came to the conclusion that GC needs five times the memory to compensate for this overhead and to perform as fast as explicit memory management. Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing. The impact on performance was also given by Apple as a reason for not adopting garbage collection in iOS despite being the most desired feature.
The moment when the garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments, in transaction processing, or in interactive programs. Incremental, concurrent, and real-time garbage collectors address these problems, with varying trade-offs.
The modern GC implementations try to minimize blocking "stop-the-world" stalls by doing as much work as possible on the background, for example marking unreachable garbage instances while the application process continues to run. In spite of these advancements, for example in the.NET CLR paradigm it is still very difficult to maintain large heaps with resident objects that get promoted to older generations without incurring noticeable delays.
The need for explicit manual resource management for non-GCed resources in an object oriented language becomes transitive to composition. That is: in a non-deterministic GC system, if a resource or a resource-like object requires manual resource management, and this object is used as "part of" another object, then the composed object will also become a resource-like object that itself requires manual resource management.

Strategies

Tracing

is the most common type of garbage collection, so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting. The overall strategy consists of determining which objects should be garbage collected by tracing which objects are reachable by a chain of references from certain root objects, and considering the rest as garbage and collecting them. However, there are a large number of algorithms used in implementation, with widely varying complexity and performance characteristics.

Escape analysis

is a compile-time technique that can convert heap allocations to stack allocations, thereby reducing the amount of garbage collection to be done. This analysis determines whether an object allocated inside a function is accessible outside of it. If a function-local allocation is found to be accessible to another function or thread, the allocation is said to “escape” and cannot be done on the stack. Otherwise, the object may be allocated directly on the stack and released when the function returns, bypassing the heap and associated memory management costs.

Timestamp & Heartbeat

This is not a classical garbage collection algorithm used to collect unused memory instead this is a technique to primarily handle garbage collection of heterogenous resources. The method is used to garbage collect unused resources from the system and avoid running out of system resources. The common method is to check if the system resource is still alive and being used. The simplest of these are network algorithms sending so called heartbeat signals to a monitor. In this case the resources in on the monitor server are freed when a client fails to send a heartbeat to the monitor server. Timestamp methods can work as garbage collectors for collection of unused memory but have serious drawbacks for this task and are used when other methods are not practical.

Availability

Generally speaking, higher-level programming languages are more likely to have garbage collection as a standard feature. In some languages that do not have built in garbage collection, it can be added through a library, as with the Boehm garbage collector for C and C++.
Most functional programming languages, such as ML, Haskell, and APL, have garbage collection built in. Lisp is especially notable as both the first functional programming language and the first language to introduce garbage collection.
Other dynamic languages, such as Ruby and Julia, JavaScript and ECMAScript also tend to use GC. Object-oriented programming languages such as Smalltalk and Java usually provide integrated garbage collection. Notable exceptions are C++ and Delphi which have destructors.

BASIC

Historically, languages intended for beginners, such as BASIC and Logo, have often used garbage collection for heap-allocated variable-length data types, such as strings and lists, so as not to burden programmers with manual memory management. On early microcomputers, with their limited memory and slow processors, BASIC garbage collection could often cause apparently random, inexplicable pauses in the midst of program operation.
Some BASIC interpreters, such as Applesoft BASIC on the Apple II family, repeatedly scanned the string descriptors for the string having the highest address in order to compact it toward high memory, resulting in O performance, which could introduce minutes-long pauses in the execution of string-intensive programs. A replacement garbage collector for Applesoft BASIC published in Call-A.P.P.L.E. identified a group of strings in every pass over the heap, which cut collection time dramatically. BASIC.System, released with ProDOS in 1983, provided a windowing garbage collector for BASIC that reduced most collections to a fraction of a second.

Objective-C

While the Objective-C traditionally had no garbage collection, with the release of OS X 10.5 in 2007 Apple introduced garbage collection for Objective-C 2.0, using an in-house developed runtime collector.
However, with the 2012 release of OS X 10.8, garbage collection was deprecated in favor of LLVM's automatic reference counter that was introduced with OS X 10.7. Furthermore, since May 2015 Apple even forbids the usage of garbage collection for new OS X applications in the App Store. For iOS, garbage collection has never been introduced due to problems in application responsivity and performance; instead, iOS uses ARC.

Limited environments

Garbage collection is rarely used on embedded or real-time systems because of the usual need for very tight control over the use of limited resources. However, garbage collectors compatible with many limited environments have been developed. The Microsoft.NET Micro Framework, and Java Platform, Micro Edition are embedded software platforms that, like their larger cousins, include garbage collection.

Garbage Collectors for Java

Some popular Garbage Collectors available in Java JDKs include:
Comparing Garbage Collectors is a complex task, and different applications can have very different needs. Among the factors of importance are performance overhead, stop-the-world pauses times, frequency and distribution, scalability, memory allocation performance,...

Compile-time use

is a form of static analysis allowing memory to be reused and reclaimed based on invariants known during compilation.
This form of garbage collection has been studied in the Mercury programming language, and it saw greater usage with the introduction of LLVM's automatic reference counter into Apple's ecosystem in 2011.

Real-time systems

Incremental, concurrent, and real-time garbage collectors have been developed, such as Baker's algorithm or Lieberman's algorithm.
In Baker's algorithm, the allocation is done in either half of a single region of memory. When it becomes half full, a garbage collection is performed which moves the live objects into the other half and the remaining objects are implicitly deallocated. The running program has to check that any object it references is in the correct half, and if not move it across, while a background task is finding all of the objects.
Generational garbage collection schemes are based on the empirical observation that most objects die young. In generational garbage collection two or more allocation regions are kept, which are kept separate based on object's age. New objects are created in the "young" generation that is regularly collected, and when a generation is full, the objects that are still referenced from older regions are copied into the next oldest generation. Occasionally a full scan is performed.
Some high-level language computer architectures include hardware support for real-time garbage collection.
Most implementations of real-time garbage collectors use tracing. Such real-time garbage collectors meet hard real-time constraints when used with a real-time operating system.