Scapegoat tree

In computer science, a scapegoat tree is a self-balancing binary search tree, invented by Arne Andersson and again by Igal Galperin and Ronald L. Rivest. It provides worst-case O(log n) lookup time, and O(log n) amortized insertion and deletion time.
Unlike most other self-balancing binary search trees which provide worst-case O(log n) lookup time, scapegoat trees have no additional per-node memory overhead compared to a regular binary search tree: a node stores only a key and two pointers to the child nodes. This makes scapegoat trees easier to implement and, due to data structure alignment, can reduce node overhead by up to one-third.
Instead of the small incremental rebalancing operations used by most balanced tree algorithms, scapegoat trees rarely but expensively choose a "scapegoat" and completely rebuild the subtree rooted at the scapegoat into a complete binary tree. Thus, scapegoat trees have O(n) worst-case update performance.

Theory

A binary search tree is said to be weight-balanced if half the nodes are on the left of the root, and half on the right.
An α-weight-balanced node is defined as meeting a relaxed weight balance criterion:
size(left) ≤ α*size(node)
size(right) ≤ α*size(node)
Where size can be defined recursively as:
function size(node) is
    if node = nil then
        return 0
    else
        return size(node->left) + size(node->right) + 1
    end if
end function
Even a degenerate tree satisfies this condition if α=1, whereas an α=0.5 would only match almost complete binary trees.
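The definition can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the names Node and is_alpha_weight_balanced are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def size(node: Optional[Node]) -> int:
    """Number of nodes in the subtree rooted at `node` (0 for an empty subtree)."""
    if node is None:
        return 0
    return size(node.left) + size(node.right) + 1


def is_alpha_weight_balanced(node: Node, alpha: float) -> bool:
    """Check the relaxed weight balance criterion for a single node."""
    total = size(node)
    return size(node.left) <= alpha * total and size(node.right) <= alpha * total


# A degenerate right-leaning chain 1 -> 2 -> 3 and a balanced three-node tree.
chain = Node(1, right=Node(2, right=Node(3)))
balanced = Node(2, left=Node(1), right=Node(3))
```

With α = 1 the root of the chain passes the check, while with α = 0.6 it fails because its right subtree holds two of the three nodes.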
A binary search tree that is α-weight-balanced must also be α-height-balanced, that is
height(tree) ≤ ⌊log_{1/α}(size(tree))⌋
By contraposition, a tree that is not α-height-balanced is not α-weight-balanced.
Scapegoat trees are not guaranteed to keep α-weight-balance at all times, but are always loosely α-height-balanced in that
height(scapegoat tree) ≤ ⌊log_{1/α}(size(tree))⌋ + 1.
Violations of this height balance condition can be detected at insertion time, and imply that a violation of the weight balance condition must exist.
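As a rough illustration, the height bound ⌊log_{1/α}(n)⌋ for a tree of n nodes can be computed as below. The function names are assumptions made for this sketch, and the floating-point logarithm may be off by one when n is an exact power of 1/α, so a production implementation might prefer integer arithmetic.

```python
import math


def height_limit(n: int, alpha: float) -> int:
    """floor(log base 1/alpha of n): the alpha-height-balance bound for n nodes."""
    return math.floor(math.log(n) / math.log(1 / alpha))


def is_loosely_height_balanced(height: int, n: int, alpha: float) -> bool:
    """The looser bound that a scapegoat tree is said to maintain at all times."""
    return height <= height_limit(n, alpha) + 1
```

For example, with n = 10 and α = 0.6 the logarithm is about 4.5, so the bound is 4 and the loose bound is 5.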
This makes scapegoat trees similar to red-black trees in that they both have restrictions on their height. They differ greatly, though, in how they determine where rebalancing takes place. Whereas red-black trees store additional 'color' information in each node to determine the location, scapegoat trees find a scapegoat which isn't α-weight-balanced to perform the rebalance operation on. This is loosely similar to AVL trees, in that the rebalancing depends on 'balances' of nodes, but the means of determining the balance differs greatly. Since AVL trees check the balance value on every insertion/deletion, it is typically stored in each node; scapegoat trees are able to calculate it only as needed, which is only when a scapegoat needs to be found.
Unlike most other self-balancing search trees, scapegoat trees are entirely flexible as to their balancing. They support any α such that 0.5 < α < 1. A high α value results in fewer rebalances, making insertion quicker but lookups and deletions slower, and vice versa for a low α. Therefore, in practical applications, an α can be chosen depending on how frequently these actions should be performed.

Operations

Lookup

Lookup is not modified from a standard binary search tree, and has a worst-case time of O(log n). This is in contrast to splay trees, which have a worst-case time of O(n). The reduced node memory overhead compared to other self-balancing binary search trees can further improve locality of reference and caching.
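Since lookup is the ordinary binary search tree search, a minimal sketch looks like this. The Node class is an assumption for illustration and stores only a key and two child pointers, as described earlier.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def lookup(root, key):
    """Return the node holding `key`, or None if the key is absent."""
    node = root
    while node is not None:
        if key < node.key:
            node = node.left
        elif key > node.key:
            node = node.right
        else:
            return node
    return None


example = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
```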

Insertion

Insertion is implemented with the same basic ideas as in an unbalanced binary search tree, but with a few significant changes.
When finding the insertion point, the depth of the new node must also be recorded. This is implemented via a simple counter that gets incremented during each iteration of the lookup, effectively counting the number of edges between the root and the inserted node. If this node violates the α-height-balance property, a rebalance is required.
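The depth counter can be sketched as follows. This is an illustrative fragment only: insert_with_depth and is_too_deep are hypothetical names, duplicate keys are ignored (an assumption, reported with depth -1), and the rebalancing step itself is left out.

```python
import math
from typing import Optional, Tuple


class Node:
    def __init__(self, key):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert_with_depth(root: Optional[Node], key) -> Tuple[Node, int]:
    """Insert `key` as in a plain BST; return (root, depth of the new node in edges)."""
    if root is None:
        return Node(key), 0
    depth = 0
    node = root
    while True:
        depth += 1  # one more edge on the path from the root
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root, depth
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root, depth
            node = node.right
        else:
            return root, -1  # duplicate key: nothing inserted


def is_too_deep(depth: int, node_count: int, alpha: float) -> bool:
    """True if a node at `depth` exceeds floor(log base 1/alpha of node_count)."""
    return depth > math.floor(math.log(node_count) / math.log(1 / alpha))
```

Here node_count is the number of nodes after the insertion; when is_too_deep returns True, a scapegoat has to be found among the ancestors of the new node.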
To rebalance, an entire subtree rooted at a scapegoat undergoes a balancing operation. The scapegoat is defined as being an ancestor of the inserted node which isn't α-weight-balanced. There will always be at least one such ancestor. Rebalancing any of them will restore the α-height-balanced property.
One way of finding a scapegoat is to climb from the new node back up to the root and select the first node that isn't α-weight-balanced.
Climbing back up to the root requires O(log n) storage space, usually allocated on the stack, or parent pointers. This can actually be avoided by pointing each child at its parent as you go down, and repairing on the walk back up.
To determine whether a potential node is a viable scapegoat, we need to check its α-weight-balanced property. To do this we can go back to the definition:
size(left) ≤ α*size(node)
size(right) ≤ α*size(node)
However, a large optimisation can be made by realising that we already know two of the three sizes, leaving only the third to be calculated.
Consider the following example to demonstrate this. Assuming that we're climbing back up to the root:
size(parent) = size(node) + size(sibling) + 1
But as:
size(inserted node) = 1.
The case is trivialized down to:
size[x+1] = size[x] + size(sibling) + 1
Where x = this node, x + 1 = parent and size(sibling) is the only function call actually required.
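The climb with incrementally maintained sizes might look like the sketch below. It assumes the ancestors were recorded in a list during the descent (one of the storage options mentioned above); find_scapegoat and its return convention are hypothetical choices for this example.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def size(node: Optional[Node]) -> int:
    if node is None:
        return 0
    return size(node.left) + size(node.right) + 1


def find_scapegoat(path: List[Node], alpha: float) -> Tuple[Optional[Node], Optional[Node]]:
    """`path` lists the nodes from the root down to the newly inserted node.

    Returns (scapegoat, parent of the scapegoat), or (None, None) if every
    ancestor is alpha-weight-balanced.
    """
    child_size = 1  # the freshly inserted leaf
    for i in range(len(path) - 2, -1, -1):
        parent, child = path[i], path[i + 1]
        sibling = parent.right if parent.left is child else parent.left
        sibling_size = size(sibling)  # the only subtree that must be counted
        parent_size = child_size + sibling_size + 1
        if max(child_size, sibling_size) > alpha * parent_size:
            grandparent = path[i - 1] if i > 0 else None
            return parent, grandparent
        child_size = parent_size  # reuse the result one level further up
    return None, None


# Hypothetical example: a right-leaning chain 1 -> 2 -> 3 -> 4 -> 5, with 5 just inserted.
n5 = Node(5)
n4 = Node(4, right=n5)
n3 = Node(3, right=n4)
n2 = Node(2, right=n3)
n1 = Node(1, right=n2)
scapegoat, scapegoat_parent = find_scapegoat([n1, n2, n3, n4, n5], alpha=0.6)
```

In this example the node with key 4 is still balanced for α = 0.6, and the node with key 3 is the first ancestor that is not, so it becomes the scapegoat.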
Once the scapegoat is found, the subtree rooted at the scapegoat is completely rebuilt to be perfectly balanced. This can be done in O(n) time by traversing the nodes of the subtree to find their values in sorted order and recursively choosing the median as the root of the subtree.
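A straightforward way to express this rebuild is sketched below. It collects the nodes into a temporary list, so it uses linear extra space; this is a simplification chosen for clarity, not necessarily how a tuned implementation would do it. All function names are assumptions.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def flatten(node, out):
    """Append the nodes of the subtree to `out` in sorted (in-order) order."""
    if node is not None:
        flatten(node.left, out)
        out.append(node)
        flatten(node.right, out)


def build_balanced(nodes, lo, hi):
    """Relink nodes[lo:hi] into a balanced subtree, using the median as root."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = nodes[mid]
    root.left = build_balanced(nodes, lo, mid)
    root.right = build_balanced(nodes, mid + 1, hi)
    return root


def rebuild(scapegoat):
    """Return the root of the rebuilt subtree; the caller relinks it to the parent."""
    nodes = []
    flatten(scapegoat, nodes)
    return build_balanced(nodes, 0, len(nodes))


def height(node):
    """Height in edges; an empty subtree has height -1."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))
```

Both passes visit each node of the subtree once, which matches the linear running time stated above.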
As rebalance operations take O(n) time (depending on the number of nodes in the subtree), insertion has a worst-case performance of O(n) time. However, because these worst-case scenarios are spread out, insertion takes O(log n) amortized time.

Sketch of proof for cost of insertion

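Define the imbalance of a node v to be the absolute value of the difference in size between its left subtree and right subtree, minus 1, or 0, whichever is greater. In other words:
I(v) = max(|size(left(v)) − size(right(v))| − 1, 0)
Immediately after rebuilding a subtree rooted at v, I(v) = 0.
Lemma: Immediately before rebuilding the subtree rooted at v, I(v) ∈ Ω(|v|), where |v| denotes the size of the subtree rooted at v and Ω is big omega notation.

Proof of lemma:
Let v_0 be the root of a subtree immediately after rebuilding, so that h(v_0) = log(|v_0| + 1). If there are Ω(|v_0|) degenerate insertions (that is, insertions in which each new node increases the height by 1), then

I(v) ∈ Ω(|v_0|),

h(v) = h(v_0) + Ω(|v_0|)

and

log(|v|) ≤ log(|v_0| + 1) + 1.

Since I(v) ∈ Ω(|v|) before rebuilding, there were Ω(|v|) insertions into the subtree rooted at v that did not result in rebuilding. Each of these insertions can be performed in O(log n) time. The final insertion that causes rebuilding costs O(|v|). Using aggregate analysis it becomes clear that the amortized cost of an insertion is O(log n):
(Ω(|v|)·O(log n) + O(|v|)) / Ω(|v|) = O(log n) + O(1) = O(log n)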

Deletion

Scapegoat trees are unusual in that deletion is easier than insertion. To enable deletion, scapegoat trees need to store an additional value with the tree data structure. This property, which we will call MaxNodeCount, simply represents the highest achieved NodeCount. It is set to NodeCount whenever the entire tree is rebalanced, and after insertion is set to max(MaxNodeCount, NodeCount).
To perform a deletion, we simply remove the node as we would in a simple binary search tree, but if
NodeCount ≤ α*MaxNodeCount
then we rebalance the entire tree about the root, remembering to set MaxNodeCount to NodeCount.
This gives deletion its worst-case performance of O(n) time; however, it is amortized to O(log n) average time.
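The deletion rule can be sketched as follows. The Tree record, the helper names, and the choice to rebuild from a sorted key list are assumptions made for this illustration; only the NodeCount and MaxNodeCount bookkeeping comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def bst_delete(node, key):
    """Plain BST deletion; returns (new subtree root, whether a node was removed)."""
    if node is None:
        return None, False
    if key < node.key:
        node.left, removed = bst_delete(node.left, key)
        return node, removed
    if key > node.key:
        node.right, removed = bst_delete(node.right, key)
        return node, removed
    if node.left is None:
        return node.right, True
    if node.right is None:
        return node.left, True
    successor = node.right
    while successor.left is not None:
        successor = successor.left
    node.key = successor.key  # copy the in-order successor, then remove it below
    node.right, _ = bst_delete(node.right, successor.key)
    return node, True


def in_order(node, out):
    if node is not None:
        in_order(node.left, out)
        out.append(node.key)
        in_order(node.right, out)
    return out


def build_balanced(keys, lo, hi):
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys, lo, mid)
    node.right = build_balanced(keys, mid + 1, hi)
    return node


@dataclass
class Tree:
    root: Optional[Node]
    node_count: int
    max_node_count: int
    alpha: float


def delete(tree: Tree, key) -> bool:
    tree.root, removed = bst_delete(tree.root, key)
    if not removed:
        return False
    tree.node_count -= 1
    if tree.node_count <= tree.alpha * tree.max_node_count:
        keys = in_order(tree.root, [])
        tree.root = build_balanced(keys, 0, len(keys))  # rebuild about the root
        tree.max_node_count = tree.node_count
    return True
```

Starting from eight nodes with α = 0.75, the first deletion leaves seven nodes and triggers nothing, while the second brings NodeCount down to 6 = 0.75 × 8 and causes a full rebuild.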

Sketch of proof for cost of deletion

Suppose the scapegoat tree has n elements and has just been rebuilt. At most n/2 − 1 deletions can be performed before the tree must be rebuilt. Each of these deletions takes O(log n) time. The n/2-th deletion causes the tree to be rebuilt and takes O(log n) + O(n), or just O(n), time. Using aggregate analysis it becomes clear that the amortized cost of a deletion is O(log n):
(Σ_{i=1}^{n/2} O(log n) + O(n)) / (n/2) = ((n/2)·O(log n) + O(n)) / (n/2) = O(log n)

Etymology

The name scapegoat tree "is based on the common wisdom that, when something goes wrong, the first thing people tend to do is find someone to blame." In the Bible, a scapegoat is an animal that is ritually burdened with the sins of others, and then driven away.
