Extendible hashing

Extendible hashing is a type of hash system which treats a hash as a bit string and uses a trie for bucket lookup. Because of the hierarchical nature of the system, re-hashing is an incremental operation, done one bucket at a time as needed. This means that time-sensitive applications are less affected by table growth than they would be with standard full-table rehashes.
Extendible hashing was described by Ronald Fagin in 1979.
Practically all modern filesystems use either extendible hashing or B-trees.
In particular, the Global File System, ZFS, and the SpadFS filesystem use extendible hashing.

Example

Assume that the hash function returns a string of bits. The first i bits of each string are used as an index into the "directory", which determines the bucket the key goes to. Here, i is the smallest number such that the index of every item in the table is unique.
Keys to be used: k₁, k₂ and k₃, where h(k₁) and h(k₃) both begin with 1 and differ in their second bit, and h(k₂) begins with 0.
Let's assume that for this particular example, the bucket size is 1. The first two keys to be inserted, k₁ and k₂, can be distinguished by the most significant bit, so a one-bit directory is enough: entry 0 points to the bucket holding k₂ and entry 1 points to the bucket holding k₁.
Now, if k₃ were to be hashed to the table, one bit would not be enough to distinguish all three keys, because k₁ and k₃ share the same leftmost bit. Also, because the bucket size is one, the table would overflow. Because comparing the first two most significant bits would give each key a unique location, the directory size is doubled to four entries: 00, 01, 10 and 11.
And so now k₁ and k₃ each have a unique location, being distinguished by the first two leftmost bits. Because k₂ is in the top half of the table, both 00 and 01 point to it, as there is no other key beginning with a 0 to compare it to.
The above example is from Fagin et al. (1979).
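The rule that i is the smallest prefix length giving every key a unique directory index can be sketched in a few lines of Python. This is a minimal illustration, not part of the original description: the function names and the six-bit hash strings are hypothetical, chosen only to be consistent with the prefixes used in the example.

```python
def directory_index(hash_bits: str, i: int) -> int:
    """Return the directory slot selected by the first i bits of hash_bits."""
    return int(hash_bits[:i], 2) if i > 0 else 0


def smallest_unique_prefix(hashes: list[str]) -> int:
    """Return the smallest i such that the first i bits of all hashes differ."""
    width = len(hashes[0])
    for i in range(width + 1):
        if len({h[:i] for h in hashes}) == len(hashes):
            return i
    raise ValueError("hash values are not all distinct")


# Hypothetical hash values for k1, k2 and k3 (assumed for illustration).
hashes = ["100100", "010110", "110110"]

print(smallest_unique_prefix(hashes[:2]))  # 1: one bit separates k1 and k2
print(smallest_unique_prefix(hashes))      # 2: k3 forces a second bit
print(directory_index("110110", 2))        # 3: prefix 11 is slot 3
```

With only the first two keys, one bit is enough; adding the third key raises i to 2, which corresponds to the directory doubling from two entries to four.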

Further detail

Now, k₄ needs to be inserted. The first two bits of its hash are 01.., and with a 2-bit depth in the directory, 01 maps to Bucket A. Bucket A is full, so it must be split; because there is more than one pointer to Bucket A, there is no need to increase the directory size.
What is needed is information about:

The key size that maps the directory (the global depth), and
The key size that has previously mapped the bucket (the local depth)

In order to distinguish the two action cases:

Doubling the directory when a bucket becomes full
Creating a new bucket, and re-distributing the entries between the old and the new bucket

Examining the initial case of an extendible hash structure, if each directory entry points to one bucket, then the local depth should be equal to the global depth.
The number of directory entries is equal to 2^{global depth}, and the initial number of buckets is equal to 2^{local depth}.
Thus if global depth = local depth = 0, then 2⁰ = 1, so the initial directory consists of one pointer to one bucket.
Back to the two action cases; if the bucket is full:

If the local depth is equal to the global depth, then there is only one pointer to the bucket, and there are no other directory pointers that can map to the bucket, so the directory must be doubled.
If the local depth is less than the global depth, then there exists more than one pointer from the directory to the bucket, and the bucket can be split.
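The two cases can be written as a small decision function. This is a sketch under stated assumptions: the function names are hypothetical, and the pointer count relies on the fact that a bucket with local depth d in a directory of global depth D is referenced by 2^(D − d) directory entries.

```python
def pointers_to_bucket(local_depth: int, global_depth: int) -> int:
    """Number of directory entries that reference a bucket."""
    if local_depth > global_depth:
        raise ValueError("local depth cannot exceed global depth")
    return 1 << (global_depth - local_depth)


def action_when_full(local_depth: int, global_depth: int) -> str:
    """Decide what has to happen when a full bucket receives a new entry."""
    if pointers_to_bucket(local_depth, global_depth) == 1:
        # Only one pointer: no spare directory entry exists for a new bucket.
        return "double directory, then split bucket"
    # Several pointers: half of them can be redirected to the new bucket.
    return "split bucket"


print(action_when_full(local_depth=1, global_depth=2))  # split bucket
print(action_when_full(local_depth=2, global_depth=2))  # double directory, then split bucket
```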

Key 01 points to Bucket A, and Bucket A's local depth of 1 is less than the directory's global depth of 2. This means keys hashed to Bucket A have only used a 1-bit prefix, and the bucket needs to have its contents split using keys 1 + 1 = 2 bits in length. In general, for any local depth d where d is less than D, the global depth, d must be incremented after a bucket split, and the new d is used as the number of bits of each entry's key when redistributing the entries of the former bucket into the new buckets.
Now, k₄ is tried again, with 2 bits 01.., and now key 01 points to a new bucket, D, but k₂ is still in it.
If the hash of k₂ had been 000110, with key 00, there would have been no problem, because k₂ would have remained in the new bucket A' and bucket D would have been empty.
So Bucket D needs to be split, but its local depth, which is 2, is the same as the global depth, which is 2, so the directory must be doubled again in order to hold keys of sufficient detail, e.g. 3 bits.
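The redistribution step can be sketched as follows. This is an illustrative sketch only: `split_bucket` is a hypothetical helper, buckets are modelled as dictionaries from hash bit strings to key names, and the hash strings are assumed values that merely agree with the prefixes mentioned in the text.

```python
def split_bucket(entries: dict[str, str], local_depth: int):
    """Split a bucket whose entries share their first local_depth bits.

    Returns (low, high, new_depth): entries whose next bit is 0, entries
    whose next bit is 1, and the incremented local depth of both buckets.
    """
    new_depth = local_depth + 1
    low: dict[str, str] = {}
    high: dict[str, str] = {}
    for hash_bits, name in entries.items():
        # Bit number local_depth (counted from the left, starting at 0)
        # is the one extra bit that the new depth takes into account.
        target = high if hash_bits[local_depth] == "1" else low
        target[hash_bits] = name
    return low, high, new_depth


# Two hypothetical hashes that both start with 01.
bucket = {"010110": "k2", "011110": "k4"}

print(split_bucket(bucket, 1))  # ({}, {'010110': 'k2', '011110': 'k4'}, 2)
print(split_bucket(bucket, 2))  # ({'010110': 'k2'}, {'011110': 'k4'}, 3)
```

Splitting on the second bit leaves both entries together, because they agree on their first two bits; only the third bit separates them. A single split therefore does not always relieve a full bucket.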

Bucket D needs to split due to being full.
As D's local depth = the global depth, the directory must double to increase the bit detail of keys.
The global depth is incremented to 3 by the directory doubling.
The new entry k₄ is rekeyed with the global depth of 3 bits and ends up in D, which has local depth 2; this can now be incremented to 3, and D can be split into D' and E.
The content of the split bucket D, k₂, is re-keyed with 3 bits, and it ends up in D'.
k₄ is retried and it ends up in E, which has a spare slot.

Now, k₂ is in D and k₄ is tried again, with 3 bits 011.., and it points to bucket D, which already contains k₂ and so is full. D's local depth is 2, but the global depth is now 3 after the directory doubling, so D can be split into buckets D' and E. The content of D, k₂, has its hash retried with the new global depth bitmask of 3 bits, and k₂ ends up in D'. Then the new entry k₄ is retried, with its hash masked using the new global depth bit count of 3; this gives 011, which now points to a new bucket E, which is empty. So k₄ goes in Bucket E.
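The whole walkthrough can be replayed with a small simulation that indexes the directory by the most significant bits, as in the example. It is a sketch under assumptions, not the article's implementation: the `Bucket` and `Table` classes are hypothetical, the bucket size is 1, and the six-bit hash strings are assumed values consistent with the prefixes given in the text.

```python
class Bucket:
    def __init__(self, depth: int) -> None:
        self.depth = depth               # local depth
        self.items: dict[str, str] = {}  # hash bit string -> key name


class Table:
    def __init__(self, width: int = 6, capacity: int = 1) -> None:
        self.width = width          # number of bits in a hash value
        self.capacity = capacity    # bucket size
        self.global_depth = 0
        self.slots = [Bucket(0)]    # directory: one pointer, one bucket

    def _index(self, h: str) -> int:
        # The first global_depth bits of the hash select the slot.
        return int(h[: self.global_depth], 2) if self.global_depth else 0

    def insert(self, h: str, name: str) -> None:
        while True:  # retry after every split, as in the walkthrough
            bucket = self.slots[self._index(h)]
            if h in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[h] = name
                return
            if bucket.depth == self.width:
                raise OverflowError("no hash bits left to split on")
            if bucket.depth == self.global_depth:
                # Double the directory: each pointer is duplicated in place.
                self.slots = [s for s in self.slots for _ in range(2)]
                self.global_depth += 1
            self._split(bucket)

    def _split(self, old: Bucket) -> None:
        low, high = Bucket(old.depth + 1), Bucket(old.depth + 1)
        for h, name in old.items.items():
            (high if h[old.depth] == "1" else low).items[h] = name
        # Redirect every pointer to the old bucket, using the newly used bit.
        shift = self.global_depth - old.depth - 1
        for i, slot in enumerate(self.slots):
            if slot is old:
                self.slots[i] = high if (i >> shift) & 1 else low

    def layout(self) -> dict[str, list[str]]:
        d = self.global_depth
        return {
            (format(i, f"0{d}b") if d else ""): sorted(s.items.values())
            for i, s in enumerate(self.slots)
        }


table = Table()
for name, h in [("k1", "100100"), ("k2", "010110"),
                ("k3", "110110"), ("k4", "011110")]:
    table.insert(h, name)

print(table.global_depth)  # 3
print(table.layout())
```

After the four insertions the global depth is 3, prefixes 000 and 001 share the empty bucket A', 010 points to D' holding k₂, and 011 points to E holding k₄.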

Example implementation

Below is the extendible hashing algorithm in Python, with the disc block / memory page association, caching and consistency issues removed. Note a problem exists if the depth exceeds the bit size of an integer, because then doubling of the directory or splitting of a bucket won't allow entries to be rehashed to different buckets.
The code uses the least significant bits, which makes it more efficient to expand the table, as the entire directory can be copied as one block.
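The difference between the two indexing schemes when the directory doubles can be shown with plain lists. This is a small sketch with hypothetical function names; the bucket labels are placeholders.

```python
def double_lsb(directory: list) -> list:
    """Double a directory indexed by the least significant hash bits.

    Slots j and j + len(directory) agree on all the low bits used so far,
    so the new directory is the old one followed by a copy of itself.
    """
    return directory + directory


def double_msb(directory: list) -> list:
    """Double a directory indexed by the most significant hash bits.

    Prefix p becomes prefixes p0 and p1, so every pointer is duplicated
    next to itself and the entries have to be interleaved.
    """
    return [bucket for bucket in directory for _ in range(2)]


old = ["A", "B"]
print(double_lsb(old))  # ['A', 'B', 'A', 'B']
print(double_msb(old))  # ['A', 'A', 'B', 'B']
```

In the least-significant-bit layout the old directory is left untouched and a copy is appended, which is why it can be copied as one block.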

Python example

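import random

PAGE_SZ = 10  # maximum number of entries per page (bucket)


class Page:
    def __init__(self) -> None:
        self.m = []  # list of (key, value) pairs
        self.d = 0   # local depth

    def full(self) -> bool:
        return len(self.m) >= PAGE_SZ

    def put(self, k, v) -> None:
        # Replace an existing entry for k, if any, then append the new pair.
        for i, (key, value) in enumerate(self.m):
            if key == k:
                del self.m[i]
                break
        self.m.append((k, v))

    def get(self, k):
        for key, value in self.m:
            if key == k:
                return value


class EH:
    def __init__(self) -> None:
        self.gd = 0         # global depth
        self.pp = [Page()]  # directory of page pointers

    def get_page(self, k):
        # The gd least significant bits of the hash select the page.
        h = hash(k)
        return self.pp[h & ((1 << self.gd) - 1)]

    def put(self, k, v) -> None:
        p = self.get_page(k)
        full = p.full()
        p.put(k, v)
        if full:
            if p.d == self.gd:
                # Only one pointer to this page: double the directory.
                self.pp *= 2
                self.gd += 1

            # Split the page on the next unused hash bit.
            p0 = Page()
            p1 = Page()
            p0.d = p1.d = p.d + 1
            bit = 1 << p.d
            for k2, v2 in p.m:
                h = hash(k2)
                new_p = p1 if h & bit else p0
                new_p.put(k2, v2)

            # Redirect every directory entry that pointed to the old page.
            for i in range(hash(k) & (bit - 1), len(self.pp), bit):
                self.pp[i] = p1 if i & bit else p0

    def get(self, k):
        return self.get_page(k).get(k)


if __name__ == "__main__":
    eh = EH()
    N = 10088
    l = list(range(N))

    random.shuffle(l)
    for x in l:
        eh.put(x, x)
    print(l)

    for i in range(N):
        print(eh.get(i))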
