Linear hashing

Linear hashing is a dynamic data structure which implements a hash table and grows or shrinks one bucket at a time. It was invented by Witold Litwin in 1980.
It has been analyzed by Baeza-Yates and Soza-Pollman.
It is the first in a number of schemes known as dynamic hashing,
such as Larson's Linear Hashing with Partial Expansions,
Linear Hashing with Priority Splitting,
Linear Hashing with Partial Expansions and Priority Splitting,
and Recursive Linear Hashing.
The file structure of a dynamic hashing data structure adapts itself to changes in the size of the file, so expensive periodic file reorganization is avoided. A Linear Hashing file expands by splitting
a predetermined bucket into two and contracts by merging two predetermined buckets into one. The trigger for a reconstruction depends on the flavor of the scheme; it could be an overflow at a bucket or the load factor moving outside of a predetermined range.
Linear Hashing has also been made into a scalable distributed data structure, LH*. In LH*, each bucket resides at a different server.
LH* itself has been expanded to provide data availability in the presence of
failed buckets.
Key-based operations in LH and
LH* take at most constant time, independent of the number of buckets and hence of the number of records.

Algorithm details

Records in LH or LH* consist of a key and a content, the latter being essentially all the other attributes of the record. They are stored in buckets, which are numbered starting with 0. For example, in Ellis' implementation, a bucket is a linked list of records. The file allows the key-based CRUD operations (create or insert, read, update, and delete) as well as a scan operation that visits all records, for example to perform a database select operation on a non-key attribute.

Hash functions

In order to access a record with key $c$, a family of hash functions, collectively
called a dynamic hash function, is applied to the key $c$. At any time,
at most two hash functions $h_i$ and $h_{i+1}$ are used. A typical
example uses the division modulo $x$ operation. If the original number of buckets is
$N$, then the family of hash functions is

$h_i(c) = c \bmod (N \cdot 2^i)$
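
As an illustration, the division-based family can be sketched in a few lines of Python. The function name `h` and the parameter name `n0` (standing for the original number of buckets) are chosen here for the example only, and keys are assumed to be non-negative integers.

```python
def h(i: int, c: int, n0: int = 4) -> int:
    """Member i of the hash family: c mod (n0 * 2**i).

    n0 is the original number of buckets (an illustrative default of 4).
    """
    return c % (n0 * 2 ** i)


# Moving from h(i, .) to h(i + 1, .) either keeps a key at its old address
# or moves it exactly n0 * 2**i buckets further up.
for key in (14, 27):
    print(key, h(0, key), h(1, key))
```

This property is what lets a single bucket be split without touching any other bucket: every record in bucket `a` either stays in `a` or moves to one specific new bucket.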

File expansion

As the file grows through insertions, it expands gracefully through the splitting
of one bucket into two buckets. The sequence of buckets to split is predetermined.
This is the fundamental difference from schemes like Fagin's extendible hashing.
For the two new buckets, the hash function $h_i$ is replaced with
$h_{i+1}$. The number of the bucket to be split is part of the
file state and is called the split pointer $s$.

Split control

A split can be performed whenever a bucket overflows. This is an uncontrolled split.
Alternatively, the file can monitor the load factor and perform a split whenever
the load factor exceeds a threshold. This is a controlled split.
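
The two triggers can be sketched as simple predicates. This is only an illustration: the fixed bucket capacity, the definition of the load factor as records divided by total capacity, and the default threshold of 0.8 are assumptions made for the example, not values given by the scheme.

```python
def uncontrolled_split_needed(bucket_size: int, bucket_capacity: int) -> bool:
    """Uncontrolled splitting: split as soon as some bucket overflows."""
    return bucket_size > bucket_capacity


def controlled_split_needed(num_records: int, num_buckets: int,
                            bucket_capacity: int, threshold: float = 0.8) -> bool:
    """Controlled splitting: split when the load factor exceeds a threshold.

    The load factor is taken here as records / (buckets * capacity),
    which is one common definition (an assumption of this sketch).
    """
    load_factor = num_records / (num_buckets * bucket_capacity)
    return load_factor > threshold
```

In both cases the bucket that is actually split is the one designated by the split pointer, which need not be the bucket that overflowed.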

Addressing

Addressing is based on the file state, consisting of the split pointer $s$
and the level $l$. If the level is $l$, then the hash functions
used are $h_l$ and $h_{l+1}$.
The LH algorithm for hashing key $c$ is

$a := h_l(c)$
if $a < s$: $a := h_{l+1}(c)$
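
A minimal Python sketch of this addressing rule follows, assuming non-negative integer keys and the division-based hash family. The names `address`, `level`, `split` and `n0` are illustrative.

```python
def address(c: int, level: int, split: int, n0: int = 1) -> int:
    """Bucket number for key c under file state (level, split).

    n0 is the original number of buckets (assumed name, default 1).
    """
    a = c % (n0 * 2 ** level)            # try the current-level hash function
    if a < split:                        # bucket a has already been split this round
        a = c % (n0 * 2 ** (level + 1))  # so use the next-level hash function
    return a
```

For example, with `n0 = 4`, `level = 0` and `split = 1`, bucket 0 has already been split: key 8 stays in bucket 0, key 12 moves to the new bucket 4, and key 13 is addressed to bucket 1 with the old hash function.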

Splitting

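When a bucket is split, the split pointer $s$ and possibly the level $l$ are updated according to the following rule, where $N$ is the original number of buckets:

$s := s + 1$
if $s \ge N \cdot 2^l$: $l := l + 1$, $s := 0$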
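
To make the update concrete, here is a small in-memory sketch that redistributes the records of the split bucket and then advances the file state. The class and method names are hypothetical, buckets are plain Python lists of integer keys, and the split trigger is left to the caller.

```python
class LinearHashFile:
    """Toy linear hashing file holding integer keys (illustrative only)."""

    def __init__(self, n0: int = 2):
        self.n0 = n0          # original number of buckets
        self.level = 0        # current level
        self.split = 0        # split pointer
        self.buckets = [[] for _ in range(n0)]

    def address(self, c: int) -> int:
        a = c % (self.n0 * 2 ** self.level)
        if a < self.split:
            a = c % (self.n0 * 2 ** (self.level + 1))
        return a

    def insert(self, c: int) -> None:
        self.buckets[self.address(c)].append(c)

    def split_next(self) -> None:
        """Split the bucket at the split pointer, then update the file state."""
        modulus = self.n0 * 2 ** (self.level + 1)
        old = self.buckets[self.split]
        # Records either stay put or move to the newly appended bucket,
        # whose number is split + n0 * 2**level.
        self.buckets[self.split] = [c for c in old if c % modulus == self.split]
        self.buckets.append([c for c in old if c % modulus != self.split])
        self.split += 1
        if self.split >= self.n0 * 2 ** self.level:
            self.level += 1
            self.split = 0


f = LinearHashFile(n0=2)
for key in range(8):
    f.insert(key)
f.split_next()
f.split_next()
print(f.level, f.split, f.buckets)
```

After two splits of a file that started with two buckets, every original bucket has been split once, so the level increases and the split pointer returns to 0.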

File contraction

If under controlled splitting the load factor sinks below a threshold, a merge operation
is triggered. The merge operation undoes the last split, also resetting the file state.
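
A merge can be sketched as the exact inverse of a split: the file state is stepped back first, and the records of the last bucket are then moved back into the bucket they were split from. The function name and the list-of-lists representation are assumptions of this example.

```python
def merge_last(buckets: list, level: int, split: int, n0: int = 1):
    """Undo the most recent split; return the restored (level, split).

    buckets is a list of lists of keys and is modified in place.
    """
    if split == 0:
        if level == 0:
            raise ValueError("file is already at its original size")
        level -= 1
        split = n0 * 2 ** level   # one past the last bucket split at that level
    split -= 1                    # the bucket that was split last
    buckets[split].extend(buckets.pop())
    return level, split
```

With `n0 = 2` and four buckets in state `(level, split) = (1, 0)`, the call removes bucket 3, appends its records to bucket 1, and returns the state `(0, 1)`.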

File state calculation

The file state consists of the split pointer $s$ and the level $l$. If the
original file started with $N$ buckets, then the number of buckets $n$
and the file state are related via

$n = N \cdot 2^l + s$
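
Because the split pointer always satisfies 0 <= s < N * 2^l, this relation can be inverted: the file state follows from the number of buckets alone. A short sketch, with the illustrative names `file_state`, `n` and `n0`:

```python
def file_state(n: int, n0: int = 1):
    """Return (level, split) for a file that currently has n buckets.

    n0 is the original number of buckets; n >= n0 is assumed.
    """
    level = 0
    while n0 * 2 ** (level + 1) <= n:
        level += 1
    return level, n - n0 * 2 ** level
```

For instance, a file that started with one bucket and now has five buckets is at level 2 with split pointer 1, since 5 = 2^2 + 1.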

LH*

The main contribution of LH* is to allow a client of an LH* file to find the bucket where
the record resides even if the client does not know the file state. Clients in fact store
their version of the file state, which is initially just the knowledge of the first bucket, namely Bucket 0. Based on their file state, a client calculates the address of a
key and sends a request to that bucket. At the bucket, the request is checked and if
the record is not at the bucket, it is forwarded. In a reasonably stable system, that is,
if there is only one split or merge going on while the request is processed, it can
be shown that there are at most two forwards. After a forward, the final bucket sends an
Image Adjustment Message to the client, whose state is then closer to the state of the distributed file. While forwards are reasonably rare for active clients,
their number can be reduced even further by additional information exchange between
servers and clients.
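
The following sketch simulates this routing for a file that started with a single bucket. It is an illustration under stated assumptions, not a reference implementation: each bucket is assumed to know its own level, the forwarding check in `server_next_hop` is one rule of the kind described for LH* servers, all function names are hypothetical, and Image Adjustment Messages are not modeled.

```python
def lh_address(c: int, level: int, split: int) -> int:
    """LH address of key c under file state (level, split), original size 1."""
    a = c % (2 ** level)
    if a < split:
        a = c % (2 ** (level + 1))
    return a


def bucket_level(a: int, level: int, split: int) -> int:
    """Level of bucket a: buckets already split, and the buckets they created, use level + 1."""
    return level + 1 if (a < split or a >= 2 ** level) else level


def server_next_hop(c: int, a: int, j: int) -> int:
    """Return a if key c belongs in bucket a (of level j), else the bucket to forward to."""
    target = c % (2 ** j)
    if target != a:
        fallback = c % (2 ** (j - 1))
        if a < fallback < target:   # avoid overshooting past a bucket that may not exist yet
            target = fallback
    return target


def route(c: int, image: tuple, state: tuple):
    """Send key c from a client holding file-state image `image` into a file in state `state`.

    Returns (final bucket, number of forwards).
    """
    level, split = state
    a = lh_address(c, *image)       # the client addresses with its possibly stale image
    forwards = 0
    while True:
        nxt = server_next_hop(c, a, bucket_level(a, level, split))
        if nxt == a:
            return a, forwards
        a, forwards = nxt, forwards + 1


# A new client knows only bucket 0; the file has grown to 10 buckets (level 3, split 2).
print(route(9, (0, 0), (3, 2)))
print(route(11, (2, 1), (3, 2)))
```

In this example, the request for key 9 from a client that knows only bucket 0 is forwarded twice before it reaches bucket 9, while a client whose image is closer to the true state reaches the bucket for key 11 directly.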

Adoption in language systems

Griswold and Townsend discussed the adoption of linear hashing in the Icon language. They discussed the implementation alternatives of the dynamic array algorithm used in linear hashing, and presented performance comparisons using a list of Icon benchmark applications.

Adoption in database systems

Linear hashing is used in the Berkeley database system, which in turn is used by many software systems such as OpenLDAP, using a C implementation derived from the CACM article and first published on the Usenet in 1988 by Esmond Pitt.
