Randomized algorithm
A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random bits. Formally, the algorithm's performance will be a random variable determined by the random bits; thus either the running time, the output, or both are random variables.
One has to distinguish between algorithms that use the random input so that they always terminate with the correct answer but whose expected running time is finite (Las Vegas algorithms), and algorithms which have a chance of producing an incorrect result (Monte Carlo algorithms) or of failing to produce a result, either by signaling a failure or by failing to terminate. In some cases, probabilistic algorithms are the only practical means of solving a problem.
In common practice, randomized algorithms are approximated using a pseudorandom number generator in place of a true source of random bits; such an implementation may deviate from the expected theoretical behavior.
Motivation
As a motivating example, consider the problem of finding an ‘a’ in an array of n elements.
Input: An array of n≥2 elements, in which half are ‘a’s and the other half are ‘b’s.
Output: Find an ‘a’ in the array.
We give two versions of the algorithm, one Las Vegas algorithm and one Monte Carlo algorithm.
Las Vegas algorithm:
findingA_LV
begin
repeat
Randomly select one element out of n elements.
until 'a' is found
end
This algorithm succeeds with probability 1. The number of iterations varies and can be arbitrarily large, but the expected number of iterations is 2: each random pick finds an ‘a’ with probability 1/2, so the number of iterations follows a geometric distribution with mean 2. Since this is a constant, the expected run time over many calls is Θ(1).
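A minimal Python rendering of findingA_LV (the list-based input convention is an assumption made for illustration):

import random

def findingA_LV(A):
    # Repeatedly probe a random position until an 'a' is found.
    # Always correct; the number of iterations is a random variable with expectation 2.
    n = len(A)
    while True:
        i = random.randrange(n)
        if A[i] == 'a':
            return i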
Monte Carlo algorithm:
findingA_MC
begin
i=0
repeat
Randomly select one element out of n elements.
i = i + 1
until i=k or 'a' is found
end
If an ‘a’ is found, the algorithm succeeds, else the algorithm fails. After k iterations, the probability of finding an ‘a’ is 1 − (1/2)^k. This algorithm does not guarantee success, but the run time is bounded: the number of iterations is always less than or equal to k. Taking k to be constant, the run time (expected and absolute) is Θ(1).
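A matching Python rendering of findingA_MC (same illustrative conventions as above):

import random

def findingA_MC(A, k):
    # At most k random probes: the run time is bounded,
    # but the algorithm fails with probability (1/2)**k.
    n = len(A)
    for _ in range(k):
        i = random.randrange(n)
        if A[i] == 'a':
            return i
    return None   # failure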
Randomized algorithms are particularly useful when faced with a malicious "adversary" or attacker who deliberately tries to feed a bad input to the algorithm. It is for this reason that randomness is ubiquitous in cryptography. In cryptographic applications, pseudo-random numbers cannot be used, since the adversary can predict them, making the algorithm effectively deterministic. Therefore, either a source of truly random numbers or a cryptographically secure pseudo-random number generator is required. Another area in which randomness is inherent is quantum computing.
In the example above, the Las Vegas algorithm always outputs the correct answer, but its running time is a random variable. The Monte Carlo algorithm is guaranteed to complete in an amount of time that can be bounded by a function of the input size and its parameter k, but allows a small probability of error. Observe that any Las Vegas algorithm can be converted into a Monte Carlo algorithm by having it output an arbitrary, possibly incorrect answer if it fails to complete within a specified time. Conversely, if an efficient verification procedure exists to check whether an answer is correct, then a Monte Carlo algorithm can be converted into a Las Vegas algorithm by running the Monte Carlo algorithm repeatedly until a correct answer is obtained.
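A minimal Python sketch of the second conversion (Monte Carlo to Las Vegas), assuming an efficient verification procedure is available; the function and parameter names are illustrative:

def monte_carlo_to_las_vegas(monte_carlo, verify, instance):
    # Rerun the Monte Carlo algorithm until its answer passes verification.
    # The returned answer is always correct; only the running time is random.
    while True:
        answer = monte_carlo(instance)
        if verify(instance, answer):
            return answer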
Computational complexity
Computational complexity theory models randomized algorithms as probabilistic Turing machines. Both Las Vegas and Monte Carlo algorithms are considered, and several complexity classes are studied. The most basic randomized complexity class is RP, which is the class of decision problems for which there is an efficient (polynomial-time) randomized algorithm which recognizes NO-instances with absolute certainty and recognizes YES-instances with a probability of at least 1/2. The complement class for RP is co-RP. Problem classes having algorithms with expected polynomial running time whose output is always correct are said to be in ZPP. The class of problems for which both YES- and NO-instances are allowed to be identified with some bounded error is called BPP. This class acts as the randomized equivalent of P, i.e. BPP represents the class of efficient randomized algorithms.
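For reference, the acceptance conditions behind RP and BPP can be written out formally; this is the standard textbook formulation (the 2/3 and 1/3 thresholds for BPP are conventional and not stated in the text above):

% M is a polynomial-time probabilistic Turing machine for the language L (requires amsmath).
\begin{align*}
\textbf{RP:}\quad  & x \in L \implies \Pr[M \text{ accepts } x] \ge \tfrac{1}{2},
                   & x \notin L \implies \Pr[M \text{ accepts } x] = 0, \\
\textbf{BPP:}\quad & x \in L \implies \Pr[M \text{ accepts } x] \ge \tfrac{2}{3},
                   & x \notin L \implies \Pr[M \text{ accepts } x] \le \tfrac{1}{3}.
\end{align*}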
History
Historically, the first randomized algorithm was a method developed by Michael O. Rabin for the closest pair problem in computational geometry. The study of randomized algorithms was spurred by the 1977 discovery of a randomized primality test by Robert M. Solovay and Volker Strassen. Soon afterwards Michael O. Rabin demonstrated that Miller's 1976 primality test could be turned into a randomized algorithm. At that time, no practical deterministic algorithm for primality was known.
The Miller-Rabin primality test relies on a binary relation between two positive integers k and n that can be expressed by saying that k "is a witness to the compositeness of" n. It can be shown that
- If there is a witness to the compositeness of n, then n is composite, and
- If n is composite then at least three-fourths of the natural numbers less than n are witnesses to its compositeness, and
- There is a fast algorithm that, given k and n, ascertains whether k is a witness to the compositeness of n.
If one randomly chooses 100 numbers less than a composite number n, then the probability of failing to find such a "witness" is (1/4)^100, so that for most practical purposes this is a good primality test. If n is big, there may be no other test that is practical. The probability of error can be reduced to an arbitrary degree by performing enough independent tests.
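A Python sketch of this repeated-trial test, using the standard Miller–Rabin witness check (a textbook formulation written here for illustration, not code from the article):

import random

def is_witness(k, n):
    # True iff k is a Miller-Rabin witness to the compositeness of the odd number n > 2.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    x = pow(k, d, n)
    if x == 1 or x == n - 1:
        return False
    for _ in range(r - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return False
    return True

def probably_prime(n, trials=100):
    # Monte Carlo primality test: if n is composite, each trial misses a witness with
    # probability at most 1/4, so the error probability is at most (1/4)**trials.
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    return not any(is_witness(random.randrange(2, n - 1), n) for _ in range(trials))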
Therefore, in practice, there is no penalty associated with accepting a small probability of error, since with a little care the probability of error can be made astronomically small. Indeed, even though a deterministic polynomial-time primality test (the AKS primality test) has since been found, it has not replaced the older probabilistic tests in cryptographic software, nor is it expected to do so for the foreseeable future.
Examples
Quicksort
Quicksort is a familiar, commonly used algorithm in which randomness can be useful. Any deterministic version of this algorithm requires O(n^2) time to sort n numbers for some well-defined class of degenerate inputs (such as an already sorted array), with the specific class of inputs that generate this behavior defined by the protocol for pivot selection. However, if the algorithm selects pivot elements uniformly at random, it has a provably high probability of finishing in O(n log n) time regardless of the characteristics of the input.
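A short Python sketch of quicksort with uniformly random pivot selection (a standard formulation written for illustration, favoring clarity over in-place partitioning):

import random

def quicksort(a):
    # Expected O(n log n) comparisons on every input, because the pivot is random.
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)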
Randomized incremental constructions in geometry
In computational geometry, a standard technique to build a structure like a convex hull or Delaunay triangulation is to randomly permute the input points and then insert them one by one into the existing structure. The randomization ensures that the expected number of changes to the structure caused by an insertion is small, and so the expected running time of the algorithm can be bounded from above. This technique is known as randomized incremental construction.
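The flavor of the analysis can be seen in a one-dimensional toy version (an illustration of the random-permutation idea, not one of the geometric algorithms above): maintain the running minimum and maximum of the points inserted so far and count how often an insertion changes them. With a random insertion order the expected number of changes is O(log n), whereas a worst-case order changes them Θ(n) times.

import random

def count_extreme_updates(points):
    # Insert the points in a uniformly random order and count how often the
    # running minimum or maximum has to be updated (assumes at least one point).
    pts = list(points)
    random.shuffle(pts)          # the random permutation is the key ingredient
    lo = hi = pts[0]
    changes = 0
    for p in pts[1:]:
        if p < lo:
            lo, changes = p, changes + 1
        elif p > hi:
            hi, changes = p, changes + 1
    return changes               # expected value is O(log n) by backwards analysis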
Min cut
Input: A graph G(V, E)
Output: A cut partitioning the vertices into L and R, with the minimum number of edges between L and R.
Recall that the contraction of two nodes, u and v, in a (multi-)graph yields a new node u′ with edges that are the union of the edges incident on either u or v, except for any edge(s) connecting u and v. Figure 1 gives an example of contraction of vertices A and B.
After contraction, the resulting graph may have parallel edges, but contains no self loops.
Karger's basic algorithm:
begin
i=1
repeat
repeat
Take a random edge (u,v) ∈ E in G
replace u and v with the contraction u'
until only 2 nodes remain
obtain the corresponding cut result Ci
i=i+1
until i=m
output the minimum cut among C1,C2,...,Cm.
end
In each execution of the outer loop, the algorithm repeats the inner loop until only 2 nodes remain, and the corresponding cut is obtained. The run time of one execution is O(n), where n denotes the number of vertices. After m executions of the outer loop, we output the minimum cut among all the results. Figure 2 gives an example of one execution of the algorithm. After execution, we get a cut of size 3.
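A compact Python sketch of the contraction algorithm on an edge-list representation (written for illustration; it assumes a connected graph and favors clarity over the best possible constants):

import random

def karger_min_cut(num_vertices, edges, num_trials):
    # edges: list of (u, v) pairs over vertices labeled 0 .. num_vertices - 1.
    best = None
    for _ in range(num_trials):
        parent = list(range(num_vertices))     # union-find over contracted nodes

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        remaining = num_vertices
        while remaining > 2:
            u, v = random.choice(edges)        # random edge; may be a self-loop in the contracted graph
            ru, rv = find(u), find(v)
            if ru != rv:                       # skip self-loops so the pick is uniform over multigraph edges
                parent[ru] = rv                # contract u and v
                remaining -= 1
        # Edges whose endpoints ended up in different contracted nodes form the cut.
        cut_size = sum(1 for u, v in edges if find(u) != find(v))
        best = cut_size if best is None else min(best, cut_size)
    return best

To reach the success probability 1 − 1/n derived below, num_trials should be on the order of (n(n − 1)/2) ln n.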
Lemma 1: Let k be the min cut size, and let C = {e1, e2, ..., ek} be the min cut. If, during iteration i, no edge e ∈ C is selected for contraction, then Ci = C.
Proof: If G is not connected, then G can be partitioned into L and R without any edge between them, so the min cut in a disconnected graph is 0. Now, assume G is connected. Let V = L ∪ R be the partition of V induced by C: C = {(u,v) ∈ E : u ∈ L, v ∈ R}. Consider an edge (u,v) of C. Initially, u and v are distinct vertices, and as long as no edge of C is picked for contraction, they do not get merged. Thus, at the end of the algorithm, we have two compound nodes covering the entire graph, one consisting of the vertices of L and the other consisting of the vertices of R. As in figure 2, the size of the min cut is 1, and C = {(A,B)}. If we don't select (A,B) for contraction, we can get the min cut.
Lemma 2: If G is a multigraph with p vertices and whose min cut has size k, then G has at least pk/2 edges.
Proof:
Because the min cut is k, every vertex v must satisfy degree(v) ≥ k. Therefore, the sum of the degrees is at least pk. But it is well known that the sum of vertex degrees equals 2|E|. The lemma follows.
Analysis of algorithm
The probability that the algorithm succeeds is 1 minus the probability that all attempts fail. By independence, the probability that all attempts fail is ∏_{i=1}^{m} Pr(Ci ≠ C) = ∏_{i=1}^{m} (1 − Pr(Ci = C)).
By Lemma 1, the probability that Ci = C is the probability that no edge of C is selected during iteration i. Consider the inner loop and let Gj denote the graph after j edge contractions, where j ∈ {0, 1, ..., n − 3}; Gj has n − j vertices. We use the chain rule of conditional probabilities. The probability pj that the edge chosen at iteration j is not in C, given that no edge of C has been chosen before, is 1 − k/|E(Gj)|. Note that Gj still has a min cut of size k, so by Lemma 2 it still has at least (n − j)k/2 edges. Thus, pj ≥ 1 − 2/(n − j).
So by the chain rule, the probability of finding the min cut C is Pr[Ci = C] ≥ (1 − 2/n)(1 − 2/(n − 1))(1 − 2/(n − 2)) ⋯ (1 − 2/3). Cancellation gives Pr[Ci = C] ≥ 2/(n(n − 1)). Thus the probability that the algorithm succeeds is at least 1 − (1 − 2/(n(n − 1)))^m. For m = (n(n − 1)/2) ln n, this is at least 1 − 1/n. The algorithm therefore finds the min cut with probability 1 − 1/n, in time O(mn) = O(n^3 log n).
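The telescoping step, written out in LaTeX for completeness (standard algebra, not additional material):

\begin{align*}
\Pr[C_i = C]
  &\ge \prod_{j=0}^{n-3}\left(1 - \frac{2}{n-j}\right)
   = \prod_{j=0}^{n-3}\frac{n-j-2}{n-j}
   = \frac{n-2}{n}\cdot\frac{n-3}{n-1}\cdot\frac{n-4}{n-2}\cdots\frac{2}{4}\cdot\frac{1}{3} \\
  &= \frac{2\cdot 1}{n\,(n-1)}
   = \frac{2}{n(n-1)}.
\end{align*}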
Derandomization
Randomness can be viewed as a resource, like space and time. Derandomization is then the process of removing randomness. It is not currently known if all algorithms can be derandomized without significantly increasing their running time. For instance, in computational complexity, it is unknown whether P = BPP, i.e., we do not know whether we can take an arbitrary randomized algorithm that runs in polynomial time with a small error probability and derandomize it to run in polynomial time without using randomness. There are specific methods that can be employed to derandomize particular randomized algorithms:
- the method of conditional probabilities, and its generalization, pessimistic estimators
- discrepancy theory
- the exploitation of limited independence in the random variables used by the algorithm, such as the pairwise independence used in universal hashing (see the sketch after this list)
- the use of expander graphs to amplify a limited amount of initial randomness
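As an illustration of the limited-independence item above, here is a minimal Python sketch of a pairwise-independent hash family of the kind used in universal hashing (the modulus and key range are illustrative assumptions):

import random

# A prime modulus larger than any key we intend to hash (an illustrative choice).
P = (1 << 61) - 1                # the Mersenne prime 2**61 - 1

def random_pairwise_independent_hash(m):
    # For distinct integer keys x1, x2 in [0, P), the values (a*x1 + b) % P and
    # (a*x2 + b) % P are pairwise independent; the final reduction mod m gives a
    # practical hash into m buckets using only two random numbers instead of a
    # fully random function.
    a = random.randrange(P)
    b = random.randrange(P)
    return lambda x: ((a * x + b) % P) % m

# Example: hash a few keys into 8 buckets.
h = random_pairwise_independent_hash(8)
buckets = [h(x) for x in (3, 14, 159, 2653)]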
Where randomness helps
- Based on the initial motivating example: given an exponentially long string of 2^k characters, half a's and half b's, a random access machine requires 2^(k−1) lookups in the worst case to find the index of an a; if it is permitted to make random choices, it can solve this problem in an expected polynomial number of lookups.
- The natural way of carrying out a numerical computation in embedded systems or cyber-physical systems is to provide a result that approximates the correct one with high probability. The hard problem associated with the evaluation of the discrepancy loss between the approximated and the correct computation can be effectively addressed by resorting to randomization.
- In communication complexity, the equality of two n-bit strings can be verified to some reliability using O(log n) bits of communication with a randomized protocol, whereas any deterministic protocol requires Θ(n) bits if defending against a strong opponent (a sketch of such a protocol appears after this list).
- The volume of a convex body can be estimated by a randomized algorithm to arbitrary precision in polynomial time. Bárány and Füredi showed that no deterministic algorithm can do the same. This is true unconditionally, i.e. without relying on any complexity-theoretic assumptions, assuming the convex body can be queried only as a black box.
- A more complexity-theoretic example of a place where randomness appears to help is the class IP. IP consists of all languages that can be accepted by a polynomially long interaction between an all-powerful prover and a verifier that implements a BPP algorithm. IP = PSPACE. However, if it is required that the verifier be deterministic, then IP = NP.
- In a chemical reaction network, the ability to ever reach a given target state from an initial state is decidable, while even approximating the probability of ever reaching a given target state is undecidable. More specifically, a limited Turing machine can be simulated with arbitrarily high probability of running correctly for all time, only if a random chemical reaction network is used. With a simple nondeterministic chemical reaction network, the computational power is limited to primitive recursive functions.
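A Python sketch of the randomized equality protocol mentioned above (fingerprinting modulo a random prime; the prime range, bit-string encoding, and function names are illustrative assumptions):

import random

def is_prime(p):
    # Trial division; adequate for the small primes used in this sketch.
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

def fingerprint(bits, p):
    # Interpret the bit string as an integer and reduce it modulo p.
    value = 0
    for b in bits:
        value = (value * 2 + b) % p
    return value

def randomized_equality(x, y):
    # Alice holds x, Bob holds y (bit lists of the same length n >= 2).
    # Alice picks a random prime p with O(log n) bits and sends (p, fingerprint(x, p)),
    # i.e. O(log n) bits of communication; Bob compares it with fingerprint(y, p).
    n = len(x)
    while True:
        p = random.randrange(n * n, 2 * n * n)
        if is_prime(p):
            break
    # If x == y the protocol always answers "equal".  If x != y, the two encoded
    # integers differ by a nonzero value below 2**n, which has fewer than n prime
    # factors, while the sampled range contains about n**2 / ln(n**2) primes, so
    # the error probability is O(log(n) / n).
    return fingerprint(x, p) == fingerprint(y, p)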