Thompson sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.

Description

Consider a set of contexts, a set of actions, and rewards in. In each round, the player obtains a context, plays an action and receives a reward following a distribution that depends on the context and the issued action. The aim of the player is to play actions such as to maximize the cumulative rewards.
The elements of Thompson sampling are as follows:

a likelihood function ;
a set of parameters of the distribution of ;
a prior distribution on these parameters;
past observations triplets ;
a posterior distribution, where is the likelihood function.

Thompson sampling consists in playing the action according to the probability that it maximizes the expected reward, i.e. action is chosen with probability
where is the indicator function.
In practice, the rule is implemented by sampling, in each round, parameters from the posterior, and choosing the action that maximizes, i.e. the expected reward given the sampled parameters, the action and the current context. Conceptually, this means that the player instantiates their beliefs randomly in each round according to the posterior distribution, and then acts optimally according to them. In most practical applications, it is computationally onerous to maintain and sample from a posterior distribution over models. As such, Thompson sampling is often used in conjunction with approximate sampling techniques.

History

Thompson sampling was originally described by Thompson in 1933. It was subsequently rediscovered numerous times independently in the context of multi-armed bandit problems. A first proof of convergence for the bandit case has been shown in 1997. The first application to Markov decision processes was in 2000. A related approach was published in 2010. In 2010 it was also shown that Thompson sampling is instantaneously self-correcting. Asymptotic convergence results for contextual bandits were published in 2011. Nowadays, Thompson Sampling has been widely used in many online learning problems: Thompson sampling has also been applied to A/B testing in website design and online advertising; Thompson sampling has formed the basis for accelerated learning in decentralized decision making; a Double Thompson Sampling algorithm has been proposed for dueling bandits, a variant of traditional MAB, where feedbacks come in the format of pairwise comparison.

Relationship to other approaches

Probability matching

Probability matching is a decision strategy in which predictions of class membership are proportional to the class base rates. Thus, if in the training set positive examples are observed 60% of the time, and negative examples are observed 40% of the time, the observer using a probability-matching strategy will predict a class label of "positive" on 60% of instances, and a class label of "negative" on 40% of instances.

Bayesian control rule

A generalization of Thompson sampling to arbitrary dynamical environments and causal structures, known as Bayesian control rule, has been shown to be the optimal solution to the adaptive coding problem with actions and observations. In this formulation, an agent is conceptualized as a mixture over a set of behaviours. As the agent interacts with its environment, it learns the causal properties and adopts the behaviour that minimizes the relative entropy to the behaviour with the best prediction of the environment's behaviour. If these behaviours have been chosen according to the maximum expected utility principle, then the asymptotic behaviour of the Bayesian control rule matches the asymptotic behaviour of the perfectly rational agent.
The setup is as follows. Let be the actions issued by an agent up to time, and let be the observations gathered by the agent up to time. Then, the agent issues the action with probability:
where the "hat"-notation denotes the fact that is a causal intervention, and not an ordinary observation. If the agent holds beliefs over its behaviors, then the Bayesian control rule becomes
where is the posterior distribution over the parameter given actions and observations.
In practice, the Bayesian control amounts to sampling, in each time step, a parameter from the posterior distribution, where the posterior distribution is computed using Bayes' rule by only considering the likelihoods of the observations and ignoring the likelihoods of the actions, and then by sampling the action from the action distribution.

Upper-Confidence-Bound (UCB) algorithms

Thompson sampling and upper-confidence bound algorithms share a fundamental property that underlies many of their theoretical guarantees. Roughly speaking, both algorithms allocate exploratory effort to actions that might be optimal and are in this sense "optimistic." Leveraging this property, one can translate regret bounds established for UCB algorithms to Bayesian regret bounds for Thompson sampling or unify regret analysis across both these algorithms and many classes of problems.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...