Perplexity


In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. A low perplexity indicates the probability distribution is good at predicting the sample.

Perplexity of a probability distribution

The perplexity PP of a discrete probability distribution p is defined as

PP(p) := 2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)}

where H(p) is the entropy (in bits) of the distribution and x ranges over events. This measure is also known in some domains as the diversity.
Perplexity of a random variable X may be defined as the perplexity of the distribution over its possible values x.
In the special case where p models a fair k-sided die, its perplexity is k. A random variable with perplexity k has the same uncertainty as a fair k-sided die, and one is said to be "k-ways perplexed" about the value of the random variable.
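For the fair k-sided die, this follows directly from the definition above:

H(p) = -\sum_{i=1}^{k} \tfrac{1}{k} \log_2 \tfrac{1}{k} = \log_2 k, \qquad PP(p) = 2^{H(p)} = 2^{\log_2 k} = k.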
Perplexity is sometimes used as a measure of how hard a prediction problem is. This is not always accurate. If there are two possible outcomes, one with probability 0.9 and the other with probability 0.1, the optimal strategy of always guessing the more likely outcome is correct 90 percent of the time.
The perplexity, however, is 2^{-0.9 \log_2 0.9 - 0.1 \log_2 0.1} ≈ 1.38, and the reciprocal of the perplexity, 1/1.38 ≈ 0.72, is not 0.9.
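The following minimal Python sketch computes perplexity directly from the definition above and reproduces the fair-die value and the 1.38 figure (the function name and example values are illustrative, not part of the standard presentation):

    import math

    def perplexity(probs, base=2):
        """Perplexity of a discrete distribution: base raised to its entropy."""
        entropy = -sum(p * math.log(p, base) for p in probs if p > 0)
        return base ** entropy

    print(perplexity([1/6] * 6))    # fair 6-sided die -> 6.0
    print(perplexity([0.9, 0.1]))   # biased two-outcome case -> ~1.38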
The perplexity is the exponentiation of the entropy, a more clear-cut quantity.
The entropy is a measure of the expected, or "average", number of bits required to encode the outcome of the random variable using a theoretically optimal variable-length code.
It can equivalently be regarded as the expected information gain from learning the outcome of the random variable.

Perplexity of a probability model

A model of an unknown probability distribution p may be proposed based on a training sample that was drawn from p. Given a proposed probability model q, one may evaluate q by asking how well it predicts a separate test sample x_1, x_2, ..., x_N also drawn from p. The perplexity of the model q is defined as

PP(q) := b^{-\frac{1}{N} \sum_{i=1}^{N} \log_b q(x_i)}

where b is customarily 2. Better models q of the unknown distribution p will tend to assign higher probabilities q(x_i) to the test events. Thus, they have lower perplexity: they are less surprised by the test sample.
The exponent above may be regarded as the average number of bits needed to represent a test event x_i if one uses an optimal code based on q. Low-perplexity models do a better job of compressing the test sample, requiring few bits per test element on average because q(x_i) tends to be high.
The exponent may also be regarded as a cross-entropy,

H(\tilde{p}, q) = -\sum_x \tilde{p}(x) \log_b q(x)

where \tilde{p} denotes the empirical distribution of the test sample (i.e., \tilde{p}(x) = n/N if x appeared n times in the test sample of size N), so that the perplexity can be written as PP(q) = b^{H(\tilde{p}, q)}.
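A short Python sketch (with a made-up model q and test sample, purely for illustration) computes model perplexity from a test sample and confirms that it equals the base raised to the cross-entropy of the empirical distribution against q:

    import math
    from collections import Counter

    def model_perplexity(test_sample, q, base=2):
        """Perplexity of model q on a test sample: base ** (average negative log-probability)."""
        n = len(test_sample)
        avg_neg_logprob = -sum(math.log(q[x], base) for x in test_sample) / n
        return base ** avg_neg_logprob

    def cross_entropy(test_sample, q, base=2):
        """Cross-entropy of the empirical test distribution against model q."""
        n = len(test_sample)
        counts = Counter(test_sample)
        return -sum((c / n) * math.log(q[x], base) for x, c in counts.items())

    q = {"a": 0.5, "b": 0.25, "c": 0.25}          # hypothetical model
    sample = ["a", "a", "b", "c", "a", "b"]       # hypothetical test sample
    print(model_perplexity(sample, q))            # ~2.83
    print(2 ** cross_entropy(sample, q))          # same value, ~2.83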

Perplexity per word

In natural language processing, perplexity is a way of evaluating language models. A language model is a probability distribution over entire sentences or texts.
Using the definition of perplexity for a probability model, one might find, for example, that the average sentence x_i in the test sample could be coded in 190 bits. This would give an enormous model perplexity of 2^190 per sentence. However, it is more common to normalize for sentence length and consider only the number of bits per word. Thus, if the test sample's sentences comprised a total of 1,000 words and could be coded using a total of 7.95 bits per word, one could report a model perplexity of 2^7.95 ≈ 247 per word. In other words, the model is as confused on test data as if it had to choose uniformly and independently among 247 possibilities for each word.
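A brief Python sketch of this normalization (the log-probabilities and word count below are invented to match the 7.95-bits-per-word example above):

    import math

    def per_word_perplexity(sentence_logprobs, total_words, base=2):
        """Per-word perplexity from per-sentence log-probabilities (in the given base)."""
        total_bits = -sum(sentence_logprobs)      # bits needed to code the whole test sample
        bits_per_word = total_bits / total_words
        return base ** bits_per_word

    # 1,000 words coded at 7.95 bits per word -> per-word perplexity of about 247.
    print(per_word_perplexity([-7950.0], total_words=1000))   # ~247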
The lowest perplexity that has been published on the Brown Corpus as of 1992 is indeed about 247 per word, corresponding to a cross-entropy of log_2 247 = 7.95 bits per word, or 1.75 bits per letter, using a trigram model. It is often possible to achieve lower perplexity on more specialized corpora, as they are more predictable.
Again, simply guessing that the next word in the Brown Corpus is "the" will be correct 7 percent of the time, not 1/247 ≈ 0.4 percent of the time, as a naive use of perplexity as a measure of predictiveness might lead one to believe. This guess is based on the unigram statistics of the Brown Corpus, not on the trigram statistics that yielded the word perplexity of 247; using trigram statistics would further improve the chances of a correct guess.