Conditional random field
Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighboring" samples, a CRF can take context into account. To do so, the predictions are modeled as a graphical model, which represents the dependencies between them. What kind of graph is used depends on the application. For example, in natural language processing, linear-chain CRFs are popular, as they encode sequential dependencies in the predictions. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions.
Other examples where CRFs are used include labeling or parsing of sequential data for natural language processing or biological sequences, part-of-speech (POS) tagging, shallow parsing, named-entity recognition, gene finding, peptide critical functional region finding, and object recognition and image segmentation in computer vision.
Description
CRFs are a type of discriminative undirected probabilistic graphical model. Lafferty, McCallum and Pereira define a CRF on observations $X$ and random variables $Y$ as follows:
Let $G = (V, E)$ be a graph such that $Y = (Y_v)_{v \in V}$, so that $Y$ is indexed by the vertices of $G$. Then $(X, Y)$ is a conditional random field when the random variables $Y_v$, conditioned on $X$, obey the Markov property with respect to the graph: $p(Y_v \mid X, Y_w, w \neq v) = p(Y_v \mid X, Y_w, w \sim v)$, where $w \sim v$ means that $w$ and $v$ are neighbors in $G$.
What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets $X$ and $Y$, the observed and output variables, respectively; the conditional distribution $p(Y \mid X)$ is then modeled.
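For the widely used linear-chain case, this conditional distribution takes the standard log-linear form introduced by Lafferty, McCallum and Pereira (generic feature functions $f_k$ with learned weights $\lambda_k$; $Z(x)$ normalizes over all candidate label sequences):

```latex
% Linear-chain CRF over a label sequence y = (y_1, ..., y_n)
% conditioned on an observation sequence x.
p(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{n} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{t=1}^{n} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```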
Inference
For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF, and the same arguments hold. However, there exist special cases for which exact inference is feasible:
- If the graph is a chain or a tree, message-passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the forward–backward and Viterbi algorithms for hidden Markov models (HMMs); a Viterbi-style sketch follows this list.
- If the CRF only contains pairwise potentials and the energy is submodular, combinatorial min-cut/max-flow algorithms yield exact solutions.
If exact inference is impossible, several algorithms can be used to obtain approximate solutions, including:
- Loopy belief propagation
- Alpha expansion
- Mean field inference
- Linear programming relaxations
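As a concrete illustration of the chain case, the following is a minimal sketch of Viterbi-style exact MAP inference for a linear-chain model. It assumes the log-potentials have already been collected into a per-position unary score matrix `U` and a transition score matrix `T`; these names and the function below are illustrative, not taken from any particular library.

```python
import numpy as np

def viterbi(U, T):
    """Exact MAP label sequence for a chain model with unary scores U (n x K)
    and pairwise transition scores T (K x K), via max-product message passing."""
    n, K = U.shape
    score = np.zeros((n, K))              # best score of a prefix ending in each label
    back = np.zeros((n, K), dtype=int)    # argmax backpointers
    score[0] = U[0]
    for t in range(1, n):
        cand = score[t - 1][:, None] + T  # cand[i, j]: end at i, transition to j
        back[t] = cand.argmax(axis=0)
        score[t] = U[t] + cand.max(axis=0)
    # Trace the best path backwards from the best final label.
    path = [int(score[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy usage: 4 positions, 3 labels, random scores.
rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(4, 3)), rng.normal(size=(3, 3))))
```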
Parameter learning
Learning the parameters $\theta$ is usually done by maximum-likelihood estimation of $p(Y_i \mid X_i; \theta)$. If all nodes have exponential family distributions and all nodes are observed during training, this optimization is convex and can be solved, for example, with gradient descent or quasi-Newton methods such as the L-BFGS algorithm. If some variables are unobserved, the inference problem has to be solved for these variables first; since exact inference is intractable in general graphs, approximations have to be used.
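Continuing the toy parameterization from the Viterbi sketch above (unary scores `U`, transitions `T`), here is a minimal NumPy sketch of maximum-likelihood training for a linear-chain model. For clarity, the potentials themselves are treated as the parameters, which corresponds to one-hot indicator features; the gradient is then exactly "expected minus empirical feature counts", with expectations computed by forward–backward. All names are illustrative.

```python
import numpy as np

def log_sum_exp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def forward_backward(U, T):
    """Sum-product on a chain: log Z(x) plus node and edge marginals of p(y | x)."""
    n, K = U.shape
    alpha = np.zeros((n, K)); beta = np.zeros((n, K))
    alpha[0] = U[0]
    for t in range(1, n):
        alpha[t] = U[t] + log_sum_exp(alpha[t - 1][:, None] + T, axis=0)
    for t in range(n - 2, -1, -1):
        beta[t] = log_sum_exp(T + U[t + 1] + beta[t + 1], axis=1)
    log_Z = log_sum_exp(alpha[-1], axis=0)
    node = np.exp(alpha + beta - log_Z)                          # p(y_t = k | x)
    edge = np.exp(alpha[:-1, :, None] + T[None]
                  + (U[1:] + beta[1:])[:, None, :] - log_Z)      # p(y_t = i, y_{t+1} = j | x)
    return log_Z, node, edge

def nll_and_grad(U, T, y):
    """Negative log-likelihood of gold labels y, and its gradient w.r.t. U and T."""
    n, _ = U.shape
    log_Z, node, edge = forward_backward(U, T)
    gold = U[np.arange(n), y].sum() + T[y[:-1], y[1:]].sum()
    gU, gT = node.copy(), edge.sum(axis=0)        # expected feature counts ...
    gU[np.arange(n), y] -= 1.0                    # ... minus empirical counts
    np.subtract.at(gT, (y[:-1], y[1:]), 1.0)
    return log_Z - gold, gU, gT

# Toy usage: fit the scores to one length-5 sequence by plain gradient descent.
rng = np.random.default_rng(0)
U, T = rng.normal(size=(5, 3)), rng.normal(size=(3, 3))
y = np.array([0, 1, 1, 2, 0])
for _ in range(100):
    nll, gU, gT = nll_and_grad(U, T, y)
    U -= 0.1 * gU; T -= 0.1 * gT
print("final NLL:", float(nll))
```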
Examples
In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables $X$ represents a sequence of observations, and $Y$ represents a hidden state variable that needs to be inferred given the observations. The $Y_i$ are structured to form a chain, with an edge between each $Y_{i-1}$ and $Y_i$. As well as having a simple interpretation of the $Y_i$ as "labels" for each element in the input sequence, this layout admits efficient algorithms for:
- model training, learning the conditional distributions between the $Y_i$ and feature functions from some corpus of training data;
- decoding, determining the probability of a given label sequence $Y$ given $X$ (see the sketch after this list);
- inference, determining the most likely label sequence $Y$ given $X$.
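For the decoding item above (the probability of one fixed label sequence), here is a self-contained sketch under the same toy parameterization as the earlier examples (illustrative names `U`, `T`): the log-probability is the gold-path score minus $\log Z(x)$, with $Z(x)$ computed by the forward recursion.

```python
import numpy as np

def log_prob(U, T, y):
    """log p(y | x) for a chain model with unary scores U (n x K) and
    transition scores T (K x K): gold-path score minus log Z(x)."""
    n, _ = U.shape
    alpha = U[0].copy()
    for t in range(1, n):
        m = alpha.max()
        # alpha_new[j] = U[t, j] + log sum_i exp(alpha[i] + T[i, j]);
        # the max-shift keeps the sum numerically stable for moderate T.
        alpha = U[t] + m + np.log(np.exp(alpha - m) @ np.exp(T))
    m = alpha.max()
    log_Z = m + np.log(np.exp(alpha - m).sum())
    gold = U[np.arange(n), y].sum() + T[y[:-1], y[1:]].sum()
    return gold - log_Z

rng = np.random.default_rng(0)
U, T = rng.normal(size=(4, 3)), rng.normal(size=(3, 3))
print(np.exp(log_prob(U, T, np.array([0, 1, 2, 0]))))  # probability of one labeling
```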
Linear-chain CRFs have many of the same applications as conceptually simpler hidden Markov models, but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence.
Notably, in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence at any point during inference, and the range of the feature functions need not have a probabilistic interpretation.
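The correspondence can be written down explicitly. With indicator feature functions, and weights pinned to the log-probabilities of an HMM with transition matrix $A$ and emission matrix $B$ (notation chosen here for illustration), a linear-chain CRF scores each sequence pair exactly as the HMM's log joint probability, up to the initial-state term:

```latex
f_{ij}(y_{t-1}, y_t, x, t) = \mathbb{1}[y_{t-1} = i]\,\mathbb{1}[y_t = j],
    \qquad \lambda_{ij} = \log A_{ij},
\\
g_{io}(y_{t-1}, y_t, x, t) = \mathbb{1}[y_t = i]\,\mathbb{1}[x_t = o],
    \qquad \mu_{io} = \log B_{io}.
```

Letting the $\lambda_{ij}$ and $\mu_{io}$ vary freely, and letting the features depend on the whole input $x$, recovers the general linear-chain CRF.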
Variants
Higher-order CRFs and semi-Markov CRFs
CRFs can be extended into higher-order models by making each $Y_i$ dependent on a fixed number $k$ of previous variables $Y_{i-k}, \dots, Y_{i-1}$. In conventional formulations of higher-order CRFs, training and inference are only practical for small values of $k$, since their computational cost increases exponentially with $k$. A more recent advance manages to ameliorate these issues by leveraging concepts and tools from the field of Bayesian nonparametrics. Specifically, the CRF-infinity approach constitutes a CRF-type model that is capable of learning infinitely long temporal dynamics in a scalable fashion. This is effected by introducing a novel potential function for CRFs that is based on the Sequence Memoizer, a nonparametric Bayesian model for learning infinitely long dynamics in sequential observations. To render such a model computationally tractable, CRF-infinity employs a mean-field approximation of the postulated novel potential functions. This allows for devising efficient approximate training and inference algorithms for the model without undermining its capability to capture and model temporal dependencies of arbitrary length.
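To see where the exponential cost of the conventional higher-order formulation comes from, note that a $k$-th order chain can be reduced to a first-order chain over tuple-valued states, so the dynamic programs of the previous sections run over $|Y|^k$ states (a standard reduction, stated here for intuition):

```latex
\tilde{y}_t = (y_{t-k+1}, \dots, y_t) \in Y^{k}
\quad\Longrightarrow\quad
\text{forward--backward / Viterbi cost } O\big(n\,|Y|^{k+1}\big).
```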
There exists another generalization of CRFs, the semi-Markov conditional random field (semi-CRF), which models variable-length segmentations of the label sequence $Y$. This provides much of the power of higher-order CRFs to model long-range dependencies of the $Y_i$, at a reasonable computational cost.
Finally, large-margin models for structured prediction, such as the structured Support Vector Machine, can be seen as an alternative training procedure to CRFs.
Latent-dynamic conditional random field
Latent-dynamic conditional random fields (LDCRFs) or discriminative probabilistic latent variable models (DPLVMs) are a type of CRF for sequence tagging tasks. They are latent variable models that are trained discriminatively. In an LDCRF, like in any sequence tagging task, given a sequence of observations $x = x_1, \dots, x_n$, the main problem the model must solve is how to assign a sequence of labels $y = y_1, \dots, y_n$ from one finite set of labels $Y$. Instead of directly modeling $P(y \mid x)$ as an ordinary linear-chain CRF would do, a set of latent variables $h$ is "inserted" between $x$ and $y$ using the chain rule of probability:

$$P(y \mid x) = \sum_{h} P(y \mid h, x)\, P(h \mid x)$$
This allows capturing latent structure between the observations and labels. While LDCRFs can be trained using quasi-Newton methods, a specialized version of the perceptron algorithm called the latent-variable perceptron has been developed for them as well, based on Collins' structured perceptron algorithm. These models find applications in computer vision, specifically gesture recognition from video streams, as well as in shallow parsing.
Software
This is a partial list of software that implements generic CRF tools.
- CRFs based on recurrent neural networks
- Linear-chain CRFs with fast online ADF training
- Linear-chain CRFs
- CRFs with submodular energy functions
- General CRFs
- General CRFs
- General CRFs
- General CRFs
- Linear-chain CRFs
- Hidden-state CRFs
- Linear-chain CRF, HCRF and HMMs
- Fast linear-chain CRFs
- Fast restricted linear-chain CRFs
- Linear-chain CRFs
- First-order and second-order Markov CRFs
- First-order, linear-chain CRFs
- CRF for segmenting images and image volumes
- Linear-chain CRFs for sequence tagging
- Structured Learning in Python
- A Python binding for crfsuite
- Probabilistic programming language capable of defining CRFs and other graphical models
- Modeling and computational tools for CRFs and other undirected graphical models
- Library for discrete factor graph models and distributive operations on these models
- Library for building, training, and performing inference with Undirected Graphical Models
- Fast linear CRFs
- Medical Named Entity Recognizer
- CRF-based gene predictor
- Named Entity Recognizer
- Named Entity Recognizer