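A stochastic chain with memory of variable length is a stochastic chain $(X_n)_{n \in \mathbb{Z}}$, taking values in a finite alphabet $A$, and characterized by a probabilistic context tree $(\tau, p)$, so that
$\tau$ is the set of all contexts. A context $X_{n-\ell}, \ldots, X_{n-1}$, where $\ell$ is the length of the context, is a finite portion of the past $X_{-\infty}, \ldots, X_{n-1}$ that is relevant to predict the next symbol $X_n$;
$p$ is a family of transition probabilities associated with each context.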
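The definition can be made concrete with a small Python sketch. It stores a probabilistic context tree $(\tau, p)$ as a dictionary that maps each context to the distribution of the next symbol, and it looks up the context of a given past. This is an assumption-laden toy: the binary alphabet $A = \{0, 1\}$, the tree `EXAMPLE_TREE` with its probabilities, and the names `find_context` and `next_symbol_prob` are hypothetical. Contexts and pasts are written oldest symbol first, as in $X_{n-\ell}, \ldots, X_{n-1}$.

```python
from typing import Dict, Optional, Sequence, Tuple

Context = Tuple[int, ...]
Tree = Dict[Context, Dict[int, float]]

# Hypothetical tree over A = {0, 1}: each context (oldest symbol first)
# maps to the distribution p(. | context) of the next symbol.
EXAMPLE_TREE: Tree = {
    (1,): {0: 0.3, 1: 0.7},
    (1, 0): {0: 0.6, 1: 0.4},
    (0, 0): {0: 0.9, 1: 0.1},
}


def find_context(past: Sequence[int], tree: Tree) -> Optional[Context]:
    """Return the context in `tree` that is a suffix of `past` (oldest first).

    Suffixes are tried from shortest to longest. In a context tree no context
    is a suffix of another, so at most one match exists. Returns None when the
    observed past is too short to contain a context.
    """
    longest = max(len(w) for w in tree)
    for length in range(0, min(longest, len(past)) + 1):
        suffix = tuple(past[len(past) - length:])
        if suffix in tree:
            return suffix
    return None


def next_symbol_prob(past: Sequence[int], tree: Tree, a: int) -> float:
    """Probability that the next symbol equals `a`, given the observed past."""
    context = find_context(past, tree)
    if context is None:
        raise ValueError("past is too short to determine the context")
    return tree[context][a]
```

Consider the past $0, 1, 1, 0, 0$. Only its last two symbols $00$ matter, so `next_symbol_prob([0, 1, 1, 0, 0], EXAMPLE_TREE, 1)` returns `0.1` in this made-up tree.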
The class of stochastic chains with memory of variable length was introduced by Jorma Rissanen in 1983, in the article “A Universal Data Compression System”. This class of stochastic chains was popularized in the statistical and probabilistic community by P. Bühlmann and A. J. Wyner in 1999, in the article “Variable Length Markov Chains”. Called “variable length Markov chains” by Bühlmann and Wyner, these chains are also known as “variable order Markov models”, “probabilistic suffix trees” and “context tree models”. The name “stochastic chains with memory of variable length” seems to have been introduced by Galves and Löcherbach in 2008, in the article of the same name.
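Consider a system consisting of a lamp, an observer and a door between them. The lamp has two possible states: on, represented by 1, or off, represented by 0. When the lamp is on, the observer may see the light through the door, depending on the state of the door at that time: open, 1, or closed, 0. The states of the door are independent of the state of the lamp.
Let $(X_n)_{n \geq 0}$ be a Markov chain that represents the state of the lamp, with values in $A = \{0, 1\}$, and let $P$ be its transition probability matrix. Also, let $(\xi_n)_{n \geq 0}$ be a sequence of independent random variables that represents the door's states, also taking values in $A$, independent of the chain $(X_n)_{n \geq 0}$ and such that
$$\mathbb{P}(\xi_n = 1) = 1 - \varepsilon,$$
where $0 < \varepsilon < 1$. Define a new sequence $(Z_n)_{n \geq 0}$ by
$$Z_n = X_n \xi_n \quad \text{for every } n \geq 0,$$
so that $Z_n = 1$ exactly when the lamp is on and the door is open at time $n$.
The aim is to determine the last instant at which the observer could see the lamp on, that is, the most recent instant $k < n$ such that $Z_k = 1$. This instant matters because $Z_k = 1$ forces $X_k = 1$. By the Markov property of $(X_n)$ and the independence of the door states, the symbols before time $k$ therefore carry no further information about $Z_n$. A context tree can represent which past states of the sequence are relevant to identify the next state.
The stochastic chain $(Z_n)$ is, then, a chain with memory of variable length, taking values in $A$ and compatible with the probabilistic context tree $(\tau, p)$, where
$$\tau = \{1, 10, 100, \ldots\} \cup \{0^\infty\}.$$
Each context is written from the oldest to the most recent symbol. For example, $100$ stands for $Z_{n-3} = 1$, $Z_{n-2} = 0$, $Z_{n-1} = 0$. The context $0^\infty$ denotes a past in which the lamp was never seen on.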
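The lamp-and-door example can be simulated with a short Python sketch. It generates the lamp chain $X$, the door states $\xi$ and the observations $Z_n = X_n \xi_n$. It also extracts the context of $Z$ from a finite window: the past back to and including its most recent 1. Several choices here are assumptions made for illustration: the transition matrix, the value of $\varepsilon$, the uniform initial lamp state, and the names `simulate_lamp_and_door` and `context_of_z`. On a finite window, a past with no 1 is reported as `None`, standing in for an infinite run of zeros.

```python
import random
from typing import List, Optional, Sequence, Tuple


def simulate_lamp_and_door(
    n: int, P: List[List[float]], eps: float, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Simulate n steps of the lamp X, the door xi and the observation Z = X * xi.

    P[i][j] is the probability that the lamp goes from state i to state j, and
    each door state is open (1) with probability 1 - eps, independently of X.
    The initial lamp state is drawn uniformly (an assumption of this sketch).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    x = [rng.randint(0, 1)]
    for _ in range(n - 1):
        x.append(1 if rng.random() < P[x[-1]][1] else 0)
    xi = [1 if rng.random() < 1 - eps else 0 for _ in range(n)]
    z = [x_t * xi_t for x_t, xi_t in zip(x, xi)]
    return x, xi, z


def context_of_z(past: Sequence[int]) -> Optional[Tuple[int, ...]]:
    """Context of Z for a finite past (oldest symbol first): the symbols from the
    most recent 1 up to the present. None means no 1 was observed."""
    for k in range(len(past) - 1, -1, -1):
        if past[k] == 1:
            return tuple(past[k:])
    return None
```

For example, `context_of_z([0, 1, 0, 0])` returns `(1, 0, 0)`, which is the context written $100$ in the tree of this example. The symbol before the last 1 is discarded, as the Markov argument above suggests.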
Inferences in chains with variable length
Given a sample $X_0, \ldots, X_{n-1}$, one can estimate the appropriate context tree using the following algorithms.
The context algorithm
In the article “A Universal Data Compression System”, Rissanen introduced a consistent algorithm to estimate the probabilistic context tree that generates the data. The algorithm can be summarized in two steps:
Given a sample produced by a chain with memory of variable length, we start with the maximal tree whose branches are all the candidate contexts for the sample;
The branches of this tree are then pruned until we obtain the smallest tree that is well adapted to the data. Whether or not to shorten a context is decided through a given gain function, such as the log-likelihood ratio.
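The two steps can be illustrated with a simplified bottom-up pruning in Python. This is a sketch under stated assumptions, not Rissanen's exact procedure. The maximal tree is taken to be all words of a fixed length `depth` over the alphabet, and the children of a node $u$ are the words $yu$ obtained by prepending an older symbol $y$. The gain of $u$ is twice the log-likelihood ratio of its children against $u$, and a sibling set of leaves is merged into its parent whenever that gain does not exceed `threshold`. The names `count_words`, `p_hat`, `gain` and `prune_tree` are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Sequence, Set, Tuple

Word = Tuple[int, ...]


def count_words(sample: Sequence[int], max_len: int) -> Counter:
    """Count every word of length 1..max_len occurring in the sample."""
    counts: Counter = Counter()
    for length in range(1, max_len + 1):
        for t in range(len(sample) - length + 1):
            counts[tuple(sample[t:t + length])] += 1
    return counts


def p_hat(a: int, w: Word, counts: Counter, alphabet: Sequence[int]) -> float:
    """Empirical probability that a follows w (uniform if w is never followed)."""
    total = sum(counts[w + (b,)] for b in alphabet)
    return counts[w + (a,)] / total if total > 0 else 1.0 / len(alphabet)


def gain(u: Word, counts: Counter, alphabet: Sequence[int]) -> float:
    """Twice the log-likelihood ratio of the children (y,) + u against u."""
    total = 0.0
    for y in alphabet:
        child = (y,) + u
        for a in alphabet:
            n_child_a = counts[child + (a,)]
            if n_child_a > 0:  # terms with zero count contribute nothing
                total += n_child_a * math.log(
                    p_hat(a, child, counts, alphabet) / p_hat(a, u, counts, alphabet)
                )
    return 2.0 * total


def prune_tree(
    sample: Sequence[int], alphabet: Sequence[int], depth: int, threshold: float
) -> Set[Word]:
    """Start from all words of length `depth` and merge sibling leaves bottom-up."""
    counts = count_words(sample, depth + 1)
    tree: Set[Word] = set(product(alphabet, repeat=depth))
    for length in range(depth - 1, -1, -1):
        for u in product(alphabet, repeat=length):
            children = {(y,) + u for y in alphabet}
            # Only a node whose children are all current leaves can absorb them.
            if children <= tree and gain(u, counts, alphabet) <= threshold:
                tree -= children
                tree.add(u)
    return tree
```

Consider the periodic sample `[0, 0, 1] * 100`, run with `depth=3` and `threshold=2 * math.log(300)`. On that input, the sketch keeps the contexts `(1,)`, `(0, 0)` and `(1, 0)`, which are exactly the suffixes that determine the next symbol. The threshold was chosen here only for illustration.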
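Let $X_0, \ldots, X_{n-1}$ be a sample from a finite probabilistic context tree $(\tau, p)$. For any sequence $x_{-j}^{-1} = (x_{-j}, \ldots, x_{-1})$ with $j \leq n$, denote by $N_n(x_{-j}^{-1})$ the number of occurrences of the sequence in the sample, i.e.,
$$N_n(x_{-j}^{-1}) = \sum_{t=0}^{n-j} \mathbf{1}\left\{X_t^{t+j-1} = x_{-j}^{-1}\right\}.$$
Rissanen first builds a maximal candidate context, given by $X_{n-K(n)}^{n-1}$, where $K(n) = C \log n$ and $C$ is an arbitrary positive constant. The intuition behind the choice of $C \log n$ is that the probabilities of sequences of length greater than $\log n$ cannot be estimated from a sample of size $n$.
From there, Rissanen shortens the maximal candidate by successively pruning branches according to a sequence of tests based on the statistical likelihood ratio. Formally, if $\sum_{b \in A} N_n(x_{-k}^{-1} b) > 0$, define the estimator of the transition probability $p$ by
$$\hat{p}_n(a \mid x_{-k}^{-1}) = \frac{N_n(x_{-k}^{-1} a)}{\sum_{b \in A} N_n(x_{-k}^{-1} b)},$$
where $x_{-k}^{-1} a = (x_{-k}, \ldots, x_{-1}, a)$. If $\sum_{b \in A} N_n(x_{-k}^{-1} b) = 0$, define $\hat{p}_n(a \mid x_{-k}^{-1}) = 1/|A|$.
For $i \geq 1$, define
$$\Lambda_n(x_{-i}^{-1}) = 2 \sum_{y \in A} \sum_{a \in A} N_n(y x_{-i}^{-1} a) \log\left[\frac{\hat{p}_n(a \mid y x_{-i}^{-1})}{\hat{p}_n(a \mid x_{-i}^{-1})}\right],$$
where $y x_{-i}^{-1} = (y, x_{-i}, \ldots, x_{-1})$ and
$$\hat{p}_n(a \mid y x_{-i}^{-1}) = \frac{N_n(y x_{-i}^{-1} a)}{\sum_{b \in A} N_n(y x_{-i}^{-1} b)}.$$
Note that $\Lambda_n(x_{-i}^{-1})$ is the log-likelihood ratio statistic for testing the consistency of the sample with the probabilistic context tree $(\tau, p)$ against the alternative that it is consistent with $(\tau', p')$, where $\tau$ and $\tau'$ differ only by a set of sibling nodes.
The length of the current estimated context is defined by
$$\hat{\ell}_n(X_0^{n-1}) = \max\left\{ i = 1, \ldots, K(n) : \Lambda_n(X_{n-i}^{n-1}) > C' \log n \right\},$$
where $C'$ is any positive constant. Finally, Rissanen proved the following result. If $X_0, \ldots, X_{n-1}$ is a sample from a finite probabilistic context tree $(\tau, p)$, then
$$\mathbb{P}\left(\hat{\ell}_n(X_0^{n-1}) \neq \ell(X_0^{n-1})\right) \longrightarrow 0 \quad \text{as } n \to \infty,$$
where $\ell(X_0^{n-1})$ denotes the length of the context in $\tau$ that is a suffix of $X_0^{n-1}$.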
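A direct Python transcription of the estimator $\hat{\ell}_n$ might look as follows. It counts $N_n$, forms $\hat{p}_n$ with the uniform fallback, evaluates $\Lambda_n(X_{n-i}^{n-1})$ for $i = 1, \ldots, K(n)$, and keeps the largest $i$ above the threshold. It is a sketch, not Rissanen's code. The following are all assumptions:
- the function name `estimate_context_length`;
- the parameter `c_depth`, which stands for the constant in $K(n) = C \log n$;
- the parameter `c_threshold`, which is the positive constant multiplying $\log n$ in the threshold;
- rounding $K(n)$ down to an integer;
- returning 0 when no $i$ passes the test.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def estimate_context_length(
    sample: Sequence[int],
    alphabet: Sequence[int],
    c_depth: float = 1.0,
    c_threshold: float = 1.0,
) -> int:
    """Largest i in 1..K(n) with Lambda_n(X_{n-i}^{n-1}) > c_threshold * log n, or 0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # K(n) = C log n, rounded down and kept between 1 and n (rounding is an assumption).
    k_max = min(n, max(1, int(c_depth * math.log(n))))

    # N_n(w) for every word w of length 1..K(n)+2, as in the counting formula.
    counts: Counter = Counter()
    for length in range(1, k_max + 3):
        for t in range(n - length + 1):
            counts[tuple(sample[t:t + length])] += 1

    def p_hat(a: int, w: Tuple[int, ...]) -> float:
        """Estimator of p(a | w), uniform when w is never followed by a symbol."""
        total = sum(counts[w + (b,)] for b in alphabet)
        return counts[w + (a,)] / total if total > 0 else 1.0 / len(alphabet)

    def lam(x: Tuple[int, ...]) -> float:
        """Lambda_n(x): log-likelihood ratio of the extensions (y,) + x against x."""
        stat = 0.0
        for y in alphabet:
            for a in alphabet:
                n_yxa = counts[(y,) + x + (a,)]
                if n_yxa > 0:  # zero-count terms contribute nothing
                    stat += n_yxa * math.log(p_hat(a, (y,) + x) / p_hat(a, x))
        return 2.0 * stat

    threshold = c_threshold * math.log(n)
    estimate = 0  # value returned by this sketch when no i passes the test
    for i in range(1, k_max + 1):
        if lam(tuple(sample[n - i:])) > threshold:
            estimate = i
    return estimate
```

$\Lambda_n(x_{-i}^{-1})$ compares the extensions $y x_{-i}^{-1}$ with $x_{-i}^{-1}$ itself. A value above the threshold at $i$ therefore indicates that the symbol preceding $x_{-i}$ still changes the estimated distribution of the next symbol.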