Kolakoski sequence


In mathematics, the Kolakoski sequence, sometimes also known as the Oldenburger-Kolakoski sequence, is an infinite sequence of symbols that is the sequence of run lengths in its own run-length encoding, and the prototype for an infinite family of related sequences. It is named after the recreational mathematician William Kolakoski who described it in 1965, but subsequent research has shown that the sequence was previously discussed by Rufus Oldenburger in 1939.

Definition of the classic Kolakoski sequence

The initial terms of the Kolakoski sequence are:
Each symbol occurs in a "run" of either one or two consecutive terms, and writing down the lengths of these runs gives exactly the same sequence:
The description of the Kolakoski sequence is therefore reversible. If K stands for "the Kolakoski sequence", description #1 logically implies description #2 :
Accordingly, one can say that each term of the Kolakoski sequence generates a run of one or two future terms. The first 1 of the sequence generates a run of "1", i.e. itself; the first 2 generates a run of "22", which includes itself; the second 2 generates a run of "11"; and so on. Each number in the sequence is the length of the next run to be generated, and the element to be generated alternates between 1 and 2:
As can be seen, the length of the sequence at each stage is equal to the sum of terms in the previous stage. This animation illustrates the process:
These self-generating properties, which remain if the sequence is written without the initial 1, mean that the Kolakoski sequence can be described as a fractal, or mathematical object that encodes its own representation on other scales. Bertran Steinsky has created a recursive formula for the i-th term of the sequence but the sequence is believed to be aperiodic, that is, its terms do not have a general repeating pattern.

Other self-generating Kolakoski sequences

From finite integer alphabets

The Kolakoski sequence is the prototype for an infinite family of other sequences that are each their own run-length encodings. Each sequence is based on what is formally called an alphabet of integers. For example, the classic Kolakoski sequence described above has the alphabet. Some of the additional Kolakoski sequences listed at the OEIS are:
Like the Kolakoski -sequence, writing the run-lengths returns the same sequence. More generally, any alphabet of integers,, can generate a Kolakoski sequence if the same integer does not occur 1) twice or more in a row; 2) at the beginning and end of the alphabet. For example, the alphabet generates:
And the alphabet generates:
Again, writing the run-lengths returns the same sequence.

From infinite integer alphabets

Kolakoski sequences can also be created from infinite alphabets of integers, such as :
The infinite alphabet generates the Golomb sequence:
A Kolakoski sequence can also be created from integers chosen at random from a finite alphabet, with the restriction that the same number cannot be chosen twice in a row. If the finite alphabet is, one possible sequence is this:
In effect, the sequence is based on the infinite alphabet, which contains a random sequence of 1s, 2s and 3s from which repeats have been removed.

Chain sequences

Whereas the classic Kolakoski -sequence generates itself, these two sequences generate each other:
In other words, if you write the run-lengths of the first sequence, you generate the second; if you write the run-lengths of the second, you generate the first. In the following chain of three sequences, the run-lengths of each generate the next in the order 1 → 2 → 3 → 1:
The sequences use the integer alphabet, but each starts at a different point in the alphabet. The following five sequences form a similar chain using the alphabet :
However, to create a sequence-chain of length l, it is not necessary to have distinct integer alphabets of size l. For example, the alphabet-series,,, and is sufficient for a five-link chain:
Each sequence is unique and the run-lengths of each generate the terms of the next sequence in the chain. The integer alphabets used to generate a chain can also be of different sizes. A Kolakoski mirror can be created from the alphabets and :

Research into the classic sequence

Density of the sequence

It seems plausible that the density of 1s in the Kolakoski -sequence is 1/2, but this conjecture remains unproved. Václav Chvátal has proved that the upper density of 1s is less than 0.50084. Nilsson has used the same method with far greater computational power to obtain the bound 0.500080.
Although calculations of the first 3×108 values of the sequence appeared to show its density converging to a value slightly different from 1/2, later calculations that extended the sequence to its first 1013 values show the deviation from a density of 1/2 growing smaller, as one would expect if the limiting density actually is 1/2.

Connection with tag systems

describes the Kolakoski sequence in connection with the history of cyclic tag systems.

Uniqueness of the sequence

Some discussions of the classic Kolakoski sequence make the claim that, written with or without the initial 1, it is the "only sequence" that is its own run-length encoding or the only such sequence that begins with 1. As can be seen above, this is untrue: an infinite number of additional sequences possess these properties. However, the Kolakoski - and -sequences are the only such sequences using solely the integers 1 and 2.

Anti-Kolakoski sequence

In the anti-Kolakoski sequence, the run-lengths of 1s and 2s never coincide with the terms of the original sequence:
As can be seen, the run-lengths of the anti-Kolakoski sequence return the Kolakoski -sequence, which means that the former can be created from the latter by simple subtraction. If k is the i-th term of the Kolakoski -sequence and ak is the i-th term of the anti-Kolakoski sequence, then ak = 3-k, just as k = 3-ak. Accordingly, like the Kolakoski sequence, the anti-Kolakoski sequence retains its defining property when written without its initial term, i.e. 2.

Kolakoski Constant

The so-called Kolakoski constant is created by subtracting 1 from each term of the Kolakoski -sequence and treating the result as a binary fraction.

Algorithms

Algorithm for the Kolakoski {1,2}-sequence

The Kolakoski -sequence may be generated by an algorithm that, in the i-th iteration, reads the value xi that has already been output as the i-th value of the sequence. Then, if i is odd, it outputs xi copies of the number 1, while if i is even, it outputs xi copies of the number 2.
Thus, the first few steps of the algorithm are:
  1. The first value has not yet been output, so set x1 = 1, and output 1 copy of the number 1
  2. The second value has not yet been output, so set x2 = 2, and output 2 copies of the number 2
  3. The third value x3 was output as 2 in the second step, so output 2 copies of the number 1.
  4. The fourth value x4 was output as 1 in the third step, so output 1 copy of the number 2. Etc.
This algorithm takes linear time, but because it needs to refer back to earlier positions in the sequence it needs to store the whole sequence, taking linear space. An alternative algorithm that generates multiple copies of the sequence at different speeds, with each copy of the sequence using the output of the previous copy to determine what to do at each step, can be used to generate the sequence in linear time and only logarithmic space.

General algorithm for Kolakoski sequences

In general, a Kolakoski sequence for any integer alphabet may be generated by an algorithm that, in the i-th iteration, reads the value xi that has already been output as the i-th value of the sequence. At each step, the output ni is adjusted according to the size of the alphabet, reverting to n1 when the final position in the alphabet is exceeded. The first few steps of the algorithm for the alphabet are:
  1. The first value has not yet been output, so set x1 = 1 = n1, and output 1 copy of the number 1
  2. The second value has not yet been output, so set x2 = 2 = n2, and output 2 copies of the number 2
  3. The third value x3 was output as 2 in the second step, so output 2 copies of 3 = n3.
  4. The fourth value x4 was output as 3 in the third step, so output 3 copies of 4 = n4.
  5. The fifth value x5 was output as 3 in the third step, so output 3 copies of the number 1 = n1=adjusted.
  6. The sixth value x6 was output as 4 in the fourth step, so output 4 copies of the number 2 = n2=adjusted. Etc.
The resultant sequence is:

Algorithm for Kolakoski-chains

Kolakoski chains of any desired length can be generated with a simple algorithm. Suppose that one wishes to generate a chain with 3 sequences where the terms of seq are generated by the run-lengths of seq and the alphabet is. Begin by setting the first term of seq, the initial sequence in the chain, to the value of 2. The next sequence in the chain, seq, whose run-lengths generate the terms of seq, must therefore have the terms. Therefore seq, whose run-lengths generate seq =, must have the runs. Here is the first stage of the algorithm:
Now, note that the run-lengths of seq generate the terms of seq, which means that the terms of seq generate the runs of seq. Because seq = after stage 1 of the algorithm, seq must equal in the next stage. From this extended seq, one can generate further runs of seq, then further runs of seq:
Now use the terms of seq in stage 2 to generate further runs of seq in stage 3:
The sequences can now be re-arranged so that the run-lengths of seq generate the terms of seq = seq):
If the chain has 5 sequences, the algorithm yields these stages:
Finally, the sequences are re-arranged so that the run-lengths of seq generate the terms of seq: