De Bruijn index


In mathematical logic, the de Bruijn index is a tool invented by the Dutch mathematician Nicolaas Govert de Bruijn for representing terms of lambda calculus without naming the bound variables. Terms written using these indices are invariant with respect to α-conversion, so the check for α-equivalence is the same as that for syntactic equality. Each de Bruijn index is a natural number that represents an occurrence of a variable in a λ-term, and denotes the number of binders that are in scope between that occurrence and its corresponding binder. The following are some examples:
De Bruijn indices are commonly used in higher-order reasoning systems such as automated theorem provers and logic programming systems.

Formal definition

Formally, λ-terms written using De Bruijn indices have the following syntax :
where n—natural numbers greater than 0—are the variables. A variable n is bound if it is in the scope of at least n binders ; otherwise it is free. The binding site for a variable n is the nth binder it is in the scope of, starting from the innermost binder.
The most primitive operation on λ-terms is substitution: replacing free variables in a term with other terms. In the β-reduction N, for example, we must:
  1. find the instances of the variables n1, n2,..., nk in M that are bound by the λ in λ M,
  2. decrement the free variables of M to match the removal of the outer λ-binder, and
  3. replace n1, n2,..., nk with N, suitably incrementing the free variables occurring in N each time, to match the number of λ-binders, under which the corresponding variable occurs when N substitutes for one of the ni.
To illustrate, consider the application
which might correspond to the following term written in the usual notation
After step 1, we obtain the term λ 4 □, where the variables that are destined for substitution are replaced with boxes. Step 2 decrements the free variables, giving λ 3 □. Finally, in step 3, we replace the boxes with the argument, namely λ 5 1; the first box is under one binder, so we replace it with λ 6 1 ; the second is under two binders, so we replace it with λ 7 1. The final result is λ 3 .
Formally, a substitution is an unbounded list of term replacements for the free variables, written M1.M2..., where Mi is the replacement for the ith free variable. The increasing operation in step 3 is sometimes called shift and written ↑k where k is a natural number indicating the amount to increase the variables; for example, ↑0 is the identity substitution, leaving a term unchanged.
The application of a substitution s to a term M is written M. The composition of two substitutions s1 and s2 is written s1 s2 and defined by
The rules for application are as follows:

The steps outlined for the β-reduction above are thus more concisely expressed as:

Alternatives to De Bruijn indices

When using the standard "named" representation of λ-terms, where variables are treated as labels or strings, one must explicitly handle α-conversion when defining any operation on the terms. The standard Variable Convention of Barendregt is one such approach where α-conversion is applied as needed to ensure that:
  1. bound variables are distinct from free variables, and
  2. all binders bind variables not already in scope.
In practice this is cumbersome, inefficient, and often error-prone. It has therefore led to the search for different representations of such terms. On the other hand, the named representation of λ-terms is more pervasive and can be more immediately understandable by others because the variables can be given descriptive names. Thus, even if a system uses De Bruijn indices internally, it will present a user interface with names.
De Bruijn indices are not the only representation of λ-terms that obviates the problem of α-conversion. Among named representations, the nominal techniques of Pitts and Gabbay is one approach, where the representation of a λ-term is treated as an equivalence class of all terms rewritable to it using variable permutations. This approach is taken by the Nominal Datatype Package of Isabelle/HOL.
Another common alternative is an appeal to higher-order representations where the λ-binder is treated as a true function. In such representations, the issues of α-equivalence, substitution, etc. are identified with the same operations in a meta-logic.
When reasoning about the meta-theoretic properties of a deductive system in a proof assistant, it is sometimes desirable to limit oneself to first-order representations and to have the ability to name or rename assumptions. The locally nameless approach uses a mixed representation of variables—De Bruijn indices for bound variables and names for free variables—that is able to benefit from the α-canonical form of De Bruijn indexed terms when appropriate.