Defunctionalization


In programming languages, defunctionalization is a compile-time transformation that eliminates higher-order functions, replacing them with a single first-order apply function. The technique was first described by John C. Reynolds in his 1972 paper, "Definitional Interpreters for Higher-Order Programming Languages". Reynolds observed that a given program contains only finitely many function abstractions, so each can be assigned a unique identifier and replaced by it. Every function application within the program is then replaced by a call to the apply function, with the function identifier as the first argument. The apply function's only job is to dispatch on this first argument and then perform the instructions denoted by the function identifier on the remaining arguments.
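As a minimal sketch of the basic idea, assume a hypothetical program in which the only functions ever passed to the higher-order function twice are inc and dbl. Each of these abstractions becomes a constructor of a hypothetical Fun type, and each application f x inside twice becomes a call to a single first-order apply. All names here are illustrative and are not taken from Reynolds' paper.

```haskell
-- Minimal defunctionalization sketch; Fun, Inc, Dbl, apply and twiceDef
-- are hypothetical names chosen for illustration.

-- Higher-order original: twice takes a function as an argument.
twice :: (Int -> Int) -> Int -> Int
twice f x = f (f x)

inc :: Int -> Int
inc x = x + 1

dbl :: Int -> Int
dbl x = x * 2

-- Defunctionalized: one identifier (constructor) per function abstraction
-- that can flow into twice.
data Fun = Inc | Dbl
  deriving Show

-- The single first-order apply dispatches on the identifier and runs the
-- body of the corresponding abstraction.
apply :: Fun -> Int -> Int
apply Inc x = x + 1
apply Dbl x = x * 2

-- Every application f x is replaced by apply f x.
twiceDef :: Fun -> Int -> Int
twiceDef f x = apply f (apply f x)

main :: IO ()
main = do
  print (twice inc 5, twiceDef Inc 5) -- (7,7)
  print (twice dbl 5, twiceDef Dbl 5) -- (20,20)
```

The defunctionalized twiceDef no longer takes a function argument; it takes plain data, so the whole program is first-order.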
One complication to this basic idea is that function abstractions may reference free variables. In such situations, defunctionalization must be preceded by closure conversion, so that the free variables of each function abstraction are passed to apply as extra arguments. In addition, if closures are supported as first-class values, the captured bindings must be represented as data structures.
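The following sketch assumes a hypothetical program whose only closures are (\x -> x + n) and (\x -> x * k). It shows how captured variables survive defunctionalization: each constructor carries the free variables of its abstraction as fields, so apply receives them as extra arguments, and the closures become ordinary data that can be stored in a list. The names Fun, Adder, Scale and runPipeline are invented for illustration.

```haskell
-- Sketch: defunctionalizing closures with free variables.
-- All names are hypothetical.

-- Original: adder and scale return closures capturing n and k.
adder :: Int -> (Int -> Int)
adder n = \x -> x + n

scale :: Int -> (Int -> Int)
scale k = \x -> x * k

-- Defunctionalized: each constructor stores its closure's environment.
data Fun = Adder Int -- environment of (\x -> x + n): the value of n
         | Scale Int -- environment of (\x -> x * k): the value of k
  deriving Show

-- apply receives the captured bindings through the constructor fields.
apply :: Fun -> Int -> Int
apply (Adder n) x = x + n
apply (Scale k) x = x * k

adderDef :: Int -> Fun
adderDef n = Adder n

scaleDef :: Int -> Fun
scaleDef k = Scale k

-- First-order "closures" stored as data: add 3, then multiply by 10.
pipeline :: [Fun]
pipeline = [adderDef 3, scaleDef 10]

-- Explicit recursion, so no higher-order helper is needed.
runPipeline :: [Fun] -> Int -> Int
runPipeline []       x = x
runPipeline (f : fs) x = runPipeline fs (apply f x)

main :: IO ()
main = do
  print ((scale 10 . adder 3) 4) -- 70
  print (runPipeline pipeline 4) -- 70
```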
Instead of having a single apply function dispatch on all function abstractions in a program, various kinds of control-flow analysis can be employed to determine which functions may be called at each function application site, and a specialized apply function can be called there instead. Alternatively, the target language may support indirect calls through function pointers, which may be more efficient and extensible than a dispatch-based approach.
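As a sketch of the specialized-apply variant, suppose (hypothetically) that a control-flow analysis has determined that only an increment and a doubling function reach the call site in mapInts, while only two predicates reach the call site in countIf. Each site can then dispatch through its own smaller apply function. The types IntFun and Pred and all function names are illustrative assumptions.

```haskell
-- Sketch: one specialized apply per call site, assuming an analysis has
-- found which functions reach each site. All names are hypothetical.

-- Identifiers for the functions reaching the call site in mapInts.
data IntFun = Inc | Dbl

applyIntFun :: IntFun -> Int -> Int
applyIntFun Inc x = x + 1
applyIntFun Dbl x = x * 2

-- Identifiers for the functions reaching the call site in countIf.
data Pred = IsEven | IsPos

applyPred :: Pred -> Int -> Bool
applyPred IsEven x = even x
applyPred IsPos  x = x > 0

-- Each call site uses its own dispatcher instead of one global apply.
mapInts :: IntFun -> [Int] -> [Int]
mapInts _ []       = []
mapInts f (x : xs) = applyIntFun f x : mapInts f xs

countIf :: Pred -> [Int] -> Int
countIf _ []       = 0
countIf p (x : xs) = (if applyPred p x then 1 else 0) + countIf p xs

main :: IO ()
main = do
  print (mapInts Dbl [1, 2, 3])       -- [2,4,6]
  print (countIf IsEven [1, 2, 3, 4]) -- 2
```

Each dispatcher only has to handle the functions that the analysis says can reach its call site.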
Besides its use as a compilation technique for higher-order functional languages, defunctionalization has been studied as a way of mechanically transforming interpreters into abstract machines. Defunctionalization is also related to the object-oriented technique of representing functions as function objects.
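A small instance of the interpreter-to-machine idea, assuming a toy expression language with only literals and addition: the first evaluator is written in continuation-passing style, and defunctionalizing its continuations (one constructor per lambda, with that lambda's free variables as fields) yields a first-order machine whose Cont values behave as an explicit stack. The language and all names are hypothetical.

```haskell
-- Sketch: from a CPS interpreter to an abstract machine by
-- defunctionalizing continuations. All names are hypothetical.

data Expr = Lit Int
          | Add Expr Expr
  deriving Show

-- Higher-order interpreter in continuation-passing style.
evalCps :: Expr -> (Int -> r) -> r
evalCps (Lit n)   k = k n
evalCps (Add a b) k = evalCps a (\va -> evalCps b (\vb -> k (va + vb)))

-- One constructor per continuation, with its free variables as fields.
data Cont = Done           -- the initial continuation (identity)
          | AddL Expr Cont -- \va -> evalCps b (...): free vars b, k
          | AddR Int Cont  -- \vb -> k (va + vb): free vars va, k

-- The first-order machine: eval and applyCont are its transition functions.
eval :: Expr -> Cont -> Int
eval (Lit n)   k = applyCont k n
eval (Add a b) k = eval a (AddL b k)

applyCont :: Cont -> Int -> Int
applyCont Done        v  = v
applyCont (AddL b k)  va = eval b (AddR va k)
applyCont (AddR va k) vb = applyCont k (va + vb)

main :: IO ()
main = do
  let e = Add (Lit 1) (Add (Lit 2) (Lit 3))
  print (evalCps e id) -- 6
  print (eval e Done)  -- 6
```

Read as transition rules, AddL means "evaluate the right operand next" and AddR means "add the saved left value", so the defunctionalized evaluator is a simple stack machine.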

Example

This is an example given by Olivier Danvy, translated to Haskell:
Given the Tree datatype:

data Tree a = Leaf a
            | Node (Tree a) (Tree a)

We will defunctionalize the following program:

cons :: a -> [a] -> [a]
cons x xs = x : xs
o :: (b -> c) -> (a -> b) -> a -> c
o f g x = f (g x)
flatten :: Tree t -> [t]
flatten t = walk t []
walk :: Tree t -> [t] -> [t]
walk (Leaf x)     = cons x
walk (Node t1 t2) = o (walk t1) (walk t2)

We defunctionalize by replacing every function value the program builds (the partial applications of cons and o) with a value of the Lam datatype. Instead of applying these values directly, we introduce an apply function that interprets the datatype:

data Lam a = LamCons a
           | LamO (Lam a) (Lam a)
apply :: Lam a -> [a] -> [a]
apply (LamCons x)  xs = x : xs
apply (LamO f1 f2) xs = apply f1 (apply f2 xs)
cons_def :: a -> Lam a
cons_def x = LamCons x
o_def :: Lam a -> Lam a -> Lam a
o_def f1 f2 = LamO f1 f2
flatten_def :: Tree t -> [t]
flatten_def t = apply (walk_def t) []
walk_def :: Tree t -> Lam t
walk_def (Leaf x)     = cons_def x
walk_def (Node t1 t2) = o_def (walk_def t1) (walk_def t2)
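To check that the two versions of Danvy's example agree, their definitions can be gathered into a single module with a small driver. The sample tree and main are hypothetical additions; the rest repeats the example's definitions.

```haskell
-- Both versions of the flatten example in one module.

data Tree a = Leaf a
            | Node (Tree a) (Tree a)

-- Higher-order original.
cons :: a -> [a] -> [a]
cons x xs = x : xs

o :: (b -> c) -> (a -> b) -> a -> c
o f g x = f (g x)

flatten :: Tree t -> [t]
flatten t = walk t []

walk :: Tree t -> [t] -> [t]
walk (Leaf x)     = cons x
walk (Node t1 t2) = o (walk t1) (walk t2)

-- Defunctionalized version: Lam values stand for the partial
-- applications of cons and o.
data Lam a = LamCons a
           | LamO (Lam a) (Lam a)

apply :: Lam a -> [a] -> [a]
apply (LamCons x)  xs = x : xs
apply (LamO f1 f2) xs = apply f1 (apply f2 xs)

cons_def :: a -> Lam a
cons_def x = LamCons x

o_def :: Lam a -> Lam a -> Lam a
o_def f1 f2 = LamO f1 f2

flatten_def :: Tree t -> [t]
flatten_def t = apply (walk_def t) []

walk_def :: Tree t -> Lam t
walk_def (Leaf x)     = cons_def x
walk_def (Node t1 t2) = o_def (walk_def t1) (walk_def t2)

-- Hypothetical sample tree with leaves 1, 2, 3 from left to right.
sample :: Tree Int
sample = Node (Node (Leaf 1) (Leaf 2)) (Leaf 3)

main :: IO ()
main = do
  print (flatten sample)     -- [1,2,3]
  print (flatten_def sample) -- [1,2,3]
```

For this tree, walk_def builds LamO (LamO (LamCons 1) (LamCons 2)) (LamCons 3), and apply interprets it exactly as the composed closures would run.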