Tree (data structure)
In computer science,[] a tree is a widely used abstract data type that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes.
A tree data structure can be defined recursively as a collection of nodes, where each node is a data structure consisting of a value, together with a list of references to nodes, with the constraints that no reference is duplicated, and none points to the root.
Alternatively, a tree can be defined abstractly as a whole as an ordered tree, with a value assigned to each node. Both these perspectives are useful: while a tree can be analyzed mathematically as a whole, when actually represented as a data structure it is usually represented and worked with separately by node. For example, looking at a tree as a whole, one can talk about "the parent node" of a given node, but in general as a data structure a given node only contains the list of its children, but does not contain a reference to its parent.
Terminology
A node is a structure which may contain a value or condition, or represent a separate data structure. Each node in a tree has zero or more child nodes, which are below it in the tree. A node that has a child is called the child's parent node. A node has at most one parent.An internal node is any node of a tree that has child nodes. Similarly, an external node is any node that does not have child nodes.
The topmost node in a tree is called the root node. Depending on definition, a tree may be required to have a root node, or may be allowed to be empty, in which case it does not necessarily have a root node. Being the topmost node, the root node will not have a parent. It is the node at which algorithms on the tree begin, since as a data structure, one can only pass from parents to children. Note that some algorithms begin at the root, but first visit leaf nodes, only visit the root last. All other nodes can be reached from it by following edges or links. In diagrams, the root node is conventionally drawn at the top. In some trees, such as heaps, the root node has special properties. Every node in a tree can be seen as the root node of the subtree rooted at that node.
The height of a node is the length of the longest downward path to a leaf from that node. The height of the root is the height of the tree. The depth of a node is the length of the path to its root. This is commonly needed in the manipulation of the various self-balancing trees, AVL Trees in particular. The root node has depth zero, leaf nodes have height zero, and a tree with only a single node has depth and height zero. Conventionally, an empty tree has height −1.
A subtree of a tree is a tree consisting of a node in and all of its descendants in. Nodes thus correspond to subtrees – the subtree corresponding to the root node is the entire tree, and each node is the root node of the subtree it determines; the subtree corresponding to any other node is called a proper subtree.
Preliminary definition
A tree is a nonlinear data structure, compared to arrays, linked lists, stacks and queues which are linear data structures. A tree can be empty with no nodes or a tree is a structure consisting of one node called the root and zero or one or more subtrees.Mathematical definition
Unordered tree
Mathematically, an unordered treecan be defined as an algebraic structure where is the non-empty carrier set of nodes and is a function on which assigns each node its "parent" node,. The structure is subject to the condition that every non-empty subalgebra must have the same fixed point. That is, there must be a unique "root" node, such that and for every node, some iterative application equals.
There are several equivalent definitions.
As the closest alternative, one can define unordered trees as partial algebras
which are obtained from the total algebras described above by letting be undefined.
That is, the root is the only node on which the function is not defined and for every node
, the root is reachable from in the directed graph.
This definition is in fact coincident with that of an anti-arborescence.
The TAoCP book uses the term oriented tree.
The box on the right contains what can be regarded as a canonical definition of unordered trees. It just describes the partial algebra as a relational structure. Moreover, dedicated terminology can be provided for generalizations of unordered trees that correspond to distinguished subsets of the listed conditions:
- - directed pseudoforest
- - directed acyclic graph
- - unordered forest
- - directed pseudotree
To be precise, we should speak about an inverse set-theoretic tree since the set-theoretic definition usually employs opposite ordering.
The correspondence between and is established via reflexive transitive closure / reduction, with the reduction resulting in the "partial" version without the root cycle.
The definition of trees in descriptive set theory utilizes the
representation of partial orders as prefix orders between finite sequences. In turns out that up to isomorphism, there is a one-to-one correspondence between the DST trees and the tree structures defined so far.
We can refer to the four equivalent characterizations as to
tree as an algebra,
tree as a partial algebra,
tree as a partial order, and
tree as a prefix order.
There is also a fifth equivalent definition - that of a
graph-theoretic rooted tree which is just a connected acyclic
rooted graph.
The expression of trees as algebras
follows directly the implementation of tree structures using parent pointers. Typically, the partial version is used in which the root node has no parent defined. However, in some implementations or models even the circularity is established. Notable examples:
The Linux VFS where "The root dentry has a d_parent that points to itself".
The concept of an instantiation tree
from object-oriented programming. In this case, the root node is the top metaclass - the only class that is a direct instance of itself.
In this model, the tree established via
superclass
links between non-terminal objects is infinite and has an infinite branch.Sibling sets
In every unordered tree there is a distinguished partition of the set of nodes into sibling sets.Two non-root nodes, belong to the same sibling set if
parent = parent.
The root node forms the singleton sibling set.
A tree is said to be locally finite or finitely branching if each of its sibling sets is finite.
Each pair of distinct siblings is incomparable in ≤.
This is why the word unordered is used in the definition.
Such a terminology might become misleading when all sibling sets are singletons, i.e. when the set of all nodes is totally ordered by ≤.
In such a case we might speak about a singly-branching tree instead.
Using set inclusion
As with every partially ordered set, tree structures can be represented by inclusion order - by set systems in which ≤ is coincident with ⊆, the induced inclusion order.Consider a structure such that is a non-empty set, and ℱ is a set of subsets of such that the following are satisfied :
∅ ∉ ℱ.
U ∈ ℱ.
For every, from ℱ,
X ∩ Y ∈ .
For every from ℱ,
there are only finitely many
from ℱ such that
X ⊆ Y.
whose root equals.
Conversely, if is an unordered tree,
and ℱ is the set
of all principal ideals of
then the set system satisfies the above properties.
The set-system view of tree structures provides the default semantic model - in the majority of most popular cases, tree data structures represent containment hierarchy.
This also offers a justification for order direction with the root at the top: The root node is a greater container than any other node. Notable examples:
- Directory structure of a file system. A directory contains its sub-directories.
- DOM tree. The document parts correspondent to DOM nodes are in subpart relation according to the tree order.
- Single inheritance in object-oriented programming. An instance of a class is also an instance of a superclass.
- Hierarchical taxonomy such as the Dewey Decimal Classification with sections of increasing specificity.
- BSP trees, quadtrees, octrees, R-trees and other tree data structures used for recursive space partitioning.
Well-founded trees
Assuming the axiom of dependent choice a tree is well-founded if and only if it has no infinite branch.
Well-founded trees can be defined recursively - by forming trees from a disjoint union of smaller trees.
For the precise definition, suppose that is a set of nodes.
Using the reflexivity of partial orders, we can identify any tree on a subset of with its partial order - a subset of X × X.
The set ℛ of all relations that form a well-founded tree on a subset of is defined in stages,
so that
For each ordinal number, let
belong to the -th stage if and only if is equal to
where is a subset of ⋃ such that elements of are pairwise disjoint, and is a node that does not belong to.
Observe that the lowest stage consists of single-node trees since only empty is possible. In each stage, new trees
are built by taking a forest with components from lower stages and attaching a new root atop of.
In contrast to the tree height which is at most ω, the rank of well-founded trees is unlimited,
see the properties of "unfolding".
Using recursive pairs
In computing, a common way to define well-founded trees is via recursive ordered pairs:
a tree is a forest together with a "fresh" node.
A forest in turn is a possibly empty set of trees with pairwise disjoint sets of nodes.
For the precise definition, proceed similarly as in the construction of names used in the set-theoretic technique of forcing.
Let be a set of nodes. In the superstructure over,
define sets, of trees and forests, respectively, and a map
assigning each tree its underlying set of nodes
so that
Circularities in the above conditions can be eliminated by stratifying each of, and into stages like in the previous subsection.
Subsequently, define a "subtree" relation on as the reflexive transitive closure of the "immediate subtree" relation defined between trees by
where is the projection of onto the first coordinate, i.e.
it is the forest such that for some.
It can be observed that is a multitree: for every, the principal ideal ordered by is a well-founded tree as a partial order. Moreover,
for every tree, its "nodes"-order structure is given by if and only if there are forests such that both and are subtrees of and.
Using arrows
Another formalization as well as generalization of unordered trees can be obtained by reifying parent-child pairs of nodes. Each such ordered pair can be regarded as an abstract entity - an "arrow".This results in a multidigraph
where
is the set of nodes,
is the set of arrows,
and and
are functions from to assigning each arrow its source and target, respectively.
The structure is subject to the following conditions:
is an unordered tree, as a total algebra.
The map is a bijection between arrows and nodes.
The condition says that inverse consecutivity of arrows
is a total child-to-parent map.
Let this parent map between arrows be denoted, i.e.
p = s ○ t−1.
Then we also have
s = p ○ t, thus a multidigraph satisfying can also be axiomatized as
,
with the parent map instead of as a definitory constituent.
Observe that the root arrow is necessarily a loop, i.e. its source and target coincide.
Arrow tree: the hard-link structure of VFS
An important generalization of the above structure is established by allowing the target map to be many-to-one. This means that is weakened to
The map is surjective
- each node is the target of some arrow.
only leaf arrows are allowed to have the same target.
That is,
the restriction of to the
range of
is still injective.
Multidigraphs satisfying can be called "arrow trees"
-
their tree characteristics is imposed on arrows rather than nodes.
These structures can be regarded as the most essential abstraction of the Linux VFS because they reflect the hard-link structure of filesystems. Nodes are called inodes, arrows are dentries. The parent and target maps and are respectively represented by
d_parent
and d_inode
fields in the dentry data structure.Each inode is assigned a fixed file type, of which the
directory type plays a special role of "designed parents":
only directory inodes can appear as hard-link source and
a directory inode cannot appear as the target of more than one hard-link.
Using dashed style for the first half of the root loop indicates that, similarly to the parent map, there is a partial version for the source map in which the source of the root arrow is undefined.
This variant is employed for further generalization, see #Using paths in a multidigraph.
Using paths in a [|digraph]
Unordered trees naturally arise by "unfolding" of accessible pointed graphs.
Let ℛ = be a pointed relational structure, i.e. such that is the set of nodes, is a relation between nodes,
and is a distinguished "root" node. Assume further that is accessible, which means that equals the preimage of under the reflexive transitive closure of, and call such a structure an accessible pointed graph or apg for short.
Then one can derive another apg ℛ' = - the [|unfolding] of - as follows:
- is the set of reversed paths to, i.e. the set of non-empty finite sequences of nodes such that
consecutive members of are inversely -related, and the first member of is the root, - is a relation between paths from such that paths and are -related
if and only if
p = q ⁎ for some node
, and - is the one-element sequence .
Moreover, the following properties are satisfied:
is isomorphic to its unfolding
if and only if
is a tree.
Unfolding preserves well-foundedness: If is well-founded then so is.
Unfolding preserves rank: If is well-founded then the ranks of and coincide.
Notes:
To establish a concordancy between and the map, the presented definition uses reversed accessibility:
is reachable from any node. In the original definition by P. Aczel, every node is reachable from .
We have implicitly introduced a definition of unordered trees as apgs: call an apg ℛ = a tree if the reduct is an unordered tree as a partial algebra. This can be translated as: Every node is accessible by exactly one path.
Using paths in a multidigraph
As shown on the example of hard-link structure of file systems, many data structures in computing allow multiple links between nodes.Therefore,
in order to properly exhibit the appearance of unordered trees among data structures it is necessary to generalize accessible pointed graphs to multidigraph setting.
To simplify the terminology, we make use of the term quiver which is an established synonym for "multidigraph".
Accessible pointed quiver : generalization of apg to multidigraphs.
Let an [|accessible pointed quiver] or apq for short
be defined as a structure
ℳ =
where
is a set of nodes,
is a set of arrows,
is a partial function from to ,
and is a total function from to .
Thus, is a "partial multidigraph".
The structure is subject to the following conditions:
- There is exactly one "root" arrow,, whose source is undefined.
Every node is reachable via a finite sequence of consecutive arrows starting with the root arrow.
The unfolding of is formed by the sequences mentioned in
-
which are the accessibility paths.
As an apq, the unfolding can be written as
ℳ' =
where
is the set of accessibility paths,
coincides with,
coincides with path popping, and
is the identity on.
Like with apgs, unfolding is idempotent and always results in a tree.
The underlying apg is obtained as the structure
where
.
The diagram above shows an example of an apq with 1+14 arrows.
In JavaScript, Python or
Ruby, the structure can be created by the following code:
r = ;
r = ; r = r; r = ; r = ;
r = ; r = r;
r = ; r = r; r = ;
r = r; r = r; r = ;
r = r; r = r;
Using names
Unordered trees and their generalizations form the essence of naming systems.There are two prominent examples of naming systems: file systems and associative arrays. The multidigraph-based structures from previous subsections provided anonymous abstractions for both cases. To obtain naming capabilities, arrows are to be equipped with names as identifiers.
A name must be locally unique - within each sibling set of arrows there can be at most one arrow labelled by a given name.
This can be formalized as a structure
where
is a set of nodes,
is a set of names,
is a set of arrows,
is a partial function from to,
is a partial function from to, and
is a total function from to.
For an arrow, constituents of the triple
are respectively 's
source, name and target.
The structure is subject to the following conditions.
The reduct is an
accessible pointed quiver as defined previously.
The name function is undefined just for the source-less root arrow.
The name function is injective in the restriction to every sibling set of arrows, i.e. for every non-root arrows,,
if and
then
This structure can be called a nested dictionary or named apq.
In computing, such structures are ubiquitous.
The table above shows that arrows can be considered "un-reified" as the set
of source-name-target triples.
This leads to a relational structure
which can be viewed as a
relational database table.
Underlines in
source
andname
indicate primary key.
The structure can be rephrased as a
deterministic labelled transition system:
is a set of "states", is a set of "labels", is a set of "labelled transitions".
Nested dictionary
The diagram on the right shows a nested dictionary
that has the same underlying multidigraph as the example in the previous subsection.
The structure can be created by the code below. Like before, exactly the same code applies for JavaScript, Python and Ruby.
First, a substructure,, is created
by a single assignment of a literal
to r
.This structure, depicted by full lines, is an "arrow tree".
The literal in turn appears to be a JSON serialization of.
Subsequently, the remaining arrows are created by assignments of already existing nodes. Arrows that cause cycles are displayed in blue.
r =
r = r; r = r
r = r; r = r = r
In the Linux VFS, the name function is represented by the
d_name
field in the dentry data structure.The structure above demonstrates a correspondence between JSON-representable structures and hard-link structures of file systems.
In both cases, there is a fixed set of built-in types of "nodes" of which one type is a container type, except that in JSON, there are in fact two such types
- Object and Array. If the latter one is ignored then the provided abstractions of file-systems and JSON data are the same
-
both are arrow trees equipped with naming and a distinction of container nodes.
Pathnames
The naming function of a nested dictionary naturally extends from arrows to arrow paths.Each sequence of consecutive arrows is implicitly assigned a pathname
- the sequence
of arrow names.
Local uniqueness carries over to arrow paths: different sibling paths have different pathnames.
In particular, the root-originating arrow paths are in one-to-one correspondence with their pathnames.
This correspondence provides a "symbolic" representation of the unfolding of via pathnames
-
the nodes in are globally identified via a tree of pathnames.
Ordered tree
The structures introduced in the previous subsection form just the core "hierarchical" part of tree data structuresthat appear in computing. In most cases, there is also an additional "horizontal" ordering between siblings. In search trees the order is commonly established by the "key" or value associated with each sibling, but in many trees that is not the case. For example, XML documents, lists within JSON files, and many other structures have order that does not depend on the values in the nodes, but is itself data sorting the paragraphs of a novel alphabetically would lose information.
The correspondent expansion of the previously described tree structures can be defined by endowing each sibling set with a linear order as follows.
An alternative definition according to Kuboyama is presented in the next subsection.
An ordered tree is a structure
where
is a non-empty set of nodes
and
≤V
and
≤S
are relations on
called vertical order
and
sibling order, respectively.
The structure is subject to the following conditions:
is a partial order that is an unordered tree as defined in the previous subsection.
is a partial order.
Distinct nodes are comparable in <S if and only if they are siblings:
∪ = ○ ) ∖ idX.
Every node has only finitely many preceding siblings, i.e. every principal ideal of is finite.
is a component-wise linear order, each component being a sibling set.
Condition asserts that if a sibling set is infinite then is isomorphic to
, the usual ordering of natural numbers.
Given this, there are three distinguished partial orders which are uniquely given by the following prescriptions:
style="font-style:normal">○ ○ , = ∪ , = ∪ .
This amounts to a "V-S-H-L±" system of five partial orders
≤V,
≤S,
≤H,
≤L⁺,
≤L⁻
on the same set of nodes, in which, except for the pair
,
any two relations uniquely determine the other three, see the determinacy table.
Notes about notational conventions:
The relation composition symbol ○ used in this subsection is to be interpreted left-to-right.
Symbols < and ≤ express the strict and non-strict versions
of a partial order.
Symbols > and ≥ express the converse relations.
The ≺ symbol is used for the
covering relation of ≤ which is the immediate version of a partial order.
≺,
<, ≤,
≻,
>, ≥ for a single partial order relation.
Except for
≺
and
≻,
each version uniquely determines the others.
Passing from ≺ to <
requires that < be transitively reducible.
This is always satisfied for all of
<V,
<S and
<H
but might not hold for
<L⁺ or
<L⁻ if is infinite.
The partial orders
≤V and
≤H
are complementary:
⊎
⊎
⊎
=
X × X
∖
idX.
As a consequence, the "concordant" linear order
<L⁺
is a linear extension of
<V.
Similarly, <L⁻
is a linear extension of
>V.
The covering relations
≺L⁻ and
≺L⁺ correspond to pre-order traversal and
post-order traversal, respectively.
If x
≺L⁻
y
then, according to whether has a previous sibling or not,
the node is either the "rightmost" non-strict descendant of the previous sibling of or, in the latter case, is the first child of.
Pairs of the latter case form the relation
∖
which is a partial map that assigns each non-leaf node its first child node.
Similarly,
∖
assigns each non-leaf node with finitely many children its last child node.
Definition using horizontal order
The Kuboyama's definition of "rooted ordered trees" makes use of the horizontal order ≤H as a definitory relation.Using the notation and terminology introduced so far, the definition can be expressed as follows.
An ordered tree is a structure
such that conditions are satisfied:
is a partial order that is an unordered tree.
is a partial order.
The partial orders
≤V and
≤H
are complementary:
⊎
⊎
⊎
=
X × X
∖
idX.
are comparable in
The partial orders
≤V and
≤H
are "consistent":
=
○
○
.
Every node has only finitely many preceding siblings.
by
=
∩
○
),
i.e. two distinct nodes are in sibling order if and only if they are in horizontal order and are siblings.
Determinacy table
The following table shows the determinacy of the "V-S-H-L±" system.Relational expressions in the table's body are equal to one of
<V,
<S,
<H,
<L⁻,
or
<L⁺
according to the column.
It follows that except for the pair ,
an ordered tree is uniquely determined by any two of the five relations.
In the last two rows,
denotes the infimum of in
,
and
denotes the supremum of in
.
In both rows, resp. can be equivalently replaced by the sibling equivalence
○.
In particular, the partition into sibling sets together with either of
≤L⁻ or
≤L⁺
is also sufficient to determine the ordered tree.
The first prescription for ≺V can be read as:
the parent of a non-root node equals the infimum of the set of all immediate
predecessors of siblings of, where the words "infimum" and "predecessors" are meant with regard to ≤L⁻.
Similarly with the second prescription, just use "supremum", "successors" and
≤L⁺.
The relations ≤S and
≤H obviously cannot form a definitory pair. For the simplest example, consider an ordered tree with exactly two nodes - then one cannot tell which of them is the root.
XPath Axes
XPath Axis | Relation |
ancestor | <V |
ancestor-or-self | ≤V |
child | ≻V |
descendant | >V |
descendant-or-self | ≥V |
following | <H |
following-sibling | <S |
parent | ≺V |
preceding | >H |
preceding-sibling | >S |
self | idX |
The table on the right shows a correspondence of introduced relations
to XPath axes, which are used in structured document systems to access nodes that bear particular ordering relationships to a starting "context" node.
For a context node
, its axis named by the specifier in the left column is the set of nodes that equals the
image of under
the correspondent relation.
As of XPath 2.0, the nodes are "returned" in document order,
which is the "discordant" linear order
≤L⁻.
A "concordance" would be achieved, if the vertical order ≤V was defined oppositely, with the bottom-up direction outwards the root like in set theory in accordance to natural trees.
Traversal maps
Below is the list of partial maps that are typically used for ordered tree traversal.Each map is a distinguished functional subrelation of
≤L⁻ or of its opposite.
≺V
... the parent-node partial map,
≻S
... the previous-sibling partial map,
≺S
... the next-sibling partial map,
∖
... the first-child partial map,
∖
... the last-child partial map,
≻L⁻
... the previous-node partial map,
≺L⁻
... the next-node partial map.
Generating structure
The traversal maps constitute a partial unary algebrathat forms a basis for representing trees as linked data structures.
At least conceptually,
there are parent links, sibling adjacency links, and first / last child links.
This also applies to unordered trees in general, which can be observed on the dentry data structure in the Linux VFS.
Similarly to the "V-S-H-L±" system of partial orders, there are pairs of traversal maps that uniquely determine the whole ordered tree structure.
Naturally, one such generating structure is
which can be transcribed as
- the structure of parent and next-sibling links.
Another important generating structure is
known as left-child right-sibling binary tree.
This partial algebra establishes a one-to-one correspondence between binary trees and
ordered trees.
Definition using binary trees
The correspondence to binary trees provides a concise definition of ordered trees as partial algebras.An ordered tree is a structure where
is a non-empty set of nodes, and
, are partial maps on called
left-child and right-sibling, respectively.
The structure is subject to the following conditions:
The partial maps and are disjoint, i.e.
∩
=
∅
.
The inverse of ∪ is a partial map
such that the partial algebra is an unordered tree.
is obtained as follows:
Encoding by sequences
Ordered trees can be naturally encoded by finite sequences of natural numbers.Denote ω⁎ the set of all finite sequences of natural numbers. Then any non-empty subset of ω⁎ that is closed under taking prefixes gives rise to an ordered tree: just take
the prefix order for ≥V
and the lexicographical order for
≤L⁻.
Conversely, for an ordered tree T =
assign each node the sequence of sibling indices, i.e.
the root is assigned the empty sequence and for every non-root node,
let w = w ⁎ where is the number of preceding siblings of
and ⁎ is the concatenation operator.
Put. Then, equipped with the induced orders
≤V
and
≤L⁻, is isomorphic to.
Per-level ordering
As a possible expansion of the "V-S-H-L±" system,another distinguished relations between nodes can be defined, based on the tree's level structure.
First, let us denote by ∼E the equivalence relation defined by
x ∼E y
if and only if
and have the same number of ancestors.
This yields a partition of the set of nodes into levels
L0, L1,...
- a coarsement of the partition into sibling sets.
Then define relations
<E,
<B⁻
and
<B⁺
by
It can be observed that <E
is a strict partial order and
<B⁻
and
<B⁺ are strict total orders.
Moreover, there is a similarity
between the "V-S-L±" and "V-E-B±" systems:
<E
is component-wise linear and orthogonal to
<V,
<B⁻
is linear extension of
<E
and of >V,
and
<B⁺
is a linear extension of
<E
and of <V.
Terminology used in trees
Data type versus data structure
There is a distinction between a tree as an abstract data type and as a concrete data structure, analogous to the distinction between a list and a linked list.As a data type, a tree has a value and children, and the children are themselves trees; the value and children of the tree are interpreted as the value of the root node and the subtrees of the children of the root node. To allow finite trees, one must either allow the list of children to be empty, or allow trees to be empty, in which case the list of children can be of fixed size, if desired.
As a data structure, a linked tree is a group of nodes, where each node has a value and a list of references to other nodes. There is also the requirement that no two "downward" references point to the same node. In practice, nodes in a tree commonly include other data as well, such as next/previous references, references to their parent nodes, or nearly anything.
Due to the use of to trees in the linked tree data structure, trees are often discussed implicitly assuming that they are being represented by references to the root node, as this is often how they are actually implemented. For example, rather than an empty tree, one may have a null reference: a tree is always non-empty, but a reference to a tree may be null.
Recursive
Recursively, as a data type a tree is defined as a value, together with a list of trees, the subtrees of its children; symbolically:t: v
with the axioms:
In terms of [type theory">Abstract data type">ADT, the abstract tree type with values of some type is defined, using the abstract forest type , by the functions:
with the axioms:
In terms of [type theory, a tree is an inductive type defined by the constructors and .
Mathematical
Viewed as a whole, a tree data structure is an ordered tree, generally with values attached to each node. Concretely, it is :- A rooted tree with the "away from root" direction, meaning:
- * A directed graph,
- * whose underlying undirected graph is a tree,
- * with a distinguished root,
- * which determines the direction on the edges,
- an ordering on the child nodes of a given node, and
- a value at each node.
Allowing empty trees makes some definitions simpler, some more complicated: a rooted tree must be non-empty, hence if empty trees are allowed the above definition instead becomes "an empty tree or a rooted tree such that...". On the other hand, empty trees simplify defining fixed branching factor: with empty trees allowed, a binary tree is a tree such that every node has exactly two children, each of which is a tree. The complete sets of operations on the tree must include fork operation.
Drawing trees
Trees are often drawn in the plane. Ordered trees can be represented essentially uniquely in the plane, and are hence called plane trees, as follows: if one fixes a conventional order, and arranges the child nodes in that order, this yields an embedding of the tree in the plane, unique up to ambient isotopy. Conversely, such an embedding determines an ordering of the child nodes.If one places the root at the top and places all nodes that are a given distance from the root on a given horizontal line, one obtains a standard drawing of the tree. Given a binary tree, the first child is on the left, and the second child is on the right.
Representations
There are many different ways to represent trees; common representations represent the nodes as dynamically allocated records with pointers to their children, their parents, or both, or as items in an array, with relationships between them determined by their positions in the array.Indeed, a binary tree can be implemented as a list of lists : the head of a list is the left child, while the tail is the right child. This can be modified to allow values as well, as in Lisp S-expressions, where the head is the value of the node, the head of the tail is the left child, and the tail of the tail is the right child.
In general a node in a tree will not have pointers to its parents, but this information can be included or stored separately. Alternatively, upward links can be included in the child node data, as in a threaded binary tree.
Generalizations
Digraphs
If edges are thought of as references, then a tree is a special case of a digraph, and the tree data structure can be generalized to represent directed graphs by removing the constraints that a node may have at most one parent, and that no cycles are allowed. Edges are still abstractly considered as pairs of nodes, however, the terms and are usually replaced by different terminology. Different implementation strategies exist: a digraph can be represented by the same local data structure as a tree, assuming that "list of children" is a list of references, or globally by such structures as adjacency lists.In graph theory, a tree is a connected acyclic graph; unless stated otherwise, in graph theory trees and graphs are assumed undirected. There is no one-to-one correspondence between such trees and trees as data structure. We can take an arbitrary undirected tree, arbitrarily pick one of its vertices as the, make all its edges directed by making them point away from the root node – producing an arborescence – and assign an order to all the nodes. The result corresponds to a tree data structure. Picking a different root or different ordering produces a different one.
Given a node in a tree, its children define an ordered forest. Just as subtrees are natural for recursion, forests are natural for corecursion.
Via mutual recursion, a forest can be defined as a list of trees, where a node consists of a value and a forest :
f:
n: v f
Traversal methods
Stepping through the items of a tree, by means of the connections between parents and children, is called walking the tree, and the action is a '' of the tree. Often, an operation might be performed when a pointer arrives at a particular node. A walk in which each parent node is traversed before its children is called a pre-order walk; a walk in which the children are traversed before their respective parents are traversed is called a post-order walk; a walk in which a node's left subtree, then the node itself, and finally its right subtree are traversed is called an in-order traversal.A level-order walk effectively performs a [breadth-first search over the entirety of a tree; nodes are traversed level by level, where the root node is visited first, followed by its direct child nodes and their siblings, followed by its grandchild nodes and their siblings, etc., until all nodes in the tree have been traversed.
Common operations
- Enumerating all the items
- Enumerating a section of a tree
- Searching for an item
- Adding a new item at a certain position on the tree
- Deleting an item
- Pruning: Removing a whole section of a tree
- Grafting: Adding a whole section to a tree
- Finding the root for any node
- Finding the lowest common ancestor of two nodes
Common uses
- Representing hierarchical data such as syntax trees
- Storing data in a way that makes it efficiently searchable
- Representing sorted lists of data
- As a workflow for compositing digital images for visual effects
- Storing Barnes-Hut trees used to simulate galaxies.
Other trees
- Trie
- Day–Stout–Warren algorithm
- Enfilade
- Left child-right sibling binary tree
- Hierarchical temporal memory