Abstract data type
In computer science, an abstract data type is a mathematical model for data types. An abstract data type is defined by its behavior from the point of view of a user, of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user.
Formally, an ADT may be defined as a "class of objects whose logical behavior is defined by a set of values and a set of operations"; this is analogous to an algebraic structure in mathematics. What is meant by "behavior" varies by author, with the two main types of formal specifications for behavior being axiomatic specification and an abstract model; these correspond to axiomatic semantics and operational semantics of an abstract machine, respectively. Some authors also include the computational complexity, both in terms of time and space. In practice many common data types are not ADTs, as the abstraction is not perfect, and users must be aware of issues like arithmetic overflow that are due to the representation. For example, integers are often stored as fixed-width values, and thus experience integer overflow if the maximum value is exceeded.
ADTs are a theoretical concept, in computer science, used in the design and analysis of algorithms, data structures, and software systems, and do not correspond to specific features of computer languages—mainstream computer languages do not directly support formally specified ADTs. However, various language features correspond to certain aspects of ADTs, and are easily confused with ADTs proper; these include abstract types, opaque data types, protocols, and design by contract. ADTs were first proposed by Barbara Liskov and Stephen N. Zilles in 1974, as part of the development of the CLU language.
Examples
For example, integers are an ADT, defined as the values..., −2, −1, 0, 1, 2,..., and by the operations of addition, subtraction, multiplication, and division, together with greater than, less than, etc., which behave according to familiar mathematics, independently of how the integers are represented by the computer. Explicitly, "behavior" includes obeying various axioms, and preconditions on operations. Typically integers are represented in a data structure as binary numbers, most often as two's complement, but might be binary-coded decimal or in ones' complement, but the user is abstracted from the concrete choice of representation, and can simply use the data as data types.An ADT consists not only of operations but also of values of the underlying data and of constraints on the operations. An "interface" typically refers only to the operations, and perhaps some of the constraints on the operations, notably pre-conditions and post-conditions, but not other constraints, such as relations between the operations.
For example, an abstract stack, which is a last-in-first-out structure, could be defined by three operations: push, that inserts a data item onto the stack; pop, that removes a data item from it; and peek or top, that accesses a data item on top of the stack without removal. An abstract queue, which is a first-in-first-out structure, would also have three operations: enqueue, that inserts a data item into the queue; dequeue, that removes the first data item from it; and front, that accesses and serves the first data item in the queue. There would be no way of differentiating these two data types, unless a mathematical constraint is introduced that for a stack specifies that each pop always returns the most recently pushed item that has not been popped yet. When analyzing the efficiency of algorithms that use stacks, one may also specify that all operations take the same time no matter how many data items have been pushed into the stack, and that the stack uses a constant amount of storage for each element.
Introduction
Abstract data types are purely theoretical entities, used to simplify the description of abstract algorithms, to classify and evaluate data structures, and to formally describe the type systems of programming languages. However, an ADT may be implemented by specific data types or data structures, in many ways and in many programming languages; or described in a formal specification language. ADTs are often implemented as modules: the module's interface declares procedures that correspond to the ADT operations, sometimes with comments that describe the constraints. This information hiding strategy allows the implementation of the module to be changed without disturbing the client programs.The term abstract data type can also be regarded as a generalized approach of a number of algebraic structures, such as lattices, groups, and rings. The notion of abstract data types is related to the concept of data abstraction, important in object-oriented programming and design by contract methodologies for software development.
Defining an abstract data type
An abstract data type is defined as a mathematical model of the data objects that make up a data type as well as the functions that operate on these objects.There are no standard conventions for defining them. A broad division may be drawn between "imperative" and "functional" definition styles.
Imperative-style definition
In the philosophy of imperative programming languages, an abstract data structure is conceived as an entity that is mutable—meaning that it may be in different states at different times. Some operations may change the state of the ADT; therefore, the order in which operations are evaluated is important, and the same operation on the same entities may have different effects if executed at different times—just like the instructions of a computer, or the commands and procedures of an imperative language. To underscore this view, it is customary to say that the operations are executed or applied, rather than evaluated. The imperative style is often used when describing abstract algorithms.Abstract variable
Imperative-style definitions of ADT often depend on the concept of an abstract variable, which may be regarded as the simplest non-trivial ADT. An abstract variable V is a mutable entity that admits two operations:- store where x is a value of unspecified nature;
- fetch, that yields a value,
- fetch always returns the value x used in the most recent store operation on the same variable V.
In this definition, it is implicitly assumed that storing a value into a variable U has no effect on the state of a distinct variable V. To make this assumption explicit, one could add the constraint that
- if U and V are distinct variables, the sequence is equivalent to.
The definition of an abstract variable V may also restrict the stored values x to members of a specific set X, called the range or type of V. As in programming languages, such restrictions may simplify the description and analysis of algorithms, and improve their readability.
Note that this definition does not imply anything about the result of evaluating fetch when V is un-initialized, that is, before performing any store operation on V. An algorithm that does so is usually considered invalid, because its effect is not defined.
Instance creation
Some algorithms need to create new instances of some ADT. To describe such algorithms, one usually includes in the ADT definition a create operation that yields an instance of the ADT, usually with axioms equivalent to- the result of create is distinct from any instance in use by the algorithm.
Example: abstract stack (imperative)
As another example, an imperative-style definition of an abstract stack could specify that the state of a stack S can be modified only by the operations- push, where x is some value of unspecified nature;
- pop, that yields a value as a result,
- For any value x and any abstract variable V, the sequence of operations is equivalent to V ← x.
where x, y, and z are any values, and U, V, W are pairwise distinct variables, is equivalent to
Here it is implicitly assumed that operations on a stack instance do not modify the state of any other ADT instance, including other stacks; that is,
- For any values x, y, and any distinct stacks S and T, the sequence is equivalent to.
- create ≠ S for any prior stack S ;
- empty) ;
- not empty.
Single-instance style
On the other hand, some ADTs cannot be meaningfully defined without assuming multiple instances. This is the case when a single operation takes two distinct instances of the ADT as parameters. For an example, consider augmenting the definition of the abstract stack with an operation compare that checks whether the stacks S and T contain the same items in the same order.
Functional-style definition
Another way to define an ADT, closer to the spirit of functional programming, is to consider each state of the structure as a separate entity. In this view, any operation that modifies the ADT is modeled as a mathematical function that takes the old state as an argument, and returns the new state as part of the result. Unlike the imperative operations, these functions have no side effects. Therefore, the order in which they are evaluated is immaterial, and the same operation applied to the same arguments will always return the same results.In the functional view, in particular, there is no way to define an "abstract variable" with the semantics of imperative variables. Instead of storing values into variables, one passes them as arguments to functions.
Example: abstract stack (functional)
For example, a complete functional-style definition of an abstract stack could use the three operations:- push: takes a stack state and an arbitrary value, returns a stack state;
- top: takes a stack state, returns a value;
- pop: takes a stack state, returns a stack state.
Instead of create, a functional-style definition of an abstract stack may assume the existence of a special stack state, the empty stack, designated by a special symbol like Λ or ""; or define a bottom operation that takes no arguments and returns this special stack state. Note that the axioms imply that
- push ≠ Λ.
Note that these axioms do not define the effect of top or pop, unless s is a stack state returned by a push. Since push leaves the stack non-empty, those two operations are undefined when s = Λ. On the other hand, the axioms imply that push = push if and only if x = y and s = t.
As in some other branches of mathematics, it is customary to assume also that the stack states are only those whose existence can be proved from the axioms in a finite number of steps. In the abstract stack example above, this rule means that every stack is a finite sequence of values, that becomes the empty stack after a finite number of pops. By themselves, the axioms above do not exclude the existence of infinite stacks or circular stacks. In particular, they do not exclude states s such that pop = s or push = s for some x. However, since one cannot obtain such stack states with the given operations, they are assumed "not to exist".
Whether to include complexity
Aside from the behavior in terms of axioms, it is also possible to include, in the definition of an ADT operation, their algorithmic complexity. Alexander Stepanov, designer of the C++ Standard Template Library, included complexity guarantees in the STL specification, arguing:Advantages of abstract data typing
Encapsulation
provides a promise that any implementation of the ADT has certain properties and abilities; knowing these is all that is required to make use of an ADT object. The user does not need any technical knowledge of how the implementation works to use the ADT. In this way, the implementation may be complex but will be encapsulated in a simple interface when it is actually used.Localization of change
Code that uses an ADT object will not need to be edited if the implementation of the ADT is changed. Since any changes to the implementation must still comply with the interface, and since code using an ADT object may only refer to properties and abilities specified in the interface, changes may be made to the implementation without requiring any changes in code where the ADT is used.Flexibility
Different implementations of the ADT, having all the same properties and abilities, are equivalent and may be used somewhat interchangeably in code that uses the ADT. This gives a great deal of flexibility when using ADT objects in different situations. For example, different implementations of the ADT may be more efficient in different situations; it is possible to use each in the situation where they are preferable, thus increasing overall efficiency.Typical operations
Some operations that are often specified for ADTs are- compare, that tests whether two instances' states are equivalent in some sense;
- hash, that computes some standard hash function from the instance's state;
- print or show, that produces a human-readable representation of the instance's state.
- create, that yields a new instance of the ADT;
- initialize, that prepares a newly created instance s for further operations, or resets it to some "initial state";
- copy, that puts instance s in a state equivalent to that of t;
- clone, that performs s ← create, copy, and returns s;
- free or destroy, that reclaims the memory and other resources used by s.
Examples
Some common ADTs, which have proved useful in a great variety of applications, are- Container
- List
- Set
- Multiset
- Map
- Multimap
- Graph
- Tree
- Stack
- Queue
- Priority queue
- Double-ended queue
- Double-ended priority queue
; Abstract graphical data type
An extension of ADT for computer graphics was proposed in 1979: an abstract graphical data type. It was introduced by Nadia Magnenat Thalmann, and Daniel Thalmann. AGDTs provide the advantages of ADTs with facilities to build graphical objects in a structured way.
Implementation
Implementing an ADT means providing one procedure or function for each abstract operation. The ADT instances are represented by some concrete data structure that is manipulated by those procedures, according to the ADT's specifications.Usually there are many ways to implement the same ADT, using several different concrete data structures. Thus, for example, an abstract stack can be implemented by a linked list or by an array.
In order to prevent clients from depending on the implementation, an ADT is often packaged as an opaque data type in one or more modules, whose interface contains only the signature of the operations. The implementation of the module—namely, the bodies of the procedures and the concrete data structure used—can then be hidden from most clients of the module. This makes it possible to change the implementation without affecting the clients. If the implementation is exposed, it is known instead as a transparent data type.
When implementing an ADT, each instance or each state is usually represented by a handle of some sort.
Modern object-oriented languages, such as C++ and Java, support a form of abstract data types. When a class is used as a type, it is an abstract type that refers to a hidden representation. In this model an ADT is typically implemented as a class, and each instance of the ADT is usually an object of that class. The module's interface typically declares the constructors as ordinary procedures, and most of the other ADT operations as methods of that class. However, such an approach does not easily encapsulate multiple representational variants found in an ADT. It also can undermine the extensibility of object-oriented programs.
In a pure object-oriented program that uses interfaces as types, types refer to behaviors not representations.
Example: implementation of the abstract stack
As an example, here is an implementation of the abstract stack above in the C programming language.Imperative-style interface
An imperative-style interface might be:typedef struct stack_Rep stack_Rep; // type: stack instance representation
typedef stack_Rep* stack_T; // type: handle to a stack instance
typedef void* stack_Item; // type: value stored in stack instance
stack_T stack_create; // creates a new empty stack instance
void stack_push; // adds an item at the top of the stack
stack_Item stack_pop; // removes the top item from the stack and returns it
bool stack_empty; // checks whether stack is empty
This interface could be used in the following manner:
- include
// includes the stack interface
int x = 17;
stack_push; // adds the address of x at the top of the stack
void* y = stack_pop; // removes the address of x from the stack and returns it
if // does something if stack is empty
This interface can be implemented in many ways. The implementation may be arbitrarily inefficient, since the formal definition of the ADT, above, does not specify how much space the stack may use, nor how long each operation should take. It also does not specify whether the stack state s continues to exist after a call x ← pop.
In practice the formal definition should specify that the space is proportional to the number of items pushed and not yet popped; and that every one of the operations above must finish in a constant amount of time, independently of that number. To comply with these additional specifications, the implementation could use a linked list, or an array together with two integers.
Functional-style interface
Functional-style ADT definitions are more appropriate for functional programming languages, and vice versa. However, one can provide a functional-style interface even in an imperative language like C. For example:typedef struct stack_Rep stack_Rep; // type: stack state representation
typedef stack_Rep* stack_T; // type: handle to a stack state
typedef void* stack_Item; // type: value of a stack state
stack_T stack_empty; // returns the empty stack state
stack_T stack_push; // adds an item at the top of the stack state and returns the resulting stack state
stack_T stack_pop; // removes the top item from the stack state and returns the resulting stack state
stack_Item stack_top; // returns the top item of the stack state