Global serializability
In concurrency control of databases, transaction processing, and other transactional distributed applications, global serializability is a property of a global schedule of transactions. A global schedule is the unified schedule of all the individual database schedules in a multidatabase environment. Complying with global serializability means that the global schedule is serializable, has the serializability property, while each component database has a serializable schedule as well. In other words, a collection of serializable components provides overall system serializability, which is usually incorrect. A need in correctness across databases in multidatabase systems makes global serializability a major goal for global concurrency control. With the proliferation of the Internet, Cloud computing, Grid computing, and small, portable, powerful computing devices, as well as increase in systems management sophistication, the need for atomic distributed transactions and thus effective global serializability techniques, to ensure correctness in and among distributed transactional applications, seems to increase.
In a federated database system or any other more loosely defined multidatabase system, which are typically distributed in a communication network, transactions span multiple databases. Enforcing global serializability in such system, where different databases may use different types of concurrency control, is problematic. Even if every local schedule of a single database is serializable, the global schedule of a whole system is not necessarily serializable. The massive communication exchanges of conflict information needed between databases to reach conflict serializability globally would lead to unacceptable performance, primarily due to computer and communication latency. Achieving global serializability effectively over different types of concurrency control has been open for several years. Commitment ordering, a serializability technique publicly introduced in 1991 by Yoav Raz from Digital Equipment Corporation, provides an effective general solution for global serializability across any collection of database systems and other transactional objects, with possibly different concurrency control mechanisms. CO does not need the distribution of conflict information, but rather utilizes the already needed atomic commitment protocol messages without any further communication between databases. It also allows optimistic implementations. CO generalizes strong strict two phase locking, which in conjunction with the two-phase commit protocol is the de facto standard for achieving global serializability across database systems. As a result, CO compliant database systems can transparently join existing SS2PL based solutions for global serializability. The same applies also to all other multiple object systems that use atomic transactions and need global serializability for correctness.
The most significant aspects of CO that make it a uniquely effective general solution for global serializability are the following:
- Seamless, low overhead integration with any concurrency control mechanism, with neither changing any transaction's operation scheduling or blocking it, nor adding any new operation.
- Heterogeneity: Global serializability is achieved across multiple transactional objects with different concurrency control mechanisms, without interfering with the mechanisms' operations.
- Modularity: Transactional objects can be added and removed transparently.
- Autonomy of transactional objects: No need of conflict or equivalent information distribution.
- Scalability: With "normal" global transactions, computer network size and number of transactional objects can increase unboundedly with no impact on performance, and
- Automatic global deadlock resolution.
The global serializability problem
Problem statement
The difficulties described above translate into the following problem:Quotations
Lack of an appropriate solution for the global serializability problem has driven researchers to look for alternatives to serializability as a correctness criterion in a multidatabase environment, and the problem has been characterized as difficult and open. The following two quotations demonstrate the mindset about it by the end of the year 1991, with similar quotations in numerous other articles:- "Without knowledge about local as well as global transactions, it is highly unlikely that efficient global concurrency control can be provided... Additional complications occur when different component DBMSs and the FDBMSs support different concurrency mechanisms... It is unlikely that a theoretically elegant solution that provides conflict serializability without sacrificing performance and availability exists."
- "Transaction management in a heterogeneous, distributed database system is a difficult issue. The main problem is that each of the local database management systems may be using a different type of concurrency control scheme. Integrating this is a challenging problem, made worse if we wish to preserve the local autonomy of each of the local databases, and allow local and global transactions to execute in parallel. One simple solution is to restrict global transactions to retrieve-only access. However, the issue of reliable transaction management in the general case, where global and local transactions are allowed to both read and write data, is still open."
Even in later years, after the public introduction of the Commitment ordering general solution in 1991, the problem still has been considered by many unsolvable:
- "We present a transaction model for multidatabase systems with autonomous component systems, coined heterogeneous 3-level transactions. It has become evident that in such a system the requirements of guaranteeing full ACID properties and full local autonomy can not be reconciled..."
Similar thinking we see also in the following quotation from a 1998 article:
- "The concept of serializability has been the traditionally accepted correctness criterion in database systems. However in multidatabase systems, ensuring global serializability is a difficult task. The difficulty arises due to the heterogeneity of the concurrency control protocols used by the participating local database management systems, and the desire to preserve the autonomy of the local DBMSs. In general, solutions to the global serializability problem result in executions with a low degree of concurrency. The alternative, relaxed serializability, may result in data inconsistency."
On the other hand, the following quotation on CO appears in a 2009 book:
- "Not all concurrency control algorithms use locks... Three other techniques are timestamp ordering, serialization graph testing, and commit ordering. Timestamp ordering assigns each transaction a timestamp and ensures that conflicting operations execute in timestamp order. Serialization graph testing tracks conflicts and ensures that the serialization graph is acyclic. Commit ordering ensures that conflicting operations are consistent with the relative order in which their transactions commit, which can enable interoperability of systems using different concurrency control mechanisms."
- Beyond the common locking based algorithm SS2PL, which is a CO variant itself, also additional variants of CO that use locks exist,. However, generic, or "pure" CO does not use locks.
- Since CO mechanisms order the commit events according to conflicts that already have occurred, it is better to describe CO as "Commit ordering ensures that the relative order in which transactions commit is consistent with the order of their respective conflicting operations."
Proposed solutions
Several solutions, some partial, have been proposed for the global serializability problem. Among them:- Global conflict graph checking
- Distributed Two phase locking
- Distributed Timestamp ordering
- Tickets
- Commitment ordering
Technology perspective
The commitment ordering solution
Commitment ordering is the only high-performance, fault tolerant, conflict serializability providing solution that has been proposed as a fully distributed, general mechanism that can be combined seamlessly with any local concurrency control mechanism. Since the CO property of a schedule is a necessary condition for global serializability of autonomous databases, it provides the only general solution for autonomous databases. Seemingly by sheer luck, the CO solution possesses many attractive properties:- does not interfere with any transaction's operation, particularly neither block, restrict nor delay any data-access operation for either local or global transactions ; thus allows seamless integration with any concurrency control mechanism.
- allows optimistic implementations.
- allows heterogeneity: Global serializability is achieved across multiple transactional objects with different concurrency control mechanisms, without interfering with the mechanisms' operations.
- allows modularity: Transactional objects can be added and removed transparently.
- allows full ACID transaction support.
- maintains each database's autonomy, and does not need any concurrency control information distribution.
- does not need any knowledge about the transactions.
- requires no communication overhead since it only uses already needed, unmodified atomic commitment protocol messages.
- automatically resolves global deadlocks due to locking.
- scales up effectively with computer network size and number of databases, almost without any negative impact on performance, since each global transaction is typically confined to certain relatively small numbers of databases and network nodes.
- requires no additional, artificial transaction access operations, which typically result in additional, artificial conflicts that reduce concurrency.
- requires low overhead.
All the qualities of CO in the list above, except the first three, are also possessed by SS2PL, which is a special case of CO, but blocking and constraining. This partially explains the popularity of SS2PL as a solution for achieving global serializability. However, property 9 above, automatic resolution of global deadlocks, has not been noticed for SS2PL in the database research literature until today. This, since the phenomenon of voting-deadlocks in such environments and their automatic resolution by the atomic commitment protocol has been overlooked.
Most existing database systems, including all major commercial database systems, are strong strict two phase locking based and already CO compliant. Thus they can participate in a CO based solution for global serializability in multidatabase environments without any modification. Achieving global serializability across SS2PL based databases using atomic commitment has been employed for many years. Virtually all existing distributed transaction processing environments and supporting products rely on SS2PL and provide 2PC. As a matter of fact SS2PL together with 2PC have become a de facto standard. This solution is a homogeneous concurrency control one, suboptimal but still quite effective in most cases, sometimes at the cost of increased computing power needed relatively to the optimum.. It allows inter-operation among SS2PL-compliant different database system types, i.e., allows heterogeneity in aspects other than concurrency control. SS2PL is a very constraining schedule property, and "takes over" when combined with any other property. For example, when combined with any optimistic property, the result is not optimistic anymore, but rather characteristically SS2PL. On the other hand, CO does not change data-access scheduling patterns at all, and any combined property's characteristics remain unchanged. Since also CO uses atomic commitment for achieving global serializability, as SS2PL does, any CO compliant database system or transactional object can transparently join existing SS2PL based environments, use 2PC, and maintain global serializability without any environment change. This makes CO a straightforward, natural generalization of SS2PL for any conflict serializability based database system, for all practical purposes.
Commitment ordering has been quite widely known inside the transaction processing and databases communities at Digital Equipment Corporation since 1990. It has been under company confidentiality due to patenting
processes. CO was disclosed outside of DEC by lectures and technical reports' distribution to database researches in May 1991, immediately after its first patent filing. It has been misunderstood by many database researchers years after its introduction, which is evident by the quotes above from articles in 1997-1998 referencing Commitment ordering articles. On the other hand, CO has been utilized extensively as a solution for global serializability in works on Transactional processes,
and more recently in the related Re:GRIDiT,
which is an approach for transaction management in the converging Grid computing and Cloud computing.
See more in The History of Commitment Ordering.
Relaxing global serializability
Some techniques have been developed for relaxed global serializability. Among them :- Quasi serializability
- Two-level serializability
Another common reason nowadays for Global serializability relaxation is the requirement of availability of internet products and services. This requirement is typically answered by large scale data replication. The straightforward solution for synchronizing replicas' updates of a same database object is including all these updates in a single atomic distributed transaction. However, with many replicas such a transaction is very large, and may span several computers and networks that some of them are likely to be unavailable. Thus such a transaction is likely to end with abort and miss its purpose.
Consequently, Optimistic replication is often utilized, while global serializability is relaxed and compromised for eventual consistency. In this case relaxation is done only for applications that are not expected to be harmed by it.
Classes of schedules defined by relaxed global serializability properties either contain the global serializability class, or are incomparable with it. What differentiates techniques for relaxed global conflict serializability properties from those of relaxed conflict serializability properties that are not RGCSR is typically the different way global cycles in the global conflict graph are handled. No distinction between global and local cycles exists for RCSR properties that are not RGCSR. RCSR contains RGCSR. Typically RGCSR techniques eliminate local cycles, i.e., provide local serializability ; however, obviously they do not eliminate all global cycles.