ABA problem


In multithreaded computing, the ABA problem occurs during synchronization, when a location is read twice, has the same value for both reads, and "value is the same" is used to indicate "nothing has changed". However, another thread can execute between the two reads and change the value, do other work, then change the value back, thus fooling the first thread into thinking "nothing has changed" even though the second thread did work that violates that assumption.
The ABA problem occurs when multiple threads accessing shared data interleave. Below is a sequence of events that results in the ABA problem:
1. Thread 1 reads value A from a shared memory location.
2. Thread 1 is preempted, allowing Thread 2 to run.
3. Thread 2 writes value B to the shared memory location, does other work, then writes value A back.
4. Thread 2 is preempted, allowing Thread 1 to run.
5. Thread 1 reads value A from the shared memory location and concludes that nothing has changed.
Although Thread 1 can continue executing, it is possible that its behavior will not be correct due to the "hidden" modification in shared memory.
A common case of the ABA problem is encountered when implementing a lock-free data structure. If an item is removed from the list, deleted, and then a new item is allocated and added to the list, it is common for the allocated object to be at the same location as the deleted object due to MRU (most recently used) memory allocation. A pointer to the new item is thus often equal to a pointer to the old item, causing an ABA problem.

Examples

Consider a software example of ABA using a lock-free stack:

/* Naive lock-free stack which suffers from the ABA problem. */
class Stack;   // sketched in full below
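A naive Treiber-style stack of the kind being described might look roughly as follows; the Node type and the Pop, Push, ret and next names are illustrative assumptions, chosen to match the walkthrough below:

#include <atomic>

struct Node {
    Node* next;
    int value;
};

class Stack {
    std::atomic<Node*> top{nullptr};

public:
    // Pops the top node and returns it, or nullptr if the stack is empty.
    // The caller takes ownership of the returned node.
    Node* Pop() {
        while (true) {
            Node* ret = top.load();
            if (ret == nullptr)
                return nullptr;
            // Read the successor of the presumed top. For simplicity, assume
            // this dereference is safe even though other threads may pop
            // concurrently.
            Node* next = ret->next;
            // If top is still ret, assume nothing has changed and swing top
            // to next. That assumption is exactly what the ABA problem breaks.
            if (top.compare_exchange_weak(ret, next))
                return ret;
            // Another thread changed the stack; retry.
        }
    }

    // Pushes a node onto the stack.
    void Push(Node* node) {
        Node* old_top = top.load();
        do {
            node->next = old_top;
            // Install node as the new top only if top is still old_top; on
            // failure old_top is reloaded and the loop retries.
        } while (!top.compare_exchange_weak(old_top, node));
    }
};

Pop hands the removed node back to the caller, which may then delete it; that deletion is what makes the interleaving below dangerous.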

This code can normally prevent problems from concurrent access, but suffers from the ABA problem. Consider the following sequence:
Stack initially contains top → A → B → C
Thread 1 starts running pop:
ret = A;
next = B;
Thread 1 gets interrupted just before the compare_exchange_weak...

Thread 2 runs pop:
ret = A;
next = B;
compare_exchange_weak(A, B)   // succeeds, top = B
return A;
// Now the stack is top → B → C
Thread 2 pops again:
ret = B;
next = C;
compare_exchange_weak(B, C)   // succeeds, top = C
return B;
// Now the stack is top → C
delete B;
Thread 2 now pushes A back onto the stack:
A->next = C;
compare_exchange_weak(C, A)   // succeeds, top = A
// Now the stack is top → A → C
When Thread 1 resumes:
compare_exchange_weak(A, B)
This instruction succeeds because it finds top == ret (both are A), so it sets top to next (which is B). As B has been deleted, the program will access freed memory when it tries to look at the first element on the stack. In C++, as shown here, accessing freed memory is undefined behavior: it may result in crashes or data corruption, or it may even silently appear to work correctly. ABA bugs such as this can be difficult to debug.
The real problem is not 'ABA' as such: whether the value of A has changed does not matter in the example. The real problem is that B has been removed from the list and the memory it occupied has been freed. Even if A has not been changed (say the list is singly linked backwards, C -> B -> A, with tail -> A), the problem above still exists if B gets deleted and freed by another thread. This leads to a further problem: if B is deleted from the list by another thread, tail could end up pointing to the deleted B. The 'ABA problem' is therefore really a 'problem with B', which has not much to do with A.

Workarounds

Tagged state reference

A common workaround is to add extra "tag" or "stamp" bits to the quantity being considered. For example, an algorithm using compare-and-swap on a pointer might use the low bits of the address to indicate how many times the pointer has been successfully modified. Because of this, the next compare-and-swap will fail even if the addresses are the same, since the tag bits will not match. This is sometimes called ABAʹ, since the second A is made slightly different from the first. Such tagged state references are also used in transactional memory.
If the "tag" field wraps around, the guarantees against ABA no longer hold. However, it has been observed that on currently existing CPUs, with 60-bit tags, no wraparound is possible as long as the program lifetime is limited to 10 years; in addition, it has been argued that for practical purposes 40–48 bits of tag are usually sufficient to guarantee against wrapping around. As modern CPUs tend to support 128-bit CAS operations, this can allow firm guarantees against ABA.
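For illustration, one possible sketch of such a tagged (counted) pointer applied to the stack from the example above is shown below. The TaggedPtr layout and member names are assumptions; whether std::atomic<TaggedPtr> is actually lock-free depends on the platform and compiler (it typically requires a double-width CAS such as cmpxchg16b on x86-64), and the tag only protects the compare-and-swap itself, not the dereference of a node that another thread may have freed, so some reclamation scheme is still assumed.

#include <atomic>
#include <cstdint>

struct Node {
    Node* next;
    int value;
};

// A pointer plus a modification counter, compared and swapped as one unit.
struct TaggedPtr {
    Node* ptr;
    std::uintptr_t tag;
};

class TaggedStack {
    std::atomic<TaggedPtr> top{TaggedPtr{nullptr, 0}};

public:
    Node* Pop() {
        TaggedPtr old_top = top.load();
        while (old_top.ptr != nullptr) {
            TaggedPtr new_top{old_top.ptr->next, old_top.tag + 1};
            // Even if another thread pops this node and pushes it back, the
            // tag will have advanced, so this CAS fails and old_top reloads.
            if (top.compare_exchange_weak(old_top, new_top))
                return old_top.ptr;
        }
        return nullptr;
    }

    void Push(Node* node) {
        TaggedPtr old_top = top.load();
        TaggedPtr new_top{node, 0};
        do {
            node->next = old_top.ptr;
            new_top.tag = old_top.tag + 1;
        } while (!top.compare_exchange_weak(old_top, new_top));
    }
};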

Intermediate nodes

A correct but expensive approach is to use intermediate nodes that are not data elements and thus assure invariants as elements are inserted and removed.

Deferred reclamation

Another approach is to defer reclamation of removed data elements. One way to defer reclamation is to run the algorithm in an environment featuring an automatic garbage collector; a problem here, however, is that if the GC is not lock-free, then the overall system is not lock-free, even though the data structure itself is.
Another way to defer reclamation is to use one or more hazard pointers, which are pointers to locations that otherwise cannot appear in the list. Each hazard pointer represents an intermediate state of an in-progress change; the presence of the pointer assures further synchronization. Hazard pointers are lock-free, but can track at most a fixed number of elements per thread as being in-use.
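For illustration, a very simplified sketch of the hazard-pointer protocol applied to the naive stack above follows; the fixed thread count, the single hazard slot per thread, the tid parameter, the Retire function and the caller-managed retired list are all assumptions of the sketch, and sequentially consistent atomics are used throughout. A production implementation also batches retired nodes into per-thread lists and rescans them later; only the pop and retire paths are shown.

#include <atomic>
#include <vector>

struct Node {
    Node* next;
    int value;
};

constexpr int kMaxThreads = 8;           // assumed fixed upper bound on threads
std::atomic<Node*> hazard[kMaxThreads];  // one hazard slot per thread; static
                                         // storage starts out zero (nullptr)
std::atomic<Node*> top{nullptr};

Node* Pop(int tid) {
    while (true) {
        Node* ret = top.load();
        if (ret == nullptr)
            return nullptr;
        hazard[tid].store(ret);   // announce: this thread may dereference ret
        if (top.load() != ret)    // ret may have been unlinked before the
            continue;             // announcement became visible, so retry
        Node* next = ret->next;   // safe: ret cannot be freed while announced
        if (top.compare_exchange_weak(ret, next)) {
            hazard[tid].store(nullptr);
            return ret;           // caller must Retire(ret), never delete it
        }
        hazard[tid].store(nullptr);
    }
}

// Called instead of delete on a node that has been unlinked from the stack.
void Retire(Node* node, std::vector<Node*>& retired) {
    for (int i = 0; i < kMaxThreads; ++i) {
        if (hazard[i].load() == node) {
            retired.push_back(node);   // still protected; free it later
            return;
        }
    }
    delete node;   // no thread has announced the node, so it is safe to free
}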
Yet another way to defer reclamation is to use read-copy update, which involves enclosing the update in an RCU read-side critical section and then waiting for an RCU grace period before freeing any removed data elements. Using RCU in this way guarantees that any data element removed cannot reappear until all currently executing operations have completed. RCU is lock-free, but isn't suitable for all workloads.
Some architectures provide "larger" atomic operations such that, for example, both forward and backward links in a doubly linked list can be updated atomically. While this feature is architecture-dependent, it is available in particular on x86/x64 architectures and IBM's z/Architecture.
Some architectures provide load-link/store-conditional instructions, in which the store is performed only when there have been no other stores to the indicated location. This effectively separates the notion of "storage contains value" from "storage has been changed". Examples include DEC Alpha, MIPS, PowerPC, RISC-V and ARM.