Memory ordering describes the order of accesses to computer memory by a CPU. The term can refer either to the memory ordering generated by the compiler at compile time or to the memory ordering generated by a CPU at run time. In modern microprocessors, memory ordering characterizes the CPU's ability to reorder memory operations; it is a type of out-of-order execution. Memory reordering can be used to fully utilize the bus bandwidth of different types of memory, such as caches and memory banks. On most modern uniprocessors, memory operations are not executed in the order specified by the program code. In single-threaded programs, all operations appear to have been executed in the order specified, with all out-of-order execution hidden from the programmer. In multi-threaded environments, however, the reordering can become visible to other threads and lead to problems. To avoid such problems, memory barriers can be used.
Compile-time memory ordering
The compiler has some freedom to reorder operations at compile time. However, this can lead to problems if the order of memory accesses matters.
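As an illustration of constraining compile-time reordering only, the following C++ sketch uses `std::atomic_signal_fence`, which restricts the compiler without emitting a hardware barrier instruction. The names `payload`, `ready`, `observed`, `on_signal` and `publish_and_signal` are hypothetical, and the sketch assumes that `std::atomic<bool>` and `std::atomic<int>` are lock-free on the target, as they are on common platforms.

```cpp
#include <atomic>
#include <csignal>

int payload = 0;                    // plain, non-atomic data
std::atomic<bool> ready{false};     // flag checked by the signal handler
std::atomic<int> observed{-1};      // value the handler saw

// The handler runs on the same thread that raised the signal, so only
// compiler reordering (not CPU reordering) has to be ruled out.
extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed)) {
        // Keep the compiler from moving the payload read above the flag check.
        std::atomic_signal_fence(std::memory_order_acquire);
        observed.store(payload, std::memory_order_relaxed);
    }
}

int publish_and_signal() {
    std::signal(SIGINT, on_signal);
    payload = 42;
    // Keep the compiler from sinking the payload write below the flag write.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
    std::raise(SIGINT);             // the handler runs before raise() returns
    return observed.load(std::memory_order_relaxed);
}
```

Calling `publish_and_signal()` returns the value the handler observed. Because the fence affects only the compiler, it is not sufficient for communication between threads running on different processors.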
In many programming languages, different types of barriers can be combined with other operations, so that no extra memory barrier is needed before or after the operation. Depending on the CPU architecture being targeted, these language constructs translate to special instructions, to multiple instructions, or to a normal instruction, according to the hardware's memory ordering guarantees.
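In C++, for example, the ordering constraint is passed as an argument to the atomic operation itself. The following is a minimal sketch of that style; the function name `message_passing` and its variables are hypothetical. The release store and the acquire load together guarantee that the consumer sees the write to `data` once it sees `ready` become true, without any separate barrier statement.

```cpp
#include <atomic>
#include <thread>

int message_passing() {
    int data = 0;                       // plain, non-atomic payload
    std::atomic<bool> ready{false};     // flag that publishes the payload
    int seen = -1;

    std::thread producer([&] {
        data = 42;
        // Release store: earlier writes may not be reordered after it.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire load: later reads may not be reordered before it.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = data;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

How the two operations are compiled depends on the target: on a strongly ordered architecture they may become ordinary load and store instructions, while a weakly ordered architecture may need dedicated instructions or an added barrier.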
Some architectures have an incoherent instruction cache pipeline, which prevents self-modifying code from being executed without special instruction cache flush/reload instructions.
Dependent loads can be reordered. If the processor fetches a pointer to some data after this reordering, it might not fetch the data itself but instead use stale data which it has already cached and not yet invalidated. Allowing this relaxation makes cache hardware simpler and faster, but it means that memory barriers are required for both readers and writers. On Alpha hardware, cache line invalidations sent to other processors are processed in a lazy fashion by default, unless they are explicitly requested to be processed between dependent loads. The Alpha architecture specification also allows other forms of dependent-load reordering, for example using speculative data reads before the real pointer to be dereferenced is known.
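The reader-side requirement can be sketched in C++ as a pointer publication, where the second load depends on the result of the first. The names `Config` and `read_published_value` are hypothetical. The sketch uses an acquire load on the pointer, which is stronger than strictly needed on architectures that preserve dependency ordering, but which also covers hardware that may reorder dependent loads.

```cpp
#include <atomic>
#include <thread>

struct Config {
    int value;
};

int read_published_value() {
    Config cfg{0};
    std::atomic<Config*> shared{nullptr};
    int seen = -1;

    std::thread writer([&] {
        cfg.value = 7;                                  // initialize the data
        shared.store(&cfg, std::memory_order_release);  // then publish the pointer
    });

    std::thread reader([&] {
        Config* p = nullptr;
        // First load: fetch the pointer. The acquire ordering ensures the
        // dependent load below cannot observe data older than the pointer.
        while ((p = shared.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        seen = p->value;                                // dependent load
    });

    writer.join();
    reader.join();
    return seen;
}
```

If the pointer were loaded with `std::memory_order_relaxed` instead, the language would no longer guarantee that the reader sees the initialized `value`.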
RISC-V memory ordering models:
WMO: Weak memory order
TSO: Total store order
SPARC memory ordering modes:
TSO: Total store order
RMO: Relaxed-memory order
PSO: Partial store order
Hardware memory barrier implementation
Many architectures with SMP support have special hardware instructions for flushing reads and writes at run time.
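From a high-level language, such instructions are usually reached through a portable fence rather than written directly. The following C++ sketch runs the classic store-buffering test with `std::atomic_thread_fence`; the function name `count_forbidden_outcomes` is hypothetical. Compilers typically translate a sequentially consistent fence into a full hardware barrier on the target, although the exact instruction depends on the compiler and architecture.

```cpp
#include <atomic>
#include <thread>

// Counts how often both threads read 0, which the full fences rule out.
int count_forbidden_outcomes(int iterations) {
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            // Full barrier: the store above is ordered before the load below.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });

        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++forbidden;
        }
    }
    return forbidden;
}
```

With both fences in place, at least one thread must observe the other's store, so the function returns 0. Without the fences, the outcome in which both threads read 0 is permitted and can occur even on strongly ordered hardware, because stores may be buffered while later loads proceed.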