Bit Manipulation Instruction Sets


Bit manipulation instruction sets are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD. Their purpose is to improve the speed of bit manipulation. All the instructions in these sets are non-SIMD and operate only on general-purpose registers.
Intel published two sets, BMI (also called BMI1) and BMI2, both introduced with the Haswell microarchitecture. AMD published two more: ABM and TBM.

ABM (Advanced Bit Manipulation)

AMD implements ABM as a single instruction set: every AMD processor supports both instructions or neither. Intel considers POPCNT part of SSE4.2 and LZCNT part of BMI1. POPCNT has its own CPUID flag; Intel, however, uses AMD's ABM flag to indicate LZCNT support.
Instruction  Description
POPCNT       Population count
LZCNT        Leading zeros count

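LZCNT is related to the Bit Scan Reverse (BSR) instruction, but it sets ZF (if the result is zero) and CF (if the source is zero), whereas BSR sets only ZF (if the source is zero). LZCNT also produces a defined result, the operand size in bits, if the source operand is zero. For a non-zero argument, the sum of the LZCNT and BSR results is the argument bit width minus 1; for example, for the 32-bit argument 0x000f0000, LZCNT gives 12 and BSR gives 19.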
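The relationship between LZCNT and BSR can be checked with a minimal portable sketch. The function names popcnt32, lzcnt32 and bsr32 are illustrative only; they are plain C loops that model the results of the instructions, not compiler intrinsics, and they do not model the flags.

```c
#include <stdint.h>

/* Models POPCNT: number of set bits in x. */
unsigned popcnt32(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1u;              /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Models LZCNT: number of leading zero bits; 32 when x is zero. */
unsigned lzcnt32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Models BSR: index of the highest set bit.
 * The hardware result is undefined for zero; this sketch returns 0 there. */
unsigned bsr32(uint32_t x)
{
    unsigned i = 0;
    while (x >>= 1)
        i++;
    return i;
}
```

For any non-zero 32-bit x, lzcnt32(x) + bsr32(x) evaluates to 31 in this model, matching the bit width minus 1.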

BMI1 (Bit Manipulation Instruction Set 1)

The instructions below are those enabled by the BMI bit in CPUID. Intel officially considers LZCNT as part of BMI, but advertises LZCNT support using the ABM CPUID feature flag. BMI1 is available in AMD's Jaguar, Piledriver and newer processors, and in Intel's Haswell and newer processors.
Instruction  Description                             Equivalent C expression
ANDN         Logical and not                         ~x & y
BEXTR        Bit field extract                       (src >> start) & ((1 << len) - 1)
BLSI         Extract lowest set isolated bit         x & -x
BLSMSK       Get mask up to lowest set bit           x ^ (x - 1)
BLSR         Reset lowest set bit                    x & (x - 1)
TZCNT        Count the number of trailing zero bits

TZCNT is almost identical to the Bit Scan Forward (BSF) instruction, but it sets ZF (if the result is zero) and CF (if the source is zero), whereas BSF sets only ZF (if the source is zero). For a non-zero argument, TZCNT and BSF produce the same result.
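As an illustration, the BMI1 operations can be modelled in portable C on 32-bit values. This is a sketch only: the function names are hypothetical, the flags are not modelled, and the start and len parameters of bextr32 are passed separately rather than packed into one control operand as the instruction expects.

```c
#include <stdint.h>

/* ANDN: bits set in y but not in x. */
uint32_t andn32(uint32_t x, uint32_t y) { return ~x & y; }

/* BEXTR: extract len bits of src starting at bit position start. */
uint32_t bextr32(uint32_t src, unsigned start, unsigned len)
{
    if (start >= 32 || len == 0)
        return 0;
    src >>= start;
    if (len >= 32)
        return src;               /* avoid an out-of-range shift below */
    return src & ((1u << len) - 1u);
}

/* BLSI: isolate the lowest set bit. */
uint32_t blsi32(uint32_t x) { return x & (0u - x); }

/* BLSMSK: mask covering the lowest set bit and everything below it. */
uint32_t blsmsk32(uint32_t x) { return x ^ (x - 1u); }

/* BLSR: clear the lowest set bit. */
uint32_t blsr32(uint32_t x) { return x & (x - 1u); }

/* TZCNT: number of trailing zero bits; 32 when x is zero. */
unsigned tzcnt32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}
```

With x = 0x58 (binary 0101 1000), the model gives blsi32(x) = 0x08, blsmsk32(x) = 0x0F, blsr32(x) = 0x50 and tzcnt32(x) = 3.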

BMI2 (Bit Manipulation Instruction Set 2)

Intel introduced BMI2 together with BMI1 in its line of Haswell processors. Only AMD has produced processors supporting BMI1 without BMI2; BMI2 is supported by AMD's Excavator architecture and newer.
Instruction  Description
BZHI         Zero high bits starting with specified bit position
MULX         Unsigned multiply without affecting flags, and arbitrary destination registers
PDEP         Parallel bits deposit
PEXT         Parallel bits extract
RORX         Rotate right logical without affecting flags
SARX         Shift arithmetic right without affecting flags
SHRX         Shift logical right without affecting flags
SHLX         Shift logical left without affecting flags
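The arithmetic of the non-PDEP/PEXT BMI2 instructions can be sketched in portable C on 32-bit values. The names below, including the struct mul_result32 return type, are hypothetical. The sketch assumes that shift and rotate counts are reduced modulo the operand size and that BZHI leaves the source unchanged when the index is 32 or more. C has no flags register, so the "without affecting flags" property is not something this model can show.

```c
#include <stdint.h>

/* BZHI: keep the bits below position n and zero the rest. */
uint32_t bzhi32(uint32_t x, unsigned n)
{
    if (n >= 32)
        return x;                 /* index past the top bit: nothing cleared */
    return x & ((1u << n) - 1u);
}

/* MULX: full 32x32 -> 64-bit unsigned product, returned as two halves. */
struct mul_result32 { uint32_t hi; uint32_t lo; };

struct mul_result32 mulx32(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * b;
    struct mul_result32 r;
    r.hi = (uint32_t)(p >> 32);
    r.lo = (uint32_t)p;
    return r;
}

/* RORX: rotate right by n (count taken modulo 32). */
uint32_t rorx32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}

/* SHRX / SHLX: logical shifts with the count taken modulo 32. */
uint32_t shrx32(uint32_t x, unsigned n) { return x >> (n & 31u); }
uint32_t shlx32(uint32_t x, unsigned n) { return x << (n & 31u); }

/* SARX: arithmetic right shift, modelled on the unsigned bit pattern
 * so the result does not depend on implementation-defined behaviour. */
uint32_t sarx32(uint32_t x, unsigned n)
{
    uint32_t r;
    n &= 31u;
    r = x >> n;
    if ((x & 0x80000000u) != 0 && n != 0)
        r |= (uint32_t)~(0xFFFFFFFFu >> n);   /* replicate the sign bit */
    return r;
}
```

For example, rorx32(0x12345678, 8) yields 0x78123456, and sarx32(0x80000000, 4) yields 0xF8000000 where shrx32 gives 0x08000000.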

Parallel bit deposit and extract

The PDEP and PEXT instructions are new generalized bit-level compress and expand instructions. They take two inputs: a source and a selector. The selector is a bitmap selecting the bits that are to be packed or unpacked. PEXT copies the selected bits from the source to contiguous low-order bits of the destination; higher-order destination bits are cleared. PDEP does the opposite for the selected bits: contiguous low-order bits of the source are copied to the selected bits of the destination; other destination bits are cleared. This can be used to extract any bitfield of the input, and even to do a lot of bit-level shuffling that previously would have been expensive. Although these instructions resemble bit-level gather-scatter SIMD instructions, PDEP and PEXT operate on general-purpose registers.
The instructions are available in 32-bit and 64-bit versions. An example using arbitrary source and selector in 32-bit mode is:
Instruction  Selector mask  Source      Destination
PEXT         0xff00fff0     0x12345678  0x00012567
PDEP         0xff00fff0     0x00012567  0x12005670
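The behaviour described above can be modelled with a simple bit-by-bit loop. The following is a portable sketch, not how the hardware implements the instructions; pext32 and pdep32 are illustrative names, not intrinsics.

```c
#include <stdint.h>

/* Models PEXT: gather the source bits selected by mask into
 * contiguous low-order bits of the result. */
uint32_t pext32(uint32_t src, uint32_t mask)
{
    uint32_t dst = 0;
    uint32_t out_bit = 1;                       /* next result bit to fill */
    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask);   /* lowest remaining selector bit */
        if (src & lowest)
            dst |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1u;                      /* consume that selector bit */
    }
    return dst;
}

/* Models PDEP: scatter contiguous low-order source bits to the
 * positions selected by mask. */
uint32_t pdep32(uint32_t src, uint32_t mask)
{
    uint32_t dst = 0;
    uint32_t in_bit = 1;                        /* next source bit to read */
    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask);
        if (src & in_bit)
            dst |= lowest;
        in_bit <<= 1;
        mask &= mask - 1u;
    }
    return dst;
}
```

With the selector 0xff00fff0, pext32(0x12345678, 0xff00fff0) returns 0x00012567 and pdep32(0x00012567, 0xff00fff0) returns 0x12005670, matching the example values given for the instructions.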

TBM (Trailing Bit Manipulation)

TBM consists of instructions complementary to the instruction set started by BMI1; their complementary nature means they do not necessarily need to be used directly but can be generated by an optimizing compiler when supported. AMD introduced TBM together with BMI1 in its Piledriver line of processors; later AMD Jaguar and Zen-based processors do not support TBM. No Intel processors support TBM.
Instruction  Description                             Equivalent C expression
BEXTR        Bit field extract                       (src >> start) & ((1 << len) - 1)
BLCFILL      Fill from lowest clear bit              x & (x + 1)
BLCI         Isolate lowest clear bit                x | ~(x + 1)
BLCIC        Isolate lowest clear bit and complement ~x & (x + 1)
BLCMSK       Mask from lowest clear bit              x ^ (x + 1)
BLCS         Set lowest clear bit                    x | (x + 1)
BLSFILL      Fill from lowest set bit                x | (x - 1)
BLSIC        Isolate lowest set bit and complement   ~x | (x - 1)
T1MSKC       Inverse mask from trailing ones         ~x | (x + 1)
TZMSK        Mask from trailing zeros                ~x & (x - 1)
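Because each TBM operation corresponds to a short expression, a compiler can recognise the pattern in ordinary code. The sketch below writes the operations as portable C functions on 32-bit values; the function names are hypothetical and the flags are not modelled.

```c
#include <stdint.h>

/* Operations keyed on the lowest clear bit (x + 1 flips the trailing ones). */
uint32_t blcfill32(uint32_t x) { return x & (x + 1u); }    /* clear trailing ones */
uint32_t blci32(uint32_t x)    { return x | ~(x + 1u); }   /* all ones except lowest clear bit */
uint32_t blcic32(uint32_t x)   { return ~x & (x + 1u); }   /* only the lowest clear bit set */
uint32_t blcmsk32(uint32_t x)  { return x ^ (x + 1u); }    /* mask up to lowest clear bit */
uint32_t blcs32(uint32_t x)    { return x | (x + 1u); }    /* set lowest clear bit */
uint32_t t1mskc32(uint32_t x)  { return ~x | (x + 1u); }   /* zeros where trailing ones were */

/* Operations keyed on the lowest set bit (x - 1 flips the trailing zeros). */
uint32_t blsfill32(uint32_t x) { return x | (x - 1u); }    /* set trailing zeros */
uint32_t blsic32(uint32_t x)   { return ~x | (x - 1u); }   /* all ones except lowest set bit */
uint32_t tzmsk32(uint32_t x)   { return ~x & (x - 1u); }   /* ones where trailing zeros were */
```

For x = 0xA7 (binary 1010 0111), blcfill32 gives 0xA0, blcs32 gives 0xAF and blcmsk32 gives 0x0F; for x = 0x58 (binary 0101 1000), blsfill32 gives 0x5F and tzmsk32 gives 0x07.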

Supporting CPUs

Note that support for an instruction extension only means that the processor can execute those instructions, for software compatibility purposes; it might not execute them quickly. For example, Zen, Zen+ and Zen 2 processors implement PEXT and PDEP in microcode, so these instructions run significantly slower than the same behaviour recreated using other instructions. For optimum performance, compiler developers are advised to select individual instructions from the extensions based on architecture-specific performance profiles rather than on extension availability.