Plane (Unicode)


In the Unicode standard, a plane is a contiguous group of 65,536 (2^16) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10 (hexadecimal) of the first two positions in the six-position hexadecimal format (U+hhhhhh). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes, 1 through 16, are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version 13.0, seven of the planes have assigned code points, and five are named.
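Because each plane holds exactly 65,536 code points, the plane number is simply the code point shifted right by 16 bits. The following is a minimal sketch of that relationship; the helper names `plane_of` and `plane_range` are illustrative and not part of any Unicode API.

```python
from typing import Tuple

PLANE_SIZE = 0x10000       # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF  # last code point of plane 16


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # The top two hex digits of the six-digit form are the plane number.
    return code_point >> 16


def plane_range(plane: int) -> Tuple[int, int]:
    """Return the first and last code point of the given plane."""
    if not 0 <= plane <= 16:
        raise ValueError(f"not a Unicode plane: {plane}")
    first = plane << 16
    return first, first + PLANE_SIZE - 1


print(plane_of(ord("A")))   # 0  (BMP)
print(plane_of(0x1F600))    # 1  (U+01F600 in six-digit form)
print(plane_of(0x10FFFF))   # 16
```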
The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of 16-bit words, plus the BMP as a single word. UTF-8 was designed with a much larger limit of 2^31 code points, and can encode 2^21 code points even under the current limit of 4 bytes.
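The arithmetic behind these limits can be checked directly. The sketch below assumes the standard surrogate ranges (U+D800–U+DBFF for high surrogates, U+DC00–U+DFFF for low surrogates) and the UTF-8 bit layout in which a multi-byte lead byte carries 7 − n payload bits and each continuation byte carries 6; the function name `utf8_payload_bits` is illustrative.

```python
# UTF-16: every (high, low) surrogate pair addresses one supplementary code point.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1   # 1,024 values
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1    # 1,024 values
PAIRS = HIGH_SURROGATES * LOW_SURROGATES  # 2**20 code points = 16 planes
BMP = 2**16                               # reachable with a single 16-bit word
UTF16_TOTAL = PAIRS + BMP                 # 17 planes in all


def utf8_payload_bits(n_bytes: int) -> int:
    """Number of code point bits an n-byte UTF-8 sequence can carry."""
    if not 1 <= n_bytes <= 6:
        raise ValueError("UTF-8 sequences were defined for 1 to 6 bytes")
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    # Lead byte: n one-bits, a zero bit, then (7 - n) payload bits.
    return (7 - n_bytes) + 6 * (n_bytes - 1)


print(UTF16_TOTAL, UTF16_TOTAL // 2**16)  # 1114112 17
print(utf8_payload_bits(4))               # 21
print(utf8_payload_bits(6))               # 31
```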
The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are surrogates, 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment.
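These counts can be reproduced from the underlying ranges. The sketch below assumes the standard definitions: surrogates occupy U+D800–U+DFFF, noncharacters are U+FDD0–U+FDEF plus the last two code points of every plane, and private use covers U+E000–U+F8FF in the BMP plus planes 15 and 16 without their noncharacters.

```python
PLANES = 17
PLANE_SIZE = 0x10000

TOTAL = PLANES * PLANE_SIZE                       # 1,114,112
SURROGATES = 0xDFFF - 0xD800 + 1                  # 2,048
# 32 noncharacters in the BMP block FDD0..FDEF, plus xxFFFE and xxFFFF per plane.
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES  # 66
# BMP Private Use Area, plus planes 15 and 16 minus their two noncharacters each.
PRIVATE_USE = (0xF8FF - 0xE000 + 1) + 2 * (PLANE_SIZE - 2)  # 137,468

PUBLIC = TOTAL - SURROGATES - NONCHARACTERS - PRIVATE_USE

print(f"{TOTAL:,} {SURROGATES:,} {NONCHARACTERS} {PRIVATE_USE:,} {PUBLIC:,}")
# 1,114,112 2,048 66 137,468 974,530
```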
Planes are further subdivided into Unicode blocks, which, unlike planes, do not have a fixed size. The 308 blocks defined in Unicode 13.0 cover 26% of the possible code point space, and range in size from a minimum of 16 code points to a maximum of 65,536 code points. For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems.

Overview

Plane        Allocated code points    Assigned characters
0 BMP        65,472                   55,503
1 SMP        24,704                   22,279
2 SIP        60,912                   60,866
3 TIP        4,944                    4,939
14 SSP       368                      337
15 SPUA-A    65,536
16 SPUA-B    65,536
Totals       287,472                  143,924
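As a quick consistency check, the totals in the table and the roughly 26% block coverage quoted earlier can be recomputed from the per-plane figures. This is only a sketch over the numbers given above; the dictionary names are illustrative.

```python
# Per-plane figures copied from the overview table (Unicode 13.0).
ALLOCATED = {
    "BMP": 65472, "SMP": 24704, "SIP": 60912, "TIP": 4944,
    "SSP": 368, "SPUA-A": 65536, "SPUA-B": 65536,
}
# The private use planes have no assigned characters, so they are omitted here.
ASSIGNED = {"BMP": 55503, "SMP": 22279, "SIP": 60866, "TIP": 4939, "SSP": 337}

total_allocated = sum(ALLOCATED.values())
total_assigned = sum(ASSIGNED.values())
coverage = total_allocated / (17 * 65536)  # share of the whole code space

print(total_allocated, total_assigned)  # 287472 143924
print(f"{coverage:.1%}")                # 25.8%
```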

Basic Multilingual Plane

The first plane, plane 0, is the Basic Multilingual Plane (BMP). It contains characters for almost all modern languages, and a large number of symbols. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.
The High Surrogate and Low Surrogate codes are reserved for encoding non-BMP characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character.
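The pairing works by subtracting 0x10000 from the code point and splitting the remaining 20 bits into two 10-bit halves. The following is a minimal sketch of that mapping, assuming the standard surrogate ranges U+D800–U+DBFF (high) and U+DC00–U+DFFF (low); the function names are illustrative.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP use surrogate pairs")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")                 # D83D DE00
print(f"{from_surrogate_pair(high, low):X}")   # 1F600
```

The result can be compared against Python's own encoder: `"\U0001F600".encode("utf-16-be")` yields the same four bytes.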
65,472 of the 65,536 code points in this plane have been allocated to a Unicode block, leaving just 64 code points in unallocated ranges.
As of Unicode 13.0, the BMP comprises 163 blocks.

Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts, and symbols and notation used within certain fields. Scripts include Linear B, Egyptian hieroglyphs, and cuneiform scripts. It also includes English reform orthographies like Shavian and Deseret, and some modern scripts like Osage, Warang Citi, and Adlam. Symbols and notations include historic and modern musical notation; mathematical alphanumerics; shorthands; emoji and other pictographic sets; and game symbols for playing cards, Mah Jongg, and dominoes.
As of Unicode 13.0, the SMP comprises 134 blocks.

Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK ideographs, mostly CJK Unified Ideographs, that were not included in earlier character encoding standards.
As of Unicode 13.0, the SIP comprises six blocks.

Tertiary Ideographic Plane

Plane 3 is the Tertiary Ideographic Plane (TIP). CJK Unified Ideographs Extension G was added to the TIP in Unicode 13.0, released in March 2020. The plane is also tentatively allocated for Oracle Bone script, Bronze script, and Small Seal script.
As of Unicode 13.0, the TIP comprises a single block, CJK Unified Ideographs Extension G.

Planes 4 to 13

No characters have yet been assigned to planes 4 through 13.

Supplementary Special-purpose Plane

Plane 14, the Supplementary Special-purpose Plane (SSP), comprises two blocks.

Private Use Area planes

Planes 15 and 16 are designated as "Private Use Areas". They contain the blocks Supplementary Private Use Area-A and Supplementary Private Use Area-B, respectively, which are available for use by parties outside the ISO and the Unicode Consortium.