CJK Unified Ideographs

The Chinese, Japanese and Korean scripts share a common background, and the characters they have in common are collectively known as CJK characters. In a process called Han unification, these common characters were identified and named CJK Unified Ideographs. As of Unicode 13.0, Unicode defines a total of 92,856 CJK Unified Ideographs.
The terms ideographs or ideograms may be misleading, since the Chinese script is not strictly a pictographic or ideographic system.
Historically, Vietnam used Chinese ideographs too, so the abbreviation "CJKV" is sometimes used. In Vietnam, Chinese characters were replaced by the Latin-based Vietnamese alphabet in the 1920s.

CJK Unified Ideographs blocks

CJK Unified Ideographs

The basic block named CJK Unified Ideographs contains 20,989 basic Chinese characters in the range U+4E00 through U+9FFC. The block not only includes characters used in the Chinese writing system but also kanji used in the Japanese writing system and hanja, whose use is diminishing in Korea. Many characters in this block are used in all three writing systems, while others are in only one or two of the three. Chinese characters are also used in Vietnam's Nôm script. The first 20,902 characters in the block are arranged according to the Kangxi Dictionary ordering of radicals. In this system the characters written with the fewest strokes are listed first. The remaining characters were added later, and so are not in radical order.
The block is the result of Han unification, which was somewhat controversial in the Far East. Since Chinese, Japanese and Korean characters were coded in the same location, the appearance of a selected glyph could depend on the particular font being used. However, the source separation rule states that characters encoded separately in an earlier character set would remain separate in the new Unicode encoding.
Using variation selectors, it is possible to specify certain variant CJK ideograms within Unicode. The Adobe-Japan1 character set, which has 14,683 ideographic variation sequences, is an extreme example of the use of variation selectors.
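
As a rough illustration of how a variation sequence looks in text, the following Python sketch pairs each base character with the ideographic variation selector that follows it. It assumes the selectors used for ideographic variation sequences are VS17 through VS256 (U+E0100 through U+E01EF). The function names and the sample string are hypothetical, and the code does not check whether a sequence is actually registered in any collection such as Adobe-Japan1.

```python
from typing import List, Optional, Tuple

# Assumption: ideographic variation sequences use VS17..VS256,
# which occupy U+E0100 through U+E01EF.
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF


def is_ideographic_variation_selector(ch: str) -> bool:
    """Return True if ch is one of the selectors VS17..VS256."""
    return IVS_FIRST <= ord(ch) <= IVS_LAST


def split_variation_sequences(text: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each base character with the selector that follows it, if any."""
    pairs: List[Tuple[str, Optional[str]]] = []
    for ch in text:
        if is_ideographic_variation_selector(ch) and pairs and pairs[-1][1] is None:
            # Attach the selector to the preceding base character.
            base, _ = pairs[-1]
            pairs[-1] = (base, ch)
        else:
            pairs.append((ch, None))
    return pairs


# Illustrative sample only: U+845B followed by VS17 (U+E0100).
sample = "\u845b\U000E0100"
pairs = split_variation_sequences(sample)
```

The sample string holds two code points but is meant to be rendered as a single glyph. Which glyph appears still depends on whether the font supports that particular sequence.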

Charts

6300-77FF,
7800-8CFF,
8D00-9FFF.

CJK Unified Ideographs Extension A

The block named CJK Unified Ideographs Extension A contains 6,592 additional characters in the range U+3400 through U+4DBF.

CJK Unified Ideographs Extension B

The block named CJK Unified Ideographs Extension B contains 42,718 characters in the range U+20000 through U+2A6DD. These include most of the characters used in the Kangxi Dictionary that are not in the basic CJK Unified Ideographs block, as well as many Nôm characters that were formerly used to write Vietnamese.

Charts

21600-230FF,
23100-245FF,
24600-260FF,
26100-275FF,
27600-290FF,
29100-2A6DF.

CJK Unified Ideographs Extension C

The block named CJK Unified Ideographs Extension C contains 4,149 characters in the range U+2A700 through U+2B734 that were added in Unicode 5.2.

CJK Unified Ideographs Extension D

The block named CJK Unified Ideographs Extension D contains 222 characters in the range U+2B740 through U+2B81D that were added in Unicode 6.0.

CJK Unified Ideographs Extension E

The block named CJK Unified Ideographs Extension E contains 5,762 characters in the range U+2B820 through U+2CEA1 that were added in Unicode 8.0.

CJK Unified Ideographs Extension F

The block named CJK Unified Ideographs Extension F contains 7,473 characters in the range U+2CEB0 through U+2EBE0 that were added in Unicode 10.0. It includes more than 1,000 Sawndip characters for Zhuang.

CJK Unified Ideographs Extension G

The block named CJK Unified Ideographs Extension G was added to the Tertiary Ideographic Plane in Unicode 13.0. The block spans U+30000 through U+3134F, and its 4,939 characters occupy U+30000 through U+3134A.
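
The character counts quoted for each block follow from the first and last assigned code points, since each range is contiguous. The Python sketch below recomputes them as a sanity check. The table and variable names are illustrative, and the end point used for Extension G (U+3134A, the last assigned character, rather than the end of the block at U+3134F) is an assumption about Unicode 13.0 that should be checked against the code charts.

```python
# (block name, first code point, last assigned code point) as of Unicode 13.0.
RANGES = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFC),
    ("Extension A", 0x3400, 0x4DBF),
    ("Extension B", 0x20000, 0x2A6DD),
    ("Extension C", 0x2A700, 0x2B734),
    ("Extension D", 0x2B740, 0x2B81D),
    ("Extension E", 0x2B820, 0x2CEA1),
    ("Extension F", 0x2CEB0, 0x2EBE0),
    # Assumption: last assigned character is U+3134A; the block runs to U+3134F.
    ("Extension G", 0x30000, 0x3134A),
]

# An inclusive range first..last holds last - first + 1 code points.
counts = {name: last - first + 1 for name, first, last in RANGES}

# Sum over the eight "Unified Ideographs" blocks.
block_total = sum(counts.values())

# Twelve more unified ideographs sit in the CJK Compatibility Ideographs block.
COMPAT_UNIFIED = 12
grand_total = block_total + COMPAT_UNIFIED

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"Total: {grand_total:,}")
```

The eight blocks sum to 92,844 characters. Adding the twelve unified ideographs in the CJK Compatibility Ideographs block gives the 92,856 total stated for Unicode 13.0.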

CJK Compatibility Ideographs

The block named CJK Compatibility Ideographs was created to retain round-trip compatibility with other standards.
Only twelve of its characters have the "Unified Ideograph" property: U+FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27, FA28 and FA29.
None of the other characters in this and other "Compatibility" blocks relate to CJK Unification.
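
A minimal Python sketch of this distinction follows. It assumes the block occupies U+F900 through U+FAFF, a range taken from the Unicode block definition rather than from the text above, and the helper names are hypothetical. It also shows one practical consequence using the standard `unicodedata` module: a true compatibility ideograph such as U+F900 is replaced by its unified counterpart under normalization, whereas the twelve unified ideographs are left unchanged.

```python
import unicodedata

# Assumption: the CJK Compatibility Ideographs block spans U+F900..U+FAFF.
COMPAT_BLOCK = range(0xF900, 0xFB00)

# The twelve code points in the block that are unified ideographs.
UNIFIED_IN_COMPAT = frozenset({
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
})


def is_unified_in_compat_block(ch: str) -> bool:
    """True for the twelve unified ideographs inside the compatibility block."""
    return ord(ch) in UNIFIED_IN_COMPAT


def is_compat_only(ch: str) -> bool:
    """True for any other code point in the block.

    Note: this also returns True for unassigned code points in the block.
    """
    cp = ord(ch)
    return cp in COMPAT_BLOCK and cp not in UNIFIED_IN_COMPAT


def survives_normalization(ch: str) -> bool:
    """True if NFC normalization leaves the character unchanged."""
    return unicodedata.normalize("NFC", ch) == ch
```

For example, `survives_normalization("\ufa0e")` is expected to be true, while U+F900 normalizes to U+8C48. Text that must preserve compatibility ideographs therefore should not be normalized blindly.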

Known issues

Disunification

U+4039

The character U+4039 was a unification of two different characters until Unicode 5.0. However, they were lexically different characters that should not have been unified; they have different pronunciations and different meanings.
The proposal to disunify U+4039 was accepted, and the new character was encoded at U+9FC3 in Unicode 5.1.

Other 3 glyphs in Extension B

In CJK Unified Ideographs Extension B, some characters are incorrectly unified with others. These characters include U+2017B, U+204AF and U+24CB2. The first two wrongly unify glyphs from Chinese Mainland and Vietnamese sources, while the last wrongly unifies glyphs from Chinese Mainland and Taiwanese sources.

Unifiable variants and exact duplicates in Extension B

Also in CJK Unified Ideographs Extension B, hundreds of glyph variants were encoded. In addition to the deliberate encoding of close glyph variants, six exact duplicates and two semi-duplicates were encoded by mistake:

U+34A8 㒨 = U+20457: U+20457 is the same as the China-source glyph for U+34A8, but it is significantly different from the Taiwan-source glyph for U+34A8
U+3DB7 㶷 = U+2420E: same glyph shapes
U+8641 虁 = U+27144: U+27144 is the same as the Korean-source glyph for U+8641, but it is significantly different from the Chinese Mainland-, Taiwan- and Japan-source glyphs for U+8641
U+204F2 = U+23515: same glyph shapes, but ordered under different radicals
U+249BC = U+249E9: same glyph shapes
U+24BD2 = U+2A415: same glyph shapes, but ordered under different radicals
U+26842 = U+26866: same glyph shapes
U+FA23 﨣 = U+27EAF: same glyph shapes

Other CJK ideographs in Unicode, not Unified

Apart from the eight blocks of "Unified Ideographs", Unicode has about a dozen more blocks containing CJK characters that are not unified ideographs. These are mainly CJK radicals, strokes, punctuation, marks, symbols and compatibility characters. Although some of these characters have counterparts in other blocks, their usage can differ.
Four blocks of compatibility characters are included for compatibility with legacy text handling systems and older character sets:

CJK Compatibility
CJK Compatibility Forms
CJK Compatibility Ideographs
CJK Compatibility Ideographs Supplement

They include forms of characters for vertical text layout and rich text characters that Unicode recommends handling through other means. Therefore, their use is discouraged.
Usually, compatibility characters are those that would not have been encoded except for compatibility and round-trip convertibility with other standards. However, the number of CJK ideographs in non-Unicode standards is too large to fit into Unicode's CJK Compatibility Ideographs blocks. Instead, code points in those blocks are assigned to characters that the Unicode Consortium has approved but that have not yet been assigned code points within the CJK Unified Ideographs blocks.

Font support

The blocks CJK Unified Ideographs and CJK Unified Ideographs Extension A, being part of the Basic Multilingual Plane, are supported by the majority of CJK fonts. However, Japanese and Korean fonts usually have fewer characters than Chinese fonts. Extensions B, C and D are supported by the additional fonts MingLiU-ExtB, MingLiU_HKSCS-ExtB, PMingLiU-ExtB and SimSun-ExtB, which have been included in Microsoft Windows since Vista.
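
Whether a character falls in the Basic Multilingual Plane can be read directly from its code point, since each plane covers 65,536 (0x10000) code points. The following Python sketch, with hypothetical helper names, computes the plane number and shows which planes hold the blocks discussed in this article. The name "Supplementary Ideographic Plane" for plane 2 comes from the Unicode standard, not from the text above.

```python
# Plane names relevant to CJK Unified Ideographs.
PLANE_NAMES = {
    0: "Basic Multilingual Plane",
    2: "Supplementary Ideographic Plane",
    3: "Tertiary Ideographic Plane",
}


def plane(cp: int) -> int:
    """Return the plane number of a code point (each plane is 0x10000 wide)."""
    return cp >> 16


def in_bmp(ch: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane."""
    return plane(ord(ch)) == 0


# First code point of each block, taken from the ranges given in this article.
BLOCK_STARTS = {
    "CJK Unified Ideographs": 0x4E00,
    "Extension A": 0x3400,
    "Extension B": 0x20000,
    "Extension C": 0x2A700,
    "Extension D": 0x2B740,
    "Extension E": 0x2B820,
    "Extension F": 0x2CEB0,
    "Extension G": 0x30000,
}

for name, start in BLOCK_STARTS.items():
    print(f"{name}: plane {plane(start)} ({PLANE_NAMES[plane(start)]})")
```

Only the basic block and Extension A report plane 0. Extensions B through F lie in plane 2 and Extension G in plane 3, so a font limited to the Basic Multilingual Plane cannot cover them.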

Unicode version history
