GB 2312


GB/T 2312 is a key official character set of the People's Republic of China, used for Simplified Chinese characters. GB2312 is the registered internet name for EUC-CN, which is its usual encoded form. GB refers to the Guobiao standards, whereas the T suffix denotes a non-mandatory standard.
The standard was originally a mandatory national standard, designated GB 2312. However, following a National Standard Bulletin of the People's Republic of China in 2017, GB 2312 is no longer mandatory, and its standard code was changed to GB/T 2312. It has been superseded by GBK and GB18030, which include additional characters, but remains in widespread use as a subset of those encodings.
While GB/T 2312 covers over 99.99% of contemporary Chinese text usage, historical texts and many names remain out of scope. The standard includes 6,763 Chinese characters, along with symbols and punctuation, Japanese kana, the Greek and Cyrillic alphabets, Zhuyin, and a double-byte set of Pinyin letters with tone marks. Counting these non-Chinese characters, GB/T 2312-1980 contains 7,445 characters in total.
GB2312 is the most popular Chinese encoding, with 13.6% of web pages served from China and territories declaring it, or 0.4% of all web pages globally, a drop from 3.5% in January 2010. However, all major web browsers decode documents labelled, for example, "GB2312" as if they were labelled "gbk", which is a superset encoding; GB2312 and GBK have a combined 16.7% share.
There is an analogous character set known as GB/T 12345, closely related to GB/T 2312, but with traditional character forms replacing simplified forms, plus 62 supplemental characters. GB-encoded fonts often come in pairs, one with the GB/T 2312 character set and the other with the GB/T 12345 character set.

Characters

Characters in GB/T 2312 are arranged in a 94×94 grid, and the two-byte code point of each character is expressed in the kuten form, which specifies a row and the position of the character within that row. The code point 4566, for instance, is row 45, position 66.
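The rows contain characters as follows:
Rows 01–09 contain punctuation and other symbols, along with kana, the Greek and Cyrillic alphabets, Pinyin letters and Zhuyin.
Rows 16–55 contain the 3,755 level-1 Chinese characters, ordered by Pinyin reading.
Rows 56–87 contain the 3,008 level-2 Chinese characters, ordered by radical and stroke count.
Rows 10–15 and 88–94 are unassigned.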
GB/T 2312-1980 contains 682 symbols and 6,763 Chinese characters.

Encodings of GB/T 2312

EUC-CN

EUC-CN is often used as the character encoding in programs that deal with GB/T 2312, because it maintains compatibility with ASCII. Two bytes are used to represent every character not found in ASCII. The value of the first byte is in the range 0xA1–0xF7, while the value of the second byte is in the range 0xA1–0xFE. Since both ranges lie beyond ASCII, it is possible, as in UTF-8, to tell whether a byte is part of a multi-byte construct. Unlike in UTF-8, however, a byte taken in isolation does not reveal whether it is the first or the second byte of a pair.
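The byte ranges above can be turned into a small scanner. The following is a minimal sketch, not a full validator: the function name `split_euc_cn` is hypothetical, and it only checks the ranges stated in the text, not whether a given pair is actually assigned. Because a lone high byte does not say whether it is first or second, the scan has to start from a known character boundary.

```python
def split_euc_cn(data: bytes) -> list:
    """Split EUC-CN bytes into 1-byte ASCII units and 2-byte GB/T 2312 units.

    Hypothetical helper: scans from the start, since a high byte alone
    does not tell us whether it is the first or second byte of a pair.
    """
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte, stands on its own.
            units.append(data[i:i + 1])
            i += 1
        elif (0xA1 <= b <= 0xF7
              and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # First byte 0xA1-0xF7 followed by second byte 0xA1-0xFE.
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid EUC-CN sequence at offset {i}")
    return units


if __name__ == "__main__":
    print(split_euc_cn(b"A\xd6\xd0B"))
```

Starting the same scan one byte into a two-byte pair would silently pair up the wrong bytes, which is the practical consequence of the ambiguity described above.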
Compared to UTF-8, GB2312 is more storage efficient: while UTF-8 uses three bytes per CJK ideograph, GB2312 only uses two. However, GB2312 does not cover as many ideographs as Unicode does.
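The size difference is easy to observe with Python's built-in `gb2312` and `utf-8` codecs. This is an illustrative sketch: the sample string and the helper name `encoded_sizes` are arbitrary choices, and U+20000 is used simply as one ideograph that lies outside GB2312.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (GB2312 byte length, UTF-8 byte length) for the given text."""
    return len(text.encode("gb2312")), len(text.encode("utf-8"))


def fits_gb2312(text: str) -> bool:
    """Report whether every character of the text can be encoded as GB2312."""
    try:
        text.encode("gb2312")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    sample = "中文"  # two common simplified ideographs
    print(encoded_sizes(sample))
    print(fits_gb2312(sample), fits_gb2312("\U00020000"))
```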
To map the kuten code points to bytes, add 160 (0xA0) to the row number of the code point to form the high byte, and add 160 to the column number of the code point to form the low byte.
For example, for the GB/T 2312 code point 4566, the high byte comes from the row number, 45: 45+160=205=0xCD, and the low byte comes from the column number, 66: 66+160=226=0xE2. So, the full encoding is 0xCDE2.
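The mapping described above is a fixed offset in each byte, so it can be written in a few lines. The sketch below is illustrative; the function names `kuten_to_euc_cn` and `euc_cn_to_kuten` are hypothetical, and the functions check only the 1–94 grid bounds, not whether a cell is assigned.

```python
OFFSET = 160  # 0xA0, added to both row and column


def kuten_to_euc_cn(row: int, col: int) -> bytes:
    """Convert a kuten (row, column) pair to its two EUC-CN bytes."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must both be in 1..94")
    return bytes([row + OFFSET, col + OFFSET])


def euc_cn_to_kuten(pair: bytes) -> tuple:
    """Convert two EUC-CN bytes back to a kuten (row, column) pair."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    row, col = pair[0] - OFFSET, pair[1] - OFFSET
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("bytes are outside the 94x94 grid")
    return row, col


if __name__ == "__main__":
    encoded = kuten_to_euc_cn(45, 66)
    print(encoded.hex().upper())
    print(euc_cn_to_kuten(encoded))
```

For code point 4566 this reproduces the bytes 0xCD 0xE2 from the worked example.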

HZ

HZ is another encoding of GB/T 2312, used mostly for Usenet postings.
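For comparison, Python's standard library ships an `hz` codec alongside `gb2312`. The sketch below encodes the same text both ways. Per the HZ specification (RFC 1843), the HZ form stays within 7-bit ASCII, which is what makes it usable on channels that are not 8-bit clean. The sample string and the helper name `is_seven_bit` are arbitrary.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


if __name__ == "__main__":
    sample = "中文"
    as_euc_cn = sample.encode("gb2312")  # 8-bit EUC-CN form
    as_hz = sample.encode("hz")          # 7-bit HZ form
    print(as_euc_cn, is_seven_bit(as_euc_cn))
    print(as_hz, is_seven_bit(as_hz))
    # Both forms decode back to the same text.
    print(as_hz.decode("hz") == as_euc_cn.decode("gb2312"))
```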

Two implementations of GB/T 2312

There are two implementations of GB/T 2312, which differ in a few code points.
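EUC-CN    GBK/GB18030 subset    GB2312.TXT    Character name
A1A4      U+00B7                U+30FB        MIDDLE DOT (subset) / KATAKANA MIDDLE DOT (GB2312.TXT)
A1AA      U+2014                U+2015        EM DASH (subset) / HORIZONTAL BAR (GB2312.TXT)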

The GBK/GB18030 subset is compatible with both GBK and GB18030; GB2312.TXT was the then-official mapping file from ftp.unicode.org, which has been obsolete since August 2011 and was missing as of September 2016. Other vendor mappings have also existed.
As of 2015, Microsoft's .NET Framework uses the subset. ICU, iconv-1.14, php-5.6, ActivePerl-5.20, Java 1.7 and Python 3.4 use GB2312.TXT. Ruby 2.2 is compatible with both implementations; it internally converts the conflicting characters to the subset. W3C's technical recommendation specifies that a GBK encoding be inferred for streams labelled gb2312, which in turn uses a GB18030 decoder.
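The difference can be observed by decoding the two code points A1A4 and A1AA with two different codecs. The sketch below uses CPython's built-in `gb2312` and `gbk` codecs, on the assumption that the former follows GB2312.TXT and the latter follows the GBK/GB18030 subset; results may differ in other runtimes or versions. The helper name `compare_mappings` is hypothetical.

```python
def compare_mappings(raw: bytes) -> tuple:
    """Decode two EUC-CN bytes with both codecs.

    Returns the resulting code points as ("U+XXXX", "U+XXXX"),
    in the order (gb2312 codec, gbk codec).
    """
    as_gb2312 = raw.decode("gb2312")
    as_gbk = raw.decode("gbk")
    return f"U+{ord(as_gb2312):04X}", f"U+{ord(as_gbk):04X}"


if __name__ == "__main__":
    for raw in (b"\xa1\xa4", b"\xa1\xaa"):
        print(raw.hex().upper(), *compare_mappings(raw))
```

Software that must interoperate with web content may prefer the `gbk` codec even for data labelled gb2312, matching the browser behaviour described above.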