Extended Unix Code

Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese.
The structure of EUC is based on the ISO-2022 standard, which specifies a way to represent character sets containing a maximum of 94 characters, or 8836 characters, or 830584 characters, as sequences of 7-bit codes. Only ISO-2022 compliant character sets can have EUC forms. Up to four coded character sets can be represented with the EUC scheme.
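The three sizes quoted above are the first three powers of 94, since each byte of an ISO-2022 graphic set takes one of 94 values. A minimal sketch of the arithmetic (the helper name `graphic_set_size` is illustrative, not taken from any standard):

```python
def graphic_set_size(n_bytes: int) -> int:
    """Number of code points in a graphic set whose characters use n_bytes
    bytes, each byte taking one of 94 values (0x21 through 0x7E in 7-bit form)."""
    return 94 ** n_bytes


sizes = [graphic_set_size(n) for n in (1, 2, 3)]
print(sizes)  # [94, 8836, 830584]
```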
G0 is almost always an ISO-646 compliant coded character set, such as US-ASCII or a national variant of it, that is invoked on GL. One common departure from US-ASCII is that 0x5C is often used to represent a yen sign in EUC-JP and a won sign in EUC-KR.
To get the EUC form of an ISO-2022 character, the most significant bit of each 7-bit byte of the original code is set; this allows software to easily distinguish whether a particular byte in a character string belongs to the ISO-646 code or the ISO-2022 code.
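The bit-setting step can be sketched in a few lines. This example is illustrative only: the function name `to_euc` is hypothetical, and the sample input is the 7-bit JIS X 0208 code for the hiragana letter A, checked against Python's built-in `euc_jp` codec.

```python
def to_euc(seven_bit: bytes) -> bytes:
    """Set the most significant bit of every byte of a 7-bit code."""
    return bytes(b | 0x80 for b in seven_bit)


# 0x24 0x22 is the 7-bit JIS X 0208 code for HIRAGANA LETTER A (U+3042).
euc = to_euc(b"\x24\x22")
print(euc.hex())                          # a4a2
print(euc.decode("euc_jp") == "\u3042")   # True
```

Because every byte of the result is 0x80 or above, a scanner can tell it apart from ISO-646 bytes by testing the high bit alone.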
The most commonly used EUC codes are variable-width encodings in which a character belonging to G0 takes one byte and a character belonging to G1 takes two bytes. The EUC-CN form of GB 2312 and EUC-KR are examples of such two-byte EUC codes. EUC-JP includes characters represented by up to three bytes, whereas a single character in EUC-TW can take up to four bytes.
Modern applications are more likely to use UTF-8, which supports all of the characters of the EUC codes and more, and is generally more portable, with fewer vendor deviations and errors. EUC is, however, still very popular, especially EUC-KR in South Korea.

EUC-CN

EUC-CN is the usual encoded form of the GB 2312 standard for simplified Chinese characters. Unlike the case of Japanese JIS X 0208 and ISO-2022-JP, GB 2312 is not normally used in a 7-bit code version, although a variant form called HZ was sometimes used on USENET.
An ASCII character is represented in its usual encoding. A character from GB 2312 is represented by two bytes, both from the range 0xA1–0xFE.
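As a rough illustration of these two rules, the sketch below splits a packed EUC-CN byte string into one-byte and two-byte units. The function name `split_euc_cn` is hypothetical; the sample data comes from Python's built-in `gb2312` codec, which produces EUC-CN.

```python
def split_euc_cn(data: bytes) -> list[bytes]:
    """Split EUC-CN bytes into one-byte ASCII and two-byte GB 2312 units."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII character in its usual one-byte encoding.
            units.append(data[i:i + 1])
            i += 1
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # GB 2312 character: both bytes in the range 0xA1-0xFE.
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid EUC-CN byte 0x{b:02X} at offset {i}")
    return units


encoded = "a\u4e2d".encode("gb2312")  # "a" followed by U+4E2D
print([u.hex() for u in split_euc_cn(encoded)])  # ['61', 'd6d0']
```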

Related encoding systems

748 code

An encoding related to EUC-CN is the "748" code used in the WITS typesetting system developed by Beijing's Founder Technology. The 748 code contains all of GB 2312, but is not ISO 2022-compliant and is therefore not a true EUC code. The non-GB 2312 portion of the 748 code contains traditional and Hong Kong characters and other glyphs used in newspaper typesetting.

GBK and GB 18030

GBK is an extension to GB 2312. It defines an extended form of the EUC-CN encoding capable of representing a larger array of CJK characters sourced largely from Unicode 1.1, including traditional Chinese characters and characters used only in Japanese. It is not, however, a true EUC code, because ASCII bytes may appear as trail bytes, a consequence of the larger encoding space it requires.
Variants of GBK are implemented by Windows code page 936, and by IBM's code page 1386.
The Unicode-based GB 18030 character encoding defines an extension of GBK capable of encoding the entirety of Unicode. However, Unicode encoded as GB 18030 is a variable-width encoding which may use up to four bytes per character, because an even larger encoding space is required. Being an extension of GBK, it is a superset of EUC-CN but is not itself a true EUC code. Being a Unicode encoding, its repertoire is identical to that of other Unicode transformation formats such as UTF-8.
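The differences described above can be observed with Python's built-in `gb2312`, `gbk`, and `gb18030` codecs. This is a small demonstration rather than a conformance test; the sample characters (U+4E02, U+4E2D, U+1F600) are chosen here for illustration.

```python
# U+4E02 is a CJK ideograph that GBK adds beyond GB 2312.
gbk = "\u4e02".encode("gbk")
print(gbk.hex())          # 8140
print(gbk[1] < 0x80)      # True: the trail byte falls in the ASCII range

# A GB 2312 character keeps its EUC-CN bytes in GBK and GB 18030.
zhong = "\u4e2d"
same = zhong.encode("gb2312") == zhong.encode("gbk") == zhong.encode("gb18030")
print(same)               # True

# A character outside the Basic Multilingual Plane takes four bytes.
four = "\U0001F600".encode("gb18030")
print(len(four))          # 4
```

The 0x40 trail byte in the first case is why a scanner that relies on the high bit to find character boundaries cannot be used on GBK text.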

Mac OS Chinese Simplified

Other EUC-CN variants deviating from the EUC mechanism include the Mac OS Chinese Simplified script. It uses the bytes 0x80, 0x81, 0x82, 0xA0, 0xFD, 0xFE and 0xFF for the U with umlaut, two special font metric characters, the non-breaking space, the copyright sign, the trademark sign and the ellipsis respectively. As a result, its division of bytes into single-byte characters and first bytes of two-byte characters differs from both EUC and GBK.
This use of 0xA0, 0xFD, 0xFE and 0xFF matches Apple's Shift_JIS variant.

EUC-JP

EUC-JP is a variable-width encoding used to represent the elements of three Japanese character set standards, namely JIS X 0208, JIS X 0212, and JIS X 0201. Other names for this encoding include Unixized JIS and AT&T JIS. Since August 2018, 0.1% of all web pages have used EUC-JP, while 3.2% of Japanese web sites use this encoding. It is called Code page 954 by IBM. Microsoft has two code page numbers for this encoding.
This encoding scheme allows the easy mixing of 7-bit ASCII and 8-bit Japanese without the need for the escape characters employed by ISO-2022-JP, which is based on the same character set standards, and without ASCII bytes appearing as trail bytes.
A related and partially compatible encoding, called EUC-JISx0213 or EUC-JIS-2004, encodes JIS X 0201 and JIS X 0213.
Unlike EUC-CN or EUC-KR, EUC-JP did not become widely adopted on PC and Macintosh systems in Japan, which used Shift JIS or its extensions, although it became heavily used on Unix and Unix-like operating systems. Therefore, whether a Japanese web site uses EUC-JP or Shift_JIS often depends on what OS the author uses.
Vendor extensions to EUC-JP were usually allocated within the individual code sets, as opposed to using invalid EUC sequences.
Characters are encoded as follows:

Because EUC-JP is an EUC/ISO 2022 compliant encoding, the C0 control characters, space and DEL are represented as in ASCII.
A graphical character from ASCII is represented by its usual one-byte encoding, in the range 0x21–0x7E. While some variants of EUC-JP encode the lower half of JIS X 0201 here, most encode ASCII, including the W3C/WHATWG Encoding Standard used by HTML5, and so does EUC-JIS-2004. While this means that 0x5C is typically mapped to Unicode as U+005C REVERSE SOLIDUS, U+005C may be displayed as a yen sign by certain Japanese-locale fonts, e.g. on Microsoft Windows, for compatibility with the lower half of JIS X 0201.
A character from JIS X 0208 is represented by two bytes, both in the range 0xA1–0xFE. This differs from the ISO-2022-JP representation by having the high bit set. This code set may also contain vendor extensions in some EUC-JP variants. In EUC-JIS-2004, the first plane of JIS X 0213 is encoded here, which is effectively a superset of standard JIS X 0208.
A character from the upper half of JIS X 0201 is represented by two bytes, the first being 0x8E and the second being its usual representation in the range 0xA1–0xDF. This set may contain IBM vendor extensions in some variants.
A character from JIS X 0212 is represented in EUC-JP by three bytes, the first being 0x8F and the following two being in the range 0xA1–0xFE, i.e. with the high bit set. In addition to standard JIS X 0212, code set 3 of some EUC-JP variants may also contain extensions in rows 83 and 84 to represent characters from IBM's Shift JIS extensions which lack standard JIS X 0212 mappings; these may be coded in either of two layouts, one defined by IBM themselves and one defined by the OSF. In EUC-JIS-2004, the second plane of JIS X 0213 is encoded here, which does not collide with the allocated rows in standard JIS X 0212. Some implementations of EUC-JIS-2004, such as the one used by Python, allow both JIS X 0212 and JIS X 0213 plane 2 characters in this set.
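The rules above mean that the first byte of each character determines its code set and length. The following sketch applies them to a packed EUC-JP byte string; the function name `split_euc_jp` and its return shape are illustrative choices, and vendor extensions are not considered.

```python
def split_euc_jp(data: bytes) -> list[tuple[int, bytes]]:
    """Return (code_set, raw_bytes) pairs for packed EUC-JP input."""
    def in_gr(b: int) -> bool:
        return 0xA1 <= b <= 0xFE

    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            code_set, n = 0, 1      # ASCII, C0 controls, space, DEL
        elif lead == 0x8E:
            code_set, n = 2, 2      # upper half of JIS X 0201
        elif lead == 0x8F:
            code_set, n = 3, 3      # JIS X 0212
        elif in_gr(lead):
            code_set, n = 1, 2      # JIS X 0208
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        chunk = data[i:i + n]
        trail = chunk[1:] if code_set in (2, 3) else chunk
        if len(chunk) != n or (code_set != 0 and not all(in_gr(b) for b in trail)):
            raise ValueError(f"truncated or invalid sequence at offset {i}")
        if code_set == 2 and chunk[1] > 0xDF:
            raise ValueError(f"code set 2 byte out of range at offset {i}")
        out.append((code_set, chunk))
        i += n
    return out


# "a", U+3042 and U+FF71 via Python's codec, plus a raw three-byte sequence.
sample = "a\u3042\uff71".encode("euc_jp") + b"\x8f\xb0\xa1"
print([(cs, raw.hex()) for cs, raw in split_euc_jp(sample)])
# [(0, '61'), (1, 'a4a2'), (2, '8eb1'), (3, '8fb0a1')]
```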

EUC-KR

EUC-KR is a variable-width encoding to represent Korean text using two coded character sets, KS X 1001 and either KS X 1003 or US-ASCII, depending on variant. KS X 2901 stipulates the encoding and RFC 1557 dubbed it EUC-KR.
A character drawn from KS X 1001 is encoded as two bytes in GR and a character from KS X 1003 or US-ASCII takes one byte in GL.
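The GL/GR split can be seen by classifying the bytes that Python's built-in `euc_kr` codec produces. The helper `region` is a hypothetical name; it treats 0x21–0x7E as GL and 0xA1–0xFE as GR.

```python
def region(b: int) -> str:
    """Classify a byte as belonging to GL, GR, or neither."""
    if 0x21 <= b <= 0x7E:
        return "GL"
    if 0xA1 <= b <= 0xFE:
        return "GR"
    return "other"


encoded = "K\ud55c".encode("euc_kr")  # "K" followed by the syllable U+D55C
print(encoded.hex())                  # 4bc7d1
print([region(b) for b in encoded])   # ['GL', 'GR', 'GR']
```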
When used with ASCII, it is called Code page 970 by IBM. It is known as Code page 51949 by Microsoft. It is usually referred to as Wansung in the Republic of Korea.
A common extension of EUC-KR is the Unified Hangul Code, which is the default Korean codepage on Microsoft Windows. The W3C/WHATWG Encoding Standard used by HTML5 incorporates the Unified Hangul Code extensions into its definition of EUC-KR. Other EUC-KR compatible extensions include the Mac OS Korean encoding, used by the classic Mac OS. IBM's code page 949 is yet another, unrelated, EUC-KR extension. Similarly to the EUC-CN extensions described above, these extensions do not conform to the EUC structure.
Globally, 0.1% of all web pages use EUC-KR, a figure that understates its regional importance: 17.4% of South Korean web pages use it, making it the most popular non-UTF-8/Unicode encoding for a language/web domain, compared with only 8.4% of web pages in the Korean language. Including extensions, it is the most widely used legacy character encoding in Korea on all three major platforms, but its use has been very slowly shifting to UTF-8 as the latter gains popularity, especially on Linux and macOS.
As with most other encodings, UTF-8 is now preferred for new use, solving problems with consistency between platforms and vendors.

EUC-TW

EUC-TW is a variable-width encoding that supports US-ASCII and 16 planes of CNS 11643, each of which is 94×94. It is a rarely used encoding for traditional Chinese characters as used in Taiwan. Big5 is much more common.

Because EUC-TW is an EUC/ISO 2022 encoding, the C0 control characters, ASCII space and DEL are encoded as in ASCII.
A graphical character from US-ASCII is encoded in GL as its usual single-byte representation.
A character from CNS 11643 plane 1 is encoded as two bytes in GR.
A character in planes 1 through 16 of CNS 11643 is encoded as four bytes:
* The first byte is always 0x8E.
* The second byte indicates the plane, the number of which is obtained by subtracting 0xA0 from that byte.
* The third and fourth bytes are in GR.

Note that plane 1 of CNS 11643 is encoded twice: as code set 1 and as a part of code set 2.
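The two forms can be sketched as plain byte arithmetic. Python ships no EUC-TW codec, so the functions below are hypothetical helpers; they assume the usual GR convention that a row or cell number from 1 to 94 is stored as 0xA0 plus that number.

```python
def _check(row: int, cell: int) -> None:
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")


def euc_tw_code_set_1(row: int, cell: int) -> bytes:
    """Two-byte form, available for CNS 11643 plane 1 only."""
    _check(row, cell)
    return bytes([0xA0 + row, 0xA0 + cell])


def euc_tw_code_set_2(plane: int, row: int, cell: int) -> bytes:
    """Four-byte form: 0x8E, plane byte, then two GR bytes."""
    if not 1 <= plane <= 16:
        raise ValueError("plane must be in 1..16")
    _check(row, cell)
    return bytes([0x8E, 0xA0 + plane, 0xA0 + row, 0xA0 + cell])


def parse_euc_tw(seq: bytes) -> tuple[int, int, int]:
    """Return (plane, row, cell) for one two-byte or four-byte sequence."""
    if len(seq) == 2:
        return (1, seq[0] - 0xA0, seq[1] - 0xA0)
    if len(seq) == 4 and seq[0] == 0x8E:
        return (seq[1] - 0xA0, seq[2] - 0xA0, seq[3] - 0xA0)
    raise ValueError("not a single EUC-TW multibyte sequence")


short = euc_tw_code_set_1(36, 1)
long_ = euc_tw_code_set_2(1, 36, 1)
print(short.hex(), long_.hex())                    # c4a1 8ea1c4a1
print(parse_euc_tw(short) == parse_euc_tw(long_))  # True
```

The last line shows the double encoding of plane 1: both byte sequences name the same plane, row, and cell.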
UTF-8 is becoming more common than EUC-TW, as with most code pages.

Packed versus fixed-length form

The encodings described above are in a variable-width form referred to as the EUC packed format. This is the form usually labelled as EUC.
Internal processing may make use of a fixed-length alternative form called the EUC complete two-byte format. This represents:

Code set 0 as two bytes in the range 0x21–0x7E.
Code set 1 as two bytes in the range 0xA0–0xFF.
Code set 2 as a byte in the range 0x20–0x7E followed by a byte in the range 0xA0–0xFF.
Code set 3 as a byte in the range 0xA0–0xFF followed by a byte in the range 0x21–0x7E.

Initial bytes of 0x00 and 0x80 are used in cases where the code set uses only one byte. There is also a four-byte fixed-length format. These fixed-length forms are suited to internal processing and are not usually encountered in interchange.
EUC-JP is registered with the IANA in both formats, the packed format as "EUC-JP" or "csEUCPkdFmtJapanese" and the fixed width format as "csEUCFixWidJapanese". Only the packed format is included in the WHATWG Encoding Standard used by HTML5.
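As a rough sketch of how the packed and complete two-byte forms relate for EUC-JP, the function below converts packed input to fixed-width output. It rests on assumptions worth stating plainly: the function name `packed_to_fixed_jp` is hypothetical, the single-byte code sets 0 and 2 are padded with an initial 0x00 byte, and code set 3 is marked by clearing the high bit of its second byte. It performs no validation beyond a truncation check.

```python
def packed_to_fixed_jp(data: bytes) -> bytes:
    """Convert packed EUC-JP to a two-bytes-per-character form (a sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n = 1
            unit = bytes([0x00, lead])              # code set 0
        elif lead == 0x8E:
            n = 2
            unit = bytes([0x00]) + data[i + 1:i + 2]  # code set 2
        elif lead == 0x8F:
            n = 3
            pair = data[i + 1:i + 3]                # code set 3
            unit = bytes([pair[0], pair[1] & 0x7F]) if len(pair) == 2 else b""
        else:
            n = 2
            unit = data[i:i + 2]                    # code set 1, unchanged
        if len(unit) != 2:
            raise ValueError(f"truncated sequence at offset {i}")
        out += unit
        i += n
    return bytes(out)


packed = b"a\xa4\xa2\x8e\xb1\x8f\xb0\xa1"
print(packed_to_fixed_jp(packed).hex())  # 0061a4a200b1b021
```

Every character occupies exactly two bytes in the result, which is what makes the form convenient for indexing during internal processing.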
