KS X 1001

KS X 1001, "Code for Information Interchange (Hangul and Hanja)", formerly called KS C 5601, is a South Korean coded character set standard for representing hangul and hanja characters on a computer.
KS X 1001 is encoded by the most common legacy character encodings for Korean, including EUC-KR and Microsoft's Unified Hangul Code. It contains Korean Hangul syllables, CJK ideographs, Greek, Cyrillic, Japanese and some other characters.
KS X 1001 is arranged as a 94×94 table, following the structure of 2-byte code words in ISO 2022 and EUC. Therefore, its code points are pairs of integers 1–94. However, some encodings, in addition to providing codes for every code point, provide additional codes for characters otherwise representable only as code point sequences.

History

This standard was previously known as KS C 5601. It has been revised several times, including in 1987, 1992, 1998 and 2002.
The present, double-byte, Wansung character set was standardised by the third edition of KS C 5601, which was published in 1986. It is an ISO 2022 compatible encoding, typically used in EUC form, which assigns double-byte codes for non-Hangul, Hangul jamo, and the most common Hangul syllables, in contrast to Johab which assigns double-byte codes to all Hangul syllables using modern jamo. Wansung is technically a variable-length encoding, allowing other syllables to be represented with eight-byte sequences, but this feature is not always implemented.
The earliest edition of KS C 5601, published in 1974, defined a variable-length 7-bit character set which assigned single-byte code points to 51 basic Hangul jamo, somewhat analogously to JIS C 6220, in an encoding known as "N-byte Hangul". The second edition, published in 1982, retained the main character set from the 1974 edition but defined two supplementary sets, including Johab. Neither edition was adopted as widely as intended.
Wansung was kept unchanged in the 1987 and 1992 editions. In the 1992 edition, additional annex material was added, including the definition of the Johab encoding in annex 3, and the older N-byte Hangul encoding in annex 4. The Johab annex was published in response to industry use of Johab as a competing encoding to Wansung; Johab was being used at the time by Hangul Word Processor. Following the introduction of Unified Hangul Code by Microsoft in Windows 95, and Hangul Word Processor abandoning Johab in favour of Unicode in 2000, Johab ceased to be commonly used.

Encodings

Encoding schemes of KS X 1001 include EUC-KR and ISO-2022-KR, as well as ISO-2022-JP-2. These all have the drawback that they only assign codes for the 2350 precomposed Hangul syllables which have their own KS X 1001 codepoints, and require others to use eight-byte composition sequences, which are not supported by some partial implementations of the standard.
The Johab encoding and the EUC-KR superset known as Unified Hangul Code provide single codes for all 11172 Hangul syllables. ISO-2022-KR and Johab are rarely used. Some operating systems extend this standard in other non-uniform ways, e.g. the EUC-KR extensions MacKorean on the classic Mac OS, and IBM-949 by IBM.

Hangul Filler

The Hangul Filler character is used to introduce eight-byte Hangul composition sequences and to stand in for an absent element in such a sequence.
Unicode includes the Wansung code Hangul Filler in the Hangul Compatibility Jamo block for round-trip compatibility, but uses its own system for composing Hangul. The KS X 1001 Hangul composition system is not used in Unicode, and the filler renders merely as an empty space; KS X 1001 composition sequences using modern jamo may be mapped to precomposed characters in Unicode. This is not usually done with Unified Hangul Code.
For round-trip compatibility, Unicode also includes the N-byte Hangul code Hangul Filler separately in the Halfwidth and Fullwidth Forms block, named the "Halfwidth Hangul Filler".
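As an illustration of the eight-byte composition sequences described above, the following Python sketch builds one in EUC-KR (GR) byte form. It assumes the sequence order filler, initial, vowel, final, with a second filler standing in for an absent final. The helper names (`row4_bytes`, `composition_sequence`) are hypothetical, and the sketch relies on the observation that row 4 cells 1 through 52 line up with U+3131 through U+3164 in the Hangul Compatibility Jamo block.

```python
# Compatibility jamo, listed in the order used by Unicode's syllable arithmetic.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"
FILLER = "\u3164"  # Hangul Filler, KS X 1001 code point 04-52


def row4_bytes(jamo: str) -> bytes:
    """Return the GR (EUC-KR) bytes for a row 4 jamo or the filler."""
    cell = ord(jamo) - 0x3130  # assumption: cells 1..52 mirror U+3131..U+3164
    if not 1 <= cell <= 52:
        raise ValueError("not a modern row 4 jamo or filler")
    return bytes([0xA4, 0xA0 + cell])


def composition_sequence(syllable: str) -> bytes:
    """Spell a modern Hangul syllable as filler + initial + vowel + final."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 11172:
        raise ValueError("not a modern precomposed Hangul syllable")
    initial, rest = divmod(index, 588)  # 588 = 21 vowels * 28 finals
    vowel, final = divmod(rest, 28)     # final 0 means "no final"
    parts = [
        FILLER,
        INITIALS[initial],
        VOWELS[vowel],
        FINALS[final - 1] if final else FILLER,
    ]
    return b"".join(row4_bytes(p) for p in parts)


print(composition_sequence("똠").hex())  # a4d4a4a8a4c7a4b1
```

Here 똠 is used because it is a commonly cited syllable outside the 2350 precomposed ones; a decoder that does not implement composition sequences would see these bytes as four separate row 4 characters.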

N-byte Hangul code

This is the N-byte Hangul code, as specified by KS C 5601-1974 and by annex 4 of KS C 5601-1992. The second half of IBM's Code page 1040 is a superset of this, assigning the characters ¢¬\~ to the same locations as in Code page 1041. Character 0x40/0xC0 is a Hangul Filler, used in combining sequences.
Similarly to its Japanese counterpart JIS C 6220, N-byte Hangul code could be used as a 7-bit encoding, with character allocations over the range 0x40 through 0x7C. The chart below shows the code in an 8-bit environment with the high bit set, as it is used in e.g. code page 1040.

Wansung code charts

Following are the code charts for KS X 1001 in Wansung layout. Where a pair of hexadecimal numbers is given, the smaller is used when the set is encoded over GL, as in ISO-2022-KR after shifting to the Korean set, and the larger is used in the more typical case of the set being encoded over GR, as in EUC-KR or UHC. Johab changes the arrangement to encode all 11172 Hangul clusters separately and in order.
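The relationship between a row and cell pair and the two byte values can be sketched in Python as follows. The function name `ksx1001_bytes` is hypothetical; the only assumption is the usual ISO 2022 convention that GL bytes are the row and cell numbers plus 0x20, and GR bytes are the same with the high bit set (plus 0xA0).

```python
def ksx1001_bytes(row: int, cell: int, gr: bool = True) -> bytes:
    """Convert a KS X 1001 code point (row, cell) to its two-byte form.

    gr=True gives the GR form used by EUC-KR and UHC;
    gr=False gives the GL form used by ISO-2022-KR after shifting.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    base = 0xA0 if gr else 0x20
    return bytes([base + row, base + cell])


print(ksx1001_bytes(1, 1, gr=False).hex())  # 2121
print(ksx1001_bytes(1, 1).hex())            # a1a1
print(ksx1001_bytes(16, 1).decode("euc_kr"))  # 가, the first precomposed syllable
```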

Non-Hanja non-precomposed sets

Character set 0x21 / 0xA1 (row number 1, special characters)

This set contains punctuation and other symbols, excluding punctuation present in KS X 1003. Encodings which combine KS X 1001 with single-byte ASCII may use alternative Unicode mapping to the Halfwidth and Fullwidth Forms block for the backslash. Unicode mapping of the wave dash also differs between vendors, and may be U+301C or U+223C. Compare the similar but not identical handling of the JIS wave dash, and the handling of the tilde in the next row.
Except for the backslash, if two mappings are shown below, the first is used by Apple and the second is used by Microsoft.

Character set 0x22 / 0xA2 (row number 2, special characters)

This set contains additional punctuation and symbols. Similarly to the tilde character in the previous row, different mappings are used by Apple and Microsoft for the tilde character in this row, which is intended to be shown as a raised tilde, whereas the tilde in the previous row is intended to be shown in-line at dash height. Mapping of the circled dot also differs.
The euro and registered trademark sign were added in 1998, while the postal mark was added in 2002.

Character set 0x23 / 0xA3 (row number 3, basic Latin / ISO 646-KR)

This set corresponds to KS X 1003, but as two-byte codes preceded by 0x23. It includes the English alphabet / Basic Latin alphabet, western Arabic numerals and punctuation.
Compare the Roman set of JIS X 0201, which differs by including a Yen sign rather than a Won sign. Contrast the third rows of KPS 9566 and of JIS X 0208, which follow the ISO 646 layout but only include letters and digits.
Encodings such as EUC-KR and UHC combine KS X 1001 with single-byte ASCII or KS X 1003, and hence use alternative Unicode mappings to the Halfwidth and Fullwidth Forms block for the double-byte representations of these characters.
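The effect of this alternative mapping can be observed with Python's built-in `euc_kr` codec, which is used here only as one example of an implementation; other decoders may differ. The helper name `row3_char` is hypothetical.

```python
def row3_char(ascii_char: str) -> str:
    """Decode the row 3 double-byte counterpart of a printable ASCII character."""
    cell = ord(ascii_char) - 0x20  # 0x21..0x7E correspond to cells 1..94
    if not 1 <= cell <= 94:
        raise ValueError("expected a printable, non-space ASCII character")
    return bytes([0xA3, 0xA0 + cell]).decode("euc_kr")


for ch in "A1\\":
    mapped = row3_char(ch)
    print(f"{ch!r} -> U+{ord(mapped):04X}")
# 'A' -> U+FF21
# '1' -> U+FF11
# '\\' -> U+FFE6
```

The single-byte `A` decodes to U+0041, while its row 3 counterpart decodes to the fullwidth U+FF21, so the two representations remain distinguishable after conversion to Unicode. The position that holds the backslash in ASCII decodes to the fullwidth won sign.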

Character set 0x24 / 0xA4 (row number 4, Hangul jamo)

This set includes modern Hangul consonants, followed by vowels, both ordered by South Korean collation customs, followed by obsolete consonants. When used individually, these characters map to the Unicode Hangul Compatibility Jamo block, and do not have a one-to-one mapping with the position-specific characters in the Hangul Jamo block. Compare with row 4 of the North Korean KPS 9566. Character 04-52 is a Hangul Filler, used in combining sequences.

Character set 0x25 / 0xA5 (row number 5, Roman numerals and Greek)

This set contains Roman numerals and basic support for the Greek alphabet, without diacritics or the final sigma.
Contrast row 6 of KPS 9566, which includes the same characters but in a different layout.

Character set 0x26 / 0xA6 (row number 6, box drawing)

Character set 0x27 / 0xA7 (row number 7, unit symbols)

Character set 0x28 / 0xA8 (row number 8, extended Latin, encircled, fractions)

Character set 0x29 / 0xA9 (row number 9, extended Latin, encircled, superscript and subscript)

Character set 0x2A / 0xAA (row number 10, Hiragana)

This set contains Hiragana for writing the Japanese language.
Compare row 10 of KPS 9566, which uses the same layout. Compare and contrast row 4 of JIS X 0208, which also uses the same layout, but in a different row.

Character set 0x2B / 0xAB (row number 11, Katakana)

This set contains Katakana for writing the Japanese language.
Compare row 11 of KPS 9566, which uses the same layout. Compare and contrast row 5 of JIS X 0208, which also uses the same layout, but in a different row.

Character set 0x2C / 0xAC (row number 12, Cyrillic)

This set contains the modern Russian alphabet, and is not necessarily sufficient to represent other forms of the Cyrillic script.
Compare row 5 of KPS 9566 and row 7 of JIS X 0208, which use the same layout.

Pre-composed Hangul sets (rows number 16 through 40)

Code points for pre-composed Hangul are included in a continuous sorted block between code points 16-01 and 40-94 inclusive. Not all possible syllable clusters are included in this range; the chart below indicates, for each initial+vowel pair, which initial+vowel+final syllable clusters are assigned code points. Vowels, initials and finals are displayed in KS sorting order. The "ø" character is used here to denote the empty final. Compare the different ordering and availability in KPS 9566.
Those which are not listed here may be represented using eight-byte composition sequences. All other modern-jamo clusters are assigned codes elsewhere by UHC. All possible modern-jamo clusters are assigned codes by Johab.
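The size and ordering of this block can be checked with a short Python sketch that walks rows 16 through 40 and decodes each code point with the built-in `euc_kr` codec. The function name is hypothetical, and the codec is assumed to follow the standard KS X 1001 mapping.

```python
def precomposed_syllables() -> list:
    """Decode every code point from 16-01 through 40-94 in GR (EUC-KR) form."""
    syllables = []
    for row in range(16, 41):        # rows 16..40 inclusive
        for cell in range(1, 95):    # cells 1..94 inclusive
            raw = bytes([0xA0 + row, 0xA0 + cell])
            syllables.append(raw.decode("euc_kr"))
    return syllables


block = precomposed_syllables()
print(len(block))               # 2350, i.e. 25 rows of 94 cells
print(block == sorted(block))   # True: the block is in sorted order
print("똠" in block)            # False: needs a composition sequence or UHC
```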

Hanja sets

Johab encoding

KS X 1001, since 1992, also defines an alternative encoding known as Johab. This represents a hangul syllable as a sequence of three five-bit values, split across two 8-bit bytes, most significant bit first. The most significant bit of the lead byte is always set. This encoding is also used for the modern jamo from row 4 of KS X 1001, by using the filler values for the other components. The Johab encoding for hangul is shown in the table below.
Johab encodes the remainder of KS X 1001 using lead bytes which do not correspond to an initial jamo, with trail bytes in the ranges 0x31–0x7E and 0x91–0xFE. These codes are algorithmically mapped from the characters' KS X 1001 code points, with two KS X 1001 rows per lead byte.

Five-bit sequence	As initial	As vowel	As final
00000	Not used	Not used	Not used
00001	Filler	Not used	Filler
00010	ㄱ	Filler	ㄱ
00011	ㄲ	ㅏ	ㄲ
00100	ㄴ	ㅐ	ㄳ
00101	ㄷ	ㅑ	ㄴ
00110	ㄸ	ㅒ	ㄵ
00111	ㄹ	ㅓ	ㄶ
01000	ㅁ	Not used	ㄷ
01001	ㅂ	Not used	ㄹ
01010	ㅃ	ㅔ	ㄺ
01011	ㅅ	ㅕ	ㄻ
01100	ㅆ	ㅖ	ㄼ
01101	ㅇ	ㅗ	ㄽ
01110	ㅈ	ㅘ	ㄾ
01111	ㅉ	ㅙ	ㄿ
10000	ㅊ	Not used	ㅀ
10001	ㅋ	Not used	ㅁ
10010	ㅌ	ㅚ	Not used
10011	ㅍ	ㅛ	ㅂ
10100	ㅎ	ㅜ	ㅄ
10101	Not used	ㅝ	ㅅ
10110	Non-Hangul lead bytes	ㅞ	ㅆ
10111	Non-Hangul lead bytes	ㅟ	ㅇ
11000	Non-Hangul lead bytes	Not used	ㅈ
11001	Non-Hangul lead bytes	Not used	ㅊ
11010	Non-Hangul lead bytes	ㅠ	ㅋ
11011	Non-Hangul lead bytes	ㅡ	ㅌ
11100	Non-Hangul lead bytes	ㅢ	ㅍ
11101	Non-Hangul lead bytes	ㅣ	ㅎ
11110	Non-Hangul lead bytes	Not used	Not used
11111	Not used	Not used	Not used
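
The bit packing for Hangul syllables can be sketched in Python as follows. This is a minimal illustration, not a full Johab encoder: it handles only precomposed modern syllables, and the table and function names are hypothetical. The three lists hold the five-bit values for each initial, vowel and final, indexed in the order used by Unicode's syllable arithmetic.

```python
# Five-bit Johab values, indexed by Unicode initial / vowel / final index.
INITIAL_BITS = [i + 2 for i in range(19)]                  # ㄱ=00010 ... ㅎ=10100
VOWEL_BITS = [3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15,
              18, 19, 20, 21, 22, 23, 26, 27, 28, 29]      # ㅏ ... ㅣ
FINAL_BITS = [1] + list(range(2, 18)) + list(range(19, 30))  # filler, ㄱ ... ㅎ


def johab_encode_syllable(syllable: str) -> bytes:
    """Pack a modern Hangul syllable into its two-byte Johab code."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 11172:
        raise ValueError("not a modern precomposed Hangul syllable")
    initial, rest = divmod(index, 588)  # 588 = 21 vowels * 28 finals
    vowel, final = divmod(rest, 28)     # final 0 means "no final" (filler)
    code = (
        0x8000                          # high bit of the lead byte is always set
        | (INITIAL_BITS[initial] << 10)
        | (VOWEL_BITS[vowel] << 5)
        | FINAL_BITS[final]
    )
    return code.to_bytes(2, "big")


print(johab_encode_syllable("가").hex())  # 8861
print(johab_encode_syllable("한").hex())  # d065
```

For 한, the values are 10100 (ㅎ), 00011 (ㅏ) and 00101 (ㄴ), giving the bit string 1 10100 00011 00101, which is 0xD065.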
