Half-width kana


Half-width kana are katakana characters displayed at half their normal width, instead of the usual square aspect ratio. For example, the usual form of the katakana ka is カ while the half-width form is ｶ. Half-width hiragana are not encoded in Unicode, although they can be displayed on the Web or in e-books via CSS's font-feature-settings: "hwid" 1 with Adobe-Japan1-6 based OpenType fonts. Half-width kanji are not available on modern computers, although they are used on some receipt printers, electric bulletin boards, and older computers.
Half-width kana were used in the early days of Japanese computing, to allow Japanese characters to be displayed on the same grid as monospaced fonts of Latin characters. Half-width kanji were not used. Half-width kana characters are not generally used today, but find some use in specific settings, such as cash register displays, on shop receipts, Japanese digital television and DVD subtitles, and mailing address labels. Their usage is sometimes also a stylistic choice, particularly frequent in certain Internet slang.
The term "half-width kana", which strictly refers only to how kana are displayed, not how they are stored, is also used loosely to refer to the A0–DF block where katakana are stored in some character encodings, such as JIS X 0201 (see Encoding, below). This is formally incorrect, however: the JIS standard simply specifies that katakana be stored in these locations, without specifying how they should be displayed. The confusion arises because in early computing the characters stored here were in fact displayed as half-width kana (see Confusion, below).

History

Half-width kana and 2/3-width kana were already in use before the computer era. In the early computer era, ASCII was defined as a 7-bit character set with room for 128 characters. However, since this standard was designed for the United States, it did not contain symbols such as the yen sign needed to represent Japanese currency, nor did it include space for characters from other writing systems, such as kana or kanji; thus Japanese characters could not be encoded. Further, Japanese characters, both kana and kanji, are drawn on a square grid, while Latin characters are generally written more narrowly; thus Japanese characters could not be displayed either.
JIS X 0201 was developed in 1969, a time when computers were generally incapable, both by software design and hardware resources, of representing the thousands of Chinese-based kanji characters used in Japanese. As a compromise, this standard encoded katakana as a small set of characters, assigned to the upper half of the 8-bit byte range (0x80–0xFF), at 0xA1–0xDF. This allowed 8-bit processors to encode and process Japanese text phonetically, though without being able to process hiragana or kanji. These katakana characters were in turn displayed as "half-width kana": a new, unorthodox, narrower form designed to fit the same width as the monospaced Latin letters that machines were capable of printing and displaying. Encoding-wise, JIS X 0201 is a variant extension of ASCII: it includes additional characters, and does not exactly agree with ASCII on the overlapping part. Half-width kana were developed as "... the first Japanese characters encoded on computers because they are used for Japanese telegrams."
The Nationwide Banking Data Communication System, the largest funds transfer system in Japan, was established in 1973. Transaction messages between banks could use only Latin letters, numbers, and half-width katakana, and were limited to 20 characters. The system was superseded in 2018 by ZEDI, which can handle hiragana and kanji and allows variable-length text.
To make katakana fit into the narrower cell area allowed, some compromises were made. For example, the diacritical marks dakuten and handakuten are treated as separate characters instead of being part of the preceding character. This compromise led many to consider "half-width kana" visually unattractive, and causes problems for many computer programs today.
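One common way for present-day programs to cope with the separately encoded marks is Unicode compatibility normalization. The following is a minimal sketch using Python's standard unicodedata module; the helper name to_fullwidth is hypothetical and chosen only for illustration. NFKC normalization is broader than a pure kana conversion (it also rewrites full-width Latin letters, for instance), so it is shown here as one possible approach rather than the canonical one.

```python
import unicodedata


def to_fullwidth(text: str) -> str:
    """Fold half-width kana into full-width kana via NFKC (hypothetical helper).

    NFKC also rewrites other compatibility characters, such as
    full-width Latin letters, so it changes more than kana alone.
    """
    return unicodedata.normalize("NFKC", text)


# Half-width KA (U+FF76) followed by the separate half-width dakuten (U+FF9E):
# two code points for what a reader sees as the single syllable "ga".
half_ga = "\uff76\uff9e"
full_ga = to_fullwidth(half_ga)

print(len(half_ga), len(full_ga))  # 2 1
print(full_ga == "\u30ac")         # True: full-width GA as one code point
```

The length mismatch is the kind of problem the paragraph above alludes to: naive character counting, truncation, or searching treats the half-width voiced syllable as two characters.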
Another use of half-width kana is to save space. The Japanese version of Windows 95 used the half-width katakana of MS P Gothic in its user interface. These were later replaced by the full-width kana of MS UI Gothic, a font slightly narrower than MS P Gothic.

Encoding

In the JIS X 0201 specification, katakana are encoded in the A0–DF block; how they are displayed is not specified, and there is no separate encoding of full-width and half-width kana. In JIS X 0208, katakana, hiragana, and kanji are all encoded, though the ordering of the kana is different (see the hiragana and katakana section of JIS X 0208).
In Shift JIS, which combines JIS X 0201 and JIS X 0208, these encodings are stored separately, with JIS X 0201 all being displayed as half-width, while JIS X 0208 are all displayed as full-width. Thus in Shift JIS, Latin characters and katakana have two encodings with two separate display forms, both half-width and full-width.
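The double encoding can be observed with Python's built-in shift_jis codec. This is only an illustrative sketch; the variable names are arbitrary, and the characters are written as escapes so that the half-width and full-width forms cannot be confused in transit.

```python
half_ka = "\uff76"  # half-width katakana KA, from the JIS X 0201 part
full_ka = "\u30ab"  # full-width katakana KA, from the JIS X 0208 part

half_bytes = half_ka.encode("shift_jis")
full_bytes = full_ka.encode("shift_jis")

print(half_bytes.hex())  # b6   (one byte, inside the A0-DF block)
print(full_bytes.hex())  # 834a (two bytes)

# Latin letters are duplicated in the same way.
half_a = "A".encode("shift_jis")       # ordinary (half-width) Latin A
full_a = "\uff21".encode("shift_jis")  # full-width Latin A
print(len(half_a), len(full_a))        # 1 2
```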
In Unicode, katakana and hiragana are primarily encoded as normal, full-width characters; a separate block, the Halfwidth and Fullwidth Forms block, is used to store variant characters, including half-width kana and full-width Latin characters.
Thus, the katakana in JIS X 0201 and the corresponding part of derived encodings are displayed as half-width, while in Unicode half-width forms are specified separately.
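The separation in Unicode can be inspected through the character database that ships with Python. The sketch below uses the standard unicodedata module; the helper name describe is hypothetical. The East Asian Width property it prints is W for wide, H for halfwidth, F for fullwidth, and Na for narrow.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, East Asian Width, character name) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.east_asian_width(ch),
        unicodedata.name(ch),
    )


for ch in ("\u30ab", "\uff76", "A", "\uff21"):
    print(*describe(ch))
# U+30AB W KATAKANA LETTER KA
# U+FF76 H HALFWIDTH KATAKANA LETTER KA
# U+0041 Na LATIN CAPITAL LETTER A
# U+FF21 F FULLWIDTH LATIN CAPITAL LETTER A
```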

Half-width table

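"J" indicates the high four bits (the first hexadecimal digit) of the byte in JIS X 0201 and in other sets such as Shift JIS; "U" indicates the row in Unicode (the leading hexadecimal digits of the code point) in the Halfwidth and Fullwidth Forms block. The column heading gives the final hexadecimal digit in both cases.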
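J U   0 1 2 3 4 5 6 7 8 9 A B C D E F
A FF6   ｡ ｢ ｣ ､ ･ ｦ ｧ ｨ ｩ ｪ ｫ ｬ ｭ ｮ ｯ
B FF7 ｰ ｱ ｲ ｳ ｴ ｵ ｶ ｷ ｸ ｹ ｺ ｻ ｼ ｽ ｾ ｿ
C FF8 ﾀ ﾁ ﾂ ﾃ ﾄ ﾅ ﾆ ﾇ ﾈ ﾉ ﾊ ﾋ ﾌ ﾍ ﾎ ﾏ
D FF9 ﾐ ﾑ ﾒ ﾓ ﾔ ﾕ ﾖ ﾗ ﾘ ﾙ ﾚ ﾛ ﾜ ﾝ ﾞ ﾟ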

The blank first cell represents a non-existent character in JIS (A0), but in Unicode the corresponding position, U+FF60, holds the fullwidth right white parenthesis ｠.
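The row and column layout of the table implies a fixed offset between the JIS X 0201 byte and the Unicode code point: byte A1 corresponds to U+FF61, and each following byte to the next code point, up to DF and U+FF9F. The sketch below expresses that offset directly and cross-checks it against Python's shift_jis codec; the function name jis_to_unicode is hypothetical.

```python
def jis_to_unicode(byte: int) -> str:
    """Map a JIS X 0201 katakana byte (0xA1-0xDF) to its Unicode half-width form.

    Hypothetical helper: it applies the constant offset visible in the table.
    """
    if not 0xA1 <= byte <= 0xDF:
        raise ValueError(f"0x{byte:02X} is outside the katakana range 0xA1-0xDF")
    return chr(0xFF61 + (byte - 0xA1))


# Cross-check every byte in the range against the built-in Shift JIS decoder.
matches = all(
    bytes([b]).decode("shift_jis") == jis_to_unicode(b)
    for b in range(0xA1, 0xE0)
)
print(matches)                          # True
print(f"U+{ord(jis_to_unicode(0xB6)):04X}")  # U+FF76, half-width KA
```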

Half-width kana on the Internet

E-mail

Since the SMTP and NNTP protocols were formerly able to transmit only 7-bit data, the convention at the time was to use ISO-2022-JP for sending e-mail in Japanese.
Half-width kana are not included in ISO-2022-JP: it includes the Roman set of JIS X 0201, and all of JIS X 0208, but not the katakana set of JIS X 0201. Both sets of JIS X 0201 have ISO 2022 codes, but the ISO-2022-JP profile only includes the Roman set: this means that the format for including half-width katakana in ISO-2022-JP is well-defined, yet using it violates the ISO-2022-JP format. For this reason, if half-width kana were accidentally included in a message, the message could become garbled during transmission. The WHATWG encoding standard used by HTML5 permits decoding, but not encoding, of JIS X 0201 katakana in ISO-2022-JP as an extension to the format, and converts half-width katakana to their JIS X 0208 equivalents upon encoding.
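Python's iso2022_jp codec follows the strict profile, so it offers a convenient way to see the restriction. The sketch below first checks whether text fits the profile and, if not, folds half-width kana to full-width before encoding. The function names are hypothetical, and the use of NFKC normalization is an assumption made for brevity: it approximates the conversion to JIS X 0208 equivalents described above, but it is not the lookup procedure the WHATWG standard itself specifies, and it also alters other compatibility characters.

```python
import unicodedata


def fits_iso2022jp(text: str) -> bool:
    """Return True if text can be encoded in strict ISO-2022-JP."""
    try:
        text.encode("iso2022_jp")
        return True
    except UnicodeEncodeError:
        return False


def encode_iso2022jp(text: str) -> bytes:
    """Encode as ISO-2022-JP, folding half-width kana first if needed.

    Hypothetical helper; NFKC is a stand-in for a dedicated kana mapping.
    Characters that still do not fit raise UnicodeEncodeError.
    """
    if not fits_iso2022jp(text):
        text = unicodedata.normalize("NFKC", text)
    return text.encode("iso2022_jp")


print(fits_iso2022jp("\uff76"))  # False: half-width KA is outside the profile
print(fits_iso2022jp("\u30ab"))  # True: full-width KA is in JIS X 0208
print(encode_iso2022jp("\uff76") == "\u30ab".encode("iso2022_jp"))  # True
```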
This is no longer such a problem since most e-mail servers today use ESMTP, and hence 8-bit characters are acceptable. Alternatively, an encoding system such as Base64 can be used and specified in the message using MIME.

Web pages

The problem that exists in e-mail does not exist with Web pages since HTTP accepts 8-bit characters.
However, one problem that does exist is that computer programs have difficulty determining whether to interpret a byte sequence as Shift JIS, EUC-JP, or UTF-8; hence the character encoding should be specified with an HTTP response header or a meta tag.

Confusion

Strictly speaking, describing the JIS X 0201 encoding as "half-width katakana" is incorrect, as the standard does not define character widths; it defines only the code representation of katakana characters. In the JIS X 0201 standard itself, katakana characters are printed in normal width, not half-width.
Half-width display of these characters was used only in the period before full-width character displays became widespread. However, in the Shift JIS standard, which combines the JIS X 0201 standard and the JIS X 0208 standard, katakana and Latin characters are encoded twice, in both JIS X 0201 and JIS X 0208, and are displayed as half-width or full-width according to which section they are in. Thus the 0201 katakana block can be thought of as corresponding to "half-width kana", and the misunderstanding that the 0201 standard defines "half-width" characters is widespread.
Further, though JIS X 0201 is a single-byte encoding and JIS X 0208 is a double-byte encoding, there is no inherent connection between the number of bytes and the display width. For example, when Unicode is encoded as UTF-32, every character occupies four bytes, whether it is displayed full-width or half-width.
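The lack of any fixed relation between byte count and width can be checked by encoding the same two characters in several encodings. This is an illustrative sketch using Python's built-in codecs; the helper name byte_lengths is hypothetical.

```python
ENCODINGS = ("shift_jis", "euc_jp", "utf-8", "utf-32-le")


def byte_lengths(ch: str) -> dict[str, int]:
    """Return the encoded length in bytes of one character per encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


half_ka = "\uff76"  # half-width katakana KA
full_ka = "\u30ab"  # full-width katakana KA

print(byte_lengths(half_ka))
# {'shift_jis': 1, 'euc_jp': 2, 'utf-8': 3, 'utf-32-le': 4}
print(byte_lengths(full_ka))
# {'shift_jis': 2, 'euc_jp': 2, 'utf-8': 3, 'utf-32-le': 4}
```

Only in Shift JIS does the half-width form happen to take fewer bytes than the full-width form; in the other three encodings the two take the same number of bytes.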

In popular culture

Half-width kana characters are familiar to the general public from the Matrix trilogy, directed by the Wachowskis. The 'falling code' of the three films is composed of half-width kana characters and Latin numerals.