C character classification


C character classification is a group of operations in the ANSI C Standard Library for the C programming language. These functions test whether a character belongs to a particular class of characters, such as alphabetic characters or control characters. Both single-byte and wide characters are supported.

History

Early C-language programmers working on the Unix operating system developed programming idioms for classifying characters into different types. For example, for the ASCII character set, the following expression identifies a letter, when its value is true:
('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z')
As this may be expressed in multiple formulations, it became desirable to introduce short, standardized forms of such tests that were placed in the system-wide header file ctype.h.

Implementation

Unlike the above example, the character classification routines are usually not written as comparison tests. In most C libraries, they are implemented as lookups in a static table, typically wrapped in a macro or a small function.
For example, the library creates an array of 256 eight-bit integers used as sets of bit flags, where each bit corresponds to a particular property of the character, e.g., isdigit, isalpha. If the lowest-order bit of each integer corresponds to the isdigit property, the code could be written as
#define isdigit(x) (TABLE[x] & 1)
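The table-based scheme described above can be sketched as follows. This is a minimal illustration, not the code of any particular C library: the names my_table, my_table_init, my_isdigit, my_isalpha and the MY_* flag bits are hypothetical, the table is filled only for ASCII (where digits and letters are contiguous), and unlike the real ctype.h routines it does not accept EOF.

```c
#include <limits.h>

/* Hypothetical property bits; real libraries use their own names and layouts. */
enum {
    MY_DIGIT = 1 << 0,  /* lowest-order bit: decimal digit */
    MY_UPPER = 1 << 1,  /* uppercase letter */
    MY_LOWER = 1 << 2   /* lowercase letter */
};

/* One byte of property flags for every possible unsigned char value. */
static unsigned char my_table[UCHAR_MAX + 1];

/* Fill the table once before use. Assumes an ASCII-compatible character
   set, in which 'A'..'Z' and 'a'..'z' are contiguous ranges. */
static void my_table_init(void)
{
    int c;
    for (c = '0'; c <= '9'; c++) my_table[c] |= MY_DIGIT;
    for (c = 'A'; c <= 'Z'; c++) my_table[c] |= MY_UPPER;
    for (c = 'a'; c <= 'z'; c++) my_table[c] |= MY_LOWER;
}

/* Each test is a single table lookup and a bit mask; the argument
   appears only once in the expansion, so it is evaluated only once. */
#define my_isdigit(c) (my_table[(unsigned char)(c)] & MY_DIGIT)
#define my_isalpha(c) (my_table[(unsigned char)(c)] & (MY_UPPER | MY_LOWER))
```

After calling my_table_init(), an expression such as my_isdigit('7') yields a nonzero value and my_isdigit('a') yields zero. Adding a property only requires another flag bit, which is one reason a lookup table scales better than a chain of comparisons.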
Early versions of Linux used a potentially faulty method similar to the first code sample:
#define isdigit(x) ((x) >= '0' && (x) <= '9')
This can cause problems if the argument x has a side effect. For example, if one calls isdigit(x++) or isdigit(run_some_program()), it is not immediately evident that the argument to isdigit is evaluated twice. For this reason, the table-based approach is generally used.
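The double-evaluation hazard can be made visible with a small experiment. The sketch below is illustrative only: UNSAFE_ISDIGIT, function_isdigit, next_char and the two counting helpers are hypothetical names, and the comparison macro is written the way such macros are commonly written rather than copied from any real library.

```c
/* Hypothetical comparison-based macro: its argument appears twice. */
#define UNSAFE_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

/* A plain function evaluates its argument exactly once. */
static int function_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

static int calls;  /* number of times next_char() has run */

/* An argument expression with a side effect: it counts the call,
   returns the current character, and advances the pointer. */
static int next_char(const char **p)
{
    calls++;
    return *(*p)++;
}

/* How many times does the macro evaluate its argument for string s? */
static int count_macro_evaluations(const char *s)
{
    const char *p = s;
    calls = 0;
    (void)UNSAFE_ISDIGIT(next_char(&p));
    return calls;
}

/* The same measurement for the function version. */
static int count_function_evaluations(const char *s)
{
    const char *p = s;
    calls = 0;
    (void)function_isdigit(next_char(&p));
    return calls;
}
```

With the string "5x", the macro calls next_char twice: the first comparison sees '5', the second sees 'x', so the test both consumes two characters and reports that '5' is not a digit. With "!", the first comparison fails and the && operator short-circuits, so the argument is evaluated only once. The function version always evaluates its argument once.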

Overview of functions

The functions that operate on single-byte characters are declared in the ctype.h header file.
The functions that operate on wide characters are declared in the wctype.h header file.
The classification is evaluated according to the current locale (the LC_CTYPE category).
Byte character   Wide character   Description
isalnum          iswalnum         checks whether the operand is alphanumeric
isalpha          iswalpha         checks whether the operand is alphabetic
islower          iswlower         checks whether the operand is lowercase
isupper          iswupper         checks whether the operand is uppercase
isdigit          iswdigit         checks whether the operand is a digit
isxdigit         iswxdigit        checks whether the operand is a hexadecimal digit
iscntrl          iswcntrl         checks whether the operand is a control character
isgraph          iswgraph         checks whether the operand is a graphical character
isspace          iswspace         checks whether the operand is a whitespace character
isblank          iswblank         checks whether the operand is a blank character
isprint          iswprint         checks whether the operand is a printable character
ispunct          iswpunct         checks whether the operand is punctuation
tolower          towlower         converts the operand to lowercase
toupper          towupper         converts the operand to uppercase
N/A              iswctype         checks whether the operand falls into a specific class
N/A              towctrans        converts the operand using a specific mapping
N/A              wctype           returns a wide character class to be used with iswctype
N/A              wctrans          returns a transformation mapping to be used with towctrans
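A short usage sketch may help tie the functions listed above together. The helper names count_digits, upcase, count_in_class and map_wide are hypothetical; the library calls are the standard ones. Two conventions are worth noting: a plain char is cast to unsigned char before being passed to a ctype.h function, because passing a negative value other than EOF is undefined behavior, and wctype and wctrans return zero when the requested name is not valid in the current locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Count the decimal digits in a byte string. */
static size_t count_digits(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (isdigit((unsigned char)*s))
            n++;
    return n;
}

/* Convert a byte string to uppercase in place. */
static void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Count the wide characters that belong to a class chosen by name,
   e.g. "alpha" or "digit", using wctype() and iswctype(). */
static size_t count_in_class(const wchar_t *ws, const char *class_name)
{
    wctype_t cls = wctype(class_name);
    size_t n = 0;
    if (cls == (wctype_t)0)
        return 0;  /* class name not valid in the current locale */
    for (; *ws != L'\0'; ws++)
        if (iswctype((wint_t)*ws, cls))
            n++;
    return n;
}

/* Apply a mapping chosen by name, e.g. "toupper" or "tolower",
   using wctrans() and towctrans(). */
static wint_t map_wide(wint_t wc, const char *mapping_name)
{
    wctrans_t mapping = wctrans(mapping_name);
    if (mapping == (wctrans_t)0)
        return wc;  /* unknown mapping: return the character unchanged */
    return towctrans(wc, mapping);
}
```

In the default "C" locale, count_digits("a1b22") returns 3 and map_wide(L'a', "toupper") returns L'A'. Results for characters outside the basic character set depend on the locale selected with setlocale.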