Integer literal


In computer science, an integer literal is a kind of literal for an integer whose value is directly represented in source code. For example, in the assignment statement x = 1, the string 1 is an integer literal indicating the value 1, while in the statement x = 0x10 the string 0x10 is an integer literal indicating the value 16, which is represented by 10 in hexadecimal.
By contrast, in x = cos(0), the expression cos(0) evaluates to 1, but the value 1 is not literally included in the source code. More simply, in x = 2 + 2, the expression 2 + 2 evaluates to 4, but the value 4 is not literally included. Further, in x = "1" the "1" is a string literal, not an integer literal, because it is in quotes. The content of the string is 1, which happens to be the text of an integer, but recognizing this requires semantic analysis of the string literal; at the syntactic level "1" is simply a string, no different from "foo".
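The distinction can be observed in a parse tree. The following is a minimal sketch using Python's standard ast module; the helper name classify_rhs is hypothetical and it assumes the input is a single assignment statement.

```python
import ast


def classify_rhs(statement: str) -> str:
    """Classify the right-hand side of a single assignment statement."""
    value = ast.parse(statement).body[0].value
    if isinstance(value, ast.Constant):
        # bool is a subclass of int, so compare the exact type.
        if type(value.value) is int:
            return "integer literal"
        if isinstance(value.value, str):
            return "string literal"
        return "other literal"
    # Calls, arithmetic, names, and so on are expressions, not literals.
    return "expression"
```

Under these assumptions, classify_rhs("x = 0x10") returns "integer literal", classify_rhs('x = "1"') returns "string literal", and classify_rhs("x = 2 + 2") returns "expression".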

Parsing

Recognizing a string as an integer literal is part of the lexical analysis phase, while evaluating the literal to its value is part of the semantic analysis phase. Within the lexer and phrase grammar, the token class is often denoted integer, with the lowercase indicating a lexical-level token class, as opposed to a phrase-level production rule. Lexing a string as an integer literal establishes only its token class; its value cannot be determined syntactically, so evaluating it is a semantic question.
Integer literals are generally lexed with regular expressions, as in Python.
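As an illustration, a token pattern for integer literals might look like the following. This is a sketch modeled loosely on Python's literal syntax, not the pattern any particular lexer uses; the names INTEGER and is_integer_literal are hypothetical.

```python
import re

# Hypothetical token pattern. Prefixed forms are listed before plain decimal.
# An optional single underscore is allowed between digits.
INTEGER = re.compile(
    r"""
    0[xX](?:_?[0-9a-fA-F])+   # hexadecimal, e.g. 0x10
  | 0[oO](?:_?[0-7])+         # octal, e.g. 0o17
  | 0[bB](?:_?[01])+          # binary, e.g. 0b1010
  | [1-9](?:_?[0-9])*         # nonzero decimal, e.g. 42
  | 0(?:_?0)*                 # zero, possibly written with several zeros
    """,
    re.VERBOSE,
)


def is_integer_literal(text: str) -> bool:
    """Return True if the whole string matches the integer token pattern."""
    return INTEGER.fullmatch(text) is not None
```

With this pattern, is_integer_literal("0x10") is true, while is_integer_literal('"1"') and is_integer_literal("2 + 2") are false. The pattern only recognizes the token; it says nothing about the value.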

Evaluation

As with other literals, integer literals are generally evaluated at compile time, as part of the semantic analysis phase. In some cases this semantic analysis is done in the lexer, immediately on recognition of an integer literal, while in other cases this is deferred until the parsing stage, or until after the parse tree has been completely constructed. For example, on recognizing the string 0x10 the lexer could immediately evaluate this to 16 and store that, or defer evaluation and instead record a token of type integer and value 0x10.
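The two strategies can be sketched as follows. The Token class and the function names are hypothetical, and the sketch relies on Python's built-in int with base 0, which infers the base from a 0x, 0o, or 0b prefix.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str             # token class, e.g. "integer"
    text: str             # the lexeme exactly as written in the source
    value: object = None  # filled in once the literal has been evaluated


def lex_eager(lexeme: str) -> Token:
    """Evaluate the literal as soon as it is recognized."""
    return Token("integer", lexeme, int(lexeme, 0))


def lex_deferred(lexeme: str) -> Token:
    """Record only the lexeme; a later phase computes the value."""
    return Token("integer", lexeme)


def evaluate(token: Token) -> Token:
    """Later semantic step that fills in the value of a deferred token."""
    if token.value is None:
        token.value = int(token.text, 0)
    return token
```

Here lex_eager("0x10") stores the value 16 immediately, whereas lex_deferred("0x10") keeps only the text 0x10 until evaluate is called. Keeping the original text can be useful for error messages or for tools that need to reproduce the source exactly.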
Once literals have been evaluated, further semantic analysis in the form of constant folding is possible, meaning that constant expressions built from literal values can also be evaluated at compile time. For example, in the statement x = 2 + 2, after the literals have been evaluated and the expression 2 + 2 has been parsed, the expression can be evaluated to 4, though the value 4 does not itself appear as a literal.
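A minimal sketch of constant folding over a parse tree, using Python's standard ast module (ast.unparse requires Python 3.9 or later). The names FOLDABLE, ConstantFolder, and fold_constants are hypothetical, and only integer addition, subtraction, and multiplication are handled.

```python
import ast
import operator

# Operators this sketch knows how to fold; anything else is left alone.
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantFolder(ast.NodeTransformer):
    """Replace `integer op integer` with a single constant node."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold the operands first (bottom-up)
        fold = FOLDABLE.get(type(node.op))
        if (
            fold is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and type(node.left.value) is int
            and type(node.right.value) is int
        ):
            folded = ast.Constant(fold(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold_constants(source: str) -> str:
    """Parse source, fold integer constant expressions, and unparse."""
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

With this sketch, fold_constants("x = 2 + 2") returns "x = 4", while fold_constants("x = y + 2") is returned unchanged because y is not a literal.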

Affixes

Integer literals frequently have prefixes indicating base, and less frequently suffixes indicating type. For example, in C++ 0x10ULL indicates the value 16 as an unsigned long long integer.
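Common prefixes include 0x or 0X for hexadecimal (base 16); 0, 0o, or 0O for octal (base 8); and 0b or 0B for binary (base 2).
Common suffixes include l or L for long integer, ll or LL for long long integer, and u or U for unsigned integer.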
These affixes are somewhat similar to sigils, though sigils attach to identifiers, not literals.
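The following sketch splits a C-like literal into base, value, and type suffix. It is illustrative only: the names LITERAL, BASES, and split_affixes are hypothetical, it accepts a u suffix only before l or ll, and real languages differ in which affixes and orderings they permit.

```python
import re

# Hypothetical C-like shape: optional base prefix, digits, optional suffix.
LITERAL = re.compile(
    r"(?P<prefix>0[xX]|0[bB]|0[oO])?"
    r"(?P<digits>[0-9a-fA-F]+?)"
    r"(?P<suffix>[uU]?(?:ll|LL|l|L)?)"
)

BASES = {"0x": 16, "0b": 2, "0o": 8}


def split_affixes(literal: str) -> tuple:
    """Return (base, value, suffix) for a C-like integer literal."""
    match = LITERAL.fullmatch(literal)
    if match is None:
        raise ValueError(f"not an integer literal: {literal!r}")
    prefix = (match["prefix"] or "").lower()
    digits = match["digits"]
    suffix = match["suffix"].lower()
    if prefix:
        base = BASES[prefix]
    elif len(digits) > 1 and digits[0] == "0":
        base = 8  # C-style leading zero marks octal
    else:
        base = 10
    # int() raises ValueError if a digit is not valid in the chosen base.
    return base, int(digits, base), suffix
```

For the literal mentioned above, split_affixes("0x10ULL") returns (16, 16, "ull"): base 16, value 16, and the suffix for unsigned long long.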

Digit separators

In some languages, integer literals may contain digit separators to allow digit grouping into more legible forms. Where this is available, it can usually be done for floating-point literals as well. This is particularly useful for bit fields, and makes it easier to see the size of large numbers at a glance by subitizing rather than counting digits. It is also useful for numbers that are typically grouped, such as credit card numbers or Social Security numbers. In languages that allow consecutive separators, very long numbers can be further grouped by doubling up separators.
Typically, decimal numbers are grouped in three-digit groups, binary numbers in four-digit groups, and hexadecimal numbers in two-digit groups. Numbers from other systems are grouped following whatever convention is in use.
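These grouping conventions can be sketched with a small helper that inserts a separator between fixed-size groups, counting from the right. The function name group_digits is hypothetical; the separator defaults to the underscore but can be any character.

```python
def group_digits(digits: str, size: int, sep: str = "_") -> str:
    """Insert sep between groups of `size` digits, counting from the right."""
    head = len(digits) % size  # length of the short leading group, if any
    groups = [digits[:head]] if head else []
    groups += [digits[i:i + size] for i in range(head, len(digits), size)]
    return sep.join(groups)
```

For example, group_digits("1000000", 3) gives "1_000_000", group_digits("010011000110", 4) gives "0100_1100_0110", and group_digits("DEADBEEF", 2) gives "DE_AD_BE_EF".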

Examples

In Ada, C#, D, Eiffel, Haskell, Java, Julia, Perl, Python, Ruby, Rust and Swift, the digits of integer literals and floating-point literals can be separated with an underscore. There can be some restrictions on placement; for example, in Java underscores cannot appear at the start or end of the literal, nor next to a decimal point. While the period, comma, and space are used in normal writing for digit separation, these conflict with their existing use in programming languages as radix point, list separator, and token separator.
Examples in Java include:

int oneMillion = 1_000_000;
long creditCardNumber = 1234_5678_9012_3456L;
int socialSecurityNumber = 123_45_6789;

In C++14, the apostrophe character may be used to separate digits arbitrarily in numeric literals. The underscore was proposed first, following other languages, in 1993 and again for C++11. However, this conflicted with user-defined literals, so the apostrophe was proposed instead, as an "upper comma".

auto integer_literal = 1'000'000;
auto binary_literal = 0b0100'1100'0110;
auto very_long_binary_literal =
    0b0000'0001'0010'0011'0100'0101'0110'0111;