IEEE 754-1985
IEEE 754-1985 was an industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by the minor revision IEEE 754-2019. During its 23 years as the current standard, it was the most widely used format for floating-point computation. It was implemented in software, in the form of floating-point libraries, and in hardware, in the instructions of many CPUs and FPUs. The first integrated circuit to implement the draft of what was to become IEEE 754-1985 was the Intel 8087.
IEEE 754-1985 represents numbers in binary, providing definitions for four levels of precision, of which the two most commonly used are:
Level | Width | Range at full precision | Precision |
Single precision | 32 bits | ±1.18×10⁻³⁸ to ±3.4×10³⁸ | Approximately 7 decimal digits |
Double precision | 64 bits | ±2.23×10⁻³⁰⁸ to ±1.80×10³⁰⁸ | Approximately 16 decimal digits |
The standard also defines representations for positive and negative infinity, a "negative zero", five exceptions to handle invalid results like division by zero, special values called NaNs for representing those exceptions, denormal numbers to represent numbers smaller than shown above, and four rounding modes.
Representation of numbers
Floating-point numbers in IEEE 754 format consist of three fields: a sign bit, a biased exponent, and a fraction. The following example illustrates the meaning of each.

The decimal number 0.15625 represented in binary is 0.00101₂. Analogous to scientific notation, where numbers are written to have a single non-zero digit to the left of the decimal point, we rewrite this number so it has a single 1 bit to the left of the "binary point". We simply multiply by the appropriate power of 2 to compensate for shifting the bits left by three positions:

0.00101₂ = 1.01₂ × 2⁻³

Now we can read off the fraction and the exponent: the fraction is .01₂ and the exponent is −3.
The three fields in the IEEE 754 representation of this number are:
- sign = 0, because the number is positive
- biased exponent = −3 + the bias (127 in single precision), i.e. 124 = 0111 1100₂
- fraction = .01000…₂
IEEE 754 adds a bias to the exponent so that numbers can in many cases be compared conveniently by the same hardware that compares signed 2's-complement integers. Using a biased exponent, the lesser of two positive floating-point numbers will come out "less than" the greater following the same ordering as for sign and magnitude integers. If two floating-point numbers have different signs, the sign-and-magnitude comparison also works with biased exponents. However, if both biased-exponent floating-point numbers are negative, then the ordering must be reversed. If the exponent were represented as, say, a 2's-complement number, comparison to see which of two numbers is greater would not be as convenient.
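This ordering property can be checked directly by reinterpreting the bits of two positive floats as unsigned integers; a quick Python sketch using the standard struct module:

```python
import struct

def f32_bits(x):
    """Reinterpret a single-precision float as its raw 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# For positive floats, the biased-exponent layout makes the raw bit
# patterns sort in the same order as the values themselves.
a, b = 1.5, 2.25
print(a < b)                      # True
print(f32_bits(a) < f32_bits(b))  # True: unsigned integer compare agrees
```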
The leading 1 bit is omitted: since all numbers except zero start with a leading 1, the leading 1 is implicit and does not actually need to be stored, which gives an extra bit of precision for "free."
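The three fields can be extracted programmatically; here is a small Python sketch (using the standard struct module) that decodes the 0.15625 example above:

```python
import struct

def f32_fields(x):
    """Split a single-precision float into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

# 0.15625 = 1.01b × 2^-3: sign 0, biased exponent -3 + 127 = 124,
# fraction .01b = bits 01 followed by 21 zero bits = 2**21
print(f32_fields(0.15625))  # (0, 124, 2097152)
```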
Zero
The number zero is represented specially: the sign bit may be 0 or 1 (giving positive and negative zero), while both the biased exponent and the fraction are all 0 bits.

Denormalized numbers
The number representations described above are called normalized, meaning that the implicit leading binary digit is a 1. To reduce the loss of precision when an underflow occurs, IEEE 754 includes the ability to represent fractions smaller than are possible in the normalized representation, by making the implicit leading digit a 0. Such numbers are called denormal. They don't include as many significant digits as a normalized number, but they enable a gradual loss of precision when the result of an arithmetic operation is not exactly zero but is too close to zero to be represented by a normalized number.

A denormal number is represented with a biased exponent of all 0 bits, which represents an exponent of −126 in single precision, or −1022 in double precision. In contrast, the smallest biased exponent representing a normal number is 1.
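The smallest single-precision denormal can be constructed by setting the biased exponent to all 0 bits and the fraction to its lowest bit; a Python sketch:

```python
import struct

# Biased exponent all 0s, fraction = 1: the smallest positive denormal,
# which is 2**-23 × 2**-126 = 2**-149
smallest = struct.unpack(">f", struct.pack(">I", 1))[0]
print(smallest)               # ≈ 1.4e-45
print(smallest == 2.0**-149)  # True
```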
Representation of non-numbers
The biased-exponent field is filled with all 1 bits to indicate either infinity or an invalid result of a computation.

Positive and negative infinity
Positive and negative infinity are represented with the sign bit 0 or 1 respectively, a biased exponent of all 1 bits, and a fraction of all 0 bits.

NaN
Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have the sign bit 0 or 1, a biased exponent of all 1 bits, and a fraction that is anything except all 0 bits (which would instead represent infinity).

Range and precision
Precision is defined as the minimum difference between two successive mantissa representations; thus it is a function only of the mantissa, while the gap is defined as the difference between two successive numbers.

Single precision
Single-precision numbers occupy 32 bits. In single precision:
- The positive and negative numbers closest to zero are ±2⁻²³ × 2⁻¹²⁶ ≈ ±1.40130×10⁻⁴⁵
- The positive and negative normalized numbers closest to zero are ±1 × 2⁻¹²⁶ ≈ ±1.17549×10⁻³⁸
- The finite positive and finite negative numbers furthest from zero are ±(2−2⁻²³) × 2¹²⁷ ≈ ±3.40282×10³⁸
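These boundary values, and the all-1s exponent pattern for infinity, can be checked by constructing the bit patterns directly; a Python sketch:

```python
import struct

def f32_from_bits(bits):
    """Build a single-precision float from a raw 32-bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(f32_from_bits(0x00000001))  # ≈ 1.40130e-45, smallest positive denormal
print(f32_from_bits(0x00800000))  # ≈ 1.17549e-38, smallest positive normal
print(f32_from_bits(0x7F7FFFFF))  # ≈ 3.40282e38, largest finite value
print(f32_from_bits(0x7F800000))  # inf: exponent all 1s, fraction all 0s
```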
Actual exponent | Biased exponent | Minimum | Maximum | Gap |
−1 | 126 | 0.5 | ≈ 0.999999940395 | ≈ 5.96046e-8 |
0 | 127 | 1 | ≈ 1.999999880791 | ≈ 1.19209e-7 |
1 | 128 | 2 | ≈ 3.999999761581 | ≈ 2.38419e-7 |
2 | 129 | 4 | ≈ 7.999999523163 | ≈ 4.76837e-7 |
10 | 137 | 1024 | ≈ 2047.999877930 | ≈ 1.22070e-4 |
11 | 138 | 2048 | ≈ 4095.999755859 | ≈ 2.44141e-4 |
23 | 150 | 8388608 | 16777215 | 1 |
24 | 151 | 16777216 | 33554430 | 2 |
127 | 254 | ≈ 1.70141e38 | ≈ 3.40282e38 | ≈ 2.02824e31 |
As an example, 16,777,217 cannot be encoded as a 32-bit float, as it will be rounded to 16,777,216. This shows why floating-point arithmetic is unsuitable for accounting software. However, all integers within the representable range that are a power of 2 can be stored in a 32-bit float without rounding.
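The rounding of 16,777,217 can be observed by round-tripping values through a 32-bit float; a Python sketch:

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(to_f32(16777216.0))  # 16777216.0: 2**24, exactly representable
print(to_f32(16777217.0))  # 16777216.0: 2**24 + 1 needs 25 significant bits
print(to_f32(2.0**100))    # exact: powers of 2 survive the round trip
```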
Double precision
Double-precision numbers occupy 64 bits. In double precision:
- The positive and negative numbers closest to zero are ±2⁻⁵² × 2⁻¹⁰²² ≈ ±4.94066×10⁻³²⁴
- The positive and negative normalized numbers closest to zero are ±1 × 2⁻¹⁰²² ≈ ±2.22507×10⁻³⁰⁸
- The finite positive and finite negative numbers furthest from zero are ±(2−2⁻⁵²) × 2¹⁰²³ ≈ ±1.79769×10³⁰⁸
Actual exponent | Biased exponent | Minimum | Maximum | Gap |
−1 | 1022 | 0.5 | ≈ 0.999999999999999888978 | ≈ 1.11022e-16 |
0 | 1023 | 1 | ≈ 1.999999999999999777955 | ≈ 2.22045e-16 |
1 | 1024 | 2 | ≈ 3.999999999999999555911 | ≈ 4.44089e-16 |
2 | 1025 | 4 | ≈ 7.999999999999999111822 | ≈ 8.88178e-16 |
10 | 1033 | 1024 | ≈ 2047.999999999999772626 | ≈ 2.27374e-13 |
11 | 1034 | 2048 | ≈ 4095.999999999999545253 | ≈ 4.54747e-13 |
52 | 1075 | 4503599627370496 | 9007199254740991 | 1 |
53 | 1076 | 9007199254740992 | 18014398509481982 | 2 |
1023 | 2046 | ≈ 8.98847e307 | ≈ 1.79769e308 | ≈ 1.99584e292 |
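The "Gap" column corresponds to the unit in the last place (ulp). Python's math.ulp (available from Python 3.9) reports it directly, and counting ulp steps between two doubles via their bit patterns is one way to quantify how far apart they are; a sketch:

```python
import math
import struct

# Spacing between adjacent doubles, matching the "Gap" column above
print(math.ulp(1.0))      # 2.220446049250313e-16  (2**-52)
print(math.ulp(2048.0))   # 4.547473508864641e-13  (2**-41)
print(math.ulp(2.0**52))  # 1.0: from here up, consecutive doubles differ by 1

def ulp_distance(a, b):
    """Steps between two finite, same-sign doubles, via their bit patterns."""
    to_bits = lambda v: struct.unpack(">q", struct.pack(">d", v))[0]
    return abs(to_bits(a) - to_bits(b))

print(ulp_distance(0.1 + 0.2, 0.3))  # 1: unequal, but only one step apart
```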
Extended formats
The standard also recommends an extended format to be used to perform internal computations at a higher precision than that required for the final result, to minimise round-off errors: the standard only specifies minimum precision and exponent requirements for such formats. The x87 80-bit extended format is the most commonly implemented extended format that meets these requirements.

Examples
Here are some examples of single-precision IEEE 754 representations:

Comparing floating-point numbers
Every possible bit combination is either a NaN or a number with a unique value in the affinely extended real number system with its associated order, except for the two combinations of bits for negative zero and positive zero, which sometimes require special attention. The binary representation has the special property that, excluding NaNs, any two numbers can be compared as sign-and-magnitude integers. When comparing as 2's-complement integers: if the sign bits differ, the negative number precedes the positive number, so 2's complement gives the correct result. If both values are positive, the 2's-complement comparison again gives the correct result. Otherwise, the correct FP ordering is the opposite of the 2's-complement ordering.

Rounding errors inherent to floating-point calculations may limit the use of comparisons for checking the exact equality of results. Choosing an acceptable range is a complex topic. A common technique is to use a comparison epsilon value to perform approximate comparisons. Depending on how lenient the comparisons are, common values include 1e-6 or 1e-5 for single precision, and 1e-14 for double precision. Another common technique is ULP, which checks what the difference is in the last-place digits, effectively checking how many steps away the two values are.

Although negative zero and positive zero are generally considered equal for comparison purposes, some programming language relational operators and similar constructs treat them as distinct. According to the Java Language Specification, comparison and equality operators treat them as equal, but Math.min and Math.max distinguish them, as do the comparison methods equals, compareTo and even compare of classes Float and Double.

Rounding floating-point numbers
The IEEE standard has four different rounding modes; the first is the default, and the others are called directed roundings.
- Round to Nearest - rounds to the nearest value; if the number falls midway it is rounded to the nearest value with an even least significant bit, which means it is rounded up 50% of the time
- Round toward 0 - directed rounding towards zero
- Round toward +∞ - directed rounding towards positive infinity
- Round toward −∞ - directed rounding towards negative infinity
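The default Round to Nearest (ties to even) mode is visible in ordinary double arithmetic; a small Python sketch:

```python
# Python's round() uses the same ties-to-even rule as IEEE 754's default mode.
print(round(0.5))  # 0: the tie goes to the even integer
print(round(1.5))  # 2
print(round(2.5))  # 2

# The rule also governs arithmetic itself: 2**53 + 1 lies exactly halfway
# between the adjacent doubles 2**53 and 2**53 + 2, and the tie is resolved
# toward the value whose least significant significand bit is even.
print(2.0**53 + 1.0 == 2.0**53)  # True
```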
Extending the real numbers
Functions and predicates
Standard operations
The following functions must be provided:
- Add, subtract, multiply, divide
- Square root
- Floating-point remainder. This is not like a normal modulo operation; it can be negative for two positive numbers. It returns the exact value of x − (round(x/y) · y).
- Round to nearest integer. For undirected rounding when halfway between two integers the even integer is chosen.
- Comparison operations. Besides the more obvious results, IEEE 754 defines that −∞ = −∞, +∞ = +∞ and x ≠ NaN for any x (including NaN).

Recommended functions and predicates
- copysign(x, y) returns x with the sign of y, so abs(x) equals copysign(x, 1.0). This is one of the few operations which operates on a NaN in a way resembling arithmetic. The function copysign is new in the C99 standard.
- −x returns x with the sign reversed. This is different from 0−x in some cases, notably when x is zero: −(+0) is −0, but the sign of 0−0 depends on the rounding mode.
- scalb(y, N) returns y × 2^N for integer values of N.
- logb(x) returns the exponent of x.
- finite(x), a predicate for "x is a finite value", equivalent to −Inf < x < Inf.
- isnan(x), a predicate for "x is a NaN", equivalent to "x ≠ x".
- x <> y, which is true when x < y or x > y; it turns out to have different exception behavior than NOT(x = y).
- unordered(x, y) is true when "x is unordered with y", i.e., either x or y is a NaN.
- class(x) classifies x (as a zero, denormal, normal, infinity, or NaN, with its sign).
- nextafter(x, y) returns the next representable value from x in the direction towards y.

History
The work within Intel worried other vendors, who set up a standardization effort to ensure a 'level playing field'. Kahan attended the second IEEE 754 standards working group meeting, held in November 1977. Here, he received permission from Intel to put forward a draft proposal based on the standard arithmetic part of their design for a coprocessor. The arguments over gradual underflow lasted until 1981 when an expert hired by DEC to assess it sided against the dissenters.
Even before it was approved, the draft standard had been implemented by a number of manufacturers. The Intel 8087, which was announced in 1980, was the first chip to implement the draft standard.