Historically systems implemented floating point with a coprocessor rather than as an integrated unit. This could be a single integrated circuit, an entire circuit board or a cabinet. Where floating-point calculation hardware has not been provided, floating-point calculations are done in software, which takes more processor time, but avoids the cost of the extra hardware. For a particular computer architecture, the floating-point unit instructions may be emulated by a library of software functions; this may permit the same object code to run on systems with or without floating-point hardware. Emulation can be implemented on any of several levels: in the CPU as microcode, as an operating system function, or in user-space code. When only integer functionality is available, the CORDIC floating-point emulation methods are most commonly used. In most modern computer architectures, there is some division of floating-point operations from integer operations. This division varies significantly by architecture; some have dedicated floating-point registers, while some, like the Intel x86, take it as far as independent clocking schemes. CORDIC routines have been implemented in the Intel8087, 80287, 80387 up to the 80486 coprocessor series, as well as in the Motorola68881 and 68882 for some kinds of floating-point instructions, mainly as a way to reduce the gate counts of the FPU subsystem. Floating-point operations are often pipelined. In earlier superscalar architectures without general out-of-order execution, floating-point operations were sometimes pipelined separately from integer operations. Since the early 1990s, many microprocessors for desktops and servers have more than one FPU. The modular architecture of Bulldozer microarchitecture uses a special FPU named FlexFPU, which uses simultaneous multithreading. Each physical integer core, two per module, is single-threaded, in contrast with Intel's Hyperthreading, where two virtual simultaneous threads share the resources of a single physical core.
Floating-point library
Some floating-point hardware only supports the simplest operations: addition, subtraction, and multiplication. But even the most complex floating-point hardware has a finite number of operations it can support for example, no FPUs directly support arbitrary-precision arithmetic. When a CPU is executing a program that calls for a floating-point operation that is not directly supported by the hardware, the CPU uses a series of simpler floating-point operations. In systems without any floating-point hardware, the CPU emulates it using a series of simpler fixed-point arithmetic operations that run on the integer arithmetic logic unit. The software that lists the necessary series of operations to emulate floating-point operations is often packaged in a floating-point library.
Integrated FPUs
In some cases, FPUs may be specialized, and divided between simpler floating-point operations and more complicated operations, like division. In some cases, only the simple operations may be implemented in hardware or microcode, while the more complex operations are implemented as software. In some current architectures, the FPU functionality is combined with units to perform SIMD computation; an example of this is the augmentation of the x87 instructions set with SSEinstruction set in the x86-64 architecture used in newer Intel and AMD processors.
Add-on FPUs
In the 1980s, it was common in IBM PC/compatible microcomputers for the FPU to be entirely separate from the CPU, and typically sold as an optional add-on. It would only be purchased if needed to speed up or enable math-intensive programs. The IBM PC, XT, and most compatibles based on the 8088 or 8086 had a socket for the optional 8087 coprocessor. The AT and 80286-based systems were generally socketed for the 80287, and 80386/80386SX-based machines for the 80387 and 80387SX respectively, although early ones were socketed for the 80287, since the 80387 did not exist yet. Other companies manufactured co-processors for the Intel x86 series. These included Cyrix and Weitek. Coprocessors were available for the Motorola 68000 family, the 68881 and 68882. These were common in Motorola 68020/68030-based workstations, like the Sun 3 series. They were also commonly added to higher-end models of Apple Macintosh and Commodore Amiga series, but unlike IBM PC-compatible systems, sockets for adding the coprocessor were not as common in lower-end systems. There are also add-on FPUs coprocessor units for microcontroller units /single-board computer, which serve to provide floating-point arithmetic capability. These add-on FPUs are host-processor-independent, possess their own programming requirements and are often provided with their own integrated development environments.