AI accelerator


An AI accelerator is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence applications, especially artificial neural networks, recurrent neural network, machine vision and machine learning. Typical applications include algorithms for robotics, internet of things and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability., a typical AI integrated circuit chip contains billions of MOSFET transistors.
A number of vendor-specific terms exist for devices in this category, and it is an emerging technology without a dominant design.

History of AI acceleration

s have frequently complemented the CPU with special purpose accelerators for specialized tasks, known as coprocessors. Notable application-specific hardware units include video cards for graphics, sound cards, graphics processing units and digital signal processors. As deep learning and artificial intelligence workloads rose in prominence in the 2010s, specialized hardware units were developed or adapted from existing products to accelerate these tasks.

Early attempts

As early as 1993, digital signal processors were used as neural network accelerators e.g. to accelerate optical character recognition software. In the 1990s, there were also attempts to create parallel high-throughput systems for workstations aimed at various applications, including neural network simulations. FPGA-based accelerators were also first explored in the 1990s for both inference and training. ANNA was a neural net CMOS accelerator developed by Yann LeCun.

Heterogeneous computing

refers to incorporating a number of specialized processors in a single system, or even a single chip, each optimized for a specific type of task. Architectures such as the cell microprocessor have features significantly overlapping with AI accelerators including: support for packed low precision arithmetic, dataflow architecture, and prioritizing 'throughput' over latency. The Cell microprocessor was subsequently applied to a number of tasks including AI.
In the 2000s, CPUs also gained increasingly wide SIMD units, driven by video and gaming workloads; as well as support for packed low precision data types.

Use of GPU

s or GPUs are specialized hardware for the manipulation of images and calculation of local image properties. The mathematical basis of neural networks and image manipulation are similar, embarrassingly parallel tasks involving matrices, leading GPUs to become increasingly used for machine learning tasks., GPUs are popular for AI work, and they continue to evolve in a direction to facilitate deep learning, both for training and inference in devices such as self-driving cars. GPU developers such as Nvidia NVLink are developing additional connective capability for the kind of dataflow workloads AI benefits from. As GPUs have been increasingly applied to AI acceleration, GPU manufacturers have incorporated neural network specific hardware to further accelerate these tasks. Tensor cores are intended to speed up the training of neural networks.

Use of FPGAs

Deep learning frameworks are still evolving, making it hard to design custom hardware. Reconfigurable devices such as field-programmable gate arrays make it easier to evolve hardware, frameworks and software alongside each other.
Microsoft has used FPGA chips to accelerate inference. The application of FPGAs to AI acceleration motivated Intel to acquire Altera with the aim of integrating FPGAs in server CPUs, which would be capable of accelerating AI as well as general purpose tasks.

Emergence of dedicated AI accelerator ASICs

While GPUs and FPGAs perform far better than CPUs for AI related tasks, a factor of up to 10 in efficiency may be gained with a more specific design, via an application-specific integrated circuit. These accelerators employ strategies such as optimized memory use and the use of lower precision arithmetic to accelerate calculation and increase throughput of computation. Some adopted low-precision floating-point formats used AI acceleration are half-precision and the bfloat16 floating-point format.

In-memory computing architectures

In June 2017, IBM researchers announced an architecture in contrast to the Von Neumann architecture based on in-memory computing and phase-change memory arrays applied to temporal correlation detection, intending to generalize the approach to heterogeneous computing and massively parallel systems. In October 2018, IBM researchers announced an architecture based on in-memory processing and modeled on the human brain's synaptic network to accelerate deep neural networks. The system is based on phase-change memory arrays.

Nomenclature

As of 2016, the field is still in flux and vendors are pushing their own marketing term for what amounts to an "AI accelerator", in the hope that their designs and APIs will become the dominant design. There is no consensus on the boundary between these devices, nor the exact form they will take; however several examples clearly aim to fill this new space, with a fair amount of overlap in capabilities.
In the past when consumer graphics accelerators emerged, the industry eventually adopted Nvidia's self-assigned term, "the GPU", as the collective noun for "graphics accelerators", which had taken many forms before settling on an overall pipeline implementing a model presented by Direct3D.

Potential applications