Qualcomm Hexagon


Hexagon (QDSP6) is the brand name for a family of 32-bit multi-threaded digital signal processor (DSP) microarchitectures developed by Qualcomm, all implementing the same instruction set. According to a 2012 estimate, Qualcomm shipped 1.2 billion DSP cores inside its systems on a chip (SoCs) in 2011, and 1.5 billion cores were planned for 2012, making the QDSP6 the most-shipped DSP architecture.
The Hexagon architecture is designed to deliver performance at low power across a variety of applications. Its features include hardware-assisted multithreading, privilege levels, very long instruction word (VLIW) execution, single instruction, multiple data (SIMD) operations, and instructions geared toward efficient signal processing. The CPU can dispatch, in order, up to four instructions (one packet) to four execution units every clock cycle. Before V5, hardware multithreading was implemented as barrel temporal multithreading: threads are switched in round-robin fashion each cycle, so a 600 MHz physical core is presented as three logical 200 MHz cores. Hexagon V5 switched to dynamic multithreading (DMT), with a thread switch on L2 misses, while waiting for interrupts, or on special instructions.
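The two scheduling policies described above can be sketched in a few lines of Python. This is a toy model, not Qualcomm's implementation: the function names (`barrel_schedule`, `per_thread_clock_mhz`, `dynamic_schedule`) and the representation of stalls as a set of thread numbers per cycle are assumptions made for illustration only.

```python
def barrel_schedule(num_threads, num_cycles):
    """Thread that issues on each cycle under strict round-robin switching."""
    return [cycle % num_threads for cycle in range(num_cycles)]


def per_thread_clock_mhz(core_clock_mhz, num_threads):
    """Each thread gets every num_threads-th cycle of the physical core."""
    return core_clock_mhz / num_threads


def dynamic_schedule(num_threads, stalled_by_cycle):
    """Round-robin that skips stalled threads (hypothetical DMT-like model).

    stalled_by_cycle holds one set per cycle with the threads that cannot
    issue (for example, waiting on an L2 miss). None marks an idle cycle.
    """
    schedule = []
    next_thread = 0
    for stalled in stalled_by_cycle:
        chosen = None
        for offset in range(num_threads):
            candidate = (next_thread + offset) % num_threads
            if candidate not in stalled:
                chosen = candidate
                break
        schedule.append(chosen)
        if chosen is not None:
            next_thread = (chosen + 1) % num_threads
    return schedule


print(barrel_schedule(3, 6))         # [0, 1, 2, 0, 1, 2]
print(per_thread_clock_mhz(600, 3))  # 200.0
print(dynamic_schedule(3, [set(), {1}, set(), {0, 1, 2}, set()]))
# [0, 2, 0, None, 1]
```

In the barrel model a stalled thread simply wastes its slot, whereas in the dynamic model the slot goes to the next ready thread, which is why a single thread can run at "200 MHz or greater" on a 600 MHz core.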
At Hot Chips 2013, Qualcomm announced details of its Hexagon 680 DSP. Qualcomm also announced the Hexagon Vector Extensions (HVX). HVX is designed to allow significant compute workloads for advanced imaging and computer vision to be processed on the DSP instead of the CPU. In March 2015, Qualcomm announced its Snapdragon Neural Processing Engine SDK, which allows AI acceleration using the CPU, GPU, and Hexagon DSP.
Qualcomm's Snapdragon 855 contains its fourth-generation on-device AI engine, which includes the Hexagon 690 DSP and the Hexagon Tensor Accelerator for AI acceleration.

Software support

Operating systems

The Linux port for Hexagon runs under a hypervisor layer and was merged in the 3.2 release of the kernel. The original hypervisor is closed-source; in April 2013, Qualcomm released a minimal open-source hypervisor implementation for QDSP6 V2 and V3, the "Hexagon MiniVM", under a BSD-style license.

Compilers

Support for Hexagon was added in the 3.1 release of LLVM by Tony Linthicum. Hexagon/HVX V66 ISA support was added in the 8.0.0 release of LLVM. There is also a non-FSF-maintained branch of GCC and binutils.

Adoption of the SIP block

Qualcomm Hexagon DSPs have been available in Qualcomm Snapdragon SoCs since 2006. The Snapdragon S4 has three QDSP cores: two in the modem subsystem and one Hexagon core in the multimedia subsystem. The modem cores are programmed by Qualcomm only; only the multimedia core can be programmed by the user.
Hexagon DSPs are also used in some Qualcomm femtocell processors, including the FSM98xx, FSM99xx and FSM90xx.

Third-party integration

In March 2016, it was announced that the AudioSmart audio processing software from semiconductor company Conexant was being integrated into Qualcomm's Hexagon.
In May 2018, wolfSSL added support for Qualcomm Hexagon, allowing wolfSSL crypto operations to run on the DSP. A specialized operation load management library was added later.

Versions

Six versions of the QDSP6 architecture have been released: V1, V2, V3, V4, V5, and V6. V4 achieves 20 DMIPS per milliwatt when operating at 500 MHz.
Hexagon clock speeds range from 400 to 2000 MHz for QDSP6 and from 256 to 350 MHz for the previous generation of the architecture, the QDSP5.
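Versions of QDSP6:
| Version | Process node, nm | Date | Number of simultaneous threads | Per-thread clock, MHz | Total core clock, MHz |
|---|---|---|---|---|---|
| QDSP6 V1 | 65 | Oct 2006 | | | |
| QDSP6 V2 | 65 | Dec 2007 | 6 | 100 | 600 |
| QDSP6 V3 | 45 | 2009 | 6 | 67 | 400 |
| QDSP6 V3 | 45 | 2009 | 4 | 100 | 400 |
| QDSP6 V4 | 28 | 2010–2011 | 3 | 167 | 500 |
| QDSP6 V5 | 28 | 2013 | 3 | 200 or greater with DMT | 600 |
| QDSP6 V6 68X | 14/10 | 2016–2018 | 4 | 500 | 2000 |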

Availability in Snapdragon products

Both Hexagon and pre-Hexagon cores are used in modern Qualcomm SoCs, QDSP5 mostly in low-end products. Modem QDSPs are not shown in the table.
QDSP5 usage:
| Snapdragon generation | Chipset ID | DSP generation | DSP frequency, MHz | Process node, nm |
|---|---|---|---|---|
| S1 | MSM7627, MSM7227, MSM7625, MSM7225 | QDSP5 | 320 | 65 |
| S1 | MSM7627A, MSM7227A, MSM7625A, MSM7225A | QDSP5 | 350 | 45 |
| S2 | MSM8655, MSM8255, APQ8055, MSM7630, MSM7230 | QDSP5 | 256 | 45 |
| S4 Play | MSM8625, MSM8225 | QDSP5 | 350 | 45 |
| S200 | 8110, 8210, 8610, 8112, 8212, 8612, 8225Q, 8625Q | QDSP5 | 384 | 45 LP |

QDSP6 usage:
| Snapdragon generation | Chipset ID | QDSP6 version | DSP frequency, MHz | Process node, nm |
|---|---|---|---|---|
| S1 | QSD8650, QSD8250 | QDSP6 | 600 | 65 |
| S3 | MSM8660, MSM8260, APQ8060 | QDSP6 | 400 | 45 |
| S4 Prime | MPQ8064 | QDSP6 | 500 | 28 |
| S4 Pro | MSM8960 Pro, APQ8064 | QDSP6 | 500 | 28 |
| S4 Plus | MSM8960, MSM8660A, MSM8260A, APQ8060A, MSM8930, MSM8630, MSM8230, APQ8030, MSM8627, MSM8227 | QDSP6 | 500 | 28 |
| S400 | 8926, 8930, 8230, 8630, 8930AB, 8230AB, 8630AB, 8030AB, 8226, 8626 | QDSP6V4 | 500 | 28 LP |
| S600 | 8064T, 8064M | QDSP6V4 | 500 | 28 LP |
| S800 | 8974, 8274, 8674, 8074 | QDSP6V5A | 600 | 28 HPm |
| S820 | 8996 | QDSP6V6 | 2000 | 14 FinFET LPP |

Code sample

This is a single instruction packet from the inner loop of an FFT:
:endloop0

Qualcomm claims that this packet is equivalent to 29 classic RISC operations; it includes a vector add, a complex multiply operation, and hardware loop support. All instructions in the packet execute in the same cycle.
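To give a feel for why one packet stands in for so many scalar operations, the complex multiply and the vector add can be written out in plain Python. This is only an illustrative sketch under assumptions: it uses 16-bit Q15 fixed-point values with rounding and saturation, a common DSP convention, and does not claim to reproduce the exact semantics of the Hexagon instructions. The names `saturate16`, `complex_mul_q15`, and `vector_add_halfwords` are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def complex_mul_q15(a, b):
    """Multiply two complex numbers given as (re, im) pairs of Q15 integers.

    A product of two Q15 values is Q30, so each sum is rounded (half an
    LSB added), shifted right by 15 to return to Q15, then saturated.
    """
    a_re, a_im = a
    b_re, b_im = b
    re = a_re * b_re - a_im * b_im
    im = a_re * b_im + a_im * b_re
    round_half = 1 << 14
    return (saturate16((re + round_half) >> 15),
            saturate16((im + round_half) >> 15))


def vector_add_halfwords(xs, ys):
    """Lane-wise add of 16-bit values with two's-complement wraparound."""
    def wrap16(v):
        return ((v + 0x8000) & 0xFFFF) - 0x8000
    return [wrap16(x + y) for x, y in zip(xs, ys)]


half = 1 << 14  # 0.5 in Q15
print(complex_mul_q15((half, 0), (half, 0)))      # (8192, 0), i.e. 0.25
print(complex_mul_q15((-32768, 0), (-32768, 0)))  # (32767, 0), saturated
print(vector_add_halfwords([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

Written this way, a single complex multiply already costs four multiplies, two additions, two rounding shifts, and two saturations, and a four-lane add costs four more additions, before counting the loads, stores, and loop bookkeeping that the packet also covers.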