ZPU (microprocessor)


The ZPU is a microprocessor stack machine designed by Norwegian company Zylin AS to run supervisory code in electronic systems that include a field-programmable gate array.
The ZPU is a relatively recent stack machine that fills a small economic niche, and it has a growing number of users and implementations. It was designed to require very little electronic logic, leaving more of the FPGA's logic available for other purposes. To make it easy to use, it has a port of the GNU Compiler Collection, which makes it much easier to apply than CPUs without compilers. It sacrifices speed for small size: it keeps the intermediate results of calculations in memory, in a push-down stack, rather than in registers.
Zylin AS made the ZPU open-source in 2008.

Usage

Many electronic projects include electronic logic in an FPGA. Adding a separate microprocessor chip would be wasteful, so it is commonplace to implement a CPU within the FPGA's logic. Often, a smaller, less expensive FPGA could be used if only the CPU used fewer resources. This is the exact situation that the ZPU was designed to address.
The ZPU is designed to handle the miscellaneous tasks of a system that are best handled by software, for example, a user interface. The ZPU is very slow, but its small size leaves room in the FPGA for any high-speed algorithms that must be implemented in logic.
Another issue is that most CPUs for FPGAs are closed-source, available only from a particular maker of FPGAs. Occasionally a project needs to have a design that can be widely distributed, for security inspections, educational uses or other reasons. The licenses on these proprietary CPUs can prevent these uses. The ZPU is open-sourced.
Some projects need code that must be small, but run on a CPU that inherently has larger code. Alternatively, a project may benefit from the wide selection of code, compilers and debugging tools for the GNU Compiler Collection. In these cases, an emulator can be written to implement the ZPU's instruction set on the target CPU, and the ZPU's compilers can be used to produce the code. The resulting system is slow, but packs code into less memory than many CPUs and enables the project to use a wide variety of compilers and code.

Design features

The ZPU was designed explicitly to minimize the amount of electronic logic. It has a minimal instruction set, yet one that the GNU Compiler Collection can target. It also minimizes the number of registers that must be in the FPGA, which reduces the number of flip-flops. Instead of registers, intermediate results are kept on the stack, in memory.
It also has compact code, which saves memory. Stack machine instructions do not need to contain register IDs, so the ZPU's code is smaller than that of typical RISC CPUs; it is said to need only about 80% of the space of ARM's Thumb-2. For example, the signed immediate instruction lets the ZPU load a 32-bit value using at most five bytes of instruction space, and as few as one. Most RISC CPUs require at least eight bytes.
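The five-byte bound follows from each immediate instruction carrying seven bits: five of them carry 35 bits, enough for any 32-bit value. The Python sketch below picks the shortest sequence for a given constant. It assumes that the first IM pushes its seven bits sign-extended and that each IM immediately following shifts the TOS left by seven bits and appends its own seven bits; the function name `encode_im` is hypothetical and used only for illustration.

```python
def encode_im(value):
    """Return the shortest run of IM opcode bytes that loads a 32-bit value.

    Assumed semantics (an illustration, not a specification): the first IM
    pushes its 7 bits sign-extended; each following IM shifts the TOS left
    by 7 bits and appends its own 7 bits.
    """
    value &= 0xFFFFFFFF
    # Interpret the 32-bit pattern as a signed number.
    signed = value - (1 << 32) if value & 0x80000000 else value
    # Find the smallest n such that the value fits in a signed 7*n-bit field.
    n = 1
    while n < 5 and not (-(1 << (7 * n - 1)) <= signed < (1 << (7 * n - 1))):
        n += 1
    # Emit the 7-bit groups, most significant first, with the IM marker bit set.
    return bytes(0x80 | ((signed >> (7 * (n - 1 - i))) & 0x7F) for i in range(n))
```

Under these assumptions, `encode_im(5)` yields the single byte `0x85`, `encode_im(64)` needs two bytes (`0x80 0xC0`) because 64 does not fit in a signed 7-bit field, and `encode_im(0x12345678)` needs all five.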
Finally, about two-thirds of its instructions can be emulated by firmware that uses only the remaining third, the "required" instructions. Although the result is very slow, the resulting CPU can require as few as 446 lookup tables.
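As an illustration of how such emulation can work, the following Python sketch builds NEG, SUB and XOR out of ADD, AND, OR and NOT, all of which are in the base instruction set. Values are modeled as 32-bit integers. The function names and the operand order of SUB are assumptions made for this example; the ZPU's actual emulation firmware may use different instruction sequences.

```python
MASK = 0xFFFFFFFF  # model a 32-bit data path

# Base operations, modeled on the ADD, AND, OR and NOT instructions.
def op_add(a, b):
    return (a + b) & MASK

def op_and(a, b):
    return a & b

def op_or(a, b):
    return a | b

def op_not(a):
    return ~a & MASK

# Emulated operations, written using only the base operations above.
def emu_neg(a):
    # Two's complement negation: invert all bits, then add one.
    return op_add(op_not(a), 1)

def emu_sub(a, b):
    # a - b computed as a + (-b); the operand order is an assumption.
    return op_add(a, emu_neg(b))

def emu_xor(a, b):
    # Bits set in either operand, but not in both.
    return op_and(op_or(a, b), op_not(op_and(a, b)))
```

Each emulated operation here costs several base operations, which is consistent with the trade of speed for logic described above.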
The ZPU has a reset vector, consisting of 32 bytes of code space starting at location zero. It also has a single edge-sensitive interrupt, with a vector consisting of 32 bytes of code space beginning at address 32. Vectors 2 through 31 each have 32 bytes of space, and are reserved for code to emulate instructions 34 through 63.
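The vector layout reduces to simple arithmetic, sketched below in Python. The sketch assumes that vectors are contiguous from address zero, as the reset and interrupt addresses suggest, and that the five low bits of an EMULATE opcode (binary 001xxxxx) give the vector number. The names `vector_address` and `emulate_target` are hypothetical.

```python
VECTOR_SIZE = 32  # bytes of code space per vector

def vector_address(n):
    """Start address of vector n, assuming vectors are contiguous from 0."""
    return n * VECTOR_SIZE

def emulate_target(opcode):
    """Address of the emulation code for an EMULATE opcode (001xxxxx)."""
    if (opcode & 0xE0) != 0x20:
        raise ValueError("not an EMULATE opcode")
    # The low five bits are assumed to select the vector.
    return vector_address(opcode & 0x1F)
```

With these assumptions, `vector_address(0)` is the reset vector at address 0, `vector_address(1)` is the interrupt vector at address 32, and opcode 63 (binary 00111111) resolves to address 992.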
The base ZPU has a 32-bit data path. The ZPU also has a variant with a 16-bit-wide data path, to save even more logic.

Tools and resources

The ZPU has a well-tested port of the GNU Compiler Collection. Enthusiasts and firmware engineers have ported eCos, FreeRTOS and μClinux.
At least one group of enthusiasts has copied the popular development environment of the Arduino and adapted it to the ZPU.
There are now multiple models of the ZPU core. Besides the original Zylin cores, there are also the ZPUino cores, and the ZPUFlex core. The Zylin core is designed for a minimal FPGA footprint, and includes a 16-bit version. The ZPUino has practical improvements for speed, can replace emulated instructions with hardware, and is embedded in a system-on-chip framework. The ZPUFlex is designed to use external memory blocks and can replace emulated instructions with hardware.
Academic projects include power efficiency studies and improvements, and reliability studies.
To improve speed, most implementors have built the emulated instructions in hardware and added a stack cache. Beyond this, one implementor said that a two-stack architecture would permit pipelining, but this might also require compiler changes.
One implementor reduced power usage by 46% with a stack cache and automated insertion of clock gating. The power usage was then roughly equivalent to that of the small open-source Amber core, which implements the ARMv2a architecture.
The parts of the ZPU that would be most aided by fault-tolerance are the address bus, stack pointer and program counter.

Instruction set

"TOS" is an abbreviation of the "Top Of Stack." "NOS" is an abbreviation of the "Next to the top Of Stack."
Name        Binary    Description
BREAKPOINT  00000000  Halt the CPU and/or jump to the debugger.
IM_x        1xxxxxxx  Push or append a signed 7-bit immediate to the TOS.
STORESP_x   010xxxxx  Pop the TOS and store it into the stack at an offset from the top.
LOADSP_x    011xxxxx  Fetch from a value indexed in the stack and push it into the TOS.
EMULATE_x   001xxxxx  Emulate an instruction with code at vector x.
ADDSP_x     0001xxxx  Fetch from a value indexed in the stack and add the value to the TOS.
POPPC       00000100  Pop an address from the TOS and store it to the PC.
LOAD        00001000  Pop an address and push the loaded memory value to the TOS.
STORE       00001100  Store the NOS into the memory pointed-to by the TOS. Pop both.
PUSHSP      00000010  Push the current SP into the TOS.
POPSP       00001101  Pop the TOS and store it to the SP.
ADD         00000101  Integer addition of TOS and NOS.
AND         00000110  Bitwise AND of the TOS and NOS.
OR          00000111  Bitwise OR of the TOS and NOS.
NOT         00001001  Bitwise NOT of the TOS.
FLIP        00001010  Reverse the bit order of the TOS.
NOP         00001011  No-Operation.
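
The encodings in the table can be decoded with a few bit masks. The Python sketch below classifies an opcode byte and executes a small arithmetic subset on a list used as the stack. It is an illustration rather than a reference emulator: memory, the SP-relative instructions and EMULATE are not executed, and the IM behavior (the first IM pushes a sign-extended value, and an IM directly after another IM appends seven bits) is an assumption about "push or append". The names `decode` and `run` are hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit data path

# Opcodes with no operand field, taken from the table above.
FIXED = {
    0x00: "BREAKPOINT", 0x02: "PUSHSP", 0x04: "POPPC", 0x05: "ADD",
    0x06: "AND", 0x07: "OR", 0x08: "LOAD", 0x09: "NOT",
    0x0A: "FLIP", 0x0B: "NOP", 0x0C: "STORE", 0x0D: "POPSP",
}

def decode(opcode):
    """Return (mnemonic, operand) for one opcode byte."""
    if opcode & 0x80:
        return ("IM", opcode & 0x7F)
    if (opcode & 0xE0) == 0x40:
        return ("STORESP", opcode & 0x1F)
    if (opcode & 0xE0) == 0x60:
        return ("LOADSP", opcode & 0x1F)
    if (opcode & 0xE0) == 0x20:
        return ("EMULATE", opcode & 0x1F)
    if (opcode & 0xF0) == 0x10:
        return ("ADDSP", opcode & 0x0F)
    return (FIXED.get(opcode, "UNKNOWN"), None)

def run(code):
    """Execute straight-line code from a small subset; return the stack."""
    stack = []
    prev_im = False  # True when the previous instruction was an IM
    for byte in code:
        name, arg = decode(byte)
        if name == "IM":
            if prev_im:
                # Assumed "append": shift in seven more bits.
                stack[-1] = ((stack[-1] << 7) | arg) & MASK
            else:
                # Assumed "push": sign-extend the 7-bit immediate.
                stack.append((arg - 0x80 if arg & 0x40 else arg) & MASK)
        elif name in ("ADD", "AND", "OR"):
            b, a = stack.pop(), stack.pop()
            result = {"ADD": a + b, "AND": a & b, "OR": a | b}[name]
            stack.append(result & MASK)
        elif name == "NOT":
            stack[-1] = ~stack[-1] & MASK
        elif name == "FLIP":
            # Reverse the order of all 32 bits.
            stack[-1] = int(f"{stack[-1]:032b}"[::-1], 2)
        elif name == "NOP":
            pass
        else:
            raise NotImplementedError(name)
        prev_im = (name == "IM")
    return stack
```

For example, `run(bytes([0x85, 0x0B, 0x83, 0x05]))` executes IM 5, NOP, IM 3, ADD and returns `[8]`. The NOP separates the two immediates; without it, this sketch would treat the second IM as an extension of the first.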

Code points 34 to 63 may be emulated by code in vectors 2 through 31: LOADH and STOREH, LESSTHAN, LESSTHANOREQUAL, ULESSTHAN, ULESSTHANOREQUAL, SWAP, MULT, LSHIFTRIGHT, ASHIFTLEFT, ASHIFTRIGHT, CALL, EQ, NEQ, NEG, SUB, XOR, LOADB and STOREB, DIV, MOD, EQBRANCH, NEQBRANCH, POPPCREL, CONFIG, PUSHPC, SYSCALL, PUSHSPADD, HALFMULT, CALLPCREL.