Alpha 21364


The Alpha 21364, code-named "Marvel" and also known as EV7, is a microprocessor developed by Digital Equipment Corporation, later Compaq Computer Corporation, that implemented the Alpha instruction set architecture.

History

The Alpha 21364 was revealed in October 1998 by Compaq at the 11th Annual Microprocessor Forum, where it was described as an Alpha 21264 with a 1.5 MB 6-way set-associative on-die secondary cache, an integrated Direct Rambus DRAM memory controller and an integrated network controller for connecting to other microprocessors. Changes to the Alpha 21264 core included a larger victim buffer, which was quadrupled in capacity to 32 entries, 16 for the Dcache and 16 for the Scache. Microprocessor Report reported that Compaq considered making minor changes to the branch predictor to improve prediction accuracy, and doubling the capacity of the miss buffer to 16 entries from the 8 in the Alpha 21264.
It was expected to be taped out in late 1999, with samples available in early 2000 and volume shipments in late 2000. However, the schedule slipped, and tape-out occurred in April 2001 instead. The Alpha 21364 was introduced on 20 January 2002 when systems using the microprocessor debuted. It operated at 1.25 GHz, but production models in the AlphaServer ES47, ES80 and GS1280 operated at 1.0 GHz or 1.15 GHz. Unlike previous Alpha microprocessors, the Alpha 21364 was not sold on the open market.
The Alpha 21364 was originally intended to be succeeded by the Alpha 21464, code-named EV8, a new implementation of the Alpha ISA with four-way simultaneous multithreading. It was first presented in October 1999 at the 12th Annual Microprocessor Forum, but was cancelled on 25 June 2001 at a late stage of development.

Development

The development of the Alpha 21364 was mostly focused on features that would improve memory performance and multiprocessor scalability. The focus on memory performance was the result of a forward-looking article published in Microprocessor Report titled "It's the Memory, Stupid!", written by Richard L. Sites, who co-led the definition of the Alpha architecture. The article concluded that, "Over the coming decade, memory subsystem design will be the only important design issue for microprocessors."

Description

The Alpha 21364 was an Alpha 21264 with a 1.75 MB on-die secondary cache, two integrated memory controllers and an integrated network controller.

Core

The Alpha 21364's core is based on the EV68CB, a derivative of the Alpha 21264. The only modification was a larger victim buffer, now quadrupled in capacity to 32 entries, which are divided equally into 16 entries each for the Dcache and the Scache. Although the Alpha 21364 is a fourth-generation implementation of the Alpha architecture, the core is otherwise identical to the EV68CB.

Scache

The secondary cache (Scache) is a unified cache with a capacity of 1.75 MB. It is 7-way set associative, uses a 64-byte line size, and has a write-back policy. The cache is protected by a single-error-correcting, double-error-detecting (SECDED) error-correcting code. It is connected to the cache controller by a 128-bit data path. Access to the cache is fully pipelined, yielding a sustainable bandwidth of 16 GB/s at 1.0 GHz.
The latency from a request to the Scache until the data can be used is 12 cycles, which observers such as Microprocessor Report considered significant. The latency was not reduced further because doing so would not have improved performance. The Alpha 21264 core on which the Alpha 21364 was based was designed to use an external cache built from commodity SRAM, which has a significantly higher latency than the on-die Scache, so the core could only accept data at a limited rate. Compaq was not willing to remedy this deficiency as it would have required significant modifications to the Alpha 21264 core. Once improving latency yielded no further gains, the designers focused instead on reducing the power consumed by the Scache. The high latency permitted the cache tags to be looked up first, to determine whether the Scache contained the requested data and in which bank it was located, before that bank was powered up and accessed. This avoided unproductive Scache accesses, reducing power consumption.
The tag store consisted of 5.75 million transistors and the data store of 108 million transistors.
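The cache figures quoted in this section can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming that "1.75 MB" means 1.75 × 2^20 bytes and that the 128-bit data path moves one transfer per clock cycle; the function and variable names are illustrative and do not come from any Alpha documentation.

```python
# Sketch: derive the Scache organisation from the figures quoted in the text.
# Assumption: "1.75 MB" is 1.75 * 2**20 bytes; names here are illustrative only.

CAPACITY_BYTES = int(1.75 * 2**20)  # 1,835,008 bytes
WAYS = 7                            # 7-way set associative
LINE_BYTES = 64                     # 64-byte line size
DATA_PATH_BITS = 128                # data path to the cache controller
CLOCK_HZ = 1.0e9                    # 1.0 GHz


def cache_geometry(capacity_bytes, ways, line_bytes):
    """Return line count, set count, and address-field widths for a cache."""
    lines = capacity_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        # Bits needed to select a byte within a line and a set within the cache.
        "offset_bits": (line_bytes - 1).bit_length(),
        "index_bits": (sets - 1).bit_length(),
    }


def peak_bandwidth_bytes_per_s(path_bits, clock_hz):
    """Peak bandwidth assuming one full-width transfer per clock cycle."""
    return (path_bits // 8) * clock_hz


geometry = cache_geometry(CAPACITY_BYTES, WAYS, LINE_BYTES)
bandwidth = peak_bandwidth_bytes_per_s(DATA_PATH_BITS, CLOCK_HZ)

print(geometry)
print(f"{bandwidth / 1e9:.1f} GB/s")
```

Under these assumptions the seven ways divide the cache into a power-of-two number of sets, which is one plausible reason for the unusual 1.75 MB capacity, and the computed bandwidth agrees with the 16 GB/s quoted above.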

Memory controller

The Alpha 21364 has two integrated memory controllers that support Rambus DRAM (RDRAM) and operate at two-thirds of the microprocessor's clock frequency, or 800 MHz at 1.2 GHz. Compaq designed custom memory controllers for the Alpha 21364, giving them capabilities not found in standard RDRAM memory controllers, such as keeping all 128 pages open, which reduces the access latency to those pages, and proprietary fault-tolerant features.
Each memory controller provides five RDRAM channels that support PC800 Rambus inline memory modules (RIMMs). Four of the channels are used to provide memory, while the fifth is used to provide RAID-like redundancy. Each channel is 16 bits wide, operates at 400 MHz and transfers data on both the rising and falling edges of the clock signal for a transfer rate of 800 MT/s, yielding 1.6 GB/s of bandwidth. The total memory bandwidth of the eight data channels (four per controller) is 12.8 GB/s.
Cache coherence is provided by the memory controllers. Each memory controller has a cache coherence engine. The Alpha 21364 uses a directory cache coherence scheme where part of the memory is used to store Modified, Exclusive, Shared, Invalid (MESI) coherency data.
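The bandwidth figures above follow directly from the channel width and transfer rate. As a minimal sketch (the constant and function names are illustrative, and the redundancy channel is assumed to contribute no usable bandwidth):

```python
# Sketch: reproduce the RDRAM bandwidth figures quoted in the text.
# Assumption: the fifth (redundancy) channel carries no user data.

CHANNEL_WIDTH_BITS = 16
CHANNEL_CLOCK_HZ = 400e6
TRANSFERS_PER_CYCLE = 2           # data on both rising and falling edges
CONTROLLERS = 2
DATA_CHANNELS_PER_CONTROLLER = 4  # five channels, one reserved for redundancy


def channel_bandwidth(width_bits, clock_hz, transfers_per_cycle):
    """Bytes per second for one channel."""
    transfers_per_s = clock_hz * transfers_per_cycle  # 800 MT/s
    return (width_bits / 8) * transfers_per_s


per_channel = channel_bandwidth(
    CHANNEL_WIDTH_BITS, CHANNEL_CLOCK_HZ, TRANSFERS_PER_CYCLE
)
total = per_channel * CONTROLLERS * DATA_CHANNELS_PER_CONTROLLER

print(f"per channel: {per_channel / 1e9:.1f} GB/s")
print(f"total:       {total / 1e9:.1f} GB/s")
```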

R-box

The R-box contains the network router. The network router connects the microprocessor to other microprocessors using four ports named North, South, East and West. Each port consists of two 39-bit unidirectional links operating at 800 MHz, with 32 bits for data and 7 bits for ECC. The network router also has a fifth port, used for I/O. This port connects to an IO7 application-specific integrated circuit (ASIC), which is a bridge to an AGP 4x channel and two PCI-X buses. The I/O port consists of two unidirectional 32-bit links operating at 200 MHz, yielding a peak bandwidth of 3.2 GB/s. The I/O port links operate at a quarter of the clock frequency to simplify the design of the I/O ASIC.
The Alpha 21364 can connect to as many as 127 other microprocessors using one of two network topologies: shuffle and a 2D torus. The shuffle topology has more direct paths to other microprocessors, reducing latency and therefore improving performance, but by its nature is limited to connecting up to eight microprocessors. The 2D torus topology enables the network to feature up to 128 microprocessors.
In multiprocessing systems, each microprocessor is a node with its own memory. Accessing the memory of other nodes is possible, but incurs additional latency. The latency increases with distance, so the Alpha 21364 implements non-uniform memory access (NUMA) multiprocessing. I/O is distributed in the same fashion. An Alpha 21364 microprocessor in a multiprocessing system does not need to have its RIMM slots populated with memory or its I/O port populated with devices; it can use another microprocessor's memory and I/O.
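The split of each 39-bit link into 32 data bits and 7 ECC bits matches what a Hamming-style single-error-correcting, double-error-detecting (SECDED) code would need. The text does not say which code the links actually use, so the sketch below is only an illustration under that assumption; the function name is hypothetical.

```python
# Sketch: check bits required by a Hamming-style SECDED code.
# Assumption: the link ECC is SECDED; the source does not state the code used.


def secded_check_bits(data_bits):
    """Return the number of check bits for a Hamming SECDED code.

    A single-error-correcting Hamming code needs the smallest r such that
    2**r >= data_bits + r + 1. One extra overall parity bit adds
    double-error detection.
    """
    r = 1
    while 2**r < data_bits + r + 1:
        r += 1
    return r + 1


check_bits = secded_check_bits(32)
link_width = 32 + check_bits

print(check_bits, link_width)
```

For 32 data bits this gives 7 check bits and a 39-bit word, consistent with the link width described above.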
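To illustrate why latency grows with distance in a 2D torus, the sketch below counts the minimum number of router-to-router hops between two nodes. The 16 × 8 arrangement of 128 nodes and the coordinate scheme are assumptions made for illustration; the text does not give the actual layout or routing algorithm.

```python
# Sketch: minimum hop count between two nodes in a 2D torus.
# Assumptions: a hypothetical 16 x 8 grid of 128 nodes, (column, row)
# coordinates, and shortest-path routing with wraparound links.

COLS = 16
ROWS = 8


def torus_hops(src, dst, cols=COLS, rows=ROWS):
    """Return the fewest hops from src to dst, using wraparound when shorter."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    # In each dimension, travel the direct way or wrap around the edge.
    return min(dx, cols - dx) + min(dy, rows - dy)


# A neighbour reached over the East/West wraparound link is one hop away.
print(torus_hops((0, 0), (15, 0)))

# The farthest node lies half-way around both dimensions.
worst = max(
    torus_hops((0, 0), (x, y)) for x in range(COLS) for y in range(ROWS)
)
print(worst)
```

In this hypothetical layout, a memory access served by a neighbouring node crosses a single link, while the most distant node is a dozen hops away, which is the non-uniformity that the NUMA description refers to.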

Fault tolerance

The Alpha 21364 could operate in lock-step for fault-tolerant computers. This feature was a result of Compaq's decision to migrate Tandem's Himalaya fault-tolerant servers from the MIPS architecture to Alpha. The machines never used the microprocessor, however, as the decision to phase out the Alpha in favor of the Itanium was made before the Alpha 21364 became available.

Fabrication

The Alpha 21364 contained 152 million transistors. The die measured 21.1 mm by 18.8 mm for an area of 397 mm². It was fabricated by International Business Machines (IBM) in their 0.18 µm, seven-level copper complementary metal-oxide-semiconductor (CMOS) process. It was packaged in a 1,443-land flip-chip land grid array. It used a 1.65 V power supply and a 1.5 V external interface, and had a maximum power dissipation of 155 W at 1.25 GHz.

Alpha 21364A

The Alpha 21364A, code-named EV79, previously EV78, was a further development of the Alpha 21364. It was intended to be the last Alpha microprocessor developed. Scheduled to be introduced in 2004, it was cancelled on 23 October 2003, with HP citing performance and schedule issues as reasons. A replacement, the EV7z, was announced on the same day.
A prototype of the microprocessor was presented by Hewlett-Packard at the International Solid State Circuits Conference in February 2003. It operated at 1.45 GHz, had a die area of 251 mm², used a 1.2 V power supply, and dissipated 100 W.
The Alpha 21364A was to have improved upon the Alpha 21364 by featuring higher clock frequencies, in the range of approximately 1.6 to 1.7 GHz, and support for 1066 Mbit/s RDRAM. It was to be fabricated by IBM in their 0.13 µm silicon on insulator process. The more advanced process brought reductions in die size, power supply voltage, and power consumption and dissipation.

EV7z

The EV7z was a further development of the Alpha 21364. It was the last Alpha microprocessor developed and introduced. The EV7z became known on 23 October 2003 when HP announced they had cancelled the Alpha 21364A and would be replacing it with the EV7z. The EV7z was introduced on 16 August 2004 alongside the only computer to use the microprocessor, the AlphaServer GS1280. It was discontinued on 27 April 2007 when that computer was discontinued. It operated at 1.3 GHz, supported PC1066 RIMMs and was fabricated in the same 0.18 µm process as the Alpha 21364. Compared to the Alpha 21364, the EV7z was 14 to 16 percent faster, but was still slower than the Alpha 21364A it replaced, which was estimated to outperform the Alpha 21364 by 25 percent at 1.5 GHz.