Red Storm (computing)


Red Storm is a supercomputer architecture designed for the US Department of Energy’s National Nuclear Security Administration Advanced Simulation and Computing Program. Cray Inc. developed it based on the contracted architectural specifications provided by Sandia National Laboratories. The architecture was later commercially produced as the Cray XT3.
Red Storm is a partitioned, space-shared, tightly coupled, massively parallel processing machine with a high-performance 3D mesh network. The processors are commodity AMD Opteron CPUs with off-the-shelf memory DIMMs. The NIC/router combination, called SeaStar, is the only custom ASIC component in the system and uses a PowerPC 440-based core. When deployed in 2005, Red Storm’s initial configuration consisted of 10,880 single-core 2.0 GHz Opterons, of which 10,368 were dedicated to scientific calculations. The remaining 512 Opterons serviced the computations, provided the user interface to the system, and ran a version of Linux. This initial installation consisted of 140 cabinets.
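To illustrate what a 3D mesh interconnect implies for communication distance, the following is a minimal sketch in Python. It assumes a plain mesh with no wraparound links, and the dimensions used are hypothetical values chosen for illustration, not Red Storm's actual layout. The function names `mesh_neighbors` and `mesh_hops` are likewise invented for this example.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def mesh_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the nodes directly linked to `node` in a 3D mesh of size `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            candidate = list(node)
            candidate[axis] += step
            # Assumption: no wraparound links, so nodes on a face, edge,
            # or corner of the mesh have fewer than six neighbors.
            if 0 <= candidate[axis] < dims[axis]:
                neighbors.append(tuple(candidate))
    return neighbors


def mesh_hops(a: Coord, b: Coord) -> int:
    """Minimum number of link traversals between two nodes in the mesh."""
    return sum(abs(x - y) for x, y in zip(a, b))


DIMS = (4, 4, 4)  # hypothetical mesh size, for illustration only

print(len(mesh_neighbors((0, 0, 0), DIMS)))  # corner node: 3 neighbors
print(len(mesh_neighbors((1, 1, 1), DIMS)))  # interior node: 6 neighbors
print(mesh_hops((0, 0, 0), (3, 3, 3)))       # opposite corners: 9 hops
```

In such a mesh each router needs at most six links regardless of machine size, while the worst-case hop count grows with the sum of the mesh dimensions.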
The Red Storm supercomputer was designed to be highly scalable, from a single cabinet to hundreds of cabinets, and was scaled up twice. In 2006 the system was upgraded to 2.4 GHz dual-core Opterons. A fifth row of computer cabinets was also brought online, bringing the total to over 26,000 processor cores. This resulted in a peak performance of 124.4 teraflops, or 101.4 teraflops running the Linpack benchmark.
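The relationship between core count, clock rate, and peak performance can be sketched with simple arithmetic. The example below assumes that each Opteron core of this generation completes two double-precision floating-point operations per clock cycle; that figure is an assumption of this sketch and is not stated in the text above. The helper names are hypothetical.

```python
def peak_teraflops(cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak in teraflops: cores x clock (GHz) x flops per cycle.

    flops_per_cycle=2 is an assumption about the processor, not a figure
    taken from the article.
    """
    return cores * clock_ghz * flops_per_cycle / 1000.0


def linpack_efficiency(measured_tflops: float, peak_tflops: float) -> float:
    """Fraction of theoretical peak achieved on the Linpack benchmark."""
    return measured_tflops / peak_tflops


# Initial 2005 configuration: 10,368 single-core 2.0 GHz compute processors.
initial_peak = peak_teraflops(10_368, 2.0)
print(f"{initial_peak:.1f} teraflops")  # 41.5 teraflops under the stated assumption

# 2006 upgrade: 101.4 teraflops on Linpack against a 124.4 teraflop peak.
efficiency = linpack_efficiency(101.4, 124.4)
print(f"{efficiency:.1%}")  # 81.5%
```

Under these assumptions the 2006 system sustained roughly four fifths of its theoretical peak on Linpack.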
A second major upgrade in 2008 introduced Cray XT4 technology: quad-core Opteron processors and an increase in memory to 2 GB per core. This resulted in a peak theoretical performance of 284 teraflops.
Top500 performance ranking for Red Storm after each upgrade:
Red Storm is intended for capability computing, in which a single application can be run on the entire system. This is in contrast to cluster-style capacity computing, in which portions of a cluster are assigned to run different applications. The performance of the memory subsystem, the processor, and the network must be in proper balance to achieve adequate application progress across the entire machine.
System software plays a key role as well. The Portals network programming API is used to ensure that inter-processor communication can scale to the size of the entire system; it has been used on many different supercomputers, including the Intel Teraflops and Paragon. The compute processors use a custom lightweight kernel operating system named Catamount, which is based on Cougar, the operating system of ASCI Red. A userspace implementation of the Lustre file system, named liblustre, was ported to the Catamount environment using the libsysio library to provide POSIX-like semantics. This file system client ran in the single-threaded Catamount environment without interrupts and serviced I/O requests only when explicitly allowed by the application, which reduced jitter introduced by background file system operations.
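The idea of servicing I/O only at points chosen by the application can be illustrated with a small cooperative model. The sketch below is not the liblustre or libsysio API; the class `DeferredIOClient` and its methods are hypothetical names invented for illustration. It shows only the general pattern: requests are queued during computation and nothing runs in the background until the application explicitly asks for progress.

```python
from collections import deque
from typing import Callable, Deque, List


class DeferredIOClient:
    """Hypothetical single-threaded client: no background threads, no interrupts."""

    def __init__(self) -> None:
        self._pending: Deque[Callable[[], None]] = deque()

    def submit(self, request: Callable[[], None]) -> None:
        """Queue a request. No I/O work is performed here."""
        self._pending.append(request)

    def service(self) -> int:
        """Run all queued requests; called only where the application permits it."""
        done = 0
        while self._pending:
            self._pending.popleft()()
            done += 1
        return done


log: List[str] = []
client = DeferredIOClient()

for step in range(3):
    # Compute phase: the request is recorded, but the application is not
    # interrupted to carry it out.
    client.submit(lambda s=step: log.append(f"checkpoint {s}"))

before = list(log)            # still empty: nothing has run in the background
serviced = client.service()   # the application explicitly allows I/O progress
print(before, serviced, log)
```

Because all I/O work happens inside `service()`, the compute phases run without unpredictable interruptions, which is the property that matters when thousands of processors must stay in step.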
Red Storm was decommissioned in 2012.