National Center for Computational Sciences


The National Center for Computational Sciences is a United States Department of Energy Leadership Computing Facility that houses the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility charged with helping researchers solve challenging scientific problems of global interest with a combination of leading high-performance computing resources and international expertise in scientific computing.
The NCCS provides resources for calculation and simulation in fields including astrophysics, materials science, and climate research to users from government, academia, and industry who have many of the largest computing problems in science.
The OLCF’s flagship supercomputer, the IBM AC922 Summit, is supported by advanced data management and analysis tools. The center hosted the Cray XK7 Titan system, one of the most powerful scientific tools of its time, from 2012 through its retirement in August 2019. The same year, construction began for Frontier, which is slated to debut as the OLCF’s first exascale system in 2021.

History

On December 9, 1991, the High-Performance Computing Act of 1991 (HPCA), introduced by Senator Al Gore, was signed into law. HPCA proposed a national information infrastructure to build communications networks and databases and also called for proposals to build new high-performance computing facilities to serve science.
On May 24, 1992, ORNL was awarded a high-performance computing research center called the Center for Computational Sciences, or CCS, as part of HPCA. ORNL also received a 66-processor, serial #1 Intel Paragon XP/S 5 for code development the same year. The system had a peak performance of 5 gigaflops.
Oak Ridge National Laboratory joined with three other national laboratories and seven universities to submit the Partnership in Computational Science proposal to the US Department of Energy as part of the High-Performance Computing and Communications Initiative.
With the High-End Computing Revitalization Act of 2004, CCS was tasked with carrying out the Leadership Computing Facility Project at ORNL with the goal of developing and installing a petaflops-speed supercomputer by the end of 2008. The center officially changed its name from the Center for Computational Sciences to NCCS the same year.
On December 9, 2019, Georgia Tourassi, who previously served as the director of ORNL’s Health Data Sciences Institute and as group leader for ORNL’s Biomedical Sciences, Engineering, and Computing Group, was appointed director of the NCCS, succeeding James Hack.

Previous Systems

Intel Paragons

The creation of the CCS in 1992 ushered in a series of Intel Paragon computers.

Eagle

Eagle was a 184-node IBM RS/6000 SP operated by the Computer Science and Mathematics Division of ORNL. It had 176 Winterhawk-II “thin” nodes, each with four 375 MHz Power3-II processors and 2 GB of memory. Eagle also had eight Winterhawk-II “wide” nodes, each with two 375 MHz Power3-II processors and 2 GB of memory, for use as filesystem servers and other infrastructure tasks. Eagle’s estimated computational power was greater than 1 teraflop in the compute partition.

Falcon (2000)

Falcon was a 64-node Compaq AlphaServer SC operated by the CCS and acquired as part of an early-evaluation project. Each node had four 667 MHz Alpha EV67 processors and 2 GB of memory, and the system had 2 TB of Fibre Channel disk attached, resulting in an estimated computational power of 342 gigaflops.

Cheetah (2001–2008)

Cheetah was a 4.5 TF IBM pSeries System operated by the CCS. The compute partition of Cheetah included 27 p690 nodes, each with thirty-two 1.3 GHz Power4 processors. The login and I/O partitions together included 8 p655 nodes, each with four 1.7 GHz Power4 processors. All nodes were connected via IBM’s Federation interconnect.
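The 4.5 TF figure is consistent with the compute partition described above. The following is a minimal sketch of that arithmetic, assuming each Power4 processor completes four floating-point operations per cycle (two fused multiply-add units). That per-cycle figure is an assumption for illustration and is not stated in the text.

```python
def peak_teraflops(nodes: int, procs_per_node: int, clock_ghz: float,
                   flops_per_cycle: int) -> float:
    """Theoretical peak in teraflops: processors x clock rate x flops per cycle."""
    gigaflops = nodes * procs_per_node * clock_ghz * flops_per_cycle
    return gigaflops / 1000.0


# Cheetah compute partition: 27 p690 nodes, 32 Power4 processors each, 1.3 GHz.
# flops_per_cycle=4 is an assumption (two fused multiply-add units per processor).
cheetah_tf = peak_teraflops(nodes=27, procs_per_node=32,
                            clock_ghz=1.3, flops_per_cycle=4)
print(f"Cheetah compute partition peak: {cheetah_tf:.1f} TF")
```

The same helper can be applied to other systems in this list, provided the per-cycle throughput of the processor in question is known.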
The Power4 memory hierarchy consisted of three levels of cache. The first and second levels were on the Power4 chip. The level-1 instruction cache was 128 KB and the data cache was 64 KB. The level-2 cache was 1.5 MB, shared between the two processors on a chip. The level-3 cache was 32 MB and was off-chip. There were 16 chips per node, or 32 processors.
Most of Cheetah’s compute nodes had 32 GB of memory. Five had 64 GB of memory and two had 128 GB of memory. Some of the nodes in Cheetah had approximately 160 GB of local disk space that could be used as temporary scratch space.
In June 2002, Cheetah was ranked the eighth-fastest computer in the world, according to TOP500, the semi-annual list of the world's top supercomputers.

Ram (2003–2007)

Ram was an SGI Altix supercomputer provided as a support system for the NCCS.
Ram was installed in 2003 and was used as a pre- and post-processing support system for allocated NCCS projects until 2007.
Ram had 256 processors running at 1.5 GHz, each with 6 MB of L3 cache, 256 KB of L2 cache, and 32 KB of L1 cache. Ram had 8 GB of memory per processor for a total of 2 TB of shared memory. By contrast, the first supercomputer at ORNL, the Cray X-MP installed in 1985, had one-millionth the memory of the SGI Altix.

Phoenix (OLCF-1) (2003–2008)

Phoenix was a Cray X1E provided as a primary system in NCCS.
The original X1 was installed in 2003 and went through several upgrades, arriving at its final configuration in 2005. From October 2005 until 2008, it provided almost 17 million processor-hours. The system supported more than 40 large projects in research areas including climate, combustion, high energy physics, fusion, chemistry, computer science, materials science, and astrophysics.
At its final configuration, Phoenix had 1,024 multistreaming vector processors. Each MSP had 2 MB of cache and a peak computation rate of 18 gigaflops. Four MSPs formed a node with 8 GB of shared memory. Memory bandwidth was very high, roughly half the cache bandwidth. The interconnect functioned as an extension of the memory system, offering each node direct access to memory on other nodes at high bandwidth and low latency.

Jaguar (OLCF-2) (2005–2012)

Jaguar began as a 25-teraflop Cray XT3 in 2005. Later, it was upgraded to an XT4 containing 7,832 compute nodes, each containing a quad-core AMD Opteron 1354 processor running at 2.1 GHz, 8 GB of DDR2-800 memory, and a SeaStar2 router. The resulting partition contained 31,328 processing cores, more than 62 TB of memory, more than 600 TB of disk space, and a peak performance of 263 teraflops.
In 2008, Jaguar was upgraded to a Cray XT5 and became the first system to run a scientific application at a sustained petaflop. By the time of its ultimate transformation into Titan in 2012, Jaguar contained nearly 300,000 processing cores and had a theoretical performance peak of 3.3 petaflops. Jaguar had 224,256 x86-based AMD Opteron processor cores and operated with a version of Linux called the Cray Linux Environment.
From November 2009 until November 2010, Jaguar was the world's most powerful computer.

Hawk (2006–2008)

Hawk was a 64-node Linux cluster dedicated to high-end visualization.
Hawk was installed in 2006 and was used as the Center’s primary visualization cluster until May 2008 when it was replaced by a 512-core system named Lens.
Each node contained two single-core Opteron processors and 2 GB of memory. The cluster was connected by a Quadrics Elan3 network, providing high-bandwidth and low-latency communication. The cluster was populated with two flavors of NVIDIA graphics cards connected with AGP8x: 5900 and QuadroFX 3000G. Nodes with 3000G cards were directly connected to the EVEREST PowerWall and were reserved for PowerWall use.

Ewok (2006–2011)

Ewok was an Intel-based InfiniBand cluster running Linux. The system was provided as an end-to-end resource for center users. It was used for workflow automation for jobs running from the Jaguar supercomputer and for advanced data analysis. The system contained 81 nodes. Each node contained two 3.4 GHz Pentium IV processors, a 3.4 GHz Intel Xeon central processing unit, and 6 GB of memory. An additional node contained 4 dual-core AMD processors and 64 GB of memory. The system was configured with a 13 TB Lustre file system for scratch space.

Eugene (2008–2011)

Eugene was a 27-teraflop IBM Blue Gene/P System operated by NCCS. It provided approximately 45 million processor-hours yearly for ORNL staff and for the promotion of research collaborations between ORNL and its core university partner members.
The system consisted of 2,048 850 MHz IBM quad-core 450d PowerPC processors with 2 GB of memory per node. Eugene had 64 I/O nodes, and each submitted job was required to use at least one of them. As a result, each job consumed a minimum of 32 nodes per execution.
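The 32-node minimum follows from the ratio of compute nodes to I/O nodes. A small sketch of that reasoning is shown here, assuming the 2,048 processors correspond to 2,048 compute nodes (one quad-core chip per node, which the text implies but does not state outright).

```python
COMPUTE_NODES = 2048  # assumed: one quad-core processor per compute node
IO_NODES = 64
CORES_PER_NODE = 4    # quad-core PowerPC 450d

# Every job must use at least one I/O node, and each I/O node serves a fixed
# block of compute nodes, so the smallest job occupies one such block.
min_nodes_per_job = COMPUTE_NODES // IO_NODES
min_cores_per_job = min_nodes_per_job * CORES_PER_NODE

print(min_nodes_per_job)  # 32
print(min_cores_per_job)  # 128
```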
Eugene was officially decommissioned in October 2011. On December 13 of the same year, a portion of Eugene’s hardware was donated to Argonne Leadership Computing Facility at Argonne National Laboratory.

Eos (2013–2019)

Eos was a 736-node Cray XC30 cluster with a total of 47.104 TB of memory. Its processor was the Intel Xeon E5-2670. It featured 16 I/O service nodes and 2 external login nodes. Its compute nodes were organized in blades. Each blade contained 4 nodes. Every node had 2 sockets with 8 physical cores each. Intel’s HyperThreading Technology allowed each physical core to work as 2 logical cores so each node could function as if it had 32 cores. In total, the Eos compute partition contained 11,776 traditional processor cores.
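The core counts quoted for Eos can be checked directly from the node layout. The sketch below also derives a per-node memory figure, which is not given in the text and assumes memory was spread evenly across nodes with 1 TB counted as 1,000 GB.

```python
NODES = 736
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2      # Intel Hyper-Threading: 2 logical cores per physical core
TOTAL_MEMORY_GB = 47_104  # 47.104 TB, assuming 1 TB = 1,000 GB

physical_cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
logical_cores_per_node = physical_cores_per_node * THREADS_PER_CORE
total_physical_cores = NODES * physical_cores_per_node

# Derived value, assuming memory was evenly distributed across nodes.
memory_per_node_gb = TOTAL_MEMORY_GB / NODES

print(physical_cores_per_node)  # 16
print(logical_cores_per_node)   # 32
print(total_physical_cores)     # 11776
print(memory_per_node_gb)       # 64.0
```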
Eos provided a space for tool and application porting, for small-scale jobs to prepare capability runs on Titan, and for software generation, verification, and optimization.

Titan (OLCF-3) (2012–2019)

Titan was a hybrid-architecture Cray XK7 system with a theoretical peak performance exceeding 27,000 trillion calculations per second (27 petaflops). It contained both advanced 16-core AMD Opteron CPUs and NVIDIA Kepler graphics processing units. This combination allowed Titan to achieve 10 times the speed and 5 times the energy efficiency of its predecessor, the Jaguar supercomputer, while using only modestly more energy and occupying the same physical footprint.
Titan featured 18,688 compute nodes, a total system memory of 710 TB, and Cray’s high-performance Gemini network. Its 299,008 CPU cores guided simulations and the accompanying GPUs handled hundreds of calculations simultaneously. The system provided decreased time to solution, increased complexity of models, and greater realism in simulations. In November 2012, Titan received the Number 1 position on the TOP500 supercomputer list.
After 7 years of service, Titan was decommissioned in August 2019 to make room for the Frontier supercomputer.

Current Systems

Spider

The OLCF’s center-wide Lustre file system, called Spider, is the operational work file system for most OLCF computational resources. As an extremely high-performance system, Spider has over 20,000 clients, providing 32 PB of disk space, and it can move data at more than 1 TB/s. Spider comprises two filesystems, Atlas1 and Atlas2, in order to provide high availability and load balance across multiple metadata servers for increased performance.

HPSS

HPSS, ORNL’s archival mass-storage resource, consists of tape and disk storage components, Linux servers, and High Performance Storage System software. Tape storage is provided by StorageTek SL8500 robotic tape libraries, each of which can hold up to 10,000 cartridges. Each library has 24 T10K-A drives, 60 T10K-B drives, 36 T10K-C drives, and 72 T10K-D drives.

EVEREST

EVEREST is a large-scale venue for data exploration and analysis. EVEREST measures 30 feet long by 8 feet tall, and its main feature is a 27-projector PowerWall with an aggregate pixel count of 35 million pixels. The projectors are arranged in a 9×3 array, each providing 3,500 lumens for a very bright display.
Displaying 11,520 by 3,072 pixels, the wall offers a tremendous amount of visual detail. The wall is integrated with the rest of the computing center, creating a high-bandwidth data path between large-scale high-performance computing and large-scale data visualization.
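The 35-million-pixel figure and the 9×3 projector layout fit together as follows. The per-projector resolution computed here is a derived value that assumes the projected tiles abut without overlap, which the text does not confirm.

```python
WALL_WIDTH_PX, WALL_HEIGHT_PX = 11_520, 3_072
COLUMNS, ROWS = 9, 3  # projector array

total_pixels = WALL_WIDTH_PX * WALL_HEIGHT_PX
projectors = COLUMNS * ROWS

# Assumption: tiles do not overlap, so the wall divides evenly among projectors.
tile_width_px = WALL_WIDTH_PX // COLUMNS
tile_height_px = WALL_HEIGHT_PX // ROWS

print(projectors)                     # 27
print(total_pixels)                   # 35389440
print(tile_width_px, tile_height_px)  # 1280 1024
```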
EVEREST is controlled by a 14-node cluster. Each node contains four dual-core AMD Opteron processors. These 14 nodes have NVIDIA QuadroFX 3000G graphics cards connected to the projectors, providing a very-high-throughput visualization capability. The visualization lab acts as an experimental facility for development of future visualization capabilities. It houses a 12-panel tiled LCD display, test cluster nodes, interaction devices, and video equipment.

Rhea

Rhea is a 521-node, commodity-type Linux cluster. Rhea provides a conduit for large-scale scientific discovery via pre- and post-processing of simulation data generated on the Titan supercomputer. Each of Rhea’s first 512 nodes contains two 8-core 2.0 GHz Intel Xeon processors with Intel’s HT Technology and 128 GB of main memory. Rhea also has nine large-memory GPU nodes. Each of these nodes has 1 TB of main memory, two NVIDIA K80 GPUs, and two 14-core 2.30 GHz Intel Xeon processors with HT Technology. Rhea is connected to the OLCF’s high-performance Lustre filesystem, Atlas.

Wombat

Wombat is a single-rack cluster from HPE based on the 64-bit ARM architecture instead of traditional x86-based architecture. This system is available to support computer science research projects aimed at exploring the ARM architecture.
The Wombat cluster has 16 compute nodes, four of which have two AMD GPU accelerators attached. Each compute node has two 28-core Cavium ThunderX2 processors, 256 GB RAM and a 480 GB SSD for node-local storage. Nodes are connected with EDR InfiniBand.

Summit (OLCF-4)

The IBM AC922 Summit, or OLCF-4, is ORNL’s 200-petaflop flagship supercomputer. Summit was originally launched in June 2018, and as of the November 2019 TOP500 list, is the fastest computer in the world with a High Performance Linpack performance of 148.6 petaflops. Summit is also the first computer to reach exascale performance, achieving a peak throughput of 1.88 exaops through a mixture of single- and half-precision floating point operations.
Like its predecessor Titan, Summit makes use of a hybrid architecture that integrates its 9,216 Power9 CPUs and 27,648 NVIDIA Volta V100 GPUs using NVIDIA’s NVLink. Summit features 4,608 nodes, each with 512 GB of Double Data Rate 4 Synchronous Dynamic Random-Access Memory and 96 GB of High Bandwidth Memory per node, with a total storage capacity of 250 petabytes.
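The system-wide totals above imply a simple per-node layout. The following sketch derives it, assuming CPUs, GPUs, and High Bandwidth Memory are distributed evenly across nodes and GPUs. The per-GPU memory figure is a derived value under that assumption.

```python
NODES = 4_608
CPUS = 9_216
GPUS = 27_648
DDR4_GB_PER_NODE = 512
HBM_GB_PER_NODE = 96

cpus_per_node = CPUS // NODES
gpus_per_node = GPUS // NODES
gpus_per_cpu = gpus_per_node // cpus_per_node

# Assumption: High Bandwidth Memory is split evenly among a node's GPUs.
hbm_gb_per_gpu = HBM_GB_PER_NODE / gpus_per_node

# Aggregate memory across the machine, in GB.
total_ddr4_gb = NODES * DDR4_GB_PER_NODE
total_hbm_gb = NODES * HBM_GB_PER_NODE

print(cpus_per_node, gpus_per_node, gpus_per_cpu)  # 2 6 3
print(hbm_gb_per_gpu)                              # 16.0
print(total_ddr4_gb, total_hbm_gb)                 # 2359296 442368
```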

Frontier (OLCF-5)

Scheduled for delivery in 2021 with user access becoming available the following year, Frontier will be ORNL’s first sustainable exascale system, meaning it will be capable of performing one quintillion—one billion billion—operations per second. The system will be composed of more than 100 Cray Shasta cabinets with an anticipated peak performance around 1.5 exaflops.

Research areas