GPFS

IBM Spectrum Scale, formerly the General Parallel File System (GPFS), is high-performance clustered file system software developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel modes, or a combination of these. It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the TOP500 list.
For example, it is the filesystem of the Summit supercomputer at Oak Ridge National Laboratory, which was ranked the fastest supercomputer in the world in the November 2019 TOP500 list. Summit is a 200-petaflops system composed of more than 9,000 IBM POWER processors and 27,000 NVIDIA Volta GPUs. Its storage filesystem, called Alpine, has 250 PB of storage using Spectrum Scale on IBM ESS storage hardware, capable of approximately 2.5 TB/s of sequential I/O and 2.2 TB/s of random I/O.
Like typical cluster filesystems, IBM Spectrum Scale provides concurrent high-speed file access to applications executing on multiple nodes of clusters. It can be used with AIX clusters, Linux clusters, on Microsoft Windows Server, or a heterogeneous cluster of AIX, Linux and Windows nodes running on x86, POWER or IBM z processor architectures. In addition to providing filesystem storage capabilities, it provides tools for management and administration of the IBM Spectrum Scale cluster and allows for shared access to file systems from remote clusters.

History

IBM Spectrum Scale began as the Tiger Shark file system, a research project at IBM's Almaden Research Center as early as 1993. Tiger Shark was initially designed to support high throughput multimedia applications. This design turned out to be well suited to scientific computing.
Another ancestor is IBM's Vesta filesystem, developed as a research project at IBM's Thomas J. Watson Research Center between 1992 and 1995. Vesta introduced the concept of file partitioning to accommodate the needs of parallel applications that run on high-performance multicomputers with parallel I/O subsystems. With partitioning, a file is not a sequence of bytes, but rather multiple disjoint sequences that may be accessed in parallel. The partitioning is such that it abstracts away the number and type of I/O nodes hosting the filesystem, and it allows a variety of logically partitioned views of files, regardless of the physical distribution of data within the I/O nodes. The disjoint sequences are arranged to correspond to individual processes of a parallel application, allowing for improved scalability.
Vesta was commercialized as the PIOFS filesystem around 1994, and was succeeded by GPFS around 1998. The main difference between the older and newer filesystems was that GPFS replaced the specialized interface offered by Vesta/PIOFS with the standard Unix API: all the features to support high performance parallel I/O were hidden from users and implemented under the hood.
Spectrum Scale has been available on IBM's AIX since 1998, on Linux since 2001, and on Windows Server since 2008.
Today it is used by many of the supercomputers on the TOP500 list. Since inception, it has been successfully deployed for many commercial applications including digital media, grid analytics, and scalable file services.
In 2010, IBM previewed a version of GPFS that included a capability known as GPFS-SNC, where SNC stands for Shared Nothing Cluster. This was officially released with GPFS 3.5 in December 2012, and is now known as FPO (File Placement Optimizer). This allows it to use locally attached disks on a cluster of network-connected servers rather than requiring dedicated servers with shared disks. FPO is suitable for workloads with high data locality, such as shared-nothing database clusters like SAP HANA and DB2 DPF, and can be used as an HDFS-compatible filesystem.

Architecture

IBM Spectrum Scale is a clustered file system. It breaks a file into blocks of a configured size, less than 1 megabyte each, which are distributed across multiple cluster nodes.
The system stores data on standard block storage volumes, but includes an internal RAID layer that can virtualize those volumes for redundancy and parallel access much like a RAID block storage system. It also has the ability to replicate across volumes at the higher file level.
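The block distribution described above can be illustrated with a minimal sketch. It assumes simple round-robin placement, which is an illustrative choice and not necessarily the placement algorithm the product uses; the function name `stripe_blocks` and the node names are hypothetical.

```python
def stripe_blocks(file_size, block_size, nodes):
    """Assign each block index of a file to a node in round-robin order.

    Round-robin is an assumption made for illustration only; it is not
    taken from the product documentation.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not nodes:
        raise ValueError("at least one node is required")

    # Ceiling division: a partially filled last block still needs a block.
    num_blocks = -(-file_size // block_size)

    placement = {node: [] for node in nodes}
    for block_index in range(num_blocks):
        node = nodes[block_index % len(nodes)]
        placement[node].append(block_index)
    return placement


# Hypothetical example: a 2.5 MiB file split into 256 KiB blocks on 3 nodes.
layout = stripe_blocks(
    file_size=10 * 256 * 1024,
    block_size=256 * 1024,
    nodes=["node-a", "node-b", "node-c"],
)
for node, blocks in layout.items():
    print(node, blocks)
```

Because consecutive blocks land on different nodes, a large sequential read can be served by several nodes at once, which is the source of the parallel access mentioned above.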
Features of the architecture include:

Distributed metadata, including the directory tree. There is no single "directory controller" or "index server" in charge of the filesystem.
Efficient indexing of directory entries for very large directories.
Distributed locking. This allows for full POSIX filesystem semantics, including locking for exclusive file access.
Partition-aware. A failure of the network may partition the filesystem into two or more groups of nodes that can only see the nodes in their group. This can be detected through a heartbeat protocol, and when a partition occurs, the filesystem remains live for the largest partition formed. This offers a graceful degradation of the filesystem: some machines will remain working.
Online maintenance. Most filesystem maintenance chores can be performed while the filesystem is live. This keeps the filesystem, and therefore the supercomputer cluster that depends on it, available for longer.
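
The partition-aware behaviour in the list above can be sketched as follows. This is a simplified model, not the product's actual quorum logic: it assumes that a set of working heartbeat links between node pairs is already known, groups nodes into connected components, and keeps the largest group. The function names and node names are hypothetical.

```python
from collections import deque


def find_partitions(nodes, links):
    """Group nodes into partitions, given pairs of nodes whose heartbeats succeed."""
    neighbours = {node: set() for node in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    partitions = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        group = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.append(node)
            for peer in sorted(neighbours[node]):
                if peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        partitions.append(sorted(group))
    return partitions


def surviving_partition(nodes, links):
    """Return the largest partition; ties go to the first one found (an assumption)."""
    return max(find_partitions(nodes, links), key=len)


# Hypothetical five-node cluster split by a network failure between n3 and n4.
cluster = ["n1", "n2", "n3", "n4", "n5"]
heartbeats = [("n1", "n2"), ("n2", "n3"), ("n4", "n5")]
print(find_partitions(cluster, heartbeats))
print(surviving_partition(cluster, heartbeats))
```

A real deployment would need a rule for equal-sized partitions; the tie-break used here is only a placeholder.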

Other features include high availability, ability to be used in a heterogeneous cluster, disaster recovery, security, DMAPI, HSM and ILM.

Compared to Hadoop Distributed File System (HDFS)

Hadoop's HDFS filesystem is designed to store similar or greater quantities of data on commodity hardware, that is, datacenters without RAID disks and a storage area network (SAN).

HDFS also breaks files up into blocks, and stores them on different filesystem nodes.
IBM Spectrum Scale has full POSIX filesystem semantics.
IBM Spectrum Scale distributes its directory indices and other metadata across the filesystem. Hadoop, in contrast, keeps this on the Primary and Secondary Namenodes, large servers which must store all index information in RAM.
IBM Spectrum Scale breaks files up into small blocks. Hadoop HDFS prefers much larger blocks, as this reduces the storage requirements of the Namenode. Small blocks or many small files fill up a filesystem's indices quickly, and so limit the filesystem's size.
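The effect of block size on index size can be seen with a little arithmetic. The sketch below counts how many block entries one file needs at two block sizes. Both sizes and the file size are illustrative values chosen for this example, not figures from either product.

```python
KIB = 1024
MIB = 1024 * KIB
TIB = 1024 * 1024 * MIB


def blocks_needed(file_size, block_size):
    """Number of blocks (and so block index entries) needed for one file."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return -(-file_size // block_size)  # ceiling division


# Hypothetical sizes, for illustration only.
file_size = 1 * TIB
small_block = 256 * KIB
large_block = 128 * MIB

small_count = blocks_needed(file_size, small_block)
large_count = blocks_needed(file_size, large_block)
print(small_count, large_count, small_count // large_count)
```

With these assumed values the small-block layout needs 512 times as many entries, which is why an index server that holds everything in RAM favours large blocks, while a design that spreads metadata across the cluster can afford small ones.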

Information lifecycle management

Storage pools allow for the grouping of disks within a file system. An administrator can create tiers of storage by grouping disks based on performance, locality or reliability characteristics. For example, one pool could be high-performance Fibre Channel disks and another more economical SATA storage.
A fileset is a sub-tree of the file system namespace and provides a way to partition the namespace into smaller, more manageable units. Filesets provide an administrative boundary that can be used to set quotas and be specified in a policy to control initial data placement or data migration. Data in a single fileset can reside in one or more storage pools. Where the file data resides and how it is migrated is based on a set of rules in a user-defined policy.
There are two types of user-defined policies: file placement and file management. File placement policies direct file data to the appropriate storage pool as files are created. File placement rules are selected by attributes such as file name, user name or fileset. File management policies allow a file's data to be moved or replicated, or files to be deleted. File management policies can be used to move data from one pool to another without changing the file's location in the directory structure. File management policies are determined by file attributes such as last access time, path name or size of the file.
The policy processing engine is scalable and can be run on many nodes at once. This allows management policies to be applied to a single file system with billions of files and complete in a few hours.
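The two policy types described above can be modelled with a small sketch. It does not use the product's own policy language; all class, function, rule and pool names are hypothetical, and the first-matching-rule-wins evaluation order is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class FileInfo:
    name: str
    owner: str
    fileset: str
    size: int
    days_since_access: int
    pool: Optional[str] = None


@dataclass
class Rule:
    name: str
    target_pool: str
    matches: Callable[[FileInfo], bool]


def place(file: FileInfo, rules: List[Rule], default_pool: str) -> str:
    """Placement policy: choose the pool for a newly created file."""
    for rule in rules:
        if rule.matches(file):
            return rule.target_pool
    return default_pool


def migrate(files: List[FileInfo], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Management policy: move file data between pools.

    Only the pool changes; the file name (its place in the directory
    tree) is left untouched.
    """
    moved = []
    for f in files:
        for rule in rules:
            if rule.matches(f):
                if f.pool != rule.target_pool:
                    f.pool = rule.target_pool
                    moved.append((f.name, rule.name))
                break  # first matching rule wins (assumption)
    return moved


# Hypothetical pools: "fc" for fast disks, "sata" for economical disks.
placement_rules = [
    Rule("logs-to-sata", "sata", lambda f: f.name.endswith(".log")),
    Rule("project-x-fast", "fc", lambda f: f.fileset == "project-x"),
]
management_rules = [
    Rule("cold-to-sata", "sata", lambda f: f.days_since_access > 90),
]

new_file = FileInfo("run.dat", "alice", "project-x", 1024, 0)
new_file.pool = place(new_file, placement_rules, default_pool="sata")

old_file = FileInfo("old.dat", "bob", "project-x", 4096, 120, pool="fc")
moved = migrate([new_file, old_file], management_rules)
print(new_file.pool, old_file.pool, moved)
```

Because each file is evaluated independently, the file list can be split across many workers, which is consistent with the scalable policy engine described above.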

General

List of file systems
Shared disk file system

Specific

Alluxio
ASM Cluster File System
BeeGFS
GFS2
Gluster
Google File System
Lustre
MapR FS
MooseFS
OCFS2
QFS
Scale-out File Services – IBM's NAS-grid product using IBM Spectrum Scale
Veritas Cluster Server
ZFS
