ZFS

The ZFS file system began as part of the Sun Microsystems Solaris operating system in 2001. Large parts of Solaris – including ZFS – were published under an open source license as OpenSolaris for around 5 years from 2005, before being placed under a closed source license when Oracle Corporation acquired Sun in 2009/2010. During 2005 - 2010, the open source version of ZFS was ported to Linux, Mac OS X and FreeBSD. In 2010, the illumos project forked a recent version of OpenSolaris, to continue its development as an open source project, including ZFS. In 2013 OpenZFS was founded to coordinate the development of open source ZFS. OpenZFS maintains and manages the core ZFS code, while organizations using ZFS maintain the specific code and validation processes required for ZFS to integrate within their systems. OpenZFS is widely used in Unix-like systems.

Overview

The management of stored data generally involves two aspects: the physical volume management of one or more block storage devices such as hard drives and SD cards and their organization into logical block devices as seen by the operating system, and the management of data and files that are stored on these logical block devices.
ZFS is unusual because, unlike most other storage systems, it unifies both of these roles and acts as both the volume manager and the file system. Therefore, it has complete knowledge of both the physical disks and volumes. ZFS is designed to ensure that data stored on disks cannot be lost due to physical errors or misprocessing by the hardware or operating system, or bit rot events and data corruption which may happen over time, and its complete control of the storage system is used to ensure that every step, whether related to file management or disk management, is verified, confirmed, corrected if needed, and optimized, in a way that storage controller cards and separate volume and file managers cannot achieve.
ZFS also includes a mechanism for dataset and pool-level snapshots and replication, including snapshot cloning which is described by the FreeBSD documentation as one of its "most powerful features", having features that "even other file systems with snapshot functionality lack". Very large numbers of snapshots can be taken, without degrading performance, allowing snapshots to be used prior to risky system operations and software changes, or an entire production file system to be fully snapshotted several times an hour, in order to mitigate data loss due to user error or malicious activity. Snapshots can be rolled back "live" or previous file system states can be viewed, even on very large file systems, leading to savings in comparison to formal backup and restore processes. Snapshots can also be cloned to form new independent file systems. A pool level snapshot is available which allows rollback of operations that may affect the entire pool's structure, or which add or remove entire datasets.

History

Sun Microsystems (to 2010)

In 1987, AT&T Corporation and Sun announced that they were collaborating on a project to merge the most popular Unix variants on the market at that time: Berkeley Software Distribution, UNIX System V, and Xenix. This became Unix System V Release 4. The project was released under the name Solaris, which became the successor to SunOS 4.
ZFS was designed and implemented by a team at Sun led by Jeff Bonwick, Bill Moore and Matthew Ahrens. It was announced on September 14, 2004, but development started in 2001. Source code for ZFS was integrated into the main trunk of Solaris development on October 31, 2005, and released for developers as part of build 27 of OpenSolaris on November 16, 2005. In June 2006, Sun announced that ZFS was included in the mainstream 6/06 update to Solaris 10.
Historically, Solaris was developed as proprietary software. Sun Microsystems was a strong proponent of open source software. In June 2005, Sun released most of the codebase under the CDDL license, and founded the OpenSolaris open-source project. Sun was an early proponent of open source software, and with OpenSolaris, Sun wanted to build a developer and user community around the software. In Solaris 10 6/06, Sun added the ZFS file system. During the next 5 years, Sun frequently updated ZFS with new features, and ZFS was ported to Linux, Mac OS X and FreeBSD, under this open source license.
The name at one point was said to stand for "Zettabyte File System", but by 2006 was no longer considered to be an abbreviation. A ZFS file system can store up to 256 quadrillion zettabytes.
In September 2007, NetApp sued Sun claiming that ZFS infringed some of NetApp's patents on Write Anywhere File Layout. Sun counter-sued in October the same year claiming the opposite. The lawsuits were ended in 2010 with an undisclosed settlement.

Later development

Ported versions of ZFS began to appear in 2005. After the Sun acquisition by Oracle in 2010, Oracle's version of ZFS became closed source and development of open-source versions proceeded independently, coordinated by OpenZFS from 2013.

Features

Summary

Examples of features specific to ZFS include:

Data integrity

One major feature that distinguishes ZFS from other file systems is that it is designed with a focus on data integrity by protecting the user's data on disk against silent data corruption caused by data degradation, current spikes, bugs in disk firmware, phantom writes, misdirected reads/writes, DMA parity errors between the array and server memory or from the driver, driver errors, accidental overwrites, etc.
A 1999 study showed that neither any of the then-major and widespread filesystems, nor hardware RAID provided sufficient protection against data corruption problems. Initial research indicates that ZFS protects data better than earlier efforts. It is also faster than UFS and can be seen as its replacement.
Within ZFS, data integrity is achieved by using a Fletcher-based checksum or a SHA-256 hash throughout the file system tree. Each block of data is checksummed and the checksum value is then saved in the pointer to that block—rather than at the actual block itself. Next, the block pointer is checksummed, with the value being saved at its pointer. This checksumming continues all the way up the file system's data hierarchy to the root node, which is also checksummed, thus creating a Merkle tree. In-flight data corruption or phantom reads/writes are undetectable by most filesystems as they store the checksum with the data. ZFS stores the checksum of each block in its parent block pointer so the entire pool self-validates.
When a block is accessed, regardless of whether it is data or meta-data, its checksum is calculated and compared with the stored checksum value of what it "should" be. If the checksums match, the data are passed up the programming stack to the process that asked for it; if the values do not match, then ZFS can heal the data if the storage pool provides data redundancy, assuming that the copy of data is undamaged and with matching checksums. It is optionally possible to provide additional in-pool redundancy by specifying , which means that data will be stored twice on the disk, effectively halving the storage capacity of the disk. Additionally some kinds of data used by ZFS to manage the pool are stored multiple times by default for safety, even with the default copies=1 setting.
If other copies of the damaged data exist or can be reconstructed from checksums and parity data, ZFS will use a copy of the data, and recalculate the checksum—ideally resulting in the reproduction of the originally expected value. If the data passes this integrity check, the system can then update all faulty copies with known-good data and redundancy will be restored.
Consistency of data held in memory, such as cached data in the ARC, is not checked by default, as ZFS is expected to run on enterprise-quality hardware with error correcting RAM, but the capability to check in-memory data exists and can be enabled using "debug flags".

RAID ("RaidZ")

For ZFS to be able to guarantee data integrity, it needs multiple copies of the data, usually spread across multiple disks. Typically this is achieved by using either a RAID controller or so-called "soft" RAID.

Avoidance of hardware RAID controllers

While ZFS can work with hardware RAID devices, ZFS will usually work more efficiently and with greater protection of data, if it has raw access to all storage devices, and disks are not connected to the system using a hardware, firmware or other "soft" RAID, or any other controller which modifies the usual ZFS-to-disk I/O path. This is because ZFS relies on the disk for an honest view, to determine the moment data is confirmed as safely written, and it has numerous algorithms designed to optimize its use of caching, cache flushing, and disk handling.
If a third-party device performs caching or presents drives to ZFS as a single system, or without the low level view ZFS relies upon, there is a much greater chance that the system will perform less optimally, and that a failure will not be preventable by ZFS or as quickly or fully recovered by ZFS. For example, if a hardware RAID card is used, ZFS may not be able to determine the condition of disks or whether the RAID array is degraded or rebuilding, it may not know of all data corruption, and it cannot place data optimally across the disks, make selective repairs only, control how repairs are balanced with ongoing use, and may not be able to make repairs even if it could usually do so, as the hardware RAID card will interfere. RAID controllers also usually add controller-dependent data to the drives which prevents software RAID from accessing the user data. While it is possible to read the data with a compatible hardware RAID controller, this isn't always possible, and if the controller card develops a fault then a replacement may not be available, and other cards may not understand the manufacturer's custom data which is needed to manage and restore an array on a new card.
Therefore, unlike most other systems, where RAID cards or similar are used to offload resources and processing and enhance performance and reliability, with ZFS it is strongly recommended these methods not be used as they typically reduce the system's performance and reliability.
If disks must be connected through a RAID or other controller, it is recommended to use a plain HBA or fanout card, or configure the card in JBOD mode, to allow devices to be attached but the ZFS-to-disk I/O pathway to be unchanged. A RAID card in JBOD mode may still interfere, if it has a cache or depending upon its design, and may detach drives that do not respond in time, and as such, may require Time-Limited Error Recovery /CCTL/ERC-enabled drives to prevent drive dropouts, so not all cards are suitable even with RAID functions disabled.

ZFS' approach: [|RAID-Z] and mirroring

Instead of hardware RAID, ZFS employs "soft" RAID, offering RAID-Z and disk mirroring. The schemes are highly flexible.
RAID-Z is a data/parity distribution scheme like RAID-5, but uses dynamic stripe width: every block is its own RAID stripe, regardless of blocksize, resulting in every RAID-Z write being a full-stripe write. This, when combined with the copy-on-write transactional semantics of ZFS, eliminates the write hole error. RAID-Z is also faster than traditional RAID 5 because it does not need to perform the usual read-modify-write sequence.
As all stripes are of different sizes, RAID-Z reconstruction has to traverse the filesystem metadata to determine the actual RAID-Z geometry. This would be impossible if the filesystem and the RAID array were separate products, whereas it becomes feasible when there is an integrated view of the logical and physical structure of the data. Going through the metadata means that ZFS can validate every block against its 256-bit checksum as it goes, whereas traditional RAID products usually cannot do this.
In addition to handling whole-disk failures, RAID-Z can also detect and correct silent data corruption, offering "self-healing data": when reading a RAID-Z block, ZFS compares it against its checksum, and if the data disks did not return the right answer, ZFS reads the parity and then figures out which disk returned bad data. Then, it repairs the damaged data and returns good data to the requestor.
RAID-Z and mirroring do not require any special hardware: they do not need NVRAM for reliability, and they do not need write buffering for good performance or data protection. With RAID-Z, ZFS provides fast, reliable storage using cheap, commodity disks.
There are five different RAID-Z modes: RAID-Z0, RAID-Z1, RAID-Z2, RAID-Z3, and mirroring.
The need for RAID-Z3 arose in the early 2000s as muti-terabyte capacity drives became more common. This increase in capacity—without a corresponding increase in throughput speeds—meant that rebuilding an array due to a failed drive could take "weeks or even months" to complete. During this time, the older disks in the array will be stressed by the additional workload, which could result data corruption or drive failure. By increasing parity, RAID-Z3 reduces the chance of data loss by simply increasing redundancy.

Resilvering and scrub (array syncing and integrity checking)

ZFS has no tool equivalent to fsck. Instead, ZFS has a built-in scrub function which regularly examines all data and repairs silent corruption and other problems. Some differences are:

fsck must be run on an offline filesystem, which means the filesystem must be unmounted and is not usable while being repaired, while scrub is designed to be used on a mounted, live filesystem, and does not need the ZFS filesystem to be taken offline.
fsck usually only checks metadata but never checks the data itself. This means, after an fsck, the data might still not match the original data as stored.
fsck cannot always validate and repair data when checksums are stored with data, because the checksums may also be corrupted or unreadable. ZFS always stores checksums separately from the data they verify, improving reliability and the ability of scrub to repair the volume. ZFS also stores multiple copies of data—metadata, in particular, may have upwards of 4 or 6 copies, greatly improving the ability of scrub to detect and repair extensive damage to the volume, compared to fsck.
scrub checks everything, including metadata and the data. The effect can be observed by comparing fsck to scrub times—sometimes a fsck on a large RAID completes in a few minutes, which means only the metadata was checked. Traversing all metadata and data on a large RAID takes many hours, which is exactly what scrub does.

The official recommendation from Sun/Oracle is to scrub enterprise-level disks once a month, and cheaper commodity disks once a week.

Capacity

ZFS is a 128-bit file system, so it can address 1.84 × 10¹⁹ times more data than 64-bit systems such as Btrfs. The maximum limits of ZFS are designed to be so large that they should never be encountered in practice. For instance, fully populating a single zpool with 2¹²⁸ bits of data would require 3×10²⁴ TB hard disk drives.
Some theoretical limits in ZFS are:

2⁴⁸: number of entries in any individual directory
16 exbibytes : maximum size of a single file
16 exbibytes: maximum size of any attribute
256 quadrillion zebibytes : maximum size of any zpool
2⁵⁶: number of attributes of a file
2⁶⁴: number of devices in any zpool
2⁶⁴: number of zpools in a system
2⁶⁴: number of file systems in a zpool
Encryption

With Oracle Solaris, the encryption capability in ZFS is embedded into the I/O pipeline. During writes, a block may be compressed, encrypted, checksummed and then deduplicated, in that order. The policy for encryption is set at the dataset level when datasets are created. The wrapping keys provided by the user/administrator can be changed at any time without taking the file system offline. The default behaviour is for the wrapping key to be inherited by any child data sets. The data encryption keys are randomly generated at dataset creation time. Only descendant datasets share data encryption keys. A command to switch to a new data encryption key for the clone or at any time is provided—this does not re-encrypt already existing data, instead utilising an encrypted master-key mechanism.
the encryption feature is also fully integrated into OpenZFS 0.8.0 available for Debian and Ubuntu Linux distributions.

Read/write efficiency

ZFS will automatically allocate data storage across all vdevs in a pool in a way that generally maximises the performance of the pool. ZFS will also update its write strategy to take account of new disks added to a pool, when they are added.
As a general rule, ZFS allocates writes across vdevs based on the free space in each vdev. This ensures that vdevs which have proportionately less data already, are given more writes when new data is to be stored. This helps to ensure that as the pool becomes more used, the situation does not develop that some vdevs become full, forcing writes to occur on a limited number of devices. It also means that when data is read, different parts of the data can be read from as many disks as possible at the same time, giving much higher read performance. Therefore, as a general rule, pools and vdevs should be managed and new storage added, so that the situation does not arise that some vdevs in a pool are almost full and others almost empty, as this will make the pool less efficient.

Other features

Storage devices, spares, and quotas

Pools can have hot spares to compensate for failing disks. When mirroring, block devices can be grouped according to physical chassis, so that the filesystem can continue in the case of the failure of an entire chassis.
Storage pool composition is not limited to similar devices, but can consist of ad-hoc, heterogeneous collections of devices, which ZFS seamlessly pools together, subsequently doling out space to as needed. Arbitrary storage device types can be added to existing pools to expand their size.
The storage capacity of all vdevs is available to all of the file system instances in the zpool. A quota can be set to limit the amount of space a file system instance can occupy, and a reservation can be set to guarantee that space will be available to a file system instance.

Caching mechanisms: ARC, L2ARC, Transaction groups, ZIL, SLOG, Special VDEV

ZFS uses different layers of disk cache to speed up read and write operations. Ideally, all data should be stored in RAM, but that is usually too expensive. Therefore, data is automatically cached in a hierarchy to optimize performance versus cost; these are often called "hybrid storage pools". Frequently accessed data will be stored in RAM, and less frequently accessed data can be stored on slower media, such as solid state drives. Data that is not often accessed is not cached and left on the slow hard drives. If old data is suddenly read a lot, ZFS will automatically move it to SSDs or to RAM.
ZFS caching mechanisms include one each for reads and writes, and in each case, two levels of caching can exist, one in computer memory and one on fast storage, for a total of four caches.

	Where stored	Read cache	Write cache
First level cache	In RAM	Known as ARC, due to its use of a variant of the adaptive replacement cache algorithm. RAM will always be used for caching, thus this level is always present. The efficiency of the ARC algorithm means that disks will often not need to be accessed, provided the ARC size is sufficiently large. If RAM is too small there will hardly be any ARC at all; in this case, ZFS always needs to access the underlying disks which impacts performance considerably.	Handled by means of "transaction groups" – writes are collated over a short period up to a given limit, with each group being written to disk ideally while the next group is being collated. This allows writes to be organized more efficiently for the underlying disks at the risk of minor data loss of the most recent transactions upon power interruption or hardware fault. In practice the power loss risk is avoided by ZFS write journaling and by the SLOG/ZIL second tier write cache pool, so writes will only be lost if a write failure happens at the same time as a total loss of the second tier SLOG pool, and then only when settings related to synchronous writing and SLOG use are set in a way that would allow such a situation to arise. If data is received faster than it can be written, data receipt is paused until the disks can catch up.
Second level cache	On fast storage devices	Known as L2ARC, optional. ZFS will cache as much data in L2ARC as it can, which can be tens or hundreds of gigabytes in many cases. L2ARC will also considerably speed up deduplication if the entire deduplication table can be cached in L2ARC. It can take several hours to fully populate the L2ARC from empty. If the L2ARC device is lost, all reads will go out to the disks which slows down performance, but nothing else will happen.	Known as SLOG or ZIL – the terms are often used incorrectly. A SLOG is an optional dedicated cache on a separate device, for recording writes, in the event of a system issue. If an SLOG device exists, it will be used for the ZFS Intent Log as a second level log, and if no separate cache device is provided, the ZIL will be created on the main storage devices instead. The SLOG thus, technically, refers to the dedicated disk to which the ZIL is offloaded, in order to speed up the pool. Strictly speaking, ZFS does not use the SLOG device to cache its disk writes. Rather, it uses SLOG to ensure writes are captured to a permanent storage medium as quickly as possible, so that in the event of power loss or write failure, no data which was acknowledged as written, will be lost. The SLOG device allows ZFS to speedily store writes and quickly report them as written, even for storage devices such as HDDs that are much slower. In the normal course of activity, the SLOG is never referred to or read, and it does not act as a cache; its purpose is to safeguard data in flight during the few seconds taken for collation and "writing out", in case the eventual write were to fail. If all goes well, then the storage pool will be updated at some point within the next 5 to 60 seconds, when the current transaction group is written out to disk, at which point the saved writes on the SLOG will simply be ignored and overwritten. If the write eventually fails, or the system suffers a crash or fault preventing its writing, then ZFS can identify all the writes that it has confirmed were written, by reading back the SLOG, and use this to completely repair the data loss. This becomes crucial if a large number of synchronous writes take place, where the client requires confirmation of successful writing before continuing its activity; the SLOG allows ZFS to confirm writing is successful much more quickly than if it had to write to the main store every time, without the risk involved in misleading the client as to the state of data storage. If there is no SLOG device then part of the main data pool will be used for the same purpose, although this is slower. If the log device itself is lost, it is possible to lose the latest writes, therefore the log device should be mirrored. In earlier versions of ZFS, loss of the log device could result in loss of the entire zpool, although this is no longer the case. Therefore, one should upgrade ZFS if planning to use a separate log device.

A number of other caches, cache divisions, and queues also exist within ZFS. For example, each VDEV has its own data cache, and the ARC cache is divided between data stored by the user and metadata used by ZFS, with control over the balance between these.

Special VDEV Class

In ZFS 0.8 and later, it is possible to configure a Special VDEV class to preferentially store filesystem metadata, and optionally the Data Deduplication Table, and small filesystem blocks. This allows, for example, to create a Special VDEV on fast solid-state storage to store the metadata, while the regular file data is stored on spinning disks. This speeds up metadata-intensive operations such as filesystem traversal, scrub, and resilver, without the expense of storing the entire filesystem on solid-state storage.

Copy-on-write transactional model

ZFS uses a copy-on-write transactional object model. All block pointers within the filesystem contain a 256-bit checksum or 256-bit hash of the target block, which is verified when the block is read. Blocks containing active data are never overwritten in place; instead, a new block is allocated, modified data is written to it, then any metadata blocks referencing it are similarly read, reallocated, and written. To reduce the overhead of this process, multiple updates are grouped into transaction groups, and ZIL write cache is used when synchronous write semantics are required. The blocks are arranged in a tree, as are their checksums.

Snapshots and clones

An advantage of copy-on-write is that, when ZFS writes new data, the blocks containing the old data can be retained, allowing a snapshot version of the file system to be maintained. ZFS snapshots are consistent, and can be created extremely quickly, since all the data composing the snapshot is already stored, with the entire storage pool often snapshotted several times per hour. They are also space efficient, since any unchanged data is shared among the file system and its snapshots. Snapshots are inherently read-only, ensuring they will not be modified after creation, although they should not be relied on as a sole means of backup. Entire snapshots can be restored and also files and directories within snapshots.
Writeable snapshots can also be created, resulting in two independent file systems that share a set of blocks. As changes are made to any of the clone file systems, new data blocks are created to reflect those changes, but any unchanged blocks continue to be shared, no matter how many clones exist. This is an implementation of the Copy-on-write principle.

Sending and receiving snapshots

ZFS file systems can be moved to other pools, also on remote hosts over the network, as the send command creates a stream representation of the file system's state. This stream can either describe complete contents of the file system at a given snapshot, or it can be a delta between snapshots. Computing the delta stream is very efficient, and its size depends on the number of blocks changed between the snapshots. This provides an efficient strategy, e.g., for synchronizing offsite backups or high availability mirrors of a pool.

Dynamic striping

Dynamic striping across all devices to maximize throughput means that as additional devices are added to the zpool, the stripe width automatically expands to include them; thus, all disks in a pool are used, which balances the write load across them.

Variable block sizes

ZFS uses variable-sized blocks, with 128 KB as the default size. Available features allow the administrator to tune the maximum block size which is used, as certain workloads do not perform well with large blocks. If data compression is enabled, variable block sizes are used. If a block can be compressed to fit into a smaller block size, the smaller size is used on the disk to use less storage and improve IO throughput.

Lightweight filesystem creation

In ZFS, filesystem manipulation within a storage pool is easier than volume manipulation within a traditional filesystem; the time and effort required to create or expand a ZFS filesystem is closer to that of making a new directory than it is to volume manipulation in some other systems.

Adaptive endianness

Pools and their associated ZFS file systems can be moved between different platform architectures, including systems implementing different byte orders. The ZFS block pointer format stores filesystem metadata in an endian-adaptive way; individual metadata blocks are written with the native byte order of the system writing the block. When reading, if the stored endianness does not match the endianness of the system, the metadata is byte-swapped in memory.
This does not affect the stored data; as is usual in POSIX systems, files appear to applications as simple arrays of bytes, so applications creating and reading data remain responsible for doing so in a way independent of the underlying system's endianness.

Deduplication

capabilities were added to the ZFS source repository at the end of October 2009, and relevant OpenSolaris ZFS development packages have been available since December 3, 2009.
Effective use of deduplication may require large RAM capacity; recommendations range between 1 and 5 GB of RAM for every TB of storage. An accurate assessment of the memory required for deduplication is made by referring to the number of unique blocks in the pool, and the number of bytes on disk and in RAM required to store each record—these figures are reported by inbuilt commands such as zpool and zdb. Insufficient physical memory or lack of ZFS cache can result in virtual memory thrashing when using deduplication, which can cause performance to plummet, or result in complete memory starvation. Because deduplication occurs at write-time, it is also very CPU-intensive and this can also significantly slow down a system.
Other storage vendors use modified versions of ZFS to achieve very high data compression ratios. Two examples in 2012 were GreenBytes and Tegile. In May 2014, Oracle bought GreenBytes for its ZFS deduplication and replication technology.
As described above, deduplication is usually not recommended due to its heavy resource requirements and impact on performance, other than in specific circumstances where the system and data are well-suited to this space-saving technique.

Additional capabilities

Explicit I/O priority with deadline scheduling.
Claimed globally optimal I/O sorting and aggregation.
Multiple independent prefetch streams with automatic length and stride detection.
Parallel, constant-time directory operations.
End-to-end checksumming, using a kind of "Data Integrity Field", allowing data corruption detection. A choice of 3 hashes can be used, optimized for speed, standardization and security and salted hashes.
Transparent filesystem compression. Supports LZJB, gzip and LZ4.
Intelligent scrubbing and resilvering.
Load and space usage sharing among disks in the pool.
Ditto blocks: Configurable data replication per filesystem, with zero, one or two extra copies requested per write for user data, and with that same base number of copies plus one or two for metadata. If the pool has several devices, ZFS tries to replicate over different devices. Ditto blocks are primarily an additional protection against corrupted sectors, not against total disk failure.
ZFS design is safe when using disks with write cache enabled, if they honor the write barriers. This feature provides safety and a performance boost compared with some other filesystems.
On Solaris, when entire disks are added to a ZFS pool, ZFS automatically enables their write cache. This is not done when ZFS only manages discrete slices of the disk, since it does not know if other slices are managed by non-write-cache safe filesystems, like UFS. The FreeBSD implementation can handle disk flushes for partitions thanks to its GEOM framework, and therefore does not suffer from this limitation.
Per-user, per-group, per-project, and per-dataset quota limits.
Filesystem encryption since Solaris 11 Express.
Pools can be imported in read-only mode.
It is possible to recover data by rolling back entire transactions at the time of importing the zpool.
ZFS is not a clustered filesystem; however, clustered ZFS is available from third parties.
Snapshots can be taken manually or automatically. The older versions of the stored data that they contain can be exposed as full read-only file systems. They can also be exposed as historic versions of files and folders when used with CIFS ; this is known as "Previous versions", "VSS shadow copies", or "File history" on Windows, or AFP and "Apple Time Machine" on Apple devices.
Disks can be marked as 'spare'. A data pool can be set to automatically and transparently handle disk faults by activating a spare disk and beginning to resilver the data that was on the suspect disk onto it, when needed.
Limitations

There are several limiations in the ZFS filesystem.

Limitations in preventing data corruption

The authors of a 2010 study that examined the ability of file systems to detect and prevent data corruption, with particular focus on ZFS, observed that ZFS itself is effective in detecting and correcting data errors on storage devices, but that it assumes data in RAM is "safe", and not prone to error. The study comments that "a single bit flip in memory causes a small but non-negligible percentage of runs to experience a failure", with the probability of committing bad data to disk varying from 0% to 3.6%," and that when ZFS caches pages or stores copies of metadata in RAM, or holds data in its "dirty" cache for writing to disk, no test is made whether the checksums still match the data at the point of use. Much of this risk can be mitigated in one of two ways:

Other limitations specific to ZFS

Capacity expansion is normally achieved by adding groups of disks as a top-level vdev: simple device, RAID-Z, RAID Z2, RAID Z3, or mirrored. Newly written data will dynamically start to use all available vdevs. It is also possible to expand the array by iteratively swapping each drive in the array with a bigger drive and waiting for ZFS to self-heal; the heal time will depend on the amount of stored information, not the disk size.
As of Solaris 10 Update 11 and Solaris 11.2, it was neither possible to reduce the number of top-level vdevs in a pool, nor to otherwise reduce pool capacity. This functionality was said to be in development in 2007. Enhancements to allow reduction of vdevs is under development in OpenZFS.
it was not possible to add a disk as a column to a RAID Z, RAID Z2 or RAID Z3 vdev. However, a new RAID Z vdev can be created instead and added to the zpool.
Some traditional nested RAID configurations, such as RAID 51, are not configurable in ZFS. Vdevs can only be composed of raw disks or files, not other vdevs. However, a ZFS pool effectively creates a stripe across its vdevs, so the equivalent of a RAID 50 or RAID 60 is common.
Reconfiguring the number of devices in a top-level vdev requires copying data offline, destroying the pool, and recreating the pool with the new top-level vdev configuration, except for adding extra redundancy to an existing mirror, which can be done at any time or if all top level vdevs are mirrors with sufficient redundancy the zpool split command can be used to remove a vdev from each top level vdev in the pool, creating a 2nd pool with identical data.
IOPS performance of a ZFS storage pool can suffer if the ZFS raid is not appropriately configured. This applies to all types of RAID, in one way or another. If the zpool consists of only one group of disks configured as, say, eight disks in RAID Z2, then the IOPS performance will be that of a single disk. However, there are ways to mitigate this IOPS performance problem, for instance add SSDs as L2ARC cache—which can boost IOPS into 100.000s. In short, a zpool should consist of several groups of vdevs, each vdev consisting of 8–12 disks, if using RAID Z. It is not recommended to create a zpool with a single large vdev, say 20 disks, because IOPS performance will be that of a single disk, which also means that resilver time will be very long.
Online shrink zpool remove was not supported until Solaris 11.4 released in August 2018
Resilver of a crashed disk in a ZFS RAID can take a long time which is not unique to ZFS, it applies to all types of RAID, in one way or another. This means that very large volumes can take several days to repair or to being back to full redundancy after severe data corruption or failure, and during this time a second disk failure may occur, especially as the repair puts additional stress on the system as a whole. In turn this means that configurations that only allow for recovery of a single disk failure, such as RAID Z1 should be avoided. Therefore, with large disks, one should use RAID Z2 or RAID Z3. ZFS RAID differs from conventional RAID by only reconstructing live data and metadata when replacing a disk, not the entirety of the disk including blank and garbage blocks, which means that replacing a member disk on a ZFS pool that is only partially full will take proportionally less time compared to conventional RAID.
Data recovery

Historically, ZFS has not shipped with tools such as fsck to repair damaged file systems, because the file system itself was designed to self-repair, so long as it had been built with sufficient attention to the design of storage and redundancy of data. If the pool was compromised because of poor hardware, inadequate design or redundancy, or unfortunate mishap, to the point that ZFS was unable to mount the pool, traditionally there were no tools which allowed an end-user to attempt partial salvage of the stored data. This led to threads in online forums where ZFS developers sometimes tried to provide ad-hoc help to home and other small scale users, facing loss of data due to their inadequate design or poor system management.
Modern ZFS has improved considerably on this situation over time, and continues to do so:

OpenZFS and ZFS

ceased the public development of both ZFS and OpenSolaris after the aquisition of Sun in 2010. Some developers forked the last public release of OpenSolaris as the Illumos project. Because of the significant advantages present in ZFS, it has been ported to several different platforms with different features and commands. For coordinating the development efforts and to avoid fragmentation, OpenZFS was founded in 2013.
According to Matt Ahrens, one of the main architects of ZFS, over 50% of the original OpenSolaris ZFS code has been replaced in OpenZFS with community contributions as of 2019, making “Oracle ZFS” and “OpenZFS” politically and technologically incompatible.

Commercial and open source products

2008: Sun shipped a line of ZFS-based 7000-series storage appliances.
2013: Oracle shipped ZS3 series of ZFS-based filers and seized first place in the SPC-2 benchmark with one of them.
2013: iXsystems ships ZFS-based NAS devices called FreeNAS for SOHO and TrueNAS for the enterprise.
2014: Netgear ships a line of ZFS-based NAS devices called ReadyDATA, designed to be used in the enterprise.
2015: rsync.net announces a cloud storage platform that allows customers to provision their own zpool and import and export data using zfs send and zfs receive.
Oracle Corporation, close source, and forking (from 2010)

In January 2010, Oracle Corporation acquired Sun Microsystems, and quickly discontinued the OpenSolaris distribution and the open source development model. In August 2010, Oracle discontinued providing public updates to the source code of the Solaris kernel, effectively turning Solaris 11 back into a closed source proprietary operating system.
In response to the changing landscape of Solaris and OpenSolaris, the illumos project was launched via webinar on Thursday, August 3, 2010, as a community effort of some core Solaris engineers to continue developing the open source version of Solaris, and complete the open sourcing of those parts not already open sourced by Sun. illumos was founded as a Foundation, the illumos Foundation, incorporated in the State of California as a 5016 trade association. The original plan explicitly stated that illumos would not be a distribution or a fork. However, after Oracle announced discontinuing OpenSolaris, plans were made to fork the final version of the Solaris ON kernel allowing illumos to evolve into a kernel of its own. As part of OpenSolaris, an open source version of ZFS was therefore integral within illumos.
ZFS was widely used within numerous platforms, as well as Solaris. Therefore, in 2013, the co-ordination of development work on the open source version of ZFS was passed to an umbrella project, OpenZFS. The OpenZFS framework allows any interested parties to collaboratively develop the core ZFS codebase in common, while individually maintaining any specific extra code which ZFS requires to function and integrate within their own systems.

Version history

ZFS Filesystem Version Number	Release date	Significant changes
	OpenSolaris Nevada build 36	First release
	OpenSolaris Nevada b69	Enhanced directory entries. In particular, directory entries now store the object type. For example, file, directory, named pipe, and so on, in addition to the object number.
	OpenSolaris Nevada b77	Support for sharing ZFS file systems over SMB. Case insensitivity support. System attribute support. Integrated anti-virus support.
	OpenSolaris Nevada b114	Properties: userquota, groupquota, userused and groupused
	OpenSolaris Nevada b137	System attributes; symlinks now their own object type

ZFS Pool Version Number	Release date	Significant changes
	OpenSolaris Nevada b36	First release
	OpenSolaris Nevada b38	Ditto Blocks
	OpenSolaris Nevada b42	Hot spares, double-parity RAID-Z, improved RAID-Z accounting
	OpenSolaris Nevada b62	zpool history
	OpenSolaris Nevada b62	gzip compression for ZFS datasets
	OpenSolaris Nevada b62	"bootfs" pool property
	OpenSolaris Nevada b68	ZIL: adds the capability to specify a separate Intent Log device or devices
	OpenSolaris Nevada b69	ability to delegate zfs administrative tasks to ordinary users
	OpenSolaris Nevada b77	CIFS server support, dataset quotas
	OpenSolaris Nevada b77	Devices can be added to a storage pool as "cache devices"
	OpenSolaris Nevada b94	Improved zpool scrub / resilver performance
	OpenSolaris Nevada b96	Snapshot properties
	OpenSolaris Nevada b98	Properties: usedbysnapshots, usedbychildren, usedbyrefreservation, and usedbydataset
	OpenSolaris Nevada b103	passthrough-x aclinherit property support
	OpenSolaris Nevada b114	Properties: userquota, groupquota, usuerused and groupused; also required FS v4
	OpenSolaris Nevada b116	STMF property support
	OpenSolaris Nevada b120	triple-parity RAID-Z
	OpenSolaris Nevada b121	ZFS snapshot holds
	OpenSolaris Nevada b125	ZFS log device removal
	OpenSolaris Nevada b128	zle compression algorithm that is needed to support the ZFS deduplication properties in ZFS pool version 21, which were released concurrently
	OpenSolaris Nevada b128	Deduplication
	OpenSolaris Nevada b128	zfs receive properties
	OpenSolaris Nevada b135	slim ZIL
	OpenSolaris Nevada b137	System attributes. Symlinks now their own object type. Also requires FS v5.
	OpenSolaris Nevada b140	Improved pool scrubbing and resilvering statistics
	OpenSolaris Nevada b141	Improved snapshot deletion performance
	OpenSolaris Nevada b145	Improved snapshot creation performance
	OpenSolaris Nevada b147	Multiple virtual device replacements

Note: The Solaris version under development by Sun since the release of Solaris 10 in 2005 was codenamed 'Nevada', and was derived from what was the OpenSolaris codebase. 'Solaris Nevada' is the codename for the next-generation Solaris OS to eventually succeed Solaris 10 and this new code was then pulled successively into new OpenSolaris 'Nevada' snapshot builds. OpenSolaris is now discontinued and OpenIndiana forked from it.
A final build of OpenSolaris was published by Oracle as an upgrade path to Solaris 11 Express.

List of operating systems supporting ZFS

List of Operating Systems, distributions and add-ons that support ZFS, the zpool version it supports, and the Solaris build they are based on :

OS	Zpool version	Sun/Oracle Build #	Comments
Oracle Solaris 11.3	37	0.5.11-0.175.3.1.0.5.0
Oracle Solaris 10 1/13	32
Oracle Solaris 11.2	35	0.5.11-0.175.2.0.0.42.0
Oracle Solaris 11 2011.11	34	b175
Oracle Solaris Express 11 2010.11	31	b151a	licensed for testing only
OpenSolaris 2009.06	14	b111b
OpenSolaris	22	b134
OpenIndiana	5000	b147	distribution based on illumos; creates a name clash naming their build code 'b151a'
Nexenta Core 3.0.1	26	b134+	GNU userland
NexentaStor Community 3.0.1	26	b134+	up to 18 TB, web admin
NexentaStor Community 3.1.0	28	b134+	GNU userland
NexentaStor Community 4.0	5000	b134+	up to 18 TB, web admin
NexentaStor Enterprise	28	b134 +	not free, web admin
GNU/kFreeBSD "Squeeze"	14		Requires package "zfsutils"
GNU/kFreeBSD "Wheezy-9"	28		Requires package "zfsutils"
FreeBSD	5000
zfs-fuse 0.7.2	23		suffered from performance issues; defunct
ZFS on Linux 0.6.5.8	5000		0.6.0 release candidate has POSIX layer
KQ Infotech's ZFS on Linux	28		defunct; code integrated into LLNL-supported ZFS on Linux
BeleniX 0.8b1	14	b111	small-size live-CD distribution; once based on OpenSolaris
Schillix 0.7.2	28	b147	small-size live-CD distribution; as 0.8.0 based on OpenSolaris
StormOS "hail"			distribution once based on Nexenta Core 2.0+, Debian Linux; superseded by Dyson OS
Jaris			Japanese Solaris distribution; once based on OpenSolaris
MilaX 0.5	20	b128a	small-size live-CD distribution; once based on OpenSolaris
FreeNAS 8.0.2 / 8.2	15
FreeNAS 8.3.0	28		based on FreeBSD 8.3
FreeNAS 9.1.0	5000		based on FreeBSD 9.1
NAS4Free 10.2.0.2/10.3.0.3	5000		based on FreeBSD 10.2/10.3
Korona 4.5.0	22	b134	KDE
EON NAS	22	b130	embedded NAS
EON NAS	28	b151a	embedded NAS
	28/5000	Illumos/Solaris	Storage appliance; OpenIndiana, OmniOS, Solaris 11, Linux
	28/5000	illumos-OmniOS branch	minimal stable/LTS storage server distribution based on Illumos, community driven
SmartOS	28/5000	Illumos b151+	minimal live distribution based on Illumos ; cloud and hypervisor use
macOS 10.5, 10.6, 10.7, 10.8, 10.9	5000		via MacZFS; by
macOS 10.6, 10.7, 10.8	28		via ZEVO; by
NetBSD	22
MidnightBSD	6
Proxmox VE	5000		native support since 2014,
Ubuntu Linux 16.04 LTS, 18.04 LTS, 18.10, 19.10, 20.04 LTS	5000		,
ZFSGuru 10.1.100	5000

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...

ZFS