AV1


AOMedia Video 1 is an open, royalty-free video coding format designed for video transmissions over the Internet. It was developed as a successor to VP9 by the Alliance for Open Media, a consortium founded in 2015 that includes semiconductor firms, video on demand providers, video content producers, software development companies and web browser vendors. The AV1 bitstream specification includes a reference video codec. In 2018 testing by Facebook that approximated real-world conditions, AV1 achieved 34%, 46.2% and 50.3% higher data compression than libvpx-vp9, x264 high profile, and x264 main profile respectively.
AV1 has a royalty-free licensing model, unlike existing technologies such as HEVC that require royalty payments and prevent widespread adoption in open-source projects. The work of Google and Mozilla on royalty-free video is not attributed to HEVC's licensing woes, although it is seen as a problem by Mozilla.

History

The Alliance's motivations for creating AV1 included the high cost and uncertainty involved with the patent licensing of HEVC, the MPEG-designed codec expected to succeed AVC. Additionally, the Alliance's seven founding members – Amazon, Cisco, Google, Intel, Microsoft, Mozilla and Netflix – announced that the initial focus of the video format would be delivery of high-quality web video. The official announcement of AV1 came with the press release on the formation of the Alliance for Open Media on 1 September 2015. Only 42 days before, on 21 July 2015, HEVC Advance's initial licensing offer was announced to be an increase over the royalty fees of its predecessor, AVC. In addition to the increased cost, the complexity of the licensing process increased with HEVC. Unlike previous MPEG standards, where the technology in the standard could be licensed from a single entity, MPEG-LA, when the HEVC standard was finished, two patent pools had been formed with a third pool on the horizon. In addition, various patent holders were refusing to license patents via either pool, increasing uncertainty about HEVC's licensing. According to Microsoft's Ian LeGrow, an open-source, royalty-free technology was seen as the easiest way to eliminate this uncertainty around licensing.
The negative effect of patent licensing on free and open-source software has also been cited as a reason for the creation of AV1. For example, building an H.264 implementation into Firefox would prevent it from being distributed free of charge since licensing fees would have to be paid to MPEG-LA. Free Software Foundation Europe has argued that FRAND patent licensing practices make the free software implementation of standards impossible due to various incompatibilities with free software licenses.
Many of the components of the AV1 project were sourced from previous research efforts by Alliance members. Individual contributors started experimental technology platforms years before: Xiph's/Mozilla's Daala already published code in 2010, Google's experimental VP9 evolution project VP10 was announced on 12 September 2014, and Cisco's Thor was published on 11 August 2015. Building on the codebase of VP9, AV1 incorporates additional techniques, several of which were developed in these experimental formats.
The first version 0.1.0 of the AV1 reference codec was published on 7 April 2016.
Although a soft feature freeze came into effect at the end of October 2017, development continued on several significant features. One of these in-progress features, the bitstream format, was projected to be frozen in January 2018 but was delayed due to unresolved critical bugs as well as further changes to transformations, syntax, the prediction of motion vectors, and the completion of legal analysis. The Alliance announced the release of the AV1 bitstream specification on 28 March 2018, along with a reference, software-based encoder and decoder. On 25 June 2018, a validated version 1.0.0 of the specification was released. On 8 January 2019 a validated version 1.0.0 with Errata 1 of the specification was released.
Martin Smole from AOM member Bitmovin said that the computational efficiency of the reference encoder was the greatest remaining challenge after the bitstream format freeze had been completed. While the format was still being worked on, the encoder was not targeted for production use and speed optimizations were not prioritized. Consequently, the early version of AV1 was orders of magnitude slower than existing HEVC encoders. Much of the development effort was therefore shifted towards maturing the reference encoder. In March 2019, it was reported that the speed of the reference encoder had improved greatly and was within the same order of magnitude as encoders for other common formats.

Purpose

AV1 aims to be a video format for the web that is both state of the art and royalty free. The mission of the Alliance for Open Media is the same as that of the WebM project.
A recurring concern in standards development, not least of royalty-free multimedia formats, is the danger of accidentally infringing on patents that their creators and users didn't know about. The concern has been raised regarding AV1, and previously VP8, VP9, Theora and IVC. The problem is not unique to royalty-free formats, but it uniquely threatens their status as royalty-free.
To fulfill the goal of being royalty free, the development process requires that no feature can be adopted before it has been confirmed independently by two separate parties to not infringe on patents of competing companies. In cases where an alternative to a patent-protected technique is not available, owners of relevant patents have been invited to join the Alliance. For example, Alliance members Apple, Cisco, Google, and Microsoft are also licensors in MPEG-LA's patent pool for H.264. As an additional protection for the royalty-free status of AV1, the Alliance has a legal defense fund to aid smaller Alliance members or AV1 licensees in the event they are sued for alleged patent infringement.
Under patent rules adopted from the World Wide Web Consortium, technology contributors license their AV1-connected patents to anyone, anywhere, anytime based on reciprocity. As a defensive condition, anyone engaging in patent litigation loses the right to the patents of all patent holders.
This treatment of intellectual property rights, and its absolute priority during development, is contrary to extant MPEG formats like AVC and HEVC. These were developed under an IPR uninvolvement policy by their standardization organisations, as stipulated in the ITU-T's definition of an open standard. However, MPEG's chairman has argued this practice has to change, which it is: EVC is also set to have a royalty-free subset, and will have switchable features in its bitstream to defend against future IPR threats.
The creation of royalty-free web standards has been a long-stated pursuit for the industry. In 2007, the proposal for HTML5 video specified Theora as mandatory to implement. The reason was that public content should be encoded in freely implementable formats, if only as a "baseline format", and that changing such a baseline format later would be hard because of network effects.
The Alliance for Open Media is a continuation of Google's efforts with the WebM project, which renewed the royalty-free competition after Theora had been surpassed by AVC. For companies such as Mozilla that distribute free software, AVC can be difficult to support, as a per-copy royalty is easily unsustainable given the lack of a revenue stream to support these payments in free software. Similarly, HEVC has not successfully convinced all licensors to allow an exception for freely distributed software.
The performance goals include "a step up from VP9 and HEVC" in efficiency for a low increase in complexity. NETVC's efficiency goal is 25% improvement over HEVC. The primary complexity concern is for software decoding, since hardware support will take time to reach users. However, for WebRTC, live encoding performance is also relevant, which is Cisco's agenda: Cisco is a manufacturer of videoconferencing equipment, and their Thor contributions aim at "reasonable compression at only moderate complexity".
Feature-wise, AV1 is specifically designed for real-time applications and for higher resolutions than typical usage scenarios of the current generation of video formats, where it is expected to achieve its biggest efficiency gains. It is therefore planned to support the color space from ITU-R Recommendation BT.2020 and up to 12 bits of precision per color component. AV1 is primarily intended for lossy encoding, although lossless compression is supported as well.

Technology

AV1 is a traditional block-based frequency transform format featuring new techniques. Based on Google's VP9, AV1 incorporates additional techniques that mainly give encoders more coding options to enable better adaptation to different types of input.
The Alliance published a reference implementation written in C and assembly language as free software under the terms of the BSD 2-Clause License. Development happens in public and is open for contributions, regardless of AOM membership.
The development process was such that coding tools were added to the reference codebase as experiments, controlled by flags that enable or disable them at build time, for review by other group members as well as specialized teams that helped with and ensured hardware friendliness and compliance with intellectual property rights. When the feature gained some support in the community, the experiment was enabled by default, and ultimately had its flag removed when all of the reviews were passed. Experiment names were lowercased in the configure script and uppercased in conditional compilation flags.
To better and more reliably support HDR and color spaces, corresponding metadata can now be integrated into the video bitstream instead of being signaled in the container.

Partitioning

Frame content is separated into adjacent same-sized blocks referred to as superblocks. Similar to the concept of a macroblock, superblocks are square-shaped and can either be of size 128×128 or 64×64 pixels. Superblocks can be divided in smaller blocks according to different partitioning patterns. The four-way split pattern is the only pattern whose partitions can be recursively subdivided. This allows superblocks to be divided into partitions as small as 4×4 pixels.
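The recursive four-way split described above can be illustrated with a minimal sketch. The function names are hypothetical and chosen for illustration only; the sketch merely enumerates the square block sizes reachable by repeated halving and is not taken from the reference codec.

```python
def split_sizes(superblock_size: int, min_size: int = 4) -> list[int]:
    """Edge lengths reachable by repeatedly applying the four-way split.

    Assumes, as stated in the text, that superblocks are 128x128 or 64x64
    and that the smallest partition is 4x4 pixels.
    """
    if superblock_size not in (128, 64):
        raise ValueError("superblocks are 128x128 or 64x64 pixels")
    sizes = [superblock_size]
    while sizes[-1] > min_size:
        # Each four-way split halves both the width and the height.
        sizes.append(sizes[-1] // 2)
    return sizes


def max_leaf_count(superblock_size: int, min_size: int = 4) -> int:
    """Number of smallest blocks if every partition is split all the way down."""
    return (superblock_size // min_size) ** 2


print(split_sizes(128))     # block sizes from 128 down to 4
print(max_leaf_count(128))  # count of 4x4 blocks in one 128x128 superblock
```

Under these assumptions a 128×128 superblock needs five successive splits to reach 4×4 partitions, and a 64×64 superblock needs four.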
"T-shaped" partitioning patterns are introduced, a feature developed for VP10, as well as horizontal or vertical splits into four stripes of 4:1 and 1:4 aspect ratio. The available partitioning patterns vary according to the block size: neither 128×128 nor 8×8 blocks can use 4:1 and 1:4 splits. Moreover, 8×8 blocks can't use "T-shaped" splits.
Two separate predictions can now be used on spatially different parts of a block using a smooth, oblique transition line. This enables more accurate separation of objects without the traditional staircase lines along the boundaries of square blocks.
More encoder parallelism is possible thanks to configurable prediction dependency between tile rows.

Prediction

AV1 performs internal processing in higher precision, which leads to compression improvement due to smaller rounding errors in reference imagery.
Predictions can be combined in more advanced ways in a block, including smooth and sharp transition gradients in different directions as well as implicit masks that are based on the difference between the two predictors. This allows combination of either two inter predictions or an inter and an intra prediction to be used in the same block.
A frame can reference 6 instead of 3 of the 8 available frame buffers for temporal prediction while providing more flexibility on bi-prediction.
The Warped Motion and Global Motion tools in AV1 aim to reduce redundant information in motion vectors by recognizing patterns arising from camera motion. They implement ideas that preceding formats such as MPEG-4 ASP had tried to exploit, albeit with a novel approach that works in three dimensions. There can be a set of warping parameters for a whole frame offered in the bitstream, or blocks can use a set of implicit local parameters that get computed based on surrounding blocks.
Switch frames are a new inter-frame type that can be predicted using already decoded reference frames from a higher-resolution version of the same video to allow switching to a lower resolution without the need for a full keyframe at the beginning of a video segment in the adaptive bitrate streaming use case.

Intra Prediction

Intra prediction consists of predicting the pixels of a given block using only information available in the current frame. Most often, intra predictions are built from the neighboring pixels above and to the left of the predicted block. The DC predictor builds a prediction by averaging the pixels above and to the left of the block.
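The DC predictor can be sketched as follows. This is a simplified illustration: the function names are hypothetical, and the rounding rule and the handling of missing neighbors are assumptions rather than details taken from the AV1 specification.

```python
def dc_predict(above: list[int], left: list[int]) -> int:
    """Average of the already-decoded pixels above and to the left of a block."""
    neighbours = above + left
    if not neighbours:
        raise ValueError("at least one neighbouring pixel is required")
    # Integer division with round-to-nearest (an assumed rounding rule).
    return (sum(neighbours) + len(neighbours) // 2) // len(neighbours)


def dc_block(above: list[int], left: list[int], size: int) -> list[list[int]]:
    """Fill a size x size block with the single DC value."""
    value = dc_predict(above, left)
    return [[value] * size for _ in range(size)]


above_row = [10, 12, 14, 16]
left_col = [20, 22, 24, 26]
print(dc_block(above_row, left_col, 4)[0])  # every pixel gets the same average
```

Because every pixel of the block receives the same value, the DC predictor suits flat regions, while the remaining detail is left to the residual.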
Directional predictors extrapolate these neighboring pixels according to a specified angle. In AV1, 8 main directional modes can be chosen. These modes start at an angle of 45 degrees and increase by a step size of 22.5 degrees up until 203 degrees. Furthermore, for each directional mode, six offsets of 3 degrees can be signalled for bigger blocks, three above the main angle and three below it, resulting in a total of 56 angles.
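The count of 56 angles follows from 8 main modes times 7 choices each (the main angle plus six offsets). A short sketch, taking the figures in the text literally, enumerates them. The function names are hypothetical, and the exact step of 22.5 degrees yields 202.5 degrees for the last main mode, which the text quotes rounded as 203.

```python
def main_angles(base: float = 45.0, step: float = 22.5, modes: int = 8) -> list[float]:
    """Nominal angles of the main directional modes, in degrees."""
    return [base + m * step for m in range(modes)]


def all_angles(offset_step: float = 3.0, offsets_each_side: int = 3) -> list[float]:
    """Every main angle together with its offsets above and below."""
    angles = []
    for main in main_angles():
        for k in range(-offsets_each_side, offsets_each_side + 1):
            angles.append(main + k * offset_step)
    return angles


print(main_angles())      # the 8 main angles
print(len(all_angles()))  # total number of selectable angles
```

Since the offsets span at most 9 degrees on either side of a main angle and the main angles are 22.5 degrees apart, no two modes produce the same angle.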
The "TrueMotion" predictor was replaced with a Paeth predictor, which looks at the difference from the known pixel in the above-left corner to the pixels directly above and directly to the left of the new one, and then chooses the one that lies in the direction of the smaller gradient as the predictor. A palette predictor is available for blocks with very few colors, as in some computer screen content. Correlations between the luminosity and the color information can now be exploited with a predictor for chroma blocks that is based on samples from the luma plane. In order to reduce visible boundaries along borders of inter-predicted blocks, a technique called overlapped block motion compensation can be used. This involves extending a block's size so that it overlaps with neighboring blocks by 2 to 32 pixels, and blending the overlapping parts together.
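The Paeth predictor mentioned above can be sketched in its classic per-pixel form, as known from PNG filtering. The function name and the tie-breaking order (left, then above, then above-left) are assumptions made for this illustration; the sketch is not copied from the AV1 reference code.

```python
def paeth_predict(left: int, above: int, above_left: int) -> int:
    """Pick the neighbour closest to the gradient estimate left + above - above_left."""
    base = left + above - above_left
    # Distance of each candidate from the estimate. Note that
    # abs(base - left) equals abs(above - above_left), i.e. the vertical gradient.
    d_left = abs(base - left)
    d_above = abs(base - above)
    d_above_left = abs(base - above_left)
    if d_left <= d_above and d_left <= d_above_left:
        return left
    if d_above <= d_above_left:
        return above
    return above_left


# A horizontal edge: the row above differs, the left neighbour matches the corner.
print(paeth_predict(left=10, above=20, above_left=10))
```

In the example the above-left and left pixels agree, so no change is detected going down the left column, and the predictor copies the pixel directly above.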

Data transformation

To transform the error remaining after prediction to the frequency domain, AV1 encoders can use square, 2:1/1:2, and 4:1/1:4 rectangular DCTs, as well as an asymmetric DST for blocks where the top and/or left edge is expected to have lower error thanks to prediction from nearby pixels, or choose to do no transform.
It can combine two one-dimensional transforms in order to use different transforms for the horizontal and the vertical dimension.
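Combining two one-dimensional transforms can be illustrated with a separable transform, where one kernel is applied along rows and another along columns. The sketch below uses a textbook floating-point DCT-II and an identity pass as stand-ins; AV1 itself uses integer transform kernels, and all function names here are hypothetical.

```python
import math


def dct2(v: list[float]) -> list[float]:
    """Orthonormal one-dimensional DCT-II (textbook form, not AV1's integer kernel)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(v)))
    return out


def identity(v: list[float]) -> list[float]:
    """Leaves the samples untouched along one dimension."""
    return list(v)


def separable_transform(block, row_tx, col_tx):
    """Apply row_tx to every row, then col_tx to every column."""
    rows = [row_tx(list(r)) for r in block]          # horizontal pass
    cols = [col_tx(list(c)) for c in zip(*rows)]     # vertical pass
    return [list(r) for r in zip(*cols)]             # transpose back to rows


flat = [[8.0] * 4 for _ in range(4)]
coeffs = separable_transform(flat, dct2, dct2)
print(round(coeffs[0][0], 6))  # all energy of a flat block lands in one coefficient
```

Because the two passes are independent, the same helper also handles rectangular blocks and mixed choices such as `separable_transform(block, identity, dct2)`.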

Quantization

AV1 has new optimized quantization matrices. The eight sets of quantization parameters that can be selected and signaled for each frame now have individual parameters for the two chroma planes and can use spatial prediction. On every new superblock, the quantization parameters can be adjusted by signaling an offset.

Filters

For the in-loop filtering step, the integration of Thor's constrained low-pass filter and Daala's directional deringing filter has been fruitful: The combined Constrained Directional Enhancement Filter exceeds the results of using the original filters separately or together.
It is an edge-directed conditional replacement filter that smoothes blocks with configurable strength roughly along the direction of the dominant edge to eliminate ringing artifacts.
There is also the loop restoration filter based on the Wiener filter and self-guided restoration filters to remove blur artifacts due to block processing.
Film grain synthesis improves coding of noisy signals using a parametric video coding approach.
Due to the randomness inherent to film grain noise, this signal component is traditionally either very expensive to code or prone to get damaged or lost, possibly leaving serious coding artefacts as residue. This tool circumvents these problems using analysis and synthesis, replacing parts of the signal with a visually similar synthetic texture, based solely on subjective visual impression instead of objective similarity. It removes the grain component from the signal, analyzes its non-random characteristics, and instead transmits only descriptive parameters to the decoder, which adds back a synthetic, pseudorandom noise signal that's shaped after the original component. It is the visual equivalent of the Perceptual Noise Substitution technique used in AC3, AAC, Vorbis, and Opus audio codecs.

Entropy coding

Daala's entropy coder, a non-binary arithmetic coder, was selected for replacing VP9's binary entropy coder. The use of non-binary arithmetic coding helps evade patents, but also adds bit-level parallelism to an otherwise serial process, reducing clock rate demands on hardware implementations. This is to say that the effectiveness of modern binary arithmetic coding like CABAC is being approached using a greater alphabet than binary, hence greater speed, as in Huffman code.
AV1 also gained the ability to adapt the symbol probabilities in the arithmetic coder per coded symbol instead of per frame.

Quality and efficiency

A first comparison from the beginning of June 2016 found AV1 roughly on par with HEVC, as did one using code from late January 2017.
In April 2017, using the 8 enabled experimental features at the time, Bitmovin was able to demonstrate favorable objective metrics, as well as visual results, compared to HEVC on the Sintel and Tears of Steel animated films. A follow-up comparison by Jan Ozer of Streaming Media Magazine confirmed this, and concluded that "AV1 is at least as good as HEVC now". Ozer noted that his and Bitmovin's results contradicted a comparison by Fraunhofer Institute for Telecommunications from late 2016 that had found AV1 38.4% less efficient than HEVC, underperforming even H.264/AVC, and justified this discrepancy by having used encoding parameters endorsed by each encoder vendor, as well as having more features in the newer AV1 encoder. Decoding performance was at about half the speed of VP9 according to internal measurements from 2017.
Tests from Netflix in 2017, based on measurements with PSNR and VMAF at 720p, showed that AV1 was about 25% more efficient than VP9. Tests from Facebook in 2018, based on PSNR, showed that AV1 was able to achieve 34%, 46.2% and 50.3% higher data compression than libvpx-vp9, x264 high profile, and x264 main profile respectively.
Tests from Moscow State University in 2017 found that VP9 required 31% and HEVC 22% more bitrate than AV1 in order to achieve similar levels of quality. The researchers found that the AV1 encoder used was operating at a speed "2500–3500 times lower than competitors", while admitting that it had not been optimized yet.

Profiles and levels

Profiles

AV1 defines three profiles for decoders: Main, High, and Professional. The Main profile allows for a bit depth of 8 or 10 bits per sample with 4:0:0 and 4:2:0 chroma sampling. The High profile further adds support for 4:4:4 chroma sampling. The Professional profile extends capabilities to full support for 4:0:0, 4:2:0, 4:2:2 and 4:4:4 chroma sub-sampling with 8, 10 and 12 bit color depths.
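The profile capabilities listed above can be expressed as a small lookup. This is an illustrative sketch with hypothetical names; it assumes that the High profile keeps the Main profile's bit depths of 8 and 10 bits, which is how "further adds" is read here.

```python
# Capabilities as described in the text. The High profile's bit depths
# are an assumption: it is taken to inherit them from Main.
PROFILES = {
    "Main": {"bit_depths": {8, 10}, "chroma": {"4:0:0", "4:2:0"}},
    "High": {"bit_depths": {8, 10}, "chroma": {"4:0:0", "4:2:0", "4:4:4"}},
    "Professional": {
        "bit_depths": {8, 10, 12},
        "chroma": {"4:0:0", "4:2:0", "4:2:2", "4:4:4"},
    },
}


def lowest_profile(bit_depth: int, chroma: str):
    """Return the first profile that covers the given format, or None."""
    for name in ("Main", "High", "Professional"):
        caps = PROFILES[name]
        if bit_depth in caps["bit_depths"] and chroma in caps["chroma"]:
            return name
    return None


print(lowest_profile(10, "4:2:0"))
print(lowest_profile(10, "4:4:4"))
print(lowest_profile(12, "4:2:0"))
```

Under this reading, any 12-bit stream and any 4:2:2 stream requires the Professional profile.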

Levels

AV1 defines levels for decoders with maximum variables for levels ranging from 2.0 to 6.3. The levels that can be implemented depend on the hardware capability.
Example resolutions would be 426×240@30fps for level 2.0, 854×480@30fps for level 3.0, 1920×1080@30fps for level 4.0, 3840×2160@60fps for level 5.1, 3840×2160@120fps for level 5.2, and 7680×4320@120fps for level 6.2. Level 7 has not been defined yet.
Level | MaxPicSize | MaxHSize | MaxVSize | MaxDisplayRate | MaxDecodeRate | MaxHeaderRate | MainMbps | HighMbps | Min Comp Basis | Max Tiles | Max Tile Cols | Example
2.0   | 147456     | 2048     | 1152     | 4,423,680      | 5,529,600     | 150           | 1.5      | -        | 2              | 8         | 4             | 426×240@30fps
2.1   | 278784     | 2816     | 1584     | 8,363,520      | 10,454,400    | 150           | 3.0      | -        | 2              | 8         | 4             | 640×360@30fps
3.0   | 665856     | 4352     | 2448     | 19,975,680     | 24,969,600    | 150           | 6.0      | -        | 2              | 16        | 6             | 854×480@30fps
3.1   | 1065024    | 5504     | 3096     | 31,950,720     | 39,938,400    | 150           | 10.0     | -        | 2              | 16        | 6             | 1280×720@30fps
4.0   | 2359296    | 6144     | 3456     | 70,778,880     | 77,856,768    | 300           | 12.0     | 30.0     | 4              | 32        | 8             | 1920×1080@30fps
4.1   | 2359296    | 6144     | 3456     | 141,557,760    | 155,713,536   | 300           | 20.0     | 50.0     | 4              | 32        | 8             | 1920×1080@60fps
5.0   | 8912896    | 8192     | 4352     | 267,386,880    | 273,715,200   | 300           | 30.0     | 100.0    | 6              | 64        | 8             | 3840×2160@30fps
5.1   | 8912896    | 8192     | 4352     | 534,773,760    | 547,430,400   | 300           | 40.0     | 160.0    | 8              | 64        | 8             | 3840×2160@60fps
5.2   | 8912896    | 8192     | 4352     | 1,069,547,520  | 1,094,860,800 | 300           | 60.0     | 240.0    | 8              | 64        | 8             | 3840×2160@120fps
5.3   | 8912896    | 8192     | 4352     | 1,069,547,520  | 1,176,502,272 | 300           | 60.0     | 240.0    | 8              | 64        | 8             | 3840×2160@120fps
6.0   | 35651584   | 16384    | 8704     | 1,069,547,520  | 1,176,502,272 | 300           | 60.0     | 240.0    | 8              | 128       | 16            | 7680×4320@30fps
6.1   | 35651584   | 16384    | 8704     | 2,139,095,040  | 2,189,721,600 | 300           | 100.0    | 480.0    | 8              | 128       | 16            | 7680×4320@60fps
6.2   | 35651584   | 16384    | 8704     | 4,278,190,080  | 4,379,443,200 | 300           | 160.0    | 800.0    | 8              | 128       | 16            | 7680×4320@120fps
6.3   | 35651584   | 16384    | 8704     | 4,278,190,080  | 4,706,009,088 | 300           | 160.0    | 800.0    | 8              | 128       | 16            | 7680×4320@120fps
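The relationship between the level limits and the example resolutions can be checked with a short sketch. It considers only MaxPicSize, MaxHSize, MaxVSize and MaxDisplayRate from the table above and assumes that the display rate is simply width × height × frame rate. Real conformance also depends on the decode rate, bitrate and tile limits, which are ignored here, and the function name is hypothetical.

```python
# (level, MaxPicSize, MaxHSize, MaxVSize, MaxDisplayRate), from the levels table.
LEVELS = [
    ("2.0", 147456, 2048, 1152, 4_423_680),
    ("2.1", 278784, 2816, 1584, 8_363_520),
    ("3.0", 665856, 4352, 2448, 19_975_680),
    ("3.1", 1065024, 5504, 3096, 31_950_720),
    ("4.0", 2359296, 6144, 3456, 70_778_880),
    ("4.1", 2359296, 6144, 3456, 141_557_760),
    ("5.0", 8912896, 8192, 4352, 267_386_880),
    ("5.1", 8912896, 8192, 4352, 534_773_760),
    ("5.2", 8912896, 8192, 4352, 1_069_547_520),
    ("5.3", 8912896, 8192, 4352, 1_069_547_520),
    ("6.0", 35651584, 16384, 8704, 1_069_547_520),
    ("6.1", 35651584, 16384, 8704, 2_139_095_040),
    ("6.2", 35651584, 16384, 8704, 4_278_190_080),
    ("6.3", 35651584, 16384, 8704, 4_278_190_080),
]


def lowest_level(width: int, height: int, fps: int):
    """Lowest level whose picture-size and display-rate limits fit the video."""
    for level, max_pic, max_h, max_v, max_display in LEVELS:
        fits_size = width <= max_h and height <= max_v and width * height <= max_pic
        # Assumed display-rate model: samples per frame times frames per second.
        fits_rate = width * height * fps <= max_display
        if fits_size and fits_rate:
            return level
    return None


print(lowest_level(1920, 1080, 60))
print(lowest_level(3840, 2160, 120))
```

With these simplifications, each example resolution in the table maps back to the level it is listed under.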

Supported container formats

Standardized
Unfinished standards
Not standardized

Adoption

Content providers

YouTube has begun rolling out AV1, starting with its . According to the description, the videos are encoded at high bitrate to test decoding performance, and YouTube has "ambitious goals" for rolling out AV1. YouTube for Android TV supports playback of videos encoded in AV1 on capable platforms as of version 2.10.13, released in early 2020.
Vimeo's videos in the "Staff picks" channel are available in AV1. Vimeo is using and contributing to Mozilla's Rav1e encoder, and expects, with further encoder improvements, to eventually provide AV1 support for all videos uploaded to Vimeo as well as the company's "Live" offering.
In October 2016, Netflix stated they expected to be an early adopter of AV1. On February 5, 2020, Netflix began using AV1 to stream select titles on Android, providing 20% improved compression efficiency over their VP9 streams.
Following their own very positive test results, Facebook said they would gradually roll out AV1 as soon as browser support emerges, starting with their most popular videos.
Twitch plans to roll out AV1 for its most popular content in 2022 or 2023, with universal support projected to arrive in 2024 or 2025.
On April 30, 2020 iQIYI announced support for AV1 for users on PC web browsers and Android devices, becoming "the first and the only Chinese video streaming site to adopt the AV1 format to date."

Software implementations

Several other parties have announced that they are working on encoders, including EVE for AV1, NGCodec, Socionext, Aurora and MilliCast.

Software support

Hardware

Several Alliance members demonstrated AV1 enabled products at IBC 2018, including Socionext's hardware accelerated encoder. According to Socionext, the encoding accelerator is FPGA based and can run on an Amazon EC2 F1 cloud instance, where it runs 10 times faster than existing software encoders.
According to Mukund Srinivasan, chief business officer of AOM member Ittiam, early hardware support will be dominated by software running on non-CPU hardware, as fixed-function hardware will take 12–18 months after bitstream freeze until chips are available, plus 6 months for products based on those chips to hit the market. The bitstream was finally frozen on 28 March 2018, meaning chips could be available sometime between March and August 2019. According to the above forecast, products based on chips could then be on the market at the end of 2019 or the beginning of 2020.
On January 7, 2019, NGCodec announced AV1 support for NGCodec accelerated with Xilinx FPGAs.
On April 18, 2019, Allegro DVT announced its AL-E210 multi-format video encoder hardware IP, the first publicly announced hardware AV1 encoder. Aside from VP9, H.265/HEVC, H.264/AVC and JPEG, the AL-E210 supports the AV1 Main profile, with which it can encode 4:2:0 chroma subsampling at 8 and 10 bit color depth. A single core can encode 4K at 30 fps; with multiple cores, that figure should be even higher.
On April 23, 2019, Rockchip announced their RK3588 SoC which features AV1 hardware decoding up to 4K 60fps at 10 bit color depth.
On May 9, 2019, Amphion announced a video decoder with AV1 support up to 4K 60fps.
On May 28, 2019, Realtek announced the RTD2893, its first integrated circuit with AV1 decoding, up to 8K.
On June 17, 2019, Realtek announced the RTD1311 SoC for set-top boxes with an integrated AV1 decoder.
On October 20, 2019, a roadmap from Amlogic was shown which includes 3 set-top box SoCs that are able to decode AV1 content, the S805X2, S905X4 and S908X. The S905X4 was used in the SDMC DV8919 by December.
On October 21, 2019, Chips&Media announced the WAVE510A VPU supporting decoding AV1 at up to 4Kp120.
On November 26, 2019, MediaTek announced the world's first smartphone SoC with an integrated AV1 decoder. The Dimensity 1000 is able to decode AV1 content up to 4K 60fps.
On January 3, 2020, LG Electronics announced that its 2020 8K TVs, which are based on the α9 Gen 3 processor, support AV1.
At CES 2020, Samsung announced that its 2020 8K QLED TVs, featuring Samsung's "Quantum Processor 8K SoC," are capable of decoding AV1.

Patent claims

Sisvel, a Luxembourg-based company, has formed a patent pool and is selling a patent license for AV1.
The pool was announced in early 2019, but a list of claimed patents was first published on March 10, 2020. This list contains over 1050 patents.
The substance of the patent claims remains to be challenged.
Sisvel's prices are 0.32 € for display devices and 0.11 € for non-display devices using AV1. Sisvel has stated that they won't seek content royalties, but their license makes no exemption for software.
The Alliance for Open Media has not responded to the list of patent claims. Their statement after Sisvel's initial announcement reiterated the commitment to their royalty-free patent license and made mention of the "AOMedia patent defense program to help protect AV1 ecosystem participants in the event of patent claims", but did not mention the Sisvel claim by name.

AV1 Image File Format (AVIF)

The AV1 Image File Format is a specification for storing images or image sequences compressed with AV1 in the HEIF file format. It competes with HEIC, which uses the same container format, built upon ISOBMFF, but uses HEVC for compression. Version 1.0.0 of the specification was finalized in February 2019.
AVIF supports features like:
On 14 December 2018, Netflix published the first .avif sample images, and support was added in VLC. Microsoft also announced support with the Windows 10 "19H1" preview release, including File Explorer, Paint and multiple APIs, together with sample images. Mozilla and Google are also working on support for the new image format in Firefox and Chrome. On September 18, 2019, paint.net added support for opening AVIF files; saving is not supported yet. The format conversion tool Colorist and the RAW image processor Darktable have each released support for, and provide reference implementations of, libavif, and a GIMP plugin implementation has been developed supporting both 3.x and 2.10.x plugin APIs.
On 14 February 2020, Netflix published a blog article with objective measurements of AVIF's image quality and compression efficiency in comparison to JPEG.