What Do I Need to Know About the AV1 Codec?

In this blog, Luke Durham, CTO of Switch Media, looks at the AV1 codec: the open, royalty-free video coding format developed by the Alliance for Open Media (AOMedia) that’s freely available to anyone.

Raw 4K video is big, like really big; tens of gigabytes per minute big, depending on frame rate, bit depth and chroma subsampling! The bandwidth needed to transmit 4K and the storage required to hold the media files grow linearly with the number of assets and the length of each asset.
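To get a feel for the scale, here is a rough back-of-the-envelope calculation. The parameters are assumptions chosen for illustration (3840×2160, 30 frames per second, 8-bit samples, 4:2:0 chroma subsampling); other frame rates or bit depths scale the result proportionally.

```python
def raw_video_bits_per_second(width, height, fps, bit_depth=8, chroma="4:2:0"):
    """Uncompressed video bitrate in bits per second."""
    # Average number of samples stored per pixel for common subsampling schemes:
    # 4:4:4 keeps full chroma, 4:2:2 halves it, 4:2:0 quarters it.
    samples_per_pixel = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}[chroma]
    return width * height * samples_per_pixel * bit_depth * fps


bps = raw_video_bits_per_second(3840, 2160, fps=30)
gigabytes_per_minute = bps * 60 / 8 / 1e9
print(f"{bps / 1e9:.2f} Gbit/s, {gigabytes_per_minute:.1f} GB per minute")
# prints "2.99 Gbit/s, 22.4 GB per minute"
```

Doubling the frame rate to 60 fps or moving to 10-bit samples pushes the figure higher still, which is why compression is not optional at this resolution.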

To address this issue, the Alliance for Open Media (AOMedia) was formed as a way to unite top tech leaders behind a collaborative effort to offer open, royalty-free and interoperable solutions for the next generation of media delivery. The Alliance’s shared vision is to make media technology more efficient, cost-effective and of superior quality for all users, on all devices, and on all platforms using AOMedia standards and tools. AOMedia member companies are committed to continuing to provide the focus and investment required to develop and maintain a future roadmap of standards and tools for AV1 and media delivery.

As such, AV1 is freely available to anyone as open-source code and is designed to be interoperable with existing media container and transport specifications. All Open Bitstream Unit (OBU) headers are unencrypted, which makes manipulating encrypted AV1 streams easy. Similarly, temporal delimiter, sequence header and metadata (except for those requiring protection) are also unencrypted, allowing easy processing and transport of streams containing encrypted AV1 payloads. Tile groups are individually encrypted so encryption can be parallel/concurrent.
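Because the OBU header stays in the clear, a packager or player can work out what each unit is without touching the protected payload. The following is a minimal sketch of reading the first header byte, based on the bit layout in the AV1 bitstream specification; the `ObuHeader` class and `parse_obu_header` function are names made up for this example, and the extension byte and size field are not parsed here.

```python
from dataclasses import dataclass

# obu_type values as defined in the AV1 bitstream specification.
OBU_TYPES = {
    1: "OBU_SEQUENCE_HEADER",
    2: "OBU_TEMPORAL_DELIMITER",
    3: "OBU_FRAME_HEADER",
    4: "OBU_TILE_GROUP",
    5: "OBU_METADATA",
    6: "OBU_FRAME",
    7: "OBU_REDUNDANT_FRAME_HEADER",
    8: "OBU_TILE_LIST",
    15: "OBU_PADDING",
}


@dataclass
class ObuHeader:  # hypothetical container for this sketch
    obu_type: int
    type_name: str
    has_extension: bool
    has_size_field: bool


def parse_obu_header(first_byte: int) -> ObuHeader:
    """Decode the first byte of an OBU header.

    Bit layout, most significant bit first:
    forbidden (1) | obu_type (4) | extension_flag (1) | has_size_field (1) | reserved (1)
    """
    if first_byte >> 7:
        raise ValueError("obu_forbidden_bit must be 0")
    obu_type = (first_byte >> 3) & 0x0F
    return ObuHeader(
        obu_type=obu_type,
        type_name=OBU_TYPES.get(obu_type, "reserved"),
        has_extension=bool((first_byte >> 2) & 1),
        has_size_field=bool((first_byte >> 1) & 1),
    )


print(parse_obu_header(0x12))  # 0x12 is a temporal delimiter with a size field
```

A tool that only needs to route or index units can stop here and never needs the decryption key.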

AV1 is reported to compress up to around 30 percent better than HEVC. But how? Larger and more flexible block sizes and transforms carry less signalling overhead and have been shown to compress better overall. For live content, tile parallelism lets an encoder work on several sets of tiles at once, so more compression effort fits into the same time budget. Algorithms from Daala and Thor, such as Chroma-from-Luma prediction and other coefficient predictions, are all included as part of AV1. Superblocks, which can be up to 128×128 pixels, can be dynamically split into smaller blocks in order to choose more efficient transforms, which yield better results.
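The idea of splitting a superblock only where the picture needs it can be sketched with a simple quadtree. This is a toy illustration, not how a real AV1 encoder decides: the variance threshold, the `partition` function and the 8-pixel minimum are assumptions for the example, whereas real encoders compare rate-distortion costs and have more partition shapes than the four-way split shown.

```python
def variance(block):
    """Population variance of all pixel values in a 2D block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def partition(frame, x, y, size, min_size=8, threshold=100.0):
    """Return (x, y, size) leaf blocks covering the square at (x, y).

    Flat regions stay as one large block; busy regions are split into
    four quadrants recursively until they are flat or reach min_size.
    """
    block = [row[x:x + size] for row in frame[y:y + size]]
    if size <= min_size or variance(block) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += partition(frame, x + dx, y + dy, half, min_size, threshold)
    return leaves


# A 64x64 superblock that is flat apart from a bright 8x8 patch in one corner.
frame = [[255 if (x < 8 and y < 8) else 0 for x in range(64)] for y in range(64)]
leaves = partition(frame, 0, 0, 64)
print(len(leaves))  # prints 10: small blocks near the patch, large ones elsewhere
```

Most of the superblock ends up covered by three 32×32 blocks, with the small blocks concentrated around the detail, which is where the bits are best spent.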

Similar to the MPEG codecs, AV1 maintains the concept of inter and intra frames. AV1 also uses the concept of the golden frame, carried over from VP8 and VP9: a frame encoded at high quality that later frames can reference, without wasting space. As with MPEG, inter frames also include a compound mode, i.e. bi-prediction from other frames. Eight reference frames are kept, enough for three temporal or three spatial layers at the same time as a golden frame, allowing dynamic choices at run time.
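The eight reference slots can be pictured as a small pool that each frame selectively overwrites. In the AV1 bitstream this is signalled with an 8-bit mask, `refresh_frame_flags`, and a shown key frame refreshes every slot. The sketch below is a simplified model under those assumptions; the `ReferencePool` class and the frame labels are hypothetical.

```python
NUM_REF_SLOTS = 8  # AV1 keeps eight reference frame buffers


class ReferencePool:  # hypothetical model for this sketch
    def __init__(self):
        self.slots = [None] * NUM_REF_SLOTS

    def refresh(self, frame_id, refresh_frame_flags):
        """Store frame_id in every slot whose bit is set in the 8-bit mask."""
        for slot in range(NUM_REF_SLOTS):
            if refresh_frame_flags & (1 << slot):
                self.slots[slot] = frame_id


pool = ReferencePool()
pool.refresh("key_0", 0xFF)             # a shown key frame fills all eight slots
pool.refresh("inter_1", 0b00000001)     # later frames overwrite only chosen slots
pool.refresh("inter_2", 0b00000010)
print(pool.slots)
```

After these three frames, slots 0 and 1 hold the recent inter frames while the remaining slots still hold the high-quality key frame, so an encoder can keep a long-lived reference around while the others rotate.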

Predictors use previously decoded frames to estimate where blocks will move to and what they will look like. Whilst having more predictors increases encoding complexity, it gives the encoder more choices, which can improve the overall quality of playback. Some encoders use machine learning to select the best predictors per asset, balancing overall quality against compression time.
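A basic form of this prediction is block matching: for a block in the current frame, search a small window in the previous frame for the best match and record the offset. The sketch below is a deliberately naive exhaustive search using the sum of absolute differences (SAD); the function names, the 4×4 block size and the ±2 pixel window are assumptions for illustration, and production encoders use far more sophisticated searches and cost functions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block in cur and a shifted block in ref."""
    total = 0
    for r in range(n):
        for c in range(n):
            total += abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
    return total


def best_motion_vector(cur, ref, bx, by, n=4, search=2):
    """Exhaustively search +/- search pixels; return (cost, dx, dy) of the best match."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Reference frame with a gradient; the current frame is the same picture
# shifted one pixel to the right (the left edge is simply repeated).
ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]

print(best_motion_vector(cur, ref, bx=2, by=2))  # prints (0, -1, 0)
```

The result says the block is found one pixel to the left in the previous frame with zero error, so only the motion vector needs to be sent rather than the pixels themselves.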

AV1 divides each frame into tiles, which are groups of blocks that can be encoded and decoded independently from other tiles. With sports in particular, this means playback of the action can continue even if part of the image is lost. Processing power can be dedicated to the tiles with the most motion. Currently, this uses a lot of processing power, but that is expected to improve.
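The practical benefit of independent tiles is that they can be handed to separate workers, and a problem with one tile does not have to stop the others. The sketch below models that with a thread pool; `decode_tile` is a hypothetical stand-in for a real tile decoder, and a payload of `None` is an assumption used here to represent a lost tile.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(frame, tile_rows, tile_cols):
    """Cut a 2D frame into a dict of {(tile_row, tile_col): 2D tile}."""
    th, tw = len(frame) // tile_rows, len(frame[0]) // tile_cols
    tiles = {}
    for tr in range(tile_rows):
        for tc in range(tile_cols):
            rows = frame[tr * th:(tr + 1) * th]
            tiles[(tr, tc)] = [row[tc * tw:(tc + 1) * tw] for row in rows]
    return tiles


def decode_tile(payload):
    """Hypothetical stand-in for a tile decoder: adds 1 to every sample."""
    if payload is None:
        raise ValueError("tile data missing")
    return [[p + 1 for p in row] for row in payload]


def decode_frame(tiles):
    """Decode all tiles concurrently; a failed tile yields None, others are unaffected."""
    def safe_decode(item):
        key, payload = item
        try:
            return key, decode_tile(payload)
        except ValueError:
            return key, None  # only this tile needs concealment

    with ThreadPoolExecutor() as pool:
        return dict(pool.map(safe_decode, tiles.items()))


frame = [[0] * 8 for _ in range(4)]
tiles = split_into_tiles(frame, tile_rows=2, tile_cols=2)
tiles[(0, 1)] = None  # simulate losing one tile in transit
decoded = decode_frame(tiles)
print({key: tile is not None for key, tile in decoded.items()})
```

Three of the four tiles decode normally, and a player could fill the missing one from the previous frame instead of dropping the whole picture.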

Texture segmentation methods use a deep learning-based approach to detect the texture regions in a frame that are perceptually insignificant to the human eye. High dynamic range and wide colour gamut are supported to the point where colour primaries, matrix coefficients and transfer functions can be signalled directly in the bitstream itself.
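In practice the colour description travels as three small integer code points in the sequence header (`color_primaries`, `transfer_characteristics` and `matrix_coefficients`), using the values defined in ITU-T H.273. The sketch below maps a handful of common code points to readable names; the lookup tables are deliberately partial and the `describe_colour` helper is a hypothetical name for this example.

```python
# A small subset of the ITU-T H.273 code points that AV1 reuses.
COLOR_PRIMARIES = {1: "BT.709", 9: "BT.2020"}
TRANSFER_CHARACTERISTICS = {
    1: "BT.709",
    16: "SMPTE ST 2084 (PQ)",
    18: "ARIB STD-B67 (HLG)",
}
MATRIX_COEFFICIENTS = {1: "BT.709", 9: "BT.2020 non-constant luminance"}


def describe_colour(color_primaries, transfer_characteristics, matrix_coefficients):
    """Turn the three signalled code points into a readable summary."""
    return (
        f"primaries={COLOR_PRIMARIES.get(color_primaries, 'other')}, "
        f"transfer={TRANSFER_CHARACTERISTICS.get(transfer_characteristics, 'other')}, "
        f"matrix={MATRIX_COEFFICIENTS.get(matrix_coefficients, 'other')}"
    )


print(describe_colour(9, 16, 9))  # a typical HDR10-style combination
print(describe_colour(1, 1, 1))   # standard dynamic range BT.709
```

Because the values sit in the bitstream, a player can configure its display pipeline correctly without relying on container metadata.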