How Video Codecs Work

Video codecs are a fundamental aspect of modern digital video technology. They allow us to compress video data, which makes it easier to store, transmit, and display high-quality video content. But how do video codecs work, and what are the different types of video codecs available today?

In this blog, we'll explore the basics of video compression technology, including why it is necessary and how it works. We'll also discuss the different types of video codecs, including lossy and lossless compression, and provide examples of each. We'll then walk through the encoding process in detail, examining its five main steps: picture partitioning, prediction, transform, quantization, and entropy encoding, along with the types of algorithms used at each stage.

Finally, we'll discuss how video codecs impact the quality of the video output, and explore some of the trade-offs involved in choosing a compression method. We'll provide examples of popular video codecs, including H.264, H.265, and VP9, and discuss the advantages and disadvantages of each.

So, let's get started by looking at why video compression is necessary in the first place, and how it works.

Why Video Compression is Necessary

Video files are often large and require significant storage space, making them difficult to store and transmit. Without compression, a single high-resolution video could take up several gigabytes of storage space, which would make it impractical to store or transfer.

To solve this problem, video compression technology was developed. Video compression works by reducing the amount of data needed to represent a video file while maintaining its quality. By compressing video data, we can reduce the size of the file, making it easier to store, transmit, and display.

How Video Compression Works

Video compression works by taking advantage of redundancies in the video data. Most videos contain areas of static or slowly changing content, such as backgrounds or stationary objects. By only storing the changes that occur in these areas from one frame to the next, rather than storing each frame in its entirety, we can significantly reduce the amount of data needed to represent the video.

The temporal part of this process is known as interframe compression. The video is divided into individual frames, and each frame is compressed using a combination of spatial and temporal compression techniques.

Spatial compression reduces the amount of data needed to represent each individual frame by removing redundancies within the frame. This is achieved using techniques such as chroma subsampling, which stores color information at a lower resolution than brightness information, and spatial downsampling, which reduces the resolution of the image.

Temporal compression reduces the amount of data needed to represent the changes between frames. This is achieved using techniques such as motion estimation and compensation, which identify areas of the frame that remain unchanged between frames and only store the changes that occur in moving objects.

By combining these techniques, video codecs can significantly reduce the amount of data needed to represent a video file while maintaining its quality.
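
To make this concrete, here is a toy Python sketch (with NumPy) of temporal redundancy at work. The frames and pixel values are invented for illustration and are not drawn from any real codec:

```python
import numpy as np

def frame_delta(prev_frame: np.ndarray, curr_frame: np.ndarray) -> np.ndarray:
    """Per-pixel difference between two grayscale frames."""
    return curr_frame.astype(np.int16) - prev_frame.astype(np.int16)

# Two tiny 4x4 "frames" that differ in a single pixel.
prev = np.full((4, 4), 100, dtype=np.uint8)
curr = prev.copy()
curr[1, 2] = 130  # one changed pixel

delta = frame_delta(prev, curr)
# Only 1 of the 16 values is nonzero, so storing the delta (plus the
# first frame) takes far less data than storing both frames in full.
print(np.count_nonzero(delta))
```

Real encoders generalize this idea with motion compensation, discussed later, rather than raw pixel differences.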

Types of Video Codecs

There are two main types of video codecs: lossy compression and lossless compression.

 

Lossy Compression

Lossy compression is the most common type of video compression and is used in applications where the highest possible quality is not required. Lossy compression works by removing some of the data from the video file, which reduces its size. The amount of data that is removed depends on the compression level used, with higher compression levels resulting in greater loss of data and smaller file sizes.

The primary advantage of lossy compression is that it can achieve significant file size reductions, making it ideal for streaming and other applications where bandwidth and storage are limited. However, lossy compression also results in a loss of quality, with compressed video files appearing blocky or pixelated.

Examples of lossy video codecs include H.264, VP8, and MPEG-4.

Lossless Compression

Lossless compression is less common than lossy compression and is used in applications where the highest possible quality is required. Lossless compression works by removing redundancies in the video data without losing any of the original data.

The primary advantage of lossless compression is that it preserves the original quality of the video file, making it ideal for applications where quality is critical, such as professional video editing. However, lossless compression also results in larger file sizes than lossy compression.

Examples of lossless video codecs include Apple ProRes and Avid DNxHD.

In the next section, we'll walk through the encoding process in detail, examining the five steps of picture partitioning, prediction, transform, quantization, and entropy encoding, and the types of algorithms that are used.

The Encoding Process: An In-Depth Look

Now that we understand the basics of video compression technology, let's take a closer look at the encoding process; the decoder essentially reverses these same steps. As we mentioned earlier, the process involves several steps: picture partitioning, prediction, transform, quantization, and entropy encoding.

1. Picture Partitioning

The first step is picture partitioning. During encoding, the video frames are divided into macroblocks or blocks, depending on the codec used. During decoding, these macroblocks are identified and reconstructed to form the original frames.

For example, in H.264, frames are divided into macroblocks of 16x16 pixels, which can be further partitioned into smaller blocks (down to 4x4 pixels) depending on the codec settings. The macroblocks are then encoded, transmitted, and reconstructed during decoding. Picture partitioning helps to reduce the amount of data transmitted and, therefore, the overall file size.
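
As an illustration only (real encoders also handle edge padding, chroma planes, and variable partition sizes), a frame can be split into fixed-size macroblocks like this:

```python
import numpy as np

def partition_into_macroblocks(frame: np.ndarray, size: int = 16):
    """Split a grayscale frame into non-overlapping size x size blocks.

    Assumes the frame dimensions are exact multiples of `size`; real
    encoders pad the frame edges to guarantee this.
    """
    height, width = frame.shape
    return [frame[r:r + size, c:c + size]
            for r in range(0, height, size)
            for c in range(0, width, size)]

frame = np.zeros((64, 48), dtype=np.uint8)  # a tiny 64x48 test frame
blocks = partition_into_macroblocks(frame)
print(len(blocks))      # (64/16) * (48/16) = 12 macroblocks
print(blocks[0].shape)  # each one is 16x16
```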


2. Predictions

After the partitioning process, the encoder makes predictions about each macroblock to reduce redundancy in the data. Predictions are based on previously encoded data, either within the same frame or in earlier frames, and are used to eliminate spatial and temporal redundancy.

Spatial redundancy occurs when the same data is repeated in different parts of an image. For example, in a scene with a blue sky, the encoder can predict that all of the macroblocks containing sky pixels will have similar color and intensity values. By using prediction, the encoder can store only the differences between the predicted value and the actual value, which reduces the amount of data that needs to be stored.

Temporal redundancy occurs when the same data is present in consecutive frames of a video. In this case, the encoder predicts that the macroblocks in the current frame will be similar to those in the previous frame. The encoder then stores only the differences between the predicted values and the actual values.

Intra Prediction in H.264

There are two types of prediction: intra-frame prediction and inter-frame prediction. Intra-frame prediction operates within a single frame to eliminate spatial redundancy, while inter-frame prediction operates between frames to eliminate temporal redundancy.

Intra-frame prediction involves predicting the values of a macroblock based on the values of the neighboring macroblocks in the same frame. The prediction can be based on the average value of neighboring macroblocks, or it can use more complex algorithms such as spatial prediction based on gradient information or model-based prediction.
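
A minimal sketch of the simplest intra mode, DC prediction, assuming a grayscale block and already-decoded neighbors; all pixel values here are made up for illustration:

```python
import numpy as np

def dc_intra_predict(top_row: np.ndarray, left_col: np.ndarray) -> int:
    """DC mode: predict every pixel of a block as the mean of the
    already-decoded pixels above and to the left of it."""
    return int(round((top_row.mean() + left_col.mean()) / 2))

# Neighboring pixels from a smooth, already-decoded region (e.g. sky).
top = np.array([118, 120, 119, 121])
left = np.array([120, 119, 121, 122])
pred = dc_intra_predict(top, left)

# The actual 2x2 block differs from the prediction only slightly, so
# the residual values are small and cheap to encode.
actual = np.array([[119, 120], [121, 120]])
residual = actual - pred
print(pred)
print(residual.tolist())
```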

Inter-frame prediction involves predicting the values of a macroblock in the current frame based on the values of the corresponding macroblock in the previous frame. This prediction is called motion compensation. The encoder searches for the best match for the current macroblock in the previous frame and calculates the motion vector, which describes how the macroblock has moved between frames. The current macroblock is then predicted based on the values of the corresponding macroblock in the previous frame and the motion vector.

In both intra-frame and inter-frame prediction, the encoder stores only the difference between the predicted value and the actual value. These differences are called residuals and are used in the next step of the decoding process.
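
The inter-frame case can be sketched as an exhaustive block-matching search. This toy example uses the sum of absolute differences (SAD) as the matching cost, a common but not universal choice; real encoders use much faster search strategies:

```python
import numpy as np

def best_motion_vector(ref, block, top, left, search=4):
    """Exhaustive block-matching search: try every offset within
    +/- `search` pixels and keep the one with the lowest SAD."""
    h, w = block.shape
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue  # candidate window falls outside the frame
            sad = np.abs(ref[y:y + h, x:x + w].astype(int)
                         - block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

# A reference frame containing a bright square that has shifted by
# (+2, +2) pixels in the current frame.
ref = np.zeros((32, 32), dtype=np.uint8)
ref[8:16, 8:16] = 200
curr_block = ref[10:18, 10:18]  # the same content, displaced

mv, sad = best_motion_vector(ref, curr_block, top=8, left=8)
print(mv, sad)  # a perfect match: only the motion vector must be stored
```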

Examples of prediction techniques include spatial prediction using the Median Edge Detection (MED) algorithm and temporal prediction using motion vectors.

3. Transform

Transformation step

After the prediction step, the next step in the encoding process is the transformation step. The goal of the transformation step is to decorrelate the residuals so that they can be compressed more efficiently.

The transformation step is performed using mathematical functions called transforms. The most commonly used transform is the Discrete Cosine Transform (DCT), which converts the residuals from the spatial domain to the frequency domain. The DCT itself is invertible, so no information is lost in this step; the loss is introduced later, during quantization.

During the transformation step, the encoder divides each macroblock into smaller blocks and applies the DCT to each block. The result is a set of coefficients that represent the amount of energy in each frequency band. The coefficients are scanned from low to high frequency, so the first few coefficients typically carry most of the block's energy and the most important information about the macroblock. The remaining high-frequency coefficients carry finer detail and can be discarded without a significant loss of quality.

The number of coefficients that are kept is determined by the quantization step, which is the next step in the encoding process.

Examples of transforms include the DCT, the Discrete Wavelet Transform (DWT), and the Integer Transform (IT). The choice of transform depends on the requirements of the application and the trade-off between compression efficiency and computational complexity.

In summary, the transformation step converts the residuals from the spatial domain to the frequency domain using transforms such as the DCT. The resulting coefficients represent the energy in each frequency band, with most of the energy concentrated in the low-frequency coefficients. How many coefficients survive is determined by the quantization step, described next.
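
As a rough illustration of this step, the following builds an orthonormal DCT-II matrix from its textbook definition and applies it to a flat 8x8 residual block. Real codecs use scaled integer approximations of this transform rather than floating-point math:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix, built from the textbook formula."""
    m = np.array([[np.cos(np.pi * (2 * j + 1) * i / (2 * n))
                   for j in range(n)] for i in range(n)])
    m[0] *= 1.0 / np.sqrt(n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def dct2(block: np.ndarray) -> np.ndarray:
    """2D DCT of a square block, applied as D @ block @ D.T."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T

# A perfectly flat 8x8 residual block: after the transform, all of its
# energy sits in the single DC (top-left) coefficient, and the 63 AC
# coefficients are numerically zero, so they cost nothing to encode.
block = np.full((8, 8), 10.0)
coeffs = dct2(block)
ac = coeffs.copy()
ac[0, 0] = 0.0
print(round(coeffs[0, 0], 3))         # the DC coefficient
print(bool(np.abs(ac).max() < 1e-9))  # True: no AC energy at all
```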

4. Quantization

The quantization step controls the amount of compression that is applied to the video data and has a considerable effect on the quality of the video output. During the quantization step, the encoder divides the coefficients obtained from the transformation step by a set of quantization values.

Quantization is a lossy process that reduces the accuracy of the coefficients by rounding them to the nearest integer value. The quantization values are selected to achieve the desired level of compression. Higher quantization values result in more compression but also a greater loss of quality.

The quantization values can be adjusted to control the quality of the video output. A smaller quantization value will result in higher quality video but also larger file sizes, while a larger quantization value will result in lower quality video but smaller file sizes. The quantization step is a trade-off between video quality and file size.

The quantization process is typically performed using a uniform quantization scheme, where the same quantization value is applied to all coefficients. However, more advanced quantization schemes can be used to achieve higher compression ratios while maintaining acceptable video quality. These advanced schemes use different quantization values for different frequency bands, which allows more compression to be applied to high-frequency coefficients, which are typically less important for video quality.

In summary, the quantization step controls the amount of compression that is applied to the video data by dividing the coefficients obtained from the transformation step by a set of quantization values. The quantization values are selected to achieve the desired level of compression, with higher values resulting in more compression and lower quality video. Advanced quantization schemes can be used to achieve higher compression ratios while maintaining acceptable video quality.
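
A minimal sketch of uniform quantization, using made-up coefficient values ordered from low to high frequency as they would come out of the transform:

```python
import numpy as np

def quantize(coeffs: np.ndarray, step: int) -> np.ndarray:
    """Divide each transform coefficient by the step size and round."""
    return np.round(coeffs / step).astype(int)

def dequantize(levels: np.ndarray, step: int) -> np.ndarray:
    """Reconstruction in the decoder: the rounding error is gone for good."""
    return levels * step

# Made-up coefficients, large (low-frequency) values first.
coeffs = np.array([620.0, -48.0, 21.0, 9.0, 4.0, -3.0, 1.0, 0.0])

for step in (10, 40):
    levels = quantize(coeffs, step)
    print(step, levels.tolist(), "nonzero:", np.count_nonzero(levels))
# The coarser step zeroes out more of the small high-frequency values,
# yielding better compression but a larger reconstruction error.
```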

5. Entropy Encoding

The final step in the encoding process is entropy encoding. After the quantization step, the video data consists of a sequence of quantized coefficients. Entropy encoding further compresses this data by assigning shorter codes to more common coefficients and longer codes to less common coefficients.

Entropy encoding is a lossless compression technique that removes statistical redundancy in the video data. The goal of entropy encoding is to reduce the number of bits needed to represent the video data without losing any information. There are several different entropy encoding algorithms used in video codecs, including Huffman coding, arithmetic coding, and context-adaptive binary arithmetic coding (CABAC). These algorithms assign shorter codes to more frequently occurring coefficients and longer codes to less frequently occurring coefficients.

Variable Length Coding (VLC) is the most commonly used entropy encoding technique in video codecs. It assigns shorter codes to more common coefficients and longer codes to less common coefficients. VLC works by creating a code table that maps each coefficient to a unique binary code. More frequent coefficients are assigned shorter codes, and less frequent coefficients are assigned longer codes.

Arithmetic coding is another type of entropy encoding technique that is used in some video codecs. It works by encoding the probability of a particular coefficient occurring rather than the coefficient itself. Arithmetic coding is more efficient than VLC, but also more computationally expensive.

In summary, entropy encoding is the final step in the encoding process, and it further compresses the video data by assigning shorter codes to more common coefficients and longer codes to less common ones. The two main families are variable length coding (VLC) and arithmetic coding: VLC is the more widely used of the two, while arithmetic coding is more efficient but also more computationally expensive.
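
A toy Huffman coder illustrates the core idea behind VLC: frequent symbols get shorter codes. Real codecs use standardized code tables or context-adaptive arithmetic coding rather than building a tree per block, so this is a sketch of the principle only:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a prefix-free code table; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate input: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap of (total weight, tie-breaker, partial code table).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# Quantized coefficients are dominated by zeros, so zero should end up
# with the shortest code.
levels = [0] * 12 + [1] * 4 + [-1] * 3 + [5]
table = huffman_codes(levels)
bits = "".join(table[v] for v in levels)
print(table[0], table[5])  # the common symbol has the shorter code
print(len(bits))           # vs. 40 bits at a fixed 2 bits per symbol
```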

By reversing these five steps (entropy decoding, dequantization, inverse transform, and adding back the predictions), the decoder reconstructs the original video data from the compressed stream. However, it's important to note that with lossy codecs this process is not perfect, and some loss of quality is inevitable. The amount of quality loss can vary depending on the codec being used, the compression settings, and the type of video content being compressed.


Types of Algorithms Used in the Decoding Process

There are many different algorithms at work in the decoding process, including motion compensation and intra-frame prediction for reconstructing frames, and deblocking filters for removing artifacts that can occur during compression, such as blockiness and blurring.

In the next section, we'll explore how video codecs impact the quality of the video output and some of the trade-offs involved in choosing a compression method.

Impact of Video Codecs on Video Quality

The choice of video codec has a significant impact on the quality of the video output. As we discussed earlier, lossy compression results in a loss of quality, while lossless compression preserves the original quality of the video.

When using lossy compression, the quality of the video output is affected by the compression level used. Higher compression levels result in greater loss of quality, with compressed video files appearing blocky or pixelated. The bitrate, resolution, and frame rate of the video also affect the quality of the video output, with higher values resulting in better quality but also larger file sizes.

Trade-Offs in Choosing a Compression Method

When choosing a compression method, there are several trade-offs to consider. The primary trade-off is between file size and quality. Lossy compression algorithms produce smaller file sizes but result in a loss of quality, while lossless compression algorithms produce larger file sizes but preserve the original quality of the video.

Another trade-off to consider is compatibility. Not all devices and software support all video codecs, so it's important to choose a codec that is widely supported. H.264 is currently the most widely supported codec, with support on almost all devices and platforms.

Finally, there is a trade-off between encoding and decoding time. Some video codecs require more time to encode or decode video files than others. This can be a critical factor in real-time applications, such as video conferencing and live streaming.

Popular Video Codecs

There are many different video codecs available, each with its own advantages and disadvantages. Here are three of the most popular video codecs in use today:

H.264

H.264, also known as MPEG-4 AVC (Advanced Video Coding), is the most widely used video codec in the world. It is used for a wide range of applications, including Blu-ray discs, online video streaming, and video conferencing. H.264 uses lossy compression and offers a good balance between file size and quality.

H.265

H.265, also known as HEVC (High-Efficiency Video Coding), is the successor to H.264 and offers better compression efficiency than its predecessor. H.265 can produce video files that are up to 50% smaller than those produced by H.264, but it requires more processing power to encode and decode video files.

VP9

VP9 is a video codec developed by Google and is used primarily for online video streaming. VP9 uses lossy compression and offers a good balance between file size and quality. It is not as widely supported as H.264, but it is becoming increasingly popular and is supported on many newer devices and platforms.

In the next section, we'll delve into the advantages and disadvantages of each of these popular video codecs.

Advantages and Disadvantages of Popular Video Codecs

Let's look closely at the advantages and disadvantages of the three popular video codecs we discussed earlier: H.264, H.265, and VP9.

H.264

Advantages:

  • H.264 is widely supported on almost all devices and platforms, making it a popular choice for video content creators.

  • H.264 provides a good balance between file size and quality, making it ideal for online video streaming and other applications where file size is a concern.

  • H.264 offers hardware acceleration support, which reduces the processing power required to encode and decode video files, making it ideal for use on mobile devices and other low-powered devices.

Disadvantages:

  • H.264 uses lossy compression, which results in a loss of quality compared to the original video.

  • H.264 is not as efficient as newer codecs like H.265, which means it produces larger video files.

H.265

Advantages:

  • H.265 offers better compression efficiency than H.264, resulting in smaller file sizes while maintaining the same level of video quality.

  • H.265 supports 8K resolution, making it ideal for future-proofing video content.

  • H.265 provides support for HDR (High Dynamic Range) video, which offers better contrast and color accuracy in video content.

Disadvantages:

  • H.265 requires more processing capacity to encode and decode video files than H.264, which can make it difficult to use on low-powered devices.

  • H.265 is not as widely supported as H.264, which can limit its use in some applications.

VP9

Advantages:

  • VP9 provides a good balance between file size and quality, making it ideal for online video streaming.

  • VP9 is supported by many newer devices and platforms, making it a good choice for future-proofing video content.

  • VP9 is royalty-free, which means it can be used without paying licensing fees.

Disadvantages:

  • VP9 is not as widely supported as H.264, which can limit its use in some applications.

  • VP9 requires more processing power to encode and decode video files than H.264, which can make it difficult to use on low-powered devices.

In the next section, we'll discuss the future of video codecs and where the technology is headed.

The Future of Video Codecs

As video technology continues to evolve, so do video codecs. In recent years, we've seen the development of new codecs that offer even better compression efficiency, higher-quality video, and new features. Let's take a look at some of the new codecs that are on the horizon.

AV1

AV1 is an open-source video codec that was released in 2018. Developed by the Alliance for Open Media, AV1 offers better compression efficiency than even H.265, making it an ideal choice for video streaming and other applications where file size is a concern. Like VP9, AV1 is royalty-free and can be used without paying licensing fees.

One of the key advantages of AV1 is that it's designed to be scalable, meaning it can be used to encode video at a variety of resolutions and bitrates. This makes it ideal for use in applications like video conferencing, where video quality needs to be adjusted based on network conditions.

While AV1 is still relatively new, it's already gaining support from major players in the tech industry, including Google, Apple, and Mozilla. As more devices and platforms add support for AV1, we can expect to see it become a more common choice for video content creators.

VVC

VVC, or Versatile Video Coding, is a new video codec that is currently under development by the Joint Video Experts Team (JVET), a group of experts from the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group, the same bodies that standardized H.265.

VVC promises even better compression efficiency than H.265, making it an ideal choice for applications where file size is a concern. It also offers support for HDR video and is designed to be scalable, meaning it can be used to encode video at a variety of resolutions and bitrates.

While VVC is still in the early stages of development, it's already gaining support from major players in the tech industry, including Apple, Ericsson, and Huawei. As VVC continues to evolve, we can expect to see it become a more common choice for video content creators.

Conclusion

In conclusion, video codecs play a critical role in modern video technology. By compressing video files to reduce their size while maintaining video quality, codecs make it possible to store and transmit video content over a wide range of devices and networks.

As we've seen, there are many different types of video codecs available, each with its own advantages and disadvantages. When choosing a video codec, it's important to consider factors like file size, video quality, processing power, and device compatibility.

While H.264, H.265, and VP9 are currently the most popular video codecs, new codecs like AV1 and VVC are on the horizon, promising even better compression efficiency, higher-quality video, and new features. As video technology continues to develop, we can expect to see even more advancements in the field of video codecs, making it possible to deliver high-quality video content to an ever-growing audience.

Disclaimer: This report is based on information that is publicly available and is considered to be reliable. However, Lumenci cannot be held responsible for the accuracy or reliability of this data.


Author

Dheeraj Soni

Consultant at Lumenci

Dheeraj has experience with codec-related assets (H.265, H.264, AV1, and VP9), covering technologies ranging from motion vector estimation and scalability to quantization, buffer management, and random access, among others. He is an electrical engineer from IIT (ISM) Dhanbad, India.

He has 3 years of experience in the IP industry and has worked on multiple service lines, including Patent Monetization, Source Code Review, Market Research, Technology Analysis, Infringement Analysis, and Damages Estimation.

Lumenci Team