Image encoding is necessary as a means of compressing an image or set of images in order to reduce its bandwidth so that it can be transmitted and processed quickly enough.

## Compression

Compression works by eliminating redundancy in the data. There are different types of redundancy that can be exploited without noticeable degradation:

- spatial redundancy: the common information between adjacent pixels (exploited by JPEG),
- temporal redundancy: the common pixel information between neighbouring frames (exploited by MPEG),
- spectral redundancy: the correlation between different colour components and
- psychovisual redundancy: information the human visual system cannot perceive.

## Data transmission system encoding model

Differential encoding consists of encoding only the differences between the current instant and the one before, reducing the dispersion of the source sample set.
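As a minimal sketch of the idea, with a hypothetical 8-bit sample sequence standing in for one image row:

```python
import numpy as np

# Hypothetical 8-bit sample sequence (e.g. one image row).
samples = np.array([100, 102, 101, 105, 110, 108], dtype=np.int16)

# Differential encoding: transmit the first sample, then only the
# difference between each sample and the previous one.
diffs = np.diff(samples, prepend=0)

# Decoding is simply a running sum of the differences.
decoded = np.cumsum(diffs)

print(diffs)    # the differences cluster near zero
print(decoded)  # recovers the original samples exactly
```

The differences concentrate around zero, which is exactly the reduced dispersion that makes the differential source cheaper to encode.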

From the point of view of information theory, if we have a source χ and an alphabet A formed by M symbols {a_{1}, …, a_{M}}, each symbol with probability p(a_{i}) = p_{i}, the **information** (how unexpected a symbol is) is given by:

I(a_{i}) = log_{2}(1 / p_{i}) = −log_{2} p_{i} bits
and the **entropy** (mean information per symbol):

H(χ) = ∑_{i=1}^{M} p_{i} log_{2}(1 / p_{i}) = −∑_{i=1}^{M} p_{i} log_{2} p_{i} bits/symbol
For instance, Lena’s image is an 8-bit source with an alphabet of 256 symbols [0, 255]. If the symbols were equiprobable, the entropy would be exactly 8 bits; measured on the actual histograms:

- Original image (8-bit): H = 7.4986 (approx. 8 bits per symbol)
- Differential image (9-bit alphabet): H = 4.43 (approx. 4 to 5 bits per symbol)

For symbols that are not equiprobable, codes will be of variable length.
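The entropy figures above can be checked with a short sketch, here using a synthetic smooth ramp-plus-noise source standing in for an image like Lena (the exact values depend on the data, but the relationship holds):

```python
import numpy as np

def entropy(symbols):
    """Mean information per symbol, H = -sum(p_i * log2(p_i))."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Hypothetical smooth 8-bit source: a slow ramp with small noise.
rng = np.random.default_rng(0)
src = (np.arange(1000) // 4 + rng.integers(-2, 3, 1000)) % 256
diff = np.diff(src.astype(np.int16), prepend=0)

print(entropy(src))   # close to the bit depth for a spread-out source
print(entropy(diff))  # much lower: the differences concentrate near zero
```

The differential version of the same source has a markedly lower entropy, which is what makes the variable-length codes of the next sections pay off.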

## JPEG – the basics

JPEG or JPG is currently the most widely used image compression method. It is based on the Discrete Cosine Transform (DCT), quantization and Huffman variable-length encoding.

In the transform selection we must bear in mind that the encoding process usually involves some information loss (during quantization). The process is applied to the image block by block, in 8×8-pixel blocks, and the decoder translates the resulting errors into border artefacts (block effect) to a degree that depends on the transform used. The Fourier and Walsh-Hadamard Transforms create the so-called block effect, whereas the Cosine Transform works quite well around the borders, which is why it is so widely used for JPEG and MPEG-2 compression. The Wavelet Transform also behaves well around edges and is used in MPEG-4.

The size of the block is chosen depending on the correlation with neighbouring blocks, typically being 8×8 or 16×16 pixels.

## Step 1: Discrete Cosine Transform (DCT)

Analysis equation:

C[k, l] = α(k) α(l) ∑_{n=0}^{N−1} ∑_{m=0}^{N−1} x[n, m] cos(π (2n+1) k / 2N) cos(π (2m+1) l / 2N)

Synthesis equation:

x[n, m] = ∑_{k=0}^{N−1} ∑_{l=0}^{N−1} C[k, l] H_{k,l}[n, m], where H_{k,l}[n, m] = α(k) α(l) cos(π (2n+1) k / 2N) cos(π (2m+1) l / 2N)

with α(0) = √(1/N) and α(k) = √(2/N) for k > 0.

This implies that any image can be synthesised by properly combining (with weights C[k, l]) a series of basis images (H_{k,l}[n, m]) of the same size as the original.

**Properties of the DCT**

- Energy compaction in the transformed domain. This allows, for example, a very accurate encoding with very few coefficients.
- Independence from the data. Other transforms adapt to the image content and provide greater compaction (Karhunen-Loève), but they are not general purpose.
- Efficient algorithms already exist for computing the DCT in integrated circuits.
- The DCT shows little block effect, as errors in high-frequency coefficients are less severe.
- It is possible to identify and interpret the spatial components of the transform, so psychovisual techniques can be applied, assigning fewer bits to the coefficients at spatial frequencies that are not crucial for the human visual system.

Visually, the DCT of Lena would result as:

As we see above, the DCT image contains all the information from the original, but with its energy far more compacted, so we can discard many coefficients without losing significant information, whereas compressing the original directly would be much more complicated. In this image we can see how DCT Lena has been divided into 8×8 blocks that are coded as explained later on.

The following images are simple examples of the representation in *Fig. 3*. They are 8×8 blocks that have been transformed with the DCT. We can see that pixel (1,1), the DC component, is the average shade of grey of the whole block: white in the first block, almost black in the last, and intermediate greys in the others. Vertical high-frequency information (sudden changes) is mainly concentrated in the first row of coefficients, while horizontal high-frequency information lies in the first column (blocks 2 and 3).

*Fig. 6* shows a comparison of the coefficients from the DCT (blue) with coefficients from the Fourier Transform (green). We can observe how the DCT is a better approximation to the original (red) than the Fourier Transform. Be careful to check the axes!

Furthermore, compaction is much higher. In the chart below we see that the difference between the 9th and 11th coefficients is minimal for the DCT (top graph) in comparison to the Fourier Transform (bottom graph). Therefore we can encode very similar quality using fewer coefficients.
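The compaction advantage is easy to reproduce: reconstruct a smooth signal from only its first few coefficients under each transform and compare the errors. This is a hedged sketch with a hypothetical ramp signal, not the figure's actual data:

```python
import numpy as np

N = 32
x = np.linspace(0.0, 1.0, N)  # a smooth ramp, a hypothetical test signal

# Orthonormal 1-D DCT-II matrix, B[k, n] = alpha(k) cos(pi (2n+1) k / 2N).
k = np.arange(N)
alpha = np.full(N, np.sqrt(2.0 / N))
alpha[0] = np.sqrt(1.0 / N)
B = alpha[:, None] * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))

K = 4  # number of low-frequency coefficients kept

# DCT: keep the first K coefficients, zero the rest, invert.
c = B @ x
c_trunc = np.zeros_like(c)
c_trunc[:K] = c[:K]
x_dct = B.T @ c_trunc

# DFT: keep a comparable low-frequency band (positive and negative bins).
X = np.fft.fft(x)
X_trunc = np.zeros_like(X)
X_trunc[:K // 2 + 1] = X[:K // 2 + 1]
X_trunc[-(K // 2):] = X[-(K // 2):]
x_fft = np.fft.ifft(X_trunc).real

print(np.abs(x - x_dct).max())  # small: the DCT compacts the ramp well
print(np.abs(x - x_fft).max())  # larger: the implicit periodic jump leaks energy
```

The DFT sees the ramp as periodic, so the hidden jump at the boundary spreads energy across many coefficients; the DCT's implicit even symmetry avoids that jump, which is exactly why it compacts natural image blocks better.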

## Step 2: Quantization

Quantization is the process by which a bit depth is assigned optimally to the data that is to be encoded, that is, how many bits are going to quantize each pixel. Obviously, the greater the number of quantization bits, the more accurate the pixel value. As we see in the image below, with 1 bit we can encode 2^{1} = 2 states, 3 bits encode 2^{3} = 8 states, and 5 bits 2^{5} = 32 possible states per pixel. 8 bits, as in most JPGs, encode 256 possible states per pixel per channel. Hence the colour range is 0 to 255 per channel, and hence the full information for a pixel in a 24-bit JPG (3 channels of 8 bits).

In a non-uniform quantization, we weight the coefficients with different bit depths according to how accurately we want them represented. As we have seen, the Discrete Cosine Transform concentrates the most important information at pixel (1,1), with importance decreasing diagonally, so it is logical to assign more bits to the coefficients in the top-left corner and decrease progressively.

This allocation of bits is stored in a **quantization table**. After applying the DCT, the coefficient matrix is divided element by element by the quantization table, so the smaller a quantization coefficient, the less it alters the corresponding DCT value; the most valuable information is therefore kept with better accuracy than the rest.

It is clear that dividing by the quantization table cancels many DCT coefficients, especially in blocks with little detail. In blocks with many details and larger variations in pixel values, fewer coefficients become zero after quantization. To cancel out even more values we could saturate the quantization table by multiplying it by a threshold value, and use that instead. Of course, this results in lower output quality and smaller files.
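The quantize/dequantize round trip can be sketched with the standard JPEG luminance quantization table (from Annex K of the spec); the DCT block here is a made-up example of a smooth area, with a large DC value and decaying AC values:

```python
import numpy as np

# Standard JPEG luminance quantization table (Annex K of the JPEG spec).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

# Hypothetical DCT block of a smooth area: a large DC coefficient and
# AC coefficients that decay away from the top-left corner.
i, j = np.indices((8, 8))
C = 400.0 / (1.0 + i + j) ** 2 * (-1.0) ** (i + j)
C[0, 0] = 560.0

quantized = np.round(C / Q).astype(int)  # most entries collapse to zero
dequantized = quantized * Q              # what the decoder recovers (lossy)

print(np.count_nonzero(quantized == 0))  # most of the 64 coefficients vanish
```

Multiplying Q by a threshold factor before dividing would zero out even more entries, trading quality for file size as described above.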

#### Block effect

And how exactly does the threshold encoding affect the quality?

The larger the values in the quantization table, the smaller the values in the DCT matrix after division and the fewer bits used to encode the states of each block pixel, mostly losing high-frequency information. Encoding each block with fewer states yields an adequate average (the DC value, pixel (1,1) of the block) but less information about the details (the AC values). With a very coarse quantization, areas that had little detail are totally flattened to the DC value alone, while areas near edges retain some of this information, though inaccurately, leading to the typical ringing around the edges of a JPG image. The abrupt changes between block boundaries that become noticeable in the final image are what is known as the “block effect”.

## Step 3: Variable Length Huffman Encoding

The last step after quantization is to encode the data for transmission. JPG uses Huffman encoding to increase the bit savings in transmission. It is based on the fact that, as symbols are not equiprobable, it is wasteful to use the maximum number of bits for all of them when most of the time they carry redundant information. The Huffman algorithm builds a table that assigns shorter codes to more probable symbols instead of encoding every symbol at fixed length, and it pays off if the savings are greater than the cost of inserting the code tables.
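A minimal Huffman table builder makes the idea concrete (a generic sketch of the algorithm, not JPEG's exact table format):

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code table: frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate single-symbol source
        return {next(iter(freq)): "0"}
    # Each heap entry: (weight, tie-breaker, {symbol: code-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing them with 0 and 1.
        w1, _, t1 = heapq.heappop(heap)
        w2, i2, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, i2, merged))
    return heap[0][2]

codes = huffman_codes("aaaabbc")
print(codes)  # the most frequent symbol 'a' gets the shortest code
```

For the source `"aaaabbc"` the fixed-length encoding needs 2 bits per symbol (14 bits total), while the Huffman codes need only 10 bits, at the cost of also transmitting the table.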

[alert variation="alert-info"]Stay tuned because I will be posting more in-depth JPG encoding information in my next post![/alert]