Image Compression:

How Math Led to the JPEG2000 Standard

Digital Image Basics

Grayscale Images

   
   

Thumbnail of a digital grayscale image. Fullsize version

The image at right is a thumbnail version of a digital grayscale image. The image is a rectangular tiling of fundamental elements called pixels. A pixel (short for picture element) is a small block that represents the amount of gray intensity to be displayed for that particular portion of the image. For most images, pixel values are integers that range from 0 (black) to 255 (white). The 256 possible gray intensity values are shown below.

   
   

The range of intensity values from 0 (black) to 255 (white).

Even if you view the full-size image, it is difficult to see the individual pixel intensities. This is the advantage of high-resolution images: more dots (or pixels) per inch (dpi) produce a finer image. Web applications often require images with a resolution of 200dpi, while printed matter such as books requires 300dpi. The full-size image above is artificially enlarged - it is actually 2.016" x 3.024" or 508dpi. To get a better idea of pixel intensity values, we have taken the 15 x 15 pixel block that represents the bottom left-hand corner of the image and enlarged it. The images below show the enlarged block as well as its intensity values.

   
   

The lower left 15 x 15 pixel portion of the image.

150 154 160 157 106 140 147 142 141 147 132 150 171 117 136
144 159 125 121 157 143 132 136 153 138 155 164 169 162 152
190 175 169 155 161 136 152 158 141 162 147 153 161 168 169
185 203 139 161 151 159 145 167 179 167 150 155 165 159 158
151 153 163 152 160 152 164 131 131  51 124 152 154 145 143
164 162 158 167 157 164 166 139 132 138 119 148 154 139 146
147 148 143 155 169 160 152 161 159 143 138 163 132 152 146
 66 129 163 165 163 161 154 157 167 162 174 153 156 151 156
162 173 172 161 158 158 159 167 171 169 164 159 158 159 162
163 164 161 155 155 158 161 167 171 168 162 162 163 164 166
167 167 165 163 160 160 164 166 169 168 164 163 165 167 170
172 171 170 170 166 163 166 170 169 168 167 165 163 163 160
173 172 170 169 166 163 167 169 170 170 170 165 160 157 148
167 168 165 173 173 172 167 170 170 171 171 169 162 163 162
200 198 189 196 191 188 163 168 172 177 177 186 180 180 188

Pixel intensity values of the lower left 15 x 15 pixel portion of the image.
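To make the "rectangular tiling of pixels" idea concrete, here is a minimal Python sketch that stores the top-left 3 x 3 corner of the block above as a list of rows. The names `corner` and `is_valid_grayscale` are illustrative choices, not part of any imaging library.

```python
# Top-left 3 x 3 corner of the 15 x 15 block tabulated above.
corner = [
    [150, 154, 160],
    [144, 159, 125],
    [190, 175, 169],
]


def is_valid_grayscale(image):
    """True if all rows have equal length and every pixel is an int in 0..255."""
    width = len(image[0])
    return all(
        len(row) == width
        and all(isinstance(p, int) and 0 <= p <= 255 for p in row)
        for row in image
    )


# Smaller values are darker (0 is black), larger values are lighter (255 is white).
darkest = min(min(row) for row in corner)
brightest = max(max(row) for row in corner)
print(is_valid_grayscale(corner), darkest, brightest)  # True 125 190
```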

The fundamental unit on a computer is a bit. A bit (binary digit) takes either the value 0 or 1. The byte (the fundamental unit of storage on a PC) is composed of 8 bits. Since each bit takes on one of two values and 8 bits make a byte, the multiplication principle tells us that there are $2^8 = 256$ possible bytes. We represent these bytes in base 2. For example, the intensity 125 can be written as

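$$
\begin{aligned}
125 &= 64 + 32 + 16 + 8 + 4 + 1 \\
    &= 0 \cdot 128 + 1 \cdot 64 + 1 \cdot 32 + 1 \cdot 16 + 1 \cdot 8 + 1 \cdot 4 + 0 \cdot 2 + 1 \cdot 1 \\
    &= 0 \cdot 2^7 + 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 \\
    &= 01111101_2
\end{aligned}
$$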
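The base 2 expansion above can be checked with a short Python sketch. The helper names `to_byte_bits` and `from_byte_bits` are made up for this illustration; the first relies on Python's built-in `format`, the second rebuilds the value from powers of two exactly as in the expansion.

```python
def to_byte_bits(value):
    """Return the 8-bit base 2 string for an intensity in 0..255."""
    if not 0 <= value <= 255:
        raise ValueError("a byte holds values from 0 to 255")
    return format(value, "08b")


def from_byte_bits(bits):
    """Sum bit * 2^k, with the leftmost bit weighted by 2^7."""
    return sum(int(bit) * 2 ** (7 - i) for i, bit in enumerate(bits))


print(to_byte_bits(125))           # 01111101
print(from_byte_bits("01111101"))  # 125
print(2 ** 8)                      # 256 possible bytes
```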

The American Standard Code for Information Interchange (ASCII) assigns a character to each of the first 128 bytes, and extended character sets assign characters to the remaining 128. You can type any of these 256 characters from a standard keyboard. If you are using a PC, open Notepad, enable NUMLOCK, hold down the ALT key, and type 125 on the numeric keypad - you will see the ASCII character for the right brace } appear on screen.
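The same byte-to-character lookup can be tried without a numeric keypad. This small Python sketch uses only the built-ins `chr`, `ord`, and `bytes.decode`; the variable names are arbitrary.

```python
code = 125

# chr maps a code to its character, ord maps it back.
char = chr(code)
print(char)
print(ord(char))

# Decoding the single byte 125 as ASCII gives the same character.
decoded = bytes([code]).decode("ascii")
print(decoded)
```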

If an image of size M x N pixels is stored in raw format on a web server or digital camera, then aside from some header information, the file consists of M x N x 8 bits (zeros and ones) where the rows of the image are concatenated to form one long bit stream. For our example image, the bit stream has length 768 x 512 x 8 = 3,145,728 bits!
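As a rough sketch of the raw layout described above (header information ignored), the following Python code concatenates the rows of a tiny made-up image into one bit string and computes the raw size of the 768 x 512 example. The function names and the 2 x 3 test image are assumptions for illustration only.

```python
def raw_bit_stream(image):
    """Concatenate the rows, writing each pixel as 8 bits."""
    return "".join(format(pixel, "08b") for row in image for pixel in row)


def raw_size_bits(m, n):
    """Bits needed for an M x N grayscale image at one byte per pixel."""
    return m * n * 8


# A hypothetical image with 2 rows and 3 columns.
tiny = [
    [0, 255, 125],
    [16, 32, 64],
]

stream = raw_bit_stream(tiny)
print(stream)
print(len(stream))              # 48
print(raw_size_bits(768, 512))  # 3145728
```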

Color Images

Color images require more storage space than grayscale images. Pixels in grayscale images need just one byte to indicate the intensity of gray needed to render the pixel on screen. It turns out that any color can be built using the correct combination of red, green, and blue. Thus, pixels in color images are represented by three values (r,g,b). The values indicate the intensity of red, green, and blue, respectively, needed to render the pixel on screen. The range of intensities is exactly the same as for grayscale images - 0 means none of the color appears in the pixel and 255 indicates the highest level of the color is evident in the pixel. For example, the triple (128, 0, 128) represents a medium purple while (255, 215, 0) represents gold.

The set of all triples forms the RGB colorspace. The space is shown in more detail in the two images that follow.

   
   

The RGB colorspace cube. Two-dimensional version

We can actually look at each of the red, green, and blue channels separately. Below are thumbnails of each. The dimensions of the original image are 768 x 512 pixels. Since each pixel requires 3 bytes of information, we can store the image to disk in raw format using 768 x 512 x 3 = 1,179,648 bytes. The bit stream would have length 1,179,648 x 8 = 9,437,184 bits.

   
   
   

Thumbnail of a digital image. Fullsize version

   

Red channel. Fullsize version

   
   
   

Green channel. Fullsize version

   

Blue channel. Fullsize version
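The channel separation and the storage count for color images can be sketched in Python as follows. The 2 x 2 image is a made-up example built from the color triples mentioned earlier, and `split_channels` and `raw_color_size_bytes` are hypothetical helper names.

```python
def split_channels(image):
    """Split rows of (r, g, b) triples into three grayscale-like channels."""
    red = [[pixel[0] for pixel in row] for row in image]
    green = [[pixel[1] for pixel in row] for row in image]
    blue = [[pixel[2] for pixel in row] for row in image]
    return red, green, blue


def raw_color_size_bytes(m, n):
    """Bytes needed for an M x N color image at three bytes per pixel."""
    return m * n * 3


# Purple and gold on the first row, black and white on the second.
tiny = [
    [(128, 0, 128), (255, 215, 0)],
    [(0, 0, 0), (255, 255, 255)],
]

red, green, blue = split_channels(tiny)
print(red)    # [[128, 255], [0, 255]]
print(green)  # [[0, 215], [0, 255]]
print(blue)   # [[128, 0], [0, 255]]

size_bytes = raw_color_size_bytes(768, 512)
print(size_bytes, size_bytes * 8)  # 1179648 9437184
```

Each channel has the same shape as a grayscale image, which is why the three channels can be displayed as three separate gray images.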

Color Space Conversion

Researchers have learned that for applications such as image compression, the RGB color space is not optimal. It turns out the human brain is more attuned to small changes in terms of luminance and chrominance. A luminance channel carries information regarding the brightness of a pixel. Chrominance is the difference between a color and a reference channel at the same brightness. Since there are three channels in an RGB colorspace, we will use one luminance and two chrominance channels to form a new space for image compression of color images. The most common of these spaces and the one used by JPEG2000 is the YCbCr space. The Y channel is luminance while Cb and Cr are chrominance channels.

The first step to convert a red, green, and blue triple (r,g,b) to YCbCr space is to divide each intensity by 255 so that the resulting triple has values in the interval [0,1]. Let's define (r',g',b') = (r/255,g/255,b/255).

We obtain the luminance value y using the formula

y = .299r' + .587g' + .114b'

Note that if (r,g,b) = (0,0,0) (black), then y = 0. If (r,g,b) = (255,255,255) (white), then y = .299 + .587 + .114 = 1. We say the above formula for y is a convex combination of r', g', and b' since the multipliers .299, .587, and .114 are nonnegative and sum to one. In fact, this is exactly the formula suggested by the National Television System Committee (NTSC) for converting color feeds for display on black and white television sets.
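The luminance formula and its two endpoint checks can be reproduced with a few lines of Python. The function name `luminance` is an illustrative choice; because the weights are floating-point numbers, the white case is compared against 1 with a small tolerance rather than exactly.

```python
def luminance(r, g, b):
    """Luminance y in [0, 1] from integer intensities in 0..255."""
    r_n, g_n, b_n = r / 255, g / 255, b / 255  # the values r', g', b'
    return 0.299 * r_n + 0.587 * g_n + 0.114 * b_n


print(luminance(0, 0, 0))        # 0.0 for black
print(luminance(255, 255, 255))  # 1 up to rounding error for white
print(luminance(0, 255, 0))      # green carries the largest weight
```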

For the chrominance channels, we measure the difference between two color channels and the reference channel y. In the YCbCr space, the colors we use are blue and red. The Cb channel is defined as

Cb = (b'-y)/1.772

and the Cr channel is given by

Cr = (r'-y)/1.402

The 1.772 and 1.402 appear in the denominators of Cb and Cr, respectively, so that the resulting values lie in the interval [-1/2, 1/2].
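One way to see where the denominators come from: b' - y is largest for pure blue, where it equals 1 - .114 = .886, and 1.772 is twice that value; likewise 1.402 is twice 1 - .299 = .701. The Python sketch below evaluates the formulas at the colors where the extremes occur. The function name `normalized_ycbcr` is an assumption made for this example.

```python
def normalized_ycbcr(r, g, b):
    """Return (y, Cb, Cr) before any scaling for display."""
    r_n, g_n, b_n = r / 255, g / 255, b / 255
    y = 0.299 * r_n + 0.587 * g_n + 0.114 * b_n
    cb = (b_n - y) / 1.772
    cr = (r_n - y) / 1.402
    return y, cb, cr


# Blue and yellow give the extreme Cb values, red and cyan the extreme Cr values.
extremes = {
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "red": (255, 0, 0),
    "cyan": (0, 255, 255),
}
for name, rgb in extremes.items():
    y, cb, cr = normalized_ycbcr(*rgb)
    print(f"{name:6s} y={y:.3f} Cb={cb:+.3f} Cr={cr:+.3f}")
```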

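For display purposes, the values y, Cb, and Cr are scaled by the formulas

Y = 219y + 16
Cb = 224Cb + 128
Cr = 224Cr + 128

and then rounded to the nearest integer. Here Cb and Cr on the left-hand side denote the scaled values, so Y lies between 16 and 235 while the scaled Cb and Cr lie between 16 and 240. The images below show the Y, Cb, and Cr channels for a color image.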

   
   
   

Thumbnail of a digital image. Fullsize version

   

Y channel. Fullsize version

   
   
   

Cb channel. Fullsize version

   

Cr channel. Fullsize version
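Putting the steps together, a minimal Python sketch of the full conversion from an (r,g,b) triple to the scaled, rounded values might look like this. The function name `rgb_to_ycbcr_display` is hypothetical, and Python's built-in `round` sends exact ties to the nearest even integer, which is an implementation detail the text does not specify.

```python
def rgb_to_ycbcr_display(r, g, b):
    """Convert integer (r, g, b) in 0..255 to scaled, rounded (Y, Cb, Cr)."""
    # Step 1: normalize to [0, 1].
    r_n, g_n, b_n = r / 255, g / 255, b / 255
    # Step 2: luminance and the two chrominance differences.
    y = 0.299 * r_n + 0.587 * g_n + 0.114 * b_n
    cb = (b_n - y) / 1.772
    cr = (r_n - y) / 1.402
    # Step 3: scale for display and round.
    return round(219 * y + 16), round(224 * cb + 128), round(224 * cr + 128)


print(rgb_to_ycbcr_display(0, 0, 0))        # (16, 128, 128)
print(rgb_to_ycbcr_display(255, 255, 255))  # (235, 128, 128)
print(rgb_to_ycbcr_display(0, 0, 255))      # (41, 240, 110)
print(rgb_to_ycbcr_display(255, 0, 0))      # (81, 90, 240)
```

Black and white both map to a chrominance of 128 in each channel, which reflects that gray tones carry no color difference from the luminance reference.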





 
Images courtesy of Radka Tezaur.