Image Compression:
How Math Led to the JPEG2000 Standard
The Discrete Cosine Transformation
A key component of the JPEG Image Compression Standard is the transformation step. The goal of this step is to move (transform) the preprocessed image to a setting where the coding portion of the compression algorithm can be more effective. The coding method works best if there are relatively few distinct values. How do we accomplish this task with a transformation? Think about it this way  digital images are typically comprised of regions where there is little change in grayscale intensities. We need a transformation that exploits this observation we seek a transformation that takes an NxM block comprised of similar values and maps the block to a matrix of the same size where most of the information about the original block is stored in relatively few elements and the remaining elements are either zero or very close to zero. If every region of like values is mapped to values near zero, then the output of the transformation will consist of a relatively large amount of nearzero values.
Discrete Cosine Transformation Defined
The transformation the Joint Photographic Experts Group chose for the task was the Discrete Cosine Transformation (DCT). Recall that the preprocessing portion of algorithm partitions the image into 8 x 8 blocks, so the DCT is an 8 x 8 matrix. The values of the matrix are given below.
U=\frac{1}{2}\left[\matrix{ \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \cos \frac{\pi}{16} & \cos \frac{3\pi}{16} & \cos \frac{5\pi}{16} & \cos \frac{7\pi}{16} & \cos \frac{9\pi}{16} & \cos \frac{11\pi}{16} & \cos \frac{13\pi}{16} & \cos \frac{15\pi}{16} \\ \cos \frac{2\pi}{16} & \cos \frac{6\pi}{16} & \cos \frac{10\pi}{16} & \cos \frac{14\pi}{16} & \cos \frac{18\pi}{16} & \cos \frac{22\pi}{16} & \cos \frac{26\pi}{16} & \cos \frac{30\pi}{16} \\ \cos \frac{3\pi}{16} & \cos \frac{9\pi}{16} & \cos \frac{15\pi}{16} & \cos \frac{21\pi}{16} & \cos \frac{27\pi}{16} & \cos \frac{33\pi}{16} & \cos \frac{39\pi}{16} & \cos \frac{45\pi}{16} \\ \cos \frac{4\pi}{16} & \cos \frac{12\pi}{16} & \cos \frac{20\pi}{16} & \cos \frac{28\pi}{16} & \cos \frac{36\pi}{16} & \cos \frac{44\pi}{16} & \cos \frac{52\pi}{16} & \cos \frac{60\pi}{16} \\ \cos \frac{5\pi}{16} & \cos \frac{15\pi}{16} & \cos \frac{25\pi}{16} & \cos \frac{35\pi}{16} & \cos \frac{45\pi}{16} & \cos \frac{55\pi}{16} & \cos \frac{65\pi}{16} & \cos \frac{75\pi}{16} \\ \cos \frac{6\pi}{16} & \cos \frac{18\pi}{16} & \cos \frac{30\pi}{16} & \cos \frac{42\pi}{16} & \cos \frac{54\pi}{16} & \cos \frac{66\pi}{16} & \cos \frac{78\pi}{16} & \cos \frac{90\pi}{16} \\ \cos \frac{7\pi}{16} & \cos \frac{21\pi}{16} & \cos \frac{35\pi}{16} & \cos \frac{49\pi}{16} & \cos \frac{63\pi}{16} & \cos \frac{77\pi}{16} & \cos \frac{91\pi}{16} & \cos \frac{105\pi}{16} }\right]

Elements of the 8 x 8 DCT matrix.

Graphical View of the Rows of the DCT
Row 0 (it is easier if we number the rows 0  7) contains the constant value \sqrt{2}/2. We form row k, k=1,\ldots,7, by first creating 8 equally spaced points starting at k\pi/16 with the distance between each point equal to 2k\pi/16, evaluating the cosine of each point, and dividing the result by 2. Alternatively, you can think of creating 8 equally spaced points on the interval \left[ k\pi/16, k\pi  k\pi/16 \right], evaluating cosine at each point, and dividing the output by 2. Graphical representations of rows 17 appear below.
How the DCT Processes Gray Intensities
We can gain some insight as to how the DCT works if we have a closer look at the images above. It takes a bit of work, but with the help of the plots, you can see that the heights of the orange points in any one plot sum to zero. That means if we take any constant vector and dot it with any of rows 1 through 7, the result will be zero. More importantly, if we dot any of rows 1 through 7 with a nearconstant vector, the result will be either zero or a value close to zero.
The DCT, applied to 8 x 8 block A, is B = U A U^T. We first compute C = U A and this product is simply an application of U to each column of A. If A is a matrix of nearconstant values, elements in C = U A will be (near) zero as we proceed down the column. The final computation is B = C U^T. Here we are simply applying the columns of U^T (the rows of U) to the rows of C. So if elements in C are (near) constant, elements in B will be (near) zero as we proceed to the right in each row. Putting this all together, we see that in general, the DCT tends to store information about all 64 input values in a few values and "shoves" them to the upper lefthand corner of the output. The remaining values are either zero or approximately zero. This makes the DCT wellsuited to the JPEG standard.
Examples
Let's look at two examples. The 8 x 8 matrix A below contains only the value 100.
A=\left[\matrix{ 100 & 100 & 100 & 100 & 100 & 100 & 100 & 100 \\ 100 & 100 & 100 & 100 & 100 & 100 & 100 & 100 \\ 100 & 100 & 100 & 100 & 100 & 100 & 100 & 100 \\ 100 & 100 & 100 & 100 & 100 & 100 & 100 & 100 \\ 100 & 100 & 100 & 100 & 100 & 100 & 100 & 100 \\ 100 & 100 & 100 & 100 & 100 & 100 & 100 & 100 \\ 100 & 100 & 100 & 100 & 100 & 100 & 100 & 100 \\ 100 & 100 & 100 & 100 & 100 & 100 & 100 & 100 }\right]

An 8 x 8 matrix with constant value.

The DCT B = U A U^T appears below:
B=\left[\matrix{ 800 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 }\right]

The DCT of the constant matrix A.

Note that the entire block, save the (1,1) element, was converted to 0. Let's have a look at another example. The 8 x 8 matrix A below contains only the values 50, 51, and 52. As far as images go, we would consider this a region where there is little change in gray intensites.
A=\left[\matrix{ 51 & 52 & 51 & 50 & 50 & 52 & 50 & 52 \\ 51 & 52 & 51 & 51 & 50 & 52 & 52 & 51 \\ 50 & 50 & 51 & 52 & 52 & 51 & 51 & 51 \\ 51 & 50 & 50 & 50 & 52 & 50 & 50 & 51 \\ 51 & 50 & 50 & 51 & 50 & 50 & 51 & 50 \\ 50 & 51 & 52 & 52 & 51 & 50 & 50 & 50 \\ 51 & 52 & 51 & 50 & 52 & 50 & 52 & 50 \\ 50 & 51 & 52 & 52 & 50 & 51 & 52 & 51 }\right]

An 8 x 8 matrix with nearconstant values.

The DCT (rounded to three digits) of the nearconstant matrix appears below. Note that many of the values are close to zero.
B=\left[\matrix{ 407 & 0.058 & 0.518 & 0.592 & 0.5 & 0.118 & 0.597 & 0.086 \\ 0.352 & 0.654 & 1.019 & 0.818 & 0.179 & 1.074 & 1.190 & 1.194 \\ 1.904 & 0.116 & 1. & 0.598 & 2.174 & 0.352 & 0.293 & 1.006 \\ 0.661 & 1.350 & 0.689 & 0.055 & 0.425 & 0.599 & 0.254 & 0.412 \\ 1. & 0.335 & 1.171 & 0.102 & 0.5 & 0.020 & 0.868 & 0.502 \\ 0.229 & 0.162 & 0.115 & 0.711 & 0.956 & 1.902 & 0.108 & 1.454 \\ 0.023 & 0.173 & 1.707 & 1.529 & 0.630 & 0.109 & 1. & 0.603 \\ 0.110 & 0.383 & 0.105 & 0.470 & 0.005 & 0.568 & 0.470 & 0.111}\right]

The DCT of the nearconstant matrix.

Properties of the DCT
It turns out that the DCT matrix U is invertible. Indeed, it can be shown that U is orthogonal so that U^T = U^{1}. (A wonderful proof of this result appears in a paper by Gilbert Strang).
The DCT for a preprocessed block A is B = U A U^T. If you have taken linear algebra, you recognize the DCT as an example of a similarity transformation. Suppose \bf a respresents the first column of A. Then the computation U \bf a requires 8 multiplications and 7 additions for each row for a total of 120 operations. Fast algorithms, based on the Fast Fourier Transformation, can be employed to significantly reduce the time required to compute U \bf a. For more information on these fast algorithms, the book by Rao and Yip is an excellent resource.
Advantages and Disadvantages of the DCT
So the DCT possesses several traits that make it an effective tool for image compression. The transformation is orthogonal (inverse is transpose and energy is preserved), fast algorithms can be used for computation, and the output for (near) constant matrices generally consists of a large number of (near) zero values.
There is one major disadvantage of the DCT. While the input from preprocessed 8 x 8 blocks are integervalued, the output values are typically realvalued. Thus we need a quantization step to make some decisions about the values in each DCT block and produce output that is intergervalued. Please see the subsection on Quantization for more details.