Chapter 9:

MPEG

The basic scheme is to predict motion from frame to frame in the temporal direction,
and then to use DCT's (discrete cosine transforms) to organize the redundancy in the
spatial directions. The DCT's are done on 8x8 blocks, and the motion prediction is done in the
luminance (Y) channel on 16x16 blocks. In other words, given the 16x16 block in the current frame
that you are trying to code, you look for a close match to that block in a previous or future frame
(there are backward prediction modes where later frames are sent first to allow interpolating
between frames). The DCT coefficients (of either the actual data, or the difference between this block and
the close match) are quantized, which means that you divide them by some value to drop bits off the bottom
end. Hopefully, many of the coefficients will then end up being zero. The quantization can change for every
"macroblock" (a macroblock is 16x16 of Y and the corresponding 8x8's in both U and V). The results of all of this, which
include the DCT coefficients, the motion vectors, and the quantization parameters (and other stuff) is Huffman coded using fixed tables.
The DCT coefficients have a special Huffman table that is two-dimensional in that one code specifies a run-length of zeros and the non-zero
value that ended the run. Also, the motion vectors and the DC DCT components are DPCM, (subtracted from the last one) coded.
--Berkeley Multimedia Research Center MPEG-1 Document
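
Three of the steps named in the quotation (quantization, run-length pairing of zeros with the non-zero value that ends the run, and DPCM) can be illustrated with a minimal Python sketch. It is not the MPEG-1 bitstream format: the function names, the single uniform step size, the "EOB" end-of-block marker, and the sample numbers are all assumptions chosen for illustration, and the sketch omits the coefficient scan order and the actual Huffman tables.

```python
def quantize(coeffs, step):
    """Divide each coefficient by `step`, truncating toward zero.

    Assumption: one uniform step for every coefficient.
    """
    return [int(c / step) for c in coeffs]


def run_level_pairs(coeffs):
    """Pair each non-zero value with the count of zeros preceding it.

    Trailing zeros are not coded; a hypothetical "EOB" marker ends the block.
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")
    return pairs


def dpcm_encode(values, start=0):
    """Replace each value with its difference from the previous one."""
    diffs = []
    prev = start
    for v in values:
        diffs.append(v - prev)
        prev = v
    return diffs


def dpcm_decode(diffs, start=0):
    """Invert dpcm_encode by accumulating the differences."""
    values = []
    prev = start
    for d in diffs:
        prev += d
        values.append(prev)
    return values


# Illustrative numbers only, not taken from any real video.
ac = [50, -21, 6, 3, 0, -9, 2, 1]
quantized = quantize(ac, 8)           # small coefficients collapse to zero
pairs = run_level_pairs(quantized)    # (zero run, non-zero value) pairs

dc = [120, 124, 123, 130]             # DC terms of successive blocks
dc_diffs = dpcm_encode(dc)            # small differences after the first value
dc_back = dpcm_decode(dc_diffs)       # round-trips to the original list
```

With a step of 8, the eight sample coefficients reduce to three (run, value) pairs plus the end marker, which is the effect the quotation is hoping for when it says many coefficients end up being zero.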