Basic Principles of Perceptual Coding

finalized in November 1992 with three related algorithms, called Layers, defined to take advantage of psychoacoustic effects when coding audio. Layers 1 and 2 are intended for compression factors of about 4:1 and 6:1 to 8:1, respectively, and these algorithms have become popular in satellite and hard-disk systems. Layer 3 achieves compression of up to 12.5:1 (8% of the original size), making it ideal for ISDN.
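To make these ratios concrete, the short Python calculation below converts each compression factor into an output bit rate. It assumes 48 kHz, 16-bit, mono linear PCM as the uncompressed source; that input format is an illustrative assumption, not something this manual specifies. Under that assumption, 12.5:1 reduces 768 kbit/s to about 61 kbit/s, which would fit within a single 64 kbit/s ISDN B channel.

```python
# Illustrative arithmetic only: the source PCM format below is an assumption,
# not a ZephyrExpress specification.
SAMPLE_RATE_HZ = 48_000   # assumed sample rate
BITS_PER_SAMPLE = 16      # assumed linear PCM word length
CHANNELS = 1              # assumed mono


def pcm_bitrate_kbps(rate_hz: int, bits: int, channels: int) -> float:
    """Uncompressed linear PCM bit rate in kbit/s."""
    return rate_hz * bits * channels / 1000


def coded_bitrate_kbps(pcm_kbps: float, ratio: float) -> float:
    """Bit rate left after compressing by ratio:1."""
    return pcm_kbps / ratio


pcm = pcm_bitrate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
for label, ratio in [("Layer 1", 4.0), ("Layer 2", 6.0), ("Layer 2", 8.0), ("Layer 3", 12.5)]:
    out = coded_bitrate_kbps(pcm, ratio)
    # 100 / ratio is the coded size as a percentage of the original
    print(f"{label} {ratio:>4}:1 -> {out:6.2f} kbit/s ({100 / ratio:.1f}% of original)")
```

Changing the assumed sample rate, word length, or channel count scales every output rate proportionally; the percentages depend only on the ratio.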

With perceptual coding, only information that can be perceived by the human auditory system is retained.

Lossless coding (which, for audio, means noiseless coding) with perfect reconstruction would be the optimum system, since no information would be lost or altered. It might seem that lossless, redundancy-reducing methods (such as PKZIP, StuffIt, and others used for computer file compression) would be applicable to audio. Unfortunately, they cannot deliver a constant compression rate, because redundancy varies from signal to signal: some signals are highly redundant, like constant sine tones (where the only information needed is the frequency, phase, amplitude, and duration of the tone), while others, such as those approaching broadband noise, may be completely unpredictable and contain no redundancy at all. Furthermore, finding redundancy takes time: a popular song might have three choruses with identical audio data that need to be coded only once, but you would have to store and analyze the entire song in order to find them. Any system intended for real-time use over telephone channels must have a consistent output rate and be able to accommodate the worst case, so effective audio compression is impossible with redundancy reduction alone.
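The signal dependence of lossless coding is easy to demonstrate. The sketch below uses Python's standard `zlib` module (DEFLATE, the method used by modern ZIP archives) as a stand-in for a general-purpose lossless compressor, and compresses one second of a constant sine tone and one second of white noise. The 48 kHz rate, 1 kHz tone, and amplitude are arbitrary choices for illustration.

```python
import array
import math
import random
import zlib

N = 48_000    # one second at an assumed 48 kHz sample rate
PERIOD = 48   # a 1 kHz tone at 48 kHz repeats exactly every 48 samples


def to_bytes(samples) -> bytes:
    """Pack integer samples as signed 16-bit PCM."""
    return array.array("h", samples).tobytes()


def ratio(data: bytes) -> float:
    """Lossless compression ratio achieved by zlib (DEFLATE) at level 9."""
    return len(data) / len(zlib.compress(data, 9))


# Highly redundant: a constant sine tone, exactly periodic in PERIOD samples.
sine = to_bytes(round(10_000 * math.sin(2 * math.pi * (n % PERIOD) / PERIOD)) for n in range(N))

# Unpredictable: white noise spanning the full 16-bit range.
rng = random.Random(1)  # fixed seed so the run is repeatable
noise = to_bytes(rng.randint(-32768, 32767) for _ in range(N))

print(f"sine tone  : {ratio(sine):8.1f}:1")
print(f"white noise: {ratio(noise):8.2f}:1")
```

The tone shrinks dramatically while the noise does not compress at all (its "compressed" form is marginally larger than the input), so a lossless coder's output rate swings with program material instead of staying constant.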

Fortunately, psychoacoustics permits a clever solution! Effects called “masking” have been discovered in the human auditory system. These masking effects (which suggest that our brain is itself doing the equivalent of coding) occur in both the frequency and time domains and can be exploited for audio data reduction.

Most important for audio coding are the effects in the frequency domain. Research into perception has revealed that a tone or narrow-band noise at a certain frequency inhibits the audibility of other signals that fall below a threshold curve centered on the frequency of that masking signal.

The figure below shows two threshold-of-audibility curves. The lower one is the typical frequency sensitivity of the human ear when presented with a single swept tone. When a single, constant tone is added, the threshold of audibility changes, as shown in the upper curve. The ear’s sensitivity to signals near the constant tone is greatly reduced. Tones that were previously audible become “masked” in the presence of the “masking tone,” in this case the one at 300 Hz.
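A rough numerical model of the two curves might look like the sketch below. The Bark (critical-band) conversion and the threshold-in-quiet formula are widely published approximations attributed to Zwicker and Terhardt. The triangular spreading slopes, the 10 dB offset, and the 60 dB masker level are hypothetical values chosen for illustration; they are not parameters of the figure or of any particular codec.

```python
import math


def bark(f_hz: float) -> float:
    """Critical-band rate in Bark (Zwicker/Terhardt approximation)."""
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)


def threshold_in_quiet_db(f_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL (Terhardt). Requires f_hz > 0."""
    k = f_hz / 1000
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


# Hypothetical spreading parameters, for illustration only.
LOWER_SLOPE_DB_PER_BARK = 27.0  # masking falls off steeply below the masker
UPPER_SLOPE_DB_PER_BARK = 12.0  # and more gently above it (upward spread of masking)
MASKER_OFFSET_DB = 10.0         # peak of the masked threshold sits below the masker level


def masked_threshold_db(f_hz: float, masker_hz: float, masker_db: float) -> float:
    """Threshold of audibility (upper curve) with a single masking tone present."""
    dz = bark(f_hz) - bark(masker_hz)
    slope = UPPER_SLOPE_DB_PER_BARK if dz > 0 else LOWER_SLOPE_DB_PER_BARK
    spread = masker_db - MASKER_OFFSET_DB - slope * abs(dz)
    # Masking can only raise the threshold, never push it below the quiet threshold.
    return max(threshold_in_quiet_db(f_hz), spread)


def audible(f_hz, level_db, masker_hz=None, masker_db=None) -> bool:
    """True if a tone at f_hz and level_db lies above the relevant threshold."""
    if masker_hz is None:
        return level_db > threshold_in_quiet_db(f_hz)
    return level_db > masked_threshold_db(f_hz, masker_hz, masker_db)


for f in (100, 200, 300, 400, 600, 1000, 4000):
    print(f"{f:5d} Hz  quiet {threshold_in_quiet_db(f):6.1f} dB  "
          f"with 300 Hz masker {masked_threshold_db(f, 300, 60):6.1f} dB")
```

With these assumed parameters, a 30 dB tone at 350 Hz is audible on its own but falls under the raised threshold once a 60 dB tone at 300 Hz is present, while a 30 dB tone at 4 kHz remains audible because it is far from the masker.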

All signals below the upper “threshold of audibility” curve, or masking threshold, are inaudible, so we can drop them or quantize them crudely with the fewest possible bits. Any noise that results from crude quantization will not be audible if it stays below the masking threshold. The amount of masking depends on the frequency, the level, and the spectral distribution of both the masker and the masked sounds.
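The rule in this paragraph can be turned into a simple bit-allocation sketch: compare each band's level with the masking threshold (the signal-to-mask ratio, or SMR) and spend only enough bits to push quantization noise below it. This relies on the standard rule of thumb that each bit of a uniform quantizer adds about 6.02 dB of signal-to-noise ratio. The band levels and the `max_bits` cap are hypothetical, and real perceptual encoders use more elaborate allocation loops and must also send side information.

```python
import math

DB_PER_BIT = 6.02  # each extra bit of a uniform quantizer adds ~6.02 dB of SNR


def bits_needed(signal_db: float, mask_db: float, max_bits: int = 15) -> int:
    """Fewest bits whose quantization noise stays under the masking threshold.

    Simplified model: assumes quantization noise sits DB_PER_BIT * bits below
    the signal level, and ignores any side information a real codec sends.
    """
    smr = signal_db - mask_db  # signal-to-mask ratio in dB
    if smr <= 0:
        return 0  # entirely below the masking threshold: the band can be dropped
    return min(max_bits, math.ceil(smr / DB_PER_BIT))


# Hypothetical per-band levels in dB, for illustration only.
bands = [
    ("masked band", 40.0, 44.0),
    ("near threshold", 50.0, 44.0),
    ("well above mask", 80.0, 30.0),
]
for name, sig, mask in bands:
    print(f"{name:16s} SMR {sig - mask:5.1f} dB -> {bits_needed(sig, mask)} bits")
```

Bands lying under the threshold cost nothing, and the remaining bits go where the signal stands furthest above the masking threshold.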


Telos ZephyrExpress user manual Basic Principles of Perceptual Coding

Models: ZephyrExpress
