960L Owner's Manual | Using The Reverb Program
Reverberation and Reality, Continued
send if we wish, and control the distance or depth of each sound source by controlling the amplitude of this source in the echo send.
But distance is not the only perception we need. We need the envelopment that makes notes come alive. How can we produce envelopment with a 5.1 system? Once again the key is the way reflections affect horizontal localization. Our brains have an exquisitely sensitive detector for differences in sound arrival times between our two ears. These time differences are converted into perceived horizontal angles, or azimuth. In the presence of reflected energy – particularly reflections not in the direction of the source – the time differences are not constant. As reflections come and go the time differences (and level differences) fluctuate, with the amount of fluctuation depending on the direction and strength of the reflections.
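As a rough illustration of how an arrival-time difference maps to an azimuth, the sketch below uses a simple far-field sine model. The model, the ear spacing and the speed of sound are assumptions made for this example and are not taken from this manual. Real hearing also uses level differences and is more complicated than this.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly room temperature
EAR_SPACING_M = 0.18         # assumed effective distance between the ears


def itd_for_azimuth(azimuth_deg):
    """Interaural time difference in seconds for a distant source (sine model)."""
    return EAR_SPACING_M / SPEED_OF_SOUND_M_S * math.sin(math.radians(azimuth_deg))


def azimuth_for_itd(itd_s):
    """Invert the sine model. Out-of-range values are clamped to +/-90 degrees."""
    ratio = itd_s * SPEED_OF_SOUND_M_S / EAR_SPACING_M
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

With these assumed values the largest possible time difference is only about half a millisecond, so small fluctuations in arrival time caused by reflections correspond to noticeable shifts in apparent angle.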
When the sound source is continuous – like legato strings, or pink noise – we perceive these fluctuations as an enveloping room impression. The time delay of the reflections does not matter very much, as long as it is longer than about 10ms. (Below 10ms there are severe combing effects we will try to avoid in this discussion.) But most musical sounds (and all speech sounds) are not continuous.
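The combing mentioned above can be sketched numerically. When a sound is summed with an equal-level copy of itself delayed by a time T, cancellations (notches) fall at odd multiples of 1/(2T). The function name and the equal-level assumption below are ours, chosen for illustration only.

```python
def comb_notch_frequencies(delay_ms, max_hz=1000.0):
    """Notch frequencies in Hz, up to max_hz, for a signal summed with an
    equal-level copy of itself delayed by delay_ms (an idealized case)."""
    delay_s = delay_ms / 1000.0
    notches = []
    k = 0
    while True:
        # Cancellation occurs where the delay equals an odd number of half periods.
        frequency = (2 * k + 1) / (2.0 * delay_s)
        if frequency > max_hz:
            break
        notches.append(frequency)
        k += 1
    return notches
```

For a 1ms delay the notches are 1000Hz apart, while for a 10ms delay they are 100Hz apart. Shorter delays therefore produce fewer, more widely spaced notches across the audio band.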
To understand what happens with speech or music we must learn how the brain separates sounds into streams. Streams are the perceptual equivalent of musical lines. Sentences from a single talker form a stream. A stream has in general a specific source and a single continuous semantic content. However the streams themselves are not continuous at all – in music the streams are composed of notes, in speech streams are composed of phones – little bursts of sound roughly equivalent to syllables. When we hear a string of phones, our speech apparatus goes into high gear. First we must separate the phones one from another, then we must use frequency and time information to assign an identity to each phone – at which point the phone becomes a phoneme, the basic building block of speech. From phonemes to words, from words to sentences, from sentences to meaning – all seemingly effortless and automatic – our brains decode the spoken word.
There are only two possible ways to separate one phone from the next – we can detect the stop of a phone, or we can assume it has stopped when we detect the start of another. Naturally, we do both. But if we are to hear background sounds at all, we must detect the stop of phones before a new phone starts.
How do you know if a phone has stopped? We can do an experiment – about a 2dB drop in level in a 20ms time period seems sufficient. What if the level drops more slowly? Experiment shows that even with a slow drop a 6dB change is sufficient. What if the sound drops in level by 2dB, and then within 30ms comes back up again? (This drop could be caused by a
In general, to find the ends of phones the brain looks for a level drop, and waits for 50ms to be sure the level stays down. If it does, the sound event – the phone – is assumed to have ended. Now imagine another simple experiment. You are listening to someone talk in a noisy room. You can easily understand the person, but you are aware of the noise in the room – which is perceived as continuous. How can this be? It is clear that during the phones of the person who is talking you are unable to hear the room – the phones are masking the background. Yet you perceive the background as continuous.
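The rule for finding the ends of phones (look for a level drop, then wait 50ms to be sure the level stays down) can be sketched as a small detector working on a level envelope in dB. This is a simplified illustration, not a model of the actual neurology. The function name, the sampling step and the single 6dB threshold are assumptions, and the sketch does not distinguish fast drops from slow ones.

```python
def find_phone_ends(levels_db, step_ms=10.0, drop_db=6.0, hold_ms=50.0):
    """Return the times (ms) at which a sound event is judged to have ended.

    levels_db holds one level value in dB every step_ms (hypothetical input).
    """
    hold_steps = int(round(hold_ms / step_ms))
    ends = []
    reference = None   # peak level of the current sound event
    candidate = None   # index where the level first dropped far enough
    for i, level in enumerate(levels_db):
        if reference is None:
            reference = level
        elif candidate is None:
            if level > reference:
                reference = level
            elif reference - level >= drop_db:
                candidate = i          # possible end; start waiting
        elif reference - level < drop_db:
            # The level came back up in time: a brief dip, not an end.
            candidate = None
            reference = max(reference, level)
        elif i - candidate >= hold_steps:
            # The level stayed down for the hold time: the event ended at the drop.
            ends.append(candidate * step_ms)
            reference = level
            candidate = None
    return ends
```

A dip that recovers before the 50ms hold time has elapsed is ignored, so the same sound event is treated as continuing through it.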
The brain is clearly separating the sound of the room into a separate stream – the background stream. The neurology that detects the background stream works in the spaces between phones. Thus it cannot work without the participation of the mechanism that determines the ends of phones. Again we can experiment. It turns out that the background detection is inhibited during phones, as we would expect, and is still inhibited for the first 50ms after the end of each phone. After this time the inhibition is gradually released, so the background detector has full sensitivity within 150ms after the end of each phone. The loudness of the background is then perceived through a standard loudness integration, taking about 200ms for full loudness to develop.
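The timing described above can be sketched as a sensitivity curve for the background detector. The 50ms and 150ms figures come from the text. The straight-line release between them, and the crude running sum standing in for the roughly 200ms loudness integration, are assumptions made for this example.

```python
def background_sensitivity(ms_since_phone_end, inhibit_ms=50.0, full_ms=150.0):
    """Relative sensitivity (0 to 1) of the background detector after a phone ends.

    The linear release between inhibit_ms and full_ms is an assumed shape.
    """
    if ms_since_phone_end < inhibit_ms:
        return 0.0
    if ms_since_phone_end >= full_ms:
        return 1.0
    return (ms_since_phone_end - inhibit_ms) / (full_ms - inhibit_ms)


def background_loudness_fraction(gap_ms, integration_ms=200.0, step_ms=1.0):
    """Fraction of full background loudness built up during a gap of gap_ms,
    for a steady background, using a simple running sum (an assumed model)."""
    steps = int(gap_ms / step_ms)
    total = sum(background_sensitivity(k * step_ms) * step_ms for k in range(steps))
    return min(1.0, total / integration_ms)
```

In this sketch a gap shorter than 50ms contributes nothing to the background impression, and the contribution keeps growing as the gap between phones lengthens.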
It is the background perception of reverberation that gives us the sense of envelopment. Clearly it is the reflection level 150ms and more after the end of sound events that matters. Note that the relevant time is after the END of sound events. We are conditioned by years of looking at impulse responses to think about reflections as always coming from