Along with faithful recreation of tone color (timbre) and
accurate localization, “listener envelopment” (LEV) is sought
by audiophiles in recordings and playback systems, just as it is
prized by concert-goers when heard live in fine halls, and by
movie viewers and gamers who crave immersion. LEV is the
reason for “surround” speakers – to recreate life-like spatiality.
But how many (or how few) are required? Where should they
be located? And what benefit or interference does the listening
room contribute? Decades of research are behind the answers.
With eyes closed, humans need only hear another’s voice,
or clap their hands, to visualize the space they’re hearing – a
large hall, bathroom, cave, etc. Characteristics of our head,
outer ears, and torso guide perception of any space we’re
immersed within, based on familiar environs, learned since
childhood. But how we are able to do this depends on reflected
after-sounds – the acoustic response to the original direct sound.
More important, compared with sound that reaches our ears only
in a direct line, as in an anechoic test chamber or a
snow-covered field, this augmentation contributed by room
acoustics adds life and enjoyment to live performances, and
therefore to well-recorded music, movies, and games.
Whether hearing live or recorded sounds, our ear-brain
system is sensitive to sounds arriving later in time and from
different directions from the direct sound of a source, whether a
live voice/instrument, or a speaker during reproduction.
Researchers (Haas and others since) measured a tradeoff in
human detection of reflections, calling the effect precedence,
which refers to the favor we bestow on sounds that arrive before
others – the direct sounds preceding their reflections, delayed as
these are by the trip they make via a wall, floor, or ceiling to the
ear. Within the shortest interval, a reflection is perceived as
fused with the direct sound (even when in experiments the later
sound has been made artificially louder). With longer delay, we
increasingly detect reflective boundaries and unconsciously
form an impression of the enclosure we and the source of sound
are in. We piece together not only its size but also its shape,
because all sonic arrivals are colored as to direction by our
personal head-related transfer function (HRTF). Our preference
for reflection-based spatial cues translates to adding life to a
concert, or to its reproduction, by creating or preserving
envelopment – LEV.¹
Reflections due to the home theater/media room itself arrive
after too short a delay to cause envelopment – therefore the
signals needed to convey a sense of envelopment must be recorded
in a larger space, or simulated. The enveloping reflections and
diffuse reverberation of large rooms are not possible in small
ones. Furthermore, multiple loudspeakers, especially surround
speakers, provide the diversity of angles that enables recorded
envelopment signals to be differentiated naturally from recorded
direct sounds. So LEV is a product of 1) signals presented in
multi-channel recordings and 2) surround sound reproduction
involving four or more speakers positioned around the listener.
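To put rough numbers on why small-room reflections arrive so early, the sketch below estimates the delay of a single side-wall reflection relative to the direct sound, using a mirror-image source. The room dimensions, the positions, and the 343 m/s speed of sound are illustrative assumptions, not figures from the research discussed here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def side_wall_reflection_delay_ms(source, listener, wall_x):
    """Delay in ms of a first-order reflection off the wall x = wall_x,
    measured relative to the arrival of the direct sound."""
    sx, sy = source
    lx, ly = listener
    direct = math.hypot(lx - sx, ly - sy)
    # Image-source method: mirror the source across the wall plane.
    image_x = 2.0 * wall_x - sx
    reflected = math.hypot(lx - image_x, ly - sy)
    return (reflected - direct) / SPEED_OF_SOUND * 1000.0


# Hypothetical media room: speaker and listener 3 m apart, side wall 2 m away.
small_room = side_wall_reflection_delay_ms((0.0, 0.0), (0.0, 3.0), wall_x=2.0)

# Hypothetical concert hall: 20 m apart, side wall 12 m away.
hall = side_wall_reflection_delay_ms((0.0, 0.0), (0.0, 20.0), wall_x=12.0)

print(f"media room: {small_room:.1f} ms, hall: {hall:.1f} ms")
```

With these assumed geometries the media-room reflection trails the direct sound by roughly 6 ms, while the hall reflection trails by roughly 33 ms, which illustrates the difference in scale between the two spaces.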
Attenuated by wall absorption, media room reflections
usually fail, at such short delay times, to be strong enough to
cross the threshold of the precedence curve. However,
recorded, longer-delayed reflections fall on the precedence
curve where the ear-brain is increasingly sensitive. Therefore, with
regard to LEV, most listening rooms’ acoustics are far less effective
than recorded ones, and tend to be ignored. Human adaptation is
also at play, so in short order the listener is focused on cues in
the recording in spite of many, possibly deleterious, acoustic
characteristics of the media room. (More damaging acoustic
conditions need to be addressed and, if possible, fixed.) This
ability of humans to ignore and, within a matter of minutes,
adapt is sometimes called “listening through the room” – i.e.
perceiving cues in the recording in the presence of mostly
ineffective cues added by the listening acoustics².
Therefore, conveying spatiality, along with faithful tone
color (timbre) and accurate localization, involves fine recording
technique to capture LEV, reasonably good listening room
design/execution to avoid major acoustic problems and preserve
LEV, recognizing listener adaptation to mild conditions, and
simply ignoring effects that prove negligible. Still, the money
seat is where you’ve paid to be. But this “sweet spot” also
depends on choosing and placing loudspeakers.
Wavefield Synthesis (WFS) recreates impressive two-dimensional
surround sound, but it requires the resources
implied by 48 to 96 speakers. For home use, most people would
shy from 24 or even 12 speakers. But what is the justification
for choosing a layout with only 4 to 7 loudspeakers, plus
subwoofer(s)? One such layout is standardized internationally
as “5.1” (and related 6.1 and 7.1) surround sound systems and
content. Is 5.1 the best we can do? Or is there knowledge
indicating what is better, or next in the evolution of audio?
Just as some concert halls generate original LEV live more
successfully than others, some speaker layouts for audio
reproduction are more effective than others in enabling recorded
spatiality to reach the listener. Researchers over many decades
have investigated and
agree generally about the degree to which speaker layout is
important. Their results explain why two-channel stereo does
not convey much envelopment, why quad was unsuccessful, and
how the industry came to standardize on 5.1 surround sound.
The degree of LEV is related to the sameness of signals at
the two ears, termed inter-aural cross-correlation (IACC). The
more different the ear signals are, the more envelopment is
perceived. The most effective range of angles for arriving
envelopment-producing signals is between 30° and 120° on
either side of straight ahead, with the highest sensitivity at ±60°
(see illustrations above). Demonstrate this range for yourself by
extending and swinging each arm within its most comfortable
range, from two-thirds forward through slightly behind.
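As a rough illustration of the “sameness” measure, the sketch below computes an IACC-style coefficient as the peak magnitude of the normalized cross-correlation of the two ear signals within ±1 ms of lag. That definition is a commonly used one and is an assumption here, as are the function name `iacc`, the 48 kHz sample rate, and the random-noise test signals; none of them come from the article.

```python
import math
import random


def iacc(left, right, sample_rate, max_lag_ms=1.0):
    """Peak magnitude of the normalized cross-correlation of two ear
    signals, searched over lags within +/- max_lag_ms (assumed definition)."""
    if len(left) != len(right):
        raise ValueError("ear signals must have equal length")
    n = len(left)
    energy = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if energy == 0.0:
        return 0.0
    max_lag = min(n - 1, round(max_lag_ms * 1e-3 * sample_rate))
    peak = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        start, stop = max(0, -lag), min(n, n - lag)
        value = sum(left[i] * right[i + lag] for i in range(start, stop))
        peak = max(peak, abs(value) / energy)
    return peak


rng = random.Random(0)  # fixed seed so the sketch is repeatable
noise_a = [rng.gauss(0.0, 1.0) for _ in range(1024)]
noise_b = [rng.gauss(0.0, 1.0) for _ in range(1024)]

identical_ears = iacc(noise_a, noise_a, 48000)    # same signal at both ears
independent_ears = iacc(noise_a, noise_b, 48000)  # unrelated signals

print(f"identical: {identical_ears:.2f}, independent: {independent_ears:.2f}")
```

Identical ear signals give a coefficient of essentially 1, while unrelated signals give a much lower one. In the terms used above, the lower the coefficient, the more different the ear signals and the more envelopment is perceived.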
Spanning typically 45 to 60° in front, conventional two-channel
stereo speakers lie at the edge or outside this
envelopment-producing range. This explains why stereo, and 5.1’s
front stage alone, are limited in their contribution to life-like LEV.
When speakers are added in back, as in quadraphonics (and the
rear-most speakers of 7.1), sounds intended to convey hall
ambience fall even farther outside the range for LEV,
behind ±120° – well beyond comfortable pointing range. So
quad did not satisfy as a method of two-dimensional surround.
However, adding speakers and appropriate recorded
signals between 60° and 120° – within comfortable pointing
range on both sides – maximizes LEV. In fact, in controlled
experiments, four optimally placed speakers are almost as good
as 12! Indeed “4.1” (5.1 with no C signal) is preferred by some
content producers for instrumental music, where, compared to
quad, the two surround speakers are spaced widely, and
therefore are more effective in conveying LEV. Complete the
circle with a center speaker to anchor on-screen dialog or solo
voices, and we have international standard ITU-R BS.775 for “5.1”
surround. (Note that 7.1 still has only the two side-most
speakers in LEV regions, explaining its marginal improvement
over 5.1, although it is useful for increasing seating.)
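The comparison of layouts can be sketched as a simple count of speakers whose azimuth falls between 60° and 120° on either side, the range named above. Every speaker angle below is an illustrative assumption: typical nominal placements chosen for this example, not values quoted from the article or from any standard.

```python
# Per-side azimuth range (degrees from straight ahead) taken from the text.
LEV_MIN_DEG, LEV_MAX_DEG = 60.0, 120.0

# Assumed nominal azimuths in degrees; negative = left, positive = right.
LAYOUTS = {
    "stereo": [-30, 30],
    "quad": [-45, 45, -135, 135],
    "5.1": [0, -30, 30, -110, 110],
    "7.1": [0, -30, 30, -90, 90, -150, 150],
}


def speakers_in_lev_region(angles_deg):
    """Return the azimuths that lie inside the 60-120 degree range per side."""
    return [a for a in angles_deg if LEV_MIN_DEG <= abs(a) <= LEV_MAX_DEG]


counts = {name: len(speakers_in_lev_region(angles))
          for name, angles in LAYOUTS.items()}

for name, count in counts.items():
    print(f"{name}: {count} speaker(s) between 60 and 120 degrees per side")
```

Under these assumed angles, stereo and quad place no speakers in the range, while 5.1 and 7.1 each place two, which is consistent with the argument that 7.1 adds little envelopment over 5.1.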
As the figures and caption above show, crosstalk cancellation
makes spatiality life-like, whether with a pair of speakers in front
alone to reproduce two-channel recordings, or with two pairs, front
and back, to reproduce multi-channel surround, because both
stages, between ±60° in front and ±120° around back, reach within
the prime LEV regions. These are the layouts of Ambiophonics for
playing stereo recordings and PanAmbio for surround. (Note: Play
5.1 in PanAmbio by setting the player to “no center” to mix the C
channel to the front speaker pair.) For a listener on the median
plane, where ear signals are equivalent to the virtual speaker
positions shown, all “speakers” are perceived to reach well within
the maximum LEV regions. PanAmbio thus has four speakers within
LEV regions, double the two surround speakers of 5.1/7.1.
Speaker models are a trade-off of price-performance: price
can buy status if not real performance; cheap may be a wasted
investment, advertised as too good to be true. Believable
performance, clearly specified, should outweigh mere looks –
there are choices to be had that both perform well and look fine.
Dispersion (off-axis response) is as important as power handling
and smooth, flat on-axis frequency response. Problems will occur
if it is poor at the wide angles that are mirrored at the
room walls, producing reflections that might not be ignored if
their timbre differs much from the direct sound that arrives on-axis.
The science above is the work of many researchers, and is
nicely reported in Toole’s Sound Reproduction: Loudspeakers
and Rooms; see pp. 99–126 and 292–305. Ambiophonics is
discussed on p.277.
¹ “Envelopment” in two-dimensional (2D) surround sound means being
encircled. However, in natural hearing we also perceive sounds
elevated above and below horizontal – “immersion” in a 3D sphere.
² Sensitized with practice, professionals in control rooms need a more
critical approach.
Internationally recognized engineer and Peabody award-winning
film producer Robin Miller has presented papers and
demonstrations on 2D and 3D audio to the Audio Engineering
Society, Society of Motion Picture & Television Engineers,
Acoustical Soc. of America, Canadian Acoustical Assn., and
German Tonmeisters. His company, Filmaker Technology,
does applied science research, systems design & integration,
surround recording, and has patented a system of full-sphere
3D recording & reproduction – www.filmaker.com