Human hearing using two ears is called
binaural and was developed by evolution. Binaural sound is what most of us listen to all
the time. Audiophiles sometimes think of binaural sound as a recording made with a dummy
head and played back through earphones. This is a poor imitation of the real thing and is
not what we will mean when we refer to the binaural hearing mechanism in this book.
Stereophonic sound, by contrast, is simply one man-made method of recreating a remote or
recorded sound field in a completely different space. Stereophonic sound fields are almost
always auditioned by binaural listeners whose brains must then reconcile the lack of a
binaural field with the presence of a stereophonic one. The commonplace (but misnamed)
stereophonic recordings, which normally consist of two full-range, unencoded, discrete
channels, one left and one right, are not inherently stereophonic (despite adjustments by
recording engineers based on what they hear using studio stereo monitors) and therefore
need not suffer the ills that playback via the stereo triangle engenders. That is, the
microphones don't know that the sound they pick up is going to be played back via two
widely spaced loudspeakers and thus none of the imperfections of the stereo triangle
discussed below apply to the recording before it is played back.
Although we later describe an Ambiophonically optimized recording
microphone arrangement, almost any mic setup used to produce two channel recordings works
reasonably well when reproduced Ambiophonically. Indeed one of the basic premises of this
book and the technology it describes is that the usual two-channel recorded program
material contains sufficient information to allow accurate simulation of a binaural
concert-hall experience. This is indeed fortunate since it allows the existing library of
LPs and CDs to be reproduced with unprecedented realism and shows that multi-channel
mic'ing and recording methods, where music is concerned, are actually counterproductive
according to the tenets of binaural technology.
That as few as two channels should be more than adequate can be
intuitively understood by simply stating that if we deliver the exact sound required to
simulate a live performance at the entrance to each ear canal, then since we only have two
ear canals, we should only need to generate two such sound fields. The questions are why
existing stereophonic and earphone binaural recording techniques fall short, and what can
be done to make up for these shortcomings at least where music reproduction is concerned.
Monophonic Sound
Before the advent of stereo recording we had single-channel or monophonic
recordings. Most recordings were made by using one or more microphones and mixing their
outputs together before cutting the record or making a tape. Such a monophonic recording,
if reproduced by two loudspeakers, can be thought of as a special case of stereophonic
sound reproduction. It is the case where a sound is the same at both ears and the
interaural cross-correlation factor of the sound is 1. In a concert hall, such a signal
coming from the stage is sensed as coming from that stage regardless of which direction a
concert goer faces.
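The statement that a monophonic signal has an interaural cross-correlation of 1 can be checked numerically. What follows is only a minimal Python sketch, not a measurement method. It computes the zero-lag normalized correlation, whereas a full interaural cross-correlation would also search over a small range of interaural delays. The function name and the random test signals are assumptions introduced here for illustration.

```python
import math
import random


def normalized_cross_correlation(left, right):
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    numerator = sum(l * r for l, r in zip(left, right))
    denominator = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return numerator / denominator


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
mono = [rng.gauss(0.0, 1.0) for _ in range(10_000)]        # hypothetical mono signal
unrelated = [rng.gauss(0.0, 1.0) for _ in range(10_000)]   # hypothetical independent signal

# Same signal at both ears: the monophonic case described above.
same_at_both_ears = normalized_cross_correlation(mono, mono)
# Two independent noise signals: close to zero correlation.
independent_ears = normalized_cross_correlation(mono, unrelated)

print(f"mono at both ears:  {same_at_both_ears:.3f}")
print(f"independent noises: {independent_ears:.3f}")
```

Under these assumptions the identical-signal case evaluates to 1, while two unrelated noise signals give a value near zero.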
Let us now consider a listener in the balcony of a large hall during a
live concert. For this listener, the angle that the stage subtends is very small. Both
ears get essentially the same signal, the direct sound from the stage is weak because of
distance, and the hall ambience is strong and both are largely the same at each ear. Thus,
the players seem to be remote, but still front and center. However, the balcony listener
is enveloped in a completely realistic but mostly monophonic reverberant field and,
therefore, hardly notices that his ability to localize left and right sounds is minimal.
The lesson we want to draw from this is that mono recordings can be made to sound every
bit as realistic in the home concert hall as stereo recordings, if you don't mind the
impression of sitting further back in the auditorium. The same applies to recordings of
solo instruments such as the piano or a singer standing in the curve of the piano. (See
"Caruso On Stage" for advanced experiments in mono reproduction.)
The reproduction of single central mono or panned sources via two spaced
front loudspeakers is also prone to exactly the same crosstalk effects that result from
stereophonic reproduction, but fortunately the solution is the same (see below) for both
mono and two-channel recordings. To summarize, it is possible to have realism without
separation, via a combination of true hall ambience with a corrected front stage, and this
is the main thrust of the Ambiophonic method.
The Stereophonic Illusion
There is a slightly flawed theory, still quoted quite often, that a
perfect replica of a given concert-hall sound field can always be produced by putting an
infinite number of stage-facing microphones at the front of the stage, all the way up to
the ceiling. After being stored on a recorder with an infinite number of channels, this
recording can then be played back through an infinite number of point-source loudspeakers,
each placed exactly as its corresponding microphone was placed. The performance
replication of such a wall would not be perfect, because the loudspeakers would not
radiate sound with the same directional characteristics as the sound impinging on the
microphones, and the final result would also be affected by the quality of the room into
which all these speakers were radiating. But at least the stage would be wide, have depth,
and be realistic sounding.
As the number of microphones and speakers is reduced, the quality of the
sound field being simulated suffers. By the time we are down to two channels, height cues
have certainly been lost, and instead of a stage that is audible from anywhere in the room,
we find that sources on the stage are now localizable only if we listen along a line
equidistant from the last two remaining speakers and face them. While there are many
two-channel speaker arrangements possible, the most popular two-channel reproduction method is
the stereophonic technique of reproducing two-channel recordings through two loudspeakers
with the listener and the two speakers forming an equilateral or wider isosceles triangle.
Stereo takes advantage of one rather unnatural psychoacoustic illusion, which is that as a
recorded sound source moves on the stage from the left to the right, and as the playback
signal likewise shifts from the left speaker to the right speaker, most listeners hear a
virtual sound or phantom sound image move from one speaker position to the other. Compared
to real-life hearing, the phantom image does not move linearly: the sound tends to jump
to the speaker location as the source moves to the side. If identical
sounds come from each speaker (the monophonic case above), then most central listeners
hear a phantom sound that hangs in the air at the halfway point on the line between the
loudspeakers. Just as there are some individuals who cannot see optical illusions, so
there are a few individuals who cannot hear phantom images. Just as optical illusions are
just that, illusions that no sighted person would confuse with a true three-dimensional
object, so phantom stereo illusions could never be confused with a real acoustical sound
field. Nevertheless, this illusion of frontal separation and space is so pleasing to most
listeners that stereophonic reproduction has remained the standard music reproduction
technique for some 70 years, ever since Alan Dower Blumlein applied for his patent at the
end of 1931. (See The Blumlein Conspiracy)
The illusion created by stereo reproduction techniques is far from
perfect, even if the highest grade of audiophile caliber reproducing and recording
equipment is used. The first problem is that the image of the stage width is confined to
the arc that the listener sees looking from one speaker to the other. Occasionally, an
out-of-phase sound from the opposite loudspeaker, an accidental room reflection, or a
recording site anomaly will make an instrument appear to come from beyond the speaker
position. These images, however, are almost always ephemeral and often not reproducible.
Thus, in non-Ambiophonic systems, in order to get a useful stage width with stable
left-right localization, the loudspeakers must be placed at a wide enough angle to mimic
the angular proportions of a concert hall or theater stage. As we shall see in Chapter 4,
it is better if speakers are put closer together so as to complement the pinna part of the
binaural hearing mechanism. With most stereo systems, there is a "sweet spot" at
the point of the triangle where the listening is best. This, unfortunately, is what we are
faced with when only two front channels (or three, for that matter) are available. The
"sweet spot" is also a characteristic of the Ambiophonic reproduction technique
described below, although the spot is somewhat larger and less critical in the case of
Ambiophonics. It is difficult enough to recreate concert-hall sounds from two discrete
recorded channels (and even harder using multiple channels) for one or two listeners in the
home, without trying to do it for a whole room full of people.
Stereophonic Crosstalk
By far the major defect of stereophonic reproduction is caused by the
presence of crosstalk at the listener's ears generated by the loudspeakers. Again, the
crosstalk is an artifact of stereophonic reproduction and is not present in the recording.
We will show that eliminating this crosstalk widens the stereo soundstage way beyond the
position of the loudspeakers, eliminates spurious frequency response peaks and dips (comb
filter effects), and allows the speakers to be moved much closer together, eliminating the
need for phantom imaging or a center channel.
In a concert hall, direct sound rays from a centrally located instrument
reach each ear simultaneously: one ray per ear. See Figure 1,
left. By contrast, for a centrally located recorded sound source, reproduced in stereo,
identical rays come from the right and left speakers to the right and left ears, but a
second pair of uninvited, only slightly attenuated, longer, right and left speaker rays
also passes around the nose to the left and right ears. See Figure 1,
right.
The problem is that these unwanted rays, which cross in front of the eyes
and diffract around the back and top of the head, are delayed by the extra distance they
travel across the head. At its greatest, this distance is just under 7 inches. For an
average distance of say 3 1/2 inches, it takes sound one-quarter of a millisecond to do
this. A quarter of a millisecond is half the period and, therefore, half the wavelength of
a 2000 Hz tone. When two signals, one direct and one a half-wavelength delayed, but of
similar amplitude, meet at the ear, cancellation will occur. At 4000 Hz the delay is one
full wavelength and the sounds will add. Thus at frequencies from the octave above middle
C and up, all sounds add or subtract at the ears to a greater or lesser degree, depending
on the original sound source position, the angle to the speakers, the listener's head
position, nose size and shape, head size, differing path lengths around the head, and
other geometrical considerations. Note that if the sound source at the recording studio or
the listener at home moves a few feet or inches to the left or right, a whole new pattern
of additions and subtractions at different frequencies will assault the listener. This
interference phenomenon is called comb filtering, and largely explains why many critical
listeners are so sensitive to small adjustments in stereo listening or speaker position,
and to relatively minute playback system electrical and acoustical delay or attenuation
characteristics.
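The arithmetic in the preceding paragraph can be reproduced with a short Python sketch. It treats the sound at one ear as a direct ray plus a single delayed copy of similar amplitude, which is a deliberate simplification of the real diffraction around the head. The speed of sound (about 343 m/s, or roughly 13,500 inches per second) and the function names are assumptions introduced here, not values taken from the text.

```python
SPEED_OF_SOUND_IN_PER_S = 13_500.0  # assumed: about 343 m/s expressed in inches per second


def crosstalk_delay_s(extra_path_in):
    """Time in seconds for sound to cover the extra path around the head."""
    return extra_path_in / SPEED_OF_SOUND_IN_PER_S


def comb_frequencies(delay_s, max_hz):
    """Return (nulls, peaks) up to max_hz for a direct ray plus one delayed copy."""
    nulls, peaks = [], []
    k = 1
    while k / (2.0 * delay_s) <= max_hz:
        freq = k / (2.0 * delay_s)
        # Odd multiples of a half period cancel; even multiples reinforce.
        (nulls if k % 2 else peaks).append(freq)
        k += 1
    return nulls, peaks


# The 3 1/2 inch average path gives roughly a quarter of a millisecond.
print(f"delay for 3.5 in: {crosstalk_delay_s(3.5) * 1000:.3f} ms")

# Using the rounded quarter-millisecond figure from the text.
nulls, peaks = comb_frequencies(0.00025, 9000.0)
print("nulls (Hz):", [round(f) for f in nulls])
print("peaks (Hz):", [round(f) for f in peaks])
```

With the rounded quarter-millisecond delay, the first cancellation falls at 2000 Hz and the first reinforcement at 4000 Hz, as stated above. Any change in the path difference moves the whole pattern, which is why small head or speaker movements matter.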
Bock and Keele measured comb filter nulls as deep as 15 dB for the
60-degree stereo loudspeaker setup. Note that for extreme side images the comb-filter
effect is minimal. Thus the frequency response of a normal stereo setup actually depends
on the angular position of the original instrument or singer. As indicated above, it is
fascinating that these frequency response anomalies are not clearly audible as changes in
tone but rather manifest themselves as imprecisions in imaging and a sense that the music
is canned. But it is possible to hear the change in timbre caused by comb filtering.
Simply play pink noise from a test CD over your stereo system and rotate the balance
control from hard left to hard right. As the image of the noise passes through the center one
can clearly hear a drop in the treble loudness of the noise and a distinct change in its
character.
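To get a feel for how strong the crosstalk ray must be to produce nulls of this depth, the same two-ray picture can be extended with a crosstalk gain. The sketch below is a rough Python illustration only. It assumes a frequency-independent crosstalk gain between 0 and 1, and it assumes that null depth is expressed relative to the direct ray alone; the text does not say how the measured 15 dB figure was referenced. The function names are hypothetical.

```python
import math


def comb_extremes_db(crosstalk_gain):
    """Null and peak levels in dB, relative to the direct ray alone.

    Assumes a direct ray of amplitude 1 plus one delayed copy of
    amplitude crosstalk_gain, with 0 <= crosstalk_gain < 1.
    """
    if not 0.0 <= crosstalk_gain < 1.0:
        raise ValueError("crosstalk_gain must be in [0, 1)")
    null_db = 20.0 * math.log10(1.0 - crosstalk_gain)  # rays in opposition
    peak_db = 20.0 * math.log10(1.0 + crosstalk_gain)  # rays in step
    return null_db, peak_db


def gain_for_null_depth(depth_db):
    """Crosstalk gain that yields a null depth_db below the direct ray."""
    return 1.0 - 10.0 ** (-depth_db / 20.0)


gain = gain_for_null_depth(15.0)
null_db, peak_db = comb_extremes_db(gain)
print(f"crosstalk gain: {gain:.3f} ({20.0 * math.log10(gain):.1f} dB)")
print(f"null: {null_db:.1f} dB, peak: {peak_db:+.1f} dB")
```

Under these assumptions a 15 dB null needs a crosstalk ray less than 2 dB weaker than the direct ray, consistent with the "only slightly attenuated" rays described earlier.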
As a possible, but as yet not truly proven, example of how interaural
crosstalk and the family of acoustic notches it produces can heighten many listeners'
sensitivity to component differences, the case of tube-versus-transistor amplifiers stands
out. There is often a subtle but audible difference between a vacuum-tube amplifier and a
transistor amplifier used alternately in a given stereo system that is still detectable
even after distortion, noise, power and volume characteristics are matched. This sonic
difference is usually described in terms of the stereophonic sound stage produced. The
image with one amplifier is said to be more transparent, wider, deeper, narrower,
shallower, more detailed, less ambient, or have more air than the other. However, if you
listen to just one channel with just one speaker and, even better, one ear, there is of
course no stereo effect and no crosstalk null pattern, and so these audible differences
evaporate entirely. The apparent difference in sound-stage imaging due to changing from
tube to transistor amplifiers seems due to the different output impedances of these two
devices, leading to subtle but slightly audible changes in the stereo crosstalk sound
field. Vacuum tube amplifiers have a higher output impedance than transistor amplifiers,
sometimes as high as one or two ohms. Thus if two loudspeakers have slightly different
reactive treble impedances (due to crossover component or tweeter tolerances or control
settings) or the amplifier output impedances are not precisely matched due to tube aging
or bias drift, the delay differences in the treble range between the two channels will be
appreciably greater in the high-impedance vacuum-tube case than in the more nearly
constant-voltage solid-state case. Note that a phase shift difference between speaker sounds of
only a few degrees can shift a stereo crosstalk comb filter null by hundreds of Hertz.
Similar arguments can be made for anything in a system that changes the comb-filtering
pattern including vacuum tube amplifier speaker cables, if length and type are seriously
mismatched. Even a small, one-degree phase shift change between the left and right
channels at 2000 Hz will cause a shift of 71 Hertz in the position of a crosstalk null or
peak.
Crosstalk comb-filter patterns are thus a function of any asymmetry in
amplifier output impedances or delays, differential delays in cables, or differential
speaker time delay by virtue of their positions relative to the listening position or
their impedance networks. For instance, a vacuum-tube driven left midrange speaker can
interact with a right tweeter to produce interaural crosstalk peaks and nulls that are
otherwise not present in the solid-state amplifier case. Such patterns may be audible to
some individuals. Any changes in the interaural crosstalk pattern are interpreted by the
brain as a spatial artifact such as more or less depth, air, or hollowness. Of course, any
change in listener position, or speaker location causes similar shifts in the crosstalk
peaks and nulls and further complicates equipment comparisons by ear in stereo or surround
sound. The irregular directional and largely unpredictable frequency response of the
standard stereophonic 60 to 90 degree listening arrangement would never be accepted in an
amplifier, a speaker, or a cable. Why such a basic listening system defect continues to be
so universally tolerated and studiously ignored is difficult to fathom.
The binaural perception of directional cues depends on both the relative
loudness of sound and the relative time of arrival of sound at each ear. Which mechanism
predominates depends on the frequency and the direction of the sound. Unfortunately, since
these delay and stereophonic comb-filter artifacts have an effect extending from below 500
Hz on up, they very seriously impact on both mechanisms and thus impair the ability of the
listener to detect angular position with lifelike ease. It is also these crossing rays
that limit stereo and surround sound imaging to the line between the two front speakers.
(See below) If we are to achieve anything close to concert-hall realism, we must eliminate
these crosstalk effects and provide a directionally correct single ray for each ear. But
first we will need to present evidence of the extraordinary sensitivity of the ear pinna
to such comb filter patterns.
Imaging Beyond the Speaker Positions
Another problem with stereophonic crosstalk is that it limits the apparent
stage width. For sound sources that originate, say, far to the right of the right
microphone, we can temporarily ignore the left channel microphone pickup. Then in the
stereophonic listening setup, the right speaker will send unobstructed sound to the right
ear and a somewhat modified version of the same sound to the left ear. The ear-brain
naturally localizes this everyday sound situation to the speaker position itself. Thus, no
matter how low the left channel volume is, the recorded image can never extend beyond the
right speaker in standard stereo. See Figure 2. If, however, the
right speaker sound ray crossing over to reach the left ear could be blocked or
attenuated, then at least the low and mid frequency sound could be localized to the
extreme right, well beyond the speaker position and just where the recording microphones
said the source was located. (High frequency localization is discussed in the next
chapter.) Remember, the microphones don't know that the playback will be in stereo with
crosstalk and therefore it is not the recording setup that limits stage width. Clearly,
eliminating the extra sound ray would result in wide spectacular imaging even from
existing two-channel media. In Chapter 5 we will discuss two methods of eliminating
crosstalk as well as doing away with the stereo triangle altogether.
Loudspeaker Out-of-Phase Effects
In stereo systems it is necessary for the right and left main speakers to
be in phase or, better expressed, be of the same polarity. Phase in this case means that if
identical electrical signals are applied to each speaker, the speakers will both generate
a rarefaction, or both generate a compression in response to a simultaneous input pulse.
When a monophonic recording is played through a pair of out-of-phase loudspeakers, the
sound at the ears lacks bass, the phantom center image is not present, and a hazy,
undefined sound field seems to extend far beyond the speakers to the extreme sides and
sometimes even rearward. Similar effects, only slightly less pronounced, are also present
using two-channel sources.
These subjective effects can be better comprehended now that we understand
all about stereo crosstalk. It is clear that equal but out-of-phase very low frequency
signals, with wavelengths much longer than the width of the head, will always arrive
unattenuated and 180 degrees out of phase at either ear and therefore will always largely
cancel. This factor accounts for the thinness of the mono or central (L+R) stereo sound.
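The bass cancellation just described can be illustrated with the same simplified two-ray picture. In the following Python sketch each ear receives a direct ray plus a delayed crosstalk ray, which is inverted when the speakers are out of phase. The quarter-millisecond delay is borrowed from the earlier crosstalk discussion. The unity crosstalk gain at all frequencies and the function name are assumptions made here for illustration, and real head shadowing would reduce the crosstalk at higher frequencies.

```python
import cmath
import math


def ear_level_db(freq_hz, delay_s=0.00025, crosstalk_gain=1.0, inverted=True):
    """Level at one ear in dB, relative to the direct ray alone.

    Models a direct ray of amplitude 1 plus one delayed crosstalk ray.
    inverted=True represents out-of-phase (opposite polarity) speakers.
    """
    sign = -1.0 if inverted else 1.0
    crosstalk = sign * crosstalk_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    magnitude = abs(1.0 + crosstalk)
    if magnitude == 0.0:
        return float("-inf")  # complete cancellation
    return 20.0 * math.log10(magnitude)


for freq in (50.0, 100.0, 500.0, 2000.0):
    out_of_phase = ear_level_db(freq, inverted=True)
    in_phase = ear_level_db(freq, inverted=False)
    print(f"{freq:6.0f} Hz  out-of-phase {out_of_phase:+6.1f} dB   in-phase {in_phase:+6.1f} dB")
```

In this model the out-of-phase level at 50 Hz falls more than 20 dB below the direct ray, while at 500 Hz the loss is only a few dB, which matches the observation that the cancellation is severe in the bass and incomplete at somewhat higher frequencies.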
At somewhat higher frequencies the cancellation is not total. The left ear
hears pure left signal from the left speaker that is reduced only somewhat by the now
slightly delayed and thus only partially out-of-phase crosstalk from the right speaker.
Similarly, at that same instant the right ear is hearing a reduced but pure right-speaker
sound that is similar in amplitude but not identical to the pure left-ear sound because
the resultant sounds are still out-of-phase. We know that a midrange frequency sound heard
only in the right ear seems to come from the extreme right and a sound heard only in the
left ear seems to come from the extreme left. This phenomenon is still operative even if
the two sounds that come from the sides are identical in amplitude and timbre. Thus, one
can easily hear two identical bells as separate left and right sound sources. If, however,
we exchange the bells for pink noise, then we can hear the noise only as separate sources
when they are not precisely in step (uncorrelated). Since our signals are out of phase,
they are neither identical in time nor highly correlated and are therefore audible as separate
entities.
Thus, the inadvertent crosstalk elimination caused by out-of-phase
speakers that occurs at mid frequencies widens the perceived sound field. As the frequency
increases, instead of simple canceling, the comb-filtering effect predominates and the
position of the images becomes frequency, and therefore program, dependent, changing so
rapidly that no listener can sort out this hodgepodge of constantly shifting side images.
Most listeners describe this effect as diffuse, unfocussed or phasy. Even in Ambiophonics,
where crosstalk is eliminated, the speakers should still be properly phased. In general,
mechanical or software crosstalk elimination is not fully effective or needed at very low
bass frequencies and so the bass out-of-phase thinness effect, while much reduced,
remains. In Ambiophonics, the audibility of the out-of-phase effect is much reduced. The
stage image still extends from the speakers outward when the recording calls for this.
That is, sound sources at the extreme right and left image just as they do when the
speakers are in-phase. This makes sense, since we are listening to one sound source with
one ear.
To repeat: in the out-of-phase case, for most of the frequency range, each
ear is hearing a signal that is distinctive because the signals are of opposite polarity,
and therefore the ear localizes each sound as originating from beyond its respective
speaker. A phantom center image does not form and the infamous hole-in-the-middle
appears. In the out-of-phase Ambiophonic case the speakers are very close together.
Therefore, the middle hole is almost nonexistent and the bottom line is that, except for
extreme bass response, front speaker phasing or other timing anomalies are more critical
in stereo than in Ambiophonics.
Absolute Polarity
When an instrument produces a sound, the sound consists of a series of
alternating rarefactions and compressions of air. The sonic signatures of such acoustic
musical instruments are determined by the pressure and spacing of these rarefactions and
compressions. Electronic recording and reproduction have now made it possible to turn
rarefactions into compressions and vice-versa.
The significance of this to the problem of establishing a home concert
hall is not entirely clear. But a few people seem to be able to hear a difference between
correct and incorrect polarity. Therefore, care should be taken that all amplifiers,
speakers and ambience sources, taken together, do not invert. Since acoustic reflectors in
concert halls do not invert polarity, the key early reflections, at least, should not be
inverted accidentally in home reproduction either and should be delivered to the ears with
the same polarity as the direct sound which is, one hopes, also of the correct absolute
polarity.
If you cannot tell one polarity from the other in your own system, don't
despair. For a few people, polarity is only audible when special test signals are used.
One possible reason for difficulty in this regard is the nature of many instruments. A
listener to the left of a violinist hears one polarity, while a listener to the right
hears the other polarity, assuming the string is vibrating in the same plane as the ears
of both listeners. But no matter where you stand around a trumpet you get the same
polarity. The inverted polarity sound in this case is inside the trumpet. Indeed it has
been reported that test subjects are more likely to hear polarity differences where wind
instruments are involved.
On balance, one would have to say that it does not pay to agonize over the
absolute polarity effect unless you are certain that you or your friends are sensitive to
it.