Updated 11/02/03
Ambiophonics, 2nd Edition
Replacing Stereophonics to Achieve Concert-Hall Realism
By Ralph Glasgal
Chapter 4
Pinna Power
Those fluted, rather grotesque protuberances that extend out from each ear canal are called pinnae. The importance of satisfying one's pinnae by reproducing sound fields that complement their complex nature cannot be exaggerated. Like fingerprints, no two individuals have exactly identical pinnae. Although pinnae were thought to be vestigial even as late as the mid-20th century, the intricacy of these structures would suggest not only that their function is very important to the hearing mechanism but also that their working is of a very complex, personal and sensitive nature. For audiophiles in search of more realistic sound reproduction, an understanding of how the pinna, head, and torso interact with stereophonic or surround-sound fields is important, because at present a major mismatch exists. Repairing the discrepancy between what present recording and playback methods deliver and what the human ear pinnae expect and require is the last major psychoacoustic barrier to be overcome, both in hi-fi music reproduction and in the hot PC multimedia field.
We wish to duplicate the normal biological binaural listening experience a listener would have had at a specific location in the original performance space. As live-music enthusiasts rather than seekers after virtual computer reality, we are concerned with the recreation of horizontally staged acoustic events, usually musical, recorded in enclosed spaces such as concert halls, opera houses and pop venues, where the listening position is centered, fixed and usually close to the stage. I have called this two-channel subset of the broader 360-degree movie requirement Ambiophonics because it is both related to and a suitable replacement for stereophonics. Another way of stating a major goal of Ambiophonics, and of describing a still-unsolved problem of virtual reality or surround auralization, is the externalization of the binaural earphone effect. In brief, this means duplicating the full, everyday binaural hearing experience, either via earphones without having the sound field appear to be within one's head, or via loudspeakers without losing either binaural's directional clarity or the "cocktail party" effect, whereby one can focus on a particular conversation despite noise or other voices. So far this goal has eluded researchers trying to externalize the binaural effect over a full sphere or circle, but it can be done using Ambiophonic methods for the front half of the horizontal plane.
Pinnae as Direction Finders
It is intuitively obvious, as mathematicians are fond of observing, that duplicating the binaural effect at home simply involves presenting at the entrance of the home ear canal an exact replica of what the same ear canal would have been presented with at the live music event. But to get to the entrance of the ear canal, almost all sound above about 1.5 kHz must first interact with the surface of a pinna. Each of your pinnae is in essence your own personal high-frequency direction finder. My pinnae produce a quite different (and undoubtedly superior) series of nulls and peaks than do yours. The sound that finally reaches the entrance of the ear canal, in the kilohertz region, is subject to severe attenuation or boost, depending on the angle from which the sound originates as well as on its exact frequency. Sounds that come from the remote side of the head are also subject to additional delay and filtering by the head and torso, and this likewise very individual head-plus-pinna characteristic is called the Head-Related Transfer Function, or HRTF. In this book I will try to distinguish between the functions of one pinna alone, both pinnae working together, the HRTF without any pinna effects, and finally the whole enchilada, which is understood to include the shadowing, reflection, and diffraction due to the head and torso, and all the resonances and delays in the pinna cavities, particularly the large bowl known as the concha.
The effects of the head and torso become appreciable at frequencies from around 500 Hz, with the pinna becoming extremely active above 1500 Hz. Because the many peaks and nulls of the HRTF are very close together and sometimes very narrow, it is exceedingly difficult to make measurements using human subjects, and not every bit of fine structure can be captured, particularly at the higher frequencies where the interference pattern is very hard to resolve. Figure 4.1 shows a series of measurements by Henrik Moller, made for several subjects with a small microphone placed right at the entrance to the ear canal. As the sound source moves about the head, both the variety and the complexity of the responses are plainly evident. One can also see the obvious variation between different auditors. Note that when the sound source is on the far side of the head, the curves include the head-shadowing frequency response. Because the peaks and nulls are so narrow, and because a null at one ear is likely to be something else at the other ear, we do not hear these dips as changes in timbre or as a loss or boost of treble response; but, as we shall see, the brain relies on these otherwise inaudible serrations to determine angular position with phenomenal accuracy.
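Why such nulls are so deep and so narrow can be illustrated with the simplest possible acoustic model: the direct sound plus a single delayed reflection off one pinna surface, such as the wall of the concha. Wherever the reflection arrives half a cycle late it nearly cancels the direct sound, giving nulls at f = (2k + 1)/(2τ) for reflection delay τ. The Python sketch below is a minimal illustration under that assumption only; the 100-microsecond delay and the 0.97 reflection gain are hypothetical values, not Moller's data, and a real pinna superimposes many such paths at once.

```python
import cmath
import math

# Hypothetical values for illustration only (not measured data):
REFLECTION_DELAY_S = 100e-6   # extra path of about 3.4 cm at 343 m/s
REFLECTION_GAIN = 0.97        # reflection almost as strong as the direct sound


def two_path_level_db(freq_hz: float, delay_s: float, gain: float) -> float:
    """Level (dB) of direct sound plus one delayed, attenuated reflection.

    H(f) = 1 + gain * exp(-j * 2 * pi * f * delay_s)
    """
    h = 1.0 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(abs(h))


def null_frequencies_hz(delay_s: float, f_max_hz: float) -> list[float]:
    """Frequencies where the reflection arrives half a cycle late: (2k+1)/(2*delay)."""
    nulls = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= f_max_hz:
        nulls.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return nulls


if __name__ == "__main__":
    for f in null_frequencies_hz(REFLECTION_DELAY_S, 20_000.0):
        level = two_path_level_db(f, REFLECTION_DELAY_S, REFLECTION_GAIN)
        print(f"null near {f:,.0f} Hz: {level:.1f} dB")
```

With these assumed values the sketch finds nulls near 5 kHz and 15 kHz, each about 30 dB deep, while halfway between them (10 kHz) the two paths add to a peak of only about +5.9 dB: deep, narrow notches and broad, modest peaks.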
Much research has been devoted to trying to find an average pinna response curve and an average HRTF that could be used to generate virtual-reality sound fields for military and commercial use in computer simulations, games, etc. So far no average pinna-HRTF emulation program has been found that satisfies more than a minority of listeners, and none of these efforts is up to audiophile standards. Remember that a solution to this problem must take into account the fact that each of us has a different pattern of sound transference around, over and under the head, as well as differing pinnae.
The moral of all this is that if you are interested in exciting, realistic sound reproduction of concert-hall music, it does not pay to try to fool your pinnae. If a sound source on a stage is in the center, then when that sound is recorded and reproduced at home it had better come from speakers that are reasonably straight ahead, and not from nearby walls or from surround or Ambisonic speakers. The traditional equilateral stereophonic listening triangle is quite deficient in this regard. It causes ear-brain image-processing confusion for central sound sources because, although both ears get the same full-range signal telling the brain that the source is directly ahead, the pinnae are simultaneously reporting that there are higher-frequency sound sources at 30° to the left and at 30° to the right. All listeners will hear a center image under these conditions, which is why stereophonic reproduction has lasted 70 years so far, but almost no one would confuse this center image with the real thing. Unfortunately, a recorded discrete center channel and speaker is of little help in this regard. We will see later that such a solution has its own problems and is an unnecessary expense that does nothing for the existing library of unencoded two-channel recordings.
Testing Your Single Pinna Power
A very simple experiment demonstrates the ability of a single pinna to sense direction in the front horizontal plane at higher frequencies. Set up a metronome, or have someone tap a glass, run water, or shake a rattle, about ten feet directly in front of you. Close your eyes and locate the sound source using both ears. Now, keeping your eyes closed, block one ear as completely as possible and estimate how far the apparent position of the sound has moved toward the still-open ear. Most audio practitioners would expect a sound that is heard only in the right ear to seem to come from the extreme right, but you will find that in this experiment the shift is seldom more than 5 degrees, and if you have great pinnae the source may not move at all. A variation of this experiment is to spin around with your eyes closed and then see how close you come to locating the sound source. In this case the shadowing effect of the head assists the pinna until you are facing the source head-on. Both are cases where the single-pinna direction-detecting system is stronger than the interaural intensity effect, and they explain why one-eared individuals can still detect sound-source positions.
Another moral of this experiment is that, for most people, over the higher audible frequency range, which includes most musical transients and harmonics, the one-eared pinna/head directional sense is easily a match for the interaural, or two-eared intensity-and-time-difference, localization mechanism. Therefore, all recorded music signals, including direct sound, early reflections, and reverberation, had better come from directions that please the pinnae if you want your brain to accept the listening experience as real.
If you now switch to a fuller-range music source, such as a small radio, and repeat the experiment above, you will likely hear a greater image shift, since the external ear and head become less important to sound localization as the sound gets down to 400 Hz or so. Even the best stereo systems, which seemingly have great localization based on lower-frequency interaural time and intensity cues, still sound naggingly unrealistic because of the conflict between the interaural and the intra-aural localization mechanisms inherent in the old-fashioned stereo triangle.
The Department of the Interior
Eliminate the outer ears, and all the sound will appear to originate inside your head. Do you doubt this? Then hum or sing with your mouth open. You will hear this sound coming from the lip area. Now put both hands over your ears, and the sound will jump up into the middle of your skull. Every child has tried this at one time, except maybe you. What the effect illustrates is that in the complete absence of pinna and head-shape filtering, the brain makes the only perfectly logical decision it can based on the sonic facts: the sound must originate from a point on the brain side of the eardrum, for how else could the sound have avoided being modified by the pinna, the head, and the ear canal?
Now, while listening to running water or another transient-rich sound, bring the flat palms of your hands to within a half-inch of both your ears. You will hear the character of the sound change, usually in a manner that makes the sound seem closer to you. The additional mass, and the air trapped between your palm and ear, interfere with the resonances in the cavities of the pinna and change what you think you hear.
These effects are why it is so difficult to get a natural, externalized sound image using earphones. In-the-ear-canal phones, while quite realistic compared to stereo, are especially prone to producing very pronounced internalization. Again, it does not pay to fool pinna nature, and that is why the Ambiophonic method limits itself to using loudspeakers.
I Am Not Alone
Martin D. Wilde, in his paper "Temporal Localization Cues and Their Role in Auditory Perception" (AES Preprint 3798, Oct. 1993), states:
"There has been much discussion in
the literature whether human localization ability is primarily a monaural or binaural
phenomena. But interaural differences cannot explain such things as effective monaural
localization. However, the recognition and selection of unique monaural pinna delay
encodings can account for such observed behavior. This is not to say that localization is
solely a monaural phenomenon. It is probably more the case that the brain identifies and
makes estimates of a sound's location for each ear's input alone and then combines the
monaural results with some higher-order binaural processor."
Again, any reproduction system that does not take into account the sensitivity of the pinna to the direction of music incidence will not sound natural or realistic. Two-eared localization is not superior to one-eared localization; both must agree at all frequencies for realistic concert-hall music reproduction.
Pinna and Phantom Images at the Sides
A phantom front center image can be generated by feeding identical in-phase signals to speakers at the front left and front right of a forward-facing listener. Despite the inferiority of this phantom illusion, the surround-sound crowd would be ecstatic if they could pan an equally good phantom image to the side in the same way, by feeding in-phase signals only to a right-front and right-rear speaker pair. Unfortunately, phantom images cannot be panned this way between side speakers. The reason realistic phantom side images are difficult to generate is that we are largely dealing with a one-eared hearing situation. Let us assume that for a right-side sound only negligible sound reaches the remote left ear. We already know that the only directional sensing mechanism a one-eared person has for higher-frequency sound is the pinna convolution mechanism. Thus if a sound comes from a speaker at 45 degrees to the front, the pinna will locate it there. If, at the same time, a similar sound is coming from 45 degrees to the rear, one either hears two discrete sound sources or one speaker predominates and the image hops backward and forward between them. Of course, some sound does leak around the head to the other ear, and depending on room reflections, this affects every individual differently and unpredictably. One can also use Ambisonic or HRTF processing to position virtual side images, but such methods usually do not sound realistic where music is concerned.
Apparent Front Stage Width
The sensitivity of the ears to the direction from which a sound originates mandates that, to achieve realistic Ambiophonic reproduction, all signals in the listening room must originate from directions that will not confuse the ear-brain system. Thus if a concert hall has strong early reflections from 55 degrees (as the best halls should), then the home reproduction system should similarly launch such reflections from approximately this direction. In the same vein, much stage sound, particularly that of soloists, originates more often in the center twenty degrees or so than at the extremes. Thus it makes more sense to move the front-channel speakers to where the angle to the listening position is on the order of ten degrees instead of the usual thirty. This eliminates most of the pinna angular-position distortion.
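To get a feel for what moving from the usual ±30 degrees to roughly ±10 degrees means physically, the minimal Python sketch below computes the spacing between two speakers placed symmetrically either side of straight ahead, each at the same distance from the listening position. The function name and the 3 m listening distance are hypothetical examples; room geometry, toe-in and any crosstalk handling are ignored.

```python
import math


def speaker_separation_m(distance_m: float, half_angle_deg: float) -> float:
    """Spacing between left and right speakers placed at +/- half_angle_deg
    from straight ahead, each distance_m from the listener (a circle chord)."""
    return 2.0 * distance_m * math.sin(math.radians(half_angle_deg))


if __name__ == "__main__":
    listening_distance_m = 3.0  # hypothetical example distance
    for half_angle in (30.0, 10.0):
        spacing = speaker_separation_m(listening_distance_m, half_angle)
        print(f"+/-{half_angle:.0f} degrees at {listening_distance_m} m: "
              f"speakers {spacing:.2f} m apart")
```

At ±30 degrees the spacing equals the listening distance (the equilateral triangle, 3.00 m here); at ±10 degrees it shrinks to about 1.04 m.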
One might suppose that, if a main speaker is in front, sounds that are meant to image to the extreme sides will suffer from pinna-angle distortion, and that we will just have traded the central pinna-angle error of the stereo triangle for the side pinna-angle error of Ambiophonics. But if you look at the curves of Figures 4.1 and 4.2, you will see that at the wider angles, beyond say 60 degrees, a sound coming from the side has a clear shot at the entrance to the ear canal, and thus the pinna response is relatively flat and its effect minimal. In practice, Ambiophonics readily produces easy-to-listen-to images out to 85 degrees either side of center.
It should also be remembered that, in an Ambiophonic sound field, a seemingly narrower stage is simply equivalent to moving back a few rows in the auditorium, and in practice this has not proven noticeable. Likewise, the sensitivity of the pinnae to the directions from which any sound comes dictates that reconstructed or recorded early reflections or reverberant tails attributed to the sides or rear of a concert hall should not come to the home ears from the main front speakers.
Pinna Considerations in Binaural or Stereo Recording
The pinna must be taken into account when recordings are made, particularly recordings made with dummy heads. For example, if a dummy-head microphone has molded pinnae, then such a recording will sound exceptionally realistic only if played back through earphones that fit inside the ear canal. Even then, since each listener's pinnae are different from the ones on the microphone, most listeners will not experience an optimum binaural effect. On the other hand, if the dummy head does not have pinnae, then the recording should be played back either Ambiophonically, using loudspeakers, or through earphones that stand out in front of the ears far enough to excite the normal pinna effect. (As in the IMAX system, loudspeakers can then be used to provide the lost bass.)
But one must also take head-related effects into account. Thus if one uses a dummy-head microphone without pinnae, then listening with stereo-spaced loudspeakers would produce side-image distortion due to the doubled transmission around, over and under both the microphone head and the listener's head.
The Rule Is:
In any recording/reproduction chain there should be only one set of pinnae, and it had better be yours; and there should be only one head, but at least one, which need not necessarily be yours.
Normal two-channel recordings, whether on LP, CD or DVD, are not inherently old stereo. No recording engineer takes into account the crosstalk and the pinna response errors of reproduction when microphones are selected and spaced. Panning equations used to shift sonic images likewise seldom consider the full extent of HRTF effects. This is fortunate, since the existing library of recordings is thus not made obsolete in the slightest where Ambiophonic reproduction and the pinna are concerned.
Pinna Foolery or Feet of Klayman
Arnold Klayman (SRS, NuReality), like many other companies, has gamely tackled the essentially intractable problem of manipulating parts of a stereo signal to suit the angular sensitivity of the pinna while still restricting himself to just two loudspeakers. To do this, he first attempts to extract those ambient signals in the recording that should reasonably be coming to the listening position from the sides or rear. There is really no hi-fi way to do this, but let us assume, for argument's sake, that the difference signal (L-R) is good enough for this purpose, particularly after some Klayman equalization, delay and level manipulation. This extracted ambient information, usually mostly mono by now, must then be passed through a filter circuit that represents the side pinna response for an average ear. Since this pinna-corrected ambience signal is to be launched from the main front speakers along with the direct sound, these modified ambience signals are further corrected by subtracting the front pinna response from them. The fact that all this legerdemain produces an effect that many listeners find pleasing indicates that the pinnae have been seriously impoverished by Blumlein stereo for far too long, and is a tribute to Klayman's extraordinary perseverance and ingenuity.
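The first step described above, using the difference signal as a stand-in for ambience, can be sketched in a few lines of Python. This is a minimal illustration of the (L-R) idea only, not Klayman's or SRS's actual processing: the equalization, delay, level manipulation and average-pinna filtering are all omitted, and the sample values are made up. It shows both why the trick works at all (anything identical in both channels, such as a center soloist, cancels) and why it is not hi-fi (direct sound panned hard to one side passes straight into the "ambience").

```python
def difference_signal(left: list[float], right: list[float]) -> list[float]:
    """Per-sample (L - R) / 2: whatever differs between the two channels."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [(left_s - right_s) / 2.0 for left_s, right_s in zip(left, right)]


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    center_solo = [0.5, -0.2, 0.8, 0.1]        # identical in both channels
    ambience_l = [0.10, 0.00, -0.10, 0.05]     # uncorrelated "hall" content
    ambience_r = [-0.10, 0.05, 0.10, -0.05]
    hard_left = [0.3, 0.3, -0.3, -0.3]         # direct sound panned fully left

    left = [c + a + h for c, a, h in zip(center_solo, ambience_l, hard_left)]
    right = [c + a for c, a in zip(center_solo, ambience_r)]

    # The center solo cancels, but the hard-left direct sound leaks in at half level.
    print(difference_signal(left, right))
```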
While Klayman's and other similar boxes cost relatively little and are definitely better than doing nothing at all about pinna distortion, any method that relies on an average pinna response or, like matrixed forms of surround sound, attempts to separate early reflections, reverberant fields or extreme side signals from standard or matrixed stereo recordings of music is doomed to only minor success. The Klayman approach must also recognize that an average HRTF is required as well and should be used when launching side images from the front speakers. Someday we will all be able to get our own personal pinna and HRTF responses measured and stored on CD-ROM for use in Klayman-type synthesizers, but until then, the bottom line for audiophiles is that the only way to minimize pinna- and head-induced image distortion is to give the pinnae what they are listening for. This means launching all signals, as far as is feasible, from the directions nature intended, and it requires that pure ambient signals such as early reflections and hall reverberation (uncontaminated with direct sound) come from additional, appropriately located speakers. It implies that recorded ambient signals inadvertently coming from the front channels have not been unduly enhanced to the point where the anomaly of rear-hall reverb coming strongly from up front causes subconscious confusion. (Most CDs and LPs are fine in this regard but would be improved by a more Ambiophonic recording style.) It means that strong room reflections, which allow almost undelayed direct sound to hit the listener from the wrong angle or allow early reflections to come from the sides, the ceiling, the floor or the rear wall, have been eliminated through inexpensive and simple room treatment and/or through the use of focused (point-source or collimated) loudspeakers. Finally, it means moving the left and right main loudspeakers much closer together, as discussed in the following chapters.
Two-Eared Pinnae Effects
So far we have been considering single-ear and head response effects. Now we want to examine the even more dramatic contribution of both pinnae and the head, jointly, to the interaural hearing mechanism that gives us such an accurate ability to sense horizontal angular position. William B. Snow, a one-time Bell Telephone Labs researcher, in 1953, and James Moir of CBS, in Audio Magazine in 1952, reported that for impulsive clicks or speech and, by extension, music, differences in horizontal angular position as small as one degree could be perceived. For a source only one degree off dead ahead, we are talking about an arrival-time difference between the ears of only about ten microseconds, and an intensity difference just before reaching the ears so small as not to merit serious consideration. Moir went even further and showed that with the sound source indoors (even at a distance of 55 feet!) and using sounds limited to the frequency band above 3000 Hz, angular localization got even better, approaching half a degree. It appears that when it comes to the localization of sounds like music, the ear is only slightly less sensitive than the eye in the front horizontal plane.
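The "about ten microseconds" figure can be checked with Woodworth's classic spherical-head approximation, ITD ≈ (a/c)(θ + sin θ), for a distant source at azimuth θ. This formula is a standard textbook model brought in here for illustration, not something taken from this chapter, and the head radius a = 8.75 cm and speed of sound c = 343 m/s in the minimal Python sketch below are assumed typical values; real heads, with the pinna effects discussed here, deviate from a rigid sphere.

```python
import math

HEAD_RADIUS_M = 0.0875       # assumed typical head radius (8.75 cm)
SPEED_OF_SOUND_M_S = 343.0   # assumed speed of sound in air at about 20 degrees C


def woodworth_itd_s(azimuth_deg: float,
                    radius_m: float = HEAD_RADIUS_M,
                    c_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Interaural time difference (s) for a distant source, spherical-head model.

    Intended for azimuths from 0 (straight ahead) to 90 degrees (directly to one side).
    """
    theta = math.radians(azimuth_deg)
    return (radius_m / c_m_s) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0.5, 1.0, 30.0, 90.0):
        print(f"{azimuth:5.1f} deg -> {woodworth_itd_s(azimuth) * 1e6:6.1f} microseconds")
```

For one degree this model gives roughly 9 microseconds, in line with the "about ten" above and the 8-microsecond figure used below; at 90 degrees it gives about 656 microseconds, the familiar maximum of roughly two-thirds of a millisecond.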
It is not a coincidence that the ear is most
accurate in sensing position in the high treble range, for this is the same region where
we find the extreme gyrations in peaks and nulls due to pinna shape and head diffraction.
This is also the frequency region where interaural intensity differences have long been
claimed to govern binaural perception. However, it is not the simple amplitude difference
in sound arriving at the outer ears that matters, but the difference in the sound at the
entrance to the ear canal after pinna convolution.
Going even further, at frequencies above 2000 Hz it is not the average intensity that matters, but the differences in the pattern of nulls and peaks between the ears, that allow the two-eared person to locate sounds better than the one-eared individual. Remember that at these higher audible frequencies, direct sounds bouncing off the various surfaces of the pinna add and subtract at the entrance to the ear canal. This random and almost unplottable concatenation of hills and deep valleys is further complicated by later but identical sound that arrives from hall (but hopefully not home) wall reflections, or from over, under, in front of, or behind the head. This pattern of peaks and nulls is radically different at each ear canal, and thus the difference signal between the ears is a very leveraged function of both frequency and source position. In their action, a pair of pinnae are exquisitely sensitive mechanical amplifiers that convert small changes in incident sound angle into dramatic changes in the fixed, unique, picket-fence patterns that each individual's brain has learned to associate with a particular direction.
Another way of describing this process is to say that the pinna converts small differences in the angle of sound incidence into large changes in the shape of complex waveforms by inducing large shifts in the amplitude and even the polarity of the sine-wave components of such waveforms. (Martin D. Wilde, see above, also posits that the pinnae generate differential delays, or what amount to micro-reflections or echoes of the sound reaching the ear, and that the brain is also adept at recognizing these echo patterns and using them to determine position. Since such temporal artifacts would be on the order of a few microseconds, it seems unlikely that the brain actually makes use of this time-delay data.)
Angular Perception at Higher Frequencies
To put the astonishing sensitivity of the ear in perspective, a movement of one degree in the vicinity of the median plane (the vertical plane bisecting the nose) corresponds to a differential change in arrival time at the ears of only 8 microseconds. Eight microseconds is one period at 125,000 Hz, or a phase shift of about 15 degrees at 5 kHz. I think we can all agree that the ear-brain system could not possibly be responding to such differences directly. But when we are dealing with music that is rich in high-frequency components, a shift of only a few microseconds can cause a radical shift in the frequency locations, depths, and heights of the myriad peaks and nulls generated by the pinnae in conjunction with the HRTF. To repeat, it is clear that very large amplitude changes extending over a wide band of frequencies, at each ear and between the ears, can and do occur for small source or head movements. It is these gross changes in the fine structure of the interference pattern that allow the ear to be so sensitive to source position.
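The time-to-frequency conversions in this paragraph, and the leverage a few microseconds can exert on a pinna null, can be reproduced with a short Python sketch. The first two functions are plain arithmetic (one period is 1/Δt; phase is 360·f·Δt). The third uses the simplest two-path model, direct sound plus one reflection whose first null falls at 1/(2τ); the 100-microsecond reflection delay, and the assumption that an 8-microsecond change falls entirely on that one path, are hypothetical choices made only to show the size of the effect.

```python
def equivalent_frequency_hz(delta_t_s: float) -> float:
    """Frequency whose period equals delta_t_s."""
    return 1.0 / delta_t_s


def phase_shift_deg(delta_t_s: float, freq_hz: float) -> float:
    """Phase shift (degrees) that a time offset produces at one frequency."""
    return 360.0 * freq_hz * delta_t_s


def first_null_hz(reflection_delay_s: float) -> float:
    """First cancellation frequency of direct sound plus one reflection: 1/(2*tau)."""
    return 1.0 / (2.0 * reflection_delay_s)


if __name__ == "__main__":
    dt = 8e-6  # arrival-time change for about one degree, from the text
    print(f"period of {equivalent_frequency_hz(dt):,.0f} Hz")
    print(f"{phase_shift_deg(dt, 5_000.0):.1f} degrees of phase at 5 kHz")

    tau = 100e-6  # hypothetical pinna reflection delay
    before, after = first_null_hz(tau), first_null_hz(tau + dt)
    print(f"null moves from {before:,.0f} Hz to {after:,.0f} Hz "
          f"({before - after:,.0f} Hz shift)")
```

In this toy model an 8-microsecond change moves the first null by roughly 370 Hz, and the higher-order nulls, at odd multiples of that frequency, move proportionally further.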
Thus, considering just frequencies below 10 kHz, at least one null of 30 dB is possible for most people, even at shallow source angles, for the ear facing the sound source. Peaks of as much as 10 dB are also common. The response of the ear on the far side of the head is more irregular, since it depends on head, nose and torso shapes as well as on pinna convolution. One can easily see that a relatively minute shift in the position of a sound source could cause a null at one ear to become a peak while, at the same time, a peak at the other ear becomes a null, resulting in an interaural intensity shift of 40 dB! When we deal with broadband sounds such as musical transients, tens of peaks may become nulls at each ear and vice versa, resulting in a radical change in the response pattern, which the brain then interprets as position or realism rather than as timbre.
In setting up a home listening system, it is not possible to achieve a realistic concert-hall sound field unless the cues provided by the pinnae at the higher frequencies match the cues provided by the lower frequencies of the music. When the pinna cues don't match the interaural low-frequency amplitude and delay cues, the brain decides that the music is canned, or that the reproduction lacks depth, precision, presence, and palpability, or is vague, phasey, and diffuse. But even after ensuring that our pinnae are being properly serviced, other problems are inherent in the old stereo or new multichannel surround-sound paradigms. We must still consider and eliminate the psychoacoustic confusion that always arises when two or three widely spaced front loudspeakers deliver information about a stage position but erroneously communicate with both pinnae and both ear canals. We must deal with non-pinna-induced comb-filter effects and the stage-width limitations still inherent in these modalities even after 64 years. But this is a subject for the next chapter.