
[ Technical Papers ]
^Updated 11/02/03^

Realism in Music Reproduction
   Part II

Four Methods Used to Generate Reality at a Distance

Audio engineers have grappled with the problem of recreating sound fields since the time of Alexander Graham Bell. The classic Bell Labs theory suggests that a curtain of an infinite number of ordinary microphones in front of a stage, driving a like curtain of remote loudspeakers, can produce both an accurate and a realistic replica of a staged musical event. Listeners could then sit anywhere behind this curtain, move their heads, and still hear a realistic sound field. Unfortunately, this method, even if it were economically feasible, fails on the first two counts with any finite number of speakers. Such a curtain can act like a lens and change the direction or focus of the sound waves that impinge on it. Like light waves, sound waves have a directional component that is easily lost in this arrangement at the microphone, at the speaker, or at both. Thus each radiating loudspeaker in practice represents a new discrete sound source of uncontrolled directionality, communicating directly with both ears and therefore generating comb filter interference patterns and pinna directional distortion not present on the live side of the curtain.
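To give a feel for the comb filter effect mentioned above, here is a minimal sketch. It assumes the simplest possible case, which is my illustration and not a model of a real speaker curtain: two equal-level copies of the same signal reach one ear over paths that differ in length, so they cancel wherever the delay equals an odd number of half periods. The function name and the 343 m/s speed of sound are assumptions made for the example.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def comb_notch_frequencies(path_difference_m, max_hz=20000.0):
    """Return the cancellation frequencies (Hz) up to max_hz.

    Two equal-level copies of a signal, one delayed by
    path_difference_m / SPEED_OF_SOUND seconds, cancel at
    f = (2k + 1) / (2 * delay) for k = 0, 1, 2, ...
    """
    if path_difference_m <= 0:
        raise ValueError("path difference must be positive")
    delay = path_difference_m / SPEED_OF_SOUND
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


if __name__ == "__main__":
    # A path difference of about a third of a meter (1 ms of delay).
    for f in comb_notch_frequencies(0.343)[:5]:
        print(round(f), "Hz")
```

Even a modest path difference places many notches inside the audio band, and with many speakers at unequal levels the pattern becomes far less regular than this two-path case suggests.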

Finally, this curtain of loudspeakers does not radiate into a concert-hall-size listening room, so one would have, say, an opera house stage attached to a listening room not even large enough to hold the elephants in Act 2 of Aida. This lack of opera-house ambience wouldn't by itself make this reproduction system sound unreal, even if the rest of the field were somehow made accurate, but it certainly wouldn't sound perfect. Speaker arrays (walls of hundreds of speakers) surrounding a relatively large listening area have been shown to be able to synthesize any sound field in a room with remarkable accuracy. But while this technique may be useful in sound amplification systems in halls, theaters or labs, its application to the playback of even multi-channel recordings in the home seems doubtful, except for the use of speaker arrays at the sides and rear, or even overhead, to deliver truly diffuse, reconstituted reverberant ambience to the home listener.

In general, multi-channel recording methods or matrix surround systems (Hafler, SQ, QS, UHJ, Dolby, 5.1, etc.) seem like exciting improvements when first heard by long-deprived stereo music auditors, but in the end they don't sound real.

The Binaural Approach

A second, more practical and often exciting approach is the binaural one. The idea is that, since we have only two ears, if we record exactly what a listener would hear at the entrance to each ear canal at the recording site and deliver these two signals, intact, to the remote listener's ear canals, then both accuracy and realism should be perfectly captured. This concept almost works and could conceivably be perfected, in the very near future, with the help of advanced computer programs, particularly for virtual reality applications involving headsets or near-field speakers. The problem is that if a dummy head, complete with modeled ear pinnae and microphones embedded in the ear canals, is used to make the recording, then the listener must listen with in-the-ear-canal earphones, because otherwise the listener's own pinnae would also process the sound and spoil the illusion.

The real conundrum, however, is that the dummy head does not match any particular human listener's head shape or external ear closely enough to avoid the internalization of the sound stage, whereby one seems to have a full symphony orchestra (and all of Carnegie Hall) from ear to ear and from nose to nape. Internalization is the inevitable and only logical conclusion a brain can come to when confronted with a sound field not at all processed by the head or pinnae. For how else could a sound have avoided these structures unless it originated inside the skull? If one uses a dummy head without pinnae, then, to avoid internalization, one needs earphones that stand off from the head, say, to the front. But now the direction of ambient sound is incorrect and side localization is not fully accurate. IMAX is an example of this off-the-ear method, as supplemented with loudspeakers. There is also a circumaural earphone design that places a tiny speaker just over the notch in the lower front part of the ear so that many of the pinna resonances are still normally excited for frontal sounds. A similar ear speaker over the upper rear part of the ear can provide a similar pinna-friendly input for rear-originating sounds. Unfortunately, head shape differences between the dummy head and the listener's head remain, and the dummy head should not have modeled pinnae if these earphones are to be used.

The fact that binaural sound via earphones runs into so many difficulties is a powerful indication that individual head shapes and outer ear convolutions are critically important to our ability to sense sonic reality.

Wavefield Synthesis

A third theoretical method of generating both an accurate and a realistic soundfield is to actually measure the intensity and the direction of motion of the rarefactions and compressions of all the impinging soundwaves at the single best listening position during a concert and then recreate this exact sound wave pattern at the home listening position upon playback. This method is the one expounded by the late Michael Gerzon starting in the early 1970s and embodied in the paradigm known as Ambisonics. In Ambisonics (ignoring height components), a coincident microphone assembly, which is equivalent to three microphones occupying the same point in space, captures the complete representation of the pressure and directionality of all the sound rays at a single point at the recording site. In reproduction, speakers surrounding the listener produce soundwaves that collectively converge at one point (the center of the listener's head) to form the same rarefactions and compressions, including their directional components, that were recorded.
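The single-point idea described above can be sketched in a few lines. What follows is a minimal illustration, not Gerzon's full method or any commercial decoder: it assumes the traditional horizontal first-order convention (an omnidirectional W signal recorded 3 dB down, plus two figure-of-eight signals X and Y), a regular ring of at least three speakers, and a plain velocity-matching decode. All function names are hypothetical, and practical decoders are considerably more elaborate.

```python
import math


def encode_horizontal(sample, azimuth_rad):
    """Encode one sample from a given direction into W, X, Y.

    Assumed convention: W is the pressure signal at -3 dB,
    X points front, Y points left.
    """
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth_rad)
    y = sample * math.sin(azimuth_rad)
    return w, x, y


def decode_regular_ring(w, x, y, num_speakers):
    """Basic decode to speakers evenly spaced on a circle.

    Speaker i sits at azimuth 2*pi*i/num_speakers.
    """
    if num_speakers < 3:
        raise ValueError("need at least three speakers")
    feeds = []
    for i in range(num_speakers):
        phi = 2.0 * math.pi * i / num_speakers
        directional = x * math.cos(phi) + y * math.sin(phi)
        feeds.append((math.sqrt(2.0) * w + 2.0 * directional) / num_speakers)
    return feeds


def field_at_center(feeds):
    """Sum the speaker feeds into pressure and velocity at the center.

    Idealized: every speaker is treated as a plane-wave source and
    the listener's head is absent.
    """
    n = len(feeds)
    pressure = sum(feeds)
    vx = sum(g * math.cos(2.0 * math.pi * i / n) for i, g in enumerate(feeds))
    vy = sum(g * math.sin(2.0 * math.pi * i / n) for i, g in enumerate(feeds))
    return pressure, vx, vy


if __name__ == "__main__":
    w, x, y = encode_horizontal(1.0, math.radians(30.0))
    feeds = decode_regular_ring(w, x, y, 6)
    print(field_at_center(feeds))
```

Under these assumptions the pressure and the direction of the original wave are recovered at the exact center of the ring. The sketch says nothing about what happens a few inches away from that point, or once a head and pinnae are placed in the field.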

In theory, if the reconstructed soundwave is correct in all respects at the center of the head (with the listener's head absent for the moment), then it will also be correct three and one half inches to the right or left of this point, at the entrance to the ear canals, with the head in place. The major advantage of this technique is that it can encompass front stage sounds, hall ambience and rear sounds equally, and that, since it is recreating the original sound field (at least at this one point), it does not rely on the phantom image mechanism of Blumlein stereo. On the other hand, Ambisonic theory is silent on the subject of how the sounds coming from the various loudspeakers are modified by the ear pinna and the head shape, and how a decoder might compensate for these effects.

Thus the Ambisonic method is not easy to keep accurate at frequencies much over 2000 Hz, and it must, and does, rely on the apparent ability of the brain to ignore this lack of realistic high-frequency pinna, head and waveform localization input and to localize on the basis of the easier-to-reconstitute lower-frequency waveforms alone. This would be fine if localization, by itself, equated to realism or if we were only concerned with movie surround sound applications.
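A rough sense of scale helps here. The sketch below is only an illustration under assumptions of my choosing (a speed of sound of 343 m/s and a plane wave traveling along the line between the ears, which is the worst case). It compares the three-and-one-half-inch offset from the center of the head to each ear with the wavelength at a given frequency.

```python
SPEED_OF_SOUND = 343.0       # m/s, assumed room-temperature value
EAR_OFFSET_M = 3.5 * 0.0254  # three and one half inches, in meters


def wavelength_m(frequency_hz):
    """Wavelength in meters of a sound wave at the given frequency."""
    return SPEED_OF_SOUND / frequency_hz


def phase_shift_deg(frequency_hz, distance_m=EAR_OFFSET_M):
    """Phase change, in degrees, over distance_m along the travel direction."""
    return 360.0 * distance_m / wavelength_m(frequency_hz)


if __name__ == "__main__":
    for f in (200.0, 2000.0):
        print(f, "Hz:", round(wavelength_m(f), 3), "m,",
              round(phase_shift_deg(f), 1), "degrees at the ear")
```

At 2000 Hz the ear sits roughly half a wavelength from the point where the field was reconstructed, so small errors at the center are no longer small at the ear canal, whereas at 200 Hz the same offset is a minor fraction of a cycle.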

Other problems with basic Ambisonics include the fact that it requires at least three recorded channels (if we are concerned about quality) and therefore can do little for the vast library of existing recordings. Back on the technical side, one needs enough speakers around the listener to provide sufficient diversity in sound direction vectors to fabricate the waveform with exactitude, and all these speaker positions, relative to the listener, must be precisely known to the Ambisonic decoder. Likewise, the frequency, delay and directional responses of all the speakers must be known or closely controlled for best results and, as in all other loudspeaker systems, the effects of listening room reflections must also be taken into account or, better yet, eliminated.

As you might imagine, it is quite difficult, particularly as the frequency goes up, to ensure that the size of the reconstructed field at the listening position is large enough to accommodate the head, all the normal motions of the head, the everyday errors in the listener's position, and more than one listener. Those readers who have tried to use the Lexicon panorama mode, the Carver sonic hologram or the Polk SDA speaker system, all designed to correct the higher frequency parts of a simple stereo soundfield at the listener's ear by acoustic cancellation, will appreciate how difficult this sort of thing is to do in practice, even when only two speakers are involved.

In my opinion, however, the basic barrier to reality via any single-point waveform reconstruction method, like Ambisonics, is its present inability, as in the binaural case, to accommodate the effects of the outer ear and the head itself on the shape of the waveform actually reaching the ear canal. For instance, if a wideband soundwave from a left front speaker is supposed to combine with a soundwave from a rear right speaker and a rear center speaker, etc., then for those frequencies over, say, 2500 Hz the left ear pinna will modify the sound from each such speaker quite differently than expected by the equations of the decoder, with the result that the waveform will be altered in a way that is quite individual and essentially impossible for any practical decoder to control. The result is good low frequency localization but poor or non-existent pinna localization. As documented below, mere localization that lacks consistency, as is unfortunately the case in stereo, surround sound or Ambisonics, is no guarantor of realism. Indeed, if we must sacrifice a localization mechanism, let it be the lowest frequency one.

Finally, one can make a case that one can have glorious realism, even without any detailed front stage localization, as long as ambient localization is directionally correct (as anyone who has sat in the last row of the family circle in Carnegie Hall can attest to).
