The Science of Domestic Concert Hall Design
by Ralph Glasgal
Ambiophonics, 2nd Edition
Replacing Stereophonics to Achieve Concert-Hall Realism
By Ralph Glasgal, Founder Ambiophonics Institute, Rockleigh, New Jersey www.ambiophonics.org
Ambiophonics, considering music rather than video for the moment, is the logical replacement for stereophonics and a technical methodology which, if adhered to closely, makes it possible to immerse oneself in an exceedingly real acoustic space, sharing it with the music performers on the stage in front of you. Ambiophonics can do this, even using ordinary standard and existing two channel recordings. We will show in the chapters that follow that, as hard as this may be to believe, there is nothing to be gained as far as realism in acoustic music reproduction is concerned by using more than two recorded channels (as opposed to multi-speaker) and that the complex microphone arrangements that multichannel recording implies are actually deleterious and wasteful of bandwidth that could be put to better use. Ambiophonics is like a visit to a concert hall and is for serious listeners who do not often read, talk, eat, knit, or sleep in their home concert halls, any more than they would at a live performance.
Ever since 1881 when Clement Ader ran signals from ten spaced pairs of telephone carbon microphones clustered on the stage of the Paris Opera via phone lines to single telephone receivers in the Palace of Industry that were listened to in pairs, practitioners of the recording arts have been striving to reproduce a musical event taking place at one location and time at another location and time with as little loss in realism as possible. While judgments as to what sounds real and what doesn't may vary from individual to individual, and there are even some who hold that realism is not the proper concern of audiophiles, such views of our hearing life should not be allowed to slow technical advances in the art of realistic auralization that listeners may then embrace or disdain as they please.
What is Realism in Sound Reproduction?
Realism in staged music sound reproduction will usually be understood to mean the generation of a sound field realistic enough to satisfy any normal ear-brain system that it is in the same space as the performers, that this is a space that could physically exist, and that the sound sources in this space are as full-bodied and as easy to locate as in real life. Realism does not necessarily equate to accuracy or perfection. Achieving realism does not mean that one must slavishly recreate the exact space of a particular recording site. For instance, a recording made in Avery Fisher Hall but reproduced as if it were in Carnegie Hall is still realistic, even if inaccurate. While a home reproduction system may not be able to outperform a live concert in a hall of the caliber of Boston's Symphony Hall, in many cases the home experience can now exceed a live event in acoustic quality. For example, a recording of an opera made in a smallish studio can now easily be made to sound better at home than it did to most listeners at a crowded recording session. One can also argue that a home version of Symphony Hall, where one is apparently sitting tenth row center, is more involving than the live experience heard from a rear side seat in the balcony with obstructed visual and sonic prospect. In a similar vein, realism does not mean perfection. If a full symphony orchestra is recorded in Carnegie Hall but played back as if it were in Carnegie Recital Hall, one may have achieved realism but certainly not perfection. Likewise, as long as localization is as effortless and as precise as in real life, the reproduced locations of discrete sound sources usually don't have to be exactly in the same positions as at the recording site to meet the standards of realism discussed here. (Virtual Reality applications, by contrast, often require extreme accuracy, but realism is not a consideration.)
An example of this occurs if a recording site viewed from the microphone has a stage width of 120 degrees but is played back on a stage that seems only 90 degrees wide. What this really means in the context of realism is that the listener has moved back in the reproduced auditorium some fifteen rows, but either stage perspective can be legitimately real. But being able to localize a stage sound source in a stereo or surround multi channel system does not guarantee that such localization will sound real. For example, a soloist's microphone panned by a producer to one loudspeaker is easy to localize but almost never sounds real.
In a similar vein, one can make a case that one can have glorious realism, even without any detailed front stage localization, as long as the ambient field is correct. Anyone who has sat in the last row of the family circle in Carnegie Hall can attest to this. This kind of realism makes it possible to work seeming miracles even with mono recordings.
Reality is in the Ear of the Behearer
While it is always risky to make comparisons between hearing and seeing, I will live dangerously for the moment. If from birth, one were only allowed to view the world via a small black and white TV screen, one could still localize the position of objects on the video screen and could probably function quite well. But those of us with normal sight would know how drab, or I would say unrealistic, such a restricted view of the world actually was. If we now added color to our subject's video screen, the still grossly handicapped (by our standards) viewer would marvel at the previously unimaginable improvement. If we now provided stereoscopic video, our now much less handicapped viewer would wonder how he had ever functioned in the past without depth perception or how he could have regarded the earlier flat monoscopic color images as being realistic. Finally, the day would come when we removed the small video screens and for the first time our optical guinea pig would be able to enjoy peripheral vision and the full resolution, contrast and brightness that the human eye is capable of and fully appreciate the miracle of unrestricted vision. The moral of all this is that only when all the visual sense parameters are provided for, can one enjoy true visual reality and the same is true for sonic reality.
Since most of us are quite familiar with what live music in an auditorium sounds like, we can sense unreality in reproduction quite readily. But in the context of audio reproduction, the progression toward realism is similar to the visual progression above. To make reproduced music sound fully realistic, the ears, like the eyes, must be stimulated in all the ways that the ear-brain system expects. As in the visual example, when we go from mono to stereo to matrix surround to multi-channel discrete, etc., we marvel at each improvement. But since we already know what real concert halls sound like, we soon realize that something is missing. In general, multi-channel recording methods or matrix surround systems (Hafler, SQ, QS, UHJ, Dolby, 5.1, etc.) seem like exciting improvements when first heard by long realism-deprived stereo music auditors, but in the end don't sound real. What is usually missing is completeness and sonic consistency. One can only achieve realism if all the ear's expectations are simultaneously satisfied. If we assume that we know exactly how all the mechanisms of the ear work, then we could conceivably come up with a sound recording and reproduction system that would be quite realistic. But if we take the position that we don't know all the ear's characteristics, or that we don't know how much they vary from one individual to another, or that we don't know the relative importance of the hearing mechanisms we do know about, then the only thing we can do, until a greater understanding dawns, is what Manfred Schroeder suggested over a quarter of a century ago: deliver to the remote ears a realistic replica of what those same ears would have heard when and where the sound was originally generated.
Four Methods Used to Generate Reality at a Distance
Audio engineers have grappled with the problem of recreating sound fields since the time of Alexander Graham Bell. The classic Bell Labs theory suggests that a curtain, in front of a stage, with an infinite number of ordinary microphones driving a like curtain of remote loudspeakers can produce both an accurate and a realistic replica of a staged musical event and listeners could sit anywhere behind this curtain, move their heads and still hear a realistic sound field. Unfortunately, this method, even if it were economically feasible, does not deliver either accuracy or realism. Such a curtain acts like a lens and changes the direction or focus of the sound waves that impinge on it. Like light waves, sound waves have a directional component that is easily lost in this arrangement either at the microphone, the speaker or both places. Thus each radiating loudspeaker, in practice, represents a new discrete source of sound with uncontrolled directionality, possibly diverting sound meant for oblivion in the ceiling down to the listener and causing other sounds to impinge on the head at odd angles.
Finally this curtain of loudspeakers does not radiate into a concert-hall size listening room and so one would have, say, an opera house stage attached to a listening room not even large enough to hold the elephants in Act 2 of Aida. This lack of opera-house ambience wouldn't by itself make this reproduction system sound unreal, even if the rest of the field were somehow made accurate, but it certainly wouldn't sound perfect. The use of speaker arrays (walls of hundreds of speakers) surrounding a relatively large listening area has been shown to be able to reproduce ambient sound fields with remarkable accuracy. But while this technique may be useful in sound amplification systems in halls, theaters or labs, application to playback in the home seems doubtful. This approach is called Wavefield Synthesis or WFS.
The Binaural Approach
A second, more practical and often exciting approach is the binaural one. The idea is that, since we only have two ears, if we record exactly what a listener would hear at the entrance to each ear canal at the recording site and deliver these two signals, intact, to the remote listener's ear canals, then both accuracy and realism should be perfectly captured. This concept almost works and could conceivably be perfected, in the very near future, with the help of advanced computer programs, particularly for virtual reality applications involving headsets or near field speakers. The problem is that if a dummy head, complete with modeled ear pinnae and microphones embedded in the ear canals, is used to make the recording, then the listener must listen with in-the-ear-canal earphones because otherwise the listener's own pinnae would also process the sound and spoil the illusion.
The real conundrum, however, is that the dummy head does not match any particular human listener's head shape or external ear closely enough to avoid the internalization of the sound stage, whereby one seems to have a full symphony orchestra (and all of Carnegie Hall) from ear to ear and from nose to nape. Internalization is the inevitable and only logical conclusion a brain can come to when confronted with a sound field not at all processed by the head or pinnae. For how else could a sound have avoided these structures unless it originated inside the skull? If one uses a dummy head without pinnae, then, to avoid internalization, one needs earphones that stand off from the head, say, to the front. But now the direction of ambient sound is incorrect. The original 3D IMAX is an example of this off-the-ear method, as supplemented with loudspeakers for bass and rear direct sound effects.
The fact that binaural sound via earphones runs into so many difficulties is a powerful indication that average head shadows and individual outer ear convolutions are critically important to our ability to sense sonic reality. As we shall see, however, loudspeaker binaural is an essential element of the Ambiophonic paradigm.
A third theoretical method of generating both an accurate and a realistic soundfield is to actually measure the intensity and the direction of motion of the rarefactions and compressions of all the impinging soundwaves at the single best listening position during a concert and then recreate this exact sound wave pattern at the home listening position upon playback. This method is the one expounded by the late Michael Gerzon starting in the early 1970s and embodied in the paradigm known as Ambisonics. In Ambisonics (ignoring height components), a coincident microphone assembly, which is equivalent to three microphones occupying the same point in space, captures the complete representation of the pressure and directionality of all the sound rays at a single point at the recording site. In reproduction, speakers surrounding the listener produce soundwaves that collectively converge at one point (the center of the listener's head) to form the same rarefactions and compressions, including their directional components, that were heard by the microphone.
In theory, if the reconstructed soundwave is correct in all respects at the center of the head (with the listener's head absent for the moment), then it will also be correct three and one half inches to the right or left of this point at the entrance to the ear canals with the head in place. The major advantage of this technique is that it can encompass front stage sounds, hall ambience and rear direct sounds equally, and that since it is recreating the original sound field (at least at this one point) it does not rely on the quirky phantom image illusion of traditional Blumlein stereo.
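The single-point idea can be made concrete with a small sketch. It assumes one common textbook convention for the three horizontal components (an omnidirectional W scaled by 1/√2, plus figure-of-eight X and Y) and a basic projection decode to a regular ring of speakers. The function names are hypothetical, and this is not offered as Gerzon's actual decoder design, which involves further psychoacoustic refinements.

```python
import math

def encode_horizontal(sample, azimuth_deg):
    """Encode one mono sample from azimuth_deg (0 = front, positive = left)
    into three horizontal first-order components W, X, Y."""
    a = math.radians(azimuth_deg)
    w = sample / math.sqrt(2.0)  # omnidirectional pressure (assumed -3 dB scaling)
    x = sample * math.cos(a)     # front-back figure-of-eight
    y = sample * math.sin(a)     # left-right figure-of-eight
    return w, x, y

def decode_to_ring(w, x, y, speaker_azimuths_deg):
    """Basic projection decode to a regular ring of three or more speakers."""
    n = len(speaker_azimuths_deg)
    feeds = []
    for az in speaker_azimuths_deg:
        a = math.radians(az)
        gain = math.sqrt(2.0) * w + 2.0 * (x * math.cos(a) + y * math.sin(a))
        feeds.append(gain / n)
    return feeds

def reconstructed_direction_deg(feeds, speaker_azimuths_deg):
    """Direction of the summed speaker contributions at the central point."""
    vx = sum(g * math.cos(math.radians(a)) for g, a in zip(feeds, speaker_azimuths_deg))
    vy = sum(g * math.sin(math.radians(a)) for g, a in zip(feeds, speaker_azimuths_deg))
    return math.degrees(math.atan2(vy, vx))

# Hypothetical square layout; a source 30 degrees left of center.
ring = [45.0, 135.0, -135.0, -45.0]
feeds = decode_to_ring(*encode_horizontal(1.0, 30.0), ring)
print(sum(feeds), reconstructed_direction_deg(feeds, ring))
```

In this idealized model the speaker feeds sum to the original pressure and their vector sum points back at the source, but only at the central point and with no head present. That is exactly the limitation taken up in the paragraphs that follow.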
The Ambisonic method is not easy to keep accurate at frequencies much over 1500 Hz and thus must and does rely on the apparent ability of the brain to ignore this lack of realistic high frequency localization input and localize on the basis of the easier to reconstitute lower frequency waveforms alone. This would be fine if localization, by itself, equated to realism or we were only concerned with movie surround sound applications.
Other problems with basic (first order) Ambisonics include the fact that it requires at least three recorded channels and therefore can do nothing for the vast library of existing recordings. Back on the technical problem side, one needs to have enough speakers around the listener to provide sufficient diversity in sound direction vectors to fabricate the waveform with exactitude, and all these speaker positions, relative to the listener, must be precisely known to the Ambisonic decoder. Likewise the frequency, delay and directional responses of all the speakers must be known or closely controlled for best results and, as in many loudspeaker systems, the effects of listening room reflections must also be taken into account, or better yet, eliminated. Higher order Ambisonics (HOA) requires many more media channels and speakers and so is not very useful in a home system context.
As you might imagine, it is quite difficult, particularly as the frequency goes up, to ensure that the size of the Ambisonic field at the listening position is large enough to accommodate the head, all the normal motions of the head, the everyday errors in the listener's position, and more than one listener. Those readers who have tried to use the Lexicon panorama mode, the Carver sonic hologram or the Polk SDA speaker system, all designed to correct parts of a simple stereo soundfield at the listener's ear by acoustic cancellation, will appreciate how difficult this sort of thing is to do in practice, even when only two speakers are involved.
In my opinion, however, the basic barrier to reality, via any single point waveform reconstruction method, like Ambisonics, is its present inability, as in the earphone binaural case, to accommodate to the effects of the outer ear and the head itself on the shape of the waveform actually reaching the ear canal. For instance, if a wideband soundwave from a left front speaker is supposed to combine with a soundwave from a rear right speaker and a rear center speaker etc. then for those frequencies over say 2500 Hz the left ear pinna will modify the sound from each such speaker quite differently than expected by the equations of the decoder, with the result that the waveform will be altered in a way that is quite individual and essentially impossible for any practical decoder to control. The result is good low frequency localization but poor or non-existent pinna localization. Unfortunately, as documented below, mere localization, lacking consistency, as is unfortunately the case in stereo, 5.1 surround sound or Ambisonics is no guarantor of realism. Indeed, if a system must sacrifice a localization mechanism, let it be the lowest frequency one.
The fourth approach, that I am aware of, I have called Ambiophonics. Ambiophonics assumes that there are more localization mechanisms than are dreamed of in the previous philosophies and strives to satisfy them all, even the unknown ones. The advantage of focusing on sonic reality is that this reality is achievable today, is reasonable in cost, and is applicable to existing LPs, CDs, DVDs, movies, games, in homes, cars, PCs, etc.
One basic element in Ambiophonic theory, in the case of music, is that it is best not to record rear and side concert-hall ambience or try to extract it later from a difference signal or recreate it via waveform reconstruction, but to regenerate the ambient part of the field from real, stored concert-hall data, generating early reflections and reverberant tail signals with the new generation of digital signal processors. The variety and accuracy of such synthesized ambient fields are limited only by the skill of programmers and data gatherers, and the speed and size of the computers used. Thus, in time, any wanted degree of concert hall design perfection could be achieved. A library of the world's great halls may be used to fabricate the ambient field, as has already been done in the pioneering JVC XP-A1010. The number of speakers needed for ambience generation does not need to exceed six or eight (although Tomlinson Holman of THX fame is now up to ten and I usually go with 16) and is comparable to Ambisonics or 7.1 surround sound in this regard. But even more speakers could be used, as this ambience recovery method, called convolution, is completely scalable and the quality and location of these speakers are not critical.
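As a rough illustration of what convolution means here, the sketch below convolves a dry signal with one stored impulse response per ambience speaker. The speaker names and the tiny impulse responses are invented for the example. A real hall response runs to many thousands of samples and would normally be processed with FFT-based convolution rather than this direct loop.

```python
def convolve(dry, impulse_response):
    """Direct-form convolution of a dry signal with one impulse response."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, d in enumerate(dry):
        if d == 0.0:
            continue  # silent input samples contribute nothing
        for j, h in enumerate(impulse_response):
            out[i + j] += d * h
    return out

def ambience_feeds(dry, hall_responses):
    """Return one ambience signal per speaker.

    hall_responses maps a (hypothetical) speaker name to the stored impulse
    response for that direction. Leading zeros model the delay before the
    first reflection arrives; the direct sound itself is not included.
    """
    return {name: convolve(dry, ir) for name, ir in hall_responses.items()}

# Toy data: two side speakers with different reflection patterns.
toy_hall = {
    "left_side": [0.0, 0.0, 0.5, 0.0, 0.25],
    "right_side": [0.0, 0.0, 0.0, 0.4, 0.0, 0.2],
}
feeds = ambience_feeds([1.0, 0.0, -1.0], toy_hall)
for name, signal in feeds.items():
    print(name, signal)
```

Adding a speaker is just one more entry in the dictionary, which is the sense in which the method scales: each feed is computed independently from the same two-channel source.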
Ambiophonics is less limited as to the number of listeners who can share the best experience at the same time than stereo, 5.1 or most implementations of other methods using a similar number of speakers but Ambiophonics is certainly not suited to group listening. However, like a non-ideal seat in a concert hall one has a marked sense of space anywhere in the room while the orchestra is playing somewhere over there.
The other basic tenet of Ambiophonics is similar to Ambisonics and that is to recreate at the listening position an exact replica of the original pressure soundwave. Ambiophonics does this by transporting you to the sound source, stage, and hall. In other words, Ambiophonics externalizes the binaural effect, using, as in the binaural case, just two recorded channels but with two front stage reproducing loudspeakers and eight or so ambience loudspeakers in place of earphones. Ambiophonics generates stage image widths up to almost 180 degrees with an accuracy and realism that far exceeds that of any other 2 channel or even multi channel recording scheme.
Psychoacoustic Fundamentals Related to Realism in Reproduced Sound
The question is how to achieve realistic sound with the psychoacoustic knowledge at hand or suspected. For starters, the fact that separated front loudspeakers can produce centrally located phantom images between themselves is a psychoacoustic fluke akin to an optical illusion that has no purpose or counterpart in nature and is a poor substitute for natural frontal localization. Any reproduction method that relies on stimulating phantom images, and this includes not only stereo but most versions of surround sound, can never achieve realism even if they achieve localization. Realism cannot be obtained merely by adding surround ambience to frontal phantom localization. Ambisonics, earphone binaural, and Ambiophonics do not employ the phantom image mechanism to provide the front stage localization and therefore, in theory, should all sound more realistic than stereo and, in fact, almost always do.
The optimized Ambiophonic microphone arrangement discussed later could make this approach to realism even more effective, but I am happy to report that Ambiophonics works quite well with most of the microphone setups used in classical music, video, or audiophile caliber jazz recordings. Adding home-generated ambience, provides the peripheral sound vision to perfect the experience.
Since our method is to just give the ears everything they need to get real, it is not essential to prove that the pinnae are more important than some other part of the hearing mechanism, but the plain fact is that they are. To me it seems inconceivable that anyone could assume that the pinnae are vestigial or less sensitive in their frequency domain than the other ear structures are in theirs. For a hunter-gatherer animal, it would be of the utmost importance to sense the direction of a breaking twig, a snake's hiss, an elephant's trumpet, a bird's call, the rustle of game etc. and probably of less importance to sense the lower frequency direction of thunder, the sigh of the wind, or the direction of drums. The size of the human head clearly shows the bias of nature in having humans extra sensitive to sounds over 700 Hz.
Look at your ears. The extreme non-linear complexity of the outer ear structures and their small dimensions defy mathematical definition and clearly imply that their exact function is too complex and too individual to understand, much less fool, except in half-baked ways. The convolutions and cavities of the ear are so many and so varied as to make sure that their high frequency response is as jagged as possible and as distinctive a function of the direction of sound incidence as possible. The idea is that no matter what high frequencies a sound consists of or from what direction a transient sound comes, the pinnae and head together or even a single pinna alone will produce a distinctive pattern that the brain can learn to recognize in order to say this sound comes from over there.
The outer ear is essentially a mechanical converter that maps sound arrival directions to preassigned frequency response patterns. There is also no purpose in having the ability to hear frequencies over 10 kHz, say, if they cannot aid in localization. The dimensions of the pinna structures and the measurements by Moller strongly suggest, if not yet prove, that the pinnae do function for this purpose even in the highest octave. Moller's curves of the pinna and head functions with frequency and direction are so complex that the patterns are largely unresolvable and very difficult to measure using live subjects. Again, it doesn't matter whether we know exactly how anyone's ears work as long as we don't introduce psychoacoustic anomalies or compromise on the delivery of frequency response, dynamic range, loudness, low distortion, and especially source and ambience directionality, during reproduction.
Basics of Concert Hall Psychoacoustics
In order to produce a concert-hall sound field or any other sonic experience in the home without actually building a concert hall, we need to know what the ear requires at the minimum for accepting a sound field as real. Knowing this, it is then possible to look for ways to accomplish this feat in a small space and within a budget, without compromising the reality of the aural illusion. While not everything is known about how the ear perceives distance, horizontal and vertical angular position, hall enclosure size and type, and maybe absolute polarity, enough is known to allow Ambiophonics to create a variety of sound fields suited to different types of music or drama that are real enough to be accepted as such by the ear-brain system.
In general the only parts of the hearing mechanism that concern us specifically are the ear pinnae and the existence of two ears separated by a head. Even without consulting the hundreds of papers on this subject, it is clear that the pinnae are designed to modify the frequency response of sound waves as a function of the direction from which the sound comes. It is also clear that no two individuals have ear pinnae that are identically shaped. But to give a general idea of what one person's pinna does in the horizontal plane: for a sound coming from directly in front, the frequency response at the ear canal entrance, measured with a tiny microphone inserted into the ear canal, is essentially flat up to 1000 Hz. For most people, the response then rises as the rear of the pinna intercepts sound and reflects it additively into the ear canal. A broad 11 dB peak in the response is reached at about 3000 Hz, after which the response drops off to minus 10 dB at 10 kHz and then begins to rise again. A response spread such as this of 21 dB in the treble region is quite substantial, and if a loudspeaker had this kind of response it would get very poor reviews indeed. It is also easy to see that differences in individual pinnae are not easy to correct with tone controls or equalizers. For a sound coming from the side to the near ear, a slow rise in response starts at 200 Hz, reaches 15 dB at 2500 Hz, drops to 1 dB at 5 kHz, rises to 12 dB at about 7 kHz and then drops to 4 dB at about 10 kHz (after Henrik Moller et al.). This side response is quite different from the dead ahead response and indicates that we are very sensitive to the direction from which sounds originate even if we listen with only one ear. For sounds directly rearward, the pinnae cause a dropoff of 23 dB between 2500 Hz and 10 kHz. Other radically different frequency responses occur for sounds coming from above or below.
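To make the size of these direction-dependent differences easier to see, the figures quoted above can be put into a small lookup. The breakpoints are the ones given in the text. The straight-line interpolation in dB over log frequency, and the clamping outside the quoted range, are simplifying assumptions of this sketch and not measured data.

```python
import math

# (frequency in Hz, response in dB) breakpoints quoted in the text.
FRONT = [(1000.0, 0.0), (3000.0, 11.0), (10000.0, -10.0)]
SIDE = [(200.0, 0.0), (2500.0, 15.0), (5000.0, 1.0), (7000.0, 12.0), (10000.0, 4.0)]

def response_db(points, freq_hz):
    """Interpolate between quoted breakpoints (assumed linear in dB vs log f)."""
    if freq_hz <= points[0][0]:
        return points[0][1]   # clamp below the first breakpoint
    if freq_hz >= points[-1][0]:
        return points[-1][1]  # clamp above the last breakpoint
    for (f0, g0), (f1, g1) in zip(points, points[1:]):
        if f0 <= freq_hz <= f1:
            t = math.log(freq_hz / f0) / math.log(f1 / f0)
            return g0 + t * (g1 - g0)

def spread_db(points):
    """Difference between the highest and lowest quoted response."""
    gains = [g for _, g in points]
    return max(gains) - min(gains)

print("front spread:", spread_db(FRONT), "dB")
for f in (1000.0, 2500.0, 5000.0, 7000.0):
    diff = response_db(SIDE, f) - response_db(FRONT, f)
    print(f, "Hz: side minus front =", round(diff, 1), "dB")
```

Even with this crude model, the side-minus-front difference swings by many decibels across the treble, which is the monaural direction cue the paragraph describes.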
The pinnae seem to be entirely responsible for our sense of center-front sound source height.
What this means for realistic sound reproduction is that whatever sound we generate must come to the listening position from the proper direction. In theory, it would be possible to modify the pinna frequency response of say ceiling reflections to mimic side reflections, but such an equalizer would have to be readjusted for each human being. It is much easier to place the ambient loudspeakers around the listener and feed the appropriate signals to them, as described in later chapters. These pinnae effects also explain why launching, deliberately or inadvertently, recorded rear reverberant hall sounds from the main front loudspeakers, (or proscenium stage ambience from rear speakers) in stereo or 5.1 surround systems, does not and cannot sound realistic.
Although a one-eared music lover, using one pinna, can tell the difference between a live performance and a stereo recording (and Ambiophonics works for such an individual), it is two-eared listeners that Ambiophonics can help the most. Two ears can enhance the listening experience in a concert hall (and life in general) only if there are differences between the sounds reaching each ear, at least most of the time. The only differences the sound at one ear compared to that of the other ear can have are differences in intensity, arrival time, two pinna patterns and absolute polarity. In an acoustical concert hall or any real physical space, it is not possible for absolute polarity to be inverted at just one ear and certainly not at just one ear at all frequencies simultaneously. Thus we need to consider what the difference (or lack of difference) between the ears in sound arrival time and intensity (over the frequency region where the pinnae do not function) does for listeners at a concert.
It is clear, since the distance between the ears is relatively small, that at very low frequencies there can be no significant intensity difference, regardless of where a low-bass sound originates. At the other, very high frequency extreme, the head is an effective barrier to sounds coming from the side and, therefore, intensity differences provide the strongest non-pinna related directional cues. At the higher bass frequencies the brain can begin to use arrival time differences to locate a sound. At higher frequencies in the 500 to 1500 Hz region, both time and intensity differences play a role, until as the frequency continues to rise only pinna pattern intensity differences matter. Finally, the sensitivity of the ear to the arrival time of sharp transients is often cited as a hearing parameter but this is just a different way of describing the mechanisms cited above.
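The orders of magnitude involved can be estimated with a short calculation. The sketch below uses the well-known Woodworth rigid-sphere approximation for the interaural time difference, together with the three-and-one-half-inch center-to-ear distance mentioned earlier. The speed of sound, the spherical head and the function names are assumptions of the example, and a real head departs from the sphere model.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
HEAD_RADIUS = 0.0889    # m, i.e. 3.5 inches from head center to ear

def itd_seconds(azimuth_deg):
    """Woodworth approximation of interaural time difference.

    Valid for a distant source at 0 to 90 degrees from straight ahead,
    treating the head as a rigid sphere (an assumption).
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

def wavelength_m(freq_hz):
    """Wavelength in air at the assumed speed of sound."""
    return SPEED_OF_SOUND / freq_hz

head_width = 2.0 * HEAD_RADIUS
for az in (0.0, 30.0, 90.0):
    print(az, "deg: ITD =", round(itd_seconds(az) * 1e6), "microseconds")
for f in (100.0, 700.0, 1500.0, 5000.0):
    ratio = wavelength_m(f) / head_width
    print(f, "Hz: wavelength is", round(ratio, 2), "head widths")
```

At 100 Hz the wavelength is many head widths, so the head casts almost no shadow and little intensity difference can arise. At 5 kHz the wavelength is well under one head width, so shadowing is strong. The largest time difference, for a source directly to the side, comes out at well under a millisecond.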
There is one more relevant psychoacoustic characteristic of the binaural hearing mechanism which does relate to intensity and arrival time. This is the ability of the ear-brain system to focus on one particular sound source out of many. Most of us can, if we wish, pick out just one voice or instrument in a quartet, or in the classic example, overhear one conversation at a noisy cocktail party. This focusing ability is strong in live three-dimensional concert situations and weak when trying to distinguish one voice in a monophonic recording of Gregorian chant. The relevance to Ambiophonics is that if you can generate a concert-hall stage and sound field real enough to fool the brain, the ability to focus does appear. At a live concert, distractions such as coughing, subway rumble, and program rattling are much less obtrusive because one can focus on the stage and the music. Likewise at home, such distractions as needle scratch, tape hiss, hum, cable idiosyncrasies, amplifier defects, and domestic noises become easier to ignore if you are immersed in Ambiophonic atmosphere. This concentration effect is particularly startling when playing CD transfers of noisy Caruso acoustic-era recordings.
The Ambiophonic Playback System
Ambiophonics was developed to provide audiophiles, record collectors, equipment manufacturers, and, eventually, recording engineers with a clear, understandable recipe for generating realistic music or movie surround sound fields, consistently and repeatedly, either from the vast library of existing two channel recordings or from new multi-channel media made, hopefully, even more realistic by keeping Ambiophonic principles in mind.
The basic home elements required, if the ultimate in realism is desired, are as follows:
The technical reasons for these requirements are discussed here and in the chapters that follow. It is hoped that once the physics and the psychoacoustic laws are understood that the reader may be able to think of better ways to achieve the same end. Ambiophonics was not developed in a day and the reader may not want to implement the entire Ambiophonic system at one time. But each element in the system, when implemented, does result in an appreciable audible improvement.
What Ambiophonics Specifically Achieves
If you employ the techniques described in the chapters below, you will produce a rock-solid sound stage that consistently extends far beyond the right and left positions of the closely spaced front loudspeakers. You will find that even with the main left and right loudspeakers directly in front of you, there is not only no compromise in the perceived stage width or depth, but a substantial improvement over 60 degree stereo or 5.1 surround with virtually any recording or file. You will also see that recreated hall ambience, if launched from the correct direction by well-situated loudspeakers will yield the sense that you are in a hall similar to that in which the recording was made.
Since two-eared listening is more vibrant than one-eared listening, sound fields that differ at each ear in intensity or arrival time are more exciting, and in concert halls add spatial interest to the event. Thus when we come to consider home-concert-hall/home theater design, it is not enough to just maintain the separation of the front left and right channels; it is also necessary to ensure the diversity of all the signals launched into the home listening space. Correlation is the opposite of diversity, and in the next chapter we will consider the significance of the correlation factors of both music and auditoriums so that we can have sound as realistic as possible.
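One simple way to put a number on correlation between two signals is the normalized zero-lag correlation coefficient sketched below. This is only an illustrative stand-in, assuming equal-length sample lists. Measures used in hall acoustics typically also search over small time lags between the two ears, which this sketch does not do.

```python
import math

def correlation(left, right):
    """Normalized zero-lag correlation coefficient between two signals.

    Returns +1 for identical signals, -1 for polarity-inverted copies,
    and values near 0 for unrelated (diverse) signals.
    """
    n = min(len(left), len(right))
    mean_l = sum(left[:n]) / n
    mean_r = sum(right[:n]) / n
    num = sum((l - mean_l) * (r - mean_r) for l, r in zip(left[:n], right[:n]))
    den = math.sqrt(
        sum((l - mean_l) ** 2 for l in left[:n])
        * sum((r - mean_r) ** 2 for r in right[:n])
    )
    return num / den if den else 0.0

# Toy signals: one full cycle of a sine and of a cosine.
n = 64
sine = [math.sin(2.0 * math.pi * k / n) for k in range(n)]
cosine = [math.cos(2.0 * math.pi * k / n) for k in range(n)]

print(correlation(sine, sine))                # identical channels
print(correlation(sine, [-s for s in sine]))  # inverted copy
print(correlation(sine, cosine))              # quarter-cycle apart
```

Two speakers fed the same signal would score at the fully correlated extreme. Diverse feeds push the figure toward zero.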