The Science of Domestic Concert Hall Design
by Ralph Glasgal
Ambiophonics, 2nd Edition
Replacing Stereophonics to Achieve Concert-Hall Realism
Founder, Ambiophonics Institute, Rockleigh, New Jersey (www.ambiophonics.org)
Understanding Sound Fields
Human hearing with two ears is called binaural, and binaural sound is what most of us listen to all the time. Audiophiles sometimes think of binaural sound as a recording made with a dummy head and played back through earphones, but that is a poor imitation of the real thing and is not what we mean when we refer to the binaural hearing mechanism in this book. Stereophonic sound, by contrast, is a sonic illusion, akin to an optical illusion: just one of several non-binaural, man-made methods of recreating a remote or recorded sound field in a completely different place and time. Stereophonic sound fields are almost always auditioned by binaural listeners, whose brains must then reconcile the absence of a binaural field with the presence of a stereophonic one; and, like optical illusions, stereophonic illusions are not always stable and almost never sound realistic. The commonplace (but misnamed) stereophonic recordings that normally consist of two full-range, unencoded, discrete channels, one left and one right, are not inherently stereophonic (despite adjustments made by recording engineers based on what they hear over studio stereo monitors) and therefore need not suffer the ills that playback via the stereo triangle engenders. That is, the microphones do not know that the sound they pick up will be played back through two widely spaced loudspeakers, so none of the imperfections of the stereo triangle discussed below apply to the recording before it is played back.
Although we later describe an Ambiophonically optimized recording microphone arrangement, almost any mic setup used to produce two-channel recordings works reasonably well when reproduced Ambiophonically. Indeed, one of the basic premises of this book and the technology it describes is that the usual two-channel recorded program material contains sufficient information to allow accurate simulation of a binaural concert-hall experience. This is fortunate, since it allows the existing library of LPs and CDs to be reproduced with unprecedented realism, and it shows that multi-channel mic'ing and recording methods are, where music is concerned, actually counterproductive according to the tenets of binaural technology. That as few as two channels should be more than adequate can be understood intuitively: if we deliver the exact sound required to simulate a live performance at the entrance to each ear canal, then, since we have only two ear canals, we need generate only two such sound fields. The questions are why existing stereophonic and earphone binaural recording techniques fall short, and what can be done to make up for these shortcomings, at least where music reproduction is concerned.
Before the advent of stereo recording we had single-channel, or monophonic, recordings. Most were made by using one or more microphones and mixing their outputs together before cutting the record, filming the sound track, or making a tape. Such a monophonic recording, if reproduced by two loudspeakers, can be thought of as a special case of stereophonic sound reproduction: the case where the sound is the same at both ears and the interaural cross-correlation factor is 1. In a concert hall, such a signal coming from the stage is sensed as coming from that stage regardless of which direction a concertgoer faces. Let us now consider a listener in the balcony of a large hall during a live concert. For this listener, the angle that the stage subtends is very small. Both ears get essentially the same signal: the direct sound from the stage is weak because of distance, the hall ambience is strong, and both are largely the same at each ear. Thus the players seem remote, but still front and center. However, the balcony listener is enveloped in a completely realistic but mostly monophonic reverberant field and therefore hardly notices that his ability to localize left and right sounds is minimal. The lesson we want to draw from this is that mono recordings can be made to sound quite realistic in the home concert hall if you don't mind the impression of sitting farther back in the auditorium. The same applies to recordings of solo instruments, such as a piano or a singer standing in the curve of the piano.
The reproduction of single central mono or panned sources via two spaced front loudspeakers is prone to exactly the same crosstalk effects that arise in stereophonic reproduction, but, fortunately, the solution (see below) is the same for both mono and two-channel recordings. To summarize: it is possible to have realism without separation, via a combination of true hall ambience with a corrected front stage, and this is one of the main tenets of the Ambiophonic method.
The Stereophonic Illusion
There is a slightly flawed theory, still quoted quite often, that a perfect replica of a given concert-hall sound field can always be produced by putting an infinite number of stage-facing microphones across the front of the stage, all the way up to the ceiling. After being stored on a recorder with an infinite number of channels, this recording can then be played back through an infinite number of point-source loudspeakers, each placed exactly where its corresponding microphone was. But the replication produced by such a wall of speakers would not be perfect: the loudspeakers would not radiate sound with the same directional characteristics as the sound impinging on the microphones, and the final result would also be affected by the quality of the room into which all these speakers were radiating. Still, the stage would at least be wide, have depth, and sound realistic. As the number of microphones and speakers is reduced, the quality of the simulated sound field suffers. By the time we are down to two channels, height cues have certainly been lost, and instead of a stage that is audible from anywhere in the room, sources on the stage are localizable only if we listen along a line equidistant from the two remaining speakers and face them. While many two-channel speaker arrangements are possible, the most popular reproduction method is the stereophonic technique of playing two-channel recordings through two loudspeakers, with the listener and the two speakers forming an equilateral or wider isosceles triangle.
Stereo takes advantage of one rather unnatural psychoacoustic illusion: as a recorded sound source moves across the stage from left to right, and as the playback signal likewise shifts from the left speaker to the right speaker, most listeners hear a virtual, or phantom, sound image move from one speaker position to the other. Compared with real-life hearing, this phantom image does not move linearly, and there is a tendency for the sound to jump to the speaker location as the source moves to the side. If identical sounds come from each speaker (the monophonic case above), most central listeners hear a phantom sound that hangs in the air at the halfway point on the line between the loudspeakers. Just as some individuals cannot see optical illusions, so a few individuals cannot hear phantom images. And just as optical illusions are exactly that, illusions that no sighted person would confuse with a true three-dimensional object, so phantom stereo images could never be confused with a truly binaural sound field.
Nevertheless, the illusion of frontal separation and space is so pleasing to most listeners that stereophonic reproduction has remained the standard music-reproduction technique for some 70 years, ever since Alan Dower Blumlein applied for his patent at the end of 1931. (See Appendix.) The illusion created by stereo reproduction techniques is far from perfect, even when the highest grade of audiophile-caliber recording and reproducing equipment is used. The first problem is that the image of the stage is confined to the arc the listener sees looking from one speaker to the other. Occasionally, an out-of-phase sound from the opposite loudspeaker, an accidental room reflection, or a recording-site anomaly will make an instrument appear to come from beyond the speaker position. These images, however, are almost always ephemeral and often not reproducible. Thus, in non-Ambiophonic systems, in order to get a useful stage width with stable left-right localization, the loudspeakers must be placed at a wide enough angle to mimic the angular proportions of a concert hall or theater stage, but not so wide that the phantom center image collapses.
With most stereo systems there is a "sweet spot" at the point of the triangle where the listening is best. This, unfortunately, is what we are faced with when only two front channels (or three, for that matter) are available. The sweet spot is also a characteristic of Ambiophonic reproduction, although the spot is somewhat larger and less critical in the case of Ambiophonics. It is difficult enough to recreate concert-hall sounds from two discrete recorded channels (and even harder using multiple channels) for one or two listeners in the home, without trying to do it for a whole room full of people.
Basically, several listeners can listen to Ambiophonics at the same time, but they have to be one behind the other. Consider the following. In stereo, if you move too close to the speakers you get a hole in the middle. If you move back you get mono. If you move to the side you mostly hear just one channel. In general, out-of-the-sweet-spot stereo is tolerated by almost everyone: since stereo is clearly not truly realistic even at the sweet spot, you don't feel you are missing much when you are off center. With Ambiophonics, if you move too close to the speakers you get stereo; if you move back you still get the same wide stage until you bump into the rear wall. If you move offside, you get normal mono sound, since both channels are present in both speakers, and this is good for movie dialog even if there is no center speaker. With Ambiophonics you can recline, nod, lean, rotate your head, stand, and so on. There are similar advantages for Ambiophonics versus 5.1, one prime advantage being that no center speaker is ever needed.
If you use the two optional rear speakers, then offside 5.1 listeners will distinctly hear both the rear and front stages in their proper locations. For most movies this works better in the home than in the movie theater. Most 5.1 systems cannot really reproduce a rear stage of direct sound effects, but Ambiophonics does this even for offside listeners. In 5.1, if you are off-center and back a bit, you will likely just localize to one of the rear surround speakers. If you have four speakers (two Ambiodipoles), then two listeners can, if they like, sit back to back, one facing the front stage and one facing the rear, when playing two-channel media.
By far the major defect of stereophonic reproduction is caused by the crosstalk at the listener's ears generated by the loudspeakers. Again, the crosstalk is an artifact of stereophonic reproduction and is not present in the recording. We will show that eliminating this crosstalk widens the stereo soundstage well beyond the narrow positions of the loudspeakers, eliminates spurious frequency-response peaks and dips (comb-filter effects), and allows the speakers to be moved much closer together, removing the need for phantom imaging or a center channel.
Figure 1: Comparison of live concert-hall listening geometry with home stereophonic listening practice, showing the additional unwanted crosstalk sound rays impinging on the pinna from too large an angle, thereby causing unrealistic playback artifacts.
In a concert hall, direct sound rays from a centrally located instrument reach each ear simultaneously: one ray per ear (see Figure 1). By contrast, for a centrally located recorded sound source reproduced in stereo, identical rays come from the right and left speakers to the right and left ears, but a second pair of uninvited, only slightly attenuated, longer right and left speaker rays also passes around the nose to the left and right ears (see Figure 1). The problem is that these unwanted rays, which cross in front of the eyes and diffract around the back and top of the head, are delayed by the extra distance they travel across the head. At its greatest, this distance is just under 7 inches. For a middling distance of, say, 3 1/2 inches, it takes sound one-quarter of a millisecond to cover the extra path. A quarter of a millisecond is half the period, and therefore half the wavelength, of a 2000 Hz tone.
When two signals, one direct and one delayed by half a wavelength but of similar amplitude, meet at the ear, cancellation occurs. At 4000 Hz the delay is one full wavelength and the sounds add. Thus at frequencies from the octave above middle C on up, all sounds add or subtract at the ears to a greater or lesser degree, depending on the recorded sound-source position, the angle to the speakers, the listener's head position, nose size and shape, head size, differing path lengths around the head, and other geometrical considerations. Note that if the sound source at the recording studio, or the listener at home, moves a few feet or even inches to the left or right, a whole new pattern of additions and subtractions at different frequencies will assault the listener. This interference phenomenon is called comb filtering, and it largely explains why many critical listeners are so sensitive to small adjustments in stereo listening or speaker position, and to relatively minute electrical and acoustical delays or attenuations in the playback system. Bock and Keele measured comb-filter nulls as deep as 15 dB for the 60-degree stereo loudspeaker setup. Note that for extreme side images the comb-filter effect is minimal.
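The arithmetic above can be checked in a few lines of code. The following Python fragment is an illustration, not from this book: it models the sound at one ear as the direct speaker ray plus the delayed crosstalk ray. The 250-microsecond delay (about 3 1/2 inches of extra path) and the 0.7 relative crosstalk amplitude are assumed round numbers, not measured values.

```python
import math

TAU = 250e-6  # assumed extra crosstalk path delay in seconds
G = 0.7       # assumed amplitude of the crosstalk ray relative to the direct ray

def ear_response_db(freq_hz):
    """Level at the ear of direct ray + delayed crosstalk ray, in dB re direct alone."""
    phase = 2 * math.pi * freq_hz * TAU
    # |1 + G*e^{-j*phase}| computed via the law of cosines
    mag = math.sqrt(1 + G * G + 2 * G * math.cos(phase))
    return 20 * math.log10(mag)

for f in (1000, 2000, 3000, 4000):
    print(f"{f} Hz: {ear_response_db(f):+.1f} dB")
```

With these assumed values the script shows the pattern described in the text: a deep null near 2000 Hz, where the crosstalk arrives half a wavelength late, and a boost near 4000 Hz, where it arrives a full wavelength late. Changing TAU even slightly moves the whole comb, which is the listener-position sensitivity discussed below.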
Thus the acoustical frequency response of a normal stereo setup actually depends on the angular position of the original instrument or singer. As indicated above, it is fascinating that these frequency-response anomalies are not clearly audible as changes in tone but rather manifest themselves as imprecisions in imaging and a sense that the music is canned. But it is possible to hear the change in timbre caused by comb filtering. Simply play pink noise from a test CD over your stereo system and rotate the balance control from hard left to hard right. As the image of the noise passes through the center, one can clearly hear a drop in the treble loudness of the noise and a distinct change in its character. Alternatively, one can walk normally from left to right and hear the change in the noise as one passes through the center area.
Note that a phase-shift change between channels of only a few degrees can shift a stereo crosstalk comb-filter null by hundreds of Hertz. Even a small, one-degree phase-shift change between the left and right channels at 2000 Hz will cause a shift of 71 Hertz in the position of a crosstalk null or peak. Crosstalk comb-filter patterns are thus a function of any asymmetry in amplifier output impedances or delays, differential delays in cables, or differential speaker time delay arising from the speakers' positions relative to the listening position or from their impedance networks. For instance, a vacuum-tube-driven left midrange speaker can interact with an overlapping right tweeter to produce interaural crosstalk peaks and nulls that are not present in the solid-state-amplifier case. Such patterns may be audible to some individuals. Any change in the interaural crosstalk pattern is interpreted by the brain as a spatial artifact such as more or less depth, air, or hollowness. Of course, any change in listener position or speaker location causes similar shifts in the crosstalk peaks and nulls and further complicates equipment comparisons by ear in stereo or surround sound. The irregular, directional, and largely unpredictable frequency response of the standard stereophonic 60-to-90-degree listening arrangement would never be accepted in an amplifier, a speaker, or a cable. Why such a basic listening-system defect continues to be so universally tolerated and studiously ignored is difficult to fathom.
The binaural perception of directional cues depends on both the relative loudness of sound and the relative time of arrival of sound at each ear. Which mechanism predominates depends on the frequency. Unfortunately, since these delay and stereophonic comb-filter artifacts have an effect extending from below 800 Hz on up, they seriously impair both mechanisms and thus the ability of the listener to detect angular position with lifelike ease. It is also these crossing rays that limit stereo and surround-sound imaging to the line between the two front speakers. (See below.) If we are to achieve anything close to concert-hall realism, we must eliminate these crosstalk effects and provide a directionally correct single ray to each ear. But first we will need to present evidence of the extraordinary sensitivity of the ear pinna to such comb-filter patterns.
Imaging Beyond the Speaker Positions
A major problem with stereophonic crosstalk is that it limits the apparent stage width. For sound sources that originate far to the right, say at 90 degrees from the right microphone, we can temporarily ignore the left-channel microphone pickup. In the stereophonic listening setup, the right speaker then sends unobstructed sound to the right ear and a somewhat modified version of the same sound to the left ear. The ear-brain naturally localizes this everyday sound situation to the speaker position itself instead of to the 90 degrees the data indicate. Thus, no matter how low the left-channel volume is, the recorded image can never extend beyond the right speaker in standard stereo reproduction (see Figure 2).
Figure 2: Stereophonic crosstalk limits the sound stage and alters localization of original sound image.
If, however, the right-speaker sound ray crossing over to reach the left ear could be blocked or attenuated, then at least the low- and mid-frequency sound could be localized to the extreme right, well beyond the speaker position and just where the recording microphones said the source was located. (High-frequency localization is discussed in the next chapter.) Remember, the microphones don't know that the playback will be in stereo with crosstalk, so it is not the recording setup that limits stage width. Clearly, eliminating the extra sound ray results in wide, spectacular imaging even from existing two-channel media. In Chapter Five we will discuss two methods of eliminating crosstalk, as well as doing away with the stereo triangle altogether.
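As a preview, one software approach to attenuating the crossing ray can be sketched as follows. This Python fragment is a conceptual illustration only, not the procedure of Chapter Five: each output channel subtracts a delayed, attenuated copy of the other output channel, so that an inverted copy of the left signal arrives at the far ear roughly in step with the unwanted crossing ray and partially cancels it; the recursion then cancels the smaller crosstalk produced by the cancellation signal itself. The 4-sample delay and 0.85 gain are assumed values, not recommendations.

```python
def cancel_crosstalk(left, right, gain=0.85, delay=4):
    """Recursive crosstalk-cancellation sketch: each speaker feed subtracts a
    delayed, attenuated copy of the opposite feed, progressively cancelling
    the ray that crosses the head to the far ear."""
    n = len(left)
    out_l = [0.0] * n
    out_r = [0.0] * n
    for i in range(n):
        # cancellation terms use already-computed earlier output samples
        xl = out_r[i - delay] if i >= delay else 0.0
        xr = out_l[i - delay] if i >= delay else 0.0
        out_l[i] = left[i] - gain * xl
        out_r[i] = right[i] - gain * xr
    return out_l, out_r

# An impulse in the left channel yields an inverted, attenuated copy in the
# right feed a few samples later, followed by alternating corrections of
# diminishing amplitude in each feed.
l, r = cancel_crosstalk([1.0] + [0.0] * 15, [0.0] * 16)
print([round(x, 4) for x in r[:9]])
```

The design choice worth noting is the recursion: a single cancellation pass would itself generate a new, smaller crossing ray, so the filter keeps subtracting its own output, and the residue shrinks geometrically with the assumed gain.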
In the case of live hearing, a sound at the extreme side produces an interaural time difference (ITD) at the ears of about 700 microseconds. If such a recording is played via the stereo triangle, the maximum ITD that can be produced is about 220 microseconds, because sound coming from a speaker at the 30-degree angle does not have to pass across the entire head and so is not delayed the full 700 microseconds. The shorter ITD is interpreted by the brain as a shallower angle, 30 degrees, for this source.
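These ITD figures can be approximated with the classic Woodworth spherical-head formula, ITD = (r/c)(theta + sin theta). The short Python check below is a sketch under assumed values (an 8.75 cm head radius and 343 m/s speed of sound, neither taken from this book) and yields numbers in the same range as those quoted above: roughly 660 microseconds for a source at 90 degrees and roughly 260 microseconds for a speaker at 30 degrees.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s, at room temperature

def itd_microseconds(azimuth_deg):
    """Woodworth approximation: ITD = (r/c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return 1e6 * (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

print(f"Source at 90 deg: {itd_microseconds(90):.0f} us")
print(f"Speaker at 30 deg: {itd_microseconds(30):.0f} us")
```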
Similarly, if the microphones detect a channel level difference (ILD) of, say, 10 dB, indicating a source far to the side, then when the recording is reproduced by stereo loudspeakers this difference is reduced by half or so, because the louder speaker can also see the far ear and thus raises the sound level at that ear. The ILD at the ears is reduced so much that a sound the mics heard at the far side now images no more than 30 degrees off center.
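A back-of-the-envelope model shows the effect. In the Python sketch below, which uses illustrative assumptions only (6 dB of head shadow on each crosstalk path and incoherent power summation at the ear), a recorded 10 dB channel difference shrinks to roughly half its value at the ears.

```python
import math

RECORDED_ILD_DB = 10.0   # level difference captured by the microphones
HEAD_SHADOW_DB = 6.0     # assumed attenuation of each crosstalk path

def db_to_power(db):
    return 10 ** (db / 10)

def ild_at_ears_db(recorded_ild_db, shadow_db):
    """ILD surviving at the ears after each speaker leaks to the far ear."""
    left_ch = 1.0                              # louder (left) channel power
    right_ch = db_to_power(-recorded_ild_db)   # quieter channel power
    shadow = db_to_power(-shadow_db)           # crosstalk power attenuation
    left_ear = left_ch + right_ch * shadow     # direct left + shadowed right
    right_ear = right_ch + left_ch * shadow    # direct right + shadowed left
    return 10 * math.log10(left_ear / right_ear)

print(f"ILD at the ears: {ild_at_ears_db(RECORDED_ILD_DB, HEAD_SHADOW_DB):.1f} dB")
```

Under these assumptions the 10 dB recorded difference arrives at the ears as roughly 5 dB, consistent with the "reduced by half or so" figure in the text; the exact number depends on the head-shadow value assumed.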
Loudspeaker Out-of-Phase Effects
In stereo systems it is necessary for the right and left main speakers to be in phase or, better expressed, of the same polarity. Phase in this case means that, given identical electrical signals, the speakers will both generate a rarefaction, or both generate a compression, in response to a simultaneous input pulse. When a monophonic recording is played through a pair of out-of-phase loudspeakers, the sound at the ears lacks bass, the phantom center image is absent, and a hazy, undefined sound field seems to extend far beyond the speakers to the extreme sides and sometimes even rearward. Similar effects, only slightly less pronounced, are also present with two-channel sources. These subjective effects can be better comprehended now that we understand stereo crosstalk. It is clear that equal but out-of-phase very low frequency signals, with wavelengths much longer than the width of the head, will always arrive unattenuated and 180 degrees out of phase at either ear and therefore will always largely cancel. This accounts for the thinness of the mono or central (L+R) stereo sound. At somewhat higher frequencies the cancellation is not total. The left ear hears pure left signal from the left speaker, reduced only somewhat by the now slightly delayed, and thus only partially out-of-phase, crosstalk from the right speaker. Similarly, at that same instant the right ear hears a reduced but pure right-speaker sound that is similar in amplitude to, but not identical with, the left-ear sound, because the resultant sounds are still out of phase. We know that a midrange sound heard only in the right ear seems to come from the extreme right, and a sound heard only in the left ear seems to come from the extreme left. This phenomenon is still operative even if the two sounds that come from the sides are identical in amplitude and timbre.
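The two regimes described above, near-total cancellation in the bass and only partial interaction at midrange, can be illustrated numerically. This Python sketch uses assumed values, not measurements: a 250-microsecond crosstalk delay, full-strength crosstalk at low frequencies (where the head casts essentially no shadow), and a 0.7 relative crosstalk amplitude at midrange. It sums the direct ray with an inverted crosstalk ray at one ear.

```python
import math

TAU = 250e-6  # assumed crosstalk path delay around the head, in seconds

def out_of_phase_level_db(freq_hz, crosstalk_gain):
    """Level at one ear of a direct ray plus an inverted crosstalk ray."""
    phase = 2 * math.pi * freq_hz * TAU
    # |1 - g*e^{-j*phase}|: the crosstalk ray carries the polarity-inverted signal
    mag = math.sqrt(1 + crosstalk_gain ** 2 - 2 * crosstalk_gain * math.cos(phase))
    return 20 * math.log10(mag) if mag > 0 else float("-inf")

print(f"  50 Hz: {out_of_phase_level_db(50, 1.0):+.1f} dB")   # deep bass cancellation
print(f"1000 Hz: {out_of_phase_level_db(1000, 0.7):+.1f} dB") # little net loss at midrange
```

With these assumptions the 50 Hz case cancels by more than 20 dB, the "thinness" of the bass, while at 1000 Hz the quarter-period delay leaves the two rays far enough out of step that no deep cancellation occurs, consistent with the text's account of why the midrange survives.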
Thus, one can easily hear two identical bells as separate left and right sound sources. If, however, we exchange the bells for pink noise, then we can hear the noise as separate sources only when the two noises are not precisely in step (uncorrelated). Since our signals are out of phase, they are not identical in time, that is, not highly correlated, and are therefore audible as separate entities. Thus, the inadvertent crosstalk elimination caused by out-of-phase speakers at mid frequencies widens the perceived sound field. As the frequency increases, instead of simple canceling, the comb-filtering effect predominates and the position of the images becomes frequency-dependent, and therefore program-dependent, changing so rapidly that no listener can sort out this hodgepodge of constantly shifting side images. Most listeners describe this effect as diffuse, unfocused, or phasy.
In general, mechanical or software crosstalk elimination is not fully effective, or needed, at very low bass frequencies, and so the bass out-of-phase thinness effect, while much reduced, remains if the speakers are wired out of phase. In Ambiophonics the audibility of the out-of-phase effect is much reduced. The stage image still extends from the speakers outward when the recording calls for this; that is, sound sources at the extreme right and left image just as they do when the speakers are in phase. This makes sense, since we are listening, in effect, to one sound source with one ear. To repeat: in the out-of-phase case, for most of the frequency range, each ear hears a signal that is distinctive because the signals are of opposite polarity, and therefore the ear localizes each sound as originating from beyond its respective speaker. A phantom center image does not form, and the infamous hole-in-the-middle appears. In the out-of-phase Ambiophonic case, however, the speakers are very close together, so the middle hole is almost nonexistent. The bottom line is that, except for extreme bass response, front-speaker phasing and other timing anomalies are more critical in stereo than in Ambiophonics.
When an instrument produces a sound, the sound consists of a series of alternating rarefactions and compressions of air. The sonic signatures of acoustic musical instruments are determined by the pressure and spacing of these rarefactions and compressions. Electronic recording and reproduction have now made it possible to turn rarefactions into compressions and vice versa. The significance of this for the home concert hall is not entirely clear, but a few people seem able to hear a difference between correct and incorrect polarity. Therefore, care should be taken that all amplifiers, speakers, and ambience sources, taken together, do not invert. Since acoustic reflectors in concert halls do not invert polarity, the key early reflections, at least, should not be inverted accidentally in home reproduction either, and should be delivered to the ears with the same polarity as the direct sound, which is, one hopes, also of the correct absolute polarity. If you cannot tell one polarity from the other in your own system, don't despair. For a few people, polarity is audible only when special test signals are used. One possible reason for difficulty in this regard is the nature of many instruments. A listener to the left of a violinist hears one polarity, while a listener to the right hears the other, assuming the string is vibrating in the same plane as the ears of both listeners. But no matter where you stand around a trumpet you get the same polarity; the inverted-polarity sound in this case is inside the trumpet. Indeed, it has been reported that test subjects are more likely to hear polarity differences where wind instruments are involved.
On balance, one would have to say that it does not pay to agonize over the absolute polarity effect unless you are certain that you or your friends are sensitive to it.