The Science of Domestic Concert Hall Design
by Ralph Glasgal
360° Localization via 4.x RACE Processing
Recursive Ambiophonic Crosstalk Elimination (RACE), implemented as a VST plug-in, convolved from an impulse response, or purchased as part of a TacT Audio or other home audiophile product, properly reproduces all the ITD and ILD data sequestered in most standard two or multichannel media. Ambiophonics is so named because it is intended to be the replacement for 75 year old stereophonics and 5.1 in the home, car, or monitoring studio, but not in theaters. The response curves show that RACE produces a loudspeaker binaural soundfield with no audible colorations, much like Ambisonics or Wavefield Synthesis. RACE can do this starting with most standard CD/LP/DVD two, four or five-channel media, or even better, 2 or 4 channel recordings made with an Ambiophone, using one or two pairs of closely spaced loudspeakers. The RACE stage can easily span up to 170° for two channel orchestral recordings or 360° for movie/electronic-music surround sources. RACE is not sensitive to head rotation and listeners can nod, recline, stand up, lean sideways, move forward and back, or sit one behind the other. As in 5.1, off center listeners can easily localize the center dialog even though no center speaker is ever needed.
1. AMBIOPHONIC TECHNOLOGY
Ambiophonics is a loudspeaker binaural technology. Like Ambisonics and Wavefield Synthesis, it places the home listener/viewer at the recording/filming location, and like these it does not require head tracking or the use of head-related transfer functions (HRTF). Ambiophonics, Ambisonics, and Wavefield Synthesis [4] all try to recreate the original soundfield or a “you-are-there” effect in the listener’s room or in the vicinity of the head.
To create physiological verisimilitude at the ears of a home listener, it is necessary to provide the same Interaural Time Differences (ITD) (which include the precedence effect), Interaural Level Differences (ILD), and pinna cues at home that that listener would have experienced at the microphone location or the best seat in the concert hall or rock venue. This also applies if the source of the sound is not acoustical but computer generated, etc. (We assume here that vital things like frequency response, distortion and noise are not compromised and so this discussion is primarily concerned with adding ultra-natural surround localization to the mix.)
Unlike Ambisonics and WFS, Ambiophonics is compatible with all new or existing 2, 4 or 5.1 channel media and, unlike Ambisonics or WFS, Ambiophonics can be implemented with as few as two speakers where only a binaurally correct front stage is desired for say string quartets or orchestras. Four speakers are required for both front and rear stages as in movies, games and virtual reality.
Ambiophonics, and RACE in particular, are designed to extract the most ILD and ITD possible from existing recordings and also to deliver with the utmost fidelity the ITD and ILD from new recordings made using the two or four channel Ambiophone. In both cases the vitally important pinna cues are also preserved for the central region of the stage or for direct sound sources coming from the rear, as in some movies.
Full-bore Ambiophonics includes recursive crosstalk elimination for a frontal speaker pair (called an Ambiodipole); the same for a rear Ambiodipole, which is mandatory if direct sound is to come from the rear half circle but optional if all the direct sound is frontal; concert-hall ambience signals for any number of surround speakers, convolved from real 3D hall impulse responses; and, optionally, room/speaker correction for flat response. This paper is only concerned with the first leg of Ambiophonics, which deals with reproducing the direct sound. The other techniques are discussed on the Ambiophonic and TacT websites.
2. STEREOPHONIC/5.1 CROSSTALK BASICS
Just as black and white photography cannot be perfected to the point where it is optically as natural as color photography, so stereophonic loudspeaker sound and its 5.1 cousin cannot ever be tweaked to deliver a sonic experience comparable to what one of the loudspeaker binaural technologies (Ambisonics, Ambiophonics, WFS or 3D IMAX) can produce.
As there are those who regard black and white movies as superior to say Technicolor, so there are those who regard the stereo triangle as an art form.
Ambiophonics is based on the premise that most other listeners will prefer a home system that delivers as much binaurally correct sound (i.e. psychoacoustic verity) as possible.
Ambiophonics is so named to suggest that it is the logical next step beyond 75-year-old stereophonics (and its cousins 5.1, 7.1, 10.2, etc.) in home theaters, recording studios, or cars, but not theaters, and can replace stereo in the same way that stereo has replaced some, but not all, monophonic sound venues (not telephones, AM radio, etc.). RACE produces a stage up to three times the width of a stereo loudspeaker pair via two front speakers placed conveniently much closer together. Also, unlike 5.1, Ambiophonics never requires a center speaker.
In essence, one crucial difference between Ambiophonics and stereophonics is the elimination of the acoustic crosstalk, comb filtering, and pinna direction finding errors introduced when two channel recordings such as CDs, LPs, or 5.1 DVD/SACDs, MP3s (all essentially free of such errors as pressed) are reproduced using speakers that form an equilateral triangle with the listener. Doing away with such stereo crosstalk is essential to achieving high-fidelity sound reproduction and increasing the front stage width beyond the 60° angle of 2.0 or 5.1 LCR.
The main problem with stereo is this: Where in normal binaural hearing, just one ray reaches each ear from a sound source, for most sources on a stereo or 5.1 disc, two or three sound rays, one from each speaker, reach each ear. These extra rays, or crosstalk, cause high frequency peaks and dips (comb filtering) that negate the direction finding function of the pinna (outer ear) and cause audible changes in timbre as sound images move from the side to the center or a listener moves off center. But even more significant is that at lower frequencies the crosstalk and the speaker locations alter the perhaps correctly recorded and stored time and level differences between the left and right ears, causing localization to be fuzzy or inaccurate.  For example a sound at the side in real life produces an ITD of about 700 µsec. The maximum ITD that the stereo triangle can produce is about 250µsec and 5.1 often half that. 
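The two ITD figures quoted above can be roughly cross-checked with the classic Woodworth spherical-head approximation, ITD = (r/c)(θ + sin θ). This is only a sketch: the formula, the 8.75 cm head radius and the 343 m/s speed of sound are assumptions brought in for illustration, not values taken from this paper.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed spherical-head radius
SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value


def woodworth_itd(azimuth_deg: float) -> float:
    """Approximate ITD in seconds for a distant source.

    azimuth_deg is measured from straight ahead (0) toward the side (90).
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


for azimuth in (30, 90):
    print(f"{azimuth:>2} deg: {woodworth_itd(azimuth) * 1e6:.0f} microseconds")
```

Under these assumptions a source at 90° gives roughly 660 µs and a speaker at 30° roughly 260 µs, in line with the rounded 700 µs and 250 µs figures in the text.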
When a sound source such as a soloist is centered, speakers positioned at ±30° toward the sides instead of the front generate false side pinna localization cues that contradict the ITD and ILD, thus telling the brain that the soundfield is not natural. This 60° front-speaker separation also restricts stage width to that angle. With the speakers at 30° the head shadow is significant. In real hearing a central sound directly in front of you does not engender a change in timbre, delay and level due to a head shadow. But in stereo a centered soloist does, since all sound reaching the ear canal is altered by the 30° head shadow. These artifacts (along with rear hall ambience coming wrongly from the front) prevent stereo and 5.1 reproduction from being anywhere near as compelling as hearing a live concert.
Contrast this with Ambiophonics, where the front stage is rendered without pinna confusion, without altering timbre, and with a possible stage width of up to 180° (using the 4 or 6 speaker options below) if that width has been captured in the recording. Ambiophonics does this by turning stereo inside out: moving the speakers together to the front, where they recreate sound not just between themselves but also outside of themselves, and in the process even enhance depth cues.
3. CROSSTALK CANCELLATION BASICS
Eliminating crosstalk to improve sound reproduction is a long established principle. It has been shown that the only way to have effective crosstalk cancellation is to use a pair of speakers (an Ambiodipole) that are relatively close together, 24 degrees or so as viewed from the listening area.  Having both speakers directly in front of the listeners means that the crosstalk correction parameters can be assumed to be approximately independent of the shape and size of the listener’s head, and this has proved to be true in practice for adults. In this regard, one needs to observe that everyday localization of a frontal sound does not change when one nods, rotates, or leans. Thus in the frequency range where crosstalk is a factor, the head shadow part of the HRTF is not an audible problem and thus the RACE process does not include any individualized or Kemar-like head shadow functions or filters.
In general, think of Ambiophonics as “virtual headphones” but with the greater comfort and quality of speakers. Compared to earphone listening to stereo or special binaural recordings, the sound is never inside your head, and you can rotate your head without the stage rotating with you. Also since the pinnae are not impacted by the earphones, they can function normally, guaranteeing externalization and no front to back inversions.
There are two different approaches to cancellers used in Ambiophonics. The first is the simple physical barrier – an absorbent vertical wall extending from the listener toward two speakers placed just slightly further apart than the width of the barrier. The fact that such a barrier can produce a wide stage from ordinary stereo recordings is an easy way to demonstrate the concept. The barrier also shows that there is at least one way to crosstalk cancel that does not cause timbre coloration or require any HRTF compensation. The trick is how to duplicate what the barrier does without using a physical barrier and make it commercially affordable.
4. RECURSIVE AMBIOPHONIC CROSSTALK ELIMINATION
Earlier attempts at crosstalk elimination in audiophile commercial products by Carver and Lexicon in the analog era were less than successful because they still relied on the stereo triangle, which leads to the HRTF and pinna problems above. Also, they were not recursive, as explained next.
The basic crosstalk cancelling technique the Ambiophonic team has developed (and is making available free to the audio community) is Recursive Ambiophonic Crosstalk Elimination, or RACE. Recursive is the operative word. When a signal from the left speaker undesirably reaches the right ear, it must be cancelled acoustically at that ear by an inverted, slightly delayed, slightly lower level replica from the right speaker. But this cancellation signal will also then journey on to the left ear and so it must also be cancelled (2nd order cancellation) by a properly conditioned signal from the left speaker, which signal then also reaches the right ear requiring another round (3rd order) of cancellation, and so on. For a greater tolerance for non-ideal speakers, to avoid frequency response errors, and to enlarge the ideal listening area, this recursive “ping-pong” correction needs to be carried out to inaudibility. It has been demonstrated that some five people can hear the same wide stage, even from two small loudspeakers in front of them, using this method.
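The ping-pong cancellation described above can be sketched as a pair of cross-coupled feedback filters. This is a minimal illustration, not the distributed RACE implementation: the function names `race_sketch` and `at_ears` are hypothetical, the gain of 0.75 (about -2.5 dB) and the 3-sample delay (roughly 65 µs at common sample rates) are assumed values, the acoustic path to the ears is idealized as a pure attenuation and delay, and the bass and treble bypass is left out.

```python
def race_sketch(left, right, gain=0.75, delay=3):
    """Cross-coupled recursive canceller (illustrative only).

    Each speaker feed is its own input minus an attenuated, delayed copy
    of the OTHER speaker feed, so every correction is itself corrected.
    """
    n = len(left)
    out_l, out_r = [0.0] * n, [0.0] * n
    for i in range(n):
        prev_r = out_r[i - delay] if i >= delay else 0.0
        prev_l = out_l[i - delay] if i >= delay else 0.0
        out_l[i] = left[i] - gain * prev_r
        out_r[i] = right[i] - gain * prev_l
    return out_l, out_r


def at_ears(out_l, out_r, gain=0.75, delay=3):
    """Idealized acoustic mixing: each ear hears its own speaker plus an
    attenuated, delayed copy of the opposite speaker."""
    n = len(out_l)
    ear_l = [out_l[i] + (gain * out_r[i - delay] if i >= delay else 0.0)
             for i in range(n)]
    ear_r = [out_r[i] + (gain * out_l[i - delay] if i >= delay else 0.0)
             for i in range(n)]
    return ear_l, ear_r


# A single click in the right channel only.
right = [1.0] + [0.0] * 15
left = [0.0] * 16
out_l, out_r = race_sketch(left, right)
ear_l, ear_r = at_ears(out_l, out_r)
print("left ear :", ear_l)
print("right ear:", ear_r)
```

When the filter parameters match the assumed acoustic path exactly, as they do here, the left-ear list is zero throughout and the right ear receives only the original click. In a real room the match is only approximate, which is why the two parameters are adjustable.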
Figure 1 below is the logical representation of the RACE process showing the feedback loop that keeps the see-saw process going until the cancelling digital samples become zero. One can think of this process in terms of concert hall RT60 reverberation times. In RACE it takes about 24 cycles and some 1.5 ms to reach this standard level of inaudibility. I know of no previous crosstalk methodology that spans so long a time.
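The cycle count and time span can be sanity-checked with simple arithmetic, assuming the RT60-style target of 60 dB and the typical per-pass values given later in this paper (2.5 dB of attenuation and 65 µs of delay per pass). The helper name is hypothetical.

```python
import math


def passes_to_inaudibility(atten_db=2.5, delay_s=65e-6, target_db=60.0):
    """Number of ping-pong passes, and elapsed time in seconds, before the
    cancelling signal has dropped by target_db."""
    passes = math.ceil(target_db / atten_db)
    return passes, passes * delay_s


passes, seconds = passes_to_inaudibility()
print(passes, "passes,", round(seconds * 1e3, 2), "ms")
```

With these values the result is 24 passes and about 1.56 ms, matching the figures quoted above. A larger per-pass attenuation would shorten the tail.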
The equations in the next section show that the process results in correct cancellation for both one-sided and center signals if the assumptions about incremental delay and incremental attenuation are realistic. These are the two adjustable parameters and in practice they have not been found to be all that critical. First is the attenuation of the offside speaker signal as it goes to the wrong ear. For most heads and speaker angles this is 2 to 3 dB. Adjusting this parameter is like tweaking a stereo system. This value optimizes the maximum stage width and the depth of the best listening area, which normally far exceeds the depth of the stereo or 5.1 sweet spot. Unlike stereo, there is no hole in the middle when you are too close, and little stage narrowing when you are too far back. In standard 5.1 movie reproduction via four speaker Ambiophonics, many viewers can sit centered on the screen one behind the other without losing 360° localization. Too little attenuation tends to over-cancel, will exaggerate the stage width, and begin to sound a bit like an out-of-phase stereo system. Too much attenuation will not completely cancel the crosstalk, so the stage width will narrow.
The delay parameter ranges from 60 to less than 100 µs and is determined by the path length difference from a speaker to both ears and so is a function of the separation of the speakers and their distance to each ear of the listeners. Changes in this parameter have surprisingly little audible effect which explains why nodding, back and forth motion, and head rotation have little effect. Of course in extreme cases such as a child or someone with an unusually narrow head, this parameter should be adjusted as appropriate.
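As a rough geometric check of that range, the delay can be estimated from the straight-line path difference between one speaker and the two ears. The sketch below ignores diffraction around the head and assumes a 15 cm ear spacing and a 2 m listening distance; both are illustrative values rather than figures from this paper, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
EAR_SPACING_M = 0.15    # assumed straight-line ear-to-ear distance


def crosstalk_delay(speaker_angle_deg, distance_m, ear_spacing_m=EAR_SPACING_M):
    """Extra travel time (s) from one speaker to the far ear versus the
    near ear. The angle is measured from the center line to the speaker;
    the distance is from the center of the head to the speaker."""
    theta = math.radians(speaker_angle_deg)
    sx = distance_m * math.sin(theta)
    sy = distance_m * math.cos(theta)
    half = ear_spacing_m / 2
    near = math.hypot(sx - half, sy)
    far = math.hypot(sx + half, sy)
    return (far - near) / SPEED_OF_SOUND


for half_angle in (8, 10, 12):
    delay_us = crosstalk_delay(half_angle, 2.0) * 1e6
    print(f"speakers at +/-{half_angle} deg: {delay_us:.0f} microseconds")
```

Under these assumptions, speaker pairs spanning 16° to 24° give delays of roughly 60 to 90 µs. The delay depends mainly on the angle and only weakly on the listening distance, which is consistent with the tolerance to back and forth movement noted above.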
If you look at the R recursive mixer and trace straight down and all around you will see what amounts to a feedback loop. Depending on the frequency, the feedback will sometimes be positive and sometimes negative. This is because as the delay increases in the later terms the phase of the signal may change polarity. At low frequencies, the effect is small since the level rolls off before the phase inverts. But at higher frequencies the response will undulate. The attenuation still militates against oscillation but a peak begins to rise starting at about 5kHz. This makes sense since if the delay parameter is half a wavelength the inverter is no longer really inverting already at the first ping-pong cycle.
One must also understand that at frequencies such as 5kHz localization is not dependent on ITD or ILD but on pinna localization. (Pinna localization correction is discussed below) Thus it is not necessary or even
There is also a minor issue in the low bass. The assumption made in Fig. 1 that the attenuation due to the longer trip past the nose is constant begins to fall apart below about 400 Hz. The attenuation due to the passage of sound through the air is less at lower frequencies and any part in the process due to the face becomes insignificant. Also at about 400 Hz the ILD from an Ambiodipole (two speakers typically
If bass bypass is not employed, then the cancellation of the non-existent crosstalk results in a modest loss (0dB minus 2.5dB) in bass amplitude for one sided signals and the same relative loss but at double the level of the side-only amplitude for mono bass. In my experience this loss is inaudible in real rooms or easily compensated for by woofer level controls. The stereo triangle also boosts mono bass in a similar manner since each ear gets the same bass signal twice and you can see that in Figure 3.
5. RACE EQUATIONS
The following equations describe the two extreme cases RACE has to deal with. One where both channels are the same (mono) as is often the case with spot mic’d soloists and the other where only one channel has any data, a rather rare event at least in acoustic recording. It should also be appreciated that if spaced microphones such as the ORTF or Schoeps KFM-6 are used, then there is a large ITD for extreme side signals and thus the ILD is not as one sided as these worst case equations assume.
R and L may be thought of as PCM samples in the digital world or as k·sin(ωt) terms, the individual sine waves that add up to an analog input signal.
a = the difference in level between a sample at one ear and the same sample slightly delayed at the other (typically -2.5 dB).
d = the difference in the time of arrival of a given sample at each ear (typically 65 µs).
Let us first take the case where there is only a right input representing a sound far to the right side so L=0.
As per Figure 1: $R_{\mathrm{recursive}} = R + R_{2a,2d} + R_{4a,4d} + R_{6a,6d} + R_{8a,8d} + R_{10a,10d} + \dots$, where $R_{na,nd}$ denotes R attenuated by n times a and delayed by n times d.
Thus the output of the right speaker in the 250Hz to 5000Hz band is the original sound plus a series of decaying and delayed replicas of itself. At the lower frequencies, $R_{2a,2d}$ is essentially in-phase with R and simply adds to it. This is the largest boost but it is already down about 5 dB. At higher frequencies there is addition and subtraction as well. The curves below show the frequency responses of this signal and all the others. At the higher frequencies, the first term is delayed by say 120 µs and so at 4kHz would subtract rather than add and at 8kHz add rather than subtract if not bypassed. However, as we shall see, the signal in the listening area or most areas of the room rather than right at the speaker is quite different.
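The add-then-subtract behavior of this series can be illustrated by summing it in closed form as a geometric series. The sketch below assumes the 2.5 dB per-pass attenuation and the 120 µs round trip (d = 60 µs) used in the example above, treats the series as infinite, and ignores the bass and treble bypass. The function name is hypothetical.

```python
import cmath
import math


def speaker_feed_gain_db(freq_hz, atten_db=2.5, delay_s=60e-6):
    """Level in dB of the driven speaker's feed, relative to the input,
    for a one-sided signal.

    Sums 1 + g^2 z^-2 + g^4 z^-4 + ... in closed form, where g is the
    linear gain per pass and z^-1 is a delay of delay_s.
    """
    g = 10 ** (-atten_db / 20)
    round_trip_phase = 2 * math.pi * freq_hz * 2 * delay_s
    h = 1 / (1 - g * g * cmath.exp(-1j * round_trip_phase))
    return 20 * math.log10(abs(h))


for f in (250, 1000, 4000, 8000):
    print(f"{f:>5} Hz: {speaker_feed_gain_db(f):+.1f} dB")
```

With these assumed values the feed is boosted at low frequencies, dips below the input level near 4 kHz, and rises again near 8 kHz. This is the response at the speaker terminal only, not what arrives at the ears.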
From Figure 1 again:
Now let us look at what happens on the line between the speakers. We have postulated that at the head the left speaker sound reaches the right ear attenuated and delayed and similarly the right speaker sound reaches the left ear delayed and attenuated.
At the right ear the sound heard is the sum of Rrecursive and then Lrecursive delayed by 1d and attenuated by 1a thus:
At the left ear the sound heard is the sum of Lrecursive and a delayed by 1d and attenuated by 1a version of Rrecursive thus:
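A sketch of how these two sums work out for the right-only case (L = 0), reconstructed from the series for Rrecursive and the definitions of a and d. Two things are assumed here rather than taken from the paper: the subscript notation $X_{na,nd}$, meaning X attenuated n times by a and delayed n times by d, and the sign convention that each cancellation pass inverts the signal.

```latex
\begin{align*}
R_{\mathrm{recursive}} &= R + R_{2a,2d} + R_{4a,4d} + R_{6a,6d} + \dots \\
L_{\mathrm{recursive}} &= -\left(R_{a,d} + R_{3a,3d} + R_{5a,5d} + \dots\right) \\[4pt]
R_{\mathrm{ear}} &= R_{\mathrm{recursive}} + \left(L_{\mathrm{recursive}}\right)_{a,d} \\
  &= R + \left(R_{2a,2d} + R_{4a,4d} + \dots\right)
       - \left(R_{2a,2d} + R_{4a,4d} + \dots\right) = R \\[4pt]
L_{\mathrm{ear}} &= L_{\mathrm{recursive}} + \left(R_{\mathrm{recursive}}\right)_{a,d} \\
  &= -\left(R_{a,d} + R_{3a,3d} + \dots\right)
     + \left(R_{a,d} + R_{3a,3d} + \dots\right) = 0
\end{align*}
```

Every term after the first at the right ear, and every term at the left ear, is matched by an equal and opposite term from the other speaker, provided the actual acoustic attenuation and delay equal a and d.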
It can also be seen that if these series are truncated too soon there can be an audible residue.
Let us now consider the case where R=L a commonplace in the dialog/rap/spot mic’d world.
If L and R are the same then both speakers have the same output and since the terms oscillate in polarity they have less influence on the frequency response than in the one sided case. In general, even away from the sweet area there is little change in timbre. But again it is what is heard on the sweet line that is significant. At the right ear what one hears is the right speaker signal plus the left speaker signal delayed and attenuated one unit. So:
Rear=R. Similarly Lear=L
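The mono result can be sketched in the same way. The notation $X_{na,nd}$ (X attenuated n times by a and delayed n times by d) and the sign convention that each pass inverts the signal are assumptions made for this illustration. With R = L, both speakers emit the same alternating series:

```latex
\begin{align*}
R_{\mathrm{recursive}} = L_{\mathrm{recursive}}
  &= R - R_{a,d} + R_{2a,2d} - R_{3a,3d} + \dots \\[4pt]
R_{\mathrm{ear}} &= R_{\mathrm{recursive}} + \left(L_{\mathrm{recursive}}\right)_{a,d} \\
  &= \left(R - R_{a,d} + R_{2a,2d} - \dots\right)
   + \left(R_{a,d} - R_{2a,2d} + R_{3a,3d} - \dots\right) = R
\end{align*}
```

The left ear follows by symmetry. The alternating signs are also why the speaker feeds themselves show smaller response swings in the mono case than in the one-sided case.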
6. RESPONSE MEASUREMENTS
The following acoustic measurements were made using wideband pink noise. All the speakers being measured were Soundlab electrostatic panels plus two subwoofers. The measurements of the signals closer to the speakers are only of significance in indicating what one would hear if largely off center or if room reflections reaching the center line were unusually severe, although in this case it is likely that neither RACE nor stereo would be working well. The picture of the room in the last section shows where the measurements were made except that the front speakers were not MBL 101’s but Sound-Lab electrostatic Ambiopoles.
Figure 2 below is the unprocessed response of the left speaker at the listening position with a left input only. This response should be used as a reference to compare with the RACE curves. Basically the speaker is flat plus or minus 5 dB out in the room.
Figure 2: Unprocessed Single ESL Response
Figure 3 below is the stereo triangle response for both speakers with mono pink noise. The microphone is at the listening position but displaced about one head width from the centerline. The combing starting at about 1kHz is clearly evident. The 3dB boost in the bass region is also evident which makes center bass sounds slightly louder than side bass sounds. This is why speakers should be evaluated individually as well as part of a stereo pair.
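The frequency at which the combing begins can be estimated from geometry alone: two identical signals cancel first at the frequency whose half period equals their arrival-time difference. The sketch below assumes speakers at ±30°, 3 m from the center seat, and a microphone offset of 15 cm standing in for one head width. These are all illustrative values, not the measurement conditions used for Figure 3, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def first_comb_notch_hz(offset_m, speaker_angle_deg=30.0, distance_m=3.0):
    """First cancellation frequency for identical signals from a speaker
    pair, heard at a point offset_m to one side of the center line."""
    theta = math.radians(speaker_angle_deg)
    sx = distance_m * math.sin(theta)
    sy = distance_m * math.cos(theta)
    near = math.hypot(sx - offset_m, sy)
    far = math.hypot(sx + offset_m, sy)
    arrival_gap_s = (far - near) / SPEED_OF_SOUND
    return 1.0 / (2.0 * arrival_gap_s)


print(f"{first_comb_notch_hz(0.15):.0f} Hz")
```

Under these assumptions the first notch falls a little above 1.1 kHz, the same region where the measured combing becomes evident.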
Figure 3: Stereo Triangle Combing
Figure 4 below is the response of the left speaker to a mono pink noise input with RACE engaged. Remember the high treble and the low bass are bypassed so the 125 Hz dip has nothing to do with RACE. The only thing to note here is that any listener off center or near one of the speakers will not hear anything unusual.
Figure 4: Lrecursive for Mono Input
Figure 5 below, is the response at the left ear with both speakers going and a mono pink noise input to RACE. Again in the 250 to 5000Hz range there is nothing remarkable. But compare this to Figure 3 above which is the stereo triangle response to the same input. Indeed if you can’t play old mono recordings via just one speaker then use RACE which produces the same result.
Figure 5: RACE at Near Left Ear, Mono Input
Figure 6 below is the response as above but with the microphone placed at the entrance to the ear canal. One can easily see that the pinna resonances are more than 10dB and begin to have an effect at 1kHz or so. If you compare this pattern with the rolling stereo triangle pattern above 1000Hz in Figure 3 you will see one important reason why listeners to stereo recordings seldom feel they are hearing a real soundfield.
Figure 6: As Fig. 5 at Ear Canal
Figure 7 below is the signal coming out of the left speaker when wideband pink noise is present in the left channel only. It looks awful, but hidden in this averaging trace is the fact that there are components separated in time that relate to signals in the other speaker. This is a rare case which can only occur if a coincident mic is used and a sound source is at the extreme side or is panned that way. Again this signal is only heard as shown in the immediate vicinity of the speaker.
Figure 7: Lrecursive, Left Input Only
Figure 8 below is the RACE signal at the right speaker when the input, as in Fig. 7, is left side only. It is quite similar but roughly 2.5dB lower in midrange amplitude. Also notice that there is much less bass. This is because the bass is bypassed and there is nothing to bypass in the right speaker for a left only input.
Figure 8: Rrecursive, for Left only Input
Figure 9 below is the response for a left only input at the left side of the listening position. Notice that although there are a lot of ups and downs, the 6dB rolloff is no longer in evidence. It is amazing that this relatively simple program can combine two speaker signals this way with such an easily measured result. In general, unless you are very close to one speaker, the response sounds reasonably uncolored anywhere else in the room. Likewise there is little change in timbre for a sound source that moves across a sound stage as in movies or operas. Certainly RACE is not perfect, but relative to stereo and 5.1 it is better in home systems.
Figure 9: RACE sound at Left Ear for Left Input
In Figure 10 below the microphone has been placed in each ear and the response to a left signal has been measured. The upper trace is of course, the level at the left ear. You can see in the region where RACE is active that there is a level difference of 10dB, sometimes more sometimes less. If you look at the similar curves in  for stereo you will see that here the RACE ILDs are larger than those in stereo. Note also that the bass level is close to the same in both ears even though only the left subwoofer has a signal, indicating that it is indeed tough to get ILD at low frequencies.
Figure 10: RACE Generated ILD with Pinna
7. DUAL AMBIODIPOLE SYSTEMS
For the central part of the sound stage, dialog and soloists, the sound above 1000Hz reaches the pinna from a much more favorable angle than via the stereo triangle. But as a source moves to the side, the ILD and the ITD may be okay, but the pinnae are still getting a frontal pattern from the front Ambiodipole which tends to limit stage width. 
If you place a second Ambiodipole behind the listening position also using RACE (with slightly different parameters) at the same loudness level then the ILD and the ITD are roughly the same. If the rear level is not excessive there is no front to back reversal. If you use slightly different delays and attenuations, then there is less likelihood of some kind of aliasing although in my experience this almost never happens. But the main advantage of doing this is that there is now a completely different pattern generated by the pinna. The patterns from the front and the rear combine and the psychoacoustic result is that the ear no longer senses the frontal central bias and allows the side values of the ITD and ILD to be registered fully by the brain. The subjective effect is that the stage gets wider, for some reason deeper, and in general more natural sounding.
For playing surround movies or synthesized music, the rear Ambiodipole serves to generate a rear stage, and if the ITD and ILD are there in the recording, the four speakers provide easily localizable images throughout the 360° horizontal plane, even at 90 degrees. One advantage of RACE reproducing 5.1 media is that a center speaker is never necessary. Also, the two front speakers can usually be placed on either side of even a very large TV screen.
While the rear speaker helps the pinna cope, it is possible to really gild the lily and firm up images at the 90 degree points, even when playing two channel recordings, by using side speakers. These speakers need not be full range and do not require crosstalk cancelling. The idea is that the side speakers provide a direct shot at the ear canal and thus enhance the localization to the extreme side. They also provide a
One advantage of having six speaker direct-sound Ambiophonics is that any colorations tend to even out especially for listeners not on the center line. With surround material you hear a front and rear stage even if you are nowhere near the centerline. This is like standing up in a concert hall and turning sideways. You still hear where the stage is even if you can’t localize a particular instrument.
Figure 11 below shows the response at each ear canal for the six speaker system for a left only input. It should be compared to Figure 10. Again the ups and downs above 1000Hz are what the pinnae do to any sound impinging on the ear. But note that the ILD is a bit better and that the pattern of the curve at the pinna frequencies is smoother as well as different. The smoothness comes from the side signals that get to the ear canal without stimulating a lot of pinna resonances. It is this difference that allows the ear to sense the wider stage more gracefully. This display does not show the improvement in ITD but it is audible.
Figure 11: Six-Speaker ILD with Pinna
8. FULL SURROUND HOME SYSTEM
Figure 12: Ambiophonics Institute Research Lab
Figure 12 is an experimental, no-expense-spared, full-surround Ambiophonic system. It includes front and rear Ambiodipoles, two side speakers and 30 electrostatic panels that mimic concert hall walls. How the ambience signals for the surround panels are generated may be found on the Ambiophonic website. But here we are interested in the parts of the system that are relevant to RACE.
When the picture was taken the front Ambiodipole was formed from two MBL 101D loudspeakers rather than the electrostatic Ambiopoles used for the measurements. The rear Ambiodipole is formed from two Soundlab Majestic full range electrostatic panels. The black panels elsewhere are the ESL panels that emit sound much like reflecting concert hall walls do.
At the lower right corner there are two small speakers mounted on a piece of wood. This is a small Ambiodipole used to demonstrate that the same wide stage can be generated by RACE from inexpensive components. Indeed, using three very directional Bose AM-5s, or similar, you can have two stages side by side (one reversed) for two people who want to sit side by side at home or in a car.
9. CONCLUSION 1
The functions, virtues, and imperfections of RACE represented by the equations and the curves are self-evident. Readers should download RACE for free into their own computers or buy a TacT device. Then they should listen, draw their own conclusions, and design improvements.
10. CONVENTIONAL CONCLUSION
We have shown that it is possible to have a crosstalk canceller that does not incorporate specific HRTF functions (such as Kemar) and does not have noticeable tonal colorations on the listening line or at most room locations, especially for soloists. It is also clear from the response curves that Ambiophonics moves the frequency at which comb filtering occurs up from the 1000Hz of the stereo triangle to two octaves above that or more for the Ambiodipole. Although also the subject of previous papers, we have again shown that four channel recordings (especially those made with a four microphone Ambiophone) easily deliver 360° direct sound via two pairs of speakers that are easily placed in almost any room compared to the 5.1 speaker arrangement. Although for the best localization one must sit on the center line between the speakers, the length of this line is longer than for the stereo triangle or the 5.1 arrangement. Being off the center line does not impact the dialog channel of movies any more, and sometimes less, than the LCR arrangement of 5.1.
The issue is not whether Ambiophonics and RACE are perfect or rival concert-hall sound, but rather are they better at reproducing standard 2, 4, and 5.1 recordings than the stereo triangle or the 5.1 speaker arrangement. Listening tests performed by Robin Miller, Filmaker Inc., the subject of a different paper, indicate that Ambiophonics is consistently preferred over stereo by a wide margin and that four speaker Ambiophonics is preferred over two speaker Ambiophonics. It is hoped that such listening tests can be expanded and extended to the case where surround speakers carrying hall ambience can be included in the tests as well as preference tests involving full circle direct sound and height ambience.
The TacT stereo and home theater speaker/room response correction processors, such as the new RCS 2.2XP DRC, Mini, and TCS mk III, now include Ambiophonic reproduction features. The combination of RACE with speaker/room response correction and hall IR convolution comes closer to recreating the live concert hall or movie experience for home listeners and audiophiles than any other home or monitoring studio sound reproduction system now extant. We have demonstrated that in Ambiophonics, the stage extends from between the speakers to the far sides for two channel sources and is binaurally correct 360 degrees all around if the two rear channels of 5.1 call for direct sound to come from there.
Eric Benjamin, “An Experimental Verification of Localization in Two-Channel Stereo”, AES 121st Convention Preprint 6968.
Ralph Glasgal, “Improving 5.1 and Stereophonic Mastering/Monitoring by Using Ambiophonic Techniques”, International Tonmeister Symposium, Oct. 2005 Proceedings.
Francis Rumsey, “Spatial Sound Techniques, Part 2”, AES Anthology.
Don Keele Jr., “The Effects of Interaural Crosstalk on Stereo Reproduction and Minimizing Interaural Crosstalk by the Use of a Physical Barrier”, AES Preprints 2420-A and 2420-B, 1986.