by Ralph Glasgal

Ambiophonics, 2nd Edition
Replacing Stereophonics to Achieve Concert-Hall Realism

By Ralph Glasgal, Founder Ambiophonics Institute, Rockleigh, New Jersey www.ambiophonics.org

Appendix B

Human sound localization is possible using three and only three sonic cues (not counting bone conduction):
  1. Interaural time differences (ITD), including phase and transient-edge differences between the ears. The ITD cue includes the precedence effect.
  2. Interaural level differences (ILD), i.e. sound level differences between the ears.
  3. Single- and twin-eared pinna direction-finding effects.
Each of these mechanisms is effective only in a specific frequency range, but the ranges overlap, and the predominance of one over another also depends on genetics, on the nature of the signal (sine wave, pink noise, or music, for example), on the venue, etc.
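To put rough numbers on the first cue, here is a minimal Python sketch using Woodworth's spherical-head approximation, a standard textbook model that this text does not itself use. The head radius (8.75 cm), the speed of sound (343 m/s), and the function names are illustrative assumptions. It also estimates the frequency at which the interaural delay equals half a period of a pure tone; above that point the interaural phase wraps and no longer tells the brain which ear leads, a simple physical illustration of why the phase part of the ITD cue works only over a limited frequency range.

```python
import math

# Illustrative assumptions, not values taken from the text
HEAD_RADIUS_M = 0.0875   # spherical-head radius a, in metres
SPEED_OF_SOUND = 343.0   # c, in metres per second (air at about 20 degrees C)


def woodworth_itd(azimuth_deg: float) -> float:
    """Hypothetical helper: ITD in seconds for a distant source,
    using Woodworth's (a / c) * (theta + sin(theta)).

    Valid from 0 degrees (straight ahead) to 90 degrees (fully to one side).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


def phase_ambiguity_hz(itd_s: float) -> float:
    """Hypothetical helper: frequency at which an ITD of itd_s equals half a period.
    Above it the interaural phase of a pure tone wraps past 180 degrees."""
    return 1.0 / (2.0 * itd_s)


for az in (0, 15, 30, 60, 90):
    print(f"{az:2d} deg: ITD = {woodworth_itd(az) * 1e6:5.0f} microseconds")
print(f"lateral-source phase ambiguity above ~{phase_ambiguity_hz(woodworth_itd(90)):.0f} Hz")
```

Under these assumptions the model gives about 261 microseconds at 30 degrees and about 656 microseconds at 90 degrees, so for a fully lateral pure tone the phase cue becomes ambiguous somewhere around 760 Hz.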

For a full-range complex sound such as music experienced live, all three mechanisms are always in play and normally agree. By definition such an experience is said to be realistic or, better phrased for the creative and artistic recording fraternity, to yield guaranteed physiological verisimilitude. If the three mechanisms are not consistent, then we often make errors in localization. This happens in most earphone listening, where interference with the pinna and head-shadow cues usually results in internalization even if the ITD, including some deliberate ILD crosstalk, is perfect.

Before we get to stereophony, let me discuss the relative strengths of the three mechanisms listed above. Snow and Moir, in their classic papers, showed that localization of complex signals in the pinna range above 1000 Hz was superior, by a few degrees, to localization that relied solely on complex lower frequencies. That is, their subjects could localize bands of high frequencies to within half a degree, but only to within one or two degrees at lower frequencies. In general, the accuracy of localization declines as frequency falls until, at 90 Hz or so, as Bose has demonstrated, it goes to zilch. Remember this when we get to discuss crosstalk.

It is important for understanding the workings of stereophony that you be convinced that all three mechanisms are significant, and I would suggest, with Keele, Snow, and Moir, that the pinnae are first among equals. You can satisfy yourself on some of this by running water in a sink to get a nice complex high-frequency source. Close your eyes to avoid bias, block one ear to reduce ILD and ITD, and see if you can localize the water sound with just the one open ear. Point to the sound, open your eyes, and, like most people, you will be pointing correctly within a degree or so. With both ears you should be right on, despite having a signal too high in frequency to carry much ITD or ILD. With two pinnae agreeing, plus the zero-ILD cue, the localization is easily accurate.

Again, if a system like stereo or 5.1 cannot deliver the ITD, ILD, and pinna cues intact, without large errors, it can never deliver full localization verisimilitude for signals like music. If the cues are inconsistent, localization may occur, but it is fragile, it may vary with the note or instrument played, and it is usually accompanied by a sense that the music is canned, lacks depth, presence, etc. Mere localization is no guarantee of fidelity.

Let us now look at the stereo triangle in reproduction, and at the microphones used to make such recordings, and see what happens to the three localization cues. Basically, stereophonics is an auditory illusion, like an optical illusion. In an optical illusion the artist uses two-dimensional artistic tricks to stimulate the brain into seeing a third dimension, something not really there. The Blumlein stereo illusion is similar in that most brains perceive a line of sound between two isolated dots of sound. Just as one is always aware that an optical illusion is not real, one would never confuse the stereophonic illusion with a live binaural experience. For starters, the placement of images on the line is nonlinear as a function of ITD and ILD, and the length of the line is limited to the angle between the speakers. (I know, everyone, including Blumlein, has heard sounds beyond the speakers on occasion, but diatribe space is limited.)
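The nonlinearity of image placement with level difference, and the confinement of images to the span between the speakers, can be illustrated with the tangent panning law, a common textbook model of amplitude panning for a listener facing the centre of a symmetric speaker pair. It is not the author's formula: it ignores ITD, pinna, and frequency effects, and the function name is hypothetical. Treat it as a sketch under those assumptions.

```python
import math

SPEAKER_HALF_ANGLE_DEG = 30.0  # equilateral stereo triangle: speakers at +/-30 degrees


def tangent_law_angle(level_diff_db: float,
                      half_angle_deg: float = SPEAKER_HALF_ANGLE_DEG) -> float:
    """Hypothetical helper: predicted phantom-image azimuth in degrees for an
    inter-channel level difference L - R in dB (positive = toward the left speaker),
    using the tangent law  tan(phi) / tan(phi0) = (gL - gR) / (gL + gR).
    """
    ratio = 10.0 ** (level_diff_db / 20.0)   # gL / gR as a linear amplitude ratio
    k = (ratio - 1.0) / (ratio + 1.0)         # (gL - gR) / (gL + gR), always in (-1, 1)
    return math.degrees(math.atan(k * math.tan(math.radians(half_angle_deg))))


for db in (0, 3, 6, 12, 24, 60):
    print(f"{db:3d} dB -> image at {tangent_law_angle(db):5.1f} deg")
```

Under this model a 6 dB level difference moves the image only about 11 degrees, doubling it to 12 dB reaches about 19 degrees rather than 22, and even a 60 dB difference never takes the image past the 30-degree speaker.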

I want to get to the ILD/ITD phantom-imaging issue involved in this topic, but let us first get the pinna issue tucked away. No matter where you locate a speaker, high frequencies above 1000 Hz can be detected by the pinna, and the location of the speaker will be pinpointed unless other competing cues override or confuse this mechanism. In the case of the stereo triangle, the pinna and the ILD/ITD cues agree near the locations of the speakers. Thus, in 5.1, LCR triple mono sounds fine, especially for movie dialog. In stereo, for central sounds, the brain overrides the pinna angle-of-impingement error because the signals at the two ears are equal, so the ITD and the ILD are consistent with a centered-sound illusion. The brain also ignores the bogus head shadow, since its coloration and attenuation are symmetrical for central sources and not large enough to destroy the stereo sonic illusion. Likewise, the comb filtering due to crosstalk in the pinna frequency region interferes with the pinna direction-finding facility, thus forcing the brain to rely on the two remaining lower-frequency cues. All these discrepancies are consciously or subconsciously detected by golden ears who spend time and treasure striving to eliminate them and make stereo perfect. Similarly, the urge to perfect 5.1 is now manifest.

Consider just the three front speakers in 5.1. Unless we are talking about three-channel mono, we really have two stereo systems side by side. Remember, stereo is a rather fragile illusion. If you listen to your standard equilateral stereo system with your head facing one speaker and the other speaker moved in so that it sits 30 degrees off to the side, you won't be thrilled. The ILD is affected, since the head shadows are not the same: one speaker causes virtually no head shadow and the other a 30-degree one. Similarly, the pinna functions are quite dissimilar. (In the LCR arrangement the comb-filtering artifacts are now at their worst in two locations, at plus and minus 15 degrees, instead of just around 0 degrees as in stereo.) Thus for equal amplitudes (such as L and C), where a signal is centered at 15 degrees, as in our little experiment, the already freakish stereo illusion is badly strained. Finally, the ITD is still okay, and this partly accounts for the fact that, despite the center speaker, there is still a sweet spot in almost all home 5.1 systems. Various and quite ingenious 5.1 recording systems try to compensate for some of these errors, but the results are highly subjective and even controversial. It is also probably lucky that in 5.1 recording it is difficult to avoid an ITD, since a coincident main microphone is seldom used in this environment.

Technical Digression for Recording Engineers

Before getting to side imaging, some points on the relationship between microphones and reproductive crosstalk should be elucidated. Whether crosstalk is beneficial or not depends on what frequency range you are talking about, and thus on what localization method you are relying on. At high frequencies, in the pinna range, stereo speaker crosstalk is obviously not a benefit. There is no way that this unpredictable pattern of peaks and valleys can enhance localization in a stereo or LCR system. This is true whether spaced or coincident mics are used.
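The peaks and valleys can be sketched with the simplest comb-filter model: at one ear, the direct sound plus the same signal arriving again from the opposite speaker, delayed and attenuated. Nulls fall where the delay equals an odd number of half periods. The 0.26 ms delay (a spherical-head estimate for speakers at plus and minus 30 degrees), the 0.5 crosstalk gain, and the function names are illustrative assumptions; a real head's frequency-dependent shadowing and pinna response make the actual pattern far less regular.

```python
import math


def crosstalk_notches_hz(delay_s: float, f_max_hz: float = 20_000.0) -> list[float]:
    """Hypothetical helper: null frequencies of a direct sound plus a copy delayed
    by delay_s, f = (2k + 1) / (2 * delay_s) for k = 0, 1, 2, ... up to f_max_hz."""
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= f_max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches


def ear_response_db(freq_hz: float, delay_s: float, crosstalk_gain: float) -> float:
    """Hypothetical helper: level at one ear relative to the direct sound alone when
    the crosstalk copy arrives delay_s later, scaled by crosstalk_gain (assumed < 1)."""
    phase = 2.0 * math.pi * freq_hz * delay_s
    re = 1.0 + crosstalk_gain * math.cos(phase)
    im = -crosstalk_gain * math.sin(phase)
    return 20.0 * math.log10(math.hypot(re, im))


DELAY_S = 0.26e-3  # assumed crosstalk delay for a +/-30 degree speaker pair
GAIN = 0.5         # assumed crosstalk level relative to the direct sound

for f in crosstalk_notches_hz(DELAY_S):
    print(f"null near {f:7.0f} Hz")
print(f"comb peak {ear_response_db(1 / DELAY_S, DELAY_S, GAIN):+.1f} dB, "
      f"comb null {ear_response_db(1 / (2 * DELAY_S), DELAY_S, GAIN):+.1f} dB")
```

With these assumptions the first null lands near 1.9 kHz, squarely in the pinna range, with further nulls roughly every 3.8 kHz.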

Stereo crosstalk can cause a phase shift at frequencies below the region where comb filtering predominates. That is, two sine-wave signals with slightly different delays but comparable amplitudes will combine to form a new sine wave with a different amplitude and phase angle. I maintain that the phase part of this change is inaudible from 90 Hz on down, nonexistent for the central 10 degrees, and virtually nonexistent for images at the far right or left, and thus of doubtful audibility in between or in LCR systems. Stereo crosstalk cannot create an ITD for transients captured in coincident-mic recordings, but it can shift the phase of the midbass and low bass. There is, however, no evidence that small phase shifts of this type are audible or affect localization. If spaced mics are used, then there is an ITD, and crosstalk has little deleterious effect but likewise no benefit.
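The phasor arithmetic behind this paragraph fits in a few lines. The sketch below adds a direct sine wave and a delayed, possibly weaker copy of it and reports the resulting amplitude and phase. The 0.26 ms crosstalk delay, the amplitude ratios, and the function name are illustrative assumptions; head shadow is ignored, and the code says nothing by itself about audibility, which is the author's separate claim.

```python
import math


def combine_sines(a1: float, a2: float, freq_hz: float, delay_s: float) -> tuple[float, float]:
    """Hypothetical helper: (amplitude, phase in radians) of
    a1*sin(w*t) + a2*sin(w*(t - delay_s)).

    Two sinusoids of the same frequency sum to a third; its phasor is
    a1 + a2 * exp(-j * w * delay_s).
    """
    phi = 2.0 * math.pi * freq_hz * delay_s
    re = a1 + a2 * math.cos(phi)
    im = -a2 * math.sin(phi)
    return math.hypot(re, im), math.atan2(im, re)


DELAY_S = 0.26e-3  # assumed crosstalk delay, +/-30 degree speakers
for freq in (45.0, 90.0, 300.0, 1000.0):
    for a2 in (1.0, 0.5):
        amp, phase = combine_sines(1.0, a2, freq, DELAY_S)
        print(f"{freq:6.0f} Hz, crosstalk {a2:.1f}: amplitude {amp:.3f}, "
              f"phase {math.degrees(phase):+6.1f} deg")
```

For equal amplitudes the result reduces to an amplitude of 2cos(πfτ) and a phase lag of πfτ, which at 90 Hz is only about 4 degrees relative to the direct sound, rising to nearly 47 degrees by 1 kHz.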

The ILD is a slightly different story. In the low bass, say below 90 Hz, the phase difference between the direct sound and the crosstalk sound is too small (heads are too small) to cause any significant change in phase, and thus in amplitude, at an ear when the two signals are added together. So, regardless of the microphone used, low-bass crosstalk is not the issue. Again, I maintain that the very-low-bass energy at both ears remains almost the same even if the left and right signals differ in amplitude, as in coincident mic'ing. As Blumlein observed, as the frequency goes up, the path-length difference is equivalent to larger phase angles, and so, if there is an amplitude difference between the speakers, the signals will go up at one ear and down at the other as they are combined on each side of the head. Clearly, once the phase shift reaches 90 degrees and beyond, this same crosstalk mechanism becomes detrimental. This boost in midbass separation applies only to phantom stereo images around 15 degrees. In the center there is no crosstalk amplitude asymmetry to take advantage of, and at 30 degrees, where the speakers are, the stereo separation hopefully ensures that the crosstalk has little to add or subtract.
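The "heads are too small" argument is simple arithmetic: the crosstalk's phase lag in degrees is 360 times the frequency times the extra delay, and the extra path is tiny compared with a bass wavelength. The sketch below assumes an illustrative 0.26 ms extra delay for the far speaker (a spherical-head estimate for a plus and minus 30 degree layout) and a speed of sound of 343 m/s; the function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0       # m/s, assumed
CROSSTALK_DELAY_S = 0.26e-3  # assumed extra delay of the far speaker at each ear


def crosstalk_phase_deg(freq_hz: float, delay_s: float = CROSSTALK_DELAY_S) -> float:
    """Hypothetical helper: phase lag of the crosstalk relative to the direct sound, in degrees."""
    return 360.0 * freq_hz * delay_s


def freq_for_phase_hz(phase_deg: float, delay_s: float = CROSSTALK_DELAY_S) -> float:
    """Hypothetical helper: frequency at which the crosstalk lags the direct sound by phase_deg."""
    return phase_deg / (360.0 * delay_s)


extra_path_m = SPEED_OF_SOUND * CROSSTALK_DELAY_S
print(f"extra crosstalk path: {extra_path_m * 100:.1f} cm")
for f in (45, 90, 300, 1000):
    print(f"{f:5d} Hz: wavelength {SPEED_OF_SOUND / f:5.2f} m, "
          f"crosstalk lag {crosstalk_phase_deg(f):5.1f} deg")
print(f"lag reaches 90 deg near {freq_for_phase_hz(90.0):.0f} Hz")
```

With these numbers the extra path is about 9 cm against a 3.8 m wavelength at 90 Hz, a lag of roughly 8 degrees; the lag is about 28 degrees at 300 Hz and reaches 90 degrees a little below 1 kHz.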

If spaced microphones are used, the ILD at low frequencies may be minimal, especially for omnis. But let us assume that above 90 Hz there is a substantial ILD as well as an ITD. In this case the low-frequency effect of the crosstalk phase change is somewhat unpredictable. Again, in the 15-degree region there could be some enhancement of bass separation, but the ITD-induced phase shift could counter this. In summary, crosstalk is really only desirable in the case of coincident-mic stereo recordings, as Blumlein wrote, and only if restricted to frequencies below 300 Hz or so, as I claim.
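For a spaced pair, the inter-channel time difference, and the low-frequency phase difference it implies, follow from the extra path to the farther microphone. A minimal sketch, assuming a distant source (plane-wave arrival), a speed of sound of 343 m/s, and an example 0.5 m omni spacing; all names and values are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def spaced_pair_delay_s(spacing_m: float, source_azimuth_deg: float) -> float:
    """Hypothetical helper: arrival-time difference between two spaced microphones
    for a distant source at source_azimuth_deg from straight ahead (plane wave)."""
    return spacing_m * math.sin(math.radians(source_azimuth_deg)) / SPEED_OF_SOUND


def interchannel_phase_deg(freq_hz: float, delay_s: float) -> float:
    """Hypothetical helper: phase difference between the two channels, in degrees (unwrapped)."""
    return 360.0 * freq_hz * delay_s


SPACING_M = 0.5  # hypothetical omni spacing
for az in (0.0, 15.0, 30.0):
    d = spaced_pair_delay_s(SPACING_M, az)
    print(f"source at {az:4.1f} deg: {d * 1e3:.2f} ms, "
          f"{interchannel_phase_deg(90, d):5.1f} deg at 90 Hz, "
          f"{interchannel_phase_deg(300, d):5.1f} deg at 300 Hz")
```

Under these assumptions a source 15 degrees off-axis already produces an inter-channel delay of about 0.38 ms, or roughly 40 degrees of phase at 300 Hz, which then interacts with the speaker crosstalk in the way this paragraph describes as unpredictable.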

Surround Sound Localization

Let us consider surround sound localization. Obviously, if a mono signal is placed at 110 degrees, it can be localized using pinna, ILD, and ITD cues even when facing forward. Between the two rear surround speakers you effectively have a stereo pair spanning 140 degrees. In such a situation, if there is a lot of high-frequency energy, the pinnae will localize to the speakers, and it will be difficult for some individuals to hear sound directly behind or in the central rear region. (The new rear surround channel can fix this, but the LCR anomalies described above will then apply.) However, if there is a real ITD and a real ILD between the rear speakers, it is theoretically possible to hear a wide stage to the rear, as in the frontal stereo illusion. But the crosstalk, and thus the comb filtering, is extreme at this angle, and it starts at a lower frequency, interfering with the ILD at 800 Hz or lower. If there is an ITD this can help, but then the speakers must be properly placed or delay-adjusted. Obviously, if 140-degree spacing were a good way to make a stereo stage, front or rear, it would have been done this way long before now.
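Why the comb filtering starts lower for the rear pair can be checked roughly with Woodworth's spherical-head ITD formula, a textbook model rather than anything from this text. The sketch assumes an 8.75 cm head radius and 343 m/s, mirrors rear azimuths onto the front (so 110 degrees is treated like 70 degrees, ignoring front-back differences), and places the first comb null at half the inverse of the crosstalk delay; the function name is hypothetical.

```python
import math

HEAD_RADIUS_M = 0.0875  # assumed spherical-head radius
SPEED_OF_SOUND = 343.0  # m/s, assumed


def spherical_head_itd_s(azimuth_deg: float) -> float:
    """Hypothetical helper: Woodworth ITD for a distant source at azimuth_deg
    (0 = front, 180 = rear). Rear azimuths are mirrored to the front, so 110 acts like 70."""
    az = abs(azimuth_deg) % 360.0
    if az > 180.0:
        az = 360.0 - az
    if az > 90.0:
        az = 180.0 - az
    theta = math.radians(az)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


for label, speaker_az in (("front pair, +/-30 deg", 30.0),
                          ("rear surround pair, +/-110 deg", 110.0)):
    delay = spherical_head_itd_s(speaker_az)  # extra delay of the far speaker at each ear
    first_null = 1.0 / (2.0 * delay)
    print(f"{label}: crosstalk delay {delay * 1e3:.2f} ms, first comb null near {first_null:.0f} Hz")
```

In this idealized model the crosstalk delay roughly doubles, from about 0.26 ms to about 0.55 ms, pulling the first null down by roughly an octave, from about 1.9 kHz to about 0.9 kHz. Real head shadowing, pinna effects, and room reflections will move these figures.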

Finally, let us see what happens when we try to image from the front speaker on one side to a speaker at 110 degrees on the same side while facing forward. In the case of the pinnae, the pinna facing the speakers can localize to each speaker discretely if the signals are different. If they are correlated or identical, the brain will use some other cue to localize. There may be some gifted individuals who can localize high-frequency phantoms between the speakers using one pinna, but I can't do it. The higher frequencies also go around the head to produce a head shadow, and this at least allows the brain to decide that the source is on the loud side.

If there is a time difference, then the signals from the two speakers reach the exposed ear canal and add together to produce garbage, and a head-shadowed version of this time garbage also reaches the far ear. Basically, regardless of the recorded time difference, the ITD the brain perceives is always the ITD based on one's ear spacing. This is sufficient to localize to the louder side, but it makes localization between the speakers wishful thinking.

If there is a level difference, then the signals from the two speakers reach the exposed ear canal and add together to produce garbage, and a head-shadowed version of this level-difference garbage also reaches the far ear. Basically, regardless of the recorded level difference, the ILD the brain perceives is always the ILD based on one's head shadow. As above, this is sufficient to localize to the louder side, but it makes localization between the speakers wishful thinking.

That the above scenario is more or less correct is attested to by the fact that the industry keeps adding more speakers to correct these defects. We have the rear center speaker, height speakers, and the 7.1 and 10.2 proposals, etc.