Realism in Music Reproduction

Ambiophonics

The fourth approach that I am aware of I have called Ambiophonics. Ambiophonics, which borrows a little from Binaural and still less from Ambisonics, assumes that there are more localization mechanisms than are dreamed of in the previous philosophies and strives to satisfy all of them as far as possible. It also takes the psychoacoustic position that absolute binaural positional accuracy, as opposed to absolute realism, is less vital. Furthermore, this reproduction technology need only be concerned with reproducing staged acoustical musical events, not movies or virtual reality. The advantage of focusing on just one aspect of sonic reality is that this reality is achievable today, is reasonable in cost, and is applicable to existing LPs and CDs.

One basic element of Ambiophonic theory is that rear and side concert-hall ambience should not be recorded, extracted later from a difference signal, or recreated via waveform reconstruction. Instead, the ambient part of the field should be synthesized from real stored concert-hall data, using the new generation of digital signal processors to generate the ambience signals. The variety and accuracy of such synthesized ambient fields are limited only by the skill of programmers and data gatherers and by the speed and size of the computers used. Thus, in time, any desired degree of concert-hall design perfection could be achieved. A library of the world's great halls may be used to fabricate the ambient field, as has already been done with startling success in the JVC XP-A1010. The number of speakers needed for ambience generation need not exceed six or eight (although speaker walls would be optimum), which is comparable to Ambisonics or surround sound. Even more speakers could be used, since this synthesis method is completely scalable and the quality and location of these speakers are not critical.
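The stored-hall idea above amounts to convolving the dry signal with impulse responses measured in a real hall, with one response per ambience speaker. The sketch below shows that operation in plain Python under stated assumptions. The impulse responses are hypothetical toy values, not data from any real hall or from the JVC unit. Direct convolution is used only for clarity; a practical processor would use FFT-based or partitioned convolution for responses lasting a second or more.

```python
def convolve(dry, impulse_response):
    """Direct linear convolution: out[n] = sum over k of dry[k] * ir[n - k]."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def synthesize_ambience(dry, hall_irs):
    """Return one ambience feed per stored impulse response (one per ambience speaker)."""
    return [convolve(dry, ir) for ir in hall_irs]


# Hypothetical, very short "hall" responses for two ambience speakers:
# no direct sound, just a few delayed, attenuated reflections.
HALL_IRS = [
    [0.0, 0.0, 0.3, 0.1],
    [0.0, 0.2, 0.0, 0.05],
]

dry_signal = [1.0, 0.0, 0.5]
for speaker, feed in enumerate(synthesize_ambience(dry_signal, HALL_IRS)):
    print(speaker, feed)
```

Each ambience speaker gets its own stored response. Adding speakers therefore only means adding responses, which is one way to read the claim that the method is scalable.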
Compared with most implementations of other methods using a similar number of speakers, Ambiophonics is usually less limited in the number of listeners who can share the experience at the same time. Fortunately, two to five people can be accommodated by Ambiophonics in several of its practical incarnations.

The other basic tenet of Ambiophonics is similar to one in Ambisonics: to recreate at the listening position an exact replica of the original pressure soundwave. However, Ambiophonics does this by transporting the sound source, stage, and hall to the listening room, rather than transporting a point wavefront to the ears. In other words, Ambiophonics externalizes the binaural effect. As in the binaural case it uses just two recorded channels, but it replaces earphones with two front-stage reproducing loudspeakers and eight or so ambience loudspeakers. Ambiophonics generates stage image widths of up to 120° with an accuracy and realism that far exceed those of any other two-channel reproducing scheme. Although it hardly seems necessary, four channels and four main front loudspeakers can produce a full 180° stage image (see below). I doubt the expense would be worth it, though, since I, for one, have never had a seat at a live performance where the music came from anything approaching a full half circle.

For reasons outlined below, Ambiophonic reproduction does require that the two main front speakers subtend an angle of only about 10° on each side of the listening position. This avoids the kind of pinna angle distortion for central sounds that phantom-image stereo, Ambisonic, or surround-sound speaker placement almost always produces. Ambiophonics also requires that a small, lightweight, sound-absorbing panel be placed on edge, centered in front of the listening position, to prevent the left front speaker signal from reaching the right ear and vice versa.
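The 10° figure can be put in perspective with Woodworth's spherical-head approximation, a common textbook formula: ITD = (a/c)(θ + sin θ). It estimates how much later a speaker's sound reaches the far ear than the near ear. That interval is also the delay of the crosstalk, which the panel blocks, relative to the direct sound. The sketch assumes a head radius a of 8.75 cm and a speed of sound c of 343 m/s. Both are typical values, not measurements, and the formula ignores pinna effects entirely.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Interaural time difference in seconds for a distant source.

    Woodworth's spherical-head approximation, valid for azimuths of 0 to 90 degrees.
    Head radius and speed of sound are assumed typical values.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))


# Ambiophonic front speakers (about 10 degrees) versus conventional stereo (30 degrees).
for angle in (10, 30):
    print(f"{angle:>2} deg: {woodworth_itd(angle) * 1e6:.0f} microseconds")
```

Under these assumptions the script prints about 89 microseconds for 10° and 261 microseconds for 30°. The crosstalk from the closely spaced pair therefore arrives with roughly a third of the delay it has in conventional stereo.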
There are electronic means to accomplish this end (Carver, Lexicon, Cooper-Bauck-Harmon Intl.) and extra-speaker means (Polk, or easily home made), but none of these work as well as a small, inexpensive panel. The Ambiophonic listener is free to rotate his head, rock back and forth, and undulate from side to side without image shift, just as in a concert hall. Most audio enthusiasts imagine that the panel will be objectionable on aesthetic grounds. I certainly wish I could think of a less problematic way to accomplish the same end. In practice, however, one gets used to the panel very quickly and soon wonders why anyone listens without one. The new lightweight materials make it easy to store the panel between sessions or to provide extras for multiple listeners.

The Stereo Dipole, AES Preprint 4463

For those unalterably opposed to using a panel, Ole Kirkeby and Philip A. Nelson of the University of Southampton, with Hareo Hamada of Tokyo Denki University, have developed an electronic version of the panel. They have shown that the ideal speaker spacing for a crosstalk cancellation system, be it mechanical or electronic, is about 10 degrees. They refer to two speakers placed this close together as a "stereo dipole".

The electronic filters required to cancel crosstalk in this arrangement are somewhat easier to design and more effective. At the narrower angle there is little diffraction around the head for the correction signals, so HRTF correction is not critical. Pinna angle distortion of the correction signals is also not a major factor. The crosstalk cancellation can therefore be allowed to operate over the full upper frequency range without restricting the size of the listening area. It also avoids the audible phasiness that afflicts electronic crosstalk cancellation schemes for widely spaced loudspeakers. A simple, low-cost, lightweight panel nevertheless remains the best choice for critical listeners.
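The electronic alternative to the panel can be illustrated with a deliberately simplified recursive canceller. It assumes that the crosstalk reaching each far ear is just an attenuated, delayed copy of the opposite speaker signal. The gain of 0.85 is hypothetical. The delay of 4 samples, about 90 µs at 44.1 kHz, is assumed here for a closely spaced pair. This is not the Kirkeby-Nelson-Hamada filter design, which inverts measured, frequency-dependent head responses. It only shows why each correction signal must itself be corrected in turn.

```python
def cancel_crosstalk(left, right, gain=0.85, delay=4):
    """Recursive crosstalk cancellation (simplified, frequency-independent model).

    Each output subtracts an attenuated, delayed copy of the *other* output,
    so every cancellation signal is itself cancelled at the opposite ear.
    gain and delay are hypothetical values, not measured head data.
    """
    n = len(left)
    out_l = [0.0] * n
    out_r = [0.0] * n
    for i in range(n):
        fb_l = out_r[i - delay] if i >= delay else 0.0
        fb_r = out_l[i - delay] if i >= delay else 0.0
        out_l[i] = left[i] - gain * fb_l
        out_r[i] = right[i] - gain * fb_r
    return out_l, out_r


def simulate_ears(spk_l, spk_r, gain=0.85, delay=4):
    """Toy acoustic model: each ear hears its own speaker plus an attenuated,
    delayed copy of the opposite speaker (no head filtering, no room)."""
    n = len(spk_l)
    ear_l = [spk_l[i] + (gain * spk_r[i - delay] if i >= delay else 0.0) for i in range(n)]
    ear_r = [spk_r[i] + (gain * spk_l[i - delay] if i >= delay else 0.0) for i in range(n)]
    return ear_l, ear_r


# An impulse sent to the left channel only.
left_in = [1.0] + [0.0] * 11
right_in = [0.0] * 12
spk_l, spk_r = cancel_crosstalk(left_in, right_in)
ear_l, ear_r = simulate_ears(spk_l, spk_r)
print("right-ear leakage:", max(abs(x) for x in ear_r))
```

In this toy model the right ear receives no leakage, to within rounding. The speaker signals, however, carry a chain of alternating corrections (−0.85, +0.7225, and so on) that dies away only because the assumed gain is below 1.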
Since Ambiophonics is a binaural-based system, it does not provide the Blumlein loudspeaker crosstalk signal that furnishes the lowest-frequency phase-shift localization cues for recordings made with coincident microphones. To counterbalance this, Ambiophonics, like any crosstalk elimination scheme, is more compatible than standard stereo with the overwhelming majority of non-coincident microphone arrangements. The improvement in HF localization more than compensates for any loss in coincident-mic LF localization. Furthermore, depending on its size and absorbency, the barrier (and even its electronic cousins) loses effectiveness at low frequencies. It thus allows some crosstalk and so restores some LF phase cues for coincident-microphone recordings. One can also move a little further back from the edge of the barrier, or use a smaller panel, when listening to coincident-mic recordings.

As in all realistic systems, room treatment is essential for a good result. I have found that reducing the room reverberation time to less than 0.2 seconds works well in this context. It works especially well in combination with very directional, diffraction-free, point-source front-channel loudspeakers, as once recommended by Malcolm Hawksford in another context.

Other Contrasts Between Ambiophonics and Ambisonics

The really fundamental difference between the two lies in what Ambisonics attempts. It tries to fabricate the exact compressions and rarefactions, including their intensities and directions, at each ear canal. It does this by summing the outputs of a given array of sound emitters whose drive signals must be computed from three microphone signals: the f (front) and s (side) directional velocity signals and the o omnidirectional pressure signal. (Since most readers will not be familiar with the mathematical symbols used in Ambisonics, I will use o instead of w for the omnidirectional signal, f for the front-rear x signal, and s for the left-right side y signal.
We ignore the 'h'eight (z) axis signal here.) In theory, a playback computer could be fast enough to process such a three-channel mic input with the accuracy needed to produce a perfect spherical wavefront of, say, the fifth degree and fourth order up to 15 kHz. Each user would also have to load his personal pinna response curve into this computer to get the correct waveform at the entrance to the ear canal. Each speaker signal would then be convolved with the appropriate direction-dependent pinna function. You would also have to place six or more speakers accurately, enter their delays and polar responses into the computer, and do something about room reflections. The recording medium would still require three discrete channels, so this powerful Ambisonic computer would not do much for non-Ambisonic recordings. So far, by restricting itself to relatively low-frequency waveform synthesis, Ambisonics has been able to function at reasonable cost and please its adherents. It also seems a promising candidate as a standard for 360° surround sound in video or virtual reality applications that require height.

In contrast, Ambiophonics does not try to recreate the exact sound field at the recording site. It aims only for one that could exist, that can be reproduced without generating localization contradictions, and that the brain can accept as real. The stage image heard, or the hall ambience the Ambiophonic computer generates, may not be exactly Carnegie Hall. Nor will it always be as the recording engineer remembers it. But this system is doable now and works well with most existing recordings, even mono ones.
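The following is a minimal first-order horizontal decoder for a regular ring of speakers, showing how speaker drive signals can be computed from o, f, and s. It uses one simple choice among several: each speaker receives a virtual cardioid pointed at its own direction. It assumes o is the plain pressure signal rather than the traditional w scaled by 1/√2. Angles are measured counterclockwise from straight ahead, so s is positive to the left. Real Ambisonic decoders add frequency-dependent shelving and layout compensation, which this sketch omits.

```python
import math


def decode_horizontal(o, f, s, speaker_azimuths_deg):
    """Speaker gains from first-order horizontal o/f/s signals.

    Each speaker gets a virtual cardioid aimed at its own azimuth:
    gain = 0.5 * (o + f*cos(phi) + s*sin(phi)).
    Assumes o is unscaled pressure and azimuth is positive to the left.
    """
    gains = []
    for az in speaker_azimuths_deg:
        phi = math.radians(az)
        gains.append(0.5 * (o + f * math.cos(phi) + s * math.sin(phi)))
    return gains


# A unit plane wave from 60 degrees left, encoded as o = 1, f = cos, s = sin.
source = math.radians(60)
o, f, s = 1.0, math.cos(source), math.sin(source)
hexagon = [0, 60, 120, 180, 240, 300]
gains = decode_horizontal(o, f, s, hexagon)
# The speaker at 60 degrees receives the largest gain; the one at 240 degrees receives about zero.
print([round(g, 3) for g in gains])
```

Every speaker gain depends on that speaker's exact angle, so misplacing a speaker changes the reconstructed field. This is the placement sensitivity that the text contrasts with Ambiophonics.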
This is not to say that designing a stored-ambience convolution computer is a trivial project, or that it will ever be a very low-cost device. But using stored descriptions of existing halls makes the job much easier than starting a synthesis program from scratch. The concert-hall auralization tools already used by architects today could also be applied at once to a consumer product. Also, in contrast to Ambisonics, Ambiophonics does not require precisely known placement of the ambience speakers, and their polar radiation characteristics are not critical. Remember that small changes in any ambient field are equivalent to small changes in the hall's volume, shape, or finishes, a shift in one's seat, or a change in the number of people in the audience.

Psychoacoustic Fundamentals Related To Realism In Reproduced Sound

Our problem is how to achieve realistic sound with the psychoacoustic knowledge at hand or suspected. For starters, separated front loudspeakers can produce centrally located phantom images between themselves. This is a psychoacoustic fluke that has no purpose or counterpart in nature and is a poor substitute for natural frontal localization. Any reproduction method that relies on stimulating phantom images can never achieve realism, even if it achieves localization. This includes not only stereo but most versions of surround sound. Nor can realism be obtained merely by adding surround ambience to phantom localization. Ambisonics, Binaural, and Ambiophonics do not employ the phantom image mechanism to provide front-stage localization, so in theory they should all sound more realistic than stereo, and in fact they do. Ambiophonic microphone arrangements could make this approach to realism even more effective. Still, I am happy to report that Ambiophonics works quite well with most of the microphone setups used in classical music and audiophile-caliber jazz recordings.
Adding home-generated ambience provides the peripheral sound vision that perfects the experience. Our method is simply to give the ears everything they need to get real. It is therefore not essential to prove that the pinnae (and I usually mean this word to include the concha, the head, and the torso) are more important than some other part of the hearing mechanism, but the plain fact is that they are. It seems inconceivable to me that anyone could assume the pinnae are vestigial, or less sensitive in their own frequency domain than the other ear structures are in theirs. For a hunter-gatherer animal, it would be of the utmost importance to sense the direction of a breaking twig, a snake's hiss, an elephant's trumpet, a bird's call, or the rustle of game. It would probably be of less importance to sense the lower-frequency direction of thunder, the sigh of the wind, or drums. The size of the human head clearly shows nature's bias in making humans extra sensitive to sounds over 700 Hz.

Look at your ears. The extreme non-linear complexity of the outer-ear structures and their small dimensions defy mathematical definition. They clearly imply that the ear's exact function is too complex and too individual to understand, much less fool, except in half-baked ways. The convolutions and cavities of the ear are so many and so varied as to make their high-frequency response as jagged as possible and as distinctive a function of the direction of sound incidence as possible. The idea is that the pinnae and head together, or even a single pinna alone, will produce a distinctive pattern no matter what high frequencies a sound consists of or from what direction a transient sound comes. The brain can learn to recognize that pattern and say that this sound comes from over there. The outer ear is essentially a mechanical converter that maps discrete received sound directions to preassigned frequency response patterns.
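The idea that the outer ear maps directions to frequency response patterns can be caricatured as template matching. The brain stores one spectral pattern per direction and reports the direction whose pattern best matches what arrives. The sketch below is a toy under loud assumptions. The four band levels per direction are invented numbers, not pinna measurements. A real listener must also cope with not knowing the source's own spectrum, which this lookup ignores.

```python
# Hypothetical direction-dependent pinna "signatures": level in dB in four
# high-frequency bands. These numbers are invented for illustration only.
PINNA_TEMPLATES = {
    "front": [0.0, 3.0, -6.0, 2.0],
    "side": [2.0, -1.0, 1.0, -8.0],
    "above": [-2.0, 5.0, 4.0, -3.0],
    "behind": [-4.0, -2.0, -7.0, -1.0],
}


def localize(received_db, templates=PINNA_TEMPLATES):
    """Return the direction whose stored pattern is closest (least squared error)."""
    def distance(pattern):
        return sum((r - p) ** 2 for r, p in zip(received_db, pattern))
    return min(templates, key=lambda direction: distance(templates[direction]))


print(localize([0.5, 2.5, -5.0, 1.0]))  # closest to the invented "front" pattern
```

The more jagged and direction-specific the stored patterns are, the farther apart they sit. A noisy received spectrum is then less likely to be matched to the wrong direction, which is the point the text makes about the ear's convolutions.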
There is also no purpose in being able to hear frequencies above, say, 10 kHz if they cannot aid localization. The dimensions of the pinna structures and the measurements by Møller strongly suggest, if not yet prove, that the pinnae do serve this purpose even in the highest octave. Møller's curves of the pinna and head functions versus frequency and direction are so complex that the patterns are largely unresolvable and very difficult to measure on live subjects. Again, it doesn't matter whether we know exactly how anyone's ears work. What matters is that we don't compromise on bandwidth, frequency response, loudness, distortion, and especially source directionality, at all frequencies, during reproduction.

The Evidence For Pinna Localization Priority

The above doesn't mean that we have to ignore all the research that has preceded us. The literature overwhelmingly supports the view that localization of broadband sounds above approximately 1.5 kHz is based on single-pinna, dual-pinnae, and HRTF (Head Related Transfer Function) cues. It also holds that this localization is stronger (more accurate is a better word) than the ear's localization ability below, say, 600 Hz. (In this and my other papers on this subject, I try to use the term HRTF to refer only to head and torso effects that modify sounds before they reach the outer ears.) I believe the references cited below even support the notion that localization accuracy is directly proportional to the frequency of complex, music-like sounds. That goes a long way toward explaining why transient localization is so strong. It also explains the Franssen Effect. There, a complex signal is split into two parts, one the transient and the other the continuing lower-frequency sinusoid, and the sound is localized to the source sounding the transient part. See Blauert, Spatial Hearing. William B.
Snow, in Basic Principles of Stereophonic Sound, 1953, as reprinted in Stereophonic Techniques, states "for impulsive sounds such as speech or clicks, differences as small as 1° or 2° can be perceived." He goes on: "The intensity differences (at the ears) due to diffraction are functions of frequency and cause a complex sound to have a different frequency-intensity composition or quality at each ear. It is undoubtedly this effect which removes ambiguities in direction because the diffraction effects are so complicated that a given quality difference can correspond only to one direction." Unfortunately, Snow never used the word pinna, but he does say "however, in the higher frequency region, intensity differences produced by the diffraction or sound shadow effects of the head and external ears become great enough to give angular localization." But, you say, Snow does not say one mechanism is stronger than another, although his use of the word clicks strongly implies this. Fair enough.

An earlier bit of research in England by James Moir, in the Oct. 1952 issue of Audio Magazine, as reprinted in Stereophonic Techniques, is even more explicit on this point. (See the complete bibliography now on my web site, http://www.ambiophonics.org.) In his Table Two he reports the accuracy of localization as a function of the frequency band of filtered male speech used as a test signal:

- 50 to 500 Hz: average localization error 3.8°
- 500 to 3000 Hz: 0.9°
- 3000 to 7000 Hz (a rather restricted bandwidth): an astonishingly low 0.5°

Moir did not comment on this phenomenon, but his last table entry, for 50 to 7000 Hz wideband speech, shows a slightly greater error of 0.7°. One could infer from this result that, in the presence of sufficient high-frequency localization cues, the lower frequencies just get in the way.
Don Keele Jr., in AES Preprint 2420, Nov. 1986, says "We used wide-band pink noise for the input signal in all carrier tests. An interesting phenomena that we observed, was the breakup of the sound image. Changes in amplitude and delay are effective in shifting the image only at certain frequencies: Up to 700Hz, for delay and greater than 2000Hz, for amplitude with the region between 700Hz and 2000Hz effective for both in combination. At times we would perceive the low frequencies staying at the origin and the high frequencies shifting or vice-versa. The soundfield (with barrier) extends much beyond the typical stereo arrangement of 30° to the left and right, however the goal of a 180° soundfield was not met. The amplitude panned data show that the image shifted in direct proportion to the amplitude differential, out to roughly 50° or 60° off axis. The delay panned data is similar in that an image shift limit is found to occur at roughly the same angles. This image shift limit noted in both amplitude and delay panned data could be due to two possible reasons: Imperfect blocking of the crosstalk signal by the barrier and the effect of the ear's pinna on the frequency response of the received acoustic signal..... In the second case, the barrier-speaker setup generates acoustic signals that always reach the listener coming from directly ahead. The ears are not receiving the correct frequency response cues due to pinna effects, etc. that signals coming from large off-axis angles would have. This means that additional processing to include these effects may be necessary to swing the signals further around to the side."

Don't Tolerate Pinna Privation!

In my own experience, the pinnae clearly outvote the lower-frequency localization senses. Try the experiment outlined at the Ambiophonic web site to test your own pinna power.
The internalization of the binaural sound field reproduced with earphones is another good example of the pinnae riding roughshod over the interaural delay cues. It is true that the binaural image does spread from ear to ear, but is this accuracy realistic? Blauert, in Spatial Hearing, page 49, confirms the everyday observation that excellent broadband localization is possible even for people totally deaf in one ear. One-eared hearing cannot possibly use interaural phase or interaural intensity cues. Thus there exists a strong non-interaural localization mechanism that cannot simply be ignored. Pinna-equalizing boxes such as the Klayman NuReality device can move images freely despite the presence of unaltered low-frequency cues, which also indicates pinna power.

Why is this relevant to surround sound for music or to Ambisonics? If the pinnae are as important as the literature and I suggest, then the reconstruction of the Ambisonic plane wave must be accurate to beyond 10 kHz, and this is probably not achievable. Let me quote from Vanderkooy and Banford, Ambisonic Sound For Us, AES Preprint 4138, Oct. 1995: "the benefits of the Ambisonic system rapidly decline with increasing frequency." Or from Vanderkooy and Lipshitz, AES Preprint 2554, Oct. 1987: "We show that it is only in the low frequency regime, below maybe 700Hz that the spatial region within which an Ambisonic system will reasonably-well reconstruct the traveling wave which would correspond to a real acoustic source, is large enough to encompass the head of a central listener." Similar considerations will likely apply to most of the multichannel recording standards now being proposed, as far as the realistic reproduction of acoustical musical events is concerned.
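The Vanderkooy and Lipshitz remark can be checked against a rule of thumb often quoted for spherical-harmonic (Ambisonic) reproduction. By that rule, an order-N system reconstructs the field reasonably well only within a radius r where kr ≤ N, with k = 2πf/c. The sketch assumes c = 343 m/s and a head radius of 8.75 cm. The rule itself is an approximation, and the exact accuracy criterion varies between authors.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
HEAD_RADIUS_M = 0.0875  # assumed typical adult head radius


def sweet_spot_radius(order, freq_hz, c=SPEED_OF_SOUND):
    """Radius in metres within which an order-N field is roughly accurate (k*r <= N)."""
    return order * c / (2 * math.pi * freq_hz)


def order_needed(radius_m, freq_hz, c=SPEED_OF_SOUND):
    """Smallest whole order N satisfying k*r <= N for this radius and frequency."""
    k = 2 * math.pi * freq_hz / c
    return math.ceil(k * radius_m)


print(f"First order, 700 Hz: r = {sweet_spot_radius(1, 700) * 100:.1f} cm")
print(f"Order to cover the head at 10 kHz: {order_needed(HEAD_RADIUS_M, 10_000)}")
```

Under these assumptions the first-order radius at 700 Hz comes out near 7.8 cm, roughly the size of a head. Covering the same head at 10 kHz would take an order of about 16 (the script rounds up to 17), which illustrates why accuracy beyond 10 kHz is so hard to reach.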
Ambiophonic Recording for Realism

One can heighten the accuracy, if not gild the lily of realism, of an Ambiophonic reproduction system by designing the microphone arrangement around what happens in playback. In playback, the rear-half and side hall ambience will be synthesized, there is no crosstalk, listening-room reflections are minimized, and the front loudspeakers are relatively close together. For political reasons, and as an educational exercise, we can use Ambisonic nomenclature to describe such an arrangement. The sound waves at a given point in space can be completely captured by placing three imaginary microphones simultaneously at that precise spot, again ignoring height.

One of these is a pressure microphone whose omnidirectional output (o) is simply proportional to the instantaneous sum of all the compressions and rarefactions arriving at that point at that moment. A single unobstructed o microphone signal inherently contains no directional information (although it could if it were baffled). The second microphone is a figure-eight (velocity) microphone pointing straight ahead and straight behind. The output of this microphone (f) varies in amplitude with the direction from which the soundwave comes. It declines to zero in cosine fashion as the source moves from directly in front to directly at the side. In other words, a velocity microphone is a direction-to-amplitude encoder. Such a microphone is equally sensitive to sounds coming from the rear, but the polarity of those signals is inverted. The third microphone (s) is a second identical figure-eight microphone, oriented so that it is most sensitive to sounds from the left or right and has zero output for signals dead ahead or dead behind. We will ignore, for the moment, the frequency response and other aberrations of real microphones. We will also ignore the difficulty of making three, or even two, microphones truly coincident by mechanical means alone.
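The three ideal microphones just described can be written down directly for a plane wave arriving from azimuth θ. The o output carries the pressure, f scales it by cos θ, and s scales it by sin θ. The sketch assumes azimuth is measured from straight ahead and is positive toward the left. It leaves o unscaled, whereas traditional B-format recordings scale the omnidirectional w channel by 1/√2. Real microphones, as noted above, only approximate this.

```python
import math


def encode_ofs(sample, azimuth_deg):
    """Ideal coincident o/f/s outputs for a plane wave from azimuth_deg.

    Azimuth is measured from straight ahead, positive toward the left (assumed convention).
    """
    theta = math.radians(azimuth_deg)
    o = sample                    # omnidirectional pressure: no directional information
    f = sample * math.cos(theta)  # front-back figure eight: +1 ahead, 0 at the side, -1 behind
    s = sample * math.sin(theta)  # left-right figure eight: 0 ahead and behind, +1 at the left
    return o, f, s


for az in (0, 45, 90, 180):
    print(az, [round(v, 3) for v in encode_ofs(1.0, az)])
```

The printed rows show the behaviour described in the text. The f output falls from 1 to 0 as the source moves from front to side and reaches −1 directly behind, while o never changes.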
It would be very simple, in theory, to combine Binaural and Ambisonic methods to produce Ambinaural. One would use two head-spaced sets of o, f, s microphones, six recording channels, and one Ambisonic speaker decoder for each ear. Ideally, the playback decoder for each ear would completely determine the sound at that ear. In reality, it is almost impossible to prevent one set of speakers from also reaching the wrong ear, but one could use the Ambiophonic panel to isolate the two sets of speakers. Ambiosonics?

The decoders in this six-channel system would only have to generate a frontal wavefield at one ear from signals in one 90° quadrant, perhaps using three speakers on each frontal side. This is much easier than the general case of an entire circle, since in our case the rear half of the ambient field is synthesized. Since all of the decoded signals would come from more or less the proper directions, the pinna angle distortion problem would not be serious. This is the brute-force, cost-no-object method. Indeed, the recent ARA proposal for DVD audio suggests a six-channel recording format, but they don't have anything like Binaurosonics in mind, yet.