by Ralph Glasgal

Audio Engineering Society
Convention Paper

Spatial Definition and the PanAmbiophone microphone array
for 2D surround & 3D fully periphonic recording

Robert E. (Robin) Miller III ©2004
FilmakerStudios, Bethlehem, Pennsylvania 18018, USA

Presented at the 117th Convention
2004 October 28-31 San Francisco, CA

This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.


Higher sampling rates are necessary for high spectral resolution. However, it is higher angular resolution and precision that preserve source directionality, and with it the tonal/timbral quality of that source. This property is termed spatial definition. In acoustic spaces that are extensions of musical instruments, voices, and sources of sound effects (for movies, virtual reality, and training simulation), tonality is a major contributor to lifelike perception. In audio reproduction, however, lifelike tonality is limited by the recording system. A surround microphone has been developed for two purposes. The first is more precise 2D surround (“PanAmbio”), compatible with ITU 5.1 and stereo. The second is “PerAmbio” 3D (with height), for the ultimate in tonal realism. Both are distributable on ordinary 6-channel media, either for decoderless 2D replay or for 3D replay with a decoder and five additional speakers.


The goal of Ambiophonics [1,2,3,4] is more lifelike reproduction of sound. This ongoing pursuit was spawned by acknowledged problems of reproduction accuracy in stereo and stereo-based surround systems, including ITU 5.1 [5,6]. 5.1 improves on stereo just as stereo improved on monaural. Even so, the situation is far from perfect for critical music or movie listening, or for realistic virtual reality (VR) and training simulation. Nevertheless, it is a “stereo world,” and 5.1 is well accepted, so compatibility matters to users, as does extensibility as their needs change. Beyond higher sampling rate or bit depth, more lifelike sound is the product of other perceptual qualities, termed spatial definition, which is the first subject of this paper.

Second, in developing a microphone for high-spatial-definition recording systems, the objectives were:

  • Lifelike spatial and enveloping sound;
  • Accurate source localization and timbre;
  • Relative ease of use with consistent results and future value of recordings for music, movie, broadcast, VR, & training simulator industries.

1.1. Lifelike, spatial, enveloping sound

We engage in discussion about “high resolution” audio, meaning higher sampling rates that push upward the Nyquist limit of content frequencies and temporal resolution. It may be argued, however, that more precise spatiality is just as important. More precise spatiality yields lifelike envelopment and correct tonality (timbral quality) of sources in acoustic spaces, a quality termed “high spatial resolution” or “high spatial definition.”
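The Nyquist arithmetic behind “higher sampling rate” is simple: the highest frequency a sampled stream can carry without aliasing is half the sampling rate. The minimal Python sketch below makes this concrete. The list of rates is only illustrative.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a PCM stream can carry without aliasing: fs / 2."""
    return sample_rate_hz / 2.0


# Illustrative sampling rates (Hz), from CD up to high-resolution formats.
for fs in (44_100, 48_000, 96_000, 192_000):
    print(f"fs = {fs / 1000:g} kHz -> Nyquist limit = {nyquist_limit_hz(fs) / 1000:g} kHz")
```

Doubling the rate doubles the spectral ceiling. As argued above, though, it adds nothing to the directional (angular) information captured, which is set by the microphone and the channel layout.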

Compared with 2-channel stereo, 5.1 surround developed for home movie theaters offers advantages for music reproduction as well. However, phantom images between 5 or more main speakers are highly imprecise, especially to the sides, even for directions well outside the cone of confusion. In addition, arrivals from above and below are projected onto the horizontal plane of the 5.1 speakers, placing listeners at the center of a circle rather than the sphere of lifelike hearing. High spectral resolution alone cannot compensate for a lack of high spatial resolution in the indirect energy. In ambient recordings, that indirect energy may equal or exceed the direct sound.
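The sketch below illustrates how height is lost when arrivals are “projected onto the horizontal plane.” It is a deliberately idealized geometric model. It assumes a horizontal-only playback layout simply discards each arrival's vertical component. Real 5.1 panning and speaker placement are less tidy. The coordinate convention (x front, y left, z up) is an assumption.

```python
import math


def direction_vector(azimuth_deg: float, elevation_deg: float) -> tuple[float, float, float]:
    """Unit vector for a direction: x = front, y = left, z = up (assumed convention)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def horizontal_projection(azimuth_deg: float, elevation_deg: float) -> tuple[float, float]:
    """Idealized horizontal-only rendering: drop the vertical (z) component.

    Returns (apparent azimuth in degrees, elevation that was discarded, in degrees).
    Directly overhead or underfoot has no defined horizontal direction.
    """
    x, y, z = direction_vector(azimuth_deg, elevation_deg)
    if math.hypot(x, y) < 1e-12:
        raise ValueError("zenith/nadir has no horizontal direction")
    apparent_azimuth = math.degrees(math.atan2(y, x))
    lost_elevation = math.degrees(math.asin(z))
    return apparent_azimuth, lost_elevation


# A frontal arrival, one from 45 deg above and one from 30 deg below all
# collapse onto the same frontal azimuth in this horizontal-only model.
for elev in (0.0, 45.0, -30.0):
    az, lost = horizontal_projection(0.0, elev)
    print(f"elevation {elev:+.0f} deg -> azimuth {az:.1f} deg (height lost: {lost:+.1f} deg)")
```

Distinct points on the sphere map to one point on the circle, so no later processing can recover which arrival came from above.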

Listening taste for recorded music and sounds seems to fall into one of two camps: they are here, or you are there. In the first case, sources are usually recorded using closely placed microphones (one per instrument or section) and kept “dry” (with the venue acoustics inaudible). With this practice, the listening room is often more reverberant than the recording, so any recording space “disappears” and the instrument “appears” inside a speaker box. Even if “artificial” reverberation is added, it is often sufficiently “disembodied” from the source that the source still seems to be in the listening room. Although unnatural, such artificial “intimacy” is the learned hallmark of much popular music. It has become, de facto, the acquired taste of the audience and the market.

The second case, you are there, is the more natural state. Here the reproduction is, to the greatest extent possible, like the experience of being present at the recording. With the original venue's acoustics no longer masked by the listening acoustics, the listener feels “transported” to the concert hall, movie location, or jet fighter cockpit. Rather than every recording sounding like the listening room (boring?), each recording sounds more like its real venue (you get to travel). Assuming constant listening acoustics, the recording engineer can control, more or less, whether the musicians are here or the listener is there by varying the venue spatiality of the recording [7,8]. If the spaciousness of the recording is significantly less than that of the listening room, they are here; if the recording's spaciousness is the greater, you are there. This choice is not best accomplished by superimposing disembodied artificial reverb, however. The recorded spaciousness must come from perfectly matched reverberation (including early reflections). Such reverberation can be created by convolution with actual hall impulse responses, or it can be recorded directly. Direct recording is the work of a spatial microphone technique, which is the latter subject of this paper.
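Convolution with a hall impulse response is ordinary linear convolution: each dry sample launches a scaled copy of the hall's response. The minimal mono sketch below is written in plain Python. `toy_hall_ir` is a hypothetical, synthetic placeholder (a direct-sound spike plus exponentially decaying noise) standing in for a measured response. It does not model any real hall. A surround rendering would presumably need one measured response per playback channel or direction, but that is an assumption, not something specified here.

```python
import math
import random


def convolve(dry: list[float], impulse_response: list[float]) -> list[float]:
    """Direct-form linear convolution: output length = len(dry) + len(ir) - 1."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        if x == 0.0:
            continue  # silent input samples contribute nothing
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def toy_hall_ir(sample_rate_hz: int, rt60_s: float, seed: int = 0) -> list[float]:
    """Hypothetical stand-in for a measured hall impulse response.

    A unit direct-sound spike, then Gaussian noise whose envelope falls by
    60 dB (an amplitude factor of 1000) over rt60_s seconds.
    """
    rng = random.Random(seed)
    n = int(sample_rate_hz * rt60_s)
    decay_per_sample = math.log(1000.0) / (rt60_s * sample_rate_hz)
    ir = [1.0]  # direct sound
    for k in range(1, n):
        # 0.3 is an arbitrary level for the reverberant tail in this toy model
        ir.append(0.3 * rng.gauss(0.0, 1.0) * math.exp(-decay_per_sample * k))
    return ir


fs = 8_000  # low rate keeps this pure-Python example quick
ir = toy_hall_ir(fs, rt60_s=0.5)
dry = [math.sin(2 * math.pi * 440 * t / fs) for t in range(200)]  # 25 ms tone burst
wet = convolve(dry, ir)
print(len(dry), len(ir), len(wet))
```

Production tools would normally use FFT-based (fast) convolution for long responses. The direct form is shown only because it makes the “each sample triggers the hall's response” idea explicit.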

1.2. Accurate localization & timbre

Localization is important for more than direct sound. True, in life it is important to know when a bus is bearing down on you and from what direction, and matching screen direction is important for movies. But in acoustic spaces, preserving the directionality of each echo arrival also preserves the timbre of that arrival. This is because of the unique spectral “coding”, or coloring, that each listener's HRTF (head-related transfer function) imprints upon each direction. Through pinna effects, the brain interprets height as well as horizontal directions by learned association between HRTF-filtered sound and experienced source direction.

Our acuity for height (±10°) is lower than in the horizontal plane (±1°) [9,10]. Even so, when height is included in the timbral mix in our brain, we are at the center of not just a circle, but a sphere. Therefore envelopment and spatial definition must be defined not just circularly, but spherically. Each room reflection arrives at an ever-later time and is colored ever-differently by our pinnae to represent a different direction. Together, the reflections build a composite timbre over the reverberation time of the room. When the source ceases, the sound collapses timbrally in the same complex order of tonal changes, as each reflection ceases over time and direction. Thus musicians in the same room artfully shape each note, phrase, even pause of their performance. Listeners in the same space regard subsequent reproduction as “correct” if this complex tonality-in-time is preserved, which requires that it be preserved in directionality. Any height cues contributed by the listening room are invalid. Therefore accurate recorded localization is key to the lifelike quality of sounds and to the “musicality” of music, and so is key to high spatial definition.

These spatial qualities have been recognized in other advanced audio reproduction systems, such as (3D) Higher Order Ambisonics (HOA) [11,12,13,14] and (2D) Wave Field Synthesis (WFS) [15,16]. However elegant and promising these systems are, their processing and channel-count requirements may make them impractical for home use for some time [17,18]. (By contrast, “PerAmbio 3D/2D,” discussed in 2.1 below, exhibits high spatial definition and is practical today. It requires just 6 channels, distributable on available media (DVD-A, SACD, DTS-ES CD), and 10 speakers for fully periphonic 3D [19,20,21].)

Fig. 1 - The PanAmbiophone recording the Greenwich Village Symphony, New York City.
The experiment used no outrigger or spot microphones, to test ease of use in capturing natural surround.

1.3. Ease of use – production & post

Errors in localizing phantom images were known to stereo's inventor, Alan Blumlein, but were inherent in the very way stereo worked (transforming level differences between the speakers into phase differences at the ears). The work-arounds are well known to recording engineers and are applicable in 5.1 as well. Capturing more accurate spatial signals in the original recording is inherent in the design of the new microphone that is the latter subject of this paper.
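The sketch below makes the “level difference” side of phantom imaging concrete. It uses the tangent law, a widely cited low-frequency model of where an amplitude-panned phantom image appears between a symmetric speaker pair. It is a standard textbook model, not Blumlein's own derivation and not anything from this paper. It assumes a centered listener facing forward. It does not capture the phantom-image errors the text refers to (for example at higher frequencies, or between side speakers).

```python
import math


def phantom_azimuth_deg(gain_left: float, gain_right: float,
                        half_base_deg: float = 30.0) -> float:
    """Tangent-law estimate of phantom-image direction for amplitude panning.

    tan(theta) / tan(theta0) = (gL - gR) / (gL + gR), where theta0 is the
    half-angle of the speaker pair (30 deg for standard +/-30 deg stereo) and
    theta is positive toward the left speaker. Low-frequency, head-fixed model.
    """
    total = gain_left + gain_right
    if total <= 0.0:
        raise ValueError("gains must sum to a positive value")
    ratio = (gain_left - gain_right) / total
    return math.degrees(math.atan(ratio * math.tan(math.radians(half_base_deg))))


for gl, gr in ((1.0, 1.0), (1.0, 0.5), (1.0, 0.0)):
    print(f"gL={gl}, gR={gr} -> phantom image at {phantom_azimuth_deg(gl, gr):.1f} deg")
```

In this model the image lies exactly at a speaker only when the other speaker is silent. Every intermediate direction depends on the listener's ears combining two arrivals, which is why phantom images are more fragile than real sources.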

With stereo's acceptance, recording engineers developed ways to overcome the problems of “hole-in-the-middle” (“bunching” at the speakers) and comb filtering of important central (solo) sounds, using such tools as equalization and spot microphones. As mentioned above, the use of spot microphones has led to the popular taste for they are here. It has become standard practice for music recording and sound reinforcement, despite the added complexity of mixing, especially when done live. However, mixing two or more microphones, each contributing a differently timed replica of the same source, results in complex errors of tonality and imaging (due to comb filtering and time smearing). Advanced multi-microphone mixing techniques such as Room Related Balancing [22] can help mitigate these errors. But recording without spot microphones, or with fewer of them, would greatly simplify production and post-production and avoid this degradation. That is inherent in the approach of the high-spatial-definition main microphone described below.
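The comb filtering mentioned above follows from summing a signal with a delayed replica of itself. The minimal sketch below assumes two pickups of the same source that differ only by a pure delay (set by the path difference) and a broadband gain. Real microphones also differ in polar pattern, off-axis coloration and room pickup, so this is an idealization. The speed of sound and the 3 m example are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at about 20 degrees C


def notch_frequencies_hz(path_difference_m: float, count: int = 5,
                         c: float = SPEED_OF_SOUND_M_S) -> list[float]:
    """Cancellation frequencies when a source reaches two mics with a path difference.

    With delay tau = d / c, the summed replicas cancel where f * tau is an odd
    multiple of 1/2, i.e. f_k = (2k + 1) / (2 * tau).
    """
    tau = path_difference_m / c
    return [(2 * k + 1) / (2 * tau) for k in range(count)]


def summed_level_db(freq_hz: float, path_difference_m: float,
                    relative_gain: float = 1.0, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Level of 1 + g * exp(-j 2 pi f tau): a signal mixed with its delayed copy."""
    tau = path_difference_m / c
    h = 1.0 + relative_gain * cmath.exp(-2j * math.pi * freq_hz * tau)
    return 20.0 * math.log10(max(abs(h), 1e-12))  # floor avoids log10(0) at exact nulls


# Hypothetical case: a spot mic 3 m closer to a soloist than the main microphone.
print([round(f, 1) for f in notch_frequencies_hz(3.0)])
for f in (57.2, 114.3, 171.5):
    print(f"{f:6.1f} Hz: {summed_level_db(f, 3.0, relative_gain=0.7):+.1f} dB")
```

For a 3 m path difference the notches start near 57 Hz (343/6) and repeat about every 114 Hz (343/3), right through the musical range. With the replicas mixed at a relative gain of 0.7, the response swings between 20·log10(1.7) ≈ +4.6 dB and 20·log10(0.3) ≈ -10.5 dB.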

1.4. Future value – compatibility with 5.1

As of this writing, it is still a “stereo world,” although 2D surround reproduction such as ITU 5.1 is becoming more widely accepted, popularized by movies and DVDs. Surround music is a natural potential market for home theaters. Great future value awaits content producers, musicians, and home theater owners who demand a system with backward and forward compatibility, one that can produce stereo, 2D surround, and future 3D surround (with height). The Ambiophonics family embraces compatibility to the greatest extent possible in its pursuit of lifelike 2D and 3D reproduction, as described below.

1.5. Non-audio application

Music and movies for home theater would benefit from the most lifelike reproduction. However, virtual reality (VR) for gaming and simulation for training are other important applications, especially if the system has sufficient accuracy of localization in 3-space (3D, with height), such as PerAmbio 3D [19,20,21, Appendix].

