
Audio Engineering Society
Convention Paper

Contrasting ITU 5.1 and Panor-ambiophonic 4.1 Surround Sound Recording Using OCT and Sphere Microphones
Robert E. (Robin) Miller III¹ ©2002

¹ FilmakerStudios, Bethlehem, Pennsylvania 18018, USA

Presented at the 112th Convention
2002 May 10-13 Munich, Germany

This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Page 5


Fig. 16. Two (seated) AES 111th Conv. attendees hear PanAmbio surround at the Ambiophonics Institute. The back speaker pair is silhouetted in front of two gentlemen in back.

For the AES 111th Convention tour of the Ambiophonics Institute in December 2001 (Figure 16), the author prepared two DTS-encoded audio CDs titled PerAmbiolating 360° (pun intended), one in ITU 5.0 and a companion in PanorAmbiophonic 4.0 [13]. A ".1" LFE channel was considered unnecessary for musical demonstrations. The recordings were made in April, September, and October 2001; artists and venues were Lehigh University Opera at Zoellner Center for the Arts, and Martin Guitar Quintet, Satori Flute Quartet, & Mainstreet Brass at FilmakerStudios, Bethlehem PA, USA. Selection numbers in parentheses ( ) below indicate pre-crosstalk-cancelled versions on the PanorAmbiophonic disc, so no special hardware is needed for evaluation -- just temporarily moving four speakers (C unused) of a 5.1 layout. Except for Parade, the comparison PanAmbio and OCT 5.0 recordings were made simultaneously with the OCT and Ambiophone microphones described earlier, with source locations and descriptions of the audible effect upon replay as follows:

1 (&7) Barber of Seville Sitzprobe - 1:58

Recording Angle 120° front, hall back

The first rehearsal with soloists, chorus, and orchestra of a mixed professional/student production. Hall is 9,200 m³ with RT=2.1s and 3.77m (calculated) room radius. Room microphones are side-facing figure-8s back 10m (no delay). A spot microphone for soloists is mixed according to Room-Related Balancing.
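
The calculated room radius quoted above can be checked with the common diffuse-field approximation r ≈ 0.057·sqrt(V/RT), with V in m³ and RT in seconds. The paper does not state which formula was used; the sketch below assumes this approximation and an omnidirectional source, and the function name is illustrative only.

```python
import math


def critical_distance_m(volume_m3: float, rt60_s: float) -> float:
    """Approximate critical distance (room radius) in metres.

    Assumes a diffuse field and an omnidirectional source:
    r ~= 0.057 * sqrt(V / RT60).
    """
    return 0.057 * math.sqrt(volume_m3 / rt60_s)


# Hall figures taken from the text: V = 9,200 m^3, RT = 2.1 s.
hall_radius_m = critical_distance_m(9200.0, 2.1)
print(f"Hall room radius: {hall_radius_m:.2f} m")
```

Under these assumptions the hall figures give about 3.77 m, in agreement with the calculated value quoted in the text.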

In the benchmark PanAmbio 2/2 playback, individual instruments and voices are distinctly localizable and widely spread, nearly equal to the 120° recording angle. The spatial impression is "natural-sounding" with front and rear stage seamlessly integrated, but dependent upon listener taste for the relative back level. In contrast, the ITU 3/2 playback over five identical speakers (2-way, with 10in/250mm woofer) exhibits "commercially acceptable" (some listeners claimed "the best they'd heard") spatial impression and envelopment with plausible localization, albeit across a compressed front stage, 60° L-to-R, but over a much larger and more stable listening area than either PanAmbio or two-speaker stereo.

2 (&8) Lunchbreak at Martin Guitar Blues - 1:59

Quintet 0°, ±30°, ±60°, fans sides & back

This track simulates a jazz club (or "unplugged" telecast) with bluegrass quintet and audience. The studio is 500m³ with modal profile shown in Figure 17, RT=0.31s (controllable, chosen to mimic a performance space), with players in a 120° arc at approximately the measured 3.2m critical distance (room radius). Instruments from left to right are bottle (slide) guitar, acoustic bass guitar, fiddle & vocal, 6-string rhythm guitar, and 12-string guitar & harmonica. Eight fans, positioned as shown in Figure 4, hoot, clap, and clink glasses.

The benchmark PanAmbio 2/2 playback has the effect, astonishing at first, of replacing the listening environment with the recording environment, achieving a remarkably natural "you are there" result -- see Figure 4. In ITU 3/2 playback, the listener is enveloped in a quite plausible club atmosphere, notwithstanding the less precise localization, as shown in Figure 5.

3 (&9) Mozart Wrap-a-Rondo in F - 1:42

Flute quartet ±20°, ±60°, room back

A chamber quartet in the 500m³ studio with modal profile shown in Figure 17, RT=0.31s (controllable, chosen to mimic a recital hall), with players in a 120° arc at the measured 3.2m critical distance (room radius); from left: violin, viola, cello, and flute.

The benchmark PanAmbio 2/2 playback is a bit unsatisfying in its unequal representation of directional (string) and omni-directional (flute) instruments in the live studio, possibly because the system's capability has created higher expectations. In contrast, the ITU 3/2 seems more acceptable in this regard, although the author feels that, in a commercial recording situation, a retake would be indicated, with adjustments to acoustics and positioning. It is included on the evaluation CDs to study these error conditions.

4 (&10) Sousa's Fairest Brass - 2:37

Brass quintet 0°, ±30°, ±60°, room back

Recorded April, 2001, for AES 19th International Surround Conference, June, 2001, in the 500m³ studio with modal profile shown in Figure 17, RT=0.31s (controllable, chosen to mimic a concert stage-house), with players in a 120° arc at approximately the measured 3.2m critical distance (room radius), but with an ORTF room microphone. Instruments from left: 1st Trumpet, French horn, Tuba, Trombone, and 2nd Trumpet.

In benchmark PanAmbio 2/2 replay, the more directional instruments are slightly narrower than their recorded positions across the total 120° stage due to an earlier prototype Ambiophone (larger diameter sphere). The rearward-speaking French horn, as might be expected, is only vaguely correct. In contrast, the ITU 3/2 replay is "commercially present," although images are confined to the 60° front L/C/R speakers. Both envelop the listener with room ambience.

5 (&11) SPL Setup & PerAmbiolating 360° - 4:36

Voice every 15°; quartet ±45°, ±135°

The "Walkabout" was recorded in the 500m³ studio with modal profile shown in Figure 17, RT=0.31s, with the announcer perambiolating (pun intended) the twin baffled sphere microphone array at a radius of 2.5m. To parallel real-world conditions and the recordings above, studio acoustics were adjusted to replicate the stage house of a concert hall, with early reflections <15ms limited to those from horizontal planes (the floor), so their virtual "images" arrive at the same horizontal angle as their direct sound [17].

The benchmark PanAmbio 2/2 replay localizes announcements to the nearest 5° around all 360° with some "fuzziness" near 90° on each side. Accompanying bursts of filtered pink noise are more difficult to locate, but provide data for Figures 10a, b and 11a, b. In contrast, the ITU 3/2 replay exhibits maximum error of 45° (75° each side is solidly reproduced by a speaker at 30°) as is illustrated in Figures 18 & 19a, b. In both systems, the quartet, now surrounding the array at the corners of a square, are difficult to localize for reasons postulated above.

Fig. 17. Lowest 50 eigentone modes of studio where experimental recordings were made. RT=0.31s (controllable).
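
A modal profile of this kind can be approximated for an idealized rigid-walled rectangular room from f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). The studio's dimensions are not given here, so the 12.5m × 10m × 4m room below (500m³) is purely hypothetical, as is the 343 m/s speed of sound. The sketch illustrates the calculation only and does not reproduce the measured studio data.

```python
import math
from itertools import product

SPEED_OF_SOUND_M_S = 343.0  # assumed value (air at roughly 20 degrees C)


def lowest_room_modes(lx_m, ly_m, lz_m, count=50, max_index=12):
    """Return the `count` lowest (frequency_hz, (nx, ny, nz)) modes of an
    idealized rigid-walled rectangular room.

    max_index must be large enough to cover the requested count.
    """
    modes = []
    for nx, ny, nz in product(range(max_index + 1), repeat=3):
        if nx == ny == nz == 0:
            continue  # (0, 0, 0) is not a mode
        freq_hz = (SPEED_OF_SOUND_M_S / 2.0) * math.sqrt(
            (nx / lx_m) ** 2 + (ny / ly_m) ** 2 + (nz / lz_m) ** 2
        )
        modes.append((freq_hz, (nx, ny, nz)))
    modes.sort()
    return modes[:count]


# Hypothetical dimensions only: 12.5 m x 10 m x 4 m = 500 m^3.
for freq_hz, indices in lowest_room_modes(12.5, 10.0, 4.0)[:5]:
    print(f"{freq_hz:6.2f} Hz  mode {indices}")
```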

6 (&12) Marching Bands on Parade - 3:40

Subject Angle 180°; recreated surround

Unlike the others above, this excerpt illustrates "up-producing" surround from a 2-channel stereo field recording, using editing and mixing of original and additional processed tracks such as for film mixing. For ITU 3/2, L/C/R is derived after Gerzon [18]. To evaluate creative potential in post-production, surround is six effects tracks derived from the original stereo, edited and processed to simulate crowd and building echoes. The illusion has been successful with all trained listeners to date.

In benchmark PanAmbio 2/2, the result is plausible envelopment of a listener standing on the sidewalk while bands march by in the street, beginning extreme right and continuing to extreme left, with cheering and building echoes around and behind. Groups of instruments are heard to move smoothly (no perceptible angular distortion) across right-of-center through center to left-of-center to a degree of realism that the listener can readily imagine it. In contrast, ITU 3/2 replay of course is confined to the 60° triangle, but creates a satisfactory illusion nonetheless. In further contrast to traditional two-speaker stereo replay, the ITU 3/2 result exhibits less angular distortion, with no perceptible "hole in the middle."


While these demonstrations involve acoustic sources, the principles apply to ambient popular instrumentation. Expect the recordings to lack the resources, retakes, and approval layers of a commercial release. Risks were taken for purposes of discovery and testing the limits of techniques in order to serve artistic purposes to follow. Even high-end reproduction systems will be tested by the tracks' raw dynamic range; there is significant content at 15Hz in the opera track by an enthusiastic student bass drum player and believer in subwoofers!

Except for Parade, recordings were made with no level compression, effects, or equalization (except filtration for OCT lows). No panned mono spots were used except for the opera principals (Room-Related Balancing using time delays). The sole exception, "Parade of Marching Bands," is a single stereo Ambiophone recording synthesized to PanAmbio and ITU 5.0 surround in post-production. Intended to test "up-mixing" from a simple two-channel field recording, the synthesis comprises six stereo tracks of crowd loop, spot crowd FX, and delayed & low EQ'd "building echoes" to create the surround.

Replay of the ITU 5.1 evaluation disc requires a DVD player and 5-channel receiver capable of decoding DTS to five speakers in the standard ITU-R BS.775 layout. Replay of the PanorAmbiophonic disc requires two pairs of closely spaced speakers at ±10° and ±170° and crosstalk cancellation using DSP, mechanical barriers, or the pre-crosstalk-cancelled cuts 7-12 on the PanAmbio evaluation CD, as summarized in Appendix A.
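
Because the two replay requirements come down to two sets of azimuths, the speaker move can be tabulated. The ITU angles below are the nominal BS.775 positions (±30° front, ±110° surround, with the standard allowing surrounds between 100° and 120°). The pairing of each ITU speaker with a PanAmbio position, and the names used, are assumptions of this sketch.

```python
# Azimuths in degrees; negative = left of the listener, 0 = straight ahead.
ITU_LAYOUT_DEG = {"L": -30, "R": 30, "C": 0, "LS": -110, "RS": 110}

# PanAmbio positions from the text; C is unused. Assumed pairing:
# L/R become the front pair, LS/RS become the back pair.
PANAMBIO_LAYOUT_DEG = {"L": -10, "R": 10, "LS": -170, "RS": 170}


def relocation_deg(itu, panambio):
    """Angular move needed for each speaker reused in the PanAmbio layout."""
    return {name: abs(panambio[name] - itu[name]) for name in panambio}


moves = relocation_deg(ITU_LAYOUT_DEG, PANAMBIO_LAYOUT_DEG)
for name, move in moves.items():
    print(f"{name}: move {move} degrees")
```

With these nominal angles, each front speaker moves 20° inward and each surround speaker moves 60° toward the rear.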

Recording Level Calibration

For any multi-channel production, recording levels are critical and must be maintained (or their changes precisely controlled) in post-production and distribution. In essence, to preserve localization, the record-to-reproduce chain must exhibit constant relative channel levels from instruments to ears. Microphones vary in sensitivity even within the same model, and preamplifiers often have uncalibrated variable gain. Once analog signals reach studio level, usually +4 dBu (ref. 0.775 Vrms), or -15 to -20 dBFS depending upon the digital standard chosen, levels can be preserved by good practice. The comparison recordings above relied on the technique in Appendix B to calibrate multi-channel recording, beginning with acoustic source levels.
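
As a worked illustration of the level arithmetic, the sketch below converts dBu to volts using the 0.775 Vrms reference given above, and computes the trims that would bring a set of channels to a common level. It is not the Appendix B procedure; the per-channel readings and function names are hypothetical.

```python
import math

DBU_REF_VRMS = 0.775  # 0 dBu reference voltage, as stated in the text


def dbu_to_vrms(level_dbu):
    """Convert a level in dBu to RMS volts."""
    return DBU_REF_VRMS * 10.0 ** (level_dbu / 20.0)


def relative_level_db(v_a, v_b):
    """Level of voltage v_a relative to v_b, in dB."""
    return 20.0 * math.log10(v_a / v_b)


def trims_db(measured_db, reference_channel):
    """Gain trim per channel that matches every channel to the reference."""
    ref = measured_db[reference_channel]
    return {ch: ref - level for ch, level in measured_db.items()}


print(f"+4 dBu = {dbu_to_vrms(4.0):.3f} Vrms")

# Hypothetical readings (dBFS) for the same acoustic reference source.
readings = {"L": -18.0, "R": -19.5, "LS": -17.0, "RS": -18.0}
print(trims_db(readings, "L"))
```

Studio level of +4 dBu corresponds to roughly 1.23 Vrms.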

Once tracks are recorded, levels can be preserved or varied in post-production according to artistic choices. For OCT, where omnis have been recorded for bass compensation of the supercardioids, the author has found that in-band gain identity is a good starting point (used in the experiments above) after low-pass filtration at 100Hz. Similarly, room array contribution can begin at identity gain with front channels and then be varied to taste. Note that spot microphones mixed using Room-Related Balancing often contribute sufficiently at several dB below identity with front channels (as demonstrated in the opera experimental recording).
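
The bass-compensation step can be sketched as a low-passed omni summed with its supercardioid at identity (0 dB) gain. The filter type and slope are not specified in the text; the one-pole low-pass, the 48 kHz sample rate, and the function names below are assumptions chosen to keep the example short.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Simple one-pole low-pass filter (assumed; unity gain at DC)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out = []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def bass_compensate(supercardioid, omni, cutoff_hz=100.0,
                    sample_rate_hz=48000.0, omni_gain_db=0.0):
    """Add the low-passed omni to the supercardioid channel.

    omni_gain_db = 0.0 is the identity-gain starting point; it can then
    be varied to taste.
    """
    gain = 10.0 ** (omni_gain_db / 20.0)
    lows = one_pole_lowpass(omni, cutoff_hz, sample_rate_hz)
    return [s + gain * low for s, low in zip(supercardioid, lows)]
```

A steeper filter would normally be preferred in practice; the structure of the mix stays the same.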

Compatible Surround Production

The market for ITU 5.1 surround music seems assured as of this writing, yet enthusiasts exist for whom "compromise" is not to be heard, literally. This niche is typically high-end and would likely be interested in PanAmbio as an alternative for personal listening.

Costs for surround production are higher than for less complex stereo, and would be higher yet providing for two surround formats. Finite dollars, bandwidth, and numbers of channels both for recording and distribution have already led the author to dual-purpose approaches with managed compromises. For AES 19th in Bavaria, June 2001, the author's experimental CD Ambiophonic Surround Sound Demonstration contained front-only Ambio derived from INA/MMA and OCT as well as the Ambiophone sphere. For the ITU 5.1 DTS audio CD, LS and RS channels in the studio were derived from the PanAmbio back sphere. Conversely, several room mic configurations yielded useful PanAmbio LB, RB (similar to the way Ambiophony works with many existing stereo recordings). For compatibility with two-channel stereo, "Parade" derives both 5.1 and PanAmbio from a two-channel field recording, typical of film location effects. PanAmbio pairs "fold" front to perfect stereo, equivalent to a single unbaffled sphere.

New experiments in post-production have yielded an acceptable PanAmbio front stage from OCT, a 5.1 front stage from the PanAmbio front sphere, and surround for 5.1 or PanAmbio by hall convolution. More work is planned to distill these combinations to straightforward procedures.

Compatible Surround Delivery

Facilitating both ITU and PanAmbio listening requires moving four speakers (or switching nine); the home theater's subwoofer, receiver/decoder, and universal DVD player are the same (C unused for PanAmbio). Distribution formats for PanAmbio can be the same DTS-CD, DVD-V, DVD-A, SACD, or multi-channel broadcast using AC-3 (Dolby Digital) of the DTV standard. PanAmbio is unsuitable for large audiences such as the cinema. Requirements for PanAmbio replay are in Appendix A.

Listening Environment

For any surround approach, the listening environment is critical if precise localization is expected -- especially for PanAmbio because its subtle capabilities can be more easily destroyed. Informal evaluations below were made in a control room, a home theater, a large and live demonstration room, and an automobile -- with better results obtained in better acoustics. Generally, the listening room should be symmetrical or acoustically treated and "drier" than the recording venue. See Appendix A.

PanAmbio as a tool for better ITU 5.1

Ultimately, post-production decisions and approval of a final product must be made while monitoring in the delivery format. However, during recording of a 5.1 production, monitoring using Ambiophonic (front only) or PanAmbio surround techniques has demonstrable operational advantages. As a location recording engineer and television "A-1," the author uses a custom portable Ambiophonic monitoring system. Its compactness suits off-base production trucks. Its "you are there" capability transforms the usually tiny, acoustically alien audio booth into the performance space. Subtle panning adjustments of spot microphones, panning errors with respect to main microphones, and phase errors are revealed and can be dealt with quickly in the heat of the session or live telecast. On replay, musical directors can discern individual voices and whether a "natural" blend and impression of hall ambience have been captured. Especially for music recording, there is often less to "fix" in post, lowering costs.


Only the three engineers present at the recording sessions could compare the results with the live events. The need for evaluation of recordings by trained listeners (cf. students often recruited for listening tests [16]) was observed during AES 111th Convention demonstrations, where trained and untrained attendees simultaneously perceived far different source directions.

Initial results show data from several trained auditioners, with the moderator using the form in Figure 18. Future work will require double blind analysis -- the evaluation CDs divulge the angle so they can stand alone -- and use a statistically larger group [19]. 

With ITU 5.1, trained listeners to date report critical front stage localization is compressed angularly in half, but with less angular distortion than two-speaker stereo, including less "hole in the middle," and less nearer-speaker toggling anomaly for listeners off center as shown in Figure 19a, b. When possible, sources might be positioned during recording to compensate for any objectionable "relocation." In contrast, PanAmbio reproduces original directions nearly linearly (to the nearest 5°).
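
The reported halving of the front stage can be written as a simple mapping from recorded to perceived azimuth. The linear form and the clipping outside the recording angle are assumptions made here for illustration; the sketch does not reproduce the listener data of Figure 19.

```python
def itu_front_perceived_deg(source_deg, recording_angle_deg=120.0,
                            speaker_base_deg=60.0):
    """Assumed linear compression of the recorded front stage onto the
    L-to-R speaker base (120 degrees onto 60 degrees by default)."""
    half_recording = recording_angle_deg / 2.0
    clipped = max(-half_recording, min(half_recording, source_deg))
    return clipped * speaker_base_deg / recording_angle_deg


def localization_error_deg(source_deg):
    """Perceived minus recorded azimuth under the assumed mapping."""
    return itu_front_perceived_deg(source_deg) - source_deg


for source in (-60.0, -30.0, 0.0, 30.0, 60.0):
    perceived = itu_front_perceived_deg(source)
    print(f"recorded {source:+6.1f} deg -> perceived {perceived:+6.1f} deg")
```

Under this model a source recorded at ±60° is heard at the ±30° speaker, which is the kind of "relocation" that source positioning during recording would need to compensate.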

Fig. 18. Form for moderators to report where to the nearest 5° trained listeners perceive voiced angles on the ITU 3/2 and PanAmbio 2/2 evaluation DTS-CDs [13]. Listening environment reverberation must be less than recording studio (RT=0.31s). More formal listening tests are planned.

Fig. 19. Perceived localization around a) entire 360° horizontal plane and b) 180° front stage - ITU 3/2 vs. PanAmbio. Listeners reported to the nearest 5° that ITU 3/2 is "ambiguous" at ±90°, ±105°, ±120°, and ±150°. PanAmbio approaches the ideal straight line but is "fuzzy" nearing ±90°.

Note that to different degrees, both ITU and PanAmbio lack focus at the sides, within the "cone of confusion" of human hearing if the listener does not rotate his/her head. 5.1 exhibits spectral "tearing" [2] for phantoms in two 80° sectors between L & LS and between R & RS due to the HRTF of human hearing (rotating the head to ±70° restores side phantoms). Localization is reported "ambiguous" at ±90°, ±105°, ±120°, and ±150°. No front-back confusion was reported for 180° unless the listener is off center, in which case all back phantoms toggle to the nearer of LS or RS, as in two-speaker stereo -- the situation addressed by the center-back channel/speaker of 6.1. Recordists should exercise care placing critical voices in these sectors.

In contrast, PanAmbio suffers ambiguity, coloration, and pinna confusion within two 30° sectors at ±90° (rotating helps confirm direction, but translating off center inhibits crosstalk cancellation). The consensus is that, toward the goal of a natural illusion of spaciousness, envelopment, and localization, PanAmbio is superior for critical personal listening, but ITU 5.1 is the choice for a group.
