
Audio Engineering Society
Convention Paper

Scalable Tri-play Recording for Stereo, ITU 5.1/6.1 2D, and Periphonic 3D (with Height) Compatible Surround Sound Reproduction - Page 4
Robert E. (Robin) Miller III, FilmakerStudios, Bethlehem, Pennsylvania 18018, USA


Three elements define the system for delivering 3D surround audio in compatible 2D form:

  1. Microphone array for 3D/2D recording;
  2. 'Shiny Disc' 3D/2D transformation;
  3. Home Theater replay - owner's choice of:
    1. ITU 5.1/6.1 standard layout for 2D replay;
    2. Speaker array & decoder to reconstitute 3D.

This systems approach gives both the producer and the consumer practical choices and added value. The producer decides whether to capture, and when to deliver, a transformed 3D/2D release, which has higher value because it plays today in 5.1/6.1 and in the future in 3D (or may be re-released as 3D). The consumer decides whether to purchase a 3D/2D release and, separately, when to augment a 5.1/6.1 home theater to reconstitute 3D (by adding four or five small speakers and a decoder).

6.1. 3D/2D Microphone Recording Array

PerAmbio 3D/2D uses six microphones feeding six recording channels. A second approach is to record just two microphones and convolve 3D ambience in post or replay [16]. Additional spot microphones are possible. However, when lifelike 3D is replayed, it is evident that spot mics are needed less, if at all. This is also the case with main-microphone techniques for 5.1/6.1 - revealing to many veteran recordists that the need for spot mics in the first place was to fix stereo's shortcomings.


Fig. 3. Early prototype PerAmbio microphone array combines a baffled and pinna-less ellipsoidal head, or Ambiophone, with 1st-order Ambisonics microphone using discrete capsules (B-format). (A back-facing sphere is also shown.)

Fig. 4. PerAmbio 3D/2D microphone array combines a baffled, pinna-less ellipsoid Ambiophone built by the author with a 1st-order (B-format) Ambisonics microphone mounted atop it, realized using high-quality discrete capsules. Note that the transducers of the Ambiophone are at human ear positions.

Figures 3 & 4 show a PerAmbio 3D microphone array. The front stage, to be reproduced using a crosstalk cancelled Ambiopole pair, uses a baffled and pinna-less sphere [19], ellipsoid [20] or Ambiophone. Envelopment is recorded as 1st order B-format (W, X, Y, Z channels) using a Soundfield or discrete microphones (shown).
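As a rough illustration of what the four B-format channels carry, the sketch below encodes a single mono sample arriving from a given direction. It assumes the traditional first-order convention (W scaled by 1/sqrt(2), X front, Y left, Z up); the function name `encode_b_format` is hypothetical and is not taken from the paper.

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: W is the omnidirectional component scaled by
    1/sqrt(2); X points front, Y points left, Z points up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return w, x, y, z

# A source straight ahead excites only W and X;
# a source directly overhead excites only W and Z.
front = encode_b_format(1.0, 0.0, 0.0)
overhead = encode_b_format(1.0, 0.0, 90.0)
```

The Z channel is what carries height, which is why a purely horizontal layout cannot reproduce it without some transformation.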

As with any recording, it is important to position the array to taste by monitoring in both 3D and 2D. Spot microphones may be mixed and panned within front channels FL, FR as they do not affect 3D reconstitution, although panning through the center produces non-binaural phantom images. A hard C channel results from the B-format transform and contributes a stable center image when moving around the listening space, as with 5.1/6.1.

Alternatively, B-format ambience can be sourced by 3D auralization, convolved during post-production or by the user from a library of 3D hall impulse responses. Ambisonics is summarized in Appendix B. In Appendix A, Ambiophonics after Glasgal et al. [11, 12, 13, 14, 15, 16] contributes a 120-degree wide stage with ±5-degree imaging accuracy using a closely spaced "ambiodipole" speaker pair and RACE crosstalk cancellation.
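Convolving ambience from a library of hall impulse responses can be pictured with the minimal sketch below. It assumes one impulse response per B-format channel; the dictionary layout, the function names, and the toy impulse-response values are all invented for illustration and do not come from any real hall measurement.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def auralize_b_format(dry, hall_irs):
    """Convolve one dry signal with a B-format set of impulse responses.

    hall_irs maps a channel name ("W", "X", "Y", "Z") to that channel's
    impulse response (a hypothetical data layout).
    """
    return {ch: convolve(dry, ir) for ch, ir in hall_irs.items()}

# Toy impulse responses, invented for illustration only.
toy_irs = {
    "W": [1.0, 0.5],
    "X": [1.0, 0.0],
    "Y": [0.0, 0.25],
    "Z": [0.0, 0.125],
}
ambience = auralize_b_format([1.0, 2.0], toy_irs)
```

A practical implementation would use FFT-based (partitioned) convolution, since real hall responses run to several seconds of samples.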

6.2. Transformed 3D Tri-play 'Shiny Disc'

Although it would be trivial to simply deliver the six native PerAmbio channels instead of standard ITU 6.0 channels, doing so would repurpose these channels entirely. Instead, a lossless algebraic matrix transforms the six 3D channels so that they play compatibly, without decoding, flattened to standard ITU 5.1 or 6.1 horizontal 2D surround speaker layouts (Fig. 5 & 6). Alternatively, the disc can be decoded to PerAmbio 3D full-sphere surround by recovering the Ambisonic B-format plus a binaural-based Ambiophonic front stage (Fig. 7). The PerAmbio 5.1/6.1 transformation is linear and bi-directionally lossless. Users can implement full-sphere 3D surround by adding a matrix decoder and 4 or 5 speakers.

Fig. 5. Recording perspective using Transform mode i, which encodes 3D for direct replay in ITU 5.1/6.1 (see Fig. 8a, b) and future 3D reconstitution.

Fig. 6. PerAmbio replay in 2D flattens to ITU 5.1/6.1 as speakers lie in the horizontal plane. Only listening room reflections contribute height, but these are invalid, so localization and spatial sense are as in ITU 5.1/6.1. Tri-play PerAmbio 3D/2D recordings deliver 5.1/6.1 surround or stereo until the user implements 3D.

Fig. 7. Implementation in 3D (bi-square shown) preserves correct directionality using 10 speakers + subwoofer(s). For serious music listening, all sit in the median plane. For 5.1, viewers move back 26% of the speaker diameter, where angles match ITU standards (see Fig. 14).

6.3. 3D/2D Recording Transform Modes

If metadata permitted, the recording engineer could have available all 80 combinations considered for encoding 3D directionality into 6 full-range ITU compatible media channels for direct replay in 5.1/6.1. For 3D replay, decoding corresponding to the recording mode is implemented in DSP. It is also possible for users to download new matrices via the Internet.

The author has identified [7] six useful modes for music recording, cinema ambience, and multichannel broadcast (Fig. 8a, b). Work is ongoing to refine these selections. A mode chosen during recording may be changed in post-production, or by a user with a smart decoder, reconstituting original channels and making a new transformation. Changing the tilt of a raised (suspended) microphone is also easily done. In DVD-A mastering, a flag is set in metadata of the tri-play 3D/2D disc for automatic selection by replay equipment.

Fig. 8a. Elevations of PerAmbio Transformations i, j, & k to ITU 5.1/6.1 of amphitheater, concert with soloist & audience, and arena. Source is right.

Fig. 8b. Elevations of tilted modes i′, j′, & k′ that transform PerAmbio 3D to ITU 5.1/6.1 for opera, drama, and organ behind choir (source is right).

For ease of use, mnemonics describe the three basic modes, i, j, & k, in terms of channels C, SC, SL, & SR: "i" represents C and SC inclined upward while SL & SR incline downward. "j" juxtaposes these pairs. "k" lying on its back has C and SC angling upward from the corner channels, which lie flat.

Three tilted variants i′, j′, & k′ rotate C, SC, SL, and SR with respect to L, R by any practical angle, e.g. ~30 degrees, in order to raise the microphone (suspended or on a high stand). The output of the baffled Ambiophone varies only slightly with height incidence, so physical tilting is inconsequential for the FL, FR channels. The same would apply were a sphere or ORTF microphone used for FL, FR.
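One way to picture the tilt is as a rotation of the first-order B-format signals about the left-right axis, which leaves W and Y untouched and mixes X with Z. The sketch below assumes that reading and an arbitrary sign convention (positive tilt raises front-arriving sound); it is not the author's published matrix, and `tilt_b_format` is a hypothetical name.

```python
import math

def tilt_b_format(w, x, y, z, tilt_deg):
    """Rotate a first-order B-format sample about the left-right (Y) axis.

    Positive tilt_deg moves front-arriving sound upward (the sign is an
    assumed convention). W and Y are unchanged by this rotation.
    """
    t = math.radians(tilt_deg)
    x_t = x * math.cos(t) - z * math.sin(t)
    z_t = x * math.sin(t) + z * math.cos(t)
    return w, x_t, y, z_t

# Tilt a front-arriving component by 30 degrees, then undo the tilt.
tilted = tilt_b_format(0.7071, 1.0, 0.0, 0.0, 30.0)
restored = tilt_b_format(*tilted, -30.0)
```

Because a rotation is undone by the opposite rotation, a tilt applied at recording can be reversed or changed later without loss.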

From experience, recording engineers might identify applications described below for each of the six modes (keeping in mind they can be changed in post or replay):

i The microphone array is placed at source level (L, R), below acoustic shell reflections (C), e.g. an outdoor amphitheater event, with audience low and behind (SL, SR) and raked upward (SC).

i′ The array is on a high stand or hanging in an opera house or symphony hall, the orchestra widely spaced in a pit or strings downstage (L, R), singers or winds upstage (C), hall ambience back (SL, SR) & up (SC).

j The array is more closely placed before a small ensemble at source level for direct sound and early floor and sidewall reflections (L, R), higher direct solo and ceiling reflections (C), and hall ambience from back-up (SL, SR) and back-down (SC).

j′ The array hangs closer to a proscenium to pick up downstage sounds (L, R), upstage drama (C), high-back ambience (SL, SR), and audience (SC).

k The microphone array is in an arena with sports play-action or musical instruments at microphone level (L, R), and with good high-front (C) and back (SC) crowd sounds or ceiling ambience.

k′ The array is suspended in a cathedral with upstage choir (C) and front-of-church organ divisions and floor reflections (L, R), antiphonal and congregation in back (SL, SR), and organ trumpet overhead (SC).

6.4. Tri-play 3D/2D Transformation

After recording six PerAmbio 3D channels, given as {Pin} in 6x1 matrix form, a transformation matrix {S} for standard 5.1/6.1 surround is applied to obtain the 6 ITU-compatible media channels {Sout} as follows:

{Sout} = {S} {Pin}

For a standard ITU home theater surround system, a multichannel disc (6 discrete channel DVD-A, SACD, or DTS-CD/DVD-V) plays {Sout} directly in 5.1/6.1. If the speaker layout is 5.1, current implementations sum SC information into SL and SR speaker feeds at ~3dB.
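The 5.1 fold-down of the SC channel might be sketched as follows. The source gives the level only as "~3dB"; the sketch assumes this means SC is attenuated by about 3 dB before being added to each surround feed, and the function name and default parameter are hypothetical.

```python
def fold_sc_into_surrounds(sl, sr, sc, attenuation_db=3.0):
    """Return (SL, SR) speaker feeds for a 5.1 layout with no SC speaker.

    attenuation_db is an assumed reading of the source's "~3dB".
    """
    gain = 10.0 ** (-attenuation_db / 20.0)  # 3 dB -> about 0.708
    return sl + gain * sc, sr + gain * sc

# One sample per channel; values are arbitrary.
sl_feed, sr_feed = fold_sc_into_surrounds(0.2, -0.1, 1.0)
```

An attenuation of about 3 dB in each of two speakers keeps the summed acoustic power of the SC signal roughly equal to what a single SC speaker would have produced.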

When the user augments his/her system for fully periphonic 3D, a reconstitution matrix {P} is applied, implemented in DSP in response to flags in metadata that select one of six recording modes to recover losslessly PerAmbio 3D in matrix form {Pout} as follows:

{Pout} = {P} x {Sout}

Since matrix {P} is the inverse of matrix {S},

{Pout} = {S}^-1 x {Sout}
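The round trip can be checked numerically with a small sketch. The 4x4 matrix `S` below is an invented invertible example, not one of the author's transformation matrices, and the helper names are hypothetical; the point is only that applying {S} and then its inverse returns the original channels.

```python
def mat_vec(m, v):
    """Multiply matrix m (list of rows) by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular, so not reversible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Invented invertible matrix, for illustration only.
S = [[0.5, 0.5, 0.5, 0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, 0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5]]
P = invert(S)                  # {P} = {S}^-1

p_in = [0.3, -0.2, 0.5, 0.1]   # one sample of each input channel
s_out = mat_vec(S, p_in)       # media channels
p_out = mat_vec(P, s_out)      # reconstituted channels
```

In floating point the recovered values match the inputs to within rounding error; the transformation itself discards no information as long as {S} is invertible.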

PerAmbio 3D reconstitution is lossless if {Pout} = {Pin}. Experiments [7] have led to simplifying matrices to 4x4, shown in Fig. 9 & 10 for six transformation modes. For reconstitution, Fig. 11 & 12 also show signal-to-noise ratio loss of 1.4 to 3dB, a cost of less than 1 bit in 24.
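The "less than 1 bit" figure follows from the usual rule that each bit of word length is worth 20 log10(2), or about 6.02 dB, of dynamic range. A short check, with a hypothetical helper name:

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.02 dB per bit of word length

def snr_loss_in_bits(loss_db):
    """Express a signal-to-noise loss in dB as an equivalent number of bits."""
    return loss_db / DB_PER_BIT

low = snr_loss_in_bits(1.4)   # about 0.23 bit
high = snr_loss_in_bits(3.0)  # about 0.50 bit
```

Both ends of the quoted 1.4 to 3 dB range therefore cost about half a bit or less out of 24.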

Fig. 9. 4x4 matrix {S} transforms six-channel PerAmbio 3D into one of three modes i, j, or k for ITU 5.1/6.1 replay without a decoder, or with a converter back to PerAmbio 3D. Selection is reversible in post or replay.

Fig. 10. 4x4 matrix {S} transforms six PerAmbio 3D channels into one of three tilted modes i′, j′, or k′ for standard ITU 5.1/6.1 replay without a decoder (30° tilt shown). Selection is reversible in post or replay.

6.5. 3D/2D Mastering & Release

Are the efforts of true 3D reproduction justified by subjective results? So far in its development, the answer regarding PerAmbio 3D/2D is yes; its life-like, natural sound has been described by attendees at demonstrations (recording engineers and musicians; see comments below) as "blowing away the walls [of the listening space]" and transporting them to the recording venue.

As with any new technique, recording engineers will find that some relearning is necessary. Microphone positioning, choice of Transformation mode (changeable in post), and use of (need for) spot microphones have been mentioned above. Work is ongoing to refine how the microphone array is used. Generally, channel level calibration is similar to 5.1/6.1.

Unlike 5.1/6.1, the timbral robustness of PerAmbio 3D/2D varies less with recording venue. For example, in reverberant recordings, coloration of stage (direct) sound is less prominent because the diffuse sound captured in 3D has more influence [21].

While correct stage localization is contributed partly by Ambisonics, it is the Ambiophonic contribution that dominates, but only for those listeners on the median plane. Ambiophonics degrades away from it, but there, overall timbre is largely unaffected due to Ambisonics' predominance, so listeners off median do not feel deprived. For them, the listening area is actually a larger proportion of the listening space than for 5.1/6.1 since any one surround speaker is operating at lower level and therefore may be more closely approached before it is localizable. Timbre and spatial effects are quite listenable even outside the speaker sphere.

Transformation may in the future be implemented in a hardware encoder or audio workstation software, as currently implemented by the author. Similar to common practice, monitoring in stereo, 5.1/6.1, and 3D will reveal the need for changing the transformation mode, adjusting relative levels of stage vs. ambience components, and other compensations.

After mastering, the tri-play recording can be released on DVD-A, SACD, or DTS-ES Discrete CD or DVD-V. DVD-A standards are met because the six channels are the speaker feeds. Six-channel broadcast, e.g. for HDTV, is possible when AAC is implemented.
