
Ambiophonic Principles for the Recording and Reproduction of Surround Sound for Music - Part 6

Angelo Farina, Ralph Glasgal, Enrico Armelloni, Anders Torger


In most cases, the stereo recordings provided on commercial CDs carry good spatial information about the positions of the sound sources and their direct sound field, but very little "ambience" information is encoded in the source material. In fact, in traditional stereo reproduction, it is quite annoying to hear, superimposed on the direct sound coming from the front, the discrete reflections and reverberation that, in a good concert hall, would arrive from the sides, above and behind the listener.

So sound engineers tend to place their microphones very close to the sound source, and to shield them from "annoying" reflections and reverb coming from the back of the room. A realistic replication of a music listening experience requires that the whole three-dimensional sound space be reconstructed in the reproduction space.

This reconstruction can be obtained with the Ambisonics technique, although with very limited definition in the localization of the direct sound source. The basis for the Ambisonics method is the description of the spatial properties of the sound field by means of the B-format signal: it is a 4-channel signal, obtained from the sound pressure captured by an omnidirectional microphone (called W) and from the three first-order spherical harmonics of the sound pressure field, corresponding to the responses of three figure-of-eight microphones aligned with the axes of a 3D Cartesian reference system (called X, Y and Z).

It must be clear here that the inventors of the Ambisonics technique [2,18] did not know anything about modern energy analysis of sound fields [19, 20, 3], although their very original theories were in a certain sense anticipatory of these subsequent modern developments. In practice, the spherical harmonics of the sound pressure field were easily confused with the Cartesian components of the particle velocity vector, as they are coincident, following Euler's equation, in the case of plane, progressive waves. In a generic sound field, though, pressure and velocity exhibit significant phase and gain mismatch, and these quantities should not be confused at present.

This is not a problem for "synthetic" B-format signals, obtained by panning a single mono track with proper gains, computed simply from the cosines of the angles between the intended direction of the sound and the three Cartesian axes; it is, however, a serious problem for signals captured with a "true" sound intensity probe, which captures the real physical quantities (pressure and particle velocity components). The widely employed Soundfield microphone has a strange, intermediate behavior: it is close to a true sound intensity probe when the wavefront has little curvature, but deviates from it in cases of strong curvature. However, it does not achieve the theoretical behavior of a pure cosine-weighted pressure microphone.
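The synthetic panning described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `encode_bformat`, the axis convention (X forward, Y left, Z up, azimuth counter-clockwise) and the 1/sqrt(2) gain on W (a common B-format convention) are assumptions not stated in the text.

```python
import math

def encode_bformat(mono, azimuth_deg, elevation_deg, w_gain=1.0 / math.sqrt(2.0)):
    """Pan a mono signal into a synthetic first-order B-format signal.

    Returns four lists (W, X, Y, Z). The X, Y, Z gains are the cosines of
    the angles between the intended direction and the three Cartesian axes.
    w_gain is an assumed convention; pass 1.0 for an unscaled pressure channel.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Direction cosines of the intended arrival direction.
    gx = math.cos(az) * math.cos(el)
    gy = math.sin(az) * math.cos(el)
    gz = math.sin(el)
    w = [w_gain * s for s in mono]
    x = [gx * s for s in mono]
    y = [gy * s for s in mono]
    z = [gz * s for s in mono]
    return w, x, y, z

# Example: a source panned hard left in the horizontal plane.
w, x, y, z = encode_bformat([1.0, -0.5], azimuth_deg=90.0, elevation_deg=0.0)
```

Because the gains are pure direction cosines, such a signal behaves exactly like the ideal plane-wave case, which is why the pressure/velocity mismatch discussed above does not arise for it.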

In a generic, reactive sound field, deriving accurate three-dimensional information from measurements made with a Soundfield microphone is not an easy task. Although they do not correspond to the definition of a B-format signal, true pressure-velocity recordings obtained with a sound intensity probe are simpler to process, because the relationship between these physical quantities is mathematically known. The signals coming out of a Soundfield microphone are not so easily interpreted, because the published theory describing its behavior [18] is valid only for plane, progressive waves.

The process described here for creating a three-dimensional soundfield surrounding the listener is called "Virtual Ambisonics" because it is not based on native B-format recordings, but on B-format signals reconstructed by convolution of the original stereo recordings with B-format impulse responses.

Because the B-format signal must be "decoded" to feed a three-dimensional loudspeaker array, and because this decoding can itself be implemented as a convolution with a set of suitable decoding filters [21], the two convolutions can be merged into a single one: a set of three-dimensional IRs can be derived and applied directly, by convolution, to the original stereo recording, and the results used to drive the loudspeakers of the reproduction array. Each combined filter can be seen as the impulse response of a virtual microphone with strong directivity, pointing in the direction (relative to the listener) of the loudspeaker being considered, when the sound field in the original concert hall is produced by a sound source located on the stage. As this surround methodology need not be as accurate as the binaural one, just two positions of the sound source on the stage can be considered, as shown in fig. 2, corresponding to generic "L" and "R" positions. Thus, for each loudspeaker in the reproduction space, two "3D Impulse Responses" are defined, named sL,3D and sR,3D respectively: the speaker feed is the sum of the two original signals, each convolved with its respective filter.
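The merging of the two convolutions, and the final speaker feed, can be sketched as below. This is an illustrative sketch under stated assumptions: the helper names (`convolve`, `add`, `combined_filter`, `speaker_feed`) are hypothetical, the decoding filters for one loudspeaker are assumed to be given as one filter per B-format channel, and a slow direct-form convolution is used for clarity where a real system would use a fast (FFT-based) one.

```python
def convolve(a, b):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * max(len(a) + len(b) - 1, 0)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def add(a, b):
    """Sample-wise sum of two sequences, zero-padding the shorter one."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0)
            for i in range(n)]

def combined_filter(bformat_ir, decoder_irs):
    """Fold one stage position's B-format IRs (W, X, Y, Z) and one
    loudspeaker's decoding filters (assumed: one per channel) into a
    single "3D impulse response" for that loudspeaker."""
    total = []
    for room_ir, dec_ir in zip(bformat_ir, decoder_irs):
        total = add(total, convolve(room_ir, dec_ir))
    return total

def speaker_feed(left, right, s_l_3d, s_r_3d):
    """Feed for one loudspeaker: left * sL,3D + right * sR,3D."""
    return add(convolve(left, s_l_3d), convolve(right, s_r_3d))

# Toy data: 2-tap room IRs and single-tap (pure gain) decoding filters.
room_l = [[1.0, 0.5], [0.5, 0.0], [0.0, 0.25], [0.0, 0.0]]
decoder = [[0.5], [0.5], [0.0], [0.0]]
s_l = combined_filter(room_l, decoder)
feed = speaker_feed([1.0, 2.0], [0.0, 0.0], s_l, s_l)
```

Since convolution is associative and distributes over addition, filtering the stereo signal with the combined IR gives the same result as encoding to B-format and then decoding, at the cost of a single convolution per input channel and loudspeaker.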

4.1 Measurement of 3D Impulse Responses in theatres and concert halls

M. Gerzon [22] first proposed starting a systematic collection of 3D impulse responses measured in ancient theatres and concert halls, to assess their acoustical behavior and preserve it for posterity. His proposal met with a sympathetic response only very recently, with the publication of the "Charta of Ferrara" [23] and the birth of an international group of researchers who agreed on the experimental methodology for collecting these measurements [24].

Only a small number of theatres have received a complete three-dimensional impulse response characterization so far. Among them, for the present work we employed the IRs measured in three Italian theatres:

- Gran Teatro La Scala in Milan
- Teatro Comunale in Ferrara
- Teatro Verdi in Trieste

The following table reports the main technical data regarding the measurement technique employed in each of these three rooms:


| | La Scala Milano | Comunale Ferrara | Verdi Trieste |
|---|---|---|---|
| | | Look Line | Look Line |
| Excitation signal | MLS order 16 | Log sweep 5 s | Log sweep 15 s |
| Microphone | 7 stacked positions of a B&K ½" type 4166 | 3D sound intensity probe (B&K type WA0447) | Soundfield MK-V |
| Sound board | | Echo Layla | Echo Layla |
| Sampling rate | 60606 Hz | 44100 Hz | 44100 Hz |

The measured three-dimensional impulse responses of these three theatres can be downloaded from: HTTP://pcangelo.eng.unipr.it/public/AES19.

Figs. 18, 19 and 20 show, for each theatre, a schematic plan of the room with the positions of the sound sources and of the microphone.

Fig. 18 Plan of La Scala in Milan

Fig. 19 Plan of Teatro Comunale in Ferrara

Fig. 20 Plan of T. Verdi in Trieste

It must be noted that La Scala was in opera configuration, but the other two were in concert configuration, with a reflective orchestra shell mounted on the stage.

Some details are required regarding the three different kinds of microphones employed in these three rooms. In La Scala, a "virtual" 7-omnis microphonic array was employed [25], obtained by moving a single omnidirectional pressure microphone (B&K type 4166) into 7 close positions, and measuring a separate impulse response at each of them.

Fig. 21 7-omnis microphonic array

The geometry of the array is shown in fig. 21. From these 7 IRs, it is easy to extract 4 processed IRs, the first being simply the pressure response in the central microphone, and the other three being the particle velocity components along the three axes computed by means of the classic Euler's relationship, with the finite differences approximation commonly employed in sound intensity analyzers:
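A minimal sketch of that finite-difference step, in its standard textbook form: Euler's relationship, rho0 * dv/dt = -dp/dx, is approximated by replacing the pressure gradient with the difference between two closely spaced pressure signals divided by their spacing, and integrating over time. The function names, the air density value, the rectangular-rule integration and the assumption that the 7 positions form one central point plus one pair per axis are illustrative choices, not taken from the source.

```python
RHO0 = 1.2  # assumed air density in kg/m^3

def velocity_component(p_plus, p_minus, spacing_m, sample_rate_hz, rho0=RHO0):
    """Particle velocity along one axis from two pressure IRs.

    v(t) = -(1 / (rho0 * d)) * integral of (p_plus - p_minus) dt,
    where d is the distance between the two microphone positions.
    The integral is a running sum (rectangular rule).
    """
    dt = 1.0 / sample_rate_hz
    scale = -dt / (rho0 * spacing_m)
    velocity = []
    running_sum = 0.0
    for a, b in zip(p_plus, p_minus):
        running_sum += a - b
        velocity.append(scale * running_sum)
    return velocity

def pressure_velocity_irs(center, x_pair, y_pair, z_pair,
                          spacing_m, sample_rate_hz, rho0=RHO0):
    """Return (p, vx, vy, vz) from a central IR and three (plus, minus) pairs."""
    vx = velocity_component(x_pair[0], x_pair[1], spacing_m, sample_rate_hz, rho0)
    vy = velocity_component(y_pair[0], y_pair[1], spacing_m, sample_rate_hz, rho0)
    vz = velocity_component(z_pair[0], z_pair[1], spacing_m, sample_rate_hz, rho0)
    return list(center), vx, vy, vz
```

The spacing `spacing_m` is a hypothetical parameter here; the actual distances are those of the array geometry in fig. 21. The approximation holds only while the spacing is small compared with the wavelength.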

The same approach is employed for deriving pressure and velocity components from the 6 IRs measured at the Teatro Comunale in Ferrara by means of the three-dimensional sound intensity probe B&K type WA0447, which is shown in fig. 22.

Fig. 22 B&K type WA0447 sound intensity probe

Fig. 23 The Soundfield MK-V microphone

In this case there is no central pressure microphone, so the pressure signal has to be derived as the arithmetic mean of the 6 signals measured around the central virtual position. It must be noted, however, that this introduces some minor artifacts: the 6 signals summed together are not perfectly coherent, which causes some high-frequency amplitude fluctuation and a certain degree of smearing in the time domain.
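The averaging step, and the smearing it can cause, can be illustrated with a small sketch. The function name `mean_pressure` and the toy signals are hypothetical; the example only shows the principle, not measured data.

```python
def mean_pressure(signals):
    """Arithmetic mean, sample by sample, of several pressure signals."""
    count = len(signals)
    length = min(len(s) for s in signals)
    return [sum(s[i] for s in signals) / count for i in range(length)]

# Six perfectly coherent signals: the mean reproduces the impulse unchanged.
coherent = mean_pressure([[1.0, 0.0, 0.0]] * 6)

# Same impulse, but reaching three of the six positions one sample later:
# the mean spreads it over two samples.
smeared = mean_pressure([[1.0, 0.0, 0.0]] * 3 + [[0.0, 1.0, 0.0]] * 3)
```

In the second case the single impulse becomes two half-height samples, which is the time-domain smearing mentioned above; in the frequency domain the same effect appears as a loss of level at high frequencies.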

In the third case, a standard Soundfield microphone was employed, as shown in fig. 23. This unit is equipped with a special electronic processor, which extracts the 4 signals labeled W, X, Y and Z. As discussed earlier, in very reactive sound fields (close to the sound source, in a small, highly reverberant room) none of these three microphonic probes produces exactly the theoretical B-format signal (spherical harmonics of 0th and 1st order of the pressure field). But in the halls studied here, the sound source was very far away and the room was quite dry compared to its huge size (Italian theatres are known for their low reverberation times compared to north-European concert halls of the same size); this is demonstrated by fig. 24, which shows the measured reverberation times in the three theatres.

Fig. 24 Reverberation time of the three theatres

Consequently, it can be assumed that in these three cases the measured 4-channel pressure-velocity IRs are a reasonable approximation of the theoretical B-format signals, and thus they can be processed with classic Ambisonics-like math to extract the responses of virtual microphones with suitable directivity patterns, pointing in any desired direction.
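As a rough illustration of such Ambisonics-like math, a first-order virtual microphone can be formed as a weighted sum of the four channels. This sketch makes several assumptions that are not in the text: the function name `virtual_mic`, the axis convention (X forward, Y left, Z up), a W channel with unity gain, and velocity channels already scaled so that a plane wave gives equal amplitude in W and along its direction of arrival. The decoding filters actually used by the authors may differ.

```python
import math

def virtual_mic(w, x, y, z, azimuth_deg, elevation_deg, directivity=0.5):
    """Response of a first-order virtual microphone aimed at (azimuth, elevation).

    directivity d selects the polar pattern d + (1 - d) * cos(angle):
    1.0 = omnidirectional, 0.5 = cardioid, 0.0 = figure-of-eight.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector of the aiming direction.
    ux = math.cos(az) * math.cos(el)
    uy = math.sin(az) * math.cos(el)
    uz = math.sin(el)
    d = directivity
    return [d * wi + (1.0 - d) * (ux * xi + uy * yi + uz * zi)
            for wi, xi, yi, zi in zip(w, x, y, z)]

# A unit plane wave arriving from the front (assumed encoding: W = X = 1).
w, x, y, z = [1.0], [1.0], [0.0], [0.0]
front = virtual_mic(w, x, y, z, 0.0, 0.0)    # cardioid aimed at the source
rear = virtual_mic(w, x, y, z, 180.0, 0.0)   # cardioid aimed away from it
```

Applying the same function to each of the four measured IR channels, once per loudspeaker direction, yields one virtual-microphone impulse response per loudspeaker.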

