
Ambiophonic Principles for the Recording and Reproduction of Surround Sound for Music - Part 7

Angelo Farina, Ralph Glasgal, Enrico Armelloni, Anders Torger

4.2 Derivation of directive microphone responses

The first point to clarify here is that the Soundfield microphones adhere to the old-style B-format standard, in which the W channel has a gain reduction of 3 dB compared with the other three signals XYZ. To make use of uniform notation, it is assumed here that this 3 dB gain reduction is immediately compensated for, and thus all three microphone probes result in the measurement of 4 IRs with the same absolute gain for all 4 channels.

The basis for the synthesis of a virtual microphone from the B-format signals is the fact that, by combining the response of an omnidirectional microphone (W) with that of a figure-of-eight microphone (X, for example), a cardioid response is obtained, as shown in fig. 25. If the gain of X is reduced in comparison with W, the response becomes sub-cardioid, but if the gain of X is greater than the gain of W, a hypercardioid response is obtained. This same fact is employed in the control unit of the Soundfield microphone, which allows for the recreation of the signal of two virtual microphones with selectable directivity patterns.

If the virtual microphone has to point along a generic direction described by its unit vector r, the response of the single figure-of-eight microphone X has to be replaced by a linear combination of the three signals XYZ, employing the direction cosines of r as weighting factors. Thus the response V of a generically oriented virtual microphone can be computed as:

$$V = \frac{1}{2}\left[(2 - D)\,W + D\,(r_x X + r_y Y + r_z Z)\right] \qquad (8)$$

in which the directivity factor D can assume these values:

D = 0.0   omnidirectional
D = 0.5   subcardioid
D = 1.0   cardioid
D = 1.5   hypercardioid
D = 2.0   figure-of-eight

The above relationship (8) makes it easy to derive the proper impulse response corresponding to the position of each loudspeaker in the reproduction array, by post-processing the B-format IR measured in the theatre. This has to be repeated, of course, for both the B-format IRs, measured from the two source positions inside the theatre (L and R). For a reproduction array of 8 loudspeakers, for example, 16 synthetic IRs are obtained, and saved as 8 stereo waveforms (one for each loudspeaker).


Fig. 25 Synthesis of directive patterns

The feed for each loudspeaker can thus be derived simply by convolving the original stereo recording with the stereo IR (L ⊗ IR_L and R ⊗ IR_R) and summing (mixing together) the results.
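The convolve-and-sum step can be illustrated with a short sketch. It uses a direct time-domain convolution written in plain Python for clarity; with real impulse responses of several seconds, an FFT-based convolution would be the practical choice. The function names are illustrative only.

```python
def convolve(signal, ir):
    """Direct (time-domain) linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

def speaker_feed(left, right, ir_left, ir_right):
    """Feed for one loudspeaker: L convolved with IR_L plus R convolved with IR_R."""
    a = convolve(left, ir_left)
    b = convolve(right, ir_right)
    n = max(len(a), len(b))
    a += [0.0] * (n - len(a))  # zero-pad the shorter result before mixing
    b += [0.0] * (n - len(b))
    return [p + q for p, q in zip(a, b)]

# Tiny made-up example: two-sample stereo signal, short synthetic IRs.
feed = speaker_feed([1.0, 0.0], [0.0, 1.0], [0.5, 0.25], [0.1])
```

For an array of 8 loudspeakers, `speaker_feed` would be called 8 times, once per stereo IR file.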

By trial and error, it was found that the optimal value of the directivity factor D is approximately equal to 1.4 (hypercardioid), because this way each derived IR is much less correlated with the others. This corresponds approximately to the maximization of the field indicator rE, as suggested in [26] by J. Daniel for optimizing the decoding of B-format recordings. This also corresponds roughly with the method suggested by Okubo [27].

4.3 Modification of the impulse responses

For pure Ambisonics reproduction, the impulse responses derived in the previous section are theoretically perfect, provided that the reproduction space is almost completely anechoic.

This was verified by directly comparing a live B-format recording of a piano concert, made in the Teatro Comunale in Ferrara, with a virtual reconstruction obtained by convolving an anechoic recording of the same piece with the impulse responses derived from a B-format measurement, made with the sound source and the microphone in exactly the same positions as during the live performance. 13 of 18 listeners were unable to detect the difference between the live recording and the virtual one, and those who could detect a difference were unable to reliably rank their preference for either of the two recordings.

These tests were performed in the listening room of the University of Ferrara, making use of students of the Engineering Faculty as subjects (thus they were not sharp-eared musicians or audiophiles, and this can partially explain their inability to identify the difference between the two sound samples). It was concluded, however, that the virtual implementation of Ambisonics by convolution is at least as good as the "live" Ambisonics recording/playback, and thus it is generally preferable, requiring the recording of just two channels instead of four, and producing a much wider dynamic range (the background noise of the XYZ channels is strongly reduced by the MLS or sine sweep measurement methods, and after convolution these channels have a wider dynamic range than the corresponding channels coming from a live recording).

But for the use of the Ambisonics array as a complement to the Stereo Dipole inside the Ambiophonics system, the derived three-dimensional impulse responses have to be processed: first of all, they do not need to reproduce the direct sound or the early reflections coming from the orchestra shell on the stage, because these are already included on the original stereo soundtrack, and are being reproduced with much finer detail by the frontal loudspeaker pair driven through the cross-talk canceling filters.

This means that the first part of the impulse response must be silenced (not cut away), to preserve the proper delay of the subsequent reverberant tail in relationship with the direct sound being reproduced by the stereo dipole.
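Silencing rather than cutting can be sketched as below. The mute length and the optional fade-in are hypothetical parameters (the text does not specify values); the only point of the sketch is that the output keeps the original length, so the reverberant tail stays at its original delay.

```python
def silence_early_part(ir, fs, mute_ms, fade_ms=0.0):
    """Zero the first mute_ms of an IR without shortening it.

    ir      : list of samples
    fs      : sampling rate in Hz
    mute_ms : duration to silence (direct sound and early reflections)
    fade_ms : optional linear fade-in after the muted part (an assumption,
              added here only to avoid an abrupt onset)
    """
    n_mute = int(round(mute_ms * fs / 1000.0))
    n_fade = int(round(fade_ms * fs / 1000.0))
    out = list(ir)  # copy, so the measured IR is left untouched
    for i in range(min(n_mute, len(out))):
        out[i] = 0.0
    for i in range(n_fade):
        j = n_mute + i
        if j >= len(out):
            break
        out[j] *= (i + 1) / (n_fade + 1)  # linear ramp towards unity gain
    return out

# Made-up example: 10-sample IR at 1 kHz, first 4 ms silenced.
edited = silence_early_part([1.0] * 10, fs=1000, mute_ms=4.0)
```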

In reality, the proper time alignment between the signal being reproduced through the stereo dipole and through the surround array must be checked, taking into account two other facts:

- the position of the main microphone during the original stereo recording is usually much closer to the sound source than any real listener in a concert hall, particularly looking at the place where the three-dimensional IRs were measured, as shown in fig. 18, 19 and 20.

- the cross-talk canceling filters introduce a significant delay (approximately half their length), and thus they tend to partially compensate for the effect described in the previous point, causing a substantial increase of the apparent source-receiver distance.

It seems, however, that a few milliseconds of error in the delay of the reverberant tail with respect to the direct sound does not cause anything harmful, although, of course, the perceived distance from the stage is slightly changed.

The second modification required is a proper amplitude-shaping of the IRs. This is due to the fact that the reproduction space is not anechoic, and thus its reverberation (h) tends to add to the reverberation of the original theatre (s). We consider a simple exponential model of the two IRs, in the form:

$$s(\tau) = w(\tau)\, e^{-6.91\,\tau / T_s}, \qquad h(\tau) = w(\tau)\, e^{-6.91\,\tau / T_h} \qquad (9)$$

in which Ts and Th are respectively the reverberation times of the original theatre and of the reproduction space, and w(τ) is a white noise random process.

During the reproduction, the two IRs are convolved together, resulting in a global IR having a longer reverberation time:

$$s(\tau) \otimes h(\tau) = \int_0^{\tau} s(t)\, h(\tau - t)\, dt$$

It must be noted that the convolution of two purely exponential decays is not another exponential decay, as clearly shown in fig. 26: the resulting impulse response exhibits a complex shape, with an initial part during which the amplitude of the signal increases, followed by a decay with a non-constant slope.

It can be seen from fig. 26 that the final slope of the convolved impulse response asymptotically tends to the slope of the original IR: in fact the EDT value is significantly increased (1.46 s), whilst T30 (1.05 s) is only slightly larger than the original theoretical value (1.0 s). In practice, the most important effect is that there is much more energy in the reverberant tail (after 1.9 s the backward-integrated curve is approximately 6 dB higher): this means that the values of the most important early-to-late energy ratios have been substantially altered. The value of Center Time, for example, is 156 ms instead of the original 73 ms.
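The shift in Center Time can be checked numerically with a small sketch. It assumes noiseless exponential amplitude envelopes that fall by 60 dB at the reverberation time (decay constant 3 ln 10, about 6.91), with the values from fig. 26 (1 s for the theatre, 0.5 s for the listening room). The low sampling rate is an arbitrary choice that keeps the plain-Python convolution fast, so the results are only approximate.

```python
import math

DECAY_60DB = 3.0 * math.log(10.0)  # about 6.91: amplitude is -60 dB at t = T

def exp_decay(t60, fs, duration):
    """Noiseless exponential amplitude envelope with reverberation time t60."""
    n = int(duration * fs)
    return [math.exp(-DECAY_60DB * i / (fs * t60)) for i in range(n)]

def convolve(a, b):
    """Direct (time-domain) linear convolution of two sample lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def center_time(ir, fs):
    """Center Time in seconds: first moment of the squared impulse response."""
    energy = [v * v for v in ir]
    return sum(i / fs * e for i, e in enumerate(energy)) / sum(energy)

FS = 400                      # Hz, deliberately low (assumption)
s = exp_decay(1.0, FS, 2.5)   # original theatre, 1 s
h = exp_decay(0.5, FS, 1.5)   # reproduction space, 0.5 s
g = convolve(s, h)            # global response heard in the listening room

ts_original = center_time(s, FS)
ts_convolved = center_time(g, FS)
peak_index = max(range(len(g)), key=g.__getitem__)  # > 0: initial rise
```

Under these assumptions `ts_original` comes out at roughly 0.07 s and `ts_convolved` at roughly 0.15 s, in line with the 73 ms and 156 ms quoted above, and `peak_index` is greater than zero, which reflects the initial rise of the convolved response.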

It might seem straightforward to solve the above problems by creating a mathematically exact inverse filter: in principle, both the Mourjopoulos [9] and Kirkeby [6] theories allow for the creation of inverse filters, which can be convolved with the signal to remove the effect of h(τ). Unfortunately this is true only at the exact point where h(τ) is measured. At all other points of the reproduction space, the convolution with these inverse filters causes even more reverberant energy to be added.


Fig. 26 Convolution of two purely exponential decays (Ts = 1 s, Th = 0.5 s), resulting in a non-exponential decay

The only viable solution is thus an empirical editing of the impulse response s(τ) before it is convolved with the signal to be reproduced, so that the result of the subsequent reproduction in the listening room resembles the IR of the original space as closely as possible. In the case of the theoretical signals shown in fig. 26 and described by the equations (9), it is necessary to apply to s(τ) an amplitude modulation described by the shape shown in fig. 27.


Fig. 27 Amplitude shaping of s(τ)

It must be clear that with real impulse responses there is not (yet) any simple theory for optimizing this amplitude shaping. Furthermore, as the reverberation time is frequency-dependent, the amplitude modification of the IRs should often be made by means of a time-varying octave-band equalization, which is possible with the standard editing tools of CoolEditPro. So in practice this adjustment is done by trial and error, depending on the acoustical properties of the listening room. Of course this process cannot solve a major problem of the reproduction space, such as focusing or resonances: it can only improve the listening experience a little.

The optimal solution is always obtained by proper room treatment, employing, if required, bass traps and other sound absorbing devices, ensuring a short reverberation time with uniform spectral behavior. When the reverberation time of the reproduction space is less than 1/5 of the reverberation time of the original theatre, the effect of the listening room becomes unnoticeable, being masked by the much larger reverberation of the original space.
