
Ambiophonic Principles for the Recording and Reproduction of Surround Sound for Music - Part 8

Angelo Farina, Ralph Glasgal, Enrico Armelloni, Anders Torger

5. IMPLEMENTATION

5.1 Hardware Implementation

From the theory discussed in the previous sections, it is clear that the complete Ambiophonics system can be implemented simply by convolving the original input signals with a number of impulse responses (IRs). A typical system might have, for example, 10 loudspeakers: two for the frontal stereo dipole and 8 for a three-dimensional surround array.

In the most common case of a stereo (two-channel) source recording, each of these ten loudspeakers must be fed with the sum of the two input channels, each convolved with its own loudspeaker-specific impulse response: these are cross-talk canceling filters in the case of the first two loudspeakers, and three-dimensional room IRs for the other 8 loudspeakers.
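The signal flow just described can be sketched in a few lines. This is only a minimal illustration in plain Python, assuming direct time-domain convolution and toy impulse responses; the names `convolve` and `speaker_feed` and all sample values are hypothetical and are not taken from any of the systems discussed here.

```python
def convolve(x, h):
    """Direct linear convolution of two sequences (O(len(x) * len(h)))."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def speaker_feed(left, right, ir_left, ir_right):
    """Feed for one loudspeaker: left * ir_left + right * ir_right."""
    a = convolve(left, ir_left)
    b = convolve(right, ir_right)
    length = max(len(a), len(b))
    a += [0.0] * (length - len(a))
    b += [0.0] * (length - len(b))
    return [ai + bi for ai, bi in zip(a, b)]


# Toy stereo input and toy IR pairs (assumed values, for illustration only).
left = [1.0, 0.0, 0.0]
right = [0.0, 1.0, 0.0]
ir_pairs = [
    ([1.0, 0.0], [-0.5, 0.0]),  # stands in for a cross-talk canceling pair
    ([0.0, 0.3], [0.0, 0.2]),   # stands in for a room IR pair of one surround speaker
]

# One feed per loudspeaker; a 10-loudspeaker system would list 10 IR pairs.
feeds = [speaker_feed(left, right, hl, hr) for hl, hr in ir_pairs]
```

The same function serves the stereo dipole and the surround array alike; only the impulse responses differ, which is why the processing of the channels need not be differentiated in principle.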

This means that, in principle, there is no need to differentiate the processing of the first two channels from that of the others, and in fact neither of the currently available software solutions differentiates between them in any way. But when the system is implemented by means of hardware digital convolvers, it can be useful to exploit the differences between the filters involved: it was shown in section 3, for example, how a low-cost DSP board can be used for cross-talk cancellation by frequency warping the FIR coefficients.

Another possible implementation makes use of a general-purpose multichannel DSP unit (Soundweb by BSS) for performing the cross-talk cancellation in real-time. This versatile machine can easily be configured for processing a cross-talk cancellation network based on FIR filters, by means of its very friendly graphical programming environment, as shown in fig. 28.


Fig. 28 Soundweb cross-talk cancellation network 

In any case, the other 8 channels of room convolution need to be produced, again in real-time and with no processing latency, by means of hardware DSP convolvers. Nowadays the only units capable of doing this are the Lake DSP workstations and the Sony DRE-S777 processor, the JVC XP-A1010 processor having been discontinued.

Regarding the Lake DSP platform, a high-end Huron system is required to perform 16 real-time convolutions with filters of suitable length (typically 128 kpoints at 48 kHz). Lake's Huron system is highly versatile and easily configurable, and allows easy substitution of the IRs. Its only practical limit, apart from cost, is its limited S/N ratio, due principally to low-quality AD and DA converters. Although these can be bypassed by using the SPDIF digital interfaces, the internal processing is still done in fixed-point math, so the dynamic range and linearity are inherently limited compared with today's audiophile standards (24-bit resolution). It must be noted that, although the input signal can suitably be limited to 16 bits, the convolution process dramatically enlarges the dynamic range, restoring the very extended response to transients (particularly at a sudden end to the sound) that can be experienced in a real concert hall; the output therefore needs the full 24 bits currently available.
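To give a sense of scale for the figures quoted above, the following sketch works through the arithmetic. It assumes that "128 kpoints" means 128 × 1024 samples, and it counts operations for a hypothetical direct time-domain FIR implementation; that count is only an indication of the size of the problem and does not describe how any of the units mentioned here actually computes its result. The dynamic range figures are the theoretical values for ideal linear PCM words.

```python
import math

SAMPLE_RATE_HZ = 48_000
IR_POINTS = 128 * 1024   # assumption: "128 kpoints" read as 128 * 1024 samples
NUM_CONVOLUTIONS = 16

# Duration of one impulse response, in seconds.
ir_seconds = IR_POINTS / SAMPLE_RATE_HZ

# Multiply-accumulate operations per second if every filter were applied
# as a direct time-domain FIR (hypothetical, for scale only).
macs_per_second = NUM_CONVOLUTIONS * IR_POINTS * SAMPLE_RATE_HZ


def dynamic_range_db(bits):
    """Theoretical dynamic range of an ideal linear PCM word: 20*log10(2**bits)."""
    return 20.0 * math.log10(2.0 ** bits)


print(f"IR duration:        {ir_seconds:.2f} s")
print(f"Direct-FIR load:    {macs_per_second:.3e} MAC/s")
print(f"16-bit range:       {dynamic_range_db(16):.1f} dB")
print(f"24-bit range:       {dynamic_range_db(24):.1f} dB")
```

Under these assumptions each IR lasts about 2.7 s, and a 24-bit word offers roughly 48 dB more theoretical dynamic range than a 16-bit one.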

In this respect the Sony convolvers are truly state of the art, ensuring a completely uncompromised signal path and outstanding signal purity. Their main defect is that these units were designed as two-channel units: an additional DSP board can be inserted to obtain a four-channel system, but two units are needed to obtain 8 channels.

The second limitation of the Sony convolvers is that they can employ only the impulse response sets contained in the special Sony CDs; user-defined impulse responses cannot be loaded on these machines. For at least three of the top-ranked concert halls, Sony supplies multichannel impulse responses (apparently measured with substantially different microphone orientations and positions, rather than derived from a B-format IR as explained here in section 4), which can be successfully employed for Ambiophonics. The editing facilities of the convolvers are limited, and so the IRs can be only partially adapted to the reproduction space.


Fig. 29 Hardware Ambiophonics system

In conclusion, an audiophile-grade complete Ambiophonics system can be built with hardware parts commonly available on the market, namely a BSS Soundweb driving the Stereo Dipole and two 4-channel Sony DRE-S777 convolvers driving 8 surround channels, as shown in fig. 29. It must be remembered that a suitable listening room (with very little reverb and a complete absence of other defects) is required, along with a good pair of loudspeakers for the Ambiopole; the other surround loudspeakers, on the other hand, are less demanding, and thus can be cheaper.

5.2 Software Implementation

At the time of writing, three possible solutions are known for obtaining Ambiophonics listening by means of a computer equipped with a high quality, multichannel sound board. These are described here only very briefly.

The first solution, which has been available for some years, is based on the CoolEditPro software. The soundtracks must be computed in advance and stored on the hard disk inside a multichannel CoolEdit session. The waveforms can then be played at will. The main advantage of this approach is that, since the computation is done off-line, even a very slow computer can be employed (there is no real-time constraint). Of course, it takes a while to convolve the original stereo waveforms with all those IRs and to save the results as separate WAV files on the hard disk. The convolution can be done by means of any of the currently available plugins, including the one already shipped with CoolEditPro.

The second solution is based on the Ambiovolver software, developed by J.J. Lopez [28]. It is a real-time convolver, which employs the very efficient Intel FFT routines, though using the traditional select-save algorithm [29]. This solution comes as a WIN32 executable with a graphical user interface, and is capable of sustained convolution of two input channels with 10 stereo IRs, driving ten output channels. Depending on the computational power available, IRs up to 128 kpoints long can be employed. The main disadvantage of this software solution is that it causes a substantial delay between input and output (approximately twice the length of the IRs), which is not a problem for listening to prerecorded material, but which is a serious problem in other applications (real-time acoustic display, synchronized audio-video virtual reality, etc.).

The third solution is the BruteFIR software by A. Torger [30]: it is free software, released under the GNU General Public License and developed under Linux, which does substantially the same things as Ambiovolver, but employs a very clever convolution algorithm: the partitioned convolution scheme pioneered by Stockham [31] and refined by Soo and Pang [32].

This algorithm is substantially based on the select-save algorithm, but the IR is partitioned into many blocks of the same length, each of which is convolved separately. This reduces the overall latency to twice the length of these sub-blocks, which is significantly shorter than the whole IR length (typically 8 or 16 partitions of the IR are used). The typical latency is 100 - 400 ms, thus low enough for interactive use, but not low enough for synchronizing with a zero-delay video stream.
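The identity on which partitioning rests can be checked with a short sketch. The code below is not BruteFIR's implementation: it assumes plain time-domain convolution for each sub-block (a real-time convolver would perform each sub-convolution via FFT), and the function names are hypothetical. It shows that convolving with each block separately, delaying block k by k block lengths and summing reproduces the full convolution, and it evaluates the latency rule stated above (twice the sub-block length).

```python
def convolve(x, h):
    """Direct linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def partition(h, num_blocks):
    """Split h into num_blocks blocks of equal length, zero-padding the tail."""
    block_len = -(-len(h) // num_blocks)  # ceiling division
    padded = list(h) + [0.0] * (block_len * num_blocks - len(h))
    return [padded[i * block_len:(i + 1) * block_len] for i in range(num_blocks)]


def partitioned_convolve(x, h, num_blocks):
    """Convolve x with each IR block, delay block k by k * block_len, and sum."""
    blocks = partition(h, num_blocks)
    block_len = len(blocks[0])
    out = [0.0] * (len(x) + block_len * num_blocks - 1)
    for k, block in enumerate(blocks):
        offset = k * block_len
        for n, v in enumerate(convolve(x, block)):
            out[offset + n] += v
    return out[:len(x) + len(h) - 1]  # drop samples that exist only due to padding


def latency_seconds(ir_points, num_blocks, sample_rate_hz):
    """Latency estimate from the text: twice the sub-block length."""
    return 2 * (ir_points // num_blocks) / sample_rate_hz


# Toy data (assumed values) chosen so that floating-point sums are exact.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = [1.0, -1.0, 2.0, 0.5, 3.0, -2.0, 1.0, 4.0]
y_direct = convolve(x, h)
y_partitioned = partitioned_convolve(x, h, 4)
```

With a 131072-point IR at 48 kHz, `latency_seconds` gives about 5.5 s for a single block and about 0.34 s for 16 partitions, which falls within the typical range quoted above.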

In theory, this should result in increased computational load (although significantly less than with zero-latency convolution achieved through hybrid filtering, as employed by Lake DSP and described by W.G. Gardner [33]), but in practice, since the implementation is very well suited to the memory management and parallel processing capabilities of modern processor architectures, it ends up being up to twice as efficient as the traditional unpartitioned select-save algorithm.

The extra power freed by this clever algorithm can be employed to run at a higher sampling rate (96 kHz), to drive more channels (16 surround loudspeakers instead of 8), or even to process source signals with more than two channels. In fact BruteFIR can be configured to process any number of inputs, and it could be advisable to start with a 6-channel recording (SACD or DVD-Audio).

In this case, the best strategy for Ambiophonics reproduction may be to put the L and R channels through the cross-talk canceling filters, and mix the center channel (with proper delay corresponding to half the length of the cross-talk filters) on both the frontal loudspeakers. The surround loudspeaker array is then better driven by the surround channels of the original recording, although these usually already contain a large amount of room reflections and reverb, and thus the convolved IRs must be very short in this case; but they are still required because they ensure proper inter-loudspeaker relationships which build up the three-dimensional sound field.
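The front-channel strategy described above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the text: the cross-talk canceller is taken to be symmetric (one "direct" and one "cross" filter), the center gain is a free parameter, and all names and sample values are hypothetical. The center channel bypasses the filters and is delayed by half the filter length so that it lines up with the filtered L and R signals.

```python
def convolve(x, h):
    """Direct linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def mix(*signals):
    """Sum signals sample by sample, treating shorter ones as zero-padded."""
    length = max(len(s) for s in signals)
    return [sum(s[n] for s in signals if n < len(s)) for n in range(length)]


def front_feeds(left, right, center, h_direct, h_cross, center_gain=1.0):
    """Feeds for the two frontal loudspeakers.

    L and R pass through the (assumed symmetric) cross-talk canceling
    filters; the center channel is delayed by half the filter length
    and added to both feeds.
    """
    center_delay = len(h_direct) // 2
    delayed_center = [0.0] * center_delay + [center_gain * c for c in center]
    feed_left = mix(convolve(left, h_direct), convolve(right, h_cross), delayed_center)
    feed_right = mix(convolve(right, h_direct), convolve(left, h_cross), delayed_center)
    return feed_left, feed_right


# Toy check (assumed values): a 5-tap filter whose peak sits at its midpoint.
h_direct = [0.0, 0.0, 1.0, 0.0, 0.0]
h_cross = [0.0, 0.0, 0.0, 0.0, 0.0]
feed_left, feed_right = front_feeds([1.0, 0.0], [0.0, 0.0], [1.0, 0.0],
                                    h_direct, h_cross)
```

In this toy case the impulse on L and the impulse on the center channel arrive at the same sample of the left feed, which is the alignment the half-length delay is meant to achieve.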

Extending the theory already developed in section 4, the room IRs required in this case should have been measured with two sound source positions, in the original theatre, placed behind the microphone, at the azimuthal angles corresponding to the theoretical positions of the "surround" loudspeakers in a 5.1 system (+/- 110°). These sound source positions should also be quite close to the microphone, so that the reverb-to-direct ratio in the measured IRs is small, and thus their convolution with the already reverberated surround channels of a 6-channel recording would not cause too much reverb in the reproduction space.

Depending on the recording, however, it could also be that the main channels L and R need to be included in the surround reproduction, being convolved with full-length room IRs as detailed in section 4. BruteFIR can easily accommodate these complex scenarios, because it can handle several input streams and do the mixing process very efficiently in the frequency domain, so that the number of forward and inverse FFTs is not increased.
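Why mixing in the frequency domain keeps the number of transforms down follows from the linearity of the Fourier transform, and can be sketched as below. This is an illustration only, not BruteFIR's code: a naive O(N²) DFT stands in for the FFT, the signals are processed as a single block, and the names and test values are hypothetical. Each input is transformed once, the spectra are multiplied by the filter spectra and summed per output, and each output needs a single inverse transform.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def mix_in_frequency_domain(inputs, filters):
    """inputs: list of M signals; filters[o][i]: IR from input i to output o.

    Uses M forward transforms of the inputs and one inverse transform
    per output, instead of one inverse transform per input/output pair.
    """
    longest_ir = max(len(h) for row in filters for h in row)
    n = max(len(x) for x in inputs) + longest_ir - 1  # avoids circular wrap-around

    def pad(s):
        return list(s) + [0.0] * (n - len(s))

    spectra = [dft(pad(x)) for x in inputs]           # one forward transform per input
    outputs = []
    for row in filters:
        acc = [0j] * n
        for spectrum, h in zip(spectra, row):
            h_spectrum = dft(pad(h))                  # would be precomputed in practice
            acc = [a + xk * hk for a, xk, hk in zip(acc, spectrum, h_spectrum)]
        outputs.append([v.real for v in dft(acc, inverse=True)])  # one inverse per output
    return outputs


# Toy example (assumed values): two inputs mixed into two outputs.
inputs = [[1.0, 2.0], [0.0, 1.0]]
filters = [
    [[1.0, 0.0, 1.0], [2.0]],
    [[0.0, 1.0], [1.0, -1.0]],
]
outs = mix_in_frequency_domain(inputs, filters)
```

Adding a further input stream therefore costs one more forward transform plus multiply-add operations on spectra, while the number of inverse transforms stays equal to the number of outputs.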

It must be said that at this time the number of available 5 or 6-channel recordings is too small to allow evaluation of the optimal strategies for processing them Ambiophonically; nevertheless the system is potentially open to multichannel recordings, and has the potential to overcome all the known limitations of 5.1 systems (lack of lateral sonic images, horizontal-only surround, presence of cross-talk related artifacts).

In conclusion, although at present Ambiophonics is usually obtained by means of hardware convolvers, in the near future complete systems will be built at very little cost thanks to standard PCs and multichannel sound boards. These software solutions circumvent all the limitations and the sound quality degradation typical of today's hardware solutions, and should also allow the Ambiophonic playback of 6-channel recordings.
