The Science of Domestic Concert Hall Design
by Ralph Glasgal
AES 24th International Conference on Multichannel Audio
Concert Hall Acoustics For Posterity
3. AURALIZATION OF THE MEASURED DATA
This chapter examines how the results of these measurements can be used to create audible presentations of the acoustical behavior of the original rooms, for listeners exposed to an artificial soundfield through headphones or loudspeakers.
The basic method for auralization is convolution: the impulse responses are employed as very long FIR filters, applied to dry (anechoic) recordings of music or speech. Convolution is a very efficient filtering technique, particularly if implemented with proper (old) algorithms on fast (new) processors: as clearly demonstrated in , a PC equipped with a last-generation processor can perform the real-time, low-latency convolution of dozens of channels with multiple impulse responses of hundreds of thousands of coefficients each. Moreover, the performance obtained with the simpler algorithms initially developed in the sixties  is better than that obtained with more recent developments , which appear preferable in terms of the total number of multiplications required, but are much less well matched to the memory-management architecture of modern processors.
The goal of this research is to create sets of impulse responses suitable for use with these software convolvers, delivering the results in any of the currently available formats for multichannel reproduction, and recreating the spatial attributes of the original soundfield as faithfully as possible.
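The idea of using an impulse response as a long FIR filter can be sketched with block-wise FFT convolution. The following is a minimal illustration using overlap-add, one classic FFT-based scheme; it is an assumption that this resembles the algorithms the text refers to, and the function name and block size are hypothetical. A real-time, low-latency convolver would additionally split the impulse response itself into partitions.

```python
import numpy as np

def overlap_add_convolve(dry, ir, block=1024):
    """Convolve a dry signal with an impulse response, one block at a time."""
    dry = np.asarray(dry, dtype=float)
    ir = np.asarray(ir, dtype=float)
    # FFT size: next power of two able to hold one block plus the IR tail.
    n_fft = 1
    while n_fft < block + len(ir) - 1:
        n_fft *= 2
    ir_spectrum = np.fft.rfft(ir, n_fft)  # the IR is transformed only once
    out = np.zeros(len(dry) + len(ir) - 1)
    for start in range(0, len(dry), block):
        chunk = dry[start:start + block]
        # Multiplication in the frequency domain equals convolution in time.
        seg = np.fft.irfft(np.fft.rfft(chunk, n_fft) * ir_spectrum, n_fft)
        n_valid = len(chunk) + len(ir) - 1
        # The tail of each block overlaps the next block and is summed.
        out[start:start + n_valid] += seg[:n_valid]
    return out
```

The result matches direct time-domain convolution (`np.convolve`) to within rounding error, while the cost per output sample grows only logarithmically with the FFT size.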
3.1 ORTF-stereo impulse responses
This is the most basic processing, aimed at the creation of a "standard" stereo presentation of the results of the auralization. The process is based on the availability of a number of dry mono recordings, one for each section of the orchestra or for each singer.
Each mono recording has to be convolved with a specific stereo impulse response, obtained with the pair of cardioid microphones in ORTF configuration. In principle, each of these impulse responses should be measured with the sound source in the proper position. In reality, the measurements are typically performed with just three positions of the source on the stage (Left, Right, Center), and this limits the number of independent "virtual sources" which can be placed in the sonic scene.
In practice, however, it is possible to take advantage of the fact that, for each source position, the ORTF measurement was performed with 36 different orientations of the microphones (in 10° steps). This means that a minor adjustment of the virtual source position (by 10 or 20 degrees) can be obtained by selecting the ORTF impulse response measured at an orientation other than 0°. This is of course not perfectly rigorous, but it is effective and subjectively indistinguishable from convolution with ORTF impulse responses measured with the microphones oriented at 0° and a true displacement of the source.
Finally, the results of the convolution of all the dry recordings are summed into a single stereo output file, which is suitable for reproduction on a normal stereo system (two loudspeakers).
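The ORTF procedure described in this section can be sketched as follows. This is a minimal illustration, not the authors' software: the function names and the data layout (one left and one right impulse response per dry source) are assumptions, and whether a positive offset moves the virtual source to the left or to the right depends on the rotation direction of the measurement, which the sketch leaves open.

```python
import numpy as np

def orientation_index(offset_deg, step_deg=10, count=36):
    """Index of the measured microphone orientation nearest to an offset."""
    return int(round(offset_deg / step_deg)) % count

def auralize_stereo(sources):
    """Convolve and sum. sources: list of (dry_mono, ir_left, ir_right)."""
    length = max(len(d) + max(len(l), len(r)) - 1 for d, l, r in sources)
    mix = np.zeros((length, 2))
    for dry, ir_left, ir_right in sources:
        left = np.convolve(dry, ir_left)
        right = np.convolve(dry, ir_right)
        mix[:len(left), 0] += left     # all sources share one stereo output
        mix[:len(right), 1] += right
    return mix
```

For example, a section of the orchestra that should appear 20 degrees away from the measured Center position would use the Center impulse response pair stored at `orientation_index(20)`, that is, index 2 of the 36 measured orientations.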
3.2 Binaural impulse responses (binaural room scanning)
The basic binaural approach is substantially the same as the previous ORTF-based method, but employs the binaural IRs. The result of the convolution is thus a 2-channel file, suitable for headphone reproduction.
However, two methods can be employed for substantially improving the surround effect obtained: for loudspeaker reproduction a proper cross-talk cancellation must be added, and for headphone reproduction a head-tracking sensor can drive a real-time convolver, switching the impulse responses being convolved as the listener rotates the head. Regarding the creation of optimal cross-talk cancelling filters, and optimal layouts for the loudspeakers employed for the reproduction, several papers were published in recent years [17,18].
Regarding the head-tracking real-time processing, some solutions were proposed by LakeDsp  and Studer , but these require dedicated and expensive DSP-based workstations. The authors are working on a new, low-cost system for real-time auralization, making use of a game-quality head-tracking system and new, high-efficiency, low-latency convolution software.
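The switching of impulse responses driven by the head tracker can be illustrated with a small lookup. The sketch assumes that the binaural impulse responses were captured at 36 orientations in 10° steps, like the ORTF set, and that the tracker reports yaw in degrees with the same sign convention as the measurement; both points, and the function name, are assumptions rather than details given in the text.

```python
def ir_index_for_yaw(yaw_deg, step_deg=10.0, count=36):
    """Index of the measured orientation nearest to the current head yaw."""
    wrapped = yaw_deg % 360.0          # map any angle into [0, 360)
    return int(round(wrapped / step_deg)) % count

# Usage: poll the tracker and reload the convolver only when the index changes.
current = None
for yaw in (0.0, 3.0, 14.0, 356.0, -10.0):   # hypothetical tracker readings
    index = ir_index_for_yaw(yaw)
    if index != current:
        current = index                # here the convolver would load IR pair `index`
```

A practical system would probably also crossfade between the old and the new impulse response to avoid audible clicks at each switch; that step is omitted here.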
3.3 B-format impulse responses (Ambisonics)
In this case, each dry mono source is convolved with the proper B-format impulse response. After all these convolutions are mixed, a 4-channel B-format output is obtained.
The reproduction of a B-format signal over a suitable array of loudspeakers requires an Ambisonics decoder, for computing the proper feed for each speaker.
The creation of a software-based decoder has been pioneered by one of the authors , and has been further perfected by colleagues at the University of York, who recently released for free a suite of VST plugins , allowing for manipulation and decoding of B-format signals over various loudspeaker rigs.
In conclusion, the Ambisonics auralization simply requires the availability of a multichannel convolver (with 1 input and 4 outputs), a B-format mixer, and a B-format Ambisonics decoder. The first tool is being developed by Waves; the second and third tools are already available from .
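The role of the decoder can be illustrated with the simplest possible case: a horizontal loudspeaker ring in which each speaker is fed by a first-order virtual microphone aimed at it. This is a sketch under stated assumptions, not the decoder used by the authors or by the York plugins: it assumes traditional B-format (W attenuated by 1/√2, X pointing forward, Y pointing left, azimuth measured counter-clockwise), ignores the Z channel, and applies no shelf filtering or level normalization.

```python
import math
import numpy as np

def decode_bformat_horizontal(w, x, y, speaker_azimuths_deg, directivity=0.5):
    """Return one feed per loudspeaker, shape (n_speakers, n_samples).

    directivity: 1.0 = omni, 0.5 = cardioid, 0.0 = figure-of-eight.
    """
    feeds = []
    for azimuth in speaker_azimuths_deg:
        a = math.radians(azimuth)
        # Pressure part (W, restored to unit gain) plus velocity part (X, Y).
        feed = directivity * math.sqrt(2.0) * w + (1.0 - directivity) * (
            math.cos(a) * x + math.sin(a) * y)
        feeds.append(feed)
    return np.stack(feeds)
```

With a square rig at 45°, 135°, 225° and 315° and the default cardioid pattern, a source encoded at 45° is reproduced at full level by the first speaker and is absent from the diagonally opposite one.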
3.4 ITU 5.1 surround (from selected B-format impulse responses)
The basic approach for ITU 5.1 rendering is to first select a microphone configuration for driving the 5 main loudspeakers . Many such microphone arrangements have been proposed, and most of them were comparatively evaluated in a recent round-robin project called the Verdi project .
The following pictures show the microphone configurations for these three setups:
For each of the above setups, it is possible to select a subset of 5 of the 36 positions where the Soundfield microphone was placed, corresponding as closely as possible to the intended positions of the chosen setup. Then, from the B-format impulse response measured in each of these 5 selected positions, a single (mono) impulse response is extracted with the program Visual Virtual Microphone, developed by David McGriffy and freely available on the Internet . Fig. 19 shows the user interface of this program when employed for extracting the hypercardioid response for the R channel of an OCT setup from the B-format impulse response coming from the 60° position, with the sound source on the left of the stage.
It must be noted that the measurements performed with the rotating Soundfield microphone inherently assume a clockwise angle (due to the fact that the rotating table only turns in this way), whilst usually in surround-sound applications a counter-clockwise angle is employed.
As the microphone in this position was already tilted 50° to the right, and OCT mandates an orientation of 90° for the Right supercardioid, a further rotation of 40° has to be implemented in the Visual Virtual Microphone program.
In the case where the chosen microphone setup requires a microphone position which does not actually lie on the 1 m radius circumference, it is possible to use the WFS method (par. 3.6) for extrapolating the impulse response in the required position.
Finally, each mono dry source is convolved with the 5-channel impulse response derived from the corresponding sound source position on the stage, and the results of all these convolutions are mixed into a single final 5-channel track, which is suitable for reproduction over a standard ITU loudspeaker rig.
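The angle bookkeeping described above can be written out explicitly. The sketch below is illustrative only and is not part of Visual Virtual Microphone: the function names are hypothetical, and the pattern coefficients are the usual textbook values for first-order microphones expressed as gain = a + (1 - a)·cos(angle).

```python
import math

# Textbook first-order pattern coefficients (assumed, not taken from the paper).
PATTERNS = {"omni": 1.0, "cardioid": 0.5, "supercardioid": 0.366,
            "hypercardioid": 0.25, "figure8": 0.0}

def extra_rotation_deg(target_deg, already_rotated_deg):
    """Rotation still to be applied in software to reach the target angle."""
    return (target_deg - already_rotated_deg) % 360.0

def to_counter_clockwise(clockwise_deg):
    """Convert a clockwise turntable angle to the counter-clockwise convention."""
    return (-clockwise_deg) % 360.0

def pattern_gain(pattern, off_axis_deg):
    """Gain of a first-order virtual microphone at a given off-axis angle."""
    a = PATTERNS[pattern]
    return a + (1.0 - a) * math.cos(math.radians(off_axis_deg))

print(extra_rotation_deg(90, 50))    # 40.0, the further rotation quoted in the text
print(to_counter_clockwise(90))      # 270.0
```

The conversion function reflects the remark that the turntable angles are clockwise while surround-sound tools usually expect counter-clockwise angles; which convention a given tool expects should be checked before applying the rotation.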