
Ambiophonic Principles for the Recording and Reproduction of Surround Sound for Music - Part 5

Angelo Farina, Ralph Glasgal, Enrico Armelloni, Anders Torger

3.4 Subjective Comparison

The audible performance of the two digital filtering techniques was compared in a blind subjective test. Fourteen normal-hearing subjects took part, aged between 20 and 36; six of them were female. The subjects were not trained in listening tests, nor did they know anything about the research or the goals of the experiment. Each subject was comfortably seated at the "sweet spot" in front of the Stereo-Dipole loudspeaker pair and was given control of the DSP unit through two selection buttons, labeled A (FIR) and B (WFIR). A CD player generated the test signals (binaural recordings of natural sounds on the beach, and of music inside a car compartment). The listener was free to switch between the A and B filters at any moment, and had to fill in a questionnaire containing 7 attributes, rating each of them on a 5-level scale (insufficient, mediocre, sufficient, fair, good) for both the A and B systems.

The results were analyzed using classical ANOVA [16] (performed with the Excel Analysis ToolPak). The following table presents the statistical results. The 5% critical F-value was 4.2252: F values greater than this indicate that the difference between A and B is significant.

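| Question            | Avg. A | Avg. B | ANOVA F factor | Prob.  |
|---------------------|--------|--------|----------------|--------|
| Overall appreciation | 3.57   | 4.79   | 34.47          | 0.00%  |
| Image localization  | 3.79   | 4.36   | 4.38           | 4.63%  |
| Stage width         | 3.50   | 4.71   | 21.72          | 0.01%  |
| Naturality          | 3.71   | 4.57   | 10.88          | 0.28%  |
| Low frequency resp. | 3.29   | 4.36   | 11.56          | 0.22%  |
| Mid frequency resp. | 3.79   | 4.07   | 1.60           | 21.7%  |
| Hi frequency resp.  | 4.14   | 4.43   | 0.98           | 33.1%  |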
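As an illustration of how an F factor of this kind is obtained, the following minimal sketch computes a one-way (single-factor) ANOVA for two groups of ratings. It assumes the A and B ratings of each question were treated as two independent groups of 14 values, which gives (1, 26) degrees of freedom and is consistent with the quoted 5% critical value of 4.2252; the paper does not spell out this detail. The function name `one_way_anova_f` and the rating lists are hypothetical and are not the data collected in the test.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual ratings around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# HYPOTHETICAL ratings (1 = insufficient ... 5 = good) from 14 listeners.
# These are illustrative values only, not the responses of the study.
ratings_a = [3, 4, 3, 4, 4, 3, 4, 3, 4, 4, 3, 4, 4, 3]
ratings_b = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4, 5, 5, 5, 5]

F_CRITICAL_5PCT = 4.2252  # critical value quoted in the text

f_value, df_b, df_w = one_way_anova_f([ratings_a, ratings_b])
verdict = "significant" if f_value > F_CRITICAL_5PCT else "not significant"
print(f"F({df_b}, {df_w}) = {f_value:.2f} -> {verdict} at the 5% level")
```

The sketch stops at the comparison with the critical value; the probability column of the table would additionally require the cumulative F distribution, which a spreadsheet or statistics package provides.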

The probability that the A and B responses are the same was also computed; the ANOVA results can be seen in graphical form in fig. 17. From the table above and from fig. 17, it is clear that system B (WFIR) was rated significantly better than system A in questions 1, 3, 4 and 5 (overall appreciation, stage width, naturality and low-frequency response). The significance is at the limit for question 2, image localization (prob. 4.63%), and there is no significant difference in questions 6 and 7 (mid- and high-frequency response). This means that the WFIR is globally better and, in particular, that it widens the stereo image, sounds more natural, and has a deeper low-frequency response. Some subjects also reported that system A is drier, whilst system B is softer (and this is certainly due to the time smearing already mentioned in the previous section).


Fig. 17 - ANOVA of the subjective responses
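The reading of the table given above can be reproduced with a short sketch that compares each reported F factor with the 5% critical value. The F values and the critical value are copied from the table in this section; the function name `significant_questions` is a hypothetical helper introduced only for this illustration.

```python
F_CRITICAL_5PCT = 4.2252  # 5% critical F-value quoted in the text

# F factors as reported in the results table, in questionnaire order
reported_f = {
    "Overall appreciation": 34.47,
    "Image localization": 4.38,
    "Stage width": 21.72,
    "Naturality": 10.88,
    "Low frequency resp.": 11.56,
    "Mid frequency resp.": 1.60,
    "Hi frequency resp.": 0.98,
}


def significant_questions(f_values, f_critical):
    """Return the questions whose F factor exceeds the critical value."""
    return [name for name, f in f_values.items() if f > f_critical]


significant = significant_questions(reported_f, F_CRITICAL_5PCT)

for number, (name, f) in enumerate(reported_f.items(), start=1):
    status = "significant" if name in significant else "not significant"
    # The ratio shows how far each F factor lies from the threshold
    print(f"{number}. {name:22s} F = {f:5.2f} ({f / F_CRITICAL_5PCT:4.1f}x critical) {status}")
```

Question 2 exceeds the threshold only narrowly (4.38 against 4.2252), which is why its significance is described as being at the limit.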

3.5 Generalization to other stereo recordings

The procedure described above requires, in principle, that the same dummy head be employed both for recording the stereo soundtrack and for measuring the impulse responses h from which the cross-talk canceling filters are to be computed.

The obtained inverse filters are independent of the listener (each listener will add his own HRTF signature to the received sound), but they depend strongly on the listening setup (loudspeakers, room) and on the binaural microphone employed.

The last fact can be a problem for the reproduction of pre-recorded material available on most commercial CDs: only a small number of them were recorded with a binaural dummy head (in many cases the Neumann KU-100), and the vast majority consist of stereo recordings which can be categorized into one of two very different types:

- coincident
- spaced

In the first case, only level differences appear between the two channels, and the recording is conceptually derived from two directional microphones coincident in space and with a certain angular divergence between their maximum sensitivity directions (Blumlein approach). In the second case, the signals come from two microphones placed at a relative distance similar to the human ears (approximately 170-180 mm), and thus they exhibit significant time misalignment between the two tracks, often with some further level difference caused by a physical obstacle between them (for example a rigid sphere) or by the directivity patterns of the microphones (which should in principle mimic the low-order HRTF spatial response). The first category also includes "virtual" stereo mixes, obtained by pure level-panning of monophonic recordings.
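To give a sense of the size of the time misalignment in spaced recordings, the following minimal sketch estimates the arrival-time difference between two microphones for a distant source. It assumes a plane wave in free field, a speed of sound of 343 m/s, and no obstacle between the capsules, so it ignores the sphere or directivity effects mentioned above; the function name `interchannel_delay_s` is hypothetical. Under these assumptions, a 170 mm spacing gives at most roughly half a millisecond of delay, while a coincident pair (zero spacing) gives none.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def interchannel_delay_s(spacing_m, azimuth_deg, c=SPEED_OF_SOUND_M_S):
    """Arrival-time difference between two spaced microphones.

    Plane-wave, free-field approximation: the extra path to the far
    microphone is spacing * sin(azimuth), with azimuth measured from the
    forward direction of the pair.
    """
    return spacing_m * math.sin(math.radians(azimuth_deg)) / c


# 170 mm spacing, as quoted in the text for ear-like microphone distances
for azimuth in (0, 30, 60, 90):
    delay_us = interchannel_delay_s(0.170, azimuth) * 1e6
    print(f"source at {azimuth:2d} deg -> delay {delay_us:6.1f} microseconds")

# A coincident pair has no spacing, hence no time difference at any angle
print(interchannel_delay_s(0.0, 90))
```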

The spaced recordings are perfectly suited for reproduction with the cross-talk cancellation method, provided that the corresponding inverse filters are computed with the same microphone employed for the recordings. For this work, just two extreme cases were considered, namely the ORTF microphone (Schoeps MSTC64, 170 mm spaced cardioids with 110° aperture) and the sphere microphone (Schoeps KFM-360 or KFM-6). Usually recordings done with spaced microphones are easily identified as such, and often details on the microphone type and placement are specified on the CD cover.

The vast majority of released CDs fall in the category of coincident recordings, although most of them are really studio-made amplitude mixes of spot-miked multichannel recordings. One could think that the lack of interchannel delay impedes proper cross-talk cancelled reproduction of these recordings; instead, it turns out that delivering these signals through a "moderate" cross-talk cancellation field yields realistic spatial imaging (although not comparable with that of spaced recordings). "Moderate" cross-talk cancellation refers here to the typical effect obtainable with mechanical barriers, rather than with digital filters.

As clearly demonstrated by R. Glasgal [17], a mechanical barrier placed between the loudspeakers in the Stereo Dipole configuration, and extending near the face of the listener, provides quite effective cross-talk cancellation at high frequency (above 1 kHz), and progressively much less cancellation towards low frequency, with no separation at all under 200 Hz. This arrangement seems to provide suitable localization cues at high frequency (where the spatial imaging is governed mainly by level difference between the two ears), and preserves the traditional cross-talk based imaging at low frequency, where there is not any phase difference encoded in the source material, and this difference has to be recreated by the diffraction around the head of the listener.

In conclusion, 4 different sets of inverse filters can be created, each of them specifically suited for a different recording technique: binaural, sphere, ORTF and coincident (M/S Blumlein). By selecting the appropriate set of filters, almost any kind of recording can be reproduced successfully over a Stereo Dipole with cross-talk cancellation.
