
3 | Literature Review

3.6 Sound Source Signals (Stimuli)

A major factor in this research is the sound source signal used in the listening experiments. This section reviews the different sound source signals used in auralisation studies and the differences in test results they may cause. The length of the signals is discussed as well. An overview of the stimuli used by the different studies can be found in Table 3.10.

Table 3.7: Pros and cons, MUSHRA

| Advantage | Disadvantage |
| --- | --- |
| More adequate in discriminating differences in quality than other test methods. [34] | Multiple samples are tested at once. This is not ideal when testing measurement/auralisation pairs. |
| Tests can take less time to perform than when using the Recommendation ITU-R BS.1116 method. [34] | Results of the listening tests obtained using MUSHRA may be biased due to stimulus spacing and range equalizing. [54] |
| Able to display all the stimuli at once, leading to more consistent results and smaller confidence intervals. [34] | - |

Table 3.8: Pros and cons, AB-Comparison

| Advantage | Disadvantage |
| --- | --- |
| More general assessments usually involve larger differences and therefore do not need such close control of the test parameters. [38] | The method has no hidden reference, therefore it is more difficult to control the subject reliability. |
| The time taken to perform the test using the MUSHRA method can be significantly less than when using the Recommendation ITU-R BS.1116 method. [34] | - |

Table 3.9: Pros and cons, 2/3AFC test

| Advantage | Disadvantage |
| --- | --- |
| Lower numbers of judges required because of the high sensitivity to differences. [55] | The specification of the nature of the difference between the samples is required. [56] |
| Higher statistical power when compared to similar tests such as the duo-trio, triangle and same/different tests. [56] | - |

3.6.1 Comparing different source signals

Brinkmann et al. [17], in 2014, used two types of signals for their auralisation authenticity study: pulsed pink noise (0.75 s noise, 1 s silence) and an anechoic male speech recording (5 s). When the test subjects were asked to detect the difference between the real and the simulated signal, they scored a significantly higher detection rate with the pulsed pink noise signal (87.5% to 100%) than with the speech signal (54% to 100%). The authors state that this result is in accordance with earlier studies, although no references are given. They mention that a possible reason for this difference is the broadband and more steady nature of the pulsed pink noise signal, which supports the detection of colouration.
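To make the pulsed pink noise stimulus concrete, the following is a minimal sketch of how such a signal could be generated. Only the 0.75 s noise and 1 s silence durations come from the description above. The Voss-style generator (which is only approximately pink), the function names, the sample rate and the number of pulses are assumptions made for illustration and are not taken from Brinkmann et al. [17].

```python
import random


def voss_pink_noise(n_samples, n_rows=12, seed=0):
    """Approximate pink noise by summing white-noise rows that are
    refreshed at octave-spaced rates (Voss-style algorithm)."""
    rng = random.Random(seed)
    rows = [0.0] * n_rows
    out = []
    for i in range(n_samples):
        for k in range(n_rows):
            # Row k gets a new random value every 2**k samples.
            if i % (1 << k) == 0:
                rows[k] = rng.uniform(-1.0, 1.0)
        # Averaging keeps every output sample within [-1, 1].
        out.append(sum(rows) / n_rows)
    return out


def pulsed_pink_noise(n_pulses, fs=44100, noise_s=0.75, silence_s=1.0, seed=0):
    """Concatenate n_pulses bursts of noise, each followed by silence.
    n_pulses and fs are hypothetical parameters."""
    n_noise = int(round(noise_s * fs))
    n_silence = int(round(silence_s * fs))
    signal = []
    for p in range(n_pulses):
        signal.extend(voss_pink_noise(n_noise, seed=seed + p))
        signal.extend([0.0] * n_silence)
    return signal


# Two pulses at a low sample rate, to keep the example fast.
stimulus = pulsed_pink_noise(n_pulses=2, fs=1000)
print(len(stimulus) / 1000, "seconds")
```

Each burst uses a different seed so that the pulses are not identical copies of each other. Whether the original study repeated the same burst is not stated in the text above.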

The earlier mentioned research of Lindau et al. [40] used both pink noise and an excerpt of an acoustic guitar piece as stimuli for their listening tests. Results showed that the guitar stimulus was sensitive to simulation artefacts.

In another study, Lindau et al. [43] used male speech, female speech, acoustic guitar, trumpet and drum samples as stimuli. Test subjects were presented with 80 recording/auralisation pairs and were asked to detect the auralisation. The research concludes that the speech and guitar samples were best suited to uncover potential artefacts of the simulation, compared to the drum and trumpet samples, which had a significantly lower detection rate.
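Whether a detection rate is meaningfully above guessing can be checked with an exact one-sided binomial test. The sketch below assumes that each recording/auralisation pair is a two-alternative choice with a chance level of 0.5. The function name and the example count of 52 correct answers are hypothetical and are not results from Lindau et al. [43].

```python
from math import comb


def detection_p_value(n_correct, n_trials, chance=0.5):
    """One-sided exact binomial p-value: the probability of getting
    at least n_correct answers right by pure guessing."""
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical listener: 52 correct identifications out of 80 pairs.
p = detection_p_value(52, 80)
print(f"p = {p:.4f}")
```

For a three-alternative test the same function can be called with `chance=1/3`.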

In another study, Kosanke and Lindau [42] used only a drum sample for their earlier mentioned research on mixing time (page 15). A drum sample is also used in the study of Lokki and Järveläinen [39], who used guitar, clarinet and female voice stimuli as well. Again, the drum stimulus produced auralisations that were less similar to the recordings than the other three. The paper states that "this was expected because drum sounds, being very wide-band transient signals, give no excuses with modelling errors." The frequency range of a group of samples from the EBU Sound Quality Assessment database was assessed with the spectrogram plots shown in Fig. 3.3.

Figure 3.3: Spectrogram of four audio samples

For their research on real and virtual loudspeakers (see page 13), Hiekkanen et al. [32] used male speech and pink noise stimuli, as well as samples of commercial rock and jazz music. The jazz sample in particular was chosen because of its wide spectrum and its simultaneous sound sources at different positions. Since the research focussed on the comparison between real and artificial loudspeakers, it makes sense to use these music samples, because the loudspeakers will most likely be used to play music. Both the jazz and the noise sample have more power in the higher frequencies than the male speech sample, which leads to a larger audible difference.
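The statement about power in the higher frequencies can be quantified as the fraction of spectral energy above a cutoff frequency. The sketch below is an illustration only: the function name, the 2 kHz cutoff, the 8 kHz sample rate and the sine-tone test signals are assumptions, and the naive DFT is only practical for short frames. It is not the analysis used by Hiekkanen et al. [32].

```python
import cmath
import math


def high_band_energy_fraction(samples, fs, cutoff_hz):
    """Fraction of spectral energy at or above cutoff_hz, computed
    with a naive O(N^2) DFT over the positive-frequency bins."""
    n = len(samples)
    total = 0.0
    high = 0.0
    # DC is skipped, and the Nyquist bin is excluded for even n.
    for k in range(1, (n + 1) // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * fs / n >= cutoff_hz:
            high += energy
    return high / total if total > 0 else 0.0


fs = 8000
n = 256
low_tone = [math.sin(2 * math.pi * 500 * t / fs) for t in range(n)]
high_tone = [math.sin(2 * math.pi * 3000 * t / fs) for t in range(n)]
mixed = [a + b for a, b in zip(low_tone, high_tone)]

for name, sig in [("low", low_tone), ("high", high_tone), ("mixed", mixed)]:
    print(name, round(high_band_energy_fraction(sig, fs, 2000), 3))
```

A stimulus with a higher fraction carries relatively more energy in the upper band. For real recordings, an FFT over windowed frames would be the usual choice instead of this direct sum.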

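Table 3.10: Overview of used sound source signals

| Research | (Pink) noise | Human speech | Guitar | Drums | Trumpet | Clarinet | Pop music | Jazz music |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Brinkmann et al. [17] | x | x (male) | - | - | - | - | - | - |
| Lindau et al. [30] | x | x | - | - | - | - | x | - |
| Hiekkanen et al. [32] | x | x (male) | - | - | - | - | x | x |
| Lindau et al. [40] | x | - | x | - | - | - | - | - |
| Lindau et al. [43] | - | x (male/female) | x | x | x | - | - | - |
| Kosanke and Lindau [42] | - | - | - | x | - | - | - | - |
| Lokki and Järveläinen [39] | - | x (female) | x | x | - | - | - | - |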

3.6.2 Signal Length and Listener Memory

The length of the sound source signal is discussed by Hiekkanen et al. [32] in their study on virtual loudspeakers. They state that human auditory memory is limited: humans cannot accurately remember sound source signals longer than a few seconds [32]. No specific preferred maximum length is provided in the paper.

Recommendation ITU-R BS.1284-1 [38] also addresses the length of the signal. It recommends using audio excerpts no longer than 15 to 20 s. For some tests, the excerpts may be as short as a few seconds. Table 3.11 shows the length of the stimuli used in the reviewed studies. Most of these signals do not exceed the 20 s limit stated in Recommendation ITU-R BS.1284-1.

Table 3.11: Duration of used stimuli

| Research | Length (s) |
| --- | --- |
| Lindau and Weinzierl [16] | 6 |
| Lindau et al. [40] | 5 |
| Kosanke and Lindau [42] | 2.5 (+ reverb) |
| Lindau et al. [43] | 6 |
| Malecki et al. [37] | 20 |
| Lokki and Järveläinen [39] | 10 - 20 |
| Kronland-Martinet [48] | 7 |
| Breebaart and Schuijers [53] | 9 - 30 |
| Pieren et al. [57] | 8 |
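The comparison of the durations in Table 3.11 with the 20 s upper limit can be written out as a short check. The values are copied from the table. The variable and function names are hypothetical, and the reverberation tail noted for Kosanke and Lindau [42] is not included in the 2.5 s value.

```python
ITU_LIMIT_S = 20.0  # upper end of the recommended 15 to 20 s range [38]

# (study, shortest stimulus in s, longest stimulus in s), from Table 3.11
stimuli = [
    ("Lindau and Weinzierl [16]", 6.0, 6.0),
    ("Lindau et al. [40]", 5.0, 5.0),
    ("Kosanke and Lindau [42]", 2.5, 2.5),  # excludes the reverb tail
    ("Lindau et al. [43]", 6.0, 6.0),
    ("Malecki et al. [37]", 20.0, 20.0),
    ("Lokki and Järveläinen [39]", 10.0, 20.0),
    ("Kronland-Martinet [48]", 7.0, 7.0),
    ("Breebaart and Schuijers [53]", 9.0, 30.0),
    ("Pieren et al. [57]", 8.0, 8.0),
]


def exceeding(entries, limit_s=ITU_LIMIT_S):
    """Return the studies whose longest stimulus exceeds limit_s."""
    return [name for name, _shortest, longest in entries if longest > limit_s]


print(exceeding(stimuli))  # ['Breebaart and Schuijers [53]']
```

Only the longest stimuli of Breebaart and Schuijers [53] exceed the limit, while the 20 s stimuli of Malecki et al. [37] and Lokki and Järveläinen [39] sit exactly on it.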

3.6.3 Discussion

In studies that compare auralisations made with synthesized impulse responses against auralisations made with measured impulse responses, the results differ between signals with different frequency ranges. Broadband signals turned out to provide stimuli with more audible differences between the synthesized and measured auralisations. Other samples, such as male/female human speech and various musical instruments, make it harder to distinguish the auralisations from the measurements. The trumpet stimulus used by Lindau et al. [43] had an even lower detection rate than the drum stimulus. This could mean that a trumpet sample is even less suitable for making auralisation artefacts audible than a drum sample. Due to the limited auditory short-term memory, test signals should not be longer than 15 to 20 s [38].