
Eindhoven University of Technology

MASTER

Investigation of listening test methods

van Loenen, R.C.

Award date:

2018


Disclaimer

This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.


Validation Of Synthesized Room Acoustic Auralisations

Investigation of Listening Test Methods

R.C. van Loenen

Supervisors:

F. Georgiou, dr.ir. M.C.J. Hornikx, ir. P.E. Braat-Eggen

Unit: Building Physics & Services
Department: Built Environment

February, 2018


Abstract

The term auralisation can be used in analogy with visualization: it describes the process of making digital sound fields audible. Auralisations can be created by calculating the artificial reverberation of a digital building model and adding it to anechoic audio files with a process called convolution. It is then possible to perceive the acoustics of a room or building during the design phase. Auralisations can provide a great communication tool between architects, acousticians and other stakeholders. The authenticity and validation of auralisations are a recent subject of research. Currently, no standardized method is available that prescribes which listening test method to use for assessing the quality of auralisations. This study investigated the different methods that have previously been used to validate the authenticity of auralisations. An authentic auralisation can be validated by comparing measurements to the auralisations.

The first part of this study consists of a literature review of the listening test methodologies used in studies concerning auralisations. Two methods, the double-blind triple-stimulus (DB) test and the Signal Detection Theory (SDT) test, were selected in order to further evaluate their applicability for assessing the authenticity of auralisations. The DB-test presents the test subjects with three samples: the first is the reference, which in every case is the measurement. The two other samples are the measurement and the auralisation. The subject has to compare both samples to the reference and rate their difference. In the SDT-test, the test subject listens to every sample once and then has to decide whether it is 'real' or 'simulated'.

The next part of the research concerns performing both tests. Measured and simulated Binaural Room Impulse Responses (BRIRs) were obtained to perform the tests.

The measurements were performed in an indoor sports hall using a dummy head and a directional source at multiple locations. The simulated BRIRs were created with the ray-based software ODEON. The reverberation times of the measured and simulated BRIRs differed significantly from each other. Therefore, another ODEON model was made with a reverberation time that matches that of the measured BRIRs.

With the measurements (samples convolved with the measured BRIRs) and simulations (samples convolved with the simulated BRIRs) completed, it was possible to prepare and carry out the listening tests. The simulated and measured BRIRs were convolved with different anechoic samples: human voice, trumpet and drum. 21 test subjects were asked to perform the two tests. First the double-blind triple-stimulus test was performed, followed by the SDT-test. All subjects were presented with ten different auralisations. The samples were repeated multiple times to achieve more significant results.

While using the same sample set, both tests provided different results. The SDT-test scores showed that the subjects could not clearly distinguish the measurements from the auralisations. The obtained scores and the used number of subjects/repetitions did not exceed the set minimum threshold score of the test. This indicates that no significant differences between the measurements and auralisations were found. The subjects were, however, able to distinguish the auralisations from the measurements with the double-blind triple-stimulus test. Using the rating system of the double-blind triple-stimulus test, the auralisations were rated between 'Rather similar' and 'Slightly different' when compared to the measurements. The differences in results between samples that differed in source/receiver distance, source radiation angle, anechoic sample and reverberation time were analysed using paired-samples t-tests and Wilcoxon signed-rank tests. None of the differences were found to be significant. Using more test subjects and fewer samples with more repetitions could make the found differences significant. While producing different results, both tests indicated that the quality of the auralisations was relatively high. The outcomes indicated that both tests could be used for different applications, depending on the goal of the study in question.


Contents

List of figures
List of tables
List of abbreviations
Acknowledgements

1 Introduction
1.1 History of Artificial Reverberation
1.2 Spatial Hearing and Binaural Reproduction
1.3 Research Scope
1.4 Research Method
1.5 Thesis Overview

2 Theory
2.1 Room Impulse Response
2.2 Room Acoustics Modelling
2.3 HRTF
2.4 Convolution and FFT

3 Literature Review
3.1 Direct Scaling
3.1.1 Double-Blind Triple-Stimulus
3.1.2 MUltiple Stimulus with Hidden Reference and Anchors
3.1.3 AB Comparison
3.2 Indirect Scaling
3.2.1 Three Alternative Forced Choice
3.2.2 Two Alternative Forced Choice
3.2.3 Paired Comparison
3.3 Other methods
3.4 Test duration
3.5 Discussion and conclusion
3.6 Sound Source Signals (Stimuli)
3.6.1 Comparing different source signals
3.6.2 Signal Length and Listener Memory
3.6.3 Discussion
3.7 Speech Intelligibility and Annoyance
3.7.1 STI
3.7.2 Annoyance
3.8 Conclusion

4 Measurements
4.1 Measurement plan
4.2 Equipment
4.3 Results

5 Simulations
5.1 ODEON Model
5.2 Time Domain Analysis
5.3 Spectrum Analysis
5.4 Results

6 Listening Tests
6.1 Stimulus Preparation
6.2 Test Environment
6.3 Statistics
6.3.1 Double-Blind Triple-Stimulus
6.3.2 SDT-test
6.3.3 Repetitions
6.4 Listening Test Procedure
6.4.1 Double-Blind Triple-Stimulus Test
6.4.2 Signal Detection Theory Test
6.5 Results

7 Discussion

8 Conclusions
8.1 Future Research

9 References

A Directivity Plot - ECHO Speech Source
B Convolution code
C Subject Reliability
D Paired T-Test - Results
E Double-blind triple-stimulus - Results
F SDT-test - Results
G Grouped Data tests
H Handout Authenticity Test
I Calculation Method Overdispersion

List of Figures

1.1 One of the Echo Chambers of the Abbey Road studios
1.2 a. Listening situation b. Binaural hearing principle
1.3 Front and side view of a Head And Torso Simulator
1.4 Impression of the sports hall
2.1 Excerpt of a measured BRIR, showing the direct sound and first reflection
2.2 Principle of replacing a reflection with a virtual source
2.3 Impulse and spectrum of an HRTF for the left and right ear
3.1 Example of a direct scale
3.2 An example of the user interface used in the blind grading phase
3.3 Spectrogram of four audio samples
4.1 Used measurement equipment
4.2 Measurement plan in the sports hall
4.3 INR-values for Line 2
4.4 INR-values for Line 1
4.5 STI-values for Line 1
4.6 STI-values for Line 2
4.7 Measured BRIR Sports hall, L2S1, left channel
5.1 Dimensions of the sports hall
5.2 Early part of BRIR Left channel, L2S1
5.3 Early part of BRIR Left channel, L2S1, without reflection
5.4 Frequency response of L2S1, before and after the adjustment
5.5 Frequency response of the direct sound of the simulated and measured BRIR for L2S1, Subject 012 and 021
5.6 Frequency response of the Echo source, omnidirectional microphone
5.7 Comparison of spectra
5.8 T30, measured, simulated and calibrated
5.9 Spectra of the simulated BRIR of L2S1, before and after calibration
6.1 Distributions for analysis of categorical data
6.2 Example of an SDT analysis
6.3 Listening test GUIs
6.4 Average scores for the double-blind triple-stimulus test
6.5 Average scores for the SDT-test
A.1 ECHO Source directivity plot: 63Hz, 250Hz, 1000Hz and 4000Hz
A.2 ECHO Source directivity plot: 125Hz, 500Hz, 2000Hz and 8000Hz
A.3 ECHO Source directivity plot, 63Hz
A.4 ECHO Source directivity plot, 125Hz
A.5 ECHO Source directivity plot, 250Hz
A.6 ECHO Source directivity plot, 500Hz
A.7 ECHO Source directivity plot, 1000Hz
A.8 ECHO Source directivity plot, 2000Hz
A.9 ECHO Source directivity plot, 4000Hz
A.10 ECHO Source directivity plot, 8000Hz

List of Tables

3.1 Answering scale of the double-blind triple-stimulus test
3.2 Indirect scaling of differences
3.3 Length of listening tests
3.4 Overview of the used listening test methods and their purpose
3.5 Method statistics
3.6 Pros and cons, double-blind triple-stimulus
3.7 Pros and cons, MUSHRA
3.8 Pros and cons, AB-Comparison
3.9 Pros and cons, 2/3AFC test
3.10 Used stimuli
3.11 Length of used stimuli
4.1 Used measurement equipment
5.1 Settings of the ODEON Simulations
5.2 Absorption coefficients α
5.3 Absorption coefficients of the ceiling - Base settings [19] and calibrated
6.1 Used source positions, L1/L3
6.2 Used source positions, L2
6.3 Stimuli-set used with both tests
6.4 Relative loudness, L2
6.5 Relative loudness, L1
6.6 Calculated required sample size - Chi Squared
6.7 Detection rate pc and corresponding d'min values
6.8 Calculated sample sizes for the chosen pc and α values
6.9 Possible combinations, overview for N = 25, double-blind
6.10 Possible combinations, overview for N = 25, SDT
6.11 Average DB-score per sample
6.12 Average d' scores
C.1 One Sample T-Test against 0, SDT
D.1 Test for normal distribution, double-blind triple-stimulus test
D.2 Paired t-test results, double-blind triple-stimulus test
D.3 Test for normal distribution, SDT-test
D.4 Paired t-test results, SDT-test
E.1 Average scores per subject, for every sample
F.1 Average D-Prime scores per subject, for every sample
F.2 Hits (H) and False Alarm (F) ratios per subject, Sample 1-5
F.3 Hits (H) and False Alarm (F) ratios per subject, Sample 6-10
G.1 Data Groups
G.2 Independent T-Tests* results, double-blind triple-stimulus test (*Paired samples t-tests)
G.3 Independent T-Tests* results, SDT-test (*Paired samples t-tests)

List of Abbreviations

2/3AFC Two/Three Alternative Forced Choice

ASW Apparent Source Width

BRIR Binaural Room Impulse Response
DFT Discrete Fourier Transformation
E-Sweep Exponential Sine-Sweep
FABIAN Fast And Binaural Impulse Response Acquisition
FFT Fast Fourier Transformation
GUI Graphical User Interface
HATS Head And Torso Simulator
HRTF Head Related Transfer Function
HpTF Headphone Transfer Function
IRS Inverse Repeated Sequence
JND Just Noticeable Difference
LEV Listener Envelopment
MLS Maximum Length Sequence
MRT Modified Rhyme Test

MUSHRA MUltiple Stimulus with Hidden Reference and Anchors

RMS Root Mean Square

S/R-distance Source/Receiver-distance

SDT Signal Detection Theory

SPL Sound Pressure Level

STI Speech Transmission Index


Acknowledgements

After wanderings along Frisian lakes and Greek theatres, it is finally here: the final result of my graduation project. Of course I couldn't have done all of this on my own, so I would like to take the opportunity to share my gratitude.

Firstly, I would like to thank Fotis Georgiou. As my first supervisor, he has provided me with both the knowledge and the motivation to keep going. Thanks for all the time you spent helping me along. Thanks also goes out to Maarten Hornikx for his knowledge and guidance throughout the project. Ella Braat-Eggen also deserves my thanks, especially for her ability to let me see things in a different light. The people of Level Acoustics and the personnel of the SSC are also thanked for their help during the measurements.

I would like to thank my fellow students who accompanied me at the lab and Floor 5 (and various other non-study related places) for making me feel at home and for making all of these years studying in Eindhoven so much fun. All the people who participated in my listening tests are also very much appreciated; here's to never hearing that sentence again! Next, I would like to thank my fellow acoustics students, especially Marc Kalee and Wouter Reijnders, for their help and the many interesting discussions about all kinds of acoustical topics. Special thanks goes out to Sandra Greven as well, for proofreading my thesis and providing me with lots of feedback.

Finally, I would like to thank my family, friends and loved ones for their support from the beginning until the very end.


1 | Introduction

1.1 History of Artificial Reverberation

Artificial reverberation and auralisation have been the topic of many acoustical studies since the 1960s [1]. Before digital techniques were widely available, 'analogue' techniques were used to add artificial reverberation to a sound signal. Analogue artificial reverberation techniques were introduced in the 1920s. The person credited for this use was M. T. "Bill" Putnam, who built a number of so-called echo chambers [1]. It was mostly used for broadcasting applications. The dry signals were transmitted into the chamber, after which the response of the room was recorded. Absorption material could be added to control the overall reverberation time of the room. The amount of early reflections could also be controlled by placing the microphone in different positions. These rooms were still used in later years. Fig. 1.1 shows an echo chamber of the famous Abbey Road Studios.

Figure 1.1: One of the Echo Chambers of the Abbey Road studios [2]

These particular chambers were used to add reverberation to iconic music records such as Dark Side Of The Moon by Pink Floyd and various records of The Beatles [3]. Since the added reverberation was an attribute of these rooms, it was not artificial in the strict sense of the word. Another, more compact analogue technique is the spring reverberator [1]. This device uses a set of helical springs. When excited by an electronic signal, the springs produce a set of echoes. These somewhat represent the acoustics of a room. The vibrations are recorded and mixed in with the original signal.

From the early 1960s, electronic reverberators began to replace these analogue techniques. They were both cheaper and more practical to use than the techniques mentioned in the previous paragraph. In 1961, Schroeder and Logan [4] succeeded in simulating several reverberators on a computer. Due to the significant increase of computer power in the early 1990s, it then became possible to perform acoustic simulations and auralisation on a personal computer. Because of this, research on auralisation took off [5].

With the use of auralisation, the artificial reverberation of a digital building model can be calculated and added to anechoic (reverberation-free) audio samples. Next to measuring it, it is now also possible to compute the impulse response of a room. The Room Impulse Response (RIR) can be considered as the acoustical fingerprint of a room [6]. A room can then be auralised by convolving the RIR with a dry signal. Dry signals are recordings made inside an anechoic chamber. With these auralisations, it is possible to perceive the acoustics of a room or building during the design phase.

Another definition is given by Kleiner et al. [7], who describe auralisation as "a term introduced to be used in analogy with visualization to describe rendering audible (imaginary) sound fields". During the design process in the built environment, it is possible that the acoustics are overlooked if members of the design team have little experience with acoustics and its facets. Auralisations can then be used as a tool to bridge the communication gap between architects, acousticians and other stakeholders [8]. Nowadays, auralisations are more accepted as a part of the architectural design. This is, of course, especially true for buildings such as concert halls and theatres [9].

1.2 Spatial Hearing and Binaural Reproduction

A listening situation can be described as a system with an input (the source), an output (the receiver(s)) and the transmission between the two [10] (see Fig. 1.2.A).

In the case of this research, the input can be seen as the sound source or the used anechoic signal. The output/receivers are represented as the two eardrums of the listener. Finally, the transmission of sound between the source and the receivers in a room can be described using the earlier mentioned RIRs. This transmission can be measured using measurement equipment or simulated using room acoustics modelling software.

Figure 1.2: A. Listening situation B. Binaural hearing principle

Localisation and spatial hearing are enabled by hearing with both ears and comparing the signals the two ears receive [11]. Differences in amplitude and transit time between the sound reaching both ears make it possible to locate the source. This is illustrated in Fig. 1.2.B. Next to the diffraction by the ears, the sound will be diffracted by the head and torso. The transmission between the source and receivers can be described with the impulse response of the environment. For auralisations, Head Related Transfer Functions (HRTFs) are used. These HRTFs are filters which alter the spectrum of a signal according to its direction and distance. They are used to create Binaural RIRs (BRIRs): sets of two RIRs, one for the left ear and one for the right ear. BRIRs can be considered as the key to room acoustics auralisations [12]. They can be calculated with computer software or measured with so-called dummy heads.

These dummy heads are artificial replicas of an average human head and contain two microphones: one in the left and one in the right ear. Fig. 1.3 shows a front and side view of the type of dummy head that was used for this research: a Head And Torso Simulator (HATS) [13]. Head-Related Transfer Functions (HRTFs) are used to create artificial BRIRs. The BRIRs are used to convolve dry mono signals to create a binaural result. The resulting auralisations can then be played back via headphones, producing a 3D sound experience.

Figure 1.3: Front and side view of a Head And Torso Simulator [14]

1.3 Research Scope

The authenticity and validation of auralisations based on simulations are a recent subject of research. An authentic auralisation is described by Lokki and Savioja [15] as an auralisation that the listener cannot distinguish from a recorded sound. Another criterion in use is plausibility, which is defined as "a simulation in agreement with the listener's expectation towards an equivalent real acoustic event" [16]. Brinkmann et al. [17] describe the difference between plausibility and authenticity as follows:

"Whereas the plausibility of a simulation refers to the degree of agreement with the listener's expectation towards a corresponding real event (agreement with an inner reference), authenticity refers to the perceptual identity with an explicitly presented real event (agreement with an external reference)." [17].

There is no standardized listening test method for assessing the quality of auralisations [15]. Recent auralisation validation and assessment studies have used various methods. These different tests can produce different kinds of results. This means that, depending on the goal of a study, the required listening test may also differ.

1.4 Research Method

The goal of this research was to investigate the different listening test methods concerning the authenticity of auralisations. The first part covered a literature review of the multiple test methods used in studies regarding auralisations. Attention was also given to the research area of audio perception, since in that area many validated subjective evaluation methodologies have been developed. Based on this review, the Signal Detection Theory (SDT) test and the double-blind triple-stimulus test were chosen to be investigated further.

Both methods were tested with a set of auralisations made in ODEON Room Acoustics Software 12.12 [18], which were compared to a set of corresponding measurements. The research focussed on indoor acoustics, so the measurements were performed in an indoor sports hall (see Fig. 1.4). To test the effect of the used sample on the authenticity rating, different types of sound-source signals were used in the tests.

The study continued with the performance of the two chosen listening test methods. Here, the goal was to see how these tests were to be performed and what results they produced. These tests were both executed using the same set of audio samples. The results were analysed to evaluate the applicability and usability of both listening test methods. Aspects such as the used samples, the number of repetitions and the difference in results between the two tests were analysed as part of these evaluations.

Figure 1.4: Impression of the sports hall [19]

1.5 Thesis Overview

The first step of the research was a literature review on listening test methods used in studies concerning auralisations. This review is an adapted version of the previously written M3-report with the same title. Within this review, a variety of different test methods is discussed. Other aspects, such as the sound source signals used, are also included.

The second step is the performance of two listening test methods recommended by the conclusions of the literature review. The review was followed by the BRIR measurements, which were performed in the Sports Hall at the TU/e Eindhoven campus. The measurement plans and the equipment settings used are found in Section 4. Acoustical simulations were performed with ODEON Acoustics 12.12. The measured and simulated BRIRs are used with the anechoic samples to create the audio samples. The performed ODEON simulations, the processing of the simulated and measured BRIRs and the convolution are found in Section 5.

The Graphical User Interface (GUI) used was designed and built in the software Max/MSP 5.0.7. The statistical basis for the chosen number of test subjects and number of test samples is explained in Section 6.3. With both the audio samples and the test environment prepared, the research is concluded with the performance of the two listening test methods. The results of these tests and their applicability are discussed in the final sections.


2 | Theory

This section describes the theoretical concepts and methods related to the subject of auralisations. The simulated transmission, introduced in Section 1.2 (see Fig. 1.2.A), is the essential part of the auralisation framework. These simulated transmissions make it possible to create auralisations. Four essential elements are required to produce them: Room Impulse Responses, room acoustics modelling, HRTFs, and convolution with the FFT. These elements will be discussed in this section.

2.1 Room Impulse Response

An Impulse Response (IR) can be described as the output of a linear system which is excited by an impulse [6]. Using this analogy in acoustics, a room can be considered as the 'system'. The IR of a system is acquired when that system is excited by a signal (the impulse). A commonly used method plays a known input signal in a space and measures the space's response [20]. The IR can then be extracted from this recording using a process called deconvolution. Deconvolution can be seen as the reverse of convolution, which is explained in more detail in Section 2.4. If the input signal and its expected effect on the output are known, the IR can be extracted from the measurement. This results in a so-called Room Impulse Response (RIR).
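As a minimal sketch of deconvolution by spectral division (the measurement software actually used in this thesis is not reproduced here; the regularisation term eps and the variable names are assumptions for the example), assuming the excitation and the recorded response are NumPy arrays at the same sampling rate:

```python
import numpy as np

def deconvolve(recorded, excitation, eps=1e-10):
    """Estimate a room impulse response by spectral division.

    recorded:   microphone signal captured in the room
    excitation: the known input signal (e.g. an exponential sine sweep)
    eps:        small regularisation term to keep the division stable
    """
    n = len(recorded) + len(excitation) - 1
    rec_spec = np.fft.rfft(recorded, n)
    exc_spec = np.fft.rfft(excitation, n)
    # Dividing the spectra undoes the convolution of the room with the sweep.
    ir_spec = rec_spec * np.conj(exc_spec) / (np.abs(exc_spec) ** 2 + eps)
    return np.fft.irfft(ir_spec, n)
```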

For the used source/receiver combination and location, the RIR describes how sound from that source will arrive at the receiver [6]. The direct sound arrives first (the first peak of the IR in Fig. 2.1). Other peaks are caused by the reflections from different surfaces in the room. As mentioned earlier, an RIR can thus be considered as the fingerprint of the room. When the response is recorded with two microphones mounted in the left and right ear of a dummy head such as the HATS, the result is called a Binaural Room Impulse Response (BRIR). The term binaural is used since both ears are involved [21]. An example of a (section of a) BRIR can be seen in Figure 2.1, showing the direct sound and the first reflection for both channels.

2.2 Room Acoustics Modelling

Room acoustics can be simulated in multiple ways; known approaches are, among others, wave-based and ray-based models [22]. The software available for this research was ODEON Acoustics. This is a hybrid ray-based acoustics simulation tool that makes it possible to simulate the acoustics of a digital geometrical model. It uses ray tracing and the Image Source Method (ISM) and works as follows: a set number of rays is emitted from a sound source in a modelled space. The rays travel through the space until they run out of energy. Energy is lost through absorption by the air and through absorption at the surfaces with every reflection. All surfaces in the space are assigned an absorption coefficient α, which describes the fraction of the incoming sound particle energy that is not reflected [6]. Both methods simulate sound as particles travelling along these rays [23].

Figure 2.1: Excerpt of a measured BRIR, showing the direct sound and first reflection (left and right channels)
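To make this energy bookkeeping concrete, the energy a single ray retains after travelling a certain distance and hitting a sequence of surfaces could be sketched as follows (the air-attenuation coefficient and surface values are illustrative assumptions, not ODEON's internal settings):

```python
import math

def remaining_ray_energy(e0, alphas, distance_m, air_attenuation=0.005):
    """Energy left in a ray after a series of surface hits plus air absorption.

    Each reflection keeps a fraction (1 - alpha) of the energy; air absorption
    is modelled as an exponential decay exp(-m * d) with coefficient m in 1/m.
    """
    energy = e0 * math.exp(-air_attenuation * distance_m)
    for alpha in alphas:          # one absorption coefficient per surface hit
        energy *= (1.0 - alpha)
    return energy

# Illustrative: a ray that travelled 60 m and hit three surfaces.
print(remaining_ray_energy(1.0, alphas=[0.1, 0.3, 0.1], distance_m=60))
```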

ISM is based on the principle that reflections of a source in an enclosed room can be replaced by virtual sources. The source is mirrored in the plane of the reflecting surface, creating this virtual source [24]. The concept is visualized in Fig. 2.2 for a single first-order reflection. The concept ODEON uses with these virtual sources is to replace the room with sources placed at different distances. As seen in Fig. 2.2, the source/receiver distance is different for both sources. Because of this, the virtual sources will be excited with a delay [13].

Figure 2.2: Principle of replacing a reflection with a virtual source

The ISM and ray-tracing methods have their pros and cons ([24], [25], [6], [26]). ISM produces very accurate results, with the possibility to also take phase relationships into consideration. However, the method works best with rectangular, box-shaped rooms with plane and smooth surfaces, and it quickly results in high calculation times, making high reflection orders unrealistic. Ray tracing has low computational costs and uses both specular and diffuse reflections. The method also works with curved and scattering surfaces. The angular resolution, however, limits the accuracy. Neither method takes wavelength and frequency into account.
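As an illustration of the mirroring step, a first-order image source can be found by reflecting the source position across the plane of the reflecting surface. A minimal sketch (the geometry and positions are made up for the example; this is not ODEON's implementation):

```python
import numpy as np

def image_source(source, plane_point, plane_normal):
    """Mirror a source position across a reflecting plane (first-order ISM)."""
    src = np.asarray(source, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    n /= np.linalg.norm(n)                       # unit normal of the surface
    d = np.dot(src - np.asarray(plane_point, dtype=float), n)
    return src - 2.0 * d * n                     # mirrored (virtual) source

# Example: a source 2 m above a floor lying in the z = 0 plane.
img = image_source([3.0, 4.0, 2.0], plane_point=[0, 0, 0], plane_normal=[0, 0, 1])
print(img)  # [ 3.  4. -2.]; its distance to the receiver gives the extra delay
```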

2.3 HRTF

As described in Section 1.2, source localisation is made possible by differences in amplitude and transit time. In addition, the sound reaching the ears is diffracted by the head, pinnae and torso. The HRTFs are filters that take into account the frequency alteration caused by the head and pinnae, the diffraction caused by the head and torso, and the delay caused by the distance between the two ears [6]. An example of a set of HRTFs can be seen in Fig. 2.3. Applying the HRTF to a single RIR results in a set of two BRIRs, one for the left and one for the right ear. The HRTF can be measured individually for a person or for a dummy head. An RIR simulated by ODEON can be processed with a previously chosen set of HRTFs, which again results in a set of two BRIRs. Fig. 2.3 shows the impulse and the spectrum of an HRTF. The impulse shows a delay for the left ear, and the spectra of the two channels are clearly different.


Figure 2.3: Impulse and spectrum of an HRTF for the left and right ear

2.4 Convolution and FFT

An essential part of creating auralisations is convolution. Equation 2.1 shows the convolution formula [27], with s(t) as the input signal, h(t) as the impulse response of the system and g(t) as the output signal:

g(t) = \int_{-\infty}^{\infty} s(\tau)\, h(t - \tau)\, d\tau = s(t) * h(t)    (2.1)

In the work presented in this thesis, the input signal s(t) is the dry signal (e.g. drum, speech), h(t) is the left or right channel of the BRIR, and the output g(t) is the auralisation. Convolution can be implemented both in the time domain and in the frequency domain.

Multiplication in the frequency domain is equivalent to convolution in the time domain. The process to transform a signal from the time domain to the frequency domain and back is called the Fourier Transformation [11]. The Discrete Fourier Transformation (DFT) is used as a Fourier transformation for sampled, discrete signals [11]. This study uses the Fast Fourier Transform (FFT), an efficient implementation of the DFT. Fast convolution with the FFT is based on the convolution theorem: the convolution of two signals can be achieved by multiplying the complex spectra of both signals, after which the result is obtained by applying the inverse DFT to the product [1]. Compared to convolution in the time domain, this method requires far fewer operations [1].
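The convolution code actually used in this thesis is listed in Appendix B; as an independent, minimal sketch of FFT-based (fast) convolution, assuming NumPy and a dry mono signal plus one BRIR channel stored as float arrays at the same sampling rate:

```python
import numpy as np

def auralise(dry, brir_channel):
    """Fast convolution of a dry signal with one BRIR channel (Eq. 2.1).

    Multiplying the spectra and transforming back is equivalent to
    time-domain convolution but needs far fewer operations.
    """
    n = len(dry) + len(brir_channel) - 1      # full convolution length
    nfft = 1 << (n - 1).bit_length()          # next power of two for the FFT
    spectrum = np.fft.rfft(dry, nfft) * np.fft.rfft(brir_channel, nfft)
    return np.fft.irfft(spectrum, nfft)[:n]

# Hypothetical usage: convolve the same dry sample with the left and right
# BRIR channels, then scale both by one common factor to avoid clipping
# while preserving the interaural level difference.
# left, right = auralise(dry, brir_l), auralise(dry, brir_r)
```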


3 | Literature Review

The literature review will mostly focus on the different listening test methods that are currently used to assess the authenticity and quality of auralisations. The conclusions and recommendations of the literature review will be used to set up an experiment where a set of auralisations made with measured BRIRs will be compared to a similar set of auralisations made with simulated BRIRs.

Research Question

The main question of this literature review is:

• How is the authenticity of synthesized room acoustic auralisations rated?

The sub-question is:

• What are the effects of different sound-source signals on the authenticity rating of auralisations?

An important part of this study is the set of listening test methods that will be used in the experiments. As stated before by Lokki and Savioja [15], there is no recommended or standardized listening test method to test the quality of auralisations. Because of this, many different listening test methods are used in auralisation evaluation studies. This section will present a number of different methods and discuss their pros and cons.

Not all of the listening tests that are discussed in this section were used for research concerning the authenticity rating of auralisations, but they can still provide useful information. A way to categorize the different listening test methodologies is based on their scaling method [28]. There are two scaling methods: direct and indirect scaling. The direct scaling technique asks the test subject to directly convert a sensation into a rating on a scale. An example of a direct scale can be seen in Fig. 3.1.

Figure 3.1: Example of a direct scale: Degradation Category Rating (DCR) scale [28]

On the other hand, the indirect technique measures the sensation by measuring the ability of the test subjects to distinguish different stimuli from one another. Both techniques are commonly used in auralisation studies.


Methods that are discussed in this section include:

• Double-blind triple-stimulus
• MUltiple Stimulus with Hidden Reference and Anchors
• AB-comparison
• Three Alternative Forced Choice
• Two Alternative Forced Choice

3.1 Direct Scaling

3.1.1 Double-Blind Triple-Stimulus

The double-blind triple-stimulus test is described in ITU-R Recommendation BS.1116-1 [29]. It is used to assess systems that introduce very small impairments to the sound.

It is recommended to have participants who have a certain level of expertise. The participants should be selected based on their experience in taking listening tests. A training session is also recommended to let the test subjects familiarize themselves with the tests. This training procedure can also be used to select the most qualified test subjects.

The double-blind triple-stimulus test presents the test subject with three stimuli, "A", "B" and "C". The test subject is asked to rate the impairment of B compared to A, and of C compared to A, on a scale of 1 to 5 (see Table 3.1).

The double-blind triple-stimulus test was used by Lokki and Savioja [15] in a study where auralisations were validated by both objective and subjective means. Multiple listening test methods were used in this research; however, they do not mention the reason for choosing the double-blind triple-stimulus test or any of the pros and cons of the methods. The test subjects of this research were asked to compare the spatial and timbral differences between recorded and auralised sound fragments. Based on the results of this listening test, Lokki and Savioja [15] concluded that the auralisations of this study were almost indistinguishable from the actual recordings. However, transient signals such as a snare drum did produce auralisations with audible differences.

Table 3.1: Answering scale of the double-blind triple-stimulus test [29].

Impairment Grade

Imperceptible 5.0

Perceptible, but not annoying 4.0

Slightly annoying 3.0

Annoying 2.0

Very annoying 1.0

The same listening test is used by Lindau et al. [30] in their research on Headphone Transfer Functions (HpTFs). An HpTF describes the response and coupling of a headphone with the ears [31]. HpTFs are generated by playing known signals through the headphone and recording the signal with in-ear microphones. Non-individual, individual and generic HpTFs were compared within this research. Due to the fact that one of the two samples ("B" or "C") is identical to the reference ("A"), one slider will always be set to 5 (imperceptible). Test subjects could only pick one slider to rate the difference; the other slider then has to be set to 5. This is an advantage of the double-blind triple-stimulus test: it forces the test subject to grade the hidden reference as imperceptible. It makes it possible for the experimenter to test the ability of the test subject to detect the artifacts of the test stimuli when compared to the reference [29].

Hiekkanen et al. [32] also used an adaptation of ITU-R Recommendation BS.1116-1 [29] for their research on the difference between real and virtualized loudspeakers. Auralisations make it possible to present test subjects with stimuli from different speakers originating from the same position. Differences between the real and virtual loudspeaker are evaluated using five attributes. The first three are related to source localization: apparent sound width, direction of events and distance to events. The first describes how the width of the source is perceived by the subjects. The other two describe the perceived location and distance of the source compared to the (virtual) position of the subject. The fourth attribute is spaciousness, which is described as the amount of space present in the stimuli. The spectral content is described by the attribute Tone Colour. The same answering scale as in Table 3.1 is used.

A similar listening test is used by Paul [33], who used three instead of two different stimuli to compare with the reference. Test subjects had to mark their rating on a linear scale printed on paper, ranging from very similar to very different. The answering scale has no reference points for the subjects to base their scaling of the perceived difference on. This can potentially lead to significant differences in ratings between subjects. Although the used scale is continuous from 0 to 10, the marks were transformed into numerical values with a precision of 0.5.

3.1.2 MUltiple Stimulus with Hidden Reference and Anchors

Another test that is often used is Recommendation ITU-R BS.1534-1 [34], also known as MUltiple Stimulus with Hidden Reference and Anchors (MUSHRA). This codec-listening test is used for the subjective assessment of audio coding formats such as MP3, AAC and FLAC. The difference between the double-blind triple-stimulus test and MUSHRA is that MUSHRA is more suitable for evaluating audio of lower quality. It is more adequate at discriminating differences in quality that, with other test methods, might agglomerate in the lower half of the scale [28]. With MUSHRA, the test subject is presented with multiple audio fragments, including the reference signal, one or multiple impaired audio fragments, a hidden reference and a hidden anchor. This anchor is a low-pass filtered version of the unprocessed signal (the reference). When test subjects are presented with the different stimuli and asked to rate the differences, the low-pass signal should be given the lowest score and the hidden reference the highest. These two 'extreme' stimuli are used to encourage the test subjects to use a broad rating range. This range makes MUSHRA suitable for evaluating audio of lower quality [28]. Difference ratings for multiple stimuli of lower quality could otherwise accumulate at the bottom end of the earlier shown answering scale (see Table 3.1) [28]. An example of the interface used for MUSHRA tests can be seen in Fig. 3.2.

Watanabe et al. [35] used the MUSHRA method in their study on the acoustics of a virtual renaissance theatre. Actual recordings in the theatre were compared with convolved stimuli using the measured transfer functions for the different loudspeakers used in the theatre. They used the MUSHRA method to present the test subject with five different signals. Three perceptual attributes (Apparent Source Width (ASW) [36], Listener Envelopment (LEV) [36] and Clarity) were chosen to be assessed by the test subjects.

Figure 3.2: An example of the user interface used in the blind grading phase [34]

3.1.3 AB Comparison

Malecki et al. [37] used the AB-method for their research on the assessment of multi-channel auralisations. The RIRs were recorded in a test room using both a multi-microphone technique and a SoundField-type microphone. These RIRs were used to convolve the anechoic signals. The anechoic signals were also used to record reference samples in the same test room. Recommendations of ITU-R BS.1284-1 [38] were followed. The test was performed in three series, each using a different sound system: a 5.0 surround system, a stereophonic system and headphones. The test subject group consisted of eight people, who all had at least a few years of experience with listening tests. During the test, the subjects were presented with pairs of samples, which could randomly consist of different or identical samples. Test subjects first had to answer whether the samples were different or equal. When found to be different, the subjects were asked to express the difference on a scale ranging from 1 to 5 (see Table 3.2), although it was not mandatory to answer this question. However, this scale is not based on any ITU recommendations. The detection rate of the auralisations was found to be significant for all three systems. The stereophonic system had a notably lower detection rate of 63% compared to the headphone system (77%) and the 5.0 system (83%).

An AB-comparison method was also used by Lokki and Järveläinen [39] in a study concerning the subjective evaluation of auralisations, where real-head recordings were compared with auralisations. 24 pairs of auralisations and recordings were used for the comparison. To measure the reliability and bias of the test subjects, eight pairs of identical samples were also mixed in with the other pairs. They presented the test subject with the auralisation and the accompanying recording and asked the subject to rate the difference between the two samples.


Table 3.2: Indirect scaling of differences used by Malecki et al. [37]

Impairment Grade
The differences are hard to notice 1.0
Differences based on the noise and crackles 2.0
Small differences based on the quality and sound 3.0
Big differences in sound 4.0
Very big differences 5.0

3.2 Indirect Scaling

3.2.1 Three Alternative Forced Choice

The Three Alternative Forced Choice (3AFC) test is a listening test used by Lindau et al. [40] for their research concerning the BRIR grid resolution. In the 3AFC test, the subjects are asked to select the sample that differs from the other two. This is an advantage compared to the double-blind triple-stimulus test, because subjects have a 33% chance of guessing the right answer, whereas this is 50% for the double-blind triple-stimulus test. The BRIR resolution is important for correct simulation of dynamic binaural synthesis. They used an adaptive version of the 3AFC test, though no arguments are provided for their choice of listening test. The test subjects are presented with three stimuli: two stimuli with the highest resolution and one stimulus with a reduced resolution. However, very little further information on how the test was conducted is available in this paper.

The 3AFC method is also used by Pelzer and Vorländer [41] for their research on the effect of different levels of detail of CAD models used for auralisations. They modelled the same room with eight different levels of detail. The test subjects were presented with two auralisations created with the highest level of detail and one with a lower level, and were asked to pick the sample that differs from the other two.

Another example of the usage of the 3AFC method is the study of Kosanke and Lindau [42] on the mixing time in BRIRs. The mixing time is the crossover point between the early reflections and the late reverberation. The latter part of the BRIR, after the mixing time, becomes more diffuse, which makes individual reflections less distinguishable. A major aspect of Virtual Acoustic Environments is the computational cost. To lower this cost, the paper examines the possibility of using a constant reverberation tail (the late reverberation). Normally, the reverberation tail is individually calculated for every position and direction; replacing it with a constant one could potentially lower the calculation cost. BRIR measurements were performed in nine rooms with different volumes and different average absorption coefficients. The test subjects had to distinguish the auralisation with the constant reverberation tail from the individually calculated one.

3.2.2 Two Alternative Forced Choice

Though not explicitly stated, the Two Alternative Forced Choice (2AFC) method was used by Lindau et al. [43] for their research on binaurally synthesized acoustic environments. 3926 BRIRs were recorded with the use of the Fast And Binaural Impulse Response Acquisition (FABIAN) system. The test subjects were presented with 80 pairs, always consisting of a recording and an auralisation, and had to pick out the simulated stimulus. The result of this test was a significant detection rate of 52.9%. A survey afterwards revealed that test subjects mostly based their judgment on the spectral differences and on the difference in source localization.
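Whether such a detection rate is significantly above the 50% guessing level of a 2AFC test can be checked with a one-sided binomial test. A minimal sketch (the trial count below is illustrative and not taken from the cited study's own analysis):

```python
from math import comb

def binomial_p_one_sided(correct, trials):
    """Exact one-sided p-value against the 50% guessing level: the probability
    of at least `correct` right answers in `trials` 2AFC trials if the
    subjects were only guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Illustrative numbers only: 35 subjects x 80 pairs = 2800 trials;
# a 52.9% detection rate corresponds to roughly 1481 correct responses.
print(binomial_p_one_sided(1481, 2800))   # about 0.001, i.e. above chance
```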

The 2AFC method was also used by Otondo and Rindel [44] to test the possible audio improvement of a new method for representing, in auralisations, sound sources that vary their directivity in time. Clarinet recordings were used to create a set of auralisations using a model of the Queen's Hall of the Royal Library of Copenhagen with the ODEON software. These auralisations were compared with recordings created with the traditional method. The test subjects were presented with 48 stimuli pairs and were asked to judge the perceived spaciousness of the sound in the room and the perceived naturalness of the timbre of the clarinet. It was allowed to replay the stimuli as many times as wanted, although it was not allowed to switch between stimuli while playing. The spaciousness is described as the perceived spatial change of the directivity of the source; the naturalness as the representation of how the radiation of the source has affected the quality of the reproduced instrument sound. The paper concludes that the perceived naturalness of the auralisations created with the new method was significantly preferred over the traditional method. There was no significant preference concerning the perceived spaciousness.

3.2.3 Paired Comparison

For their research on real-time auralisation of non-stationary traffic noise, Maillard and Jagla [45] used a paired comparison method to determine whether there are perceptual differences between noise recordings and auralisations. Three sets of 16 stimuli pairs were created for the test. To keep the test length within limits, test subjects were only presented with 20 stimuli. They were asked to listen to a stimulus and decide whether it was recorded or synthesized.

3.3 Other methods

Some studies use listening test methods that cannot be categorized under one of the four previously addressed methods. Some are altered versions of these methods; others are completely new ones that have not been used before. These methods will be addressed in this section.

In a study on traffic noise, Southern and June [46] use an adapted version of the forced choice method to distinguish real recordings from auralisations. Previous research [47] used an AB comparison test, but the paper concludes that this method is inappropriate for their situation. The research lets test subjects identify recorded and synthesized pass-bys of vehicles. The test subjects were presented with a reference recording and seven auralised stimuli and were asked to pick the stimulus that was most similar to the reference. This is an indirect forced choice setup similar to the 2/3AFC method, but without a hidden reference and with more stimuli.

Lindau and Weinzierl [16] assessed the plausibility of virtual acoustic environments. A method similar to Signal Detection Theory (SDT) was adapted to create a new listening method. SDT is normally used to create a yes/no paradigm: the test subject is presented with a 'signal' or 'no signal' condition. For this study, reality takes the role of the 'signal' condition and the simulation that of the 'no signal' condition. The test subject has to decide, after hearing a sample, whether it was real or simulated. According to this research, a strong personal bias can be expected "due to personal theories about the credibility of virtual realities and the performance of media systems in general" when test subjects have to rate the plausibility of a virtual environment on a scale from 0 to 1 without a given reference. The idea is that by giving the test subject only a yes/no answer possibility, the bias of the subjects, as described above, can be excluded during the post-processing of the results.
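The sensitivity measure conventionally derived from such yes/no data is the d' index (also used later in this thesis, Section 6.3.2 and Table 6.12). As a minimal sketch, assuming each trial is classified as a hit (responding 'real' to a measurement), miss, false alarm (responding 'real' to an auralisation) or correct rejection, with a small correction to keep the rates away from 0 and 1; the counts in the example are made up:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A d' near zero means the subject cannot tell measurements and
    auralisations apart; the +0.5/+1 correction avoids rates of 0 or 1,
    where the inverse normal CDF would be undefined.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one subject: 18 hits, 7 misses,
# 9 false alarms and 16 correct rejections.
print(round(d_prime(18, 7, 9, 16), 2))
```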

3.4 Test duration

An aspect that cannot be overlooked with listening tests is the duration of the tests. Although it is a very important and practical aspect of listening tests, literature concerning this topic is relatively rare [48]. As stated by Bech and Zacharov [28], long tests with relatively tedious tasks are prone to bore the test subjects. Therefore, it is necessary to divide longer sessions into smaller ones. A session of 20 minutes seems to be an acceptable length. When interrupted with breaks, multiple sessions of this length can be performed during a day. The duration of the listening tests of different studies can be seen in Table 3.3. Only three of the six studies that provided the length of their listening tests meet the recommendations made by Bech and Zacharov [28].

Table 3.3: Duration of the listening tests in different studies

Research Length (min)

Brinkmann et al. [17] 45

Lindau and Weinzierl [16] 15

Lindau et al. [30] 20; 45-60 including training
Lindau and Stefan [49] 30 - 90

Vorländer [50] 20 - 30

Parizet and Nosulenko [51] 20 - 40

3.5 Discussion and conclusion

One of the most important factors in choosing a suitable listening test is the expected level of difference between the auralisations and the measurements. The double-blind triple-stimulus and the SDT test should be suited to detect small differences, but this could lead to unusable results if the auralisations are clearly distinguishable from the recordings. The level of expertise of the test subjects can be a major factor, since less experienced test subjects are less likely to hear small deviations between the different samples. This level of expertise also influences the questions that can be asked to express the difference. Test subjects that are less experienced within the field of acoustics can provide incorrect results when asked to assess certain specific acoustic parameters. Considering the duration of the listening tests, it is suggested to keep single sessions under 20 minutes to avoid listener fatigue.


An overview of the listening test methods used by different studies and of listening test statistics can be found in Tables 3.4 and 3.5. The pros and cons of the different methods are found in Tables 3.6, 3.7, 3.8 and 3.9.

Table 3.4: Overview of the used listening test methods and their purpose

Purpose | Double-blind | MUSHRA | 2/3AFC | AB-Comparison | Other
Plausibility assessment | - | - | - | - | x [16]
Difference recorded vs. auralised sounds | xx [15][39] | - | xx [40][43] | xxx [37][45][52] | xx [30][46]
Simulated headphones/loudspeakers | x [32] | - | - | - | -
Comparing measured signal vs. convolution with measured BRIR | - | x [35] | - | - | -
Difference between different levels of model detail | - | - | x [41] | - | -
Difference constant vs. individually manipulated reverberation tail | - | - | x [42] | - | -
Compare audio improvement of auralisation | - | - | x [44] | - | -

Table 3.5: Method statistics

Research | Type of test | No. of subjects | No. of questions | No. of repetitions
Lokki and Savioja [15] | double-blind | - | - | -
Lindau et al. [30] | double-blind | 27 | - | -
Hiekkanen et al. [32] | double-blind | 8 | - | -
Breebaart and Schuijers [53] | double-blind | 9 | - | -
Watanabe et al. [35] | MUSHRA | 10 | - | -
Lindau et al. [40] | 3AFC | 21 | - | -
Pelzer and Vorländer [41] | 3AFC | - | - | -
Kosanke and Lindau [42] | 3AFC | 24 | 20 | -
Lindau et al. [43] | 2AFC | 35 | 80 | -
Otondo and Rindel [44] | 2AFC | 10 | 48 | -
Malecki et al. [37] | AB-comparison | 8 | 60 | -
Lokki and Järveläinen [39] 1 | AB comparison | 12 | 32 | -
Lokki and Järveläinen [39] 2 | AB comparison | 6 | - | -
Maillard and Jagla [45] | AB comparison | - | 20 | -
Peplow et al. [52] | AB comparison | - | - | -
Lindau and Weinzierl [16] | Other | 11 | 100 | 100
Southern and June [46] 1 | Other | 13 | - | -
Southern and June [46] 2 | Other | 16 | 10 | -

Table 3.6: Pros and cons, double-blind triple-stimulus

Advantage | Disadvantage
Can be used to assess sounds with very small impairments [29]. | It is recommended to have participants who have a level of expertise [29].
The test forces the subject to find and grade the hidden reference as imperceptible, which makes it possible to assess the expertise of the subject [29]. | By rating the odd sample, the subject takes more time per question than when the odd sample only has to be selected.

3.6 Sound Source Signals (Stimuli)

A major factor in this research is the sound source signal used in the listening experiments. This section will review the different sound source signals that are used in auralisation studies and the possible differences in the test results. The length of the used signals will be discussed as well. An overview of the stimuli used by different studies can be found in Table 3.10.


Table 3.7: Pros and cons, MUSHRA

Advantage | Disadvantage
More adequate in discriminating differences in quality than other test methods [34]. | Multiple samples will be tested at once; this is not ideal when testing measurement/auralisation pairs.
Tests can take less time to perform than when using the Recommendation ITU-R BS.1116 method [34]. | Results of listening tests obtained using MUSHRA may be biased due to stimulus spacing and range equalizing [54].
Able to display all the stimuli at once, leading to more consistent results and smaller confidence intervals [34]. | -

Table 3.8: Pros and cons, AB-Comparison

Advantage | Disadvantage
More general assessments usually involve larger differences and therefore do not need such close control of the test parameters [38]. | The method has no hidden reference; therefore it is more difficult to control the subject reliability.
The time taken to perform the test using the MUSHRA method can be significantly less than when using the Recommendation ITU-R BS.1116 method [34]. | -

Table 3.9: Pros and cons, 2/3AFC test

Advantage | Disadvantage
Lower numbers of judges required because of the high sensitivity to differences [55]. | The specification of the nature of the difference between the samples is required [56].
Higher statistical power when compared to similar tests such as the duo-trio, triangle and same/different tests [56]. | -

3.6.1 Comparing different source signals

Brinkmann et al. [17], in 2014, used two types of signals for their auralisation authenticity study: pulsed pink noise (0.75 s noise, 1 s silence) and an anechoic male speech recording (5 s). When the test subjects were asked to detect the difference between the real and the simulated signal, they found that the test subjects scored a significantly higher detection rate with the pulsed pink noise signal (87.5% to 100%) than with the speech signal (54% to 100%). It was stated that this result was in accordance with earlier studies, although no references were made. They mention that a possible reason for this difference is the broadband and more steady nature of the pulsed pink noise signal, which supports the detection of colouration.

The earlier mentioned research of Lindau et al. [40] used both pink noise and an excerpt of an acoustic guitar piece as stimuli for their listening tests. Results showed that the guitar stimulus was sensitive to simulation artefacts.

In another study, Lindau et al. [43] used a male speech, female speech, acoustic guitar, trumpet and drum sample as stimuli. Test subjects were presented with 80 recording/auralisation pairs and were asked to detect the auralisation. The research concludes that the speech and guitar samples were most suited to uncover potential artefacts of the simulation, compared to the drum and trumpet samples, which had a significantly lower detection rate.

In another study, Kosanke and Lindau [42] used only a drum sample for their earlier mentioned research on mixing time (see Section 3.2.1). A drum sample is also used in the study of Lokki and Järveläinen [39], who also used guitar, clarinet and female voice stimuli. Again, the drum stimuli provided auralisations that were less similar to the recordings than the other three. The paper states that "this was expected because drum sounds, being very wide-band transient signals, give no excuses with modelling errors". The frequency range of a group of samples from the EBU Sound Quality Assessment database was assessed with the spectrogram plots seen in Fig. 3.3.

Figure 3.3: Spectrogram of four audio samples

For their research on real and virtual loudspeakers (see Section 3.1.1), Hiekkanen et al. [32] used male speech and pink noise stimuli, as well as samples of commercial rock and jazz music. The jazz sample, especially, was chosen because of its wide spectrum and simultaneous sound sources in different positions. Since the research focussed on the comparison between real and artificial loudspeakers, it makes sense to use these music samples, because the loudspeakers will most likely be used to play music. Both the jazz and the noise sample have more power in the higher frequencies than the male human speech. This leads to a larger audible difference.


Table 3.10: Overview of used sound source signals

Research | (Pink) noise | Human speech | Guitar | Drums | Trumpet | Clarinet | Pop music | Jazz music
Brinkmann et al. [17] | x | x (male) | - | - | - | - | - | -
Lindau et al. [30] | x | x | - | - | - | - | x | -
Hiekkanen et al. [32] | x | x (male) | - | - | - | - | x | x
Lindau et al. [40] | x | - | x | - | - | - | - | -
Lindau et al. [43] | - | x (male/female) | x | x | x | - | - | -
Kosanke and Lindau [42] | - | - | - | x | - | - | - | -
Lokki and Järveläinen [39] | - | x (female) | x | x | - | x | - | -


3.6.2 Signal Length and Listener Memory

The length of the used sound source signal is discussed by Hiekkanen et al. [32] in their study on virtual loudspeakers. It is stated that human auditory memory is limited: humans cannot accurately remember sound source signals longer than a few seconds [32]. No specific preferred maximum length is provided in the paper.

ITU Recommendation 1284-1 [38] also addresses the length of the signal. It is recommended to use audio excerpts of no longer than 15 to 20s; for some tests, excerpts of only a few seconds may suffice. Table 3.11 shows the length of the used stimuli. It can be seen that most signals used do not exceed the 20s limit as stated by ITU Recommendation 1284-1.

Table 3.11: Duration of used stimuli

Research | Length (s)
Lindau and Weinzierl [16] | 6
Lindau et al. [40] | 5
Kosanke and Lindau [42] | 2.5 (+ reverb)
Lindau et al. [43] | 6
Malecki et al. [37] | 20
Lokki and Järveläinen [39] | 10 - 20
Kronland-Martinet [48] | 7
Breebaart and Schuijers [53] | 9 - 30
Pieren et al. [57] | 8

3.6.3 Discussion

When comparing the signals used for studies in which auralisations made with synthesized impulse responses are compared to auralisations made with measured impulse responses, there is a difference in results between signals with different frequency ranges. Broadband signals turned out to provide stimuli with more audible differences between the synthesized and measured auralisations. Other samples, such as male/female human speech and various musical instruments, make it harder to distinguish the auralisations from the measurements. The trumpet stimulus used in the research of Lindau et al. [43] had an even lower detection rate than the drum stimulus, which suggests that a trumpet sample is even less suitable for making auralisation artefacts audible than the drum stimulus. Due to the limited auditory short-term memory, test signals should not be longer than 15 to 20s [38].

3.7 Speech Intelligibility and Annoyance

Speech intelligibility and sound annoyance studies are two subjects for which auralisations are often used. The studies concerning these subjects and the use of auralisations will be discussed in this section.

3.7.1 STI

STI stands for Speech Transmission Index and can be considered a representation of the degree to which the distinguishing characteristics of speech sounds are preserved [58]. This parameter is considered a reliable measure for speech intelligibility [58]. The use of auralisation for subjective STI tests can have major advantages. Yang and Hodgson [59] state that auralisations make it possible to perform STI tests with human listeners; such tests give more realistic results than measurements or predictions of objective metrics.

Disadvantages of such tests are the required number of test subjects and the fact that they can only be performed in rooms that have already been built. Auralisations make it possible to perform such tests more easily, with a relatively high number of test subjects, during the design phase.

For their research, Yang and Hodgson [59] compared the results of the same STI tests for real and virtual classrooms. A total of 300 words from a Modified Rhyme Test list (MRT [60]) were recorded in an anechoic chamber. Next, a model of the classroom where the measurements were performed was built in CATT-Acoustics v8.0. This software was thereafter used to simulate the room's BRIR and to convolve it with the recorded MRT words and babble noise. As mentioned earlier on page 5, the paper only focussed on the difference between the results of the listening tests in the real and virtual classroom. The authenticity of the used auralisations was not assessed.
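The auralisation step described here, convolving anechoic source material with a simulated (or measured) BRIR, can be illustrated with the sketch below. The file names are placeholders and the convolution is done with SciPy for the example; the cited study performed this step within the CATT software itself.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

# Placeholder file names: a mono anechoic word and a two-channel BRIR.
fs, dry = wavfile.read("mrt_word.wav")              # anechoic recording
fs_ir, brir = wavfile.read("brir_left_right.wav")   # columns: left ear, right ear
assert fs == fs_ir, "source and BRIR must share one sample rate"

dry = dry.astype(float)
brir = brir.astype(float)

# Convolve the dry signal with each ear's impulse response.
left = fftconvolve(dry, brir[:, 0])
right = fftconvolve(dry, brir[:, 1])
binaural = np.stack([left, right], axis=1)

# Normalise to avoid clipping and write a 16-bit binaural stimulus.
binaural /= np.max(np.abs(binaural))
wavfile.write("mrt_word_auralised.wav", fs, (binaural * 32767).astype(np.int16))
```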

To compare objective and subjective speech intelligibility results between measurements and CATT/ODEON models, Hodgson et al. [61] used an MRT test.

For the 'real' situation, anechoic recordings of the used samples were played from a speech source to the test subjects in a classroom. Tests were performed with both a 'noise on' and a 'noise off' condition. The room was modelled in both CATT [62] and ODEON Acoustics [13], and the predicted BRIRs were used to convolve the same MRT samples. The same MRT test was subsequently performed with these convolved samples, which were presented to the test subjects through headphones. Although it is possible to measure a personalised HRTF for every test subject, the same generalized HRTF (based on a KEMAR dummy head) was used for the convolution. It was expected that this would not affect the speech intelligibility significantly. For the objective comparison, the RT, EDT, C80 and Lp were assessed.

The research concludes that the prediction of acoustical parameters and the modelling of rooms remains difficult. The results of the speech intelligibility tests confirm their prediction, which was that scores from auralised tests underestimate the speech intelligibility of rooms with low reverberation.
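For reference, the clarity index C80 used in this objective comparison relates the energy arriving within the first 80 ms after the direct sound to the energy arriving later. A minimal sketch of this calculation from a single broadband impulse response is shown below; octave-band filtering and a more careful onset detection are omitted for brevity.

```python
import numpy as np

def clarity_c80(ir, fs):
    """C80 in dB: early (0-80 ms) over late (>80 ms) energy of an impulse response."""
    ir = np.asarray(ir, dtype=float)
    start = int(np.argmax(np.abs(ir)))   # crude onset: position of the direct sound
    n80 = start + int(0.080 * fs)
    early = np.sum(ir[start:n80] ** 2)
    late = np.sum(ir[n80:] ** 2)
    return 10.0 * np.log10(early / late)
```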

3.7.2 Annoyance

Vorländer et al. [63] mentioned the potential of auralisation and Virtual Reality concepts for new investigations of annoyance and comfort in the architectural design process. However, many studies on annoyance that make use of auralisations focus on traffic and vehicle noise ([45] [47] [46] [52] [64]). The traffic noise annoyance studies in particular do not fall within the scope of this literature review, since this study focusses on indoor acoustics. The term annoyance in the context of indoor auralisation is mostly found on the answering scale of the double-blind triple-stimulus and MUSHRA tests ([29] [65]). For the research of Lokki and Pulkki [65], test subjects were asked to judge both the difference between the reference sample and sample A, and the difference between the reference sample and sample B. The answering scale ranged from "very annoying" (1.0) to "imperceptible" (5.0).

Thaden [66] performed a study on the auralisation of impact sound insulation. For their research, they performed listening tests to evaluate the annoyance of impact noise. Annoyance is a subjective measure, in contrast to the single-number quantity obtained with standardized sources such as tapping machines. Since recording impact noise in real buildings is very time consuming, it would be preferable to use auralisations for these tests. The goal of this research was thus to obtain an IR that describes the path between the impact noise source in one room and the receiver in an adjacent room. The paper concludes that an auralisation system has been developed and tested with the sound of a tapping machine. For further work, more diverse noise sources (such as jumping children and walking people) have to be recorded. Listening tests were not yet performed, but were planned for the future. A study that uses interior auralisation and that is closely related to traffic annoyance is that of Asakura et al. [67]. In their research, they propose an auralisation method to investigate the transmission of noise through sound insulating constructions.

3.8 Conclusion

There are four common listening test methods that are used for all types of auralisation studies: double-blind triple-stimulus, Multiple Stimulus with Hidden Reference and Anchors, and Three/Two Alternative Forced Choice. All but the 2AFC method offer the possibility to check the reliability of the test subjects with the use of references, which can increase the trustworthiness of the results [29].

The usability of the auralisation authenticity tests will depend on the level of difference between the auralisations and the measurements. For large differences, the direct-scaling MUSHRA test has the most potential [29]. However, the MUSHRA test compares a multitude of samples to one reference sample. This situation is not common when performing listening tests on auralisations, since most auralisation studies only compare measurement and auralisation samples with the same S/R combination. This limits the number of stimuli per question to 2 (or 3 when using a hidden reference). For smaller differences, the indirect listening tests 3/2AFC and AB comparison are more suitable [17]. The chosen method will thus depend on the level of similarity between the auralisations and recordings. The similarity can be determined from the listening test results of the test subjects. Another important aspect is the level of acoustical expertise of the test subjects. Without training or experience, it can be difficult for test subjects to judge several 'abstract' acoustical attributes such as the timbre and the spaciousness of the samples.

The preferred tests are an SDT-test and a double-blind triple-stimulus test. One of the drawbacks of using these tests may be the amount of time it takes to complete all the questionnaires. Fatigue of the test subjects may have a negative effect on the trustworthiness of the results. Bech and Zacharov [28] state that single test sessions of 20 minutes are a good length to prevent fatigue and boredom. The complete test can be longer, but in that case it is preferable to split the test into multiple sessions with a suitable break in between. Since the test duration is limited to 20 minutes, signals should be kept relatively short. This coincides with ITU Recommendation 1284-1 [38], which states that audio excerpts used for tests should be no longer than 15-20s.
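In an SDT-test of this kind, detection performance is commonly summarised by the sensitivity index d', computed from the hit rate (auralisations correctly labelled 'simulated') and the false-alarm rate (measurements incorrectly labelled 'simulated'). The sketch below shows this standard calculation; the log-linear correction used to keep rates away from exactly 0 or 1 is one common convention and is an assumption here.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Made-up example: 32 of 40 auralisations detected, 10 of 40 measurements falsely flagged.
print(d_prime(hits=32, misses=8, false_alarms=10, correct_rejections=30))
```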

When comparing the results of auralisation/measurement comparison studies that used different types of stimuli, it is clear that broadband signals such as drum and especially noise stimuli produce the most audible differences. It is questionable, though, whether a noise signal should be used for auralisation studies, since human speech or music is more likely to be encountered in a real situation. A drum signal may be a good compromise, as it is a naturally occurring sound with a broadband frequency range (see Fig. 3.3).


4 | Measurements

The measurements to record the BRIRs required for this study will be described in this section. These were performed inside a sports hall of the Student Sport Center Eindhoven (see Fig. 1.4). The measurement plan and the used equipment will be described in the first two sections. Afterwards, the results of the measurements will be discussed.

4.1 Measurement plan

As mentioned earlier, ODEON Acoustics was used to create a model of the sports hall (see Section 5 for more information). This model was used to create the measurement plan. Receiver positions were set out at intervals of one meter along three lines. Line L2 was drawn parallel to the wall and formed the 0° line. The other two lines were created by rotating L2 by 30° to the left and to the right (see Fig. 4.2). The source and receiver positions were placed at a minimum distance of 2 m from the walls. After interpolating the resulting STI values, it was chosen to set a measurement position at every Just Noticeable Difference (JND), which for STI is 0.03 [68]. Although it is unlikely that all the recorded BRIRs will be used for the tests, these measurements produce a wide selection of samples. The measurement grid can be seen in Fig. 4.2.
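A minimal sketch of this spacing procedure is given below: predicted STI values along one line are interpolated and a position is kept each time the STI has changed by one JND (0.03). The distances and STI values in the example are made up; in this study they follow from the ODEON model.

```python
import numpy as np

STI_JND = 0.03  # just noticeable difference for STI [68]

def positions_per_jnd(distance_m, sti, step=0.1):
    """Pick measurement positions so that consecutive ones differ by >= 1 JND in STI."""
    fine_d = np.arange(distance_m[0], distance_m[-1], step)
    fine_sti = np.interp(fine_d, distance_m, sti)   # linear interpolation along the line
    selected = [fine_d[0]]
    last_sti = fine_sti[0]
    for d, s in zip(fine_d[1:], fine_sti[1:]):
        if abs(s - last_sti) >= STI_JND:
            selected.append(d)
            last_sti = s
    return np.array(selected)

# Made-up example: STI predicted at 1 m intervals along line L2.
distance = np.array([2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
sti = np.array([0.72, 0.68, 0.64, 0.61, 0.58, 0.56, 0.54, 0.53, 0.52])
print(positions_per_jnd(distance, sti))
```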

4.2 Equipment

This section describes the microphone and the sound source that were used during the measurements.

HATS

The B&K Head And Torso Simulator (HATS) was used to measure the BRIRs. HATS is a model of a human head and torso (see Fig. 1.3). This model has two microphones (left and right) placed at the locations of the ear canals. The ears, head and torso of the model ensure that the recorded sound undergoes diffraction comparable to that caused by an average adult body [6]. The difference in arrival time between the left and right ear is also captured, because the two microphones are placed at both ear canals.
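As a side note, the interaural time difference captured by such a binaural measurement can be estimated directly from a measured BRIR pair. The sketch below uses the position of the strongest peak in each ear's impulse response as a crude onset estimate; this is an illustrative assumption, not part of the measurement procedure.

```python
import numpy as np

def interaural_time_difference(brir_left, brir_right, fs):
    """Crude ITD estimate (s): difference between the direct-sound peaks of both ears."""
    peak_left = int(np.argmax(np.abs(brir_left)))
    peak_right = int(np.argmax(np.abs(brir_right)))
    return (peak_right - peak_left) / fs   # positive: right ear receives the sound later
```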
