
Processing interaural differences in lateralization and binaural signal detection



Citation for published version (APA):

Le Goff, N. (2010). Processing interaural differences in lateralization and binaural signal detection. Technische Universiteit Eindhoven. https://doi.org/10.6100/IR657014

DOI:

10.6100/IR657014

Document status and date: Published: 01/01/2010

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Please check the document version of this publication:

• A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website.

• The final author version and the galley proof are versions of the publication after peer review.

• The final published version features the final layout of the paper including the volume, issue and page numbers.

Link to publication

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

• You may not further distribute the material or use it for any profit-making activity or commercial gain.

• You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement:

www.tue.nl/taverne

Take down policy

If you believe that this document breaches copyright, please contact us at openaccess@tue.nl, providing details, and we will investigate your claim.


Laboratories Eindhoven and was carried out under the auspices of the J.F. Schouten School for User-System Interaction Research.

An electronic copy of this thesis in PDF format is available from the website of the library of the Technische Universiteit Eindhoven (http://www.tue.nl/bib).

© 2009, Nicolas Le Goff, The Netherlands

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the permission of the author.

Cover design: Nicolas Le Goff

Printing: Universiteitsdrukkerij Technische Universiteit Eindhoven

A catalogue record is available from the Eindhoven University of Technology Library.

ISBN 978-90-386-2139-5


lateralization and binaural signal detection

THESIS

to obtain the degree of doctor at the Technische Universiteit Eindhoven, on the authority of the rector magnificus, prof.dr.ir. C.J. van Duijn, for a committee appointed by the Doctorate Board, to be defended in public on Tuesday 2 February 2010 at 16.00

by

Nicolas Le Goff


prof.dr. A.G. Kohlrausch

Copromotor:


1 Introduction

1.1 Experimental environment

1.2 Modeling background

1.3 Context

1.4 Thesis outline

2 An introduction to ITD detection with the model proposed by Breebaart et al. (2001a)

2.1 A short cookbook of the model

2.2 Internal representations of stimuli carrying ITDs

2.3 ITD detection: dependence on spectral parameters

2.A Appendix: Theoretical ITD detection

3 Temporal integration in ITD detection

3.1 Introduction

3.2 Optimal integration and dependence of thresholds on duration

3.3 Experiments I and II: ITD detection thresholds

3.4 Experiment III: Psychometric functions

3.5 Temporal integration in other experimental conditions

3.6 ITD detection predictions with the Breebaart et al. (2001a) model

3.7 Conclusion

3.A Appendix: Optimal processing and temporal integration

4 Effect of diotic fringes on ITD detection

4.1 Introduction

4.2 Experiment I: ITD detection with 100- and 200-ms fringes

4.3 Experiment II: ITD detection with 20-ms fringes

4.4 Effect of fringes on binaural detection

4.5 Modeling

masker correlation

5.1 Introduction

5.2 Experiment I: Influence of interaural attenuation on binaural detection

5.3 Experiment II: Effect of interaural attenuation and reduced masker correlation on spectral integration

5.4 Modeling

5.5 Conclusion

6 Conclusion

6.1 Summary of findings

6.2 Discussion and further work

Bibliography

Summary

Acknowledgments

1 Introduction

This thesis addresses several aspects of binaural hearing in humans. The ability to listen binaurally, in other words with two ears, is a fundamental aspect of the human hearing system that provides specific information about the environment. In this research work, we focus our experimental investigations on two phenomena: the perceived lateralization of a sound source and the unmasking of a sound source.

Lateralization is a perceptual phenomenon that occurs for sound sources reproduced by headphones. It refers to the fact that a sound source is perceived “on the side”, as opposed to “in the center” of the head. Sound sources can be lateralized in two different ways. The first possibility is to introduce a level difference between the left and right ear signals, which is called Interaural Level Difference (ILD). The second possibility is to introduce a time difference between the left and right ear signals, which is called Interaural Time Difference (ITD). In a three-dimensional environment, both interaural disparities are fundamental properties of sound waves reaching the head that are used by the binaural human auditory system to extract spatial attributes, such as the spatial position (localization) or the compactness of a sound source.

Unmasking is a perceptual phenomenon in which a sound source is perceived despite the presence of other, competing sources. In real-life environments, this happens continuously whenever multiple sources compete. A typical example is a cocktail party, where the voice of one speaker can be perceived despite a high level of background noise created by many other voices.

1.1 Experimental environment

Both aspects of binaural hearing studied in this research work are investigated by conducting psychoacoustical experiments. Such investigations aim at measuring the behavior of human listeners in response to sound stimuli under highly controlled conditions. Listeners have to perform tasks using sound stimuli that are very different from sounds heard in a natural environment. The purpose of such a procedure is twofold. First, the use of artificially created stimuli gives the experimenter full control over the stimuli. In particular, one can generate stimuli that vary in only one specific attribute and thus study the influence of that attribute on perception independently of other parameters. Such stimuli are reproduced over headphones in order to control the signals reaching each ear. Second, the use of well-controlled tasks allows the experimenter to simplify the listener’s task to an extreme degree. By doing so, tasks can be designed such that the perceptual behavior of listeners is studied independently of any a priori knowledge, cognitive ability or emotional parameters.

In practice, both lateralization and unmasking are studied by conducting experiments in which listeners hear three sound excerpts and have to identify which one (the target) differs from the other two (the references). The experimenter creates the target by changing or introducing the studied binaural property in the stimuli, i.e., the ITD for lateralization tasks or a binaural signal for unmasking tasks. For both tasks, the references consist of diotic signals, meaning that they do not carry binaural information. We will refer to the two tasks as ITD detection and signal detection. The goal of these experiments is to determine the smallest value of the tested parameter for which reference and target stimuli can be distinguished. These values are called perceptual thresholds.

1.2 Modeling background

Models are often part of scientific investigations. On the one hand, one would like to verify whether a newly observed experimental behavior can be accounted for by current models. On the other hand, models and the properties of their internal mechanisms can be tested by experimental investigations. Ultimately, models should incorporate all fundamental properties of the studied system in order to predict, ideally, its behavior in all possible situations. Models can also be used as engineering solutions outside the laboratory. Sound perception models, and, closer to our interest, binaural perception models, aim at predicting hearing thresholds from an analysis of the waveforms of the binaural stimuli used in the experiments.

While it is not the intention of the work reported in this thesis to develop a psychoacoustic model, it seems essential to compare the experimental results with existing models. We will mainly use the binaural model proposed by Breebaart et al. (2001a), which is a quantitative model based on numerical signal-processing computations. Its purpose is to reproduce the psychoacoustic behavior of human listeners. It was designed to describe the binaural listening capabilities of an average normal-hearing listener. The model was conceived without a priori restrictions on the type of conditions it can predict or on the type of stimuli that can be used as inputs. The model was, however, designed in a context where binaural detection conditions were the main focus, and it has performed well in such conditions. The model has also been shown to have a realistic sensitivity to interaural differences (Breebaart et al., 2001b), and some of its features have been used to predict sound localization in other investigations (Park et al., 2008).

In the context of this thesis, the model will be used for a purpose that has not yet been investigated in depth, i.e., ITD detection. As will be reported in Chapters 3 and 4, several approaches have been proposed in the literature to predict ITD detection in different conditions. These models were, however, specifically formulated for lateralization conditions and used the ITD values as inputs instead of the actual stimuli.

1.3 Context

We will focus our study on lateralization induced by means of ITDs and, more particularly, investigate the change in performance as a function of the duration of the stimuli carrying ITDs. Previous studies (e.g., Tobias and Zerlin, 1959) have shown that listeners are capable of accumulating information over time, such that their perceptual thresholds decrease with increasing stimulus duration. Some studies (e.g., Houtgast and Plomp, 1968) have compared this change in thresholds with theoretical expectations. The expectations were derived assuming that the information carried by the stimuli is optimally processed and that each time interval of the stimulus has equal importance. It was observed that experimental thresholds decreased less with duration than predicted by such theoretical considerations. Two main hypotheses were suggested to explain this discrepancy. As a first possibility, it was considered that the time intervals near the onset had more importance than the ongoing time intervals. Such a perceptual emphasis of onsets has also been suggested in the context of the precedence effect, in which early-arriving stimulus intervals contribute more strongly to localization than later-arriving ones (Litovsky et al., 1999). A second possibility was that the integration of the information was sub-optimal (Hafter et al., 1979). We remark that previous theoretical expectations were derived under implicit assumptions regarding the nature of the effective cue used by the listeners.


It therefore seems necessary to verify experimentally the nature of the cue used by the listeners to perform ITD detection tasks. Based on this finding, one can then perform the theoretical investigation under the assumption of optimal integration and equal contribution of all stimulus time intervals. Such an investigation would make it possible to understand the perceptual mechanisms underlying the temporal integration of stimuli carrying ITDs and to tackle the issue of onset emphasis.
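The role of the assumed cue can be made explicit with a short sketch. It relies on the standard result that optimally combined independent observations add in d' as the square root of their number, plus an illustrative assumption of our own: that d' for a single observation grows as the p-th power of the ITD. Holding d' at the threshold criterion then predicts that the threshold ITD falls with duration to the power -1/(2p). The function name and the reference values (40 µs at 10 ms) are hypothetical.

```python
import math

def optimal_threshold(duration_ms, ref_duration_ms, ref_threshold_us, cue_exponent):
    """Predicted ITD threshold at `duration_ms` under optimal integration.

    Illustrative assumptions (not taken from the thesis):
      * one independent observation yields d'_1 = k * ITD ** cue_exponent,
      * the number of independent observations grows in proportion to duration,
      * observations are combined optimally: d'_N = sqrt(N) * d'_1.
    Holding d' fixed at the threshold criterion then gives
      ITD_th(T) = ITD_th(T0) * (T / T0) ** (-1 / (2 * cue_exponent)).
    """
    ratio = duration_ms / ref_duration_ms
    return ref_threshold_us * ratio ** (-1.0 / (2.0 * cue_exponent))

# Hypothetical reference: a 40-us threshold at 10 ms; predictions at 40 ms.
for p in (1, 2):
    print(p, round(optimal_threshold(40.0, 10.0, 40.0, p), 2))  # prints 1 20.0 then 2 28.28
```

Under these assumptions, a cue that is linear in the ITD (p = 1) predicts that thresholds halve for every fourfold increase in duration, whereas a quadratic cue (p = 2) predicts a much shallower decrease. The same observed slope can therefore be read as optimal or sub-optimal integration depending on the cue assumed.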

A second aspect of this thesis is the influence of diotic noise fringes (noise bursts that do not carry any binaural cue) on ITD detection. A few studies on this topic can be found in the literature (e.g., Akeroyd and Bernstein, 2001), but only with stimuli shorter than 100 ms. Stimuli and fringes in these studies were relatively short because they were mainly used to test the dominance of onset information, as also observed in the precedence effect (Houtgast and Aoki, 1984). These studies reported that fringes generally impaired the listeners’ performance and that a forward fringe led to a stronger impairment than a backward fringe. There are, however, no experimental data indicating whether the findings of these studies can be extended to experimental conditions with longer stimulus durations. So far, it has been implicitly assumed, without direct experimental support, that the onset dominance observed for short stimuli is a general property of the human hearing system. This investigation gives us the possibility to test this assumption.

Binaural unmasking has been extensively studied in the literature (e.g., Jeffress et al., 1956; McFadden, 1968; van de Par and Kohlrausch, 1999). A particular class of experiments that has been influential for the theoretical understanding of binaural unmasking has analyzed the influence of the interaural correlation of the noise masker on the detection thresholds (Robinson and Jeffress, 1963; Breebaart and Kohlrausch, 2001). An interesting hypothesis related to the influence of masker correlation has been proposed in the literature to explain the effect of an overall interaural level difference (interaural attenuation, IA) of 10-50 dB on binaural unmasking (Breebaart et al., 2001b). It was observed that the size of the binaural masking level difference (BMLD) was gradually reduced as the IA was increased. The proposed explanation for the degrading effect of such IA gives a strong role to peripheral nonlinearities in the hearing system, which in effect lead to a decorrelation of the internal representations of the masker for large IA values. This explanation, which helped to model the experimental data published on this topic so far (McFadden, 1968; Weston and Miller, 1965), has unfortunately never been challenged by testing any of its non-intuitive predictions. We therefore address the question of how the effect of IA compares to a direct reduction in the interaural correlation of the masker. This study is particularly relevant to test whether the insights suggested by a modeling approach can be experimentally verified.
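The contrast between the two manipulations can be illustrated with a minimal sketch using NumPy. A masker with interaural correlation ρ is generated by the common mixing method right = ρ·n1 + sqrt(1 - ρ²)·n2, while an IA is modeled here as a plain linear gain on one ear. Because a linear gain leaves the normalized correlation of the waveforms unchanged, any decorrelation caused by IA must arise from nonlinear processing such as that assumed in the model periphery. The function names, the noise length and the 30-dB value are illustrative choices, not the thesis stimuli.

```python
import numpy as np

def correlated_noise_pair(n_samples, rho, rng):
    """Left/right Gaussian noises whose interaural correlation is rho.

    Mixes two independent noises: right = rho * n1 + sqrt(1 - rho**2) * n2.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    n1 = rng.standard_normal(n_samples)
    n2 = rng.standard_normal(n_samples)
    return n1, rho * n1 + np.sqrt(1.0 - rho**2) * n2

def apply_interaural_attenuation(left, right, ia_db):
    """Attenuate the right-ear signal by ia_db decibels (overall ILD)."""
    return left, right * 10.0 ** (-ia_db / 20.0)

rng = np.random.default_rng(0)
left, right = correlated_noise_pair(100_000, 0.95, rng)
att_left, att_right = apply_interaural_attenuation(left, right, 30.0)
# Both correlations are close to 0.95 and equal up to floating-point rounding.
print(np.corrcoef(left, right)[0, 1], np.corrcoef(att_left, att_right)[0, 1])
```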

1.4 Thesis outline

The next chapter of the thesis presents the background information necessary for understanding and using the model by Breebaart et al. (2001a) to simulate ITD detection conditions. After a short recapitulation of the design of the model, we will describe how stimuli carrying ITDs are internally represented and how this information can be used to predict thresholds. An experiment in which ITD detection thresholds are measured as a function of the stimulus frequency will be reported. It will be shown that listeners are most sensitive to ITDs for sinusoidal stimuli in a range between 500 and 900 Hz. For broadband stimuli, detection is dominated by the most sensitive critical band. This experiment will also be used as a basis to discuss the ability of the model to predict these ITD detection conditions. In particular, we will demonstrate how the predicted thresholds depend on the position considered on the binaural display of the model.

While the perception of ITDs has been studied previously, we want to contribute to the topic by investigating how listeners integrate ITDs carried by stimuli of variable duration, a topic that is sparsely addressed in the literature. We will provide a theoretical basis for the influence of duration, assuming an optimal integration of information over time. The experimental investigations will rely on the measurement of (a) psychometric functions, used to determine the effective cue used by listeners, and (b) ITD detection thresholds as a function of the duration of the stimuli. The study will show that the effective cue is the linear power of the ITD at all durations. Furthermore, our results suggest that while temporal integration is optimal for stimulus durations shorter than 20 ms, it is clearly sub-optimal for longer durations. This study is reported in Chapter 3.

In a follow-up experiment, we investigate to what degree and how the perception of ITDs is impaired by the presence of non-informative noise fringes placed before, after or around the stimuli. Such investigations have been reported previously in the literature, but exclusively for stimuli with durations shorter than 100 ms. Our experimental results will show that for long stimulus durations, a backward fringe has a stronger effect than a forward fringe, a result that is opposite to literature data measured with shorter stimuli. Several modeling approaches from the literature will be tested, and none of them is able to predict these experimental results. The study is reported in Chapter 4.

In the last experimental chapter, the influence of interaural differences on binaural unmasking is addressed. The experiments study and compare the effects of interaural attenuation and reduction in masker correlation on binaural detection thresholds. The study will show a partial match between the effects of the two parameters. Results will also show unusual spectral integration patterns for these conditions. It will be seen that in these conditions, detection thresholds decrease for supracritical bandwidths. While model simulations will bring insights to explain some of the differences between the effects of the two parameters, the decrease of thresholds for supracritical bandwidths will remain unexplained. The study is reported in Chapter 5.

In the final Chapter 6, a short summary of the findings of this thesis is given. In addition some of the unexplained phenomena are discussed and proposals for follow-up experiments are given that will hopefully help to further our understanding of binaural hearing.

2 An introduction to ITD detection with the model proposed by Breebaart et al. (2001a)

The investigations reported in this thesis are strongly connected to the binaural model proposed by Breebaart et al. (2001a). This model was developed and tested with a main focus on binaural unmasking. While the current research work also partially deals with masking conditions, the main area of interest is the detectability of signals carrying ITDs. The ability of the model to handle these situations was briefly mentioned in Breebaart et al. (2001b). The authors showed predictions of ITD thresholds for tones as a function of frequency, which were in good agreement with literature data. In the following chapters, the ability of the model to predict ITD thresholds as a function of various external parameters will be investigated in more detail. This chapter presents a number of relevant aspects of the model related to the simulation of ITD detection∗.

2.1 A short cookbook of the model

In this section, the reader can find a general presentation of the model as well as a description of its fundamental aspects. The presentation is not exhaustive and the reader is referred to Breebaart et al. (2001a) for a full description of the model.

The model is part of a larger software framework that is used to conduct experiments with human listeners. Conducting a simulation with the model or an experiment with human listeners is therefore very similar from the perspective of the experimenter. In both cases, the first step is to set up an experiment by defining the stimuli and the measurement procedure. By default, experiments are conducted using a two-down one-up adaptive procedure to vary an external parameter for which the experimenter wants to determine the perceptual threshold. In binaural unmasking experiments, this parameter would be the signal level; in the various ITD detection experiments reported in this thesis, it is the ITD of the signal. Other procedures, like non-adaptive measurements, are possible, but will not be explored in this chapter. In combination with the adaptive procedure, stimuli are presented according to a three-interval forced-choice paradigm. An experiment or a simulation therefore consists of presenting sets of stimuli in trials, where each trial consists of one target interval and two reference intervals. A response by the listener or the computer is given after each trial. Initially, the external parameter has a high value such that the target interval is easily distinguished from the reference intervals. Based on the response, the external parameter is adaptively adjusted. This successive presentation of stimuli and adjustment of the external parameter is repeated until the value of the external parameter converges towards the threshold value corresponding to a certain percentage of correct answers, which depends on the adaptation rule. As standard, a two-down one-up rule was used: after two consecutive correct answers, the value of the external parameter was reduced; after each incorrect response, it was increased. With such a procedure, the experimental or simulated thresholds correspond to the point of 70.7% correct responses on the psychometric function (Levitt, 1971).

∗The wording ITD detection is used throughout this thesis to describe an experimental condition where listeners have to distinguish a stimulus carrying an ITD from a diotic stimulus. Such a situation could alternatively be called ITD discrimination if one considers that a diotic stimulus carries an ITD that has the remarkable value of zero.
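The adaptive procedure can be sketched as a small simulation. The simulated listener below follows a hypothetical Weibull psychometric function with a chance level of 1/3 (three intervals); its parameters, the start value, the multiplicative step factor and the rule of averaging the last 8 of 16 reversals are illustrative assumptions, not the settings used in the thesis. The point is only that a two-down one-up track hovers around the ITD at which P(correct) = sqrt(0.5), i.e., about 70.7%.

```python
import math
import random

def p_correct(itd_us, alpha_us=20.0, beta=2.0):
    """Hypothetical Weibull psychometric function for a 3-interval task.

    Chance performance is 1/3; alpha_us and beta are illustrative values.
    """
    return 1.0 / 3.0 + (2.0 / 3.0) * (1.0 - math.exp(-((itd_us / alpha_us) ** beta)))

def two_down_one_up(start_us=200.0, step_factor=1.26, n_reversals=16,
                    n_average=8, seed=0):
    """Run one simulated adaptive track and return its threshold estimate."""
    rng = random.Random(seed)
    value, run_of_correct, last_direction = start_us, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if rng.random() < p_correct(value):
            run_of_correct += 1
            if run_of_correct < 2:
                continue                        # one correct answer: keep the value
            run_of_correct, direction = 0, -1   # two correct in a row: step down
        else:
            run_of_correct, direction = 0, +1   # one incorrect answer: step up
        if last_direction and direction != last_direction:
            reversals.append(value)             # the track changed direction
        last_direction = direction
        value = value / step_factor if direction < 0 else value * step_factor
    # Steps are multiplicative, so average the last reversals geometrically.
    tail = reversals[-n_average:]
    return math.exp(sum(math.log(v) for v in tail) / len(tail))

# ITD at which this psychometric function reaches P(correct) = sqrt(0.5).
target = 20.0 * (-math.log(1.0 - (math.sqrt(0.5) - 1.0 / 3.0) * 1.5)) ** 0.5
estimates = [two_down_one_up(seed=s) for s in range(200)]
print(round(target, 1), round(sum(estimates) / len(estimates), 1))
```

With fixed step sizes the mean estimate can deviate somewhat from the nominal 70.7% point, so averaging over many simulated tracks is used here only to show that the track settles in its neighborhood.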

The structure of the model can be decomposed into three parts, as shown in Fig. 2.1. A set of three stimuli enters the model via the peripheral processor, and the model outputs a decision variable corresponding to the identifier (number) of the interval that was chosen as the target interval.

2.1.1 Peripheral processor

The peripheral processor simulates the peripheral part of the human auditory system. This processor is similar to the implementation proposed earlier by Dau et al. (1996a). It consists of several stages that are identical for the left and right pathways. The results presented in this thesis were obtained with a middle-ear filter whose frequency response is based on data from Puria et al. (1997). The basilar membrane is modeled by a third-order gammatone filterbank with bandwidths corresponding to the equivalent rectangular bandwidth (ERB) (Glasberg and Moore, 1990). The filters are spaced one ERB apart. The effective processing of the inner hair cells is modeled by a half-wave rectifier followed by a fifth-order low-pass filter with a -3 dB cutoff frequency of 770 Hz. The low-pass filtering gradually removes fine structure for frequencies above about 700 Hz, and above 2 kHz nearly all phase information in the fine structure is gone.
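The spectral layout of the filterbank and the inner-hair-cell low-pass can be sketched with a few formulas. The ERB and ERB-number expressions below are those of Glasberg and Moore (1990); filters spaced one ERB apart are obtained by stepping one unit on the ERB-number scale. For the fifth-order low-pass, we assume for illustration a cascade of five identical first-order sections with a 2-kHz cutoff each, which places the overall -3 dB point near 770 Hz; whether the model is implemented exactly this way is an assumption of this sketch. The gammatone filters themselves are not implemented here.

```python
import math

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_number(f_hz):
    """Number of ERBs below f_hz (ERB-rate scale)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_number_to_hz(e):
    """Inverse of erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def center_frequencies(f_low_hz, f_high_hz):
    """Filter center frequencies spaced one ERB apart, starting at f_low_hz."""
    e, e_max, freqs = erb_number(f_low_hz), erb_number(f_high_hz), []
    while e <= e_max:
        freqs.append(erb_number_to_hz(e))
        e += 1.0
    return freqs

def half_wave_rectify(x):
    """First part of the inner-hair-cell stage: keep positive samples only."""
    return [max(0.0, v) for v in x]

def cascade_lowpass_gain(f_hz, stage_cutoff_hz=2000.0, n_stages=5):
    """|H(f)| of n identical first-order low-pass sections in series (assumed)."""
    return (1.0 + (f_hz / stage_cutoff_hz) ** 2) ** (-n_stages / 2.0)

fcs = center_frequencies(100.0, 8000.0)
print(len(fcs), round(erb_hz(1000.0), 1), round(cascade_lowpass_gain(770.0), 3))
```

The last value printed is close to 1/sqrt(2), i.e., the -3 dB point of the assumed cascade lies near 770 Hz.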


Figure 2.1: Successive stages of the model designed by Breebaart et al. (2001a). The model can be decomposed in three parts, a peripheral processor, a binaural processor and a central processor.

The last stage of the peripheral processor is an adaptation stage. It comprises a chain of five adaptation loops with various time constants (Dau et al., 1996a,b). As an illustration to help understand the nonlinear behavior of this stage, its response to a rectangular pulse of 1-s duration with a level of 70 dB is shown in Fig. 2.2.

The response of the stage can be decomposed into three parts. The first part of the response is characterized by a very large overshoot, reaching about 47000 model units (MU) in this specific case. Model units are arbitrary units used to quantify the magnitude of internal representations. This transient phase arises because variations that are rapid compared with the time constants of the individual loops (5, 50, 129, 253 and 500 ms) are transmitted linearly. The transient phase is followed by a steady state. In steady state, the adaptation loops compress the signal according to a nearly logarithmic input/output characteristic. As seen in Fig. 2.2, during the steady-state phase, between 600 ms and 1000 ms, the output of the stage matches the input signal when the latter is plotted in dB SPL. The last part of the response takes place when the input signal is switched off. A moderate undershoot, compared with the onset overshoot, is observed as a manifestation of the discharge of the adaptation loops.

Figure 2.2: Pulse response of the adaptation stage for a rectangular pulse that had a level of 70 dB and a duration of 1 s. The solid line represents the input rectangular pulse plotted in dB SPL. The dashed line represents the output of the adaptation stage in model units (mu). The maximum level of the response (not shown) is about 47000 mu.

The stage has been designed to simulate monaural and binaural masking conditions in which the aim is to detect a signal within a broadband masker. For these conditions, the logarithmic compression in steady state yields detection thresholds at an approximately constant signal-to-masker ratio, provided that the masker level is well above the absolute threshold (Hall and Harvey, 1984). The discharge of the adaptation loops has been designed to approximate the forward-masking curve (Zwicker, 1984), which makes it possible to simulate the nonlinear dependence of forward-masked thresholds on masker level and masker duration.

The overshoot at the onset emphasizes information located near the onset of the signals relative to the ongoing part. Such an overshoot can also occur at instants other than the signal onset. In effect, a signal is transmitted linearly through the adaptation stage whenever a variation in the signal is sufficiently fast compared with the time constants of the adaptation loops. The peak amplitude of such an overshoot depends on the speed of the variation and on the current state of charge of the adaptation loops. Because of this property, the adaptation stage tends to emphasize fast changes in signals and can be considered a change detector.
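The qualitative behavior described above (onset overshoot, nearly logarithmic compression in steady state) can be reproduced with a chain of divisive feedback loops of the kind used by Dau et al. (1996a), in which each loop divides its input by a low-pass filtered copy of its own output. The sketch below is a simplified version under our own assumptions: one-pole low-pass filters updated sample by sample, a fixed input floor of 1e-5, and no scaling to model units or overshoot limitation. Its numbers are therefore not those of Fig. 2.2; only the loop time constants are taken from the text.

```python
import math

TAUS_S = (0.005, 0.050, 0.129, 0.253, 0.500)  # loop time constants from the text

def adaptation_chain(x, fs, taus=TAUS_S, floor=1e-5):
    """Chain of divisive adaptation loops (simplified sketch).

    Each loop divides its input by a low-pass filtered copy of its own output,
    so in steady state a loop maps an input v to sqrt(v); five loops give v**(1/32).
    """
    # Start every loop in the steady state reached for an input at `floor`.
    states, level = [], floor
    for _ in taus:
        level = math.sqrt(level)
        states.append(level)
    # One-pole low-pass coefficient for each time constant.
    coeffs = [1.0 - math.exp(-1.0 / (fs * tau)) for tau in taus]
    out = []
    for sample in x:
        y = max(sample, floor)  # inputs below the floor are clipped to it
        for i, c in enumerate(coeffs):
            y = y / states[i]
            states[i] += c * (y - states[i])
        out.append(y)
    return out

fs = 2000
response = adaptation_chain([1.0] * (10 * fs), fs)  # 10-s step from the floor
print(response[0], response[-1])
```

The first output sample lies orders of magnitude above the final, steady-state value of about 1. For a constant input v, the steady-state output v**(1/32) = exp(ln(v)/32) is approximately 1 + ln(v)/32 as long as |ln v| is small compared with 32, which is why the steady-state characteristic is nearly logarithmic.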

The implementation of the model incorporates an extra step (not shown) in which Gaussian noise is added to each signal originating from the filterbank in order to limit the accuracy of the model and to simulate an absolute threshold. The amount of noise is chosen such that the absolute threshold is about 5 dB SPL between 1 and 4 kHz.

2.1.2 Binaural processor

The binaural processor receives the right and left ear signals after they have been processed in the peripheral monaural pathways. Its purpose is to combine the left and right inputs in order to extract binaural information. It consists of an array of excitation-inhibition (EI) cells, each uniquely defined by a specific internal attenuation-delay pair (α, τ). The output of the array of EI cells is also referred to as the binaural display. The number of EI cells in the binaural display is not fixed a priori. The structure of a single cell is represented in Fig. 2.3.

Figure 2.3: A scheme of an excitation-inhibition (EI) cell with a characteristic interaural delay δτ and interaural attenuation δα.

As shown, an EI cell takes the right and left input signals and applies to them the specific internal delay (τ) and attenuation (α) of the cell. The difference between the two signals is then computed and squared to obtain the energy of the difference between the left and right signals. The result is then temporally smoothed with a double-sided exponential window with a time constant of 30 ms on each side (equivalent to a 60-ms window length). This smoothing defines the finite binaural temporal resolution of the model and is motivated by experiments revealing sluggishness in the processing of binaural signals (e.g., Kollmeier and Gilkey, 1990).

The signal is then compressed according to the equation:

output = a ∗ log(b ∗ input + 1), (2.1)

where a and b are two scalars used to calibrate the model, equal to 0.1 and 0.00002, respectively, in the current implementation, and log denotes the natural logarithm. This transformation combines two different behaviors. For low output values of the EI cell, i.e., for a high correlation between left and right ear signals, Eq. 2.1 can be approximated, using the Taylor series of log(1 + x) at x = 0, by the relation:

output = a ∗ b ∗ input. (2.2)

Similarly, for high output levels of the EI cell, corresponding to a low interaural correlation, Eq. 2.1 can be approximated by:

output = a ∗ log(b ∗ input), (2.3)

as the factor b ∗ input becomes much larger than unity. This relation corresponds to a constant Weber fraction in the discrimination between different values of the interaural correlation.
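The processing inside a single EI cell can be summarized in a short NumPy sketch: apply the characteristic attenuation and delay, subtract, square, smooth with a double-sided exponential window (30-ms time constant per side), and compress with Eq. 2.1 using a = 0.1 and b = 0.00002. Splitting the delay and attenuation symmetrically over the two ears, rounding delays to whole samples and using a circular shift (np.roll) are simplifications of this sketch, not details taken from the model description; the delay-dependent weighting and internal noise are omitted.

```python
import numpy as np

A, B = 0.1, 0.00002  # calibration scalars a and b of Eq. 2.1

def ei_cell(left, right, fs, alpha_db=0.0, tau_s=0.0, smooth_tau_s=0.030):
    """Output of one EI cell for one auditory channel (simplified sketch).

    Assumptions of this sketch: the characteristic delay and attenuation are
    split symmetrically over the two ears, delays are rounded to whole samples,
    and np.roll shifts circularly (acceptable for long test signals).
    """
    shift = int(round(tau_s * fs / 2.0))
    l = np.roll(left, shift) * 10.0 ** (alpha_db / 40.0)
    r = np.roll(right, -shift) * 10.0 ** (-alpha_db / 40.0)
    diff_power = (l - r) ** 2  # squared left-right difference
    # Double-sided exponential window, time constant smooth_tau_s on each side.
    half = int(5 * smooth_tau_s * fs)
    t = np.arange(-half, half + 1) / fs
    window = np.exp(-np.abs(t) / smooth_tau_s)
    window /= window.sum()
    if len(diff_power) <= len(window):
        raise ValueError("signal must be longer than the smoothing window")
    smoothed = np.convolve(diff_power, window, mode="same")
    return A * np.log(B * smoothed + 1.0)  # Eq. 2.1 with the natural logarithm

fs = 16000
rng = np.random.default_rng(1)
left = rng.standard_normal(fs)     # 1 s of noise
right = np.roll(left, 4)           # right ear lags by 4 samples (250 us)
matched = ei_cell(left, right, fs, tau_s=4 / fs)
unmatched = ei_cell(left, right, fs)
print(matched.max(), unmatched.mean())
```

A cell whose internal delay compensates the stimulus ITD cancels the two inputs and outputs zero, whereas the cell with zero internal delay leaves a positive residue. The activity pattern across cells with different (α, τ) therefore carries the interaural information.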

At the output of the EI cell, the signal is weighted to account for results from Batra et al. (1997), who reported that cells with larger characteristic delays are less frequent than cells with smaller characteristic delays. This result is in line with Jeffress (1948), who stated for coincidence-counter neurons that “cells are less dense away from the median plane”. In the model, this is simulated by increasingly attenuating the output of cells that have larger internal delays.

Internal noise is added at the output of the binaural processor to limit its accuracy. The rms level of this Gaussian noise source is assumed to be constant and equal to 1 MU. This internal noise is independent of time and auditory channel, and is identical for each EI cell. By adjusting the scalars a and b, the output of the EI-type elements is scaled relative to the internal noise, and hence the sensitivity to binaural differences can be adjusted. In practice, to obtain faster convergence, all internal representations and templates in the central processor are computed noise-free. The internal noise is effectively added at a later stage in the form of decision noise (see next section).

2.1.3 Central processor

The task of the central processor is to make decisions. In the context of a forced-choice paradigm, the central processor has to decide, on each presentation of a set of intervals (trial), which interval is the target. The decision is based on an analysis of the internal representations of each interval of the forced-choice procedure. The decision mechanism comprises a template matching procedure and was designed as an optimal detector. Template matching had already been used before the design of this model, for instance in the model proposed by Dau et al. (1996a), though in a slightly different implementation. In particular, in the model proposed by Breebaart et al. (2001a), the term “template” is used for both the average internal representations of the reference and of the target intervals of the forced-choice procedure. These two templates are updated throughout the entire adaptive tracking of a threshold.

The template-matching mechanism works as follows. The central processor constructs (stores in memory) two templates based on the internal representations of the reference and the target intervals. A reference interval can always be distinguished from the target interval based on the feedback (correct or incorrect) on the decision, provided by the software used to conduct the experiment. Such feedback is also provided to the human listeners in the experiments reported in this thesis. As the adaptive procedure progresses, the central processor updates the templates by averaging after each presentation of a set of reference and target intervals. For each decision, the target interval is identified by measuring a distance (difference) between each of the three presented intervals and the template of the reference interval. The computation of the distance sums the information in the internal representation over the temporal and the spectral (auditory channel) domains. The distance is weighted by a function (the weight) that emphasizes the parts of the internal representation that are reliable indicators of physical differences between reference and target intervals. The weight is defined as the difference between the templates of the target and reference intervals divided by the variance of the inputs of the detector. This weighting enhances the detection process under certain assumptions, notably the independence of internal noise in time and frequency (see Appendix 3.A). Ultimately, the chosen interval is the one with the largest computed distance to the reference template. The detection performance is limited by introducing a certain amount of decision noise during the computation of the distances. This decision noise is a technical shortcut, computed under the assumption that the internal noise had been added to the internal representations in the binaural processor (see p. 98 in Breebaart, 2001).
When the computed distances become comparable in magnitude to the added decision noise, the detector starts to make mistakes, which limits its performance. The detector is qualified as “optimal” because the weight function is inversely proportional to the variance of the inputs of the detector, assuming that the internal noise is independent across observations. See Appendix 3.A for a demonstration of this property in the context of multiple observations in the time domain.
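The decision stage described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of Breebaart et al. (2001a): templates are running averages, the variance of the detector inputs is estimated from the reference intervals plus an assumed internal-noise variance, and the decision noise is that internal noise propagated through the weighted sum. The class, function and parameter names are hypothetical.

```python
import numpy as np

class TemplateMatcher:
    """Hypothetical template-matching detector.

    Internal representations are arrays of identical shape (e.g. time x
    auditory channel) computed noise free; an internal noise of variance
    `internal_var` per element, assumed independent across elements, is
    accounted for as decision noise.
    """

    def __init__(self, shape, internal_var=1.0, rng=None):
        self.ref_sum = np.zeros(shape)
        self.ref_sq_sum = np.zeros(shape)
        self.tgt_sum = np.zeros(shape)
        self.n_ref = 0
        self.n_tgt = 0
        self.internal_var = internal_var
        self.rng = np.random.default_rng() if rng is None else rng

    def update(self, references, target):
        """Running-average update of both templates after feedback."""
        for r in references:
            self.ref_sum += r
            self.ref_sq_sum += r ** 2
            self.n_ref += 1
        self.tgt_sum += target
        self.n_tgt += 1

    def choose(self, intervals):
        """Return the index of the interval judged to contain the target."""
        if self.n_tgt == 0:  # no templates yet: guess
            return int(self.rng.integers(len(intervals)))
        ref = self.ref_sum / self.n_ref
        tgt = self.tgt_sum / self.n_tgt
        # variance of the detector inputs: stimulus variability of the
        # reference intervals plus the internal-noise variance
        var = self.ref_sq_sum / self.n_ref - ref ** 2 + self.internal_var
        weight = (tgt - ref) / var
        distances = np.array([np.sum(weight * (x - ref)) for x in intervals])
        # decision noise: independent internal noise seen through the weights
        noise_sd = np.sqrt(self.internal_var * np.sum(weight ** 2))
        distances = distances + self.rng.normal(0.0, noise_sd, len(distances))
        return int(np.argmax(distances))

def percent_correct(signal, n_trials=300, stim_sd=0.2, seed=1):
    """Three-interval forced-choice run with trial-by-trial feedback."""
    rng = np.random.default_rng(seed)
    detector = TemplateMatcher(signal.shape, internal_var=1.0, rng=rng)
    n_correct = 0
    for _ in range(n_trials):
        intervals = [rng.normal(0.0, stim_sd, signal.shape) for _ in range(3)]
        target_idx = int(rng.integers(3))
        intervals[target_idx] = intervals[target_idx] + signal
        n_correct += detector.choose(intervals) == target_idx
        references = [x for i, x in enumerate(intervals) if i != target_idx]
        detector.update(references, intervals[target_idx])
    return n_correct / n_trials

signal = np.zeros((20, 3))
signal[5:15, 1] = 1.5  # hypothetical target-specific change in one channel
print(percent_correct(signal), percent_correct(np.zeros((20, 3))))
```

With a clear target-specific change, this detector approaches perfect performance after a few trials; without any change it stays near the chance level of 1/3 of a three-interval task.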


2.2 Internal representations of stimuli carrying ITDs

As previously mentioned, the model proposed by Breebaart et al. (2001a) was not specifically designed to simulate conditions in which stimuli carry ITDs. Given the role of ITD detection in the present thesis, it is therefore necessary to understand how these stimuli are represented in the model and how these internal representations can be used for simulations.

Because we are interested in conditions in which the lateralization of stimuli is produced by ITDs, the left- and right-ear signals of the stimuli have the same intensity. Consequently, we restrict our view of the binaural display, which by nature has two dimensions (α, τ), to one dimension: the internal delay axis, or τ-axis. The analysis is first introduced for deterministic signals and is generalized to stochastic signals in section 2.2.2. Furthermore, the internal representations in the following figures are shown for an instant in time at which all adaptation stages have reached a steady state (190 ms after the onset of a 400-ms stimulus). Issues related to temporal variations will be discussed in section 2.2.3.

2.2.1 Sinusoidal signals

For the purpose of this study and for clarity of illustration, we examine the internal representations of sinusoidal stimuli with a frequency of 838 Hz and a level of 70 dB SPL. This frequency was chosen because, as will be shown later, it lies in the spectral range where the model is most sensitive to ITDs.†

Figure 2.4 shows two internal representations along the τ-axis. For clarity of illustration, no internal noise was used to obtain the figures presented in this chapter, unless specified otherwise. The continuous line shows the internal representation of a diotic stimulus (zero ITD), which corresponds to the stimulus in the reference intervals of an ITD detection experiment. The internal representation is perfectly symmetric around the center of the τ-axis, where the activity reaches 0, illustrating the perfect cancellation of the left- and right-ear signals. Further away from the center of the τ-axis, the pattern has a pseudo-periodic shape with a period equal to that of the sinusoidal stimulus, 1.19 ms in this case. Due to the weighting along the τ-axis applied in the last stage of the EI cell, the repetitions of the pattern are attenuated for positions further away from the center of the τ-axis. The relevant parts of the internal representations are therefore located in the first two periods on each side of the center of the τ-axis. In addition to this weighting, the nonlinearities of the peripheral processing and of the binaural processor make the actual shape of the internal representation, even in the first period of the pattern shown in Fig. 2.4, not perfectly sinusoidal.

†This particular frequency was chosen as it is the center frequency of the auditory filter number

[Figure 2.4: model output (MU) versus internal delay (ms)]

Figure 2.4: Internal representations of stimuli consisting of an 838-Hz sinusoid at a level of 70 dB SPL, without incorporation of internal noise. The continuous line corresponds to a diotic stimulus (zero ITD, reference interval) and the dashed line represents the internal representation of the same stimulus with an ITD of 166 µs (target interval). Internal representations are shown along the τ-axis.

The dashed line in Fig. 2.4 shows the internal representation of a sinusoidal stimulus with an ITD of 166 µs. This value was chosen for clarity of illustration. The overall properties (amplitude, symmetry, pseudo-periodicity, etc.) of this internal representation are identical to those of the diotic version of the stimulus, but some differences are also observed. In particular, the minimum of the pattern has shifted to an internal delay corresponding to the stimulus ITD, 166 µs in this case. In addition, the two lobes in the first period of the pattern now differ in height.

The ability of the central processor to distinguish between the two patterns shown in Fig. 2.4 can be illustrated by computing the difference between them. Figure 2.5 shows the absolute value of this difference along the τ-axis.

The difference in model activity varies strongly as a function of the internal delay. In particular, the difference reaches a maximum at two positions away from the center of the τ-axis (around −50 µs and 200 µs). At these positions the model would have its maximum sensitivity, which is clearly higher than that at the position τ = 0. The choice of the considered positions and the combination of information from multiple positions along the internal delay axis are discussed in section 2.2.3.

[Figure 2.5: difference of model output (MU) versus internal delay (ms)]

Figure 2.5: Difference between the internal representations of the target and reference intervals shown in Fig. 2.4.
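The shapes in Fig. 2.4 and the off-center maxima in Fig. 2.5 can be approximated qualitatively with a strongly simplified, linear EI output. The sketch uses the narrow-band form derived in Appendix 2.A (Eq. 2.14), with the sign of the internal delay chosen so that the minimum lies at τ = ITD. It also assumes a weighting w(τ) = 10^(−|τ|/5 ms) for the attenuation of cells with larger internal delays; this function is an assumption, not a value given in this chapter. Peripheral processing, adaptation and compression are omitted. Under these assumptions, the difference peaks roughly at −200 µs and 365 µs. These maxima lie on either side of the center as in Fig. 2.5, but the positions do not match those of the full nonlinear model; only the qualitative off-center behavior is reproduced.

```python
import numpy as np

FC = 838.0      # stimulus frequency [Hz]
ITD = 166e-6    # external ITD of the target interval [s]

def weighting(tau):
    """Assumed attenuation of EI cells with larger internal delay |tau| [s]."""
    return 10.0 ** (-np.abs(tau) / 5e-3)

def linear_ei(tau, itd, fc=FC, power=1.0):
    """Weighted, linear EI output for a tone; zero where tau equals the ITD."""
    return weighting(tau) * power * (2.0 - 2.0 * np.cos(2 * np.pi * fc * (tau - itd)))

tau = np.arange(-1000, 1001) * 1e-6       # internal delays on a 1-us grid [s]
reference = linear_ei(tau, 0.0)           # diotic interval
target = linear_ei(tau, ITD)              # interval with a 166-us ITD
difference = np.abs(target - reference)   # analogue of Fig. 2.5

neg, pos = tau < 0, tau > 0
peak_neg = tau[neg][np.argmax(difference[neg])]
peak_pos = tau[pos][np.argmax(difference[pos])]
print(f"maxima at {peak_neg * 1e6:.0f} us and {peak_pos * 1e6:.0f} us; "
      f"center {difference[tau == 0][0]:.2f} vs peak {difference.max():.2f}")
```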

2.2.2 Stochastic signals

In this section, the analysis of the internal representations of sinusoidal stimuli carrying ITDs presented in the previous section is extended to stochastic stimuli. For this analysis, one thousand independent noise stimuli were generated and analyzed, as before, in the auditory channel centered at 838 Hz. The stimuli consisted of 2.9-kHz-wide noise bands spectrally centered at 1550 Hz, with a level of 70 dB SPL. The mean and the standard deviation of the internal representations of stimuli with zero ITD (continuous line) and of stimuli carrying a 166-µs ITD (dashed line) are shown in Fig. 2.6 as a function of the internal delay. The dash-dotted lines around the internal representations represent the standard deviation computed across the one thousand independent stimuli.

These mean internal representations are remarkably similar to those computed for the sinusoidal stimuli shown in Fig. 2.4. Computing equivalent internal representations for noise stimuli centered at 838 Hz, but with smaller bandwidths, leads to similar average internal representations, but with greater standard deviations. In general, the average characteristics, in each auditory band, of the internal representations of stimuli centered in or filling an auditory band depend mainly on the characteristics of the band itself.

[Figure 2.6: model output (MU) versus internal delay (ms)]

Figure 2.6: Average internal representations of broadband noise stimuli at the output of the auditory channel centered at 838 Hz, without the incorporation of internal noise. The continuous line represents the internal representation of a diotic stimulus (zero ITD, reference interval) and the dashed line represents the internal representation of the same stimulus with a 166-µs ITD (target interval). Internal representations are shown along the τ-axis. The dash-dotted lines around the internal representations represent the standard deviation.

The standard deviations of both internal representations are similar at a given internal delay away from the point of symmetry of the patterns. The variability in the internal representations increases with their amplitude. In particular, there is no variability in the internal representation of each stimulus at the internal delay corresponding to its external ITD (0 and 166 µs in the present case). Because of this variability in the internal representations of stochastic stimuli, the plain difference between the internal representations of target and reference intervals is not sufficient to evaluate the ability of the model to distinguish between target and reference stimuli. We therefore evaluate the ability of the model to distinguish stimuli carrying an ITD from diotic ones by computing the sensitivity index, or d′, which accounts for the variability of the internal representations.

The sensitivity index d′ was computed as a function of internal delay for the stimuli previously used in this section, except that the target stimuli carried a 30-µs ITD, a value closer to the ITD threshold for the considered 400-ms broadband stimuli. Fig. 2.7 shows d′ assuming an internal noise whose variance is half (dashed line), equal to (continuous line), or double (dash-dotted line) the maximum of the variance of the internal representations of the stimuli along the internal delay axis. For such broadband stimuli, the internal noise is expected to be the limiting factor of the detection process, and the dash-dotted line is the most realistic representation of d′ for the considered situation.
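The computation of d′ as a function of internal delay can be illustrated with a Monte-Carlo sketch. It rests on several assumptions that go beyond the text: a linear EI output computed exactly from the power spectrum of each noise realization (with the internal-delay sign chosen so that the minimum lies at τ = ITD), a rectangular 115-Hz-wide channel around 838 Hz instead of the model's auditory filter, no compression, weighting or adaptation, and the common definition d′ = |μT − μR| / √(σ²stim + σ²int), where σ²stim is the mean of the target and reference variances. The internal-noise variance is set to a multiple of the maximum stimulus variance, as for Fig. 2.7. Because the nonlinear stages are omitted, the positions of the maxima are not expected to coincide with those in Fig. 2.7; the sketch only shows how d′ follows from the mean and variance of many internal representations, and how the best EI cell can be picked as the maximum of d′.

```python
import numpy as np

rng = np.random.default_rng(0)

FS = 44100.0               # sampling rate [Hz]
N = 17640                  # 400-ms stimulus
FC, BW = 838.0, 115.0      # assumed rectangular channel [Hz]
ITD = 30e-6                # near-threshold ITD of the target [s]
N_STIM = 1000              # independent noise stimuli per condition

freqs = np.fft.rfftfreq(N, 1 / FS)
f_band = freqs[np.abs(freqs - FC) <= BW / 2]
tau = np.arange(-1000, 1001, 5) * 1e-6   # internal delays [s]

def ei_outputs(itd):
    """Linear EI output for N_STIM Gaussian-noise stimuli, shape (N_STIM, len(tau))."""
    re = rng.normal(size=(N_STIM, f_band.size))
    im = rng.normal(size=(N_STIM, f_band.size))
    power = re ** 2 + im ** 2                      # random power per frequency bin
    kernel = 2.0 * (1.0 - np.cos(2 * np.pi * f_band[None, :] * (tau[:, None] - itd)))
    return power @ kernel.T

reference = ei_outputs(0.0)
target = ei_outputs(ITD)
stim_var = 0.5 * (reference.var(axis=0) + target.var(axis=0))
mean_diff = np.abs(target.mean(axis=0) - reference.mean(axis=0))

def d_prime(internal_ratio):
    """d' per internal delay; internal-noise variance = ratio * max stimulus variance."""
    internal_var = internal_ratio * stim_var.max()
    return mean_diff / np.sqrt(stim_var + internal_var)

dp = d_prime(2.0)  # the "double" case that the text considers most realistic
best_neg = tau[tau < 0][np.argmax(dp[tau < 0])]
best_pos = tau[tau > 0][np.argmax(dp[tau > 0])]
print(f"d' maxima near {best_neg * 1e6:.0f} us and {best_pos * 1e6:.0f} us")
```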

[Figure 2.7: d′ versus internal delay (ms)]

Figure 2.7: d′ calculated for 1000 internal representations of target and reference intervals for broadband noise stimuli, assuming an internal noise with a variance that is half (dashed line), double (dash-dotted line), or equal (continuous line) to the maximum of the variance of the internal representations of the stimuli along the internal delay axis. The computation is done for internal representations at the output of the auditory channel centered at 838 Hz. Target stimuli carried a 30-µs ITD.

For all three amounts of internal noise, d′ depends on the internal delay in a similar way as the difference in internal representations computed for sinusoidal signals and shown in Fig. 2.5. Likewise, two positions (about −70 µs and 90 µs) along the τ-axis are found to give the model the largest sensitivity. For such a near-threshold value of the ITD (30 µs), the two positions provide very similar d′ values and thus a similar sensitivity. As the internal noise is assumed to be independent and equal in all EI cells, a variation in internal noise only leads to an equal scaling of the d′ values for all internal delays and does not affect the positions at which d′ is maximum.

2.2.3 Discussion

While in theory the binaural display comprises the outputs of the binaural processor for an unlimited number of EI cells, in practice only a finite number of cells can be handled.

Historically, for the simulation of binaural masking conditions such as NoSπ, the model has been used by analyzing only one EI cell. Such a solution is highly practical from a computational point of view, as it keeps the required computational power to a minimum. Breebaart et al. (2001a) have shown that it is reasonable to consider the position at which the internal representation of the stimuli in the reference intervals of a multiple forced-choice procedure shows a minimum of activity. They have also explained that considering only one EI cell of the binaural display to perform a simulation is equivalent to assuming that listeners focus all their attention on one particular position in auditory space.

The patterns shown in Figs. 2.4 and 2.6 also indicate that such a position is easy to find for ITD detection conditions. In this case, the minimum activity in the pattern of the reference interval is at the position α = τ = 0. While this central position could be the most suitable for constructing a stable representation of the reference intervals, one could alternatively consider that, in order to detect an event that is lateralized, human listeners focus their attention on the position where they expect the phenomenon to occur, i.e., on the side. This perceptual assumption implies that a position away from the center of the binaural display should rather be considered for an ITD detection task. Such a hypothesis is in line with the differences in model activity and the d′ values as a function of internal delay shown in Figs. 2.5 and 2.7. In both figures, the curves show that the ability of the model to distinguish the target from the reference internal representations depends heavily on the internal delay, and that the sensitivity of the model is highest for positions away from the center.

The number of EI cells used to conduct simulations is chosen by the experimenter. One possibility is to combine the information from a large number of cells. While elegant, such a solution raises the question of how to combine and process the information from several cells. The feasibility of such a computation, given the increase in computational complexity, is also a concern. For these reasons, and to remain consistent with previous uses of the model, we decided to perform model simulations using a single EI cell. This strategy leads to another question: which position should be chosen?

The previous analyses show that, while two particular positions along the τ-axis provide the highest sensitivity, the EI cell at the center of the binaural display can also be used to perform simulations, though with a reduced sensitivity. In practice, using one of the two optimal positions can be complicated. As previously shown, the characteristics of the internal representations depend on the center frequency of the considered auditory filter and on the external ITD. Consequently, the d′ index, and therefore the positions of the optimal EI cells, also vary with these two parameters. Therefore, to simulate an ITD detection experiment with one optimal EI cell, i.e., to complete an adaptive tracking of the ITD, one would have to adapt the position of this cell for each auditory channel on each adjustment of the ITD carried by the stimulus. If listeners likewise have to select the coincidence cell that gives them the largest sensitivity for a task, such a process would occur naturally. One can imagine that, as the outputs of all coincidence cells are simultaneously available, listeners “listen” to the cell for which the task is easiest and ignore the other cells. In the model, the equivalent processing would be to compute the outputs of all EI cells for each auditory channel and then determine which EI cell provides the highest d′. Furthermore, the previous analyses of the position of the optimal EI cells were conducted for internal representations taken at a particular moment in time, when the model had reached a stationary state. However, due to the nonlinear peripheral processing, and in particular the adaptation stage, the position of an optimal EI cell varies over time. In summary, the position of an optimal EI cell depends, to various degrees, on both external (stimulus ITD, level, etc.) and internal (auditory filter, peripheral processing) parameters of the model. These dependences require that the optimal positions be determined on each presentation of the stimuli by actually computing the outputs of many EI cells, which contradicts the original goal of using only one EI cell.

Consequently, in the context of this thesis, model simulations of ITD detection experiments will be conducted using EI cells at, mostly, two different positions. The first position is at the center of the binaural display. For a stimulus with a non-zero ITD, the model activity increases at this central position. This increase in model activity can be seen as a measure of the decrease in interaural correlation resulting from the external ITD. Such an approach was previously used by Bernstein and Trahiotis (1996) to successfully predict thresholds for ITDs carried by high-frequency, amplitude-modulated sinusoids.

The second position that will be considered is a position away from the center of the binaural display. It optimizes the detection process and therefore gives the model a higher sensitivity than the central position. We will refer to this position as the optimal position, even though, strictly speaking, it does not correspond to a maximum of the d′ function along the internal delay axis. The use of such optimal positions has also been suggested previously by Colburn et al. (2004). These authors suggested that the optimal positions were located at approximately ±45° of interaural phase. This angle corresponds to about ±150 µs for the channel providing the highest sensitivity to ITD, which is centered at 838 Hz. While the argument by Colburn et al. (2004) was based on a linear autocorrelation analysis, in our highly nonlinear model such a cell lies around −113 and 160 µs for this specific channel, at threshold, in steady state, and for a stimulus level of 70 dB. In the rest of the thesis, the position of the optimal EI cells will therefore be chosen empirically as the position found to give the model the best sensitivity in the most sensitive auditory channel, for ITDs around threshold, and given the spectral and temporal characteristics of the considered stimulus.
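As a worked check of the ±150 µs value, and assuming that the ±45° refer to interaural phase at the channel center frequency of 838 Hz:

$$
\tau = \frac{\varphi}{360^\circ}\cdot\frac{1}{f_c} = \frac{45^\circ}{360^\circ}\cdot\frac{1}{838\ \mathrm{Hz}} \approx 1.49\times 10^{-4}\ \mathrm{s} \approx 149\ \mu\mathrm{s},
$$

which rounds to the ±150 µs quoted above.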

2.3 ITD detection: dependence on spectral parameters

In this section we report experimental and simulated ITD thresholds as a function of the center frequency and bandwidth of the stimuli. Such data are known from the literature (Klumpp and Eady, 1956; Zwislocki and Feldman, 1956) and are mainly used in this thesis to discuss the model’s ability to predict ITD thresholds and, in particular, to illustrate the influence of the position of the EI cell along the τ-axis on the predicted thresholds.

2.3.1 Experiment

Method

ITD thresholds were determined as a function of stimulus center frequency. The experiment was controlled by a computer program running in the software environment Matlab. The computer used to run the experiment was equipped with an RME DIGI96 sound card connected to a TDT HB7 headphone driver. The stimuli were reproduced with Beyerdynamic DT 990 headphones that had been calibrated on a Brüel & Kjær artificial ear.

The stimuli were presented in a three-interval, three-alternative forced-choice task. The ITD was varied adaptively using a two-down one-up procedure to estimate the 70.7%-correct point (Levitt, 1971). ITD values were increased or decreased multiplicatively. The initial step size of the adaptive track corresponded to a factor of 1.584 and was reduced to a factor of 1.122 after two reversals. A run was terminated after 10 reversals, and the threshold was defined as the average ITD across the last 8 reversals.
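The adaptive rule can be sketched as follows. Only the step factors, the number of reversals and the averaging rule are taken from the text. The starting ITD (200 µs), the simulated listener (a Weibull psychometric function with a guess rate of 1/3, α = 30 µs and β = 2), and all function names are hypothetical.

```python
import numpy as np

def two_down_one_up(respond, start, big_step=1.584, small_step=1.122,
                    n_reversals=10, n_average=8, max_trials=2000):
    """Track the 70.7%-correct point (Levitt, 1971) with multiplicative steps.

    `respond(value)` returns True for a correct response.
    """
    value = start
    n_in_row = 0      # consecutive correct responses at the current value
    direction = 0     # -1 after a decrease, +1 after an increase, 0 at start
    reversals = []
    for _ in range(max_trials):
        if respond(value):
            n_in_row += 1
            if n_in_row < 2:
                continue      # one correct response: keep the same ITD
            n_in_row = 0
            move = -1         # two correct in a row: make the task harder
        else:
            n_in_row = 0
            move = +1         # one incorrect response: make the task easier
        if direction != 0 and move != direction:
            reversals.append(value)
            if len(reversals) == n_reversals:
                break
        direction = move
        step = big_step if len(reversals) < 2 else small_step
        value = value / step if move < 0 else value * step
    return float(np.mean(reversals[-n_average:])), reversals

def weibull_listener(alpha=30.0, beta=2.0, seed=0):
    """Hypothetical 3-interval listener; ITDs in microseconds."""
    rng = np.random.default_rng(seed)
    def respond(itd_us):
        p_correct = 1 / 3 + 2 / 3 * (1.0 - np.exp(-(itd_us / alpha) ** beta))
        return rng.random() < p_correct
    return respond

thresholds = [two_down_one_up(weibull_listener(seed=s), start=200.0)[0]
              for s in range(20)]
print(np.mean(thresholds))
```

Averaged over runs, the estimates should cluster around the ITD at which this hypothetical listener scores 70.7% correct, about 27 µs for these parameters.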

The duration of the pause between intervals was 200 ms. The stimuli carrying the ITD to be detected were presented in one of the three intervals with equal a priori probability and the lateralization was always on the same side. The stimuli in the other two intervals had zero ITD. Feedback was provided immediately after the subject responded.

Four young male adults (including the author of this thesis), who were experienced in psychoacoustic experiments and had no evidence or history of hearing loss, served as listeners. After a short practice session, the 12 conditions were presented in random order. The measurements were repeated three times for each subject.

Stimuli

All stimuli were digitally generated with a sampling rate of 44.1 kHz. The stimuli consisted of either bursts of narrowband noise or sinusoids. The duration of the stimuli was 400 ms. The frequency of the sinusoids was 213, 538, 727, 962, or 1256 Hz. Narrowband noise stimuli were 1 ERB wide and spectrally centered at the same frequencies as the sinusoidal stimuli. In addition, thresholds were measured for 1-ERB-wide noise bursts centered at 1256 Hz whose bandwidth had been extended towards low frequencies down to 633 Hz, resulting in a 622-Hz-wide stimulus spectrally centered at 944 Hz. The sinusoidal stimuli were presented at a level of 65 dB SPL. Noise stimuli were presented at a constant spectrum level of 45 dB/Hz.

Prior to each adaptive run, a 1-s buffer of either a sinusoid or bandlimited noise was generated. The noise buffers were created as white Gaussian noise in the time domain and filtered to the desired bandwidth in the frequency domain. For each trial, a portion of the buffer was randomly chosen for each interval. The onset and offset of each interval were shaped with 10-ms Hanning ramps. The ITDs carried by the target intervals were ongoing ITDs obtained by a phase shift in the frequency domain.
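The noise-stimulus generation steps can be sketched as follows. The band edges in the usage line, the random-number handling and the function names are assumptions. The Hanning ramps are taken as the two halves of a 20-ms Hann window (`numpy.hanning`), and the ongoing ITD is applied as a circular delay of the right-ear segment via a phase shift of its spectrum before ramping, which is one way to read the description above.

```python
import numpy as np

FS = 44100  # sampling rate [Hz]

def noise_buffer(f_low, f_high, duration=1.0, fs=FS, rng=None):
    """White Gaussian noise, band-limited by zeroing FFT bins outside the band."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(duration * fs))
    spectrum = np.fft.rfft(rng.normal(size=n))
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spectrum[(freqs < f_low) | (freqs > f_high)] = 0.0
    return np.fft.irfft(spectrum, n)

def make_interval(buffer, itd, duration=0.4, ramp=0.01, fs=FS, rng=None):
    """Cut a random segment, impose an ongoing ITD by a frequency-domain phase
    shift, then apply Hann on/offset ramps to both ears."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(duration * fs))
    start = rng.integers(0, buffer.size - n + 1)
    left = buffer[start:start + n].copy()
    freqs = np.fft.rfftfreq(n, 1 / fs)
    # delay the right-ear signal by `itd` seconds (circular, i.e. ongoing ITD)
    right = np.fft.irfft(np.fft.rfft(left) * np.exp(-2j * np.pi * freqs * itd), n)
    n_ramp = int(round(ramp * fs))
    hann = np.hanning(2 * n_ramp)
    window = np.ones(n)
    window[:n_ramp] = hann[:n_ramp]      # rising half
    window[-n_ramp:] = hann[n_ramp:]     # falling half
    return left * window, right * window

rng = np.random.default_rng(3)
buffer = noise_buffer(100.0, 3000.0, rng=rng)   # e.g. a 2.9-kHz-wide band
left, right = make_interval(buffer, itd=166e-6, rng=rng)
```

For an ITD of a whole number of samples, the phase shift reduces to an exact circular shift, so the interaural cross-correlation of the two channels peaks at that lag.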


Results

Results for both types of stimuli are shown in Fig. 2.8. Squares represent thresholds measured for noise stimuli and circles represent thresholds measured for sinusoidal stimuli. These data represent the mean of the ITD thresholds obtained across the four listeners and three repetitions. The error bars represent the standard error of the mean across these twelve values. ITD thresholds obtained for tones are always higher than those obtained for narrowband noises centered at the same frequency. For both stimulus types, however, the variation of the thresholds as a function of the center frequency is the same. Thresholds are lowest in a frequency range between about 500 Hz and 900 Hz. For frequencies outside this range, thresholds increase. The increase is stronger for frequencies above 900 Hz than for frequencies below 500 Hz. These data are in line with previous findings reported in the literature (Klumpp and Eady, 1956; Zwislocki and Feldman, 1956).

[Figure 2.8: threshold ITD (µs) versus center frequency (Hz)]

Figure 2.8: Average ITD thresholds as a function of frequency. Squares represent thresholds measured for 1-ERB-wide noise stimuli, with the exception of the isolated square on the right side, which was obtained for the originally 1-ERB-wide noise centered at 1256 Hz whose bandwidth was extended down to 633 Hz. Circles represent thresholds measured for sinusoidal stimuli. Error bars represent standard errors of the mean.

The isolated square on the right side was obtained for the originally 1-ERB-wide noise centered at 1256 Hz, whose bandwidth was extended down to 633 Hz. As a consequence of this bandwidth extension, the threshold decreases to the level obtained for 1-ERB-wide noise stimuli centered at lower frequencies. Based on this observation, we hypothesize that listeners perform ITD detection with broadband signals by using the frequency range of the signals for which the task is easiest, regardless of the actual bandwidth of the signal.

To test this hypothesis, three additional stimuli were tested under the same experimental conditions as before. The first two were noise bursts centered at 727 Hz, which lies in the range that provides the highest sensitivity: the first had a bandwidth of 1 ERB (a repetition of the first part of the experiment) and the second a bandwidth of 5 ERB. The third stimulus was a 2.9-kHz-wide noise spectrally centered at 1550 Hz. The results are shown in Fig. 2.9. Only a minor difference is found between the three thresholds. Three pairwise, two-tailed t-tests at the 5% significance level show that the three average thresholds are not significantly different from each other. Therefore, extending the spectral range of the stimulus beyond the most sensitive spectral range had a minor effect on the thresholds for this specific task. This observation supports the idea that listeners only use the spectral range of the stimuli for which the task is easiest.

[Figure 2.9: threshold ITD (µs) for 1-ERB, 5-ERB, and broadband noise stimuli]

Figure 2.9: Average ITD thresholds for 1-ERB and 5-ERB-wide noise stimuli centered at 727 Hz, as well as for a 2.9-kHz-wide noise spectrally centered at 1550 Hz. Error bars represent standard errors of the mean.

2.3.2 Simulation

In this section the ability of the model to replicate the experimental ITD thresholds as a function of spectral parameters is evaluated.

The sinusoidal and noise stimuli used for the simulations had a fixed overall level of 65 dB SPL. Simulations were conducted for more center frequencies than those used in the experiment in order to obtain a finer spectral resolution of the behavior of the model. The stimuli were otherwise identical to those used in the experiments. Simulations were conducted using nine 1-ERB-spaced auditory channels centered at frequencies between 125 and 2000 Hz. As in the experiment, simulations were conducted using a two-down one-up adaptive tracking of the ITD, and stimuli were presented in a three-interval forced-choice (3-IFC) paradigm. The results are shown in Fig. 2.10.

[Figure 2.10, panels A and B: threshold ITD (µs) versus center frequency (Hz)]

Figure 2.10: ITD thresholds as a function of the center frequency of the stimuli. Panel A shows simulated thresholds for noise (filled squares) and sinusoidal (filled circles) stimuli obtained by using the central EI cell (top) and the optimal EI cell (bottom). The dashed line denotes a constant interaural phase difference of 0.05 rad. Panel B shows simulated thresholds for the sinusoidal stimuli (filled circles) for both central (top) and optimal (bottom) EI cell and the experimental thresholds measured with sinusoidal stimuli replotted from Fig. 2.8 (open circles). ITD thresholds measured using 65-dB SPL sinusoids from Klumpp and Eady (1956) and Zwislocki and Feldman (1956) are shown by diamonds and left-pointing triangles, respectively.

Panel A shows simulated thresholds for both noise (squares) and sinusoidal (circles) stimuli, obtained using either the central EI cell (top curves) or optimal EI cells (bottom curves). The optimal EI cells are chosen to optimize the detection in each auditory channel. Of the two possibilities, we arbitrarily chose the EI cells with a negative position on the internal delay axis. The optimal positions are −390, −230, and −140 µs for channels centered at 125, 250, and 500 Hz, respectively, and −120 µs for channels centered at 750 Hz and above. Thresholds obtained with the central EI cell are virtually identical for both types of stimuli, which is in line with the analysis of the internal representations, where their characteristics were found to depend mainly on the center frequency of the auditory band. For both types of EI cells, the variation of ITD thresholds as a function of the center frequency of the stimulus is qualitatively the same. The lowest thresholds are found in the range 700 to 1100 Hz, slightly above the range of 500 to 900 Hz that gave the lowest thresholds in the experiment. For center frequencies below about 500 Hz, ITD thresholds can be approximately characterized by a constant phase sensitivity, as shown by the dashed line, which corresponds to a constant phase difference of 0.05 rad. For center frequencies above 900 Hz, ITD thresholds increase as a result of the loss of fine structure in the inner-haircell stage.
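The dashed line in Fig. 2.10 follows from ITD = IPD/(2πf). A short computation with the 0.05-rad value from the text gives the constant-phase ITDs at the frequencies marked on the axis; on the log-log axes this is a straight line with slope −1.

```python
import numpy as np

IPD = 0.05  # constant interaural phase difference of the dashed line [rad]
freqs = np.array([125.0, 250.0, 500.0, 1000.0, 2000.0])  # [Hz]
itd_us = IPD / (2 * np.pi * freqs) * 1e6                  # [us]
print(np.round(itd_us, 1))  # approx. 63.7, 31.8, 15.9, 8.0, 4.0
```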

Panel B shows that the simulated thresholds are of the same order of magnitude as our experimental data and the thresholds from Klumpp and Eady (1956) (diamonds) and Zwislocki and Feldman (1956) (left-pointing triangles). However, the increase of the simulated thresholds for center frequencies above 900 Hz is clearly smaller than that of the experimental data.

These results show that the model proposed by Breebaart et al. (2001a) is, to a large extent, capable of predicting ITD thresholds as a function of both stimulus bandwidth and center frequency. The loss of sensitivity towards high frequencies is, however, underestimated. The model has its maximum sensitivity for frequencies between 500 and 900 Hz, a frequency range that will therefore be used to investigate the dependence of ITD thresholds on other parameters, such as the duration of the stimuli, as presented in the next chapter. The results also show that the dependence of the thresholds on the position of the EI cell is clearly noticeable and should therefore be taken into account when designing a simulation.


2.A Appendix: Theoretical ITD detection

The purpose of this appendix is to derive a theoretical formulation of the sensitivity index d′ in the context of ITD detection experiments. This index represents the ability of the model to distinguish two noise stimuli, one of which carries an ITD. The derivation of d′ is done under a number of simplifications and assumptions. The peripheral processing is restricted to the filtering of the auditory filterbank and a half-wave rectification occurring in the haircell stage. All processes are assumed to be stationary.

2.A.1 Noise autocorrelation

First, we define the autocorrelation function and variance of the binaural signals composing a stimulus carrying an ITD. A noise signal x(t) is used to create a stimulus carrying an ITD, consisting of the binaural signal pair l(t), r(t), according to:

l(t) = x(t), (2.4)

r(t) = x(t − τe), (2.5)

with τe an external interaural delay (ITD).

For a given duration T, the autocorrelation functions Rll and Rrr of the signals l(t) and r(t) are estimated according to:

$$\hat{R}_{ll}(\tau) = \int_0^T l(t)\,l(t-\tau)\,dt, \qquad (2.6)$$

$$\hat{R}_{rr}(\tau) = \int_0^T r(t)\,r(t-\tau)\,dt. \qquad (2.7)$$

The spectral power density of the noise signal x(t) is denoted by ω(f ). Consequently,

assuming a stationary noise process, the expected values for τe = 0 of ˆRll and ˆRrr are

equal to the expected value of ˆRxx(0):

ˆ

Rxx(0) = T

Z ∞

0 ω(f )df. (2.8)

The variance of $\hat{R}_{xx}(0)$, denoted $\sigma^2_{R_{xx}(0)}$, is given by:

$$\sigma^2_{R_{xx}(0)} = T^2 \int_0^\infty \omega^2(f)\,df. \tag{2.9}$$
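A small numerical check of Eqs. 2.4 to 2.8 can make the stationarity argument concrete. The sketch below assumes white Gaussian noise sampled at a hypothetical 48 kHz and approximates the integrals by Riemann sums. It only illustrates that delaying the noise leaves the zero-lag energy essentially unchanged and that its mean is close to $T$ times the integrated spectral density; the function name and all parameter values are illustrative assumptions.

```python
import math
import random


def zero_lag_energies(itd_s, duration_s=0.5, fs=48000.0, sigma=1.0, seed=1):
    """Estimate R_ll(0) and R_rr(0) (Eqs. 2.6-2.7 at lag zero) for
    l(t) = x(t) and r(t) = x(t - tau_e), with x(t) white Gaussian noise (assumed)."""
    rng = random.Random(seed)
    n = int(round(duration_s * fs))  # number of samples in the window T
    d = int(round(itd_s * fs))       # external delay tau_e in samples (assumed >= 0)
    x = [rng.gauss(0.0, sigma) for _ in range(n + d)]
    left = x[d:d + n]                # l(t) = x(t)
    right = x[:n]                    # r(t) = x(t - tau_e): the same noise delayed by d samples
    dt = 1.0 / fs
    r_ll = dt * sum(v * v for v in left)   # Riemann sum for the integral over [0, T]
    r_rr = dt * sum(v * v for v in right)
    return r_ll, r_rr


T = 0.5
r_ll, r_rr = zero_lag_energies(itd_s=500e-6, duration_s=T)
# For sampled white noise of variance sigma^2, the one-sided PSD integrates to sigma^2,
# so Eq. 2.8 predicts E[R_xx(0)] = T * sigma^2 = 0.5 here.
print(f"R_ll(0) = {r_ll:.4f}, R_rr(0) = {r_rr:.4f}, predicted mean = {T:.4f}")
```

Both estimates fall close to the predicted mean and to each other, because $l(t)$ and $r(t)$ share all but a few samples of the same noise.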


2.A.2 Binaural processor

In a second step, the expected output of the binaural processor and its variance are computed.

Expected value of the output

The binaural processor computes the energy of the difference between $l(t)$ and $r(t)$, given by $\hat{D}(\tau_i,\tau_e)$:

$$\hat{D}(\tau_i,\tau_e) = \int_0^T \big(l(t) - r(t-\tau_i)\big)^2\,dt = \int_0^T \big(x(t) - x(t-\tau_i-\tau_e)\big)^2\,dt, \tag{2.10}$$

with $\tau_i$ an internal delay and $\tau_e$ the external delay (ITD). Assuming stationary noise processes, the expected value of $\hat{D}(\tau_i,\tau_e)$ is given by:

$$\hat{D}(\tau_i,\tau_e) = R_{xx}(0) + R_{xx}(0) - 2R_{xx}(\tau_i+\tau_e), \tag{2.11}$$

which can also be written as

$$\hat{D}(\tau_i,\tau_e) = R_{xx}(0)\,\big(2 - 2\rho_{xx}(\tau_i+\tau_e)\big), \tag{2.12}$$

with $\rho_{xx}(\tau)$ the normalized autocorrelation function of $x(t)$. For narrow-band noise and relatively small delays $\tau$, $\rho_{xx}(\tau)$ can be approximated by:

$$\rho_{xx}(\tau) \approx \cos(2\pi f_c \tau), \tag{2.13}$$

with $f_c$ the center frequency of the narrow-band noise. Hence, for small delays and narrow-band noises, $\hat{D}(\tau_i,\tau_e)$ can be approximated by:

$$\hat{D}(\tau_i,\tau_e) \approx R_{xx}(0)\,\big(2 - 2\cos(2\pi f_c(\tau_i+\tau_e))\big). \tag{2.14}$$

From Eq. 2.14, it can be observed that the output $\hat{D}(\tau_i,\tau_e)$ equals zero if the external delay $\tau_e$ is equal and opposite to the internal delay $\tau_i$, i.e., for $\tau_i = -\tau_e$. This observation is qualitatively in line with the analysis of the internal representation based on model insights, where the internal representation was shown to reach zero activity for an internal delay equal to the ITD carried by the stimuli (point of symmetry along the $\tau$-axis).
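The narrow-band approximation of Eq. 2.14 can be checked numerically. The sketch below assumes, as a stand-in for narrow-band noise, a sum of ten equal-amplitude cosines with random phases spread over a hypothetical 50-Hz band around 838 Hz. It integrates the squared difference $x(t) - x(t-\tau_i-\tau_e)$, whose expectation is Eq. 2.11, and compares the result with $R_{xx}(0)\,(2 - 2\cos(2\pi f_c(\tau_i+\tau_e)))$; it also confirms that the output vanishes for $\tau_i = -\tau_e$. All names and parameter values are illustrative.

```python
import math
import random


def narrowband_noise(fc, bandwidth, n_components=10, seed=3):
    """Return x(t): equal-amplitude cosines with random phases whose frequencies are
    spread evenly over [fc - b/2, fc + b/2] (an assumed stand-in for narrow-band noise)."""
    rng = random.Random(seed)
    freqs = [fc - bandwidth / 2 + bandwidth * (k + 0.5) / n_components
             for k in range(n_components)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]

    def x(t):
        return sum(math.cos(2.0 * math.pi * f * t + p) for f, p in zip(freqs, phases))

    return x


def processor_output(x, tau_i, tau_e, duration=0.5, fs=20000.0):
    """Riemann-sum estimates of R_xx(0) and of the difference energy
    integral over [0, T] of (x(t) - x(t - tau_i - tau_e))^2 dt."""
    dt = 1.0 / fs
    tau = tau_i + tau_e
    r0 = 0.0
    d = 0.0
    for k in range(int(round(duration * fs))):
        t = k * dt
        xt = x(t)
        diff = xt - x(t - tau)
        r0 += xt * xt * dt
        d += diff * diff * dt
    return r0, d


fc, tau_e, tau_i = 838.0, 30e-6, 0.2e-3
x = narrowband_noise(fc, bandwidth=50.0)
r0, d_sim = processor_output(x, tau_i=tau_i, tau_e=tau_e)
d_eq14 = r0 * (2.0 - 2.0 * math.cos(2.0 * math.pi * fc * (tau_i + tau_e)))  # Eq. 2.14
_, d_null = processor_output(x, tau_i=-tau_e, tau_e=tau_e)  # internal delay cancels the ITD
print(f"simulated D = {d_sim:.4f}, Eq. 2.14 = {d_eq14:.4f}, D at tau_i = -tau_e: {d_null}")
```

The simulated and approximated outputs agree to within a few percent for this bandwidth and delay; wider bands or larger delays would degrade the cosine approximation of Eq. 2.13.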

Variance of the output

Eq. 2.14 gives the expected output of the binaural processor under the assumption of narrow-band signals and short delays. To estimate the variance $\sigma^2_D$ of this output, it is assumed that the term $R_{xx}(0)$ accounts for most of the variance in $\hat{D}(\tau_i,\tau_e)$. As a result, the estimate of the variance $\hat{\sigma}^2_D$ is given by:

$$\hat{\sigma}^2_{D(\tau_i,\tau_e)} = \sigma^2_{R_{xx}(0)}\,\big(2 - 2\cos(2\pi f_c(\tau_i+\tau_e))\big). \tag{2.15}$$

2.A.3 Detectability index for a noise with an external ITD

In an ITD detection task, a noise with an ITD ($\tau_e \neq 0$) has to be distinguished from a reference noise without ITD ($\tau_e = 0$). Detection is assumed to occur at the level of the binaural processor, i.e., based on $\hat{D}(\tau_i,\tau_e)$. The reference noise yields a binaural processor output $D_1(\tau_i,0)$, while the target noise (with an external ITD) yields a binaural processor output $D_2(\tau_i,\tau_e)$. As described in the previous section, the binaural processor output $D(\tau_i,\tau_e)$ is a stochastic variable whose mean and standard deviation depend on the characteristics of the input stimuli.

In the following, the estimate of $R_{xx}(0)$ is given under the assumption that only half of the stimulus is available due to half-wave rectification in the auditory periphery, and assuming a gammatone auditory filter with frequency response $\gamma(f,f_c)$ centered at $f_c$:

$$R_{xx}(0) = \frac{T}{2} \int_{f_c-b/2}^{f_c+b/2} \omega(f)\,|\gamma(f,f_c)|^2\,df, \tag{2.16}$$

with

$$\gamma(f,f_c) = \left(\frac{1}{1 + j(f-f_c)/b_{\mathrm{ERB}}}\right)^{n}. \tag{2.17}$$

Here, $b$ denotes the bandwidth of the stimulus, $n$ the order of the gammatone filter ($n = 3$), and $b_{\mathrm{ERB}}$ the bandwidth parameter of the gammatone filter, chosen such that its 3-dB bandwidth equals the Equivalent Rectangular Bandwidth (ERB). This results in:

$$b_{\mathrm{ERB}} = \frac{24.7\,(0.00437 f_c + 1)}{2\sqrt{2^{1/n} - 1}}. \tag{2.18}$$
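Eqs. 2.17 and 2.18 can be sketched directly. The functions below (names are illustrative) compute the ERB, the parameter $b_{\mathrm{ERB}}$, and the power response $|\gamma(f,f_c)|^2$ for $n = 3$, assuming $f_c$ in Hz. Evaluating the response at $f_c \pm \mathrm{ERB}/2$ shows why the factor $2\sqrt{2^{1/n}-1}$ appears: it places the 3-dB points exactly at the edges of the ERB.

```python
import math


def erb_hz(fc):
    """ERB in Hz, as in the numerator of Eq. 2.18: 24.7 * (0.00437 * fc + 1), fc in Hz."""
    return 24.7 * (0.00437 * fc + 1.0)


def b_erb(fc, n=3):
    """Eq. 2.18: scale parameter for which the 3-dB bandwidth of Eq. 2.17 equals the ERB."""
    return erb_hz(fc) / (2.0 * math.sqrt(2.0 ** (1.0 / n) - 1.0))


def gammatone_power(f, fc, n=3):
    """|gamma(f, fc)|^2 with gamma(f, fc) = (1 / (1 + j (f - fc) / b_ERB))^n (Eq. 2.17)."""
    gamma = (1.0 / (1.0 + 1j * (f - fc) / b_erb(fc, n))) ** n
    return abs(gamma) ** 2


fc = 838.0
half = erb_hz(fc) / 2.0
print(f"ERB({fc:.0f} Hz) = {erb_hz(fc):.1f} Hz, b_ERB = {b_erb(fc):.1f} Hz")
print(gammatone_power(fc, fc), gammatone_power(fc + half, fc), gammatone_power(fc - half, fc))
```

At 838 Hz the ERB is about 115 Hz; the response equals 1 at the center frequency and 0.5 (i.e., 3 dB down) at $f_c \pm \mathrm{ERB}/2$.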

Similarly, the variance of $R_{xx}(0)$, $\sigma^2_{R_{xx}(0)}$, is obtained by:

$$\sigma^2_{R_{xx}(0)} = \left(\frac{T}{2}\right)^2 \int_{f_c-b/2}^{f_c+b/2} \omega^2(f)\,|\gamma(f,f_c)|^4\,df. \tag{2.19}$$

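The expected value of the output $\hat{D}(\tau_i,\tau_e)$ and its variance $\hat{\sigma}^2_{D(\tau_i,\tau_e)}$ can now be estimated by combining Eqs. 2.16 and 2.19 with Eqs. 2.14 and 2.15:

$$\hat{D}(\tau_i,\tau_e) = \big(2 - 2\cos(2\pi f_c(\tau_i+\tau_e))\big)\,\frac{T}{2} \int_{f_c-b/2}^{f_c+b/2} \omega(f)\,|\gamma(f,f_c)|^2\,df, \tag{2.20}$$

$$\hat{\sigma}^2_{D(\tau_i,\tau_e)} = \big(2 - 2\cos(2\pi f_c(\tau_i+\tau_e))\big) \left(\frac{T}{2}\right)^2 \int_{f_c-b/2}^{f_c+b/2} \omega^2(f)\,|\gamma(f,f_c)|^4\,df. \tag{2.21}$$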

The expected value and variance at the output of the binaural processor (Eqs. 2.20 and 2.21) can be evaluated separately for a reference stimulus ($\tau_e = 0$) and a target stimulus with an ITD ($\tau_e \neq 0$). In particular, they can be used to derive the sensitivity index $d'$ as a function of $\tau_i$ and $\tau_e$:

$$d' = \frac{\hat{D}(\tau_i,\tau_e) - \hat{D}(\tau_i,0)}{\sqrt{\hat{\sigma}_{D(\tau_i,\tau_e)}\,\hat{\sigma}_{D(\tau_i,0)}}}. \tag{2.22}$$

If the binaural processor output is also subject to internal noise with variance $\sigma_i^2$, the detectability index becomes:

$$d' = \frac{\hat{D}(\tau_i,\tau_e) - \hat{D}(\tau_i,0)}{\sqrt{\hat{\sigma}_{D(\tau_i,\tau_e)}\,\hat{\sigma}_{D(\tau_i,0)} + \sigma_i^2}}. \tag{2.23}$$
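To see how Eqs. 2.20 to 2.23 combine, the sketch below evaluates $d'$ as a function of the internal delay for the 838-Hz channel and a 30-µs ITD. It assumes a flat spectral density $\omega(f) = \omega_0$, a stimulus bandwidth of 100 Hz, a duration of 300 ms, and an internal-noise standard deviation of 0.05; these values are hypothetical and do not reproduce the scaling used for Fig. 2.11. The $\hat{\sigma}_D$ terms in Eq. 2.23 are read as standard deviations, i.e., square roots of Eq. 2.21, and the variance is implemented exactly as Eq. 2.21 is written.

```python
import math


def gammatone_power(f, fc, n=3):
    """|gamma(f, fc)|^2 from Eqs. 2.17-2.18 (3-dB bandwidth set to the ERB)."""
    erb = 24.7 * (0.00437 * fc + 1.0)
    b = erb / (2.0 * math.sqrt(2.0 ** (1.0 / n) - 1.0))
    return abs((1.0 / (1.0 + 1j * (f - fc) / b)) ** n) ** 2


def band_integrals(fc, bandwidth, omega0=1.0, steps=2000):
    """Midpoint-rule integrals of omega|gamma|^2 and omega^2|gamma|^4 over [fc - b/2, fc + b/2],
    assuming a flat spectral density omega(f) = omega0 (hypothetical)."""
    df = bandwidth / steps
    i1 = i2 = 0.0
    for k in range(steps):
        p = gammatone_power(fc - bandwidth / 2.0 + (k + 0.5) * df, fc)
        i1 += omega0 * p * df
        i2 += omega0 ** 2 * p * p * df
    return i1, i2


def d_prime(tau_i, tau_e, fc=838.0, bandwidth=100.0, duration=0.3, sigma_int=0.05):
    """Eq. 2.23, using the mean (Eq. 2.20) and variance (Eq. 2.21) of the processor output."""
    i1, i2 = band_integrals(fc, bandwidth)

    def mean_and_sd(te):
        c = 2.0 - 2.0 * math.cos(2.0 * math.pi * fc * (tau_i + te))
        mean = c * (duration / 2.0) * i1          # Eq. 2.20
        var = c * (duration / 2.0) ** 2 * i2      # Eq. 2.21, as written
        return mean, math.sqrt(var)

    m_target, s_target = mean_and_sd(tau_e)
    m_ref, s_ref = mean_and_sd(0.0)
    return (m_target - m_ref) / math.sqrt(s_target * s_ref + sigma_int ** 2)


tau_e = 30e-6  # 30-us ITD, as in Fig. 2.11
for tau_i_ms in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"tau_i = {tau_i_ms:+.1f} ms  d' = {d_prime(tau_i_ms * 1e-3, tau_e):+.3f}")
```

The index repeats with the period $1/f_c$ of the channel and vanishes at $\tau_i = -\tau_e/2$, where the target and reference outputs coincide; the internal-noise term keeps the denominator finite where one of the two variances is zero.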

Fig. 2.11 shows $d'$ as defined in Eq. 2.23 as a dashed curve, plotted on the scale on the right side of the figure. These values of $d'$ were computed for the auditory channel centered at 838 Hz, assuming a 30-µs ITD.

[Figure 2.11: d′ as a function of internal delay (ms, −1.0 to 1.0); left axis: Model d′ (0 to 5); right axis: Theoretical d′ (0 to 5).]

Figure 2.11: $d'$ index computed theoretically from Eq. 2.23 (dashed line) and computed from the internal representations of the model (continuous line).

On the scale on the left side of the figure, the $d'$ index (continuous line) obtained from the analysis of internal representations in section 2.2.2 is re-plotted. The amount of internal noise (half of the maximum of the variance of the internal representation of the stimuli along the internal-delay axis) was chosen such that the maxima of the two curves reach a similar level. This model-based derivation of $d'$ as a function of internal delay was also done for a 30-µs ITD in the auditory channel centered at 838 Hz. Interestingly, the theoretically derived $d'$ index shares some characteristics with the model-based analysis, in particular the periodicity of the index and the presence of two optimal positions. It should be noted, however, that although the illustration was chosen to show good agreement between the optimal positions found in both analyses, the simplifications involved in the theoretical computation do not allow the optimal positions of the full model to be derived from it in the general case.


In this chapter we characterize the dependence of ITD detection thresholds on stimulus duration. Previous studies reported that thresholds vary less with duration than would be expected under the assumptions of optimal integration and equal contribution of all stimulus intervals. We show that optimal integration does not imply a particular rate of decrease of the thresholds with stimulus duration, but rather a specific relationship between this rate of decrease and the effective cue to which the sensitivity index d′ is proportional. Consequently, we conduct several experiments to measure both the dependence of the ITD thresholds on stimulus duration and the dependence of d′ on the ITD. Results show that for stimulus durations of 10 and 20 ms, temporal integration of the ITD can be explained by assuming both optimal integration of information and equal weighting of all stimulus increments. Results also show that for longer durations, temporal integration is most likely not optimal, and that a potential onset emphasis is not critical to explain the temporal integration process. The binaural model proposed by Breebaart et al. (2001a) is able to predict ITD thresholds as a function of stimulus duration if the onset emphasis provided by its adaptation stage is reduced compared to its original design.
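A minimal worked sketch of this relationship, under the illustrative assumptions that $d'$ grows as a power $p$ of the ITD and, under optimal integration with equal weights for independent stimulus increments, as the square root of the stimulus duration $T$. Here $k$ and $p$ are hypothetical constants, and $d'_{\mathrm{thr}}$ is the fixed $d'$ value at threshold:

$$d' = k\,T^{1/2}\,\mathrm{ITD}^{\,p} \quad\Longrightarrow\quad \mathrm{ITD}_{\mathrm{thr}}(T) = \left(\frac{d'_{\mathrm{thr}}}{k}\right)^{1/p} T^{-1/(2p)}, \qquad \frac{d\log \mathrm{ITD}_{\mathrm{thr}}}{d\log T} = -\frac{1}{2p}.$$

If $d'$ is proportional to the ITD ($p = 1$), optimal integration predicts thresholds decreasing as $T^{-1/2}$; if $d'$ is proportional to the squared ITD ($p = 2$), the same optimal integration predicts the shallower dependence $T^{-1/4}$. A shallow decrease of thresholds with duration therefore does not by itself indicate sub-optimal integration.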
