Combining noise reduction and binaural cue preservation for hearing aids: MWF-ITF

(1)

Laboratory for Experimental ORL, K.U.Leuven, Belgium

Dept. of Electrotechn. Eng., ESAT/SISTA, K.U.Leuven, Belgium

Multichannel Wiener Filter with Interaural Transfer Function

Van den Bogaert T., Doclo S., Moonen M. and Wouters J.

ICASSP 2007

Download at https://gilbert.med.kuleuven.be/~u0041407/2007ICASSP.pdf

Contact: Tim.vandenbogaert@med.kuleuven.ac.be

(2)

Overview

• Problem statement: binaural hearing aids, noise reduction and preservation of binaural cues

• Multichannel Wiener Filter approaches:

– MWF: a standard N-microphone Multi-channel Wiener Filter approach

– MWF-ITF: Extension of MWF to an MWF with integrated Interaural Transfer Function.

• Experimental results: SRT and localization

– Objective measures

– Perceptual measures (N=5)

(3)

Problem statement

• Hearing impairment → reduction of speech intelligibility in background noise (even with amplification)

– Signal processing to selectively enhance the useful speech signal

– Multiple microphones available: spectral + spatial processing

– Most hearing impaired are fitted with hearing aids at both ears

• Binaural hearing: everything relating to hearing simultaneously (vs bilateral) with two ears

– Binaural cues, in addition to spectral and temporal cues, play an important role in binaural noise reduction and sound localization.

(important to preserve these cues)

– It has been reported that current bilateral noise reduction systems have a negative impact on binaural hearing [Van den Bogaert et al. 2005, Van den Bogaert et al. 2006]

(4)

Problem statement

Bilateral system:

- If adaptive, typically no control on left/right processing

- Hard to preserve interaural cues (mic mismatch, imperfections, ...)

Binaural system:

+ More microphones = more performance?

+ More control on left/right processing?

- Need of binaural link

(5)

Problem statement

• Main binaural cues

– Interaural phase or interaural time differences

• ITD range: from 10µs to 700µs

• Present in the signal itself for f<1300Hz and/or in the low-frequency envelope for complex sounds

– Interaural level differences

• ILD range: from 1dB up to more than 30dB

• Physically significant for f>2000Hz

[Figure: a signal source and the IPD/ITD and ILD it produces between the two ears]

(6)

Problem statement

Design criteria for binaural noise reduction in hearing aids (HAs):

- Maximize noise reduction by using all available microphone signals (binaural link assumed); 2 outputs needed

- Preserve the binaural cues

- Limit the amount of speech distortion

(HA constraints: robustness of the system, low complexity, …)

(7)

Overview of binaural noise reduction techniques

• Different approaches

– BSS methods (Robert Aichner, ICASSP, 3 days ago)

– Fixed beamforming [e.g. Desloge 1997]

• Low complexity

• Limited performance, only speech cues may be preserved (in ideal situations)

– CASA based techniques [e.g. Wittkop 2003]

• Perfect preservation of speech/noise cues

• Mostly for 2 microphones, “spectral subtraction”-like problems

– Adaptive beamforming, based on a GSC structure, passing the low-frequency part of the signal unprocessed [e.g. Welker 1997]

• Preserves parts of the binaural cues

• Substantial drop in noise reduction

– Binaural multichannel Wiener filter [e.g. Doclo 2002, Spriet 2004]

• Speech cues are preserved

• No assumptions about positions of sources and microphones

• Noise cues may be distorted

Extension of MWF: preservation of binaural speech and noise cues without substantially compromising noise reduction performance

(8)

Multichannel Wiener Filter

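A hearing aid listening scenario

[Figure: a speaker S(ω) and a noise source N(ω) around the head; the left and right microphone signals Y0(ω) and Y1(ω) pass through the filters W0(ω) and W1(ω) to give the filtered outputs Z0(ω) and Z1(ω)]

2M microphones. Each microphone signal is the sum of a speech component and a noise component:

$$Y_{0,m}(\omega) = X_{0,m}(\omega) + V_{0,m}(\omega), \qquad Y_{1,m}(\omega) = X_{1,m}(\omega) + V_{1,m}(\omega), \qquad m = 0 \ldots (M-1)$$

Goal: to estimate the speech component at the reference microphone of each hearing aid (r0, r1), typically the front omnidirectional one.

Standard multichannel Wiener filter:

$$\frac{\partial J_{MSE}}{\partial \mathbf{W}} = 0$$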

(9)

Multichannel Wiener Filter

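To control or reduce speech distortion, rewrite the cost function and introduce a trade-off parameter μ between noise reduction and speech distortion:

$$J_{MSE}(\mathbf{W}) = E\left\{ \left\| \begin{bmatrix} X_{0,r_0} - \mathbf{W}_0^H \mathbf{X} \\ X_{1,r_1} - \mathbf{W}_1^H \mathbf{X} \end{bmatrix} \right\|^2 + \mu \left\| \begin{bmatrix} \mathbf{W}_0^H \mathbf{V} \\ \mathbf{W}_1^H \mathbf{V} \end{bmatrix} \right\|^2 \right\}$$

First term: speech distortion. Second term: noise reduction. μ: trade-off parameter.

Speech distortion weighted multichannel Wiener filter:

$$\frac{\partial J_{MSE}}{\partial \mathbf{W}} = 0 \;\Rightarrow\; \mathbf{W}_{SDW} = \mathbf{R}^{-1}\mathbf{r}$$

In standard hearing aid beamforming, avoiding speech distortion is typically done by calibrating the speech reference path and removing the speech component in the noise reference path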

(10)

Multichannel Wiener Filter

$$\mathbf{W}_{SDW} = \mathbf{R}^{-1}\mathbf{r}, \qquad \mathbf{R} = \begin{bmatrix} \mathbf{R}_x + \mu\mathbf{R}_v & \mathbf{0}_{2M} \\ \mathbf{0}_{2M} & \mathbf{R}_x + \mu\mathbf{R}_v \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} \mathbf{r}_{x0} \\ \mathbf{r}_{x1} \end{bmatrix}, \qquad \mathbf{R}_x = \mathbf{R}_y - \mathbf{R}_v$$

The correlation matrices are estimated, f(VAD).

• Depends on second-order statistics of speech and noise, no assumptions about the speech and noise sources (can be integrated in VAD)

• Perfectly preserves the interaural cues of the speech component, since the left and the right hearing aid each estimate the speech component in their own front microphone.

• Shifts the interaural cues of the noise component to the cues of the speech component!

Add a term related to the binaural cues of the noise component to the MWF cost function

Possible cues: ITD, ILD, Interaural Transfer Function (ITF) → MWF-ITF
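The two cue-related bullets above can be checked numerically. The following is a minimal NumPy sketch for a single frequency bin, not the authors' implementation: it assumes one speech and one noise source with randomly drawn (hypothetical) steering vectors, 4 microphones, reference microphones at indices 0 and 2, and uses the fact that r_x0 = E{X X*_{0,r0}} is the r0-th column of R_x. The helper names `sdw_mwf`, `itf_in` and `itf_out` are illustrative.

```python
import numpy as np

def sdw_mwf(R_y, R_v, ref, mu=1.0):
    """SDW-MWF for one frequency bin: W = (R_x + mu R_v)^-1 R_x e_ref."""
    R_x = R_y - R_v                       # speech statistics from R_x = R_y - R_v
    return np.linalg.solve(R_x + mu * R_v, R_x[:, ref])

def itf_in(R, r0, r1):
    """Input ITF between the two reference microphones, from R = E{S S^H}."""
    return R[r0, r1] / R[r1, r1]

def itf_out(R, W0, W1):
    """Output ITF E{Z0 Z1*} / E{Z1 Z1*} with Z0 = W0^H S and Z1 = W1^H S."""
    return (W0.conj() @ R @ W1) / (W1.conj() @ R @ W1)

# Hypothetical scenario: 4 microphones, references r0 (left) and r1 (right).
rng = np.random.default_rng(0)
n_mics, r0, r1 = 4, 0, 2
a_x = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)  # speech
a_v = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)  # noise
R_x_true = np.outer(a_x, a_x.conj())
R_v = np.outer(a_v, a_v.conj()) + 0.1 * np.eye(n_mics)  # plus sensor noise
R_y = R_x_true + R_v

W0 = sdw_mwf(R_y, R_v, r0)   # left hearing aid
W1 = sdw_mwf(R_y, R_v, r1)   # right hearing aid

print("speech ITF in :", itf_in(R_x_true, r0, r1))
print("speech ITF out:", itf_out(R_x_true, W0, W1))
print("noise  ITF in :", itf_in(R_v, r0, r1))
print("noise  ITF out:", itf_out(R_v, W0, W1))
```

In this single-source setting the speech ITF at the output equals the speech ITF at the input, while the noise ITF at the output also equals the speech ITF, which is the cue shift the slide warns about.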

(11)

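Multichannel Wiener Filter – ITF extension

$$J_{tot}(\mathbf{W}) = E\left\{ \left\| \begin{bmatrix} X_{0,r_0} - \mathbf{W}_0^H \mathbf{X} \\ X_{1,r_1} - \mathbf{W}_1^H \mathbf{X} \end{bmatrix} \right\|^2 + \mu \left\| \begin{bmatrix} \mathbf{W}_0^H \mathbf{V} \\ \mathbf{W}_1^H \mathbf{V} \end{bmatrix} \right\|^2 \right\} + \alpha\, E\left\{ \left| \mathbf{W}_0^H \mathbf{X} - ITF^x_{in}\, \mathbf{W}_1^H \mathbf{X} \right|^2 \right\} + \beta\, E\left\{ \left| \mathbf{W}_0^H \mathbf{V} - ITF^v_{in}\, \mathbf{W}_1^H \mathbf{V} \right|^2 \right\}$$

Goal: the ITF of the noise component at the output = ITF at the input

Under the assumption of a single noise source:

$$ITF^v_{in} = \frac{V_{0,r_0}}{V_{1,r_1}} = \frac{E\{V_{0,r_0} V^*_{1,r_1}\}}{E\{V_{1,r_1} V^*_{1,r_1}\}} = \frac{\mathbf{R}_v(r_0, r_1)}{\mathbf{R}_v(r_1, r_1)}$$

You can do this for the speech and noise component

Performance and influence of beta and alpha on localization and SNR improvement performance?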
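Because the MWF cost with the two ITF terms stays quadratic in the stacked filter W = [W0; W1], setting its gradient to zero gives a linear system. The sketch below is one way to solve it for a single frequency bin, under assumptions that are not on the slide: α weights the speech ITF term and β the noise ITF term, r_x0 and r_x1 are taken as columns r0 and r1 of R_x, and the scenario (random steering vectors, 4 microphones, references 0 and 2) is hypothetical. It uses the identity E{|W0^H S − c W1^H S|²} = W^H Q W with Q = [[R, −c* R], [−c R, |c|² R]].

```python
import numpy as np

def itf_in(R, r0, r1):
    """Input ITF between the reference microphones, from R = E{S S^H}."""
    return R[r0, r1] / R[r1, r1]

def itf_out(R, W0, W1):
    """Output ITF E{Z0 Z1*} / E{Z1 Z1*} with Z0 = W0^H S and Z1 = W1^H S."""
    return (W0.conj() @ R @ W1) / (W1.conj() @ R @ W1)

def itf_block(R, c):
    """Q such that W^H Q W = E{|W0^H S - c W1^H S|^2} for W = [W0; W1]."""
    return np.block([[R, -np.conj(c) * R], [-c * R, abs(c) ** 2 * R]])

def mwf_itf(R_x, R_v, r0, r1, mu=1.0, alpha=0.0, beta=0.0):
    """Minimizer of the quadratic MWF-ITF cost for one frequency bin."""
    n = R_x.shape[0]
    zeros = np.zeros((n, n))
    R = np.block([[R_x + mu * R_v, zeros], [zeros, R_x + mu * R_v]])
    r = np.concatenate([R_x[:, r0], R_x[:, r1]])
    A = (R
         + alpha * itf_block(R_x, itf_in(R_x, r0, r1))
         + beta * itf_block(R_v, itf_in(R_v, r0, r1)))
    W = np.linalg.solve(A, r)             # gradient of J_tot set to zero
    return W[:n], W[n:]

# Hypothetical scenario: one speech source, one noise source plus sensor noise.
rng = np.random.default_rng(0)
n_mics, r0, r1 = 4, 0, 2
a_x = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)
a_v = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)
R_x = np.outer(a_x, a_x.conj())
R_v = np.outer(a_v, a_v.conj()) + 0.1 * np.eye(n_mics)

betas = [0.0, 1.0, 100.0, 1e6]
errs = []
for beta in betas:
    W0, W1 = mwf_itf(R_x, R_v, r0, r1, beta=beta)
    errs.append(abs(itf_out(R_v, W0, W1) - itf_in(R_v, r0, r1)))
    print(f"beta={beta:g}: noise ITF error = {errs[-1]:.3g}")
```

With α = β = 0 the solution reduces to the SDW-MWF; a very large β forces W0 towards ITF_in^v* · W1, so the noise ITF at the output approaches the input noise ITF, at the price of pulling the speech cues in the same direction.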

(12)

Experimental results

• Identification of HRTFs:

– Binaural recordings on CORTEX MK2 artificial head

– 2 omni-directional microphones on each hearing aid (d=1cm)

– HRTFs measured at angles -90:15:90 and 90:30:270 (degrees), 1m from head

– Conditions: T60=140 ms (T60=590 ms added), fs=16 kHz, µ=1

• Objective evaluation:

– AI weighted SNR improvement

– ITD and ILD error

• Perceptual evaluation:

– Headphone experiments with recorded HRTFs

– SRT measurements (50% speech intelligibility)

– Localization using prerecorded HRTFs: S and N components are sent separately through the fixed filter; subjects localize S and N in the room where the HRTFs were recorded

(13)

Experimental results

[Block diagram: the left and right input signals Y(ω) = X(ω) + V(ω) go through an FFT, frequency-domain filtering and an IFFT, giving the left output Z0 = Zx0 + Zv0 and the right output Z1 = Zx1 + Zv1. Off-line computation of statistics: a VAD is used to estimate Rv(ω) and Rx(ω), from which the filters W0(ω) and W1(ω) are calculated for this specific scenario.]

Stored filters are converged for a condition SxNy, with Sx = speech-weighted noise from angle x and Ny = babble noise from angle y
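The off-line computation of statistics can be sketched as follows. This is a minimal NumPy illustration for one frequency bin, assuming an ideal frame-wise VAD flag, R_y averaged over speech-plus-noise frames, R_v averaged over noise-only frames, and R_x = R_y − R_v; the function name `estimate_statistics` and the synthetic steering vectors are hypothetical.

```python
import numpy as np

def estimate_statistics(Y, vad):
    """Estimate R_x and R_v for one frequency bin.

    Y:   (n_frames, n_mics) complex STFT coefficients of the microphones.
    vad: (n_frames,) boolean, True where speech is active.
    """
    Y_speech = Y[vad]                     # speech + noise frames
    Y_noise = Y[~vad]                     # noise-only frames
    R_y = Y_speech.T @ Y_speech.conj() / len(Y_speech)   # E{Y Y^H}
    R_v = Y_noise.T @ Y_noise.conj() / len(Y_noise)      # E{V V^H}
    return R_y - R_v, R_v

# Synthetic check with known statistics (all values are illustrative).
rng = np.random.default_rng(1)
n_frames = 40000

def cnormal(*shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

a_x = np.array([1.0, 0.8, 0.5j, 0.4j])    # speech steering vector
a_v = np.array([0.6, -0.5, 0.7j, 0.3])    # noise steering vector
vad = np.arange(n_frames) < n_frames // 2  # speech active in the first half
speech = np.outer(cnormal(n_frames) * vad, a_x)
noise = np.outer(cnormal(n_frames), a_v) + 0.3 * cnormal(n_frames, 4)
Y = speech + noise

R_x, R_v = estimate_statistics(Y, vad)
print(np.round(np.abs(R_x), 2))
```

A real VAD makes errors, and R_y − R_v is not guaranteed to be positive semidefinite with finite data, so a practical system would need additional safeguards that are outside this sketch.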

(14)

Experimental results: objective evaluation

• Error measures correlated with design criteria:

– Maximize speech intelligibility: Intelligibility weighted SNR improvement (left/right)

– Minimize interaural cue distortion

• ILD of speech and noise component

• ITD of speech and noise component

$$\Delta SNR_L = \sum_i I(\omega_i)\, \Delta SNR_L(\omega_i)$$

I(ω_i): importance of the i-th frequency bin for speech intelligibility

$$\Delta ITD_x = \sum_i I(\omega_i)\, \Delta ITD_x(\omega_i), \qquad \Delta ITD_x(\omega_i) = \left| \angle E\{X_{0,r_0} X^*_{1,r_1}\} - \angle E\{Z_{x0} Z^*_{x1}\} \right|$$

(low-pass filter 1500 Hz)

$$\Delta ILD_x = \sum_i \left| ILD^x_{out}(\omega_i) - ILD^x_{in}(\omega_i) \right|$$
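The three error measures can be sketched in a few lines of NumPy. This is an illustration only: the importance weights I(ω_i) below are made-up placeholders rather than standardized band-importance values, the phase difference is wrapped to (−π, π] (an assumption), and the ILD error is a plain sum over bins since any normalization used in the talk is not recoverable here.

```python
import numpy as np

def weighted_snr_improvement(importance, snr_in_db, snr_out_db):
    """Sum over bins of I(w_i) * (SNR_out(w_i) - SNR_in(w_i)), in dB."""
    return float(np.sum(importance * (snr_out_db - snr_in_db)))

def itd_error(importance, cross_in, cross_out):
    """Importance-weighted phase error between input and output cross-spectra.

    cross_in[i]  = E{X_0 X_1*} at bin i (reference microphones),
    cross_out[i] = E{Z_x0 Z_x1*} at bin i (filter outputs).
    """
    phase_diff = np.angle(cross_out * np.conj(cross_in))  # wrapped difference
    return float(np.sum(importance * np.abs(phase_diff)))

def ild_error(ild_in_db, ild_out_db):
    """Sum over bins of |ILD_out(w_i) - ILD_in(w_i)|, in dB."""
    return float(np.sum(np.abs(ild_out_db - ild_in_db)))

# Toy example with three frequency bins (placeholder weights).
importance = np.array([0.25, 0.25, 0.5])
snr_gain = weighted_snr_improvement(
    importance, np.zeros(3), np.array([10.0, 20.0, 10.0]))
itd_err = itd_error(
    importance, np.array([1, 1j, -1]), np.array([1j, 1j, -1]))
ild_err = ild_error(np.array([0.0, 5.0]), np.array([2.0, 2.0]))
print(snr_gain, itd_err, ild_err)
```

For the ITD measure, the 1500 Hz low-pass mentioned on the slide could be realized by setting the weights to zero above that frequency; that choice is left to the caller here.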

 





(15)

Objective evaluation: localization

[Figure, S0N60: four surface plots as a function of α and β. ILD error of the speech component (ΔILD [dB]), ILD error of the noise component (ΔILD [dB]), ITD error of the speech component (ΔITD [rad]), ITD error of the noise component (ΔITD [rad])]

(16)

Objective evaluation: SRT

[Figure, S0N60, T60=0.14s: SNR improvement at the left ear and at the right ear (ΔSNRw [dB]) as a function of α and β]

$$\Delta SNR_L = \sum_i I(\omega_i)\, \Delta SNR_L(\omega_i)$$

I(ω_i): importance of the i-th frequency for speech intelligibility

Additions:

For T60=0.59s, the S0N60 noise reduction performance drops to 5 dB SNR AI at the left ear and to 7 dB SNR AI at the right ear

Going from 2 to 4 microphones gives a gain of about 3 dB SNR AI to 9 dB SNR AI compared to the 2-microphone performance

(17)

SRT: perceptual evaluation

[Figure: SRT improvement (axis from 9 to 15 dB) versus beta (0, 0.1, 0.3, 1, 10); VU noise at 60°, alpha=0]

– Adaptive SRT procedure to find the 50% Speech Reception Threshold

– S0N60, Dutch VU sentences, T60=140ms

– Average SRT without processing = -9.2 dB

– SRT improvements in the range 11-13 dB

The binaural speech intelligibility advantage due to the spatial separation of the speech and noise components does not seem to compensate for the loss in SNR improvement

Addition: performance drops to around 6dB SRT gain if T60=590ms (S0N90 tested for N=2)

(18)

Localization: perceptual evaluation

• Condition SxN0: speech arrives from angle x, with x from -90° to +90° in steps of 30°; noise arrives from 0°.

• Perceptual procedure: calculate MWF filters offline, trained on the spatial condition SxNy. Then run a telephone ring arriving from angle x and from angle y separately through the filters and store the results. Play these wav files over headphones to the subject and ask them to localize the telephone signal.
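Running a signal through stored, fixed filters can be sketched as below. This is a simplified illustration, not the processing chain used in the experiments: it assumes per-bin filters W0(ω) and W1(ω) stored as arrays, applies them with a single FFT over the whole signal (so the filtering is circular, without the block processing and overlap handling a real system would use), and uses white noise as a stand-in for the telephone ring. The function name `apply_fixed_filters` is hypothetical.

```python
import numpy as np

def apply_fixed_filters(y, W0, W1):
    """Filter a multichannel signal with fixed frequency-domain filters.

    y:      (n_mics, n_samples) real microphone signals.
    W0, W1: (n_bins, n_mics) complex filters, n_bins = n_samples // 2 + 1.
    Returns the left and right time-domain outputs z0, z1.
    """
    n_samples = y.shape[1]
    Y = np.fft.rfft(y, axis=1)                    # (n_mics, n_bins)
    Z0 = np.sum(W0.conj().T * Y, axis=0)          # Z0(w) = W0(w)^H Y(w)
    Z1 = np.sum(W1.conj().T * Y, axis=0)          # Z1(w) = W1(w)^H Y(w)
    return np.fft.irfft(Z0, n=n_samples), np.fft.irfft(Z1, n=n_samples)

# Sanity check with trivial filters that simply select the reference mics.
rng = np.random.default_rng(2)
n_mics, n_samples = 4, 1024
n_bins = n_samples // 2 + 1
y = rng.standard_normal((n_mics, n_samples))      # stand-in for the test signal
W0 = np.zeros((n_bins, n_mics), dtype=complex)
W1 = np.zeros((n_bins, n_mics), dtype=complex)
W0[:, 0] = 1.0                                    # left reference microphone
W1[:, 2] = 1.0                                    # right reference microphone
z0, z1 = apply_fixed_filters(y, W0, W1)
print(np.max(np.abs(z0 - y[0])), np.max(np.abs(z1 - y[2])))
```

For the localization test, the same call would be made twice with filters trained on SxNy: once with the microphone signals of the test sound arriving from angle x and once with those arriving from angle y, storing each output pair separately.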

(19)

Localization: perceptual evaluation

[Figure, SxN0: localization plots with both axes from -90° to 90° in steps of 15°, for the localization of Sx and the localization of N0, with alpha=0, beta=0 and with alpha=0, beta=10]

(20)

Localization: perceptual evaluation

[Figure, SxN0, 5 subjects: localization error (°) of Sx and of N0. Per-angle panels: error versus x (°) from -90 to 90 for beta = 0, 0.1, 0.3, 1, 10, 100. Summary panels: error versus beta (0, 0.1, 0.3, 1, 10, 100) for alpha=0 and alpha=0.5]

(21)

Localization: perceptual evaluation

[Figure, SxN0, 5 subjects: loc error Sx + loc error N0 (°) versus beta (0, 0.1, 0.3, 1, 10, 100) for alpha=0 and alpha=0.5]

• Sum of localization errors of Sx and N0

• Parameters can be tuned to achieve better overall localization performance -> at the cost of some noise reduction

• There is a correlation between physical and perceptual evaluation, even for localization. However, the error measures are far from perfect (they do not include diffuseness, …)

(22)

Conclusions

• (Speech distortion weighted) MWF preserves the speech cues, not the noise cues

• MWF-ITF enables, by constraining the filters W to an area where the noise ITF is preserved, a trade-off between preservation of speech and noise cues and noise reduction performance (a solution, but not the perfect solution: multiple spectrally overlapping noise sources, ...)

• Preserving localization cues showed neither a large benefit in SRT score (due to the spatial separation of speech and noise) nor a large reduction (due to the extra constraints set on W).

(23)

Acknowledgements

Download at https://gilbert.med.kuleuven.be/~u0041407/2007ICASSP.pdf

Contact: Tim.vandenbogaert@med.kuleuven.ac.be

(24)

objective evaluation: SRT (additions)

• The gain of going from 2 mics on one HA to 3 or 4 mics (low reverb):

– Single Noise Scenario:

• 2 mic performance: 5 to 19 dB SNR AI improvement

• 2 to 3 mics: +2/+5 dB SNR AI improvement

• 3 to 4 mics: +1/+4 dB SNR AI improvement

• (max if the noise source is at the position of the 2 mics -> a good-SNR signal is added as 3rd or 4th microphone)

– Multiple noise sources (3 noise sources)

• 2 mic performance: around 7 dB SNR AI improvement

• 2 to 3 mics: +2dB SNR AI improvement

• 3 to 4 mics: +2dB SNR AI improvement

(25)

Localization: objective evaluation

[Figure, SxN0: ITD error [%] of the speech component and of the noise component versus angle of the speech source (-80° to 80°), for beta = 0, 0.1, 0.3, 1, 10, 100 (VU_man + auditec 0deg, α = 0, SNR=0dB)]

Large β shifts the direction of the speech component towards that of the noise component → increase the weight α (cf. physical and perceptual evaluation)