Physical and perceptual evaluation of the Interaural Wiener Filter algorithm

Simon Doclo¹, Thomas J. Klasen¹, Tim van den Bogaert², Marc Moonen¹, Jan Wouters²

¹Dept. of Electrical Engineering (ESAT-SCD), KU Leuven, Belgium

²Laboratory for Exp. ORL, KU Leuven, Belgium

IHCON, Aug 19 2006

Slides available at http://homes.esat.kuleuven.be/~doclo/presentations.html

Overview

• Binaural hearing aids: noise reduction and preservation of binaural cues

• Overview of binaural noise reduction algorithms

• Binaural multi-channel Wiener filter:

o Estimate of speech component at both hearing aids

o Speech cues are preserved – noise cues may be distorted

• Preservation of binaural cues:

o Extension of cost function with ITD-ILD-ITF expressions

• Experimental results:

o Physical evaluation (SNR, ITD, ILD)

o Perceptual evaluation (SRT, localisation)

• Audio demonstration

Problem statement

• Hearing impairment → reduction of speech intelligibility in background noise

o Signal processing to selectively enhance useful speech signal

o Many hearing-impaired persons are fitted with a hearing aid at both ears

o Multiple microphones available: spectral + spatial processing

• Binaural auditory cues:

o Interaural Time Difference (ITD) and Interaural Level Difference (ILD)

o Binaural cues, in addition to spectral and temporal cues, play an important role in binaural noise reduction and sound localization

[Figure: ITD and ILD between the left and right ear signals]

Problem statement

• Bilateral system:

o Independent processing of left and right hearing aid

o Localisation cues are distorted

• Binaural system:

o Cooperation between left and right hearing aid (e.g. wireless link)

o Assumption: all microphone signals are available at the same time

Objectives/requirements for binaural algorithm:

1. SNR improvement: noise reduction, limit speech distortion

2. Preservation of binaural cues (speech/noise) to exploit binaural hearing advantage

3. No assumption about position of speech source and microphones


[Van den Bogaert, 2006]

Binaural noise reduction techniques

• Fixed beamforming: spatial selectivity + binaural speech cues

o Maximize directivity index while restricting speech ITD error [Desloge, 1997]

o Superdirective beamformer using HRTFs [Lotter, 2004]

Pro: low computational complexity

Con: limited performance, known geometry, broadside array, only speech cues

Binaural noise reduction techniques

• CASA-based techniques

o Computation and application of (real-valued) binaural mask based on binaural and temporal/spectral cues

[Kollmeier, Peissig, Wittkop, Dong, Haykin]

Pro: perfect preservation of binaural cues of speech/noise component

Con: mostly for 2 microphones, "spectral-subtraction"-like problems

[Wittkop, 2003]

Binaural noise reduction techniques

• Adaptive beamforming: based on GSC-structure

o Divide frequency spectrum: low-pass portion unaltered to preserve ITD cues, high-pass portion processed using GSC [Welker, 1997]

Pro: preserves binaural cues to some extent

Con: substantially reduced noise reduction performance, known geometry

Binaural noise reduction techniques

• Binaural multi-channel Wiener filter

o MMSE estimate of speech component in microphone signal at both ears [Doclo, Klasen, Wouters, Moonen]

Pro: speech cues are preserved, no assumptions about position of speech source and microphones

Con: noise cues may be distorted

• Extension of MWF: preservation of binaural speech and noise cues without substantially compromising noise reduction performance

Designing a hearing aid signal processing (SP) algorithm requires some mathematics… but the perceptual evaluation follows in a couple of minutes…

Configuration and signals

• Configuration: microphone array with M microphones at left and right hearing aid, communication between hearing aids

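• Vector notation: $\mathbf{Y}(\omega) = \mathbf{X}(\omega) + \mathbf{V}(\omega)$ stacks all microphone signals of both hearing aids, with speech component $\mathbf{X}(\omega)$ and noise component $\mathbf{V}(\omega)$

o Left hearing aid ($M_0$ microphones): $Y_{0,m}(\omega) = X_{0,m}(\omega) + V_{0,m}(\omega)$, $m = 0 \ldots M_0 - 1$; analogously $Y_{1,m}(\omega)$ for the right hearing aid

• Use all microphone signals to compute output signal at both ears:

o $Z_0(\omega) = \mathbf{W}_0^H(\omega)\,\mathbf{Y}(\omega), \qquad Z_1(\omega) = \mathbf{W}_1^H(\omega)\,\mathbf{Y}(\omega)$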
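A minimal NumPy sketch of this filter-and-sum step, assuming an STFT representation with K frequency bins, T frames and the array layout (K, T, M); the function name `binaural_outputs` and the layout are hypothetical, not taken from the slides:

```python
import numpy as np


def binaural_outputs(W0, W1, Y):
    """Compute Z_0 = W_0^H Y and Z_1 = W_1^H Y in every frequency bin.

    Sketch only: name and array layout are illustrative assumptions.
    W0, W1: (K, M) complex filters, one M-vector per frequency bin, where M is
            the total number of microphones of both hearing aids.
    Y:      (K, T, M) complex STFT of the stacked microphone signals.
    Returns Z0, Z1 of shape (K, T).
    """
    # conj(W) summed against Y over the microphone axis = Hermitian product W^H Y
    Z0 = np.einsum("km,ktm->kt", W0.conj(), Y)
    Z1 = np.einsum("km,ktm->kt", W1.conj(), Y)
    return Z0, Z1
```

Because the filtering is linear, passing X and V separately yields the output speech and noise components, so that Z_0 = Z_x0 + Z_v0 and Z_1 = Z_x1 + Z_v1.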

Overview of cost functions

• Multi-channel Wiener filter (MWF): MMSE estimate of speech component in microphone signal at both ears

• Trade-off noise reduction and speech distortion:

o Speech-distortion weighted multi-channel Wiener filter (SDW-MWF) [Doclo 2002, Spriet 2004]

• Binaural cue preservation of speech + noise:

o Partial estimation of noise component [Klasen 2005]

o Extension with ITD-ILD or Interaural Transfer Function (ITF) [Doclo 2005, Klasen 2006]

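Binaural multi-channel Wiener filter

• Binaural SDW-MWF: estimate of speech component in microphone signal at both ears (reference microphones $r_0$, $r_1$, usually the front microphone) + trade-off between noise reduction and speech distortion

$J(\mathbf{W}) = E\left\{ \left| X_{0,r_0} - \mathbf{W}_0^H \mathbf{X} \right|^2 + \mu \left| \mathbf{W}_0^H \mathbf{V} \right|^2 + \left| X_{1,r_1} - \mathbf{W}_1^H \mathbf{X} \right|^2 + \mu \left| \mathbf{W}_1^H \mathbf{V} \right|^2 \right\}, \qquad \mathbf{W} = \begin{bmatrix} \mathbf{W}_0 \\ \mathbf{W}_1 \end{bmatrix}$

with $\mu$ the trade-off parameter.

$\mathbf{W}_{SDW} = \mathbf{R}^{-1}\mathbf{r}, \qquad \mathbf{R} = \begin{bmatrix} \mathbf{R}_x + \mu\mathbf{R}_v & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_x + \mu\mathbf{R}_v \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} \mathbf{r}_{x,0} \\ \mathbf{r}_{x,1} \end{bmatrix}, \quad \mathbf{r}_{x,0} = E\{\mathbf{X} X_{0,r_0}^*\}, \; \mathbf{r}_{x,1} = E\{\mathbf{X} X_{1,r_1}^*\}$

o Depends on second-order statistics of speech and noise: $\mathbf{R}_x = \mathbf{R}_y - \mathbf{R}_v$

o Estimate $\mathbf{R}_y$ during speech-dominated time-frequency segments and $\mathbf{R}_v$ during noise-dominated segments, which requires a robust voice activity detection (VAD) mechanism

o No assumptions about positions of microphones and sources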
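For one frequency bin, the closed-form solution can be sketched in NumPy as follows. This is a minimal illustration, assuming R_y and R_v are already estimated and that `r0`, `r1` index the left and right reference microphones in the stacked vector; the function name `sdw_mwf` is hypothetical. Because R is block-diagonal, the 2M×2M system splits into two M×M solves that share the matrix R_x + μR_v, and R_x e_r0 (column r0 of R_x) equals r_x,0.

```python
import numpy as np


def sdw_mwf(Ry, Rv, r0, r1, mu=1.0):
    """Binaural SDW-MWF for one frequency bin (sketch; hypothetical helper).

    Ry, Rv: (M, M) Hermitian correlation matrices of the noisy signal and of
            the noise (M = total number of microphones of both hearing aids).
    r0, r1: indices of the left/right reference microphones in the stacked vector.
    Returns W0, W1 such that Z_0 = W0^H Y and Z_1 = W1^H Y.
    """
    Rx = Ry - Rv                  # speech correlation (speech, noise uncorrelated)
    A = Rx + mu * Rv              # diagonal block of R in W = R^{-1} r
    # R is block-diagonal, so solve each ear separately;
    # r_{x,0} = E{X X_{0,r0}^*} = R_x e_{r0}, i.e. column r0 of R_x
    W0 = np.linalg.solve(A, Rx[:, r0])
    W1 = np.linalg.solve(A, Rx[:, r1])
    return W0, W1
```

A larger μ puts more weight on the noise term, so the output noise power W_0^H R_v W_0 decreases as μ grows, at the cost of more speech distortion.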

Binaural multi-channel Wiener filter

• Binaural cues (ITD-ILD):

o Perfectly preserves binaural cues of speech component

o Binaural cues of noise component become equal to those of the speech component! (cf. physical and perceptual evaluation)

• Extension of SDW-MWF with binaural cues

o Add term related to binaural cues of noise (and speech) component to SDW cost function

o Possible cues: ITD, ILD, Interaural Transfer Function (ITF)

o Weight factors $\alpha$ and $\beta$ can be frequency-dependent

$J_{tot}(\mathbf{W}) = J_{SDW}(\mathbf{W}) + \alpha\, J_{cue}^{x}(\mathbf{W}) + \beta\, J_{cue}^{v}(\mathbf{W})$
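The claim that the noise component takes on the speech cues can be checked numerically. The sketch below assumes a single speech source with a synthetic, hypothetical acoustic transfer vector `a` (so R_x = a a^H is rank one) and a random full-rank R_v; all values are made up for illustration. For rank-one R_x both SDW-MWF filters are scaled copies of the same vector, W_0 = (a_r0^* / a_r1^*) W_1, so every output ratio Z_0/Z_1, speech or noise, equals a_r0 / a_r1, the input ITF of the speech.

```python
import numpy as np

# Synthetic example (assumed values): 2 mics per hearing aid, r0/r1 = reference mics
rng = np.random.default_rng(0)
M, r0, r1 = 4, 0, 2

# Single speech source: rank-one speech correlation matrix R_x = a a^H
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
Rx = np.outer(a, a.conj())
# Random full-rank noise correlation matrix
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Rv = B @ B.conj().T + 0.1 * np.eye(M)

# Binaural SDW-MWF (mu = 1): W_i = (R_x + mu R_v)^{-1} R_x e_{r_i}
mu = 1.0
A = Rx + mu * Rv
W0 = np.linalg.solve(A, Rx[:, r0])
W1 = np.linalg.solve(A, Rx[:, r1])

itf_speech_in = a[r0] / a[r1]                        # input ITF of the speech
itf_speech_out = (W0.conj() @ a) / (W1.conj() @ a)   # output ITF of the speech

# Output ITF of the noise, Z_v0 / Z_v1, for a few arbitrary noise snapshots V
V = rng.standard_normal((M, 5)) + 1j * rng.standard_normal((M, 5))
itf_noise_out = (W0.conj() @ V) / (W1.conj() @ V)

print(np.allclose(itf_speech_out, itf_speech_in))  # True: speech cues preserved
print(np.allclose(itf_noise_out, itf_speech_in))   # True: noise now has the speech ITF
```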

Interaural Wiener Filter

• Preserve binaural cues between input and output

o ITD: phase of cross-correlation

o ILD: power ratio

o ITF: Interaural transfer function (incorporates ITD and ILD)

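$\mathrm{ITF}_v^{out}(\omega) = \frac{Z_{v0}(\omega)}{Z_{v1}(\omega)} = \frac{\mathbf{W}_0^H(\omega)\,\mathbf{V}(\omega)}{\mathbf{W}_1^H(\omega)\,\mathbf{V}(\omega)}, \qquad \mathrm{ITF}_v^{in}(\omega) = \frac{V_{0,r_0}(\omega)}{V_{1,r_1}(\omega)}$

e.g. estimated as $\mathrm{ITF}_v^{in} = \frac{E\{V_{0,r_0} V_{1,r_1}^*\}}{E\{V_{1,r_1} V_{1,r_1}^*\}} = \frac{\mathbf{R}_v(r_0, r_1)}{\mathbf{R}_v(r_1, r_1)}$, and analogously $\mathrm{ITF}_x^{in}$ from $\mathbf{R}_x$

$J_{tot}(\mathbf{W}) = E\left\{ \left| X_{0,r_0} - \mathbf{W}_0^H \mathbf{X} \right|^2 + \mu \left| \mathbf{W}_0^H \mathbf{V} \right|^2 + \left| X_{1,r_1} - \mathbf{W}_1^H \mathbf{X} \right|^2 + \mu \left| \mathbf{W}_1^H \mathbf{V} \right|^2 \right\} + \alpha\, E\left\{ \left| \mathbf{W}_0^H \mathbf{X} - \mathrm{ITF}_x^{in}\, \mathbf{W}_1^H \mathbf{X} \right|^2 \right\} + \beta\, E\left\{ \left| \mathbf{W}_0^H \mathbf{V} - \mathrm{ITF}_v^{in}\, \mathbf{W}_1^H \mathbf{V} \right|^2 \right\}$

($\alpha$ term: ITF preservation speech; $\beta$ term: ITF preservation noise)

o Closed form expression!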

o Large $\beta$ changes direction of speech component to noise component

→ increase weight $\alpha$ (cf. physical and perceptual evaluation)
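Setting the derivatives of J_tot with respect to W_0^* and W_1^* to zero gives one 2M×2M linear system R_tot W = r with the same right-hand side r = [r_x,0; r_x,1] as the SDW-MWF, which is what makes the closed form possible. The block matrix below is our own derivation from the cost function above, not copied from the slides, written as a NumPy sketch for one frequency bin. The function names `interaural_wf` and `j_tot`, and estimating both input ITFs as R(r0, r1)/R(r1, r1), are assumptions; `j_tot` evaluates the cost from second-order statistics so the solution can be checked.

```python
import numpy as np


def interaural_wf(Rx, Rv, r0, r1, mu=1.0, alpha=0.0, beta=0.0):
    """Interaural Wiener filter for one frequency bin (sketch; hypothetical helper).

    Minimises J_SDW + alpha * J_ITF^x + beta * J_ITF^v, where
    J_ITF^c = E{|W0^H C - itf_c * W1^H C|^2} for C = X (speech) or V (noise).
    Returns W0, W1 and the input ITFs used.
    """
    M = Rx.shape[0]
    # Assumed input-ITF estimates from the correlation matrices: R(r0, r1) / R(r1, r1)
    itf_x = Rx[r0, r1] / Rx[r1, r1]
    itf_v = Rv[r0, r1] / Rv[r1, r1]
    A = Rx + mu * Rv                                   # SDW-MWF block
    # Stationarity conditions dJ/dW0* = 0 and dJ/dW1* = 0 as one 2M x 2M system
    R = np.block([
        [A + alpha * Rx + beta * Rv,
         -alpha * np.conj(itf_x) * Rx - beta * np.conj(itf_v) * Rv],
        [-alpha * itf_x * Rx - beta * itf_v * Rv,
         A + alpha * abs(itf_x) ** 2 * Rx + beta * abs(itf_v) ** 2 * Rv],
    ])
    r = np.concatenate([Rx[:, r0], Rx[:, r1]])         # [r_x0; r_x1]
    W = np.linalg.solve(R, r)
    return W[:M], W[M:], itf_x, itf_v


def j_tot(W0, W1, Rx, Rv, r0, r1, mu, alpha, beta, itf_x, itf_v):
    """Evaluate J_tot(W) from second-order statistics (hypothetical helper)."""
    def quad(w, R):
        return np.real(w.conj() @ R @ w)
    # E|X_{0,r0} - W0^H X|^2 = Rx[r0,r0] - 2 Re(W0^H Rx e_r0) + W0^H Rx W0 (same for right ear)
    j_sdw = (np.real(Rx[r0, r0]) - 2 * np.real(W0.conj() @ Rx[:, r0])
             + quad(W0, Rx) + mu * quad(W0, Rv)
             + np.real(Rx[r1, r1]) - 2 * np.real(W1.conj() @ Rx[:, r1])
             + quad(W1, Rx) + mu * quad(W1, Rv))
    # W0^H C - itf * W1^H C = (W0 - conj(itf) W1)^H C
    j_x = quad(W0 - np.conj(itf_x) * W1, Rx)
    j_v = quad(W0 - np.conj(itf_v) * W1, Rv)
    return j_sdw + alpha * j_x + beta * j_v
```

With α = β = 0 the off-diagonal blocks vanish and the sketch reduces to the binaural SDW-MWF; increasing β drives the noise ITF term towards zero, i.e. the output noise ITF towards its input value.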

Overview of batch algorithm

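[Figure: block diagram of the batch algorithm]

• Left and right input signals $\mathbf{Y}(\omega) = \mathbf{X}(\omega) + \mathbf{V}(\omega)$ → FFT

• VAD → off-line computation of statistics $\mathbf{R}_x(\omega)$, $\mathbf{R}_v(\omega)$

• Calculate binaural input cues and filter $\mathbf{W}(\omega) = \begin{bmatrix} \mathbf{W}_0(\omega) \\ \mathbf{W}_1(\omega) \end{bmatrix}$ (depends on $\mu$, $\alpha$, $\beta$)

• Frequency-domain filtering → left output $Z_0 = Z_{x0} + Z_{v0}$, right output $Z_1 = Z_{x1} + Z_{v1}$ → IFFT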
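The off-line statistics step can be sketched as follows, assuming an ideal frame-level VAD (one boolean per STFT frame, True where speech is present) and sample averages in place of expectations. The function name `estimate_statistics` is hypothetical. Note that R_x = R_y − R_v estimated this way may be indefinite in practice because of estimation errors; this sketch does not correct for that.

```python
import numpy as np


def estimate_statistics(Y, vad):
    """Estimate per-bin correlation matrices from a multichannel STFT (sketch).

    Assumes an ideal frame-level VAD; the function name is hypothetical.
    Y:   (K, T, M) complex STFT of the stacked microphone signals of both hearing aids.
    vad: (T,) boolean, True for speech-dominated frames, False for noise-only frames.
    Returns Ry, Rv, Rx, each of shape (K, M, M).
    """
    vad = np.asarray(vad, dtype=bool)
    Ys = Y[:, vad, :]      # speech + noise frames
    Yn = Y[:, ~vad, :]     # noise-only frames
    # Sample estimate of E{y y^H}: average of outer products y_t y_t^H over frames
    Ry = np.einsum("ktm,ktn->kmn", Ys, Ys.conj()) / Ys.shape[1]
    Rv = np.einsum("ktm,ktn->kmn", Yn, Yn.conj()) / Yn.shape[1]
    # Speech and noise assumed uncorrelated, so R_y = R_x + R_v
    Rx = Ry - Rv
    return Ry, Rv, Rx
```

The resulting per-bin matrices feed the filter-computation stage; the filter is then applied bin by bin before the IFFT.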

Experimental results

• Identification of HRTFs:

o Binaural recordings on CORTEX MK2 artificial head

o 2 omnidirectional microphones on each hearing aid (d = 1 cm)

o Loudspeaker (LS) positions: -90° to 90° in steps of 15° and 90° to 270° in steps of 30°, 1 m from head

o Conditions: T60 = 140 ms, fs = 16 kHz, L = 1366 taps

→ HRTFs used for physical and perceptual evaluation

Experimental results

• Speech and noise material:

o Dutch sentences (VU list)

o Stationary speech-weighted noise with same long-term spectrum as speech material → focus on spatial aspects

o S0N60 (speech at 0°, noise at 60°), SNR = 0 dB

o fs = 16 kHz, FFT-size N = 256, μ = 1

• Physical evaluation:

o Speech intelligibility: SNR

o Localisation: ITD / ILD

• Perceptual evaluation:

o Preliminary study

o Speech intelligibility: SRT

o Localisation: localise S and N

Physical evaluation

• Performance measures:

o Intelligibility-weighted SNR improvement (left/right): $\Delta\mathrm{SNR}_{L} = \sum_i I_i\, \Delta\mathrm{SNR}_{L,i}$, with $I_i$ the importance of the i-th frequency for speech intelligibility (analogously for the right ear)

o ILD error (speech/noise component), ILD = power ratio: $\Delta\mathrm{ILD}_x = \sum_i \left| \mathrm{ILD}_{x,i}^{out} - \mathrm{ILD}_{x,i}^{in} \right|$

o ITD error (speech/noise component), ITD = phase of cross-correlation: $\Delta\mathrm{ITD}_x = \sum_i I_i\, \Delta\mathrm{ITD}_{x,i}$, with $\Delta\mathrm{ITD}_{x,i} = \left| \angle E\{X_{0,r_0} X_{1,r_1}^*\} - \angle E\{Z_{x0} Z_{x1}^*\} \right|$ and $I_i$ a low-pass filter (1500 Hz)
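The three measures can be sketched in NumPy for STFT-domain components of shape (K bins, T frames), taken at the left/right reference microphones (input) or at the outputs. This is an illustration under stated assumptions: expectations are replaced by averages over frames, each FFT bin is treated as one frequency i, and the weight vector `I` is supplied by the caller (the band-importance values used in the talk are not given here). Phase differences are wrapped to [-π, π] before taking the absolute value, a choice the slide does not specify. All function names are hypothetical.

```python
import numpy as np


def _power(s):
    """Average power per frequency bin; s has shape (K, T)."""
    return np.mean(np.abs(s) ** 2, axis=1)


def weighted_snr_improvement(x_in, v_in, x_out, v_out, I):
    """Delta SNR = sum_i I_i (SNR_out,i - SNR_in,i) in dB, for one ear (sketch)."""
    snr_in = 10 * np.log10(_power(x_in) / _power(v_in))
    snr_out = 10 * np.log10(_power(x_out) / _power(v_out))
    return float(np.sum(I * (snr_out - snr_in)))


def ild_error(in0, in1, out0, out1):
    """Delta ILD = sum_i |ILD_out,i - ILD_in,i| in dB; ILD = left/right power ratio."""
    ild_in = 10 * np.log10(_power(in0) / _power(in1))
    ild_out = 10 * np.log10(_power(out0) / _power(out1))
    return float(np.sum(np.abs(ild_out - ild_in)))


def itd_error(in0, in1, out0, out1, I):
    """Delta ITD = sum_i I_i |phase difference of cross-correlations| in rad."""
    phi_in = np.angle(np.mean(in0 * in1.conj(), axis=1))     # angle of E{X_0 X_1^*}
    phi_out = np.angle(np.mean(out0 * out1.conj(), axis=1))  # angle of E{Z_0 Z_1^*}
    dphi = np.angle(np.exp(1j * (phi_out - phi_in)))          # wrap to [-pi, pi]
    return float(np.sum(I * np.abs(dphi)))


# Example (assumed) ITD weighting: 1500 Hz low-pass for N = 256, fs = 16 kHz
freqs = np.fft.rfftfreq(256, d=1 / 16000)
I_itd = (freqs <= 1500).astype(float)
I_itd /= I_itd.sum()
```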

Physical evaluation: SNR

[Figure: intelligibility-weighted SNR improvement ΔSNR_w (dB) at the left ear and at the right ear, as a function of the weight factors α and β]

Physical evaluation: ILD-ITD

[Figure: ILD error ΔILD (dB) and ITD error ΔITD (rad) of the speech component and of the noise component, as a function of the weight factors α and β]

Physical evaluation

• Conclusions:

o β increases: ITD-ILD error of noise component decreases … BUT … ITD-ILD error of speech component increases

o α increases: ITD-ILD error of speech component decreases … BUT … ITD-ILD error of noise component increases

o Compromise between speech and noise localisation error possible (cf. localisation experiments)

o SNR improvement only slightly degraded (cf. SRT experiments)

Perceptual evaluation

• Speech intelligibility: SRT

o How does parameter β affect speech intelligibility?

o Two effects: increasing β reduces the SNR improvement, but preserves binaural noise cues better, enabling a binaural speech intelligibility advantage

• Localisation performance

o How do parameters α and β affect localisation of processed speech and noise components?

→ α: preservation of speech cues, β: preservation of noise cues

Perceptual evaluation: SRT

• Measurement procedure:

o SRT = SNR where 50% of speech is intelligible

o Adaptive procedure (2 dB/step)

o Headphone experiments, using HRTFs

o S0N60 (Dutch VU sentences, stationary noise)

o Presentation level = 65 dB SPL

o 5 normal-hearing subjects

o fs = 16 kHz, FFT-size N = 256, μ = 1, α = 0

o Reference condition = no processing

[Figure: speech and noise filtered with HRTFs (HRTF_x, HRTF_v), processed by the binaural filter and presented over headphones (left/right)]
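A simple one-up/one-down adaptive track converges on the 50% point of the psychometric function. The sketch below is a simplified illustration, not necessarily the exact procedure used in these experiments: the start SNR, the number of sentences, the rule of averaging the presented SNRs after discarding the first few, and the `respond` callback (True if the sentence was repeated correctly) are all assumptions, and `adaptive_srt` is a hypothetical name.

```python
import numpy as np


def adaptive_srt(respond, start_snr=0.0, step_db=2.0, n_trials=20, n_discard=4):
    """Estimate the SRT with a 1-up/1-down track in fixed steps (simplified sketch).

    respond(snr) -> bool: True if the sentence at this SNR was repeated correctly.
    After a correct response the SNR is lowered by step_db, otherwise raised,
    so the track oscillates around the SNR giving 50% intelligibility.
    Returns (SRT estimate, list of presented SNRs).
    """
    snr = start_snr
    presented = []
    for _ in range(n_trials):
        presented.append(snr)
        snr = snr - step_db if respond(snr) else snr + step_db
    # Assumed scoring rule: average the presented SNRs after the initial descent
    return float(np.mean(presented[n_discard:])), presented


# Hypothetical deterministic listener that understands every sentence above -9.2 dB SNR
srt, track = adaptive_srt(lambda snr: snr > -9.2)
print(srt)  # -9.0
```

The SRT improvement is then the difference between the SRT of the reference condition (no processing) and the SRT with processing, both measured with the same track.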

Perceptual evaluation: SRT

[Figure: SRT improvement (dB) as a function of β (0, 0.1, 0.3, 1, 10); VU sentences, noise at 60°, α = 0]

• Results:

o Average SRT without processing = -9.2 dB

o SRT improvements in the range 11-13 dB

o Binaural speech intelligibility advantage does not seem to compensate for loss in SNR improvement

Perceptual evaluation: localisation

• Sum of localisation errors of speech (S_x) and noise (N_0)

[Figure: sum of localisation errors for speech and noise (°), 5 subjects, condition S_xN_0, as a function of β (0, 0.1, 0.3, 1, 10, 100), for α = 0 and α = 0.5]

• Parameters can be tuned to achieve better overall localization performance at the cost of some noise reduction

• Good correlation between physical and perceptual evaluation

Audio demonstration

• Speech and noise material:

o HINT sentences, speech source in front (0°)

o Multi-talker babble noise at 60°

o SNR = 0 dB, fs = 16 kHz, FFT-size N = 256, μ = 1, α = 0

• Audio examples (noisy speech, speech component, noise component): input; output (β = 0); output (β = 0.05); output (β = 10)

Conclusions

• Speech enhancement for binaural hearing aids:

o Improve speech intelligibility

o Localisation: preserve binaural speech and noise cues

o No assumptions about position of speech source and microphones

• Suitable algorithm: multi-channel Wiener filter

o Speech cues are preserved, noise cues may be distorted

• Preservation of binaural noise cues:

o Interaural Wiener filter: extension with Interaural Transfer Function of noise (and speech) component

• Perceptual evaluation:

o S0N60: SRT improvements in the range 11-13 dB

o Binaural speech intelligibility advantage does not seem to compensate for (small) loss in SNR improvement

o Parameters can be tuned to achieve better overall localization performance at the cost of some noise reduction