Noise reduction and binaural cue preservation of multi-microphone algorithms


Simon Doclo, Tim van den Bogaert, Marc Moonen, Jan Wouters

Dept. of Electrical Engineering (ESAT-SCD), KU Leuven, Belgium

Dept. of Neurosciences (ExpORL), KU Leuven, Belgium

Oldenburg, June 28 2007

Overview

• Problem statement

o Improve speech intelligibility + preserve spatial awareness

o Bilateral vs. binaural processing

• Binaural signal processing using multi-channel Wiener filter

o MWF: noise reduction and preservation of speech cues, noise cues are distorted

o Extension of MWF to preserve binaural cues of all components:

– MWFv: partial estimation of noise component

– MWF-ITF: extension with Interaural Transfer Function

o Physical and perceptual evaluation

• Reduce bandwidth requirements of wireless link

o Distributed binaural MWF

Problem statement

• Many hearing impaired are fitted with a hearing aid at both ears

o Signal processing to selectively enhance useful speech signal and improve speech intelligibility

o Signal processing to preserve directional hearing and spatial awareness

o Multiple microphones available: spectral + spatial processing

• Binaural auditory cues:

o Interaural Time Difference (ITD) and Interaural Level Difference (ILD)

o Binaural cues, in addition to spectral and temporal cues, play an important role in binaural noise reduction and sound localisation

o ITD: f < 1500 Hz, ILD: f > 2000 Hz

[Figure: illustration of IPD/ITD and ILD]

Bilateral vs. Binaural

• Bilateral system: independent left/right processing

o Preservation of binaural cues for localisation?

• Binaural system: more microphones, need of binaural link

o Better performance?

o Preservation of binaural cues?

• Bilateral system:

o Independent processing of left and right hearing aid

o Localisation cues are distorted, which also has an effect on intelligibility through the binaural hearing advantage

[Figure: RMS error per loudspeaker when accumulating all responses of the different test conditions (NH = normal hearing, NO = hearing impaired without hearing aids, O = omnidirectional configuration, A = adaptive directional configuration)]

[Van den Bogaert et al., 2006]

• Binaural system:

o Cooperation between left and right hearing aid (e.g. wireless link)

o Assumption: all microphone signals are available at the same time

Objectives/requirements for binaural algorithm:

1. SNR improvement: noise reduction, limit speech distortion

2. Preservation of binaural cues (speech/noise) to exploit binaural hearing advantage

3. No assumption about position of speech source and microphones

[Van den Bogaert et al., 2006]


Configuration and signals

• Configuration: microphone array with M microphones at left and right hearing aid, communication between hearing aids

• Microphone signals at the left hearing aid (X: speech component, V: noise component):

$Y_{0,m}(\omega) = X_{0,m}(\omega) + V_{0,m}(\omega), \quad m = 0 \ldots M_0 - 1$

• Use all microphone signals to compute output signal at both ears, with $\mathbf{Y}(\omega)$ the stacked vector of all microphone signals:

$Z_0(\omega) = \mathbf{W}_0^H(\omega)\,\mathbf{Y}(\omega), \quad Z_1(\omega) = \mathbf{W}_1^H(\omega)\,\mathbf{Y}(\omega)$
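The signal model and the two output filters can be illustrated for a single frequency bin. This is a minimal sketch, not the authors' implementation: the number of microphones, the random signals and the random filters are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 2              # microphones per hearing aid (assumed value)
n_ch = 2 * M       # stacked vector: left microphones first, then right microphones

def cvec(n):
    """Random complex vector, used here only as placeholder data."""
    return rng.standard_normal(n) + 1j * rng.standard_normal(n)

X = cvec(n_ch)     # speech component at all microphones (hypothetical)
V = cvec(n_ch)     # noise component at all microphones (hypothetical)
Y = X + V          # microphone signals, Y = X + V

W0 = cvec(n_ch)    # arbitrary filter for the left output
W1 = cvec(n_ch)    # arbitrary filter for the right output

# np.vdot conjugates its first argument, so np.vdot(W, Y) equals W^H Y.
Z0 = np.vdot(W0, Y)
Z1 = np.vdot(W1, Y)

# By linearity each output splits into a speech part and a noise part.
Zx0, Zv0 = np.vdot(W0, X), np.vdot(W0, V)
Zx1, Zv1 = np.vdot(W1, X), np.vdot(W1, V)
```

Because both outputs are computed from the same stacked vector, the binaural cues of the output speech and noise components are determined entirely by the pair of filters.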

Overview of cost functions

• Multi-channel Wiener filter (MWF): MMSE estimate of speech component in microphone signal at both ears

• Speech-distortion weighted multi-channel Wiener filter (SDW-MWF): trade-off between noise reduction and speech distortion [Doclo 2002, Spriet 2004]

• Extensions for binaural cue preservation of speech + noise:

o Partial estimation of noise component (MWFv) [Klasen 2005]

o Extension with ITD-ILD or Interaural Transfer Function (ITF) [Doclo 2005, Klasen 2006, Van den Bogaert 2007]

Binaural multi-channel Wiener filter

• Binaural SDW-MWF: estimate of speech component in microphone signal at both ears (usually front microphone) + trade-off between noise reduction and speech distortion

$J_{SDW}(\mathbf{W}) = E\left\{ \left\| \begin{bmatrix} X_{0,r_0} - \mathbf{W}_0^H \mathbf{X} \\ X_{1,r_1} - \mathbf{W}_1^H \mathbf{X} \end{bmatrix} \right\|^2 + \mu \left\| \begin{bmatrix} \mathbf{W}_0^H \mathbf{V} \\ \mathbf{W}_1^H \mathbf{V} \end{bmatrix} \right\|^2 \right\}$

(first term: speech distortion with respect to the speech component in the front microphones; second term: noise reduction)

$\mathbf{W}_{SDW} = \mathbf{R}^{-1}\mathbf{r}, \quad \mathbf{R} = \begin{bmatrix} \mathbf{R}_x + \mu\mathbf{R}_v & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_x + \mu\mathbf{R}_v \end{bmatrix}, \quad \mathbf{r} = \begin{bmatrix} \mathbf{r}_{x0} \\ \mathbf{r}_{x1} \end{bmatrix}, \quad \mathbf{R}_x = \mathbf{R}_y - \mathbf{R}_v$

o Depends on second-order statistics of speech and noise

o Estimate Ry during speech-dominated time-frequency segments, estimate Rv during noise-dominated segments, requiring robust voice activity detection (VAD) mechanism

o No assumptions about positions of microphones and sources

o Adaptive (LMS-based) algorithm available [Spriet 2004, Doclo 2007]
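The batch estimation of the correlation matrices and the resulting filter can be sketched as follows for one frequency bin. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names `estimate_correlations` and `sdw_mwf`, the synthetic single-source scene, the spatially white noise and the perfect VAD are all hypothetical choices. The filter is computed per ear as W = (Rx + mu Rv)^{-1} Rx e_ref with Rx = Ry - Rv.

```python
import numpy as np

def estimate_correlations(Y_frames, vad):
    """Average outer products over speech-dominated and noise-only frames.

    Y_frames: (n_frames, n_ch) complex STFT coefficients of one frequency bin.
    vad:      boolean array, True where speech is active.
    """
    Ys, Yn = Y_frames[vad], Y_frames[~vad]
    Ry = Ys.T @ Ys.conj() / len(Ys)   # estimate of E{Y Y^H}
    Rv = Yn.T @ Yn.conj() / len(Yn)   # estimate of E{V V^H}
    return Ry, Rv

def sdw_mwf(Ry, Rv, mu, ref):
    """Solve (Rx + mu Rv) W = Rx e_ref, where Rx = Ry - Rv."""
    Rx = Ry - Rv
    return np.linalg.solve(Rx + mu * Rv, Rx[:, ref])

# Synthetic scene (assumed): one speech source, unit-power white noise.
rng = np.random.default_rng(1)
M, T = 2, 4000                        # microphones per ear, frames per segment type
n_ch = 2 * M

def cgauss(*shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

A = np.exp(1j * rng.uniform(0, 2 * np.pi, n_ch))   # hypothetical unit-magnitude transfer functions
vad = np.arange(2 * T) < T                         # perfect VAD: first half contains speech
S = cgauss(2 * T) * vad
Y_frames = S[:, None] * A[None, :] + cgauss(2 * T, n_ch)

Ry, Rv = estimate_correlations(Y_frames, vad)
W0 = sdw_mwf(Ry, Rv, mu=5.0, ref=0)   # left output, front-left reference
W1 = sdw_mwf(Ry, Rv, mu=5.0, ref=M)   # right output, front-right reference

def output_snr(W):
    """Output SNR under the true statistics Rx = A A^H, Rv = I."""
    return abs(np.vdot(W, A)) ** 2 / np.vdot(W, W).real

snr_in = 1.0                          # |A_ref|^2 divided by the unit noise power
snr_out_left, snr_out_right = output_snr(W0), output_snr(W1)
```

In this idealised scene the output SNR approaches the number of microphones times the input SNR; with real recordings the gain depends on the noise field and on the VAD errors.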

• Interpretation for single speech source:

o Spectral and spatial filtering operation, with $\Gamma_v$ the (spatial) coherence matrix and $P$ the (spectral) power, i.e. $\mathbf{R}_x = P_s \mathbf{A}\mathbf{A}^H$ and $\mathbf{R}_v = P_v \Gamma_v$:

$\mathbf{W}_{SDW,0} = \dfrac{\Gamma_v^{-1}\mathbf{A}}{\mathbf{A}^H \Gamma_v^{-1} \mathbf{A}} \cdot \dfrac{\mathbf{A}^H \Gamma_v^{-1} \mathbf{A}}{\mu \, P_v / P_s + \mathbf{A}^H \Gamma_v^{-1} \mathbf{A}} \cdot A_{0,r_0}^*$

($\mathbf{A}^H \Gamma_v^{-1} \mathbf{A}$: spatial separation between speech and noise sources; $P_s / P_v$: SNR)

o Equivalent to superdirective beamformer (diffuse noise field) or delay-and-sum beamformer (spatially white noise field) + single-channel WF-based postfilter (spectral subtraction)

• Binaural cues (ITD-ILD):

o Perfectly preserves binaural cues of speech component

o Binaural cues of noise component become those of the speech component!
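The cue behaviour for a single speech source can be checked numerically. The sketch below is an illustration under assumed values: the transfer functions `A` and `B`, the powers and the reference microphone indices are hypothetical. It measures the interaural transfer function (ITF) of a component as E{Z0 Z1*} / E{Z1 Z1*} and shows that, after filtering, the noise component carries the ITF of the speech component.

```python
import numpy as np

n_ch, mu = 4, 1.0
r0, r1 = 0, 2                      # assumed reference (front) microphones, left and right

# Hypothetical transfer functions: speech from the left, noise from the right.
A = np.array([1.0, 0.9, 0.6 * np.exp(-0.8j), 0.5 * np.exp(-0.8j)])
B = np.array([0.5 * np.exp(-0.7j), 0.6 * np.exp(-0.7j), 1.0, 0.9])

Rx = np.outer(A, A.conj())                              # single speech source, P_s = 1
Rv = 2.0 * np.outer(B, B.conj()) + 0.1 * np.eye(n_ch)   # directional noise + sensor noise

R = Rx + mu * Rv
W0 = np.linalg.solve(R, Rx[:, r0])   # Rx[:, r0] is Rx e_r0
W1 = np.linalg.solve(R, Rx[:, r1])

def itf(Rc, w0, w1):
    """E{Z0 Z1*} / E{Z1 Z1*} for a component with correlation matrix Rc."""
    return np.vdot(w0, Rc @ w1) / np.vdot(w1, Rc @ w1)

itf_x_in = Rx[r0, r1] / Rx[r1, r1]   # input ITF of the speech component
itf_v_in = Rv[r0, r1] / Rv[r1, r1]   # input ITF of the noise component
itf_x_out = itf(Rx, W0, W1)
itf_v_out = itf(Rv, W0, W1)
```

With a rank-one speech correlation matrix the two filters are scaled copies of each other, so every output component inherits the same interaural ratio, namely that of the speech source.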

Partial noise estimation (MWFv)

• Partial estimation of noise component

o Estimate of sum of speech component and scaled noise component:

$J_{\eta}(\mathbf{W}) = E\left\{ \left\| \begin{bmatrix} X_{0,r_0} + \eta V_{0,r_0} - \mathbf{W}_0^H \mathbf{Y} \\ X_{1,r_1} + \eta V_{1,r_1} - \mathbf{W}_1^H \mathbf{Y} \end{bmatrix} \right\|^2 \right\}$

o Speech-distortion weighted version:

$J_{SDW,\eta}(\mathbf{W}) = E\left\{ \left\| \begin{bmatrix} X_{0,r_0} - \mathbf{W}_0^H \mathbf{X} \\ X_{1,r_1} - \mathbf{W}_1^H \mathbf{X} \end{bmatrix} \right\|^2 + \mu \left\| \begin{bmatrix} \eta V_{0,r_0} - \mathbf{W}_0^H \mathbf{V} \\ \eta V_{1,r_1} - \mathbf{W}_1^H \mathbf{V} \end{bmatrix} \right\|^2 \right\}$

o Relationship with SDW-MWF: mix with reference microphone signals

$Z_0 = \eta \, Y_{0,r_0} + (1-\eta)\, Z_{SDW,0}, \quad Z_1 = \eta \, Y_{1,r_1} + (1-\eta)\, Z_{SDW,1}$

o Reduction of noise reduction performance

o Works for multiple noise sources
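The relationship between partial noise estimation and the SDW-MWF can be verified numerically. In this sketch the mixing parameter is written `eta` and the trade-off parameter `mu`; the correlation matrices are random positive definite matrices and all numeric values are assumptions for illustration only. The minimiser of E{|X_r - W^H X|^2 + mu |eta V_r - W^H V|^2} satisfies (Rx + mu Rv) W = Rx e + mu eta Rv e, which should equal a mix of the reference microphone selector and the SDW-MWF filter.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ch, mu, eta, r0 = 4, 5.0, 0.2, 0    # assumed illustration values

def random_psd(n):
    """Random Hermitian positive definite matrix (placeholder statistics)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + 0.1 * np.eye(n)

Rx, Rv = random_psd(n_ch), random_psd(n_ch)
e0 = np.zeros(n_ch)
e0[r0] = 1.0                          # selects the reference microphone

R = Rx + mu * Rv
W_sdw = np.linalg.solve(R, Rx @ e0)                          # SDW-MWF filter
W_eta = np.linalg.solve(R, Rx @ e0 + mu * eta * (Rv @ e0))   # partial noise estimation
W_mix = eta * e0 + (1.0 - eta) * W_sdw                       # mixing interpretation

# The same identity at the output for an arbitrary microphone vector Y.
Y = rng.standard_normal(n_ch) + 1j * rng.standard_normal(n_ch)
Z_eta = np.vdot(W_eta, Y)
Z_mix = eta * Y[r0] + (1.0 - eta) * np.vdot(W_sdw, Y)
```

The mixing form makes the cost of cue preservation explicit: a fraction `eta` of the unprocessed reference signal, including its noise, is passed to the output.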

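Interaural Wiener filter (MWF-ITF)

• Extension of SDW-MWF with binaural cues

o Add term related to binaural cues of noise (and speech) component:

$J_{tot}(\mathbf{W}) = J_{SDW}(\mathbf{W}) + \alpha \, J^x_{cue}(\mathbf{W}) + \beta \, J^v_{cue}(\mathbf{W})$

o Possible cues: ITD, ILD, Interaural Transfer Function (ITF)

$ITF^v_{out} = \dfrac{Z_{v0}}{Z_{v1}} = \dfrac{\mathbf{W}_0^H \mathbf{V}}{\mathbf{W}_1^H \mathbf{V}}, \quad ITF^v_{in} = \dfrac{V_{0,r_0}}{V_{1,r_1}}, \quad \text{e.g. } ITF^v_{in} = \dfrac{E\{V_{0,r_0} V_{1,r_1}^*\}}{E\{V_{1,r_1} V_{1,r_1}^*\}} = \dfrac{\mathbf{R}_v(r_0, r_1)}{\mathbf{R}_v(r_1, r_1)}$

$J_{tot}(\mathbf{W}) = E\left\{ \left\| \begin{bmatrix} X_{0,r_0} - \mathbf{W}_0^H \mathbf{X} \\ X_{1,r_1} - \mathbf{W}_1^H \mathbf{X} \end{bmatrix} \right\|^2 + \mu \left\| \begin{bmatrix} \mathbf{W}_0^H \mathbf{V} \\ \mathbf{W}_1^H \mathbf{V} \end{bmatrix} \right\|^2 \right\} + \alpha \, E\left\{ \left| \mathbf{W}_0^H \mathbf{X} - ITF^x_{in} \mathbf{W}_1^H \mathbf{X} \right|^2 \right\} + \beta \, E\left\{ \left| \mathbf{W}_0^H \mathbf{V} - ITF^v_{in} \mathbf{W}_1^H \mathbf{V} \right|^2 \right\}$

($\alpha$ term: ITF preservation speech; $\beta$ term: ITF preservation noise)

o Closed form expression!

o Large $\beta$ changes direction of speech: increase weight $\alpha$

o Implicit assumption of single noise source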
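The closed-form solution of the ITF-extended cost can be sketched for one frequency bin. This is an illustration rather than the authors' implementation: the function name `mwf_itf`, the weights named `alpha` and `beta`, the trade-off `mu` and the two-source scene are assumptions. Each ITF term is a quadratic form in the stacked filter W = [W0; W1], so the minimiser follows from a single linear system.

```python
import numpy as np

def mwf_itf(Rx, Rv, r0, r1, mu=1.0, alpha=0.0, beta=0.0):
    """Minimise J_SDW + alpha * J_ITF(speech) + beta * J_ITF(noise)."""
    n = Rx.shape[0]
    itf_x = Rx[r0, r1] / Rx[r1, r1]   # input ITF of the speech component
    itf_v = Rv[r0, r1] / Rv[r1, r1]   # input ITF of the noise component

    def itf_block(Rc, c):
        # E{|W0^H C - c W1^H C|^2} = W^H [[Rc, -c* Rc], [-c Rc, |c|^2 Rc]] W
        return np.block([[Rc, -np.conj(c) * Rc], [-c * Rc, abs(c) ** 2 * Rc]])

    Z = np.zeros((n, n))
    R = np.block([[Rx + mu * Rv, Z], [Z, Rx + mu * Rv]])
    R = R + alpha * itf_block(Rx, itf_x) + beta * itf_block(Rv, itf_v)
    r = np.concatenate([Rx[:, r0], Rx[:, r1]])
    W = np.linalg.solve(R, r)
    return W[:n], W[n:]

def itf_out(Rc, W0, W1):
    """Output ITF of a component: E{Z0 Z1*} / E{Z1 Z1*}."""
    return np.vdot(W0, Rc @ W1) / np.vdot(W1, Rc @ W1)

# Hypothetical scene: one speech source, one noise source plus sensor noise.
r0, r1 = 0, 2
A = np.array([1.0, 0.9, 0.6 * np.exp(-0.8j), 0.5 * np.exp(-0.8j)])
B = np.array([0.5 * np.exp(-0.7j), 0.6 * np.exp(-0.7j), 1.0, 0.9])
Rx = np.outer(A, A.conj())
Rv = np.outer(B, B.conj()) + 0.1 * np.eye(4)
itf_v_in = Rv[r0, r1] / Rv[r1, r1]

err = {}
for beta in (0.0, 1e6):
    W0, W1 = mwf_itf(Rx, Rv, r0, r1, mu=1.0, beta=beta)
    err[beta] = abs(itf_out(Rv, W0, W1) - itf_v_in)   # noise ITF error
```

With `beta = 0` the noise component takes on the speech ITF, so the error is large; a very large `beta` forces W0 towards a scaled copy of W1 that reproduces the input noise ITF. The same constraint then also applies to the speech component, which is why a speech term weighted by `alpha` is needed in practice.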

Simulation setup

• Identification of HRTFs:

o Binaural recordings on CORTEX MK2 artificial head

o 2 omni-directional microphones on each hearing aid (d = 1 cm)

o Loudspeakers (LS) at -90:15:90 and 90:30:270 degrees, 1 m from head

o Room reverberation: T60 = 140 ms (and T60 = 510 ms)

Experimental results

• Simulations:

o $S_x N_y$, SNR = 0 dB on left front microphone (broadband)

o $f_s$ = 20.48 kHz

• MWF algorithmic parameters:

o Batch procedure, perfect VAD

o L = 96, $\mu$ = 5

o MWFv for different $\eta$, MWF-ITF for different $\alpha$, $\beta$

• Physical evaluation:

o Speech = HINT, noise = babble noise

o Speech intelligibility: SNR

o Localisation: ITD / ILD

• Perceptual evaluation:

o Preliminary study with NH subjects

o Speech intelligibility: SRT

o Localisation: localise S and N

Physical evaluation

• Performance measures:

o Intelligibility weighted SNR improvement (left/right), with $I_i$ the importance of the i-th frequency for speech intelligibility:

$\Delta SNR_L = \sum_i I_i \, \Delta SNR_L(\omega_i)$

o ILD error (speech/noise component), based on power ratio:

$\Delta ILD_x = \sum_i \left| ILD^x_{out}(\omega_i) - ILD^x_{in}(\omega_i) \right|$

o ITD error (speech/noise component), based on phase of cross-correlation (frequency weighting: low-pass filter 1500 Hz):

$\Delta ITD_x = \sum_i I_i \, \Delta ITD_x(\omega_i), \quad \Delta ITD_x(\omega_i) = \left| \angle E\{X_{0,r_0} X_{1,r_1}^*\} - \angle E\{Z_{x0} Z_{x1}^*\} \right|$
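The first two performance measures can be sketched as short functions. This is a minimal illustration with assumed details: the band importance weights in the example are made up, the weights are normalised to sum to one, and the ILD error is averaged over bands, since the normalisation used on the slide is not recoverable from the text.

```python
import numpy as np

def weighted_snr_improvement(snr_in_db, snr_out_db, importance):
    """Sum over bands of I_i * (SNR_out,i - SNR_in,i), with SNRs in dB."""
    I = np.asarray(importance, dtype=float)
    I = I / I.sum()                   # assumption: weights normalised to sum to 1
    gain = np.asarray(snr_out_db, dtype=float) - np.asarray(snr_in_db, dtype=float)
    return float(np.sum(I * gain))

def ild_error_db(P0_in, P1_in, P0_out, P1_out):
    """Mean absolute ILD change over bands; ILD = 10 log10 of the left/right power ratio."""
    ild_in = 10 * np.log10(np.asarray(P0_in, dtype=float) / np.asarray(P1_in, dtype=float))
    ild_out = 10 * np.log10(np.asarray(P0_out, dtype=float) / np.asarray(P1_out, dtype=float))
    return float(np.mean(np.abs(ild_out - ild_in)))   # assumption: average over bands

# Hypothetical three-band example.
delta_snr = weighted_snr_improvement([0, 0, 0], [5, 10, 15], [0.2, 0.5, 0.3])
delta_ild = ild_error_db([4, 1], [1, 1], [1, 1], [1, 1])
```

In the example the per-band gains of 5, 10 and 15 dB combine to 0.2·5 + 0.5·10 + 0.3·15 = 10.5 dB, and an ILD of about 6 dB lost in one of two bands gives a mean error of about 3 dB.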

Physical evaluation: MWFv

[Figure: six panels as a function of the mixing parameter (0 to 1) for condition S0N60, titled "auditec 60deg (μ=5, L=96, N=4)": ΔSNR left [dB], ΔSNR right [dB], ΔILD speech [dB], ΔILD noise [dB], ΔITD speech [rad], ΔITD noise [rad]]

Perceptual evaluation

• Procedure:

o Headphone experiments, using measured HRTFs

o Filters are calculated off-line on VU speech-weighted noise as S and multitalker babble noise as N

o All stimuli presented at comfort level, 5 NH subjects (ongoing)

[Figure: signal flow from speech (HRTF_x) and noise (HRTF_v) via the microphones (L, R) and the binaural filter (gain G) to the headphones]

• Speech intelligibility:

o Adaptive procedure to find 50% Speech Reception Threshold (SRT)

• Localisation:

o S and N components (telephone) are sent separately through filter

o Localise S and N in room where HRTFs were measured

o Level roving 6 dB, 3 repetitions per condition for each subject

Perceptual evaluation: MWFv

• Algorithms: unprocessed, state-of-the-art bilateral, MWF, MWFv (η=0.2)

• Conditions: S0N60, S45N315 and S90N270 (T60 = 510 ms)

[Figure: four panels at 0 dB (unproc, classic, mwf0, mwf02), each with the sources N270, N315, S0, S45, N60, S90 on one axis and angles from -90 to 90 in steps of 15 on the other]

• With state-of-the-art systems: preservation of binaural cues only within central angle of frontal hemisphere.

• Binaural MWF:

o Preserves localisation cues for speech source

o Preserves localisation cues for noise source(s) with small mixing parameter η

o Recent SRT experiments (N=2) show no substantial SRT difference between η=0 and η=0.2

• Ongoing research:

o Perceptual evaluation (SRT and localisation) for MWF-ITF

Bandwidth constraints

• Binaural MWF:

o 2M microphone signals are transmitted over wireless link

• Reduce bandwidth requirement of wireless link:

o Transmit one signal from contralateral ear:

– Front contralateral microphone signal

– Output of contralateral fixed (e.g. superdirective) beamformer

– Output of MWF using only M contralateral microphone signals

– Iterative distributed binaural MWF scheme
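The effect of restricting the signals available at one ear can be illustrated with a small sketch. It only covers the simplest option from the list above (adding the front contralateral microphone signal) and compares it with a bilateral and a full binaural MWF; the beamformer-based and iterative schemes are not implemented. The helper `mwf_output_snr`, the transfer functions and the noise model are assumptions for illustration.

```python
import numpy as np

def mwf_output_snr(Rx, Rv, channels, ref, mu=1.0):
    """Output SNR of an SDW-MWF that only uses the listed channels."""
    channels = list(channels)
    idx = np.ix_(channels, channels)
    Rx_s, Rv_s = Rx[idx], Rv[idx]
    W = np.linalg.solve(Rx_s + mu * Rv_s, Rx_s[:, channels.index(ref)])
    return (np.vdot(W, Rx_s @ W) / np.vdot(W, Rv_s @ W)).real

# Hypothetical scene: channels 0, 1 = left microphones, 2, 3 = right microphones.
A = np.array([1.0, 0.9, 0.6 * np.exp(-0.8j), 0.5 * np.exp(-0.8j)])    # speech
B = np.array([0.5 * np.exp(-0.7j), 0.6 * np.exp(-0.7j), 1.0, 0.9])    # noise
Rx = np.outer(A, A.conj())
Rv = 2.0 * np.outer(B, B.conj()) + 0.1 * np.eye(4)

snr_bilateral = mwf_output_snr(Rx, Rv, [0, 1], ref=0)      # left microphones only
snr_front = mwf_output_snr(Rx, Rv, [0, 1, 2], ref=0)       # + front contralateral microphone
snr_full = mwf_output_snr(Rx, Rv, [0, 1, 2, 3], ref=0)     # all microphone signals
```

For a single speech source the MWF output SNR cannot decrease when channels are added, so the three values are ordered; how much is lost by transmitting fewer signals depends on the acoustic scene.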

Physical evaluation

[Figure: Performance comparison of MWF-based binaural algorithms: AI weighted SNR improvement (dB) versus noise source(s) angle (°) for MWF-full, MWF-front, MWF-contra and MWF-iter]

• Performance of dB-MWF close to full binaural MWF!

Contralateral directivity pattern

[Figure: fullband polar directivity patterns for the left hearing aid, T60 = 140 ms, S0N120; four panels: B-MWF (N=4, SNR = 14.6 dB), MWF-front (N=3, SNR = 10.5 dB), MWF-contra (N=4, SNR = 14.2 dB), dB-MWF (SNR = 14.2 dB)]

Conclusions

• State-of-the-art signal processing in (bilateral) HAs: preservation of binaural cues only within central angle of frontal hemisphere

• Binaural MWF:

o Substantial noise reduction (MWF with 4 and 3 microphones outperforms 2)

o Preservation of binaural speech cues

o Distortion of binaural noise cues

o No assumptions about positions of sources and microphones, but requires VAD

• Compromise between noise reduction and binaural cue preservation can be achieved with extensions of MWF

o Mixing with microphone signals

o Interaural Transfer Function

• Reduction of bandwidth using distributed MWF