Frequency-domain Criterion for Speech Distortion Weighted Multi-Channel Wiener Filtering

(1)


Simon Doclo¹, Ann Spriet¹,², Marc Moonen¹, Jan Wouters²

¹ Dept. of Electrical Engineering (ESAT-SCD), KU Leuven, Belgium

² Laboratory for Exp. ORL, KU Leuven, Belgium

HSCMA-2005, 17.03.2005

(2)

Overview

• Adaptive beamforming: GSC

o Not robust against signal model errors

• Spatially-preprocessed SDW-MWF:

o Increase robustness of adaptive stage by taking speech distortion into account

o Implementation: stochastic gradient algorithms

o Frequency-domain criterion

o Experimental validation in hearing instruments

• Audio demonstration

• Conclusions

(3)


Hearing instruments

• Hearing problems affect more than 10% of population

• Digital hearing instruments allow for advanced signal processing, resulting in improved speech understanding

• Major problem: (directional) hearing in background noise

o reduction of noise wrt useful speech signal

o multiple microphones + DSP in BTE

o current systems: simple fixed and adaptive beamforming

o robustness important due to small inter-microphone distance

hearing aids and cochlear implants

design of robust multi-microphone noise reduction scheme

Introduction

Adaptive beamforming

Experimental validation

Audio demo

Conclusions

(4)

GSC = Adaptive MVDR-beamformer

• Spatial pre-processor (fixed beamforming):

o Input: microphone signals (speech + noise), speech source at 0°

o Output: speech reference (speech + noise) and noise references (noise)

o Relies on assumptions: known mic characteristics, known speaker position, no reverberation

o Violated in practice: speech leakage into the noise references

• Adaptive stage (Adaptive Noise Canceller):

o Filters w₁, w₂ applied to the noise references, result subtracted from the speech reference

o Minimises output noise power:

$$\mathbf{w}[k] = \arg\min_{\mathbf{w}[k]} E\left\{ \left| v_0[k] - \mathbf{w}^T[k]\,\mathbf{v}[k] \right|^2 \right\}$$

o Avoids speech distortion if the assumptions hold

o With speech leakage: speech distortion! (output = distorted speech + noise)


(5)

Robustness against model errors

• Spatial pre-processor and adaptive stage rely on assumptions that are generally not satisfied in practice:

o Distortion of speech component in speech reference

o Leakage of speech into noise references, i.e. x[k] ≠ 0

Speech component in output signal gets distorted:

$$z_x[k] = x_0[k] - \mathbf{w}^T[k]\,\mathbf{x}[k]$$

• Design of robust noise reduction algorithm:

1. Reduce speech leakage contributions in noise references:

• Robust fixed spatial filter [Nordebo 94, Doclo 03]

• Adaptive blocking matrix [Van Compernolle 90, Hoshuyama 99, Herbordt 01]

• Estimate relative acoustic transfer functions [Gannot 01]

2. Reduce effect of present speech leakage:

• Only update adaptive filter during low-SNR periods/frequencies

• Quadratic inequality constraint, leaky LMS [Cox 87, Claesson 92, Tian 01]

• Take speech distortion explicitly into account, SDW-MWF [Spriet 04]
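The leakage mechanism behind this list can be illustrated with a minimal sketch. It assumes a hypothetical two-microphone pre-processor (average as fixed beamformer, difference as blocking matrix) and a toy sinusoid as speech component; neither is taken from the slides. With matched microphones the speech cancels in the noise reference; with a gain mismatch it does not.

```python
import math

def gsc_preprocess(mic1, mic2):
    """Toy spatial pre-processor for a source at 0 degrees (assumed, not from the slides).

    speech reference = average of the microphones (fixed beamformer)
    noise reference  = difference of the microphones (blocking matrix)
    """
    speech_ref = [(a + b) / 2.0 for a, b in zip(mic1, mic2)]
    noise_ref = [a - b for a, b in zip(mic1, mic2)]
    return speech_ref, noise_ref

def power(signal):
    """Mean power of a real-valued signal."""
    return sum(s * s for s in signal) / len(signal)

def leakage_power(gain_mismatch_db, num_samples=64):
    """Power of the speech component that leaks into the noise reference."""
    # Hypothetical speech component: one sinusoid, identical at both microphones
    speech = [math.sin(2.0 * math.pi * 5.0 * n / num_samples)
              for n in range(num_samples)]
    gain = 10.0 ** (gain_mismatch_db / 20.0)  # amplitude gain error of microphone 2
    mic2 = [gain * s for s in speech]
    _, noise_ref = gsc_preprocess(speech, mic2)
    return power(noise_ref)

print(leakage_power(0.0))  # matched microphones: no leakage
print(leakage_power(4.0))  # gain mismatch: speech leaks into the noise reference
```

Any speech that reaches the noise reference is partly cancelled by the adaptive stage, which is why the remedies listed above either reduce the leakage or limit its effect on the adaptive filter.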


(6)

Design of robust adaptive stage

• Distorted speech in output signal:

$$z_x[k] = x_0[k] - \mathbf{w}^T[k]\,\mathbf{x}[k]$$

• Robustness: limit w^T[k] x[k] by controlling adaptive filter w[k]

o Quadratic inequality constraint (QIC-GSC):

$$\|\mathbf{w}[k]\|^2 \le \beta^2$$

= conservative approach, constraint ≠ f(amount of leakage)

o Take speech distortion into account in optimisation criterion (SDW-MWF):

$$\mathbf{w}[k] = \arg\min_{\mathbf{w}} \; \underbrace{E\left\{ \left| v_0[k] - \mathbf{w}^T \mathbf{v}[k] \right|^2 \right\}}_{\text{noise reduction}} + \frac{1}{\mu}\, \underbrace{E\left\{ \left| \mathbf{w}^T \mathbf{x}[k] \right|^2 \right\}}_{\text{speech distortion}}$$

– 1/μ trades off noise reduction and speech distortion (1/μ = 0 → GSC, 1/μ = 1 → MMSE estimate)

– Regularisation term ~ amount of speech leakage

Limit speech distortion, while not affecting noise reduction performance in case of no model errors (≠ QIC)
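As a worked step, the SDW-MWF criterion has a closed-form minimiser. The derivation below is a sketch for real-valued signals; the symbols R_v, R_x and r_{v v_0} are notation introduced here for illustration and do not appear on the slides.

```latex
% Assumed notation (not from the slides):
%   R_v = E{ v[k] v^T[k] },  R_x = E{ x[k] x^T[k] },  r_{v v_0} = E{ v[k] v_0[k] }
\begin{align*}
J(\mathbf{w}) &= E\{|v_0[k] - \mathbf{w}^T\mathbf{v}[k]|^2\}
               + \frac{1}{\mu}\,E\{|\mathbf{w}^T\mathbf{x}[k]|^2\} \\
              &= E\{v_0^2[k]\} - 2\,\mathbf{w}^T\mathbf{r}_{v v_0}
               + \mathbf{w}^T\mathbf{R}_v\mathbf{w}
               + \frac{1}{\mu}\,\mathbf{w}^T\mathbf{R}_x\mathbf{w} \\
\nabla_{\mathbf{w}} J &= -2\,\mathbf{r}_{v v_0} + 2\,\mathbf{R}_v\mathbf{w}
               + \frac{2}{\mu}\,\mathbf{R}_x\mathbf{w} = \mathbf{0} \\
\mathbf{w} &= \Big(\mathbf{R}_v + \frac{1}{\mu}\,\mathbf{R}_x\Big)^{-1}\mathbf{r}_{v v_0}
\end{align*}
```

With 1/μ = 0 the speech term drops out and the solution reduces to the minimum output noise power filter of the GSC; without speech leakage (R_x = 0) the regularisation has no effect for any μ, consistent with the last statement above.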


(7)

Implementation

• Algorithms:

o Recursive matrix-based (GSVD, QRD): too expensive

o Stochastic gradient algorithms (time vs. frequency domain)

• Stochastic gradient algorithm (time-domain):

o Cost function

$$J(\mathbf{w}) = E\left\{ \left| v_0[k] - \mathbf{w}^T \mathbf{v}[k] \right|^2 \right\} + \frac{1}{\mu}\, E\left\{ \left| \mathbf{w}^T \mathbf{x}[k] \right|^2 \right\}$$

results in LMS-based updating formula

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \rho \Big\{ \underbrace{\mathbf{v}[k] \big( v_0[k] - \mathbf{v}^T[k]\,\mathbf{w}[k] \big)}_{\text{classical GSC}} - \underbrace{\frac{1}{\mu}\, \mathbf{x}[k]\,\mathbf{x}^T[k]\,\mathbf{w}[k]}_{\text{regularisation term}} \Big\}$$

o Practical computation of regularisation term using data buffers

o Reduce complexity by frequency-domain implementation [Spriet 04]

→ Still large memory requirement due to data buffers

o Memory reduction by approximating FD regularisation term [Doclo 04]
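The time-domain update can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the speech component x[k] is handed to the function directly (in practice it is not observable, which is why the slides resort to data buffers), and the toy signals and the step size value are hypothetical.

```python
def sdw_mwf_lms_step(w, v, v0, x, rho, inv_mu):
    """One update: w[k+1] = w[k] + rho * { v (v0 - v^T w) - (1/mu) x x^T w }."""
    noise_error = v0 - sum(vi * wi for vi, wi in zip(v, w))   # v0[k] - v^T[k] w[k]
    speech_leak = sum(xi * wi for xi, wi in zip(x, w))        # x^T[k] w[k]
    return [wi + rho * (vi * noise_error - inv_mu * xi * speech_leak)
            for wi, vi, xi in zip(w, v, x)]

def run(inv_mu, num_iter=2000, rho=0.1):
    """Run the update on a hypothetical one-tap toy problem and return the filter tap."""
    w = [0.0]
    for k in range(num_iter):
        sign = 1.0 if k % 2 == 0 else -1.0
        v = [sign]           # noise in the noise reference (unit power)
        v0 = 0.8 * sign      # noise in the speech reference
        x = [0.5 * sign]     # speech leakage in the noise reference (assumed known)
        w = sdw_mwf_lms_step(w, v, v0, x, rho, inv_mu)
    return w[0]

print(run(inv_mu=0.0))  # GSC: converges towards 0.8
print(run(inv_mu=1.0))  # regularised: converges towards 0.8 / (1 + 0.25) = 0.64
```

In this toy case the regularised filter is smaller, so less of the leaked speech is subtracted from the speech reference, at the price of some residual noise.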


(8)

Frequency-domain criterion (1)

• Extension of block-based frequency-domain criterion for multi-channel AEC [Benesty 01, Buchner 03]

$$J_f[m] = (1-\lambda) \sum_{i=0}^{m} \lambda^{m-i}\, \mathbf{e}_v^H[i]\,\mathbf{e}_v[i] + \frac{1}{\mu}\,(1-\lambda) \sum_{i=0}^{m} \lambda^{m-i}\, \mathbf{e}_x^H[i]\,\mathbf{e}_x[i]$$

$$\mathbf{e}_v[m] = \mathbf{F}\,\mathbf{e}_{v,L}[m], \qquad e_v[k] = v_0[k] - \mathbf{w}^T[k]\,\mathbf{v}[k]$$

$$\mathbf{e}_x[m] = \mathbf{F}\,\mathbf{e}_{x,L}[m], \qquad e_x[k] = \mathbf{w}^T[k]\,\mathbf{x}[k]$$

• Set derivative wrt time-domain filter coefficients w to zero → normal equations in FD

• Recursive algorithm (details cf. book “Speech Enhancement”):

$$\mathbf{w}[m] = \mathbf{w}[m-1] + (1-\lambda)\, \mathbf{G}^{10}_{2NL}\, \mathbf{Q}^{-1}[m] \left\{ \mathbf{D}_v^H[m]\,\mathbf{e}_v[m] - \frac{1}{\mu}\, \mathbf{D}_x^H[m]\,\mathbf{e}_x[m] \right\}$$
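One reason a recursive algorithm follows naturally is that an exponentially weighted block criterion can itself be updated recursively. The sketch below assumes the criterion has the form J_f[m] = (1-λ) Σ_i λ^(m-i) (e_v^H[i] e_v[i] + (1/μ) e_x^H[i] e_x[i]); the per-block energies are hypothetical numbers, and the function names are illustrative.

```python
def weighted_cost_direct(noise_energy, speech_energy, lam, inv_mu):
    """Evaluate the exponentially weighted sum over all blocks 0..m directly."""
    m = len(noise_energy) - 1
    return (1.0 - lam) * sum(
        lam ** (m - i) * (noise_energy[i] + inv_mu * speech_energy[i])
        for i in range(m + 1))

def weighted_cost_recursive(noise_energy, speech_energy, lam, inv_mu):
    """Same quantity via J[m] = lam * J[m-1] + (1 - lam) * (new block term)."""
    cost = 0.0
    for ev, ex in zip(noise_energy, speech_energy):
        cost = lam * cost + (1.0 - lam) * (ev + inv_mu * ex)
    return cost

# Hypothetical per-block energies e_v^H e_v and e_x^H e_x
noise_energy = [1.0, 0.5, 2.0, 0.25]
speech_energy = [0.2, 0.4, 0.1, 0.3]
print(weighted_cost_direct(noise_energy, speech_energy, lam=0.995, inv_mu=0.5))
print(weighted_cost_recursive(noise_energy, speech_energy, lam=0.995, inv_mu=0.5))
```

Both functions return the same value up to rounding, so only the previous value needs to be stored per block rather than the whole history.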


(9)

Frequency-domain criterion (2)

• Practical calculation of regularisation term → averaging:

$$\mathbf{r}[m] = \frac{1}{\mu}\, \mathbf{Q}_x[m]\, \mathbf{w}[m-1], \qquad \mathbf{Q}_x[m] = \mathbf{Q}_y[m] - \mathbf{Q}_v[m]$$

$$\mathbf{Q}_y[m] = \lambda\, \mathbf{Q}_y[m-1] + (1-\lambda)\, \mathbf{D}_y^H[m]\,\mathbf{D}_y[m] / 2$$

• Approximations for reducing the computational complexity:

o Approximate Q_v[m] and Q_y[m] by block-diagonal (or diagonal) correlation matrices:

→ (block-)diagonal matrices can be easily inverted

→ Ensure that Q_x[m] = Q_y[m] − Q_v[m] is positive-definite: eigenvalues of (block-)diagonal matrix can be easily computed

o Constrained vs. unconstrained update:

$$\mathbf{G}^{10}_{2NL} \approx \mathbf{I}_{2NL} / 2$$

→ corresponds to setting derivative wrt frequency-domain filter coefficients to zero

$$\mathbf{w}[m] = \mathbf{w}[m-1] + (1-\lambda)\, \boldsymbol{\Lambda}[m] \left\{ \mathbf{D}_v^H[m]\,\mathbf{e}_v[m] - \mathbf{r}[m] \right\}$$
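The averaging idea can be sketched for the simplest case, a diagonal approximation with a single noise reference, so that every correlation matrix reduces to one real value per frequency bin. The function names, the clamping value `floor` and the example numbers are assumptions for illustration; the slides only state that positive-definiteness must be ensured, not how.

```python
def update_diag_corr(q_prev, d_block, lam):
    """Diagonal version of Q[m] = lam * Q[m-1] + (1 - lam) * D^H[m] D[m] / 2.

    q_prev:  previous diagonal entries (one real value per frequency bin)
    d_block: current frequency-domain input samples (one complex value per bin)
    """
    return [lam * q + (1.0 - lam) * abs(d) ** 2 / 2.0
            for q, d in zip(q_prev, d_block)]

def regularisation_term(q_y, q_v, w_prev, inv_mu, floor=1e-6):
    """r[m] = (1/mu) * Q_x[m] * w[m-1] with Q_x = Q_y - Q_v.

    For a diagonal matrix the eigenvalues are the diagonal entries, so clamping
    each entry to a small positive floor (hypothetical choice) keeps Q_x
    positive-definite.
    """
    q_x = [max(qy - qv, floor) for qy, qv in zip(q_y, q_v)]
    return [inv_mu * qx * w for qx, w in zip(q_x, w_prev)]

# Q_y would be updated during speech + noise, Q_v during noise-only periods
q_y = update_diag_corr([0.0, 0.0], [2 + 0j, 1j], lam=0.5)
q_v = [0.2, 0.5]
print(regularisation_term(q_y, q_v, w_prev=[1 + 1j, 2 + 0j], inv_mu=0.5))
```

Only the matrices Q_y and Q_v are stored instead of buffers of past data, which is the memory saving the approximation is after.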


(10)

Experimental results

Configuration

• 3-mic BTE on dummy head (d = 1 cm, 1.5 cm)

• Speech source in front of dummy head (0°)

• 5 speech-like noise sources: 75°, 120°, 180°, 240°, 285°

• Gain mismatch = 4 dB at 2nd microphone

• Reverberation time = 500 msec

[Figure: room layout with the hearing aid (H.A., mic 1 to mic 3) and the five noise sources]


(11)

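Performance measures

• Improvement in speech intelligibility:

$$\Delta\mathrm{SNR}_{\mathrm{intellig}} = \sum_i I_i\, \Delta\mathrm{SNR}_i \quad [\mathrm{dB}]$$

o I_i: importance of i-th band for speech intelligibility

o SNR_i: level difference between speech and noise spectrum in the i-th band [dB]

• Speech distortion:

$$\mathrm{SD}_{\mathrm{intellig}} = \sum_i I_i\, \mathrm{SD}_i \quad [\mathrm{dB}]$$

o SD_i: level difference between input speech and output speech spectrum in the i-th band [dB]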
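Both measures are importance-weighted sums over frequency bands, which a short sketch makes concrete. The band importance weights and the per-band values below are hypothetical numbers chosen for readability, not standardised band importance values and not results from the slides.

```python
def intelligibility_weighted(values_db, band_importance):
    """Weighted sum over bands: sum_i I_i * value_i (values in dB)."""
    if len(values_db) != len(band_importance):
        raise ValueError("one importance weight per band is required")
    return sum(weight * value for weight, value in zip(band_importance, values_db))

# Hypothetical three-band example; the weights sum to 1
band_importance = [0.2, 0.5, 0.3]
snr_in_db = [0.0, 3.0, -2.0]     # per-band input SNR
snr_out_db = [4.0, 9.0, 3.0]     # per-band output SNR
snr_gain_db = [out - inp for out, inp in zip(snr_out_db, snr_in_db)]

print(intelligibility_weighted(snr_gain_db, band_importance))  # weighted SNR improvement in dB
```

The same function applies to the speech distortion measure when it is given the per-band distortion values SD_i instead of the SNR improvements.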


(12)

Experimental validation (1)

• SDR-GSC (unconstrained update)

o Results after convergence (L = 32, ρ = 0.5, λ = 0.995, BD/D step size)

o GSC (1/μ = 0): degraded performance if significant leakage

o 1/μ > 0 increases robustness (speech distortion vs. noise reduction)

[Figure: SNR [dB] and SD [dB] versus 1/μ (0 to 1) for SDR-GSC (N = 2), unconstrained update, ρ = 0.50, λ = 0.9950. Curves: Algo 2 (U-BD) and Algo 4 (U-D1), each with and without mismatch; GSC shown as reference]


(13)

Experimental validation (2)

• Convergence behaviour:

o Convergence speed: block-diagonal step size > diagonal step size

o large ρ → fast convergence

o large λ → slow convergence, better performance upon convergence

[Figure: SNR (dB) convergence curves, SDR-GSC (N = 2), L = 32, 1/μ = 0.5, no mismatch. Four panels: ρ = 0.10 and ρ = 0.50, λ = 0.99500 and λ = 0.99875. Curves: Algo 2 (U-BD), Algo 4 (U-D1)]


(14)

Complexity + memory

• Parameters: M = 3 (mics), N = 2 (a), N = 3 (b), L = 32, fs = 16 kHz, L_y = 10000

• Computational complexity:

Algorithm                   Complexity (MAC)               MIPS
QIC-GSC (FD)                (3M-1)FFT + 16M - 9            2.16
SDW-MWF (FD-buffer)         (3N+5)FFT + 30N + 10           3.22 (a), 4.27 (b)
SDW-MWF (FD-matrix-diag)    (3N+2)FFT + 8N² + 13N          2.46 (a), 3.89 (b)
SDW-MWF (FD-matrix-BD)      (3N+2)FFT + 14N² + 10N + 12    2.94 (N=2 !)

• Memory requirement:

Algorithm                   Memory                         kWords
QIC-GSC (FD)                4(M-1)L + 6L                   0.45
SDW-MWF (FD-buffer)         2NL_y + 6LN + 7L               40.61 (a), 60.80 (b)
SDW-MWF (FD-matrix-all)     4LN² + 6LN + 7L                1.12 (a), 1.95 (b)

Complexity and memory comparable to QIC-GSC
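The memory figures can be checked by evaluating the formulas with the stated parameters. The sketch below assumes 1 kWord = 1000 words, which is the convention that reproduces the listed values; the function name is illustrative.

```python
def memory_kwords(M=3, N=2, L=32, L_y=10000):
    """Evaluate the memory formulas from the table, in kWords (1 kWord = 1000 words assumed)."""
    return {
        "QIC-GSC (FD)": (4 * (M - 1) * L + 6 * L) / 1000.0,
        "SDW-MWF (FD-buffer)": (2 * N * L_y + 6 * L * N + 7 * L) / 1000.0,
        "SDW-MWF (FD-matrix-all)": (4 * L * N ** 2 + 6 * L * N + 7 * L) / 1000.0,
    }

for n_refs in (2, 3):  # case (a): N = 2, case (b): N = 3
    for name, kwords in memory_kwords(N=n_refs).items():
        print(f"N={n_refs}  {name:<26s} {kwords:6.2f} kWords")
```

The buffer-based variant is dominated by the 2NL_y term, while the matrix-based variant grows only with LN², which is why it lands in the same range as QIC-GSC.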


(15)

Audio demonstration

Algorithm                      No deviations    Deviation (4 dB)
Noisy microphone signal
Speech reference
Noise reference
Output GSC (1/μ = 0)
Output SDR-GSC (1/μ = 0.5)


(L=32, =10, =0.99875, block-diagonal stepsize, unconstrained update)

(16)

Conclusions

• Spatially pre-processed SDW-MWF:

o Take speech distortion explicitly into account → improve robustness of adaptive stage

o Encompasses GSC and MWF as special cases

• Implementation:

o Stochastic gradient algorithms in time- and frequency-domain

o Frequency-domain criterion: block-based processing → natural derivation of different adaptive algorithms

o Block-diagonal vs. diagonal, constrained vs. unconstrained

o Implementation cost comparable to QIC-GSC

• Experimental results:

o SP-SDW-MWF achieves better noise reduction than QIC-GSC, for a given maximum speech distortion level

o Faster convergence speed for block-diagonal step size matrix
