Multi-microphone noise reduction and dereverberation techniques for speech applications

(1)


Simon Doclo

Dept. of Electrical Engineering, KU Leuven, Belgium 8 July 2003

(2)

Overview

• Introduction

• Basic principles

• Robust broadband beamforming

• Multi-microphone optimal filtering

• Acoustic transfer function estimation and dereverberation

• Conclusion and further research

(3)

Overview

• Introduction

Motivation and applications

Problem statement

Contributions

• Basic principles

• Robust broadband beamforming

• Multi-microphone optimal filtering

• Acoustic transfer function estimation and dereverberation

• Conclusion and further research

(4)

Motivation

• Speech communication applications: hands-free mobile telephony, voice-controlled systems, hearing aids

• Speech acquisition in an adverse acoustic environment:

Background noise (generally unknown):

- fan, radio

- other speakers

Reverberation:

- reflections of the signal against walls and objects

• Consequences: poor signal quality, reduced speech intelligibility and degraded speech recognition

Introduction -Motivation

-Problem statement -Contributions

Basic principles

Beamforming

Multi-microphone optimal filtering

Transfer function estimation and dereverberation

Conclusion

(5)

Objectives

• Signal enhancement techniques:

Noise reduction: reduce the amount of background noise without distorting the speech signal

Dereverberation: reduce the effect of signal reflections

Combined noise reduction and dereverberation

• Acoustic source localisation: e.g. for steering a video camera or spotlight


(6)

Applications

• Video-conferencing:

Microphone array for source localisation: point the camera towards the active speaker

Signal enhancement by steering the microphone array

• Hands-free mobile telephony:

Most important application from an economic point of view

Hands-free car kit mandatory in many countries

Most current systems: 1 directional microphone


(7)

Applications

• Hearing aids and cochlear implants:

Most hearing-impaired people suffer from perceptual hearing loss

Amplification alone is not sufficient: noise must be reduced relative to the useful speech signal

Multiple microphones + DSP in the hearing aid

Current systems: simple beamforming

Robustness is important due to the small inter-microphone distance

• Voice-controlled systems:

Domotic systems, consumer electronics (hi-fi, PC software)

Added value only when the speech recognition system performs reliably under all circumstances

Signal enhancement as a pre-processing step


(8)

Algorithmic requirements

• ‘Blind’ techniques: unknown noise sources and acoustic environment

• Adaptive: time-variant signals and acoustic environment

• Robustness:

Microphone characteristics (gain, phase, position)

Other deviations from the assumed signal model (look-direction error, VAD errors)

• Integration of different enhancement techniques

• Computational complexity


(9)

Problem statement

• Problems of existing techniques:

Single-microphone techniques: very limited performance

→ multi-microphone techniques: exploit spatial information

→ multiple microphones are required for source localisation

A-priori assumptions about the position of the signal sources and the microphone array: large sensitivity to deviations

→ improve robustness (and performance)

Assumption of spatio-temporally white noise

→ extension to coloured noise

• Objective: development of multi-microphone noise reduction and dereverberation techniques with better performance and robustness in coloured-noise scenarios


(10)

State-of-the-art and contributions

• Single-microphone techniques (only spectral information):

– spectral subtraction [Boll 79, Ephraim 85, Xie 96]: signal-independent transformation; residual noise problem

– subspace-based [Dendrinos 91, Ephraim 95, Jensen 95]: signal-dependent transformation; signal + noise subspace

• Multi-microphone techniques (spatial information):

– fixed beamforming [Dolph 46, Cox 86, Ward 95, Elko 00]: fixed directivity pattern, based on a-priori assumptions

– adaptive beamforming [Frost 72, Griffiths 82, Gannot 01]: adapts to different acoustic environments, which improves performance; 'Generalised Sidelobe Canceller' (GSC)

– inverse and matched filtering [Miyoshi 88, Flanagan 93, Affes 97]

• Contributions:

1. Robust broadband beamforming (robustness)

2. Multi-microphone optimal filtering

3. Blind transfer function estimation and dereverberation

(11)

Overview

• Introduction

• Basic principles

Signal model

Signal characteristics and acoustic environment

• Robust broadband beamforming

• Multi-microphone optimal filtering

• Acoustic transfer function estimation and dereverberation

• Conclusion and further research

(12)

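Signal model

• Signal model for the microphone signals in the time domain: filtered version of the clean speech signal + additive coloured noise

$$y_n[k] = h_n[k] \otimes s[k] + v_n[k] = x_n[k] + v_n[k], \qquad n = 0, \dots, N-1$$

with s[k] the speech signal, h_n[k] the acoustic impulse response between the speaker and the n-th microphone, x_n[k] the speech component and v_n[k] the additive noise in the microphone signal y_n[k].

[Figure: speech source s[k] and microphone array with signals y_0[k], y_1[k], …, y_{N-1}[k]]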


(13)

Signal model

• Multi-microphone signal enhancement: the microphone signals are filtered with filters w_n[k] and summed:

$$z[k] = \sum_{n=0}^{N-1} w_n[k] \otimes y_n[k] = \underbrace{\sum_{n=0}^{N-1} w_n[k] \otimes h_n[k]}_{f[k]} \otimes s[k] + \underbrace{\sum_{n=0}^{N-1} w_n[k] \otimes v_n[k]}_{z_v[k]}$$

f[k] = total transfer function for the speech component

z_v[k] = residual noise component

• Techniques differ in the calculation of the filters:

Noise reduction: minimise the residual noise z_v[k] and limit the speech distortion

Dereverberation: make f[k] = δ[k] by estimating the acoustic impulse responses h_n[k]
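As a minimal numerical check of the signal model and of the decomposition z[k] = f[k] ⊗ s[k] + z_v[k], the following sketch (assuming NumPy) uses synthetic random impulse responses, white noise, and hypothetical random filters w_n[k]. All sizes are illustrative and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, L, T = 3, 64, 16, 2000        # microphones, impulse response length, filter length, samples

s = rng.standard_normal(T)                                        # stand-in for the speech signal s[k]
h = rng.standard_normal((N, K)) * np.exp(-np.arange(K) / 10.0)    # synthetic decaying impulse responses h_n[k]
v = 0.1 * rng.standard_normal((N, T))                             # additive noise v_n[k] (white, for simplicity)

# Microphone signals y_n[k] = h_n[k] (*) s[k] + v_n[k], truncated to T samples
x = np.array([np.convolve(h[n], s)[:T] for n in range(N)])       # speech components x_n[k]
y = x + v

w = rng.standard_normal((N, L))                                   # arbitrary (hypothetical) filters w_n[k]

# Filter-and-sum output z[k] = sum_n w_n[k] (*) y_n[k]
z = sum(np.convolve(w[n], y[n])[:T] for n in range(N))

# Decomposition: f[k] = sum_n w_n[k] (*) h_n[k] (length L + K - 1), z_v[k] = sum_n w_n[k] (*) v_n[k]
f = sum(np.convolve(w[n], h[n]) for n in range(N))
z_speech = np.convolve(f, s)[:T]
z_v = sum(np.convolve(w[n], v[n])[:T] for n in range(N))

print(np.max(np.abs(z - (z_speech + z_v))))   # round-off level: z[k] = f[k] (*) s[k] + z_v[k]
```

Truncating to T samples does not affect the identity, because output sample k of a causal convolution only depends on input samples up to k.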


(14)

Signal characteristics

• Speech:

Broadband (300-8000 Hz)

Non-stationary

On/off characteristic → speech detection algorithm (VAD)

Linear low-rank model: linear combination of R basis vectors s_i with coefficients a_i[k]

$$\mathbf{s}[k] = \sum_{i=1}^{R} a_i[k]\,\mathbf{s}_i, \qquad R = 12 \dots 20$$

[Figure: example speech waveform, amplitude vs. time (sec)]

• Noise:

Unknown signals (no reference available)

Ranging from slowly time-varying (fan) to non-stationary (radio, speech)

Localised or diffuse noise


(15)

Acoustic environment

• Reverberation time T₆₀: global characterisation

Typical values: car 70 ms, room 250 ms, church 1500 ms

• Acoustic impulse responses:

Acoustic filtering between 2 points in a room

FIR filter (K = 1000…2000 taps)

Non-minimum-phase system → no stable causal inverse

• Microphone array:

Assumption: point sensors with ideal characteristics

Deviations: gain, phase, position

Distance speaker – microphone array: far-field or near-field

[Figure: acoustic impulse response (PSK row 9), amplitude vs. time (sec)]


(16)

Overview

• Introduction

• Basic principles

• Robust broadband beamforming

Novel design procedures for broadband beamformers

Robust beamforming for gain and phase errors

• Multi-microphone optimal filtering

• Acoustic transfer function estimation and dereverberation

• Conclusion and further research

(17)

Fixed beamforming

• Speech and noise sources with overlapping spectra at different positions → exploit spatial diversity by using multiple microphones

• Technique originally developed for radar applications; speech applications differ:

Narrowband (delay compensation) → broadband

Far-field (planar waves) → near-field (spherical waves)

Known sensor characteristics → deviations

• FIR filter-and-sum structure: arbitrary spatial directivity pattern for an arbitrary microphone array configuration

Suppress noise and reverberation from certain directions

• Advantages: low complexity, robustness at low signal-to-noise ratio (SNR)

• Disadvantages: a-priori knowledge of the microphone array characteristics required; signal-independent


(18)

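Filter-and-sum configuration

• Objective: calculate the filters w_n[k] such that the beamformer performs the desired (fixed) spatial and spectral filtering

• Far-field: planar waves, equal attenuation at all microphones

• Spatial directivity pattern:

$$H(\omega,\theta) = \sum_{n=0}^{N-1} W_n(\omega)\, S_n(\omega,\theta) = \mathbf{w}^T \mathbf{g}(\omega,\theta)$$

with W_n(ω) the frequency response of the filter w_n[k], S_n(ω,θ) the transfer function between a source at angle θ and the n-th microphone, and w the vector of all NL filter coefficients.

• Desired spatial directivity pattern: D(ω,θ)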
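To make H(ω,θ) = w^T g(ω,θ) concrete, here is a minimal NumPy sketch that evaluates the far-field directivity pattern of a filter-and-sum beamformer. It assumes a uniform linear array, c = 340 m/s, and pure propagation delays τ_n(θ) = d_n cos θ / c (a sign convention chosen for illustration). The example filter is a plain delay-and-sum beamformer steered to broadside. N, L, d and f_s follow the simulation parameters used later in the slides. This is not the author's code.

```python
import numpy as np

c = 340.0                  # speed of sound in m/s (assumed)
fs = 8000.0                # sampling frequency
N, L, d = 5, 20, 0.04      # microphones, filter taps, spacing (m)
pos = d * np.arange(N)     # uniform linear array, microphone positions in m

def steering(f, theta):
    """Far-field transfer functions S_n(f, theta): pure delays, equal attenuation."""
    tau = pos * np.cos(theta) / c              # propagation delay to microphone n (seconds)
    return np.exp(-2j * np.pi * f * tau)       # shape (N,)

def g_vector(f, theta):
    """Stacked vector g(f, theta) such that H(f, theta) = w^T g(f, theta)."""
    e = np.exp(-2j * np.pi * f * np.arange(L) / fs)   # FIR delay vector, shape (L,)
    return np.kron(steering(f, theta), e)             # shape (N*L,), microphone-major

def pattern(w, freqs, thetas):
    """|H(f, theta)| on a frequency x angle grid; w is the stacked (N*L,) coefficient vector."""
    return np.array([[abs(w @ g_vector(f, th)) for th in thetas] for f in freqs])

# Delay-and-sum beamformer steered to broadside (90 deg): w_n[k] = delta[k - L//2] / N
W = np.zeros((N, L))
W[:, L // 2] = 1.0 / N
w = W.ravel()              # microphone-major ordering, matching g_vector

freqs = np.linspace(300, 4000, 5)
thetas = np.deg2rad([0, 45, 90])
P = pattern(w, freqs, thetas)
print(np.round(20 * np.log10(np.maximum(P, 1e-12)), 1))   # dB; the 90-degree column is 0 dB
```

Stacking the coefficients as w = [w_0^T … w_{N-1}^T]^T makes g a Kronecker product of the spatial and the FIR delay vectors. This is what makes the pattern linear in w.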

(19)

Design procedures

• Design the filter w such that the spatial directivity pattern optimally fits the desired pattern → minimisation of a cost function

Broadband problem: no separate design for each frequency ω_i → design over the complete frequency-angle region

No approximation of the integrals by a finite Riemann sum

Microphone configuration is not included in the optimisation

• Cost functions, with F(ω,θ) a weighting function:

Least-squares (amplitude and phase) → quadratic function; the double integrals only need to be calculated once

$$J_{LS}(\mathbf{w}) = \int_{\Theta}\int_{\Omega} F(\omega,\theta)\,\bigl|H(\omega,\theta) - D(\omega,\theta)\bigr|^2\, d\omega\, d\theta$$

Non-linear cost function (amplitude only) → iterative optimisation = complex! [Kajala 99]

$$J_{NL}(\mathbf{w}) = \int_{\Theta}\int_{\Omega} F(\omega,\theta)\,\bigl(|H(\omega,\theta)|^2 - |D(\omega,\theta)|^2\bigr)^2\, d\omega\, d\theta$$
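The least-squares cost is quadratic in w. For real w, expanding |w^T g − D|² gives J_LS(w) = w^T Q w − 2 w^T a + d, with Q = ∫∫ F Re{g g^H}, a = ∫∫ F Re{g D*} and d = ∫∫ F |D|². This is why the double integrals only need to be computed once: the minimiser solves Q w = a.

The NumPy sketch below makes several assumptions:

- uniform weighting F = 1;
- a passband target that is a pure delay of L/2 samples;
- a far-field uniform linear array with c = 340 m/s;
- a Riemann-sum grid in place of the exact integrals.

The grid is unlike the procedure on the slide, so the sketch only illustrates the structure of the problem.

```python
import numpy as np

c, fs = 340.0, 8000.0                # speed of sound (assumed) and sampling rate
N, L, d = 5, 20, 0.04                # 5 microphones, 20 taps, 4 cm spacing
pos = d * np.arange(N)

def g_vector(f, theta):
    """g(f, theta) with H(f, theta) = w^T g(f, theta) (far field, uniform linear array)."""
    e = np.exp(-2j * np.pi * f * np.arange(L) / fs)
    return np.kron(np.exp(-2j * np.pi * f * pos * np.cos(theta) / c), e)

# Accumulate Q, a, d on a grid (Riemann-sum approximation of the double integrals)
Q = np.zeros((N * L, N * L))
a = np.zeros(N * L)
d_ls = 0.0
for f in np.linspace(300, 4000, 40):
    target = np.exp(-2j * np.pi * f * (L // 2) / fs)   # assumed passband target: delay of L/2 samples
    for deg in range(0, 181, 3):
        if 70 <= deg <= 110:
            D = target                     # passband
        elif deg <= 60 or deg >= 120:
            D = 0.0                        # stopband
        else:
            continue                       # transition band: weight F = 0
        g = g_vector(f, np.deg2rad(deg))
        Q += np.real(np.outer(g, g.conj()))
        a += np.real(g * np.conj(D))
        d_ls += abs(D) ** 2

def J_ls(w):
    """Discretised least-squares cost w^T Q w - 2 w^T a + d."""
    return w @ Q @ w - 2 * w @ a + d_ls

w_ls = np.linalg.lstsq(Q, a, rcond=None)[0]   # minimiser: solves Q w = a
w_das = np.zeros((N, L))
w_das[:, L // 2] = 1.0 / N                    # delay-and-sum reference design
print(J_ls(w_ls), J_ls(w_das.ravel()))
```

`lstsq` is used instead of `solve` because Q is typically ill-conditioned for closely spaced microphones.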

(20)

Design procedures

• 2 non-iterative cost functions, based on eigenfilters:

Eigenfilters: 1D and 2D FIR filter design [Vaidyanathan 87, Pei 01]

Extension to the design of broadband beamformers

• Novel cost functions:

Conventional eigenfilter technique → (G)EVD; a reference point (ω_c, θ_c) is required:

$$J_{eig}(\mathbf{w}) = \int_{\Theta}\int_{\Omega} F(\omega,\theta)\,\Bigl|\frac{D(\omega,\theta)}{D(\omega_c,\theta_c)}\,H(\omega_c,\theta_c) - H(\omega,\theta)\Bigr|^2\, d\omega\, d\theta$$

Eigenfilter based on the TLS criterion → GEVD:

$$J_{TLS}(\mathbf{w}) = \frac{\int_{\Theta}\int_{\Omega} F(\omega,\theta)\,\bigl|H(\omega,\theta) - D(\omega,\theta)\bigr|^2\, d\omega\, d\theta}{\mathbf{w}^T\mathbf{w} + 1}$$

• Conclusion: the TLS-eigenfilter is the preferred non-iterative design


(21)

Simulations

Parameters: N = 5, d = 4 cm, L = 20, f_s = 8 kHz, passband 40°-80°, stopband 0°-30° + 90°-180°

[Figure: spatial directivity patterns (dB) vs. angle (deg) and frequency (Hz) for the non-linear procedure, the TLS-eigenfilter and a delay-and-sum beamformer]

(22)

Near-field configuration

• Near field: spherical waves + attenuation → take into account the distance r between the speaker and the microphones

• Ultimate goal: design for all distances

$$J_{tot}(\mathbf{w}) = \int_{R}\int_{\Theta}\int_{\Omega} F(\omega,\theta,r)\,\bigl|H(\omega,\theta,r) - D(\omega,\theta,r)\bigr|^2\, d\omega\, d\theta\, dr$$

• One specific distance: very similar to the far-field design (only the double integrals are calculated differently); deviations occur for other distances

• Several distances: trivial extension for most cost functions; for the TLS-eigenfilter the cost becomes a sum of generalised Rayleigh quotients

Finite number R of distances:

$$J_{tot}(\mathbf{w}) = \sum_{r=1}^{R} J_r(\mathbf{w})$$

→ trade-off between the performance for different distances


(23)

Simulations

Parameters: N = 5, d = 4 cm, L = 20, f_s = 8 kHz, passband 70°-110°, stopband 0°-60° + 120°-180°

[Figure: directivity patterns (dB) vs. angle (deg) and frequency (Hz): far-field pattern and near-field pattern (r = 0.2 m), for a far-field design and for a mixed near-field/far-field design]

(24)

Robust broadband beamforming

• Small deviations from the assumed microphone characteristics (gain, phase, position) → large deviations from the desired directivity pattern, especially for small-size microphone arrays

• In practice, microphone characteristics are never exactly known

• Incorporate specific (random) deviations in the design; characteristics of the n-th microphone, with δ_n the position deviation:

$$A_n(\omega,\theta) = \underbrace{a_n(\omega,\theta)}_{\text{gain}}\; \underbrace{e^{-j\varphi_n(\omega,\theta)}}_{\text{phase}}\; \underbrace{e^{-j\omega f_s \delta_n \cos\theta / c}}_{\text{position}}$$

• Consider all feasible microphone characteristics and optimise:

– the average performance, using the probability as weight; this requires knowledge of the probability density functions, e.g. from a measurement or calibration procedure:

$$J_{mean}(\mathbf{w}) = \int_{A_0}\!\cdots\!\int_{A_{N-1}} J(\mathbf{w}, A_0, \dots, A_{N-1})\, f_{A_0}(A_0)\cdots f_{A_{N-1}}(A_{N-1})\; dA_0 \cdots dA_{N-1}$$

– the worst-case performance → minimax optimisation problem
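When J is the least-squares cost, it is quadratic in w for every realisation of the microphone characteristics. Its mean over the deviations is therefore again quadratic, and averaging Q, a and d over the pdfs yields the average-performance design.

The NumPy sketch below makes these assumptions:

- the integral over A_0, …, A_{N-1} is approximated with Monte Carlo samples;
- only frequency-independent gain and phase deviations are modelled (no position errors);
- the target is a pure delay;
- a least-squares cost replaces the non-linear cost used in the following simulations.

The array, band and pdf ranges are taken from the simulation parameters that follow.

```python
import numpy as np

rng = np.random.default_rng(1)
c, fs, L = 340.0, 8000.0, 20
pos = np.array([-0.01, 0.0, 0.015])               # microphone positions (m)
N = len(pos)

# Frequency-angle grid: passband 0-60 deg (endfire), stopband 80-180 deg, 300-4000 Hz
f_grid, deg_grid = np.meshgrid(np.linspace(300, 4000, 20), np.arange(0, 181, 5))
f_grid, deg_grid = f_grid.ravel(), deg_grid.ravel()
keep = (deg_grid <= 60) | (deg_grid >= 80)        # skip the 60-80 deg transition band
f_grid, deg_grid = f_grid[keep], deg_grid[keep]
D = np.where(deg_grid <= 60, np.exp(-2j * np.pi * f_grid * (L // 2) / fs), 0.0)  # assumed target

S = np.exp(-2j * np.pi * f_grid[:, None] * pos * np.cos(np.deg2rad(deg_grid))[:, None] / c)
E = np.exp(-2j * np.pi * f_grid[:, None] * np.arange(L) / fs)

def quad_terms(A):
    """(Q, a, d) of the LS cost for complex microphone characteristics A (shape (N,))."""
    G = ((S * A)[:, :, None] * E[:, None, :]).reshape(len(f_grid), N * L)  # rows: g^T
    return np.real(G.T @ G.conj()), np.real(G.T @ D.conj()), np.sum(np.abs(D) ** 2)

def cost(w, terms):
    Q, a, d = terms
    return w @ Q @ w - 2 * w @ a + d

# Monte Carlo samples: uniform gain in [0.85, 1.15], uniform phase in [-5, 10] degrees
samples = [quad_terms(rng.uniform(0.85, 1.15, N) * np.exp(-1j * np.deg2rad(rng.uniform(-5, 10, N))))
           for _ in range(50)]
mean_terms = tuple(sum(t[i] for t in samples) / len(samples) for i in range(3))

Q0, a0, _ = quad_terms(np.ones(N))                 # ideal microphones
w_nominal = np.linalg.lstsq(Q0, a0, rcond=None)[0]
w_robust = np.linalg.lstsq(mean_terms[0], mean_terms[1], rcond=None)[0]  # minimises the mean cost
print(cost(w_nominal, mean_terms), cost(w_robust, mean_terms))
```

The averaging acts like a data-dependent regularisation of Q. This is one way to see why the robust design avoids the large coefficients of a nominal design for a small array.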


(25)

Simulations

• Non-linear design procedure

• N = 3, positions [-0.01 0 0.015] m, L = 20, f_s = 8 kHz

• Passband: 0°-60°, 300-4000 Hz (endfire); stopband: 80°-180°, 300-4000 Hz

• Robust design (average performance): uniform pdf for gain (0.85-1.15) and phase (-5° to 10°)

• Specific deviation used for J_dev: gain [0.9 1.1 1.05], phase [5° -2° 5°]

| Design | J | J_dev | J_mean | J_max |
|---|---|---|---|---|
| Non-robust | 0.1585 | 87.131 | 275.40 | 3623.6 |
| Robust (average cost) | 0.2196 | 0.2219 | 0.3371 | 0.4990 |
| Robust (maximum cost) | 0.1707 | 0.1990 | 0.4114 | 0.4167 |


(26)

Simulations

[Figure: directivity patterns (dB) vs. angle (deg) and frequency (Hz) of the non-robust and robust designs, without deviations and with gain/phase deviations]


(27)

Simulations

[Figure: non-robust design vs. robust design]

(28)

Overview

• Introduction

• Basic principles

• Robust broadband beamforming

• Multi-microphone optimal filtering

GSVD-based optimal filtering technique

Reduction of computational complexity

Simulations

• Acoustic transfer function estimation and dereverberation

• Conclusion and further research

(29)

Multi-microphone optimal filtering

• Objective: optimal estimate of the speech components in the microphone signals, by minimising the MSE E{|x_n[k] - z[k]|²}; for all speech components simultaneously, with x[k] and y[k] the stacked speech-component and microphone-signal vectors:

$$\min_{\mathbf{W}} E\bigl\{\|\mathbf{x}[k] - \mathbf{z}[k]\|^2\bigr\} = \min_{\mathbf{W}} E\bigl\{\|\mathbf{x}[k] - \mathbf{W}^T\mathbf{y}[k]\|^2\bigr\}$$

• Multi-channel Wiener filter:

$$\mathbf{W}_{WF}[k] = \mathbf{R}_{yy}^{-1}[k]\,\mathbf{R}_{yx}[k] = \mathbf{R}_{yy}^{-1}[k]\bigl(\mathbf{R}_{yy}[k] - \mathbf{R}_{vv}[k]\bigr)$$

- Speech and noise assumed independent

- 2nd-order statistics of the noise assumed stationary → estimated during noise-only periods (VAD)

• Properties: multi-microphone, signal-dependent, robust; no a-priori assumptions
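The multi-channel Wiener filter can be estimated directly from data in the way the slide describes: R_vv is taken from noise-only periods and R_yy from periods with speech.

The NumPy sketch below makes these assumptions:

- an ideal VAD (speech switched on halfway through the signal);
- synthetic short acoustic paths and a white stand-in for speech;
- coloured, spatially correlated noise from a single filtered source plus a little sensor noise;
- stacked delay-line vectors of M = N·L samples.

```python
import numpy as np

rng = np.random.default_rng(2)
N, L, T = 2, 4, 20000
M = N * L

def stack(sig):
    """Rows k: [sig_0[k], ..., sig_0[k-L+1], sig_1[k], ...] (delay-line vectors, length M)."""
    cols = [np.roll(sig[n], l) for n in range(N) for l in range(L)]
    return np.array(cols).T[L:]          # drop the first L rows affected by wrap-around

s = rng.standard_normal(T) * (np.arange(T) >= T // 2)    # speech stand-in, on in the 2nd half
u = rng.standard_normal(T)                               # common noise source
h = rng.standard_normal((N, 3))                          # short acoustic paths (illustrative)
g = rng.standard_normal((N, 3))                          # noise paths (illustrative)
x = np.array([np.convolve(h[n], s)[:T] for n in range(N)])
v = np.array([np.convolve(g[n], u)[:T] for n in range(N)]) + 0.1 * rng.standard_normal((N, T))
y = x + v

Y, X = stack(y), stack(x)
k = np.arange(L, T)                    # time index of each stacked row
noise_only = k < T // 2                # ideal VAD (assumption)
speech = k >= T // 2 + L

R_vv = Y[noise_only].T @ Y[noise_only] / noise_only.sum()
R_yy = Y[speech].T @ Y[speech] / speech.sum()
W = np.linalg.solve(R_yy, R_yy - R_vv)                   # W_WF = R_yy^{-1} (R_yy - R_vv)

X_hat = Y[speech] @ W                                    # each row: x_hat^T = y^T W
mse_wf = np.mean((X[speech] - X_hat) ** 2)
mse_raw = np.mean((X[speech] - Y[speech]) ** 2)          # W = I: error equals the noise power
print(mse_wf, mse_raw)
```

`solve` is used instead of forming R_yy^{-1} explicitly, which is the numerically preferred way to apply an inverse.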


(30)

Multi-microphone optimal filtering

• Implementation procedure:

Based on the Generalised Eigenvalue Decomposition (GEVD):

– takes into account the low-rank model of speech

– trade-off between noise reduction and speech distortion

Lower-complexity alternatives: QRD-based [Rombouts 2002], subband [Spriet 2001]

• Generalised Eigenvalue Decomposition (GEVD), suited to coloured noise:

$$\mathbf{R}_{yy}[k] = \mathbf{Q}[k]\,\boldsymbol{\Lambda}_y[k]\,\mathbf{Q}^T[k], \qquad \mathbf{R}_{vv}[k] = \mathbf{Q}[k]\,\boldsymbol{\Lambda}_v[k]\,\mathbf{Q}^T[k]$$

with Λ_y[k] = diag{σ²_{i,y}[k]} and Λ_v[k] = diag{σ²_{i,v}[k]}.

Low-rank model:

$$\sigma_{i,y}^2[k] > \sigma_{i,v}^2[k], \quad i = 1,\dots,R; \qquad \sigma_{i,y}^2[k] \approx \sigma_{i,v}^2[k], \quad i = R+1,\dots,M$$

$$\mathbf{W}_{WF}[k] = \mathbf{Q}^{-T}[k]\,\mathrm{diag}\Bigl\{1 - \frac{\sigma_{i,v}^2[k]}{\sigma_{i,y}^2[k]}\Bigr\}\,\mathbf{Q}^T[k]$$

→ signal-dependent FIR filterbank


(31)

General class of estimators

• Multi-channel Wiener filter: always a combination of noise reduction and (linear) speech distortion; the estimation error is

$$\mathbf{e}[k] = \underbrace{\bigl(\mathbf{I}_M - \mathbf{W}_{WF}^T[k]\bigr)\mathbf{x}[k]}_{\text{speech distortion}} - \underbrace{\mathbf{W}_{WF}^T[k]\,\mathbf{v}[k]}_{\text{residual noise}}$$

• General class: trade-off between noise reduction and speech distortion [Ephraim 95]

$$\mathbf{W}_{\mu}[k] = \mathbf{Q}^{-T}[k]\,\mathrm{diag}\Bigl\{\frac{\sigma_{i,y}^2[k] - \sigma_{i,v}^2[k]}{\sigma_{i,y}^2[k] + (\mu - 1)\,\sigma_{i,v}^2[k]}\Bigr\}\,\mathbf{Q}^T[k]$$

– μ = 1: MMSE (equal importance)

– μ < 1: less speech distortion, less noise reduction

– μ > 1: more speech distortion, more noise reduction
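The GEVD-based expressions can be checked numerically. A GEVD with V^T R_vv V = I and V^T R_yy V = diag(λ_i) corresponds to Q = V^{-T}, σ²_{i,v} = 1 and σ²_{i,y} = λ_i. With that normalisation, the general class becomes W_μ = V diag{(λ_i − 1)/(λ_i + μ − 1)} V^{-1}, which equals (R_xx + μ R_vv)^{-1} R_xx.

The NumPy sketch below makes these assumptions:

- the GEVD is computed via Cholesky whitening rather than a dedicated GEVD routine;
- the correlation matrices are synthetic;
- the optional rank truncation keeps the R largest generalised eigenvalues, as in the low-rank model.

```python
import numpy as np

def gevd(R_yy, R_vv):
    """Return V, lam with V^T R_vv V = I and V^T R_yy V = diag(lam) (Cholesky whitening)."""
    C = np.linalg.cholesky(R_vv)                 # R_vv = C C^T
    Ci = np.linalg.inv(C)
    lam, U = np.linalg.eigh(Ci @ R_yy @ Ci.T)    # symmetric eigenproblem, ascending lam
    return Ci.T @ U, lam

def gevd_filter(R_yy, R_vv, mu=1.0, rank=None):
    """W_mu = Q^{-T} diag{(s_y - s_v) / (s_y + (mu - 1) s_v)} Q^T with Q = V^{-T}."""
    V, lam = gevd(R_yy, R_vv)
    gains = (lam - 1.0) / (lam + mu - 1.0)       # sigma_v^2 = 1, sigma_y^2 = lam
    if rank is not None:                          # low-rank model: keep the 'rank' largest lam
        gains[: len(lam) - rank] = 0.0
    return V @ np.diag(gains) @ np.linalg.inv(V)

rng = np.random.default_rng(3)
M, R = 8, 3
B = rng.standard_normal((M, R))
R_xx = B @ B.T                                   # rank-R speech correlation matrix
A = rng.standard_normal((M, M))
R_vv = A @ A.T + 0.1 * np.eye(M)                 # coloured (non-white) noise correlation matrix
R_yy = R_xx + R_vv

W1 = gevd_filter(R_yy, R_vv, mu=1.0)             # multi-channel Wiener filter (MMSE)
W2 = gevd_filter(R_yy, R_vv, mu=2.0, rank=R)     # more noise reduction, rank-R truncation
print(np.allclose(W1, np.linalg.solve(R_yy, R_xx)))
print(np.allclose(W2, np.linalg.solve(R_xx + 2.0 * R_vv, R_xx)))
```

With exact correlation matrices, the M − R smallest generalised eigenvalues equal 1, so truncating them does not change the filter. With estimated matrices, truncation discards the noise-only subspace.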


(32)

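Frequency-domain analysis

• Decomposition into a spectral and a spatial filtering term:

$$\mathbf{W}_{WF}(\omega) = \underbrace{\frac{P_x(\omega)}{P_x(\omega) + P_v(\omega)}}_{\text{spectral filtering (PSD)}}\;\underbrace{\boldsymbol{\Gamma}_y^{-1}(\omega)\,\boldsymbol{\Gamma}_x(\omega)\,\mathbf{e}_1}_{\text{spatial filtering (coherence)}}$$

with P_x(ω), P_v(ω) the power spectral densities of the speech and noise components, Γ_x(ω), Γ_y(ω) the spatial coherence matrices of the speech component and of the microphone signals, and e_1 the selection vector for the first microphone.

• Desired beamforming behaviour for simple scenarios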

(33)

Complexity reduction

• Recursive version: at each time step, calculation of the GSVD + filter → high computational complexity

• Complexity reduction using:

Recursive techniques for recomputing the GSVD [Moonen 90]

Sub-sampling (stationary acoustic environments)

| | Batch | Recursive | Recursive QRD [Rombouts] |
|---|---|---|---|
| Flops per update | 16M³ + 3M²(P+Q) | 20.5M² | 3.5M² |
| sub = 1 | 7504 Gflops | 2.1 Gflops | 358 Mflops |
| sub = 20 | 375 Gflops | 105 Mflops | 18 Mflops |

(N = 4, L = 20, M = 80, f_s = 16 kHz, P = 4000, Q = 20000)

→ Real-time implementation possible
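The table entries follow from the per-update operation counts multiplied by the update rate f_s/sub, as this short arithmetic check shows. It assumes that the counts in the table are flops per filter update and that P and Q take the values listed on the slide.

```python
M, fs = 80, 16000          # M = N*L = 4*20, sampling rate in Hz
P, Q = 4000, 20000         # values as given on the slide

flops_per_update = {
    "batch GSVD": 16 * M**3 + 3 * M**2 * (P + Q),
    "recursive GSVD": 20.5 * M**2,
    "recursive QRD": 3.5 * M**2,
}

def flops_per_second(name, sub):
    """Operation count per second when the filter is updated every 'sub' samples."""
    return flops_per_update[name] * fs / sub

for sub in (1, 20):
    for name in flops_per_update:
        print(f"sub={sub:2d} {name:15s} {flops_per_second(name, sub) / 1e6:12.1f} Mflops")
```

For example, the recursive QRD needs 3.5 · 80² = 22400 flops per update. At 16000 updates per second this gives 358.4 Mflops, and sub-sampling by 20 reduces it to about 18 Mflops.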


(34)

Complexity reduction Complexity reduction

Complexity reduction

• Incorporation in the 'Generalised Sidelobe Canceller' (GSC) structure: adaptive beamforming

Creation of a speech reference and noise reference signals

Standard multi-channel adaptive filter (LMS, APA)

[Figure: GSC structure. The microphone signals y_0[k], y_1[k], …, y_{N-1}[k] are filtered by the optimal filter w_0[k], w_1[k], …, w_{N-1}[k] to create the speech reference and the noise reference(s); the adaptive filter w_a[k] filters the noise references, and its output is subtracted from the delayed speech reference]

→ Increased noise reduction performance

→ Complexity reduction by using shorter filters
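The adaptive part of the GSC can be illustrated with a normalised LMS (NLMS) filter, a normalised variant of the LMS algorithm named on the slide. The sketch below assumes NumPy and uses synthetic signals: a hypothetical leakage filter through which noise reaches the speech reference, white noise, and a white stand-in for speech. The optional `adapt` mask is a placeholder for a VAD that would typically freeze adaptation during speech. None of this is the author's implementation.

```python
import numpy as np

def nlms_anc(d, U, L_anc=16, mu=0.1, eps=1e-6, adapt=None):
    """Multi-channel NLMS noise canceller: e[k] = d[k] - w^T u[k].

    d : (T,) delayed speech reference; U : (C, T) noise references.
    adapt : optional boolean mask (T,); adaptation only where True (e.g. VAD says noise only).
    Returns the enhanced output e (T,). Filter length per channel is L_anc.
    """
    C, T = U.shape
    w = np.zeros(C * L_anc)
    e = np.zeros(T)
    for k in range(L_anc - 1, T):
        u = U[:, k - L_anc + 1 : k + 1][:, ::-1].ravel()   # newest sample first, per channel
        e[k] = d[k] - w @ u
        if adapt is None or adapt[k]:
            w += mu * e[k] * u / (u @ u + eps)               # normalised LMS update
    return e

rng = np.random.default_rng(4)
T = 20000
s = rng.standard_normal(T)                                  # speech component of the speech reference
n = rng.standard_normal(T)                                  # noise source
leak = np.convolve(n, [0.0, 0.0, 0.8, 0.3, -0.2])[:T]       # hypothetical noise leakage
d = s + leak
U = np.vstack([n, np.convolve(n, [0.5, 0.4])[:T] + 0.1 * rng.standard_normal(T)])
e = nlms_anc(d, U)

before = np.mean((d - s)[T // 2:] ** 2)                     # residual noise without ANC
after = np.mean((e - s)[T // 2:] ** 2)                      # residual noise after convergence
print(before, after)
```

Here the leakage can be cancelled exactly by a causal filter on the first noise reference. In practice, the delay in the speech path is what makes such a causal solution available.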


(35)

Simulations

• N = 4, SNR = 0 dB, 3 noise sources (white, speech, music), f_s = 16 kHz

• Performance: improvement of the signal-to-noise ratio (SNR)

[Figure: unbiased SNR (dB) vs. reverberation time (msec) for a delay-and-sum beamformer, a GSC (L_ANC = 400, Griffiths-Jim noise reference), recursive GSVD (L = 20, L_ANC = 400, all noise references) and recursive GSVD (L = 20, no ANC)]


(36)

Simulations

• N = 4, SNR = 0 dB, 3 noise sources, f_s = 16 kHz, T60 = 300 msec

• 'Power Transfer Functions' (PTF) for the speech and noise components

[Figure: spectrum (dB) of the speech and noise PTFs for recursive GSVD (L = 20, no ANC) and recursive GSVD (L = 20, L_ANC = 400, all noise references)]


(37)

Conclusions

• GSVD-based optimal filtering technique:

Multi-microphone extension of single-microphone subspace-based enhancement techniques

Signal-dependent → low-rank model of speech

No a-priori assumptions about the position of the speaker and the microphones

• SNR improvement higher than the GSC for all reverberation times and all considered acoustic scenarios

• More robust to deviations from the signal model:

Microphone characteristics

Position of the speaker

VAD (the only a-priori information required!)

– No effect on the SNR improvement

– Limited effect on speech distortion


(38)

Advantages - Disadvantages

| | Fixed beamforming | Adaptive beamforming | Optimal filtering |
|---|---|---|---|
| Signal-dependent | no | yes | yes |
| Noise reduction | + | ++ | +++ |
| Dereverberation | + | + | no |
| Complexity | low | average | high |
| VAD | no | yes | yes |
| Robustness | - (+) | -- (+) | ++ |


(39)

Overview

• Introduction

• Basic principles

• Robust broadband beamforming

• Multi-microphone optimal filtering

• Acoustic transfer function estimation and dereverberation

Time-domain technique

Frequency-domain technique

• Conclusion and further research

(40)

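Objective

[Figure: speech signal filtered by the acoustic impulse responses h_n[k] into the microphone signals y_0[k], y_1[k], …, y_{N-1}[k], which are filtered by w_0[k], w_1[k], …, w_{N-1}[k] and summed into z[k]]

• Blind estimation of the acoustic impulse responses: time-domain and frequency-domain techniques

• Applications: source localisation, dereverberation, combined noise reduction and dereverberation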


(41)

Time-domain techniques

• Signal model for N = 2 and no background noise: Y_0(z) = H_0(z)S(z), Y_1(z) = H_1(z)S(z)

• Subspace-based technique: the impulse responses can be computed from the null space of the speech correlation matrix R_yy[k], since

E(z) = H_1(z)Y_0(z) - H_0(z)Y_1(z) = 0

Null-space vector ±α[h_1; -h_0]: eigenvector corresponding to the smallest eigenvalue

Coloured noise: GEVD

• Problems occurring in the time-domain technique:

– sensitivity to underestimation of the impulse response length

– low-rank model in combination with background noise
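The null-space argument can be reproduced numerically. The data vector y[k] stacks K delayed samples of each microphone signal, and u = [h_1; −h_0] satisfies u^T y[k] = (h_1 ⊗ y_0)[k] − (h_0 ⊗ y_1)[k] = 0. Hence u spans the null space of R_yy.

The NumPy sketch below makes these assumptions:

- random impulse responses without common zeros;
- white excitation and no noise;
- an impulse response length K that is known exactly (the slide notes sensitivity to underestimating it).

```python
import numpy as np

rng = np.random.default_rng(5)
K, T = 8, 20000                                  # impulse response length (assumed known), samples
h0, h1 = rng.standard_normal(K), rng.standard_normal(K)
s = rng.standard_normal(T)
y0, y1 = np.convolve(h0, s)[:T], np.convolve(h1, s)[:T]   # N = 2, no background noise

def delay_line(sig, L):
    """Rows k: [sig[k], sig[k-1], ..., sig[k-L+1]] for k = L-1, ..., T-1."""
    return np.array([sig[k - L + 1 : k + 1][::-1] for k in range(L - 1, len(sig))])

Y = np.hstack([delay_line(y0, K), delay_line(y1, K)])     # stacked data vectors y[k], length 2K
R_yy = Y.T @ Y / len(Y)
lam, E = np.linalg.eigh(R_yy)                             # ascending eigenvalues
u = E[:, 0]                                               # eigenvector of the smallest eigenvalue

# The null space is spanned by [h1; -h0], up to a scalar alpha
ref = np.concatenate([h1, -h0])
similarity = abs(u @ ref) / np.linalg.norm(ref)           # |cos| of the angle between u and ref
print(lam[0] / lam[1], similarity)
```

If K is chosen too small, no exact null vector exists. If it is chosen too large, the null space becomes multi-dimensional, which illustrates the sensitivity mentioned above.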


(42)

Stochastic gradient algorithm

• Batch estimation techniques form the basis for deriving an adaptive stochastic gradient algorithm:

$$\min_{\mathbf{u}}\; \mathbf{u}^T \mathbf{R}_{yy}[k]\,\mathbf{u} \quad \text{subject to} \quad \mathbf{u}^T \mathbf{R}_{vv}[k]\,\mathbf{u} = 1$$

$$e[k] = \mathbf{u}^T[k]\,\mathbf{y}[k], \qquad \tilde{\mathbf{u}}[k+1] = \mathbf{u}[k] - \mu\, e[k]\,\mathbf{y}[k], \qquad \mathbf{u}[k+1] = \frac{\tilde{\mathbf{u}}[k+1]}{\sqrt{\tilde{\mathbf{u}}^T[k+1]\,\mathbf{R}_{vv}[k]\,\tilde{\mathbf{u}}[k+1]}}$$

• Usage:

Estimation of partial impulse responses → time-delay estimation for acoustic source localisation

For source localisation, the adaptive GEVD algorithm is more robust than the adaptive EVD algorithm (with prewhitening) in …


(43)

Frequency-domain techniques

• Problems of the time-domain technique → frequency-domain technique

• Signal model: rank-1 model

$$\mathbf{Y}(\omega) = \begin{bmatrix} Y_0(\omega) \\ \vdots \\ Y_{N-1}(\omega) \end{bmatrix} = \begin{bmatrix} H_0(\omega) \\ \vdots \\ H_{N-1}(\omega) \end{bmatrix} S(\omega) + \begin{bmatrix} V_0(\omega) \\ \vdots \\ V_{N-1}(\omega) \end{bmatrix} = \mathbf{H}(\omega)\,S(\omega) + \mathbf{V}(\omega)$$

• Estimation of the acoustic transfer function vector H(ω) from the GEVD of the correlation matrices R_yy(ω) and R_vv(ω):

Corresponding to the largest generalised eigenvalue → no stochastic gradient algorithm available (yet)

Unknown scaling factor in each frequency bin:

→ can be determined only if the norm ‖H(ω)‖ is known

→ algorithm only useful when the position of the source is fixed (e.g. desktop, car)
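Under the rank-1 model, R_yy(ω) = P_s(ω) H(ω)H^H(ω) + R_vv(ω). This gives R_vv^{-1} R_yy = I + P_s R_vv^{-1} H H^H. Its principal eigenvector is proportional to R_vv^{-1} H, so R_vv times the principal generalised eigenvector recovers H up to an unknown complex scale.

The NumPy sketch below works on a single frequency bin and makes these assumptions:

- synthetic H and R_vv;
- a plain (non-Hermitian) eigendecomposition of R_vv^{-1} R_yy;
- a known norm, used to fix the magnitude of the scale only; a common phase factor is left unresolved here.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 4
H = rng.standard_normal(N) + 1j * rng.standard_normal(N)      # unknown transfer functions in one bin
P_s = 2.0                                                      # speech PSD in this bin (assumed)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R_vv = A @ A.conj().T + 0.1 * np.eye(N)                         # coloured, spatially correlated noise
R_yy = P_s * np.outer(H, H.conj()) + R_vv                      # rank-1 speech model + noise

# Principal generalised eigenvector of (R_yy, R_vv): R_yy v = lam R_vv v
lam, V = np.linalg.eig(np.linalg.solve(R_vv, R_yy))
v = V[:, np.argmax(lam.real)]
H_est = R_vv @ v                                               # proportional to H

# Magnitude of the unknown scaling fixed from the (assumed known) norm of H
H_est = H_est * (np.linalg.norm(H) / np.linalg.norm(H_est))
collinearity = abs(np.vdot(H_est, H)) / (np.linalg.norm(H_est) * np.linalg.norm(H))
print(collinearity)
```

In practice, R_yy(ω) and R_vv(ω) would be estimated per frequency bin from short-time spectra during speech and noise-only periods. The estimate is then only as good as the VAD and the stationarity of the noise.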

(44)

Combined noise reduction and dereverberation

• Filtering operation in the frequency domain:

$$Z(\omega) = \mathbf{W}^H(\omega)\,\mathbf{Y}(\omega) = \underbrace{\mathbf{W}^H(\omega)\,\mathbf{H}(\omega)}_{F(\omega)}\,S(\omega) + \underbrace{\mathbf{W}^H(\omega)\,\mathbf{V}(\omega)}_{\text{residual noise}}$$

• Dereverberation: F(ω) = 1 → normalised matched filter

$$\mathbf{W}_d(\omega) = \frac{\mathbf{H}(\omega)}{\|\mathbf{H}(\omega)\|^2}$$

• Combined noise reduction and dereverberation:

Z(ω) is the optimal (MMSE) estimate of S(ω)

Optimal estimate of s[k] → integration of the multi-channel Wiener filter with the normalised matched filter, applied to the Wiener estimate X̂(ω) of the speech components:

$$\hat{S}(\omega) = \mathbf{W}_d^H(\omega)\,\hat{\mathbf{X}}(\omega)$$

Trade-off between both objectives
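The combination can be verified for a single frequency bin:

- The normalised matched filter gives F(ω) = W_d^H H = 1.
- The multi-channel Wiener filter for the speech components is W_WF = R_yy^{-1} R_xx, with R_xx = P_s H H^H.
- Applying W_d to its output corresponds to W = W_WF W_d = P_s R_yy^{-1} H, which is the MMSE estimator of S(ω).

The NumPy sketch below assumes synthetic H, P_s and R_vv, and uses the Z = W^H Y convention from the slide.

```python
import numpy as np

rng = np.random.default_rng(7)
N, P_s = 4, 1.5
H = rng.standard_normal(N) + 1j * rng.standard_normal(N)       # acoustic transfer functions (one bin)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R_vv = A @ A.conj().T + 0.1 * np.eye(N)                          # noise correlation matrix
R_xx = P_s * np.outer(H, H.conj())                              # rank-1 speech correlation matrix
R_yy = R_xx + R_vv

# Dereverberation: normalised matched filter, F = W_d^H H = 1
W_d = H / np.vdot(H, H).real
F_d = np.vdot(W_d, H)

# Noise reduction: multi-channel Wiener filter for the speech components X = H S
W_WF = np.linalg.solve(R_yy, R_xx)

# Combination W = W_WF W_d versus the MMSE estimator of S: P_s R_yy^{-1} H
W = W_WF @ W_d
W_mmse = P_s * np.linalg.solve(R_yy, H)
print(F_d, np.max(np.abs(W - W_mmse)))
```

The combined filter no longer satisfies F(ω) = 1 exactly. Its spectral Wiener gain trades some dereverberation for noise reduction, which matches the trade-off stated on the slide.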

(45)

Simulations

• N = 4, d = 2 cm, f_s = 16 kHz, SNR = 0 dB, T60 = 400 msec

• FFT size L = 1024, overlap R = 16

• Performance criteria:

Signal-to-noise ratio (SNR)

Dereverberation index (DI): computed from 20 log10|W^H(ω)H(ω)|, the log-magnitude of the total transfer function F(ω), over frequency; 0 dB corresponds to perfect dereverberation (F(ω) = 1)

| Method | SNR (dB) | DI (dB) |
|---|---|---|
| Original microphone signal | 2.88 | 4.74 |
| Noise reduction | 16.82 | 4.73 |
| Dereverberation | 2.30 | 0.86 |
| Combined noise reduction and dereverberation | 10.12 | 1.35 |


(46)

Simulations
