Theoretical analysis of binaural cue preservation using Multi-Channel Wiener Filtering and Interaural Transfer Functions
S. Doclo, T.J. Klasen, T. Van den Bogaert, J. Wouters, M. Moonen
Katholieke Universiteit Leuven, Belgium - Dept. Electrical Engineering (SCD), ExpORL
1 Binaural hearing aids
• Hearing impairment → reduction of speech intelligibility in background noise
– signal processing to selectively enhance/extract useful speech signal
– multiple microphones available: spectral + spatial processing
– many hearing impaired are fitted with a hearing aid at both ears
• Binaural auditory cues:
– Interaural Time Difference (ITD) and Interaural Level Difference (ILD)
– binaural cues, in addition to spectral and temporal cues, play an important role in binaural noise reduction and sound localization
• Bilateral system: independent processing → binaural cues are not preserved
• Binaural system: cooperation between left and right hearing aid
[Figure: hearing aid user with microphone signals Y0,0(ω) · · · Y0,M0−1(ω) and Y1,0(ω) · · · Y1,M1−1(ω), filters W0(ω) and W1(ω), and output signals Z0(ω) and Z1(ω)]
1. SNR improvement: noise reduction, limit speech distortion
2. Preservation of binaural cues to exploit binaural hearing advantage
3. No assumptions about position of speech source and microphones
2 Binaural multi-channel Wiener filter
Multi-channel Wiener filter (MWF): MMSE estimate of speech component in microphone signal at both ears
– Speech-distortion-weighted multi-channel Wiener filter (SDW-MWF): trade-off between noise reduction and speech distortion [Doclo 2002, Spriet 2004]
– Binaural cue preservation of speech + noise:
- partial estimation of noise component [Klasen 2005]
- extension with ITD-ILD or Interaural Transfer Function (ITF) [Doclo 2005, Klasen 2006]
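• Configuration: microphone array at left and right hearing aid
$$Y_{0,m}(\omega) = X_{0,m}(\omega) + V_{0,m}(\omega), \quad m = 0 \ldots M_0 - 1$$
$$\mathbf{Y}(\omega) = \left[\, Y_{0,0}(\omega) \;\ldots\; Y_{0,M_0-1}(\omega) \;\; Y_{1,0}(\omega) \;\ldots\; Y_{1,M_1-1}(\omega) \,\right]^T = \mathbf{X}(\omega) + \mathbf{V}(\omega)$$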
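• Cooperation between hearing aids: use all available microphone signals to generate output signal at both ears → computation of filters $\mathbf{W}_0(\omega)$ and $\mathbf{W}_1(\omega)$
$$Z_0(\omega) = \mathbf{W}_0^H(\omega)\mathbf{Y}(\omega), \quad Z_1(\omega) = \mathbf{W}_1^H(\omega)\mathbf{Y}(\omega), \quad \mathbf{W}(\omega) = \begin{bmatrix} \mathbf{W}_0(\omega) \\ \mathbf{W}_1(\omega) \end{bmatrix}$$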
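• SDW-MWF: estimate speech component in microphone signal at both ears; additional trade-off between noise reduction and speech distortion
$$J_{SDW}(\mathbf{W}) = E\left\{ \left\| \begin{bmatrix} X_{0,r_0} - \mathbf{W}_0^H\mathbf{X} \\ X_{1,r_1} - \mathbf{W}_1^H\mathbf{X} \end{bmatrix} \right\|^2 + \mu \left\| \begin{bmatrix} \mathbf{W}_0^H\mathbf{V} \\ \mathbf{W}_1^H\mathbf{V} \end{bmatrix} \right\|^2 \right\} \;\Rightarrow\; \mathbf{W}_{SDW} = \mathbf{R}^{-1}\mathbf{r}$$
$$\mathbf{R} = \begin{bmatrix} \mathbf{R}_x + \mu\mathbf{R}_v & \mathbf{0}_M \\ \mathbf{0}_M & \mathbf{R}_x + \mu\mathbf{R}_v \end{bmatrix}, \quad \mathbf{r} = \begin{bmatrix} \mathbf{r}_{x0} \\ \mathbf{r}_{x1} \end{bmatrix}, \quad \mathbf{R}_x = \mathbf{R}_y - \mathbf{R}_v$$
– estimate $\mathbf{R}_y$ during speech-dominated segments and $\mathbf{R}_v$ during noise-dominated segments → robust VAD required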
– no assumptions about positions of microphones and sources
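As a minimal sketch of the closed-form solution above, the NumPy snippet below computes $\mathbf{W}_0$ and $\mathbf{W}_1$ for a single frequency bin. Because $\mathbf{R}$ is block diagonal, each filter can be solved for separately. The function name `sdw_mwf`, the toy statistics, and the reading of $\mathbf{r}_{x0}$, $\mathbf{r}_{x1}$ as the columns of $\mathbf{R}_x$ that belong to the reference microphones are assumptions made for illustration; in practice $\mathbf{R}_y$ and $\mathbf{R}_v$ would be estimated from VAD-labelled segments.

```python
import numpy as np

def sdw_mwf(R_y, R_v, ref0, ref1, mu=1.0):
    """Return (W0, W1) minimising J_SDW for one frequency bin.

    R_y, R_v   : (M, M) Hermitian correlation matrices of noisy signal and noise.
    ref0, ref1 : indices of the reference microphones r0, r1 in the stacked vector Y.
    """
    R_x = R_y - R_v                      # speech correlation matrix, R_x = R_y - R_v
    lhs = R_x + mu * R_v                 # one diagonal block of R
    # Assumption: r_xi = E{X X_i^*}, i.e. the column of R_x at the reference microphone.
    W0 = np.linalg.solve(lhs, R_x[:, ref0])
    W1 = np.linalg.solve(lhs, R_x[:, ref1])
    return W0, W1

# Toy example with made-up statistics (M0 = M1 = 2, so M = 4).
rng = np.random.default_rng(0)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)    # hypothetical speech transfer vector
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_v = B @ B.conj().T + 0.1 * np.eye(M)                       # Hermitian positive definite noise matrix
R_y = np.outer(a, a.conj()) + R_v                            # unit-power speech source plus noise
W0, W1 = sdw_mwf(R_y, R_v, ref0=0, ref1=2, mu=1.0)
```

Solving the two $M \times M$ systems separately avoids forming the full $2M \times 2M$ matrix $\mathbf{R}$; the result is the same because the off-diagonal blocks are zero.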
3 Theoretical analysis
• Performance measures:
– SNR improvement (left/right): difference between input and output SNR
– ITD error (speech/noise): phase of cross-correlation
– ILD error (speech/noise): power ratio
• Single speech source, no assumptions about noise field:
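– $\mathbf{X} = \mathbf{A}S$ with $\mathbf{A}$ acoustic transfer function vector (head, microphones, room)
$$\mathbf{W}_{SDW,0} = \frac{\mathbf{R}_v^{-1}\mathbf{A}}{\mathbf{A}^H\mathbf{R}_v^{-1}\mathbf{A} + \frac{\mu}{P_s}}\, A^*_{0,r_0}, \quad \mathbf{W}_{SDW,1} = \frac{\mathbf{R}_v^{-1}\mathbf{A}}{\mathbf{A}^H\mathbf{R}_v^{-1}\mathbf{A} + \frac{\mu}{P_s}}\, A^*_{1,r_1}$$
- ITD/ILD of speech component is perfectly preserved
- ITD/ILD of output noise component = ITD/ILD of speech component!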
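The two observations above can be checked numerically. The sketch below evaluates the closed-form SDW-MWF filters for a toy scenario and compares the interaural transfer function (cross-correlation divided by the power at the second output) of the speech and noise components before and after filtering. The transfer vectors `A` and `g`, the choice of reference microphones, the reading of $P_s$ as the speech power, and the helper names `output_itf` and `output_ild` are all hypothetical choices for illustration, not values from the poster.

```python
import numpy as np

def output_itf(W0, W1, R):
    """E{Z0 Z1^*} / E{|Z1|^2} for a component with correlation matrix R."""
    return (W0.conj() @ R @ W1) / (W1.conj() @ R @ W1).real

def output_ild(W0, W1, R):
    """Power ratio E{|Z0|^2} / E{|Z1|^2} for a component with correlation matrix R."""
    return (W0.conj() @ R @ W0).real / (W1.conj() @ R @ W1).real

# Hypothetical transfer vectors (M0 = M1 = 2); reference microphones r0 = 0, r1 = 2.
A = np.array([1.0, 0.9 * np.exp(-0.3j), 0.5 * np.exp(0.8j), 0.45 * np.exp(0.5j)])
g = np.array([0.6, 0.6 * np.exp(-0.2j), 1.0 * np.exp(-1.0j), 1.0 * np.exp(-1.3j)])
P_s, mu, r0, r1 = 1.0, 1.0, 0, 2
R_x = P_s * np.outer(A, A.conj())                  # single speech source, X = A S
R_v = np.outer(g, g.conj()) + 0.01 * np.eye(4)     # localized noise plus sensor noise

# Closed-form SDW-MWF filters for a single speech source.
q = np.linalg.solve(R_v, A)                        # R_v^{-1} A
denom = np.vdot(A, q).real + mu / P_s              # A^H R_v^{-1} A + mu / P_s
W0 = q * np.conj(A[r0]) / denom
W1 = q * np.conj(A[r1]) / denom

itf_speech_in = A[r0] / A[r1]
itf_noise_in = R_v[r0, r1] / R_v[r1, r1]
itf_speech_out = output_itf(W0, W1, R_x)
itf_noise_out = output_itf(W0, W1, R_v)

print("speech ITF in/out:", itf_speech_in, itf_speech_out)
print("noise  ITF in/out:", itf_noise_in, itf_noise_out)
```

Both filters are scaled copies of the same vector $\mathbf{R}_v^{-1}\mathbf{A}$, so every component at the output inherits the ratio $A_{0,r_0}/A_{1,r_1}$: the speech cues are kept and the noise cues are replaced by the speech cues.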
4 Extension with Interaural Transfer Function
• Control binaural cues of noise (and speech) component
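• Interaural Transfer Function (ITF): incorporates both ITD and ILD
– assumption: single localized noise source (constant ITF)
$$ITF_v^{des} = \frac{V_{0,r_0}}{V_{1,r_1}} = \frac{E\{V_{0,r_0}V^*_{1,r_1}\}}{E\{V_{1,r_1}V^*_{1,r_1}\}}, \quad ITF_v^{out}(\mathbf{W}) = \frac{\mathbf{W}_0^H\mathbf{V}}{\mathbf{W}_1^H\mathbf{V}}$$
$$J_{ITF,v}(\mathbf{W}) = E\left\{ \left| \frac{\mathbf{W}_0^H\mathbf{V}}{\mathbf{W}_1^H\mathbf{V}} - ITF_v^{des} \right|^2 \right\} = \frac{E\{|\mathbf{W}_0^H\mathbf{V} - ITF_v^{des}\,\mathbf{W}_1^H\mathbf{V}|^2\}}{E\{|\mathbf{W}_1^H\mathbf{V}|^2\}} = \frac{\mathbf{W}^H\mathbf{R}_{vt}\mathbf{W}}{\mathbf{W}^H\mathbf{R}_{v1}\mathbf{W}}$$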
• Total cost function: noise reduction, speech distortion, cue preservation
$$J_{tot}(\mathbf{W}) = J_{SDW}(\mathbf{W}) + \alpha J_{ITF,x}(\mathbf{W}) + \beta J_{ITF,v}(\mathbf{W})$$
– subtle difference with quadratic ITF cost function in [Klasen, ICASSP 2006]
– no closed-form expression → iterative optimization techniques
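To make the cost terms concrete, the sketch below evaluates the ITF cost as a ratio of quadratic forms and combines it with the SDW cost into the total cost that an iterative optimizer would minimize. The poster does not spell out $\mathbf{R}_{vt}$ and $\mathbf{R}_{v1}$; the block matrices used here follow from expanding $E\{|\mathbf{W}_0^H\mathbf{V} - ITF_v^{des}\mathbf{W}_1^H\mathbf{V}|^2\}$ and $E\{|\mathbf{W}_1^H\mathbf{V}|^2\}$ and should be read as an assumption. The desired speech ITF is defined by analogy with the noise ITF, and all function names and toy values are hypothetical.

```python
import numpy as np

def itf_cost(W0, W1, R, itf_des):
    """Ratio W^H R_t W / W^H R_1 W for a component with correlation matrix R."""
    M = R.shape[0]
    W = np.concatenate([W0, W1])
    Z = np.zeros((M, M))
    # Assumed expansion of E{|W0^H C - itf_des * W1^H C|^2} as a quadratic form in W.
    R_t = np.block([[R, -np.conj(itf_des) * R],
                    [-itf_des * R, abs(itf_des) ** 2 * R]])
    R_1 = np.block([[Z, Z], [Z, R]])       # picks out E{|W1^H C|^2}
    return (W.conj() @ R_t @ W).real / (W.conj() @ R_1 @ W).real

def sdw_cost(W0, W1, R_x, R_v, r0, r1, mu):
    """J_SDW: speech distortion plus mu times residual noise power, summed over both ears."""
    cost = 0.0
    for W, r in ((W0, r0), (W1, r1)):
        distortion = (R_x[r, r] - 2 * np.vdot(W, R_x[:, r]) + np.vdot(W, R_x @ W)).real
        noise = np.vdot(W, R_v @ W).real
        cost += distortion + mu * noise
    return cost

def total_cost(W0, W1, R_x, R_v, r0, r1, mu, alpha, beta):
    """J_tot = J_SDW + alpha * J_ITF,x + beta * J_ITF,v."""
    itf_x_des = R_x[r0, r1] / R_x[r1, r1]  # desired speech ITF (assumed analogous to noise)
    itf_v_des = R_v[r0, r1] / R_v[r1, r1]  # desired noise ITF
    return (sdw_cost(W0, W1, R_x, R_v, r0, r1, mu)
            + alpha * itf_cost(W0, W1, R_x, itf_x_des)
            + beta * itf_cost(W0, W1, R_v, itf_v_des))

# Toy statistics (hypothetical transfer vectors, M = 4, references r0 = 0, r1 = 2).
A = np.array([1.0, 0.9 * np.exp(-0.3j), 0.5 * np.exp(0.8j), 0.45 * np.exp(0.5j)])
g = np.array([0.6, 0.6 * np.exp(-0.2j), 1.0 * np.exp(-1.0j), 1.0 * np.exp(-1.3j)])
R_x = np.outer(A, A.conj())
R_v = np.outer(g, g.conj()) + 0.01 * np.eye(4)

# Starting point for an iterative search: the SDW-MWF solution (alpha = beta = 0).
W0 = np.linalg.solve(R_x + R_v, R_x[:, 0])
W1 = np.linalg.solve(R_x + R_v, R_x[:, 2])
print("J_tot at SDW-MWF solution:", total_cost(W0, W1, R_x, R_v, 0, 2, 1.0, 1.0, 1.0))
```

A function like `total_cost` could be handed to a general-purpose numerical optimizer after splitting $\mathbf{W}$ into real and imaginary parts; the SDW-MWF solution is a natural initial value because it minimizes the first term exactly.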
5 Simulation results
• Investigate effect of α and β on noise reduction and cue preservation
• Data model:
– one speech source + one noise source, non-reverberant environment
– head shadow effect → HRTFs (equal for microphones on same hearing aid)
– sensor noise: $\mathbf{R}_v(\omega) = P_v(\omega)\left[\, \mathbf{g}(\omega, \theta_v)\,\mathbf{g}^H(\omega, \theta_v) + \delta\,\mathbf{I}_M \,\right]$
• Simulation parameters:
– speech source at $-5^\circ$ and noise source at $40^\circ$
– 2-microphone array ($d_0 = 2$ cm, $d_1 = 1.5$ cm)
– $f = 2$ kHz, $f_s = 16$ kHz, SNR = 0 dB, $\delta = 0.01$ (sensor noise −20 dB), $\mu = 1$
• Conclusions:
– Increasing β substantially decreases ITD/ILD error of noise component, but also decreases SNR improvement
– α can be used for reducing ITD/ILD error of speech component caused by increasing β
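The noise correlation matrix of the data model above can be sketched in a few lines. The transfer vector `g` below is a hypothetical unit-magnitude placeholder (the poster derives it from HRTFs), and the function name `noise_correlation` is likewise an assumption. With unit-magnitude entries in `g`, the sensor-noise level relative to the localized noise source is $10\log_{10}\delta$, which for $\delta = 0.01$ gives the −20 dB quoted in the simulation parameters.

```python
import numpy as np

def noise_correlation(g, P_v=1.0, delta=0.01):
    """R_v = P_v * (g g^H + delta * I_M): localized noise source plus uncorrelated sensor noise."""
    M = g.shape[0]
    return P_v * (np.outer(g, g.conj()) + delta * np.eye(M))

# Hypothetical unit-magnitude transfer vector for M = 4 microphones (placeholder for HRTFs).
g = np.exp(-1j * np.array([0.0, 0.3, 1.0, 1.2]))
R_v = noise_correlation(g, P_v=1.0, delta=0.01)

# Sensor-noise level relative to the localized noise source, in dB.
sensor_noise_db = 10 * np.log10(0.01)
print(sensor_noise_db)
```

The $\delta\mathbf{I}_M$ term also keeps $\mathbf{R}_v$ invertible, which the closed-form filters require since they involve $\mathbf{R}_v^{-1}$.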
[Figures: ITD error speech [%], ITD error noise [%], and Average ∆SNR [dB], each plotted as a function of α and β; polar plot with angles from 0° to 330°]