Improved Signal Processing in Hearing Aids:
A System Approach
Kim Ngo
ESAT-SCD, Katholieke Universiteit Leuven, Belgium
EST-SIGNAL Meeting September 2009.
Outline
◮ Introduction (hearing aids, hearing loss, acoustic feedback, background noise)
◮ Problem statement and motivation
◮ Multi-Channel Wiener Filter (MWF) based Noise Reduction (NR)
◮ SDW-MWF based approach to integrate NR and DRC
◮ Adaptive Feedback Cancellation (AFC)
◮ Conclusion
◮ Publications
◮ Timeline
Introduction
Research areas in Hearing Aids
[Figure: overview of research areas in hearing aids — binaural processing, wireless communications, AD/DA converters, digital and analog signal processing, automatic sound classification, single- and multi-channel noise reduction, directional microphones, loudspeakers, speech/audio coding, source localisation, beamforming, source separation, active noise control, filterbank design, dereverberation, automatic speech recognition, feedback cancellation, and dynamic range compression]
Introduction
Sensorineural hearing loss
[Figure: audiogram — hearing level (dB) vs. frequency (Hz), with regions for normal hearing and mild, moderate, severe, and profound hearing loss]
Dynamic Range
[Figure: loudness range from the hearing threshold to uncomfortable loudness, for normal and sensorineural hearing]
◮ The hearing threshold increases with increasing frequency.
◮ The threshold of hearing is raised as a result of the hearing loss.
◮ The threshold of loudness discomfort remains the same.
◮ Reduced dynamic range (threshold to discomfort).
Introduction
Dynamic Range Compression
◮ Audibility is an important first step in improving the intelligibility of a speech signal.
[Figure: DRC input-output characteristic — output SPL (dB) vs. input SPL (dB), with compression threshold (CT), compression ratio (CR), attack and release times, and a critical-band gain model adjusting the input spectrum]
◮ Automatically adjusts the gain based on the intensity level.
◮ High-intensity sounds are attenuated; low-intensity sounds are amplified.
◮ Compression Threshold (CT): point at which the slope changes.
◮ Compression Ratio (CR): steepness of the slope.
◮ Amplification gain G dB.
Objective:
◮ Map the wide dynamic range of speech into the reduced dynamic range.
◮ Make weak sounds audible without making loud sounds uncomfortably loud.
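The static compression curve above can be sketched in a few lines (a minimal illustration, not the hearing-aid implementation; the function name and the CR value are hypothetical, while CT = 30 dB and G = 30 dB follow the simulation settings used later):

```python
import numpy as np

def drc_gain_db(level_db, ct_db=30.0, cr=3.0, gain_db=30.0):
    """Static DRC curve: full gain below the compression threshold CT,
    slope 1/CR (output dB per input dB) above it."""
    over = np.maximum(np.asarray(level_db, dtype=float) - ct_db, 0.0)
    # Each input dB above CT yields only 1/CR output dB, so the applied
    # gain shrinks by (1 - 1/CR) dB per dB above the threshold.
    return gain_db - over * (1.0 - 1.0 / cr)

print(drc_gain_db(20.0))   # below CT: full 30 dB gain
print(drc_gain_db(80.0))   # 50 dB above CT with CR=3: 30 - 50*(2/3) ≈ -3.3 dB
```

A soft 20 dB SPL input receives the full gain, while a loud 80 dB SPL input is compressed, which is exactly the mapping of a wide input range into a reduced output range.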
Introduction
Acoustic Feedback
[Figure: hearing aid signal model — forward path G, acoustic feedback path F, with microphone signal, loudspeaker signal, feedback signal, and near-end signal]
◮ Undesired acoustic coupling between loudspeaker and microphone.
◮ Limits the maximum amplification.
◮ Feedback is most severe at high frequencies.
◮ Instability results in a high-frequency tone (howling).
◮ Correlation between near-end signal and loudspeaker signal.
◮ Standard adaptive filtering converges to a biased solution.
Objective:
◮ Increase maximum stable gain (MSG)
◮ Reduce bias and convergence misadjustment
◮ Minimize speech distortion (sound quality)
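A toy closed-loop simulation illustrates why feedback limits the maximum amplification (a sketch under simplified assumptions: a one-tap feedback path with one sample delay and no hearing-aid processing; all names and values are hypothetical):

```python
import numpy as np

def closed_loop_peak(gain, f_coeff=0.1, n=200):
    """Simulate y(t) = v(t) + f*u(t-1), u(t) = gain*y(t) and return the
    peak loudspeaker level; the loop is unstable once gain*f >= 1."""
    rng = np.random.default_rng(0)
    v = 0.01 * rng.standard_normal(n)   # weak near-end signal
    u = np.zeros(n)
    for t in range(n):
        y = v[t] + (f_coeff * u[t - 1] if t > 0 else 0.0)
        u[t] = gain * y
    return float(np.max(np.abs(u)))

print(closed_loop_peak(5.0))    # loop gain 0.5: output stays bounded
print(closed_loop_peak(15.0))   # loop gain 1.5: output grows without bound
```

Keeping the forward gain below 1/f is exactly the MSG constraint: cancelling part of the feedback path lowers the effective f and so raises the usable gain.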
Introduction
Background Noise
◮ Reduced frequency resolution (separating sounds of different frequencies)
◮ Reduced temporal resolution (intense sounds mask weaker sounds)
Hearing aid users
◮ Understanding speech in noise is a major problem
◮ Multiple speakers, fans, traffic, etc.
◮ Reduces the intelligibility of speech.
◮ More sensitive to the noise level.
◮ Need higher SNR to communicate.
Objective:
◮ Maximally reduce the noise (SNR improvement)
◮ Minimize speech distortion (sound quality)
◮ Improve intelligibility of speech
Problem Statement and Motivation
◮ Compensation of sensorineural hearing loss requires NR, DRC and AFC.
◮ The general problem of NR, DRC and AFC is not new.
◮ Each of these areas is usually developed and evaluated independently.
Existing method
◮ Hearing aids typically use a serial concatenation of NR, DRC and AFC.
◮ Each algorithm can counteract and limit the functionality of the others.
Short-term objective:
◮ Development of multi-channel NR (SDW-MWF).
◮ Integration of SDW-MWF and DRC.
◮ Development of adaptive feedback cancellation (PEM-based AFC).
◮ Analysis of any undesired effects in the integration process.
Long-term objective:
◮ Integration of NR, DRC and AFC into one signal processing scheme.
◮ Trade-off: SNR improvement vs. audibility vs. speech distortion vs. MSG increase.
Speech Distortion Weighted Multi-channel Wiener Filter (SDW-MWF)
[Figure: M-microphone filter-and-sum structure — inputs X1(k,l) ... XM(k,l) (desired signal plus noise), filters W1(k,l) ... WM(k,l), output Z(k,l)]
Frequency-domain microphone signals,
X(k,l) = Xs(k,l) + Xn(k,l)   (1)
MWF MMSE criterion,
W∗(k,l) = arg min_W ε{ |Xs1(k,l) − W^H X(k,l)|² }   (2)
SDW-MWF MMSE criterion,
W∗(k,l) = arg min_W ε{ |Xs1(k,l) − W^H Xs(k,l)|² } + µ ε{ |W^H Xn(k,l)|² }   (3)
Optimal SDW-MWF,
W∗(k,l) = ( Rs(k,l) + µ Rn(k,l) )^−1 Rs(k,l) e1   (4)
Output of the SDW-MWF can be written as
Z(k,l) = W∗,H(k,l) X(k,l).   (5)
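Equations (4)-(5) can be sketched per frequency bin with NumPy (a toy two-microphone example; the steering vector and noise level are made up for illustration):

```python
import numpy as np

def sdw_mwf(Rx, Rn, mu=1.0):
    """SDW-MWF of Eq. (4): W* = (Rs + mu*Rn)^-1 Rs e1, with the speech
    correlation matrix Rs = Rx - Rn as in Eq. (6)."""
    Rs = Rx - Rn
    e1 = np.zeros(Rx.shape[0], dtype=complex)
    e1[0] = 1.0
    return np.linalg.solve(Rs + mu * Rn, Rs @ e1)

# Toy bin: coherent speech at both mics, spatially white noise
d = np.array([1.0, 1.0], dtype=complex)   # speech steering vector
Rs = np.outer(d, d.conj())                # rank-1 speech correlation
Rn = 0.1 * np.eye(2, dtype=complex)
W = sdw_mwf(Rs + Rn, Rn, mu=1.0)
print(W)                  # equal weights: the coherent mics are summed
print(W.conj() @ d)       # response toward the speech source, close to 1
```

Increasing µ puts more weight on noise reduction at the cost of extra speech distortion, which is the trade-off the SDW parameter controls.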
Concept of SDW-MWFµ
◮ Second-order statistics of the noise are assumed to be stationary
Rs(k,l) = Rx(k,l) − Rn(k,l)   (6)
◮ For the estimation of Rx(k,l) and Rn(k,l), an averaging time window of 2-3 seconds is typically used.
[Figure: time-domain signal indicating the periods where the speech+noise and the noise-only correlation matrices are updated]
Properties of SDW-MWF
◮ SDW-MWF depends on long-term averages of spectral and spatial characteristics.
◮ Eliminates short-time effects, such as musical noise.
◮ The SDW parameter µ is a fixed value for all frequencies.
Properties not included in SDW-MWF
◮ Speech and noise can be non-stationary, spectrally and temporally.
◮ Speech contains many pauses, while noise can be continuously present.
◮ Different weights for speech-dominant and for noise-dominant segments.
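The long-term averaging of the correlation matrices can be sketched as an exponentially weighted update (a minimal illustration; the forgetting factor and frame count are hypothetical, chosen to mimic a window of a couple of seconds at typical frame rates):

```python
import numpy as np

def update_corr(R, x, lam=0.995):
    """Exponentially weighted correlation-matrix update,
    R <- lam*R + (1-lam)*x x^H, i.e. a sliding long-term average
    with an effective window of roughly 1/(1-lam) frames."""
    return lam * R + (1.0 - lam) * np.outer(x, x.conj())

# Update Rn during noise-only frames (and Rx during speech+noise frames)
rng = np.random.default_rng(1)
Rn = np.zeros((2, 2), dtype=complex)
for _ in range(5000):
    n = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
    Rn = update_corr(Rn, n)
print(np.round(Rn.real, 2))   # close to the identity for white noise
```

This is why short-time effects such as musical noise are suppressed: individual frames barely move the long-term estimate.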
Speech Presence Probability
Two-state speech model
H0(k,l) :Xi(k,l) = Xin(k,l)
H1(k,l) :Xi(k,l) = Xis(k,l) +Xin(k,l) (7)
Conditional Probability Density Functions of the observed signals
p(Xi(k,l)|H0(k,l)) = (1/(π λn,i(k,l))) exp( −|Xi(k,l)|² / λn,i(k,l) )
p(Xi(k,l)|H1(k,l)) = (1/(π(λs,i(k,l) + λn,i(k,l)))) exp( −|Xi(k,l)|² / (λs,i(k,l) + λn,i(k,l)) )   (8)
Speech Presence Probability
p(k,l) = { 1 + (q(k,l)/(1 − q(k,l))) (1 + ξ(k,l)) exp(−υ(k,l)) }^−1   (9)
◮ Conditional SPP is estimated for each frequency bin and each frame
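Equation (9) can be evaluated directly (a sketch; the a priori SNR ξ and a posteriori SNR γ values below are made-up inputs, with υ = γξ/(1+ξ) as in the standard MMSE estimators):

```python
import numpy as np

def speech_presence_prob(xi, v, q=0.5):
    """Conditional SPP of Eq. (9):
    p = {1 + q/(1-q) * (1+xi) * exp(-v)}^-1, where q is the a priori
    speech absence probability, xi the a priori SNR, v = gamma*xi/(1+xi)."""
    return 1.0 / (1.0 + q / (1.0 - q) * (1.0 + xi) * np.exp(-v))

xi = np.array([10.0, 0.01])      # a priori SNR per bin (hypothetical)
gamma = np.array([12.0, 0.5])    # a posteriori SNR per bin (hypothetical)
v = gamma * xi / (1.0 + xi)
p = speech_presence_prob(xi, v)
print(np.round(p, 3))   # high-SNR bin -> p near 1; low-SNR bin -> p near 1-q
```

The estimate is computed independently per frequency bin and per frame, which is what lets the SPP-driven filter react much faster than the 2-3 s correlation averages.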
Extension of SDW-MWFµ into SDW-MWFSPP
Incorporating the conditional Speech Presence Probability into SDW-MWF,
W∗ = arg min_W  p ε{ |Xs1 − W^H X|² | H1 } + (1 − p) ε{ |W^H Xn|² }   (10)
The SDW-MWF incorporating the conditional SPP can then be written as
W∗SPP = ( Rs + (1/p) Rn )^−1 Rs e1.   (11)
◮ If p = 0, the SDW-MWFSPP attenuates the noise by applying W∗ ← 0.
◮ If p = 1, the SDW-MWFSPP solution corresponds to the MWF solution (µ = 1).
◮ If 0 < p < 1, there is a trade-off between noise reduction and speech distortion.
The combined solution can then be written as
W∗SPP = ( Rs + [ 1/(α(1/µ) + (1 − α)p) ] Rn )^−1 Rs e1   (12)
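The combined weighting of Eq. (12) can be sketched as follows (a minimal illustration of the interpolation; note that p = 0 with α = 0 gives an infinite noise weight, matching W∗ ← 0):

```python
def spp_noise_weight(p, mu=1.0, alpha=0.0):
    """Noise weighting factor of Eq. (12): 1 / (alpha*(1/mu) + (1-alpha)*p).
    alpha=1 recovers the fixed SDW-MWF_mu weight mu; alpha=0 gives the
    purely SPP-driven weight 1/p."""
    return 1.0 / (alpha * (1.0 / mu) + (1.0 - alpha) * p)

print(spp_noise_weight(1.0, mu=4.0, alpha=1.0))   # 4.0: fixed SDW weight
print(spp_noise_weight(0.5, mu=4.0, alpha=0.0))   # 2.0: 1/p
print(spp_noise_weight(0.5, mu=4.0, alpha=0.5))   # ≈2.67: in between
```

The parameter α thus trades the robustness of the fixed weighting against the responsiveness of the SPP-driven weighting.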
Extension of SDW-MWFµ into SDW-MWFSPP
◮ Example of SPP for a frame
[Figure: weighting factor 1/p vs. conditional SPP, compared with fixed µ = 1, 2, 3, 4]
◮ SDW based on SPP
[Figure: weighting factor vs. conditional SPP for µ = 1, 2, 3, 4 with α = 0.5]
Extension of SDW-MWFµ into SDW-MWFSPP
◮ Example of SPP for a frame (modified)
[Figure: conditional SPP vs. frequency (Hz) for ζmin = 0.1 with ζmax = 0.3162 and ζmax = 0.60]
◮ SDW based on SPP (modified)
[Figure: conditional SPP vs. frequency (Hz) for α = 0, α = 1 (µ = 2), and α = 0.25, 0.50, 0.75 (µ = 2)]
Extension of SDW-MWFµ into SDW-MWFSPP
[Figure: frequency-domain processing scheme — analysis filterbank (FFT), a priori SNR, a priori SAP and a posteriori SNR estimation, conditional SPP (q̂ = P(H0), p = P(H1|X)), correlation matrices, SPP-SDW-MWF filtering Z = W∗,H SPP X, and synthesis (IFFT); the existing signal processing path is extended with an additional signal processing path]
Challenges on the additional signal processing path
◮ Incorporating a psychoacoustic model
◮ Masking properties of the human auditory system
◮ Auditory properties of speech perception
◮ Defining perceptually relevant criteria
◮ Making residual noise perceptually inaudible
Challenges on the existing signal processing path
◮ Limited opportunities
◮ Continuously updating the correlation matrices in speech+noise periods.
◮ Development of Voice Activity Detection
Experimental Set-up for SDW-MWFµ and SDW-MWFSPP
Simulations have been performed with
◮ A 2-microphone behind-the-ear hearing aid mounted on a CORTEX MK2 manikin. ◮ The loudspeakers (FOSTEX 6301B) are positioned at 1 meter from the center of
the head.
◮ The reverberation time T60 = 0.21 s.
◮ The speech is located at 0◦ and the two multi-talker babble noise sources are located at 120◦ and 180◦.
◮ The speech signals consist of male sentences from the HINT database.
◮ The noise signal consists of multi-talker babble from Auditec.
◮ The speech signals are sampled at 16 kHz.
Experimental results for SDW-MWFµ and SDW-MWFSPP
[Figure: ∆SNR intellig (dB) and speech distortion SD (dB) vs. input SNR (dB), for α = 0 and for µ = 1 with α = 1, 0.75, 0.50, 0.25]
Integration of MWF-based NR and DRC
Motivation
◮ When NR and DRC are serially concatenated, undesired interaction effects occur.
◮ DRC can counteract NR by amplifying the residual noise after NR.
◮ Degrades the SNR and defeats the purpose of using NR
[Figure: serial concatenation — SDW-MWFµ filter Wµ(k,l) followed by the speech DRC with gain Gs dB, producing the NR output Zs(k,l) and the compressed output Ẑs(k,l)]
◮ DRC does not distinguish between speech-dominant and noise-dominant segments.
◮ Low-intensity segments are amplified equally (including residual noise).
Z(k,l) = W∗,H(k,l) X(k,l)   (13)
Z(k,l) = X̂s
Extension of DRC into Dual-DRC
[Figure: speech-DRC and noise-DRC input-output curves — output SPL (dB) vs. input SPL (dB)]
◮ Reusing the conditional SPP estimated in SDW-MWFSPP.
◮ A dual-DRC concept can be introduced.
◮ Using a switchable compression characteristic.
Dual-DRC concept
◮ If p(k,l) = 1, the speech DRC is applied.
◮ If p(k,l) = 0, it is undesirable to amplify and the noise DRC is applied.
◮ For the in-between cases, a weighted sum of the two DRC curves is used.
[Figure: combined scheme — conditional SPP estimation p(k,l) driving the dual-DRC (gains Gn dual,dB and Gs dB) and SDW-MWFSPP (WSPP(k,l)), producing Zs(k,l) and Ẑs(k,l)]
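The in-between case can be sketched as an SPP-weighted sum of the two gain curves (an assumption of this sketch is that the weighting is done on the gains in dB; the gain values are hypothetical):

```python
def dual_drc_gain_db(p, g_speech_db, g_noise_db):
    """Dual-DRC: SPP-weighted combination of the speech-DRC and
    noise-DRC gains; p=1 selects the speech curve, p=0 the noise curve."""
    return p * g_speech_db + (1.0 - p) * g_noise_db

# Hypothetical gains where the two curves differ by Delta G = 15 dB
print(dual_drc_gain_db(1.0, 20.0, 5.0))   # 20.0: speech DRC applied
print(dual_drc_gain_db(0.0, 20.0, 5.0))   # 5.0: noise DRC applied
print(dual_drc_gain_db(0.4, 20.0, 5.0))   # 11.0: weighted sum
```

Because p(k,l) is computed per bin and frame, residual noise in noise-dominant bins no longer receives the full speech gain, which is the interaction effect the serial concatenation suffered from.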
Experimental Set-up for SDW-MWF based NR and DRC
Simulations have been performed with
◮ A 2-microphone behind-the-ear hearing aid mounted on a CORTEX MK2 manikin. ◮ The loudspeakers (FOSTEX 6301B) are positioned at 1 meter from the center of
the head.
◮ The reverberation time T60 = 0.21 s.
◮ The speech is located at 0◦ and the two multi-talker babble noise sources are located at 120◦ and 180◦.
◮ The speech signals consist of male sentences from the HINT database.
◮ The noise signals consist of multi-talker babble from Auditec.
◮ The signals are sampled at 16 kHz.
◮ An FFT length of 128 with half-overlapping frames.
◮ The DRC is implemented based on 20 critical bands.
The following parameters are fixed during all simulations:
◮ The input level is set to 65 dB SPL at the hearing aid microphones.
◮ The DRC attack and release times are set to at = 10 ms and rt = 150 ms.
◮ The compression threshold is set to CT = 30 dB.
◮ The hearing aid gain is set to G dB = 30 dB.
Experimental results for SDW-MWFµ-based NR and DRC
[Figure: ∆SNR intellig (dB) and SD (dB) vs. input SNR for 1/p+DRC and for µ = 1, 2, 3 + DRC]
Experimental results for SDW-MWFSPP-based NR and dual-DRC
[Figure: ∆SNR intellig (dB) and SD (dB) vs. input SNR for 1/p+DRC and for ∆GdB = 5, 10, 15]
Combined NR and DRC and AFC
Long-term objective:
◮ Integration of NR, DRC and AFC into one signal processing scheme.
[Figure: combined scheme — conditional SPP estimation p(k,l) driving the dual-DRC (Gn dual,dB, Gs dB) and SDW-MWFSPP (WSPP(k,l)), producing Zs(k,l) and Ẑs(k,l)]
Adaptive Feedback Cancellation
[Figure: AFC scheme — forward path G, acoustic feedback path F, feedback cancellation path F̂, with near-end signal v(t), loudspeaker signal u(t), microphone signal y(t), feedback signal x(t), feedback-compensated signal d[t, f̂(t)], and feedback estimate ŷ[t|f̂(t)]]
The microphone signal,
y(t) = v(t) + x(t) = v(t) + F(q,t)u(t)   (15)
The feedback-compensated signal,
d(t) = v(t) + [F(q,t) − F̂(q,t)]u(t).   (16)
◮ Adaptively model the feedback path.
◮ Estimate the feedback signal.
◮ Correlation between the near-end signal and the loudspeaker signal.
◮ Caused by the closed signal loop.
Main Challenge
◮ Reduce the correlation between the near-end signal and the loudspeaker signal.
◮ Prediction error method-based AFC (PEM-based AFC).
PEM-based AFC (single near-end signal model)
[Figure: PEM-based AFC — forward path G, acoustic feedback path F, feedback cancellation path F̂, near-end source signal model H, and decorrelating prefilters Ĥ−1 applied to the loudspeaker signal u(t) and the feedback-compensated signal d[t, f̂(t)]]
Microphone signal
◮ y(t) = F(q,t)u(t) + H(q,t)e(t)
◮ Prefiltering of loudspeaker and microphone signals.
◮ Inverse near-end signal model.
The all-pole model can be written as
H(q,t) = 1 / C(q,t) = 1 / (1 + c1(t)q^−1 + ... + cnc(t)q^−nc)   (17)
Prediction error,
ε[t, ξ(t)] = H^−1(q,t)[y(t) − F(q,t)u(t)]   (18)
Minimize the prediction error,
min_ξ(t)  (1/(2N)) Σk=1..t ε²[k, ξ(t)]   (19)
◮ A single all-pole model (short-term predictor) fails to remove the periodicity.
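The adaptation loop of Eqs. (17)-(19) can be sketched with an NLMS update on the prefiltered signals. In this toy example the loudspeaker signal is already white, standing in for perfect Ĥ−1 prefiltering, and there is no near-end signal, so the feedback path estimate converges to the true path; all names and sizes are hypothetical:

```python
import numpy as np

def nlms_step(f_hat, u_buf, err, mu=0.5, eps=1e-6):
    """One NLMS update of the feedback-path estimate from the
    prefiltered loudspeaker regressor and the prediction error."""
    return f_hat + mu * err * u_buf / (u_buf @ u_buf + eps)

rng = np.random.default_rng(2)
nF = 8
f_true = 0.1 * rng.standard_normal(nF)    # toy acoustic feedback path
f_hat = np.zeros(nF)
u = rng.standard_normal(5000)             # white = "already prefiltered"
y = np.convolve(u, f_true)[:len(u)]       # microphone picks up pure feedback
for t in range(nF, len(u)):
    u_buf = u[t - nF + 1:t + 1][::-1]     # newest sample first
    e = y[t] - f_hat @ u_buf              # prediction error, cf. Eq. (18)
    f_hat = nlms_step(f_hat, u_buf, e)
print(np.max(np.abs(f_hat - f_true)))     # near zero: unbiased estimate
```

With a colored, correlated near-end signal instead of white noise, the same loop would converge to a biased solution, which is exactly why the PEM prefilters are needed.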
PEM-based AFC (cascaded near-end signal model)
◮ A cascade of near-end signal models removes the coloring and the periodicity.
Sinusoidal model,
d(t) = Σn=1..P An cos(ωn t + φn) + r(t),  t = 1, ..., M   (20)
Cascaded near-end signal model
y(t) =F(q,t)u(t) +H1(q,t)H2(q,t)e(t) (21)
The CPZLP model can be written as
d(t) = ∏n=1..P [ (1 − 2ρ cos(ωn) z^−1 + ρ² z^−2) / (1 − 2 cos(ωn) z^−1 + z^−2) ] e(t)   (22)
The output of the prediction error filter is
e(t, ω) = ∏n=1..P [ (1 − 2 cos(ωn) z^−1 + z^−2) / (1 − 2ρ cos(ωn) z^−1 + ρ² z^−2) ] d(t)   (23)
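The prediction-error filter of Eq. (23) is a cascade of constrained notch biquads; a minimal direct-form implementation (the pure-Python loop is for illustration only):

```python
import numpy as np

def cpzlp_prediction_error(d, freqs, rho=0.9):
    """Apply the CPZLP prediction-error filter of Eq. (23): for each
    frequency w (rad/sample), a notch (1 - 2cos(w)z^-1 + z^-2) with
    constrained poles (1 - 2*rho*cos(w)z^-1 + rho^2 z^-2)."""
    x = np.asarray(d, dtype=float)
    for w in freqs:
        b1, b2 = -2.0 * np.cos(w), 1.0
        a1, a2 = -2.0 * rho * np.cos(w), rho ** 2
        y = np.zeros_like(x)
        for t in range(len(x)):
            acc = x[t]                               # direct form I biquad
            if t >= 1:
                acc += b1 * x[t - 1] - a1 * y[t - 1]
            if t >= 2:
                acc += b2 * x[t - 2] - a2 * y[t - 2]
            y[t] = acc
        x = y
    return x

# A sinusoid at a notch frequency is removed once the transient decays
n = np.arange(4000)
w0 = 0.3                                  # rad/sample (hypothetical)
e = cpzlp_prediction_error(np.cos(w0 * n), [w0])
print(np.max(np.abs(e[500:])))            # ≈ 0: periodicity removed
```

The zeros sit exactly on the unit circle, so the sinusoid is annihilated, while the pole radius ρ < 1 keeps the filter's frequency response flat away from the notches.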
Incorporating pitch estimation in PEM-based AFC
◮ Speech signals are usually considered voiced or unvoiced.
◮ Voiced sounds consist of a fundamental frequency ω0 and its harmonic components.
◮ CPZLP estimates all frequencies independently.
◮ Does not exploit the harmonicity of speech.
Fundamental frequency estimation (pitch estimation)
◮ The sinusoids have frequencies that are integer multiples of ω0, i.e., ωn = ω0·n.
◮ This follows naturally from voiced speech being quasi-periodic.
Applying pitch estimation in PEM-based AFC
e(t, ω0) = ∏n=1..P [ (1 − 2 cos(nω0) z^−1 + z^−2) / (1 − 2ρ cos(nω0) z^−1 + ρ² z^−2) ] d(t)   (24)
Pitch estimation methods considered:
◮ Subspace-orthogonality-based pitch estimation.
◮ Subspace-shift-invariance-based pitch estimation.
◮ Optimal-filtering-based pitch estimation.
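The harmonic constraint ωn = nω0 can be illustrated with a simple harmonic-summation estimator (a stand-in sketch, not the subspace or optimal-filtering methods listed above; all parameter values are hypothetical):

```python
import numpy as np

def harmonic_sum_pitch(d, fs, n_harm=3, f0_grid=None, nfft=16384):
    """Pick the f0 whose harmonics n*f0 carry the most spectral energy,
    exploiting that voiced speech has sinusoids at integer multiples
    of a single fundamental frequency."""
    if f0_grid is None:
        f0_grid = np.arange(100.0, 300.0, 5.0)
    spec = np.abs(np.fft.rfft(d * np.hanning(len(d)), n=nfft))
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    scores = [sum(spec[np.argmin(np.abs(freqs - n * f0))]
                  for n in range(1, n_harm + 1)) for f0 in f0_grid]
    return float(f0_grid[int(np.argmax(scores))])

# Synthetic voiced frame at fs = 16 kHz with f0 = 150 Hz and 3 harmonics
fs = 16000
t = np.arange(2048) / fs
d = sum(np.cos(2 * np.pi * 150 * n * t) / n for n in range(1, 4))
print(harmonic_sum_pitch(d, fs))   # estimated f0, ≈ 150 Hz on this frame
```

Estimating the single parameter ω0 instead of P independent frequencies is what makes the harmonically constrained prediction-error filter more robust.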
Experimental Set-up for PEM-based AFC
Simulations have been performed with
◮ The near-end sinusoidal model order is set to P = 15.
◮ The near-end noise model order is set to 30.
◮ Both near-end signal models are estimated using 50% overlapping data windows of length M = 320 samples.
◮ The NLMS adaptive filter length is set to nF = 200.
◮ The near-end signal is a 30 s speech signal at fs = 16 kHz.
◮ The forward path gain K(t) is set 3 dB below the maximum stable gain (MSG) without feedback cancellation.
Experimental results for PEM-based AFC
[Figure: MSG (dB) vs. time and misadjustment MAF (dB) vs. samples for AFC-LP, AFC-CPZLP, AFC-shiftinv, AFC-orth, and AFC-optfilt, together with 20 log10 K(t) and the MSG of F(q)]
Conclusion
◮ SDW-MWFµ can be further improved by using SDW-MWFSPP.
◮ SDW-MWFSPP can be further extended with more advanced SDW criteria.
◮ An integration of MWF-based NR and Dynamic Range Compression has been proposed.
◮ A dual-DRC concept using a switchable compression characteristic.
Future and current work
Combined MWF and AFC
[Figure: two configurations for combined multi-channel NR (W1(k,l) ... WM(k,l)), DRC and AFC — a single feedback canceller WF(k,l) after the NR, or one feedback canceller WF,m(k,l) per microphone before the NR]
◮ Applying NR before AFC ...
◮ Applying AFC before NR ...
Future and current work
Combined Single-channel NR and AFC
[Figure: two configurations for combined single-channel NR and AFC — NR applied before the feedback canceller, or the feedback canceller applied before the NR]
◮ Applying NR before AFC ...
◮ Applying AFC before NR ...
Future and current work
Further exploiting PEM-based AFC using pitch estimation
◮ Using pitch estimation, the amplitudes can also be estimated.
◮ Possibility to design a more accurate CPZLP model (better decorrelation).
Removing the CPZLP model
◮ By re-using the optimal filter for both fundamental frequency estimation and filtering.
◮ Decorrelation by sinusoidal subtraction
◮ Further investigate the use of subspace methods.
Combined Speech Coding and AFC
[Figure: two schemes — AFC with a generic decorrelation device in the forward path, and AFC with speech coding in the forward path]