Improved Signal Processing in Hearing Aids:
A System Approach
Kim Ngo
ESAT-SCD, Katholieke Universiteit Leuven, Belgium
EST-SIGNAL Meeting September 2009.
Outline
◮ Introduction (hearing aids, hearing loss, acoustic feedback, background noise)
◮ Problem statement and motivation
◮ Multi-Channel Wiener Filter (MWF) based Noise Reduction (NR)
◮ SDW-MWF based approach to integrate NR and DRC
◮ Adaptive Feedback Cancellation (AFC)
◮ Conclusion
◮ Publications
◮ Timeline
Introduction
Research areas in Hearing Aids
[Figure: overview of research areas in hearing aids — binaural processing, wireless communications, AD/DA converters, digital and analog signal processing, automatic sound classification, single- and multi-channel noise reduction, directional microphones, loudspeakers, speech/audio coding, source localisation, beamforming, source separation, active noise control, filterbank design, dereverberation, automatic speech recognition, feedback cancellation, and dynamic range compression]
Introduction
Sensorineural hearing loss
[Figure: audiogram — hearing level (dB) vs. frequency (Hz), with regions for normal hearing and mild, moderate, severe, and profound hearing loss]
Dynamic Range
[Figure: loudness range from the hearing threshold to uncomfortable loudness, for normal and sensorineural hearing]
◮ The hearing threshold increases with increasing frequency.
◮ The threshold of hearing is raised as a result of the hearing loss.
◮ The threshold of loudness discomfort remains the same.
◮ Reduced dynamic range (threshold to discomfort).
Introduction
Dynamic Range Compression
◮ Audibility is an important first step in improving the intelligibility of a speech signal.
[Figure: DRC input-output characteristic — output SPL (dB) vs. input SPL (dB), with compression threshold (CT), compression ratio (CR), attack and release times, and a critical-band gain model adjusting the input spectrum]
◮ Automatically adjusts the gain based on the intensity level.
◮ High-intensity sounds are attenuated; low-intensity sounds are amplified.
◮ Compression Threshold (CT): point at which the slope changes.
◮ Compression Ratio (CR): steepness of the slope.
◮ Amplification gain G dB.
Objective:
◮ Map the wide dynamic range of speech into the reduced dynamic range.
◮ Make weak sounds audible without making loud sounds uncomfortably loud.
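The static compression curve above can be sketched in a few lines (a minimal illustration, not the hearing-aid implementation; the function name and the CR value are hypothetical, while CT = 30 dB and G = 30 dB follow the simulation settings used later):

```python
import numpy as np

def drc_gain_db(level_db, ct_db=30.0, cr=3.0, gain_db=30.0):
    """Static DRC curve: full gain below the compression threshold CT,
    slope 1/CR (output dB per input dB) above it."""
    over = np.maximum(np.asarray(level_db, dtype=float) - ct_db, 0.0)
    # Each input dB above CT yields only 1/CR output dB, so the applied
    # gain shrinks by (1 - 1/CR) dB per dB above the threshold.
    return gain_db - over * (1.0 - 1.0 / cr)

print(drc_gain_db(20.0))   # below CT: full 30 dB gain
print(drc_gain_db(80.0))   # 50 dB above CT with CR=3: 30 - 50*(2/3) ≈ -3.3 dB
```

A soft 20 dB SPL input receives the full gain, while a loud 80 dB SPL input is compressed, which is exactly the mapping of a wide input range into a reduced output range.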
Introduction
Acoustic Feedback
[Figure: hearing aid signal model — forward path G, acoustic feedback path F, with microphone signal, loudspeaker signal, feedback signal, and near-end signal]
◮ Undesired acoustic coupling between loudspeaker and microphone.
◮ Limits the maximum amplification.
◮ Feedback is most severe at high frequencies.
◮ Instability results in a high-frequency tone (howling).
◮ Correlation between near-end signal and loudspeaker signal.
◮ Standard adaptive filtering converges to a biased solution.
Objective:
◮ Increase maximum stable gain (MSG)
◮ Reduce bias and convergence misadjustment
◮ Minimize speech distortion (sound quality)
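A toy closed-loop simulation illustrates why feedback limits the maximum amplification (a sketch under simplified assumptions: a one-tap feedback path with one sample delay and no hearing-aid processing; all names and values are hypothetical):

```python
import numpy as np

def closed_loop_peak(gain, f_coeff=0.1, n=200):
    """Simulate y(t) = v(t) + f*u(t-1), u(t) = gain*y(t) and return the
    peak loudspeaker level; the loop is unstable once gain*f >= 1."""
    rng = np.random.default_rng(0)
    v = 0.01 * rng.standard_normal(n)   # weak near-end signal
    u = np.zeros(n)
    for t in range(n):
        y = v[t] + (f_coeff * u[t - 1] if t > 0 else 0.0)
        u[t] = gain * y
    return float(np.max(np.abs(u)))

print(closed_loop_peak(5.0))    # loop gain 0.5: output stays bounded
print(closed_loop_peak(15.0))   # loop gain 1.5: output grows without bound
```

Keeping the forward gain below 1/f is exactly the MSG constraint: cancelling part of the feedback path lowers the effective f and so raises the usable gain.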
Introduction
Background Noise
◮ Reduced frequency resolution (separating sounds of different frequencies)
◮ Reduced temporal resolution (intense sounds mask weaker sounds)
Hearing aid users
◮ Understanding speech in noise is a major problem
◮ Multiple speakers, fans, traffic, etc.
◮ Reduces the intelligibility of speech.
◮ More sensitive to the noise level.
◮ Need higher SNR to communicate.
Objective:
◮ Maximally reduce the noise (SNR improvement)
◮ Minimize speech distortion (sound quality)
◮ Improve intelligibility of speech
Problem Statement and Motivation
◮ Compensation of sensorineural hearing loss requires NR, DRC and AFC.
◮ The general problem of NR, DRC and AFC is not new.
◮ Each of these areas is usually developed and evaluated independently.
Existing method
◮ Hearing aids typically use a serial concatenation of NR, DRC and AFC.
◮ Each algorithm can counteract and limit the functionality of the others.
Short-term objective:
◮ Development of multi-channel NR (SDW-MWF).
◮ Integration of SDW-MWF and DRC.
◮ Development of adaptive feedback cancellation (PEM-based AFC).
◮ Analysis of any undesired effects in the integration process.
Long-term objective:
◮ Integration of NR, DRC and AFC into one signal processing scheme.
◮ Trade-off: SNR improvement vs. audibility vs. speech distortion vs. MSG increase.
Speech Distortion Weighted Multi-channel Wiener Filter (SDW-MWF)
[Figure: M-microphone filter-and-sum structure — inputs X1(k,l) ... XM(k,l) (desired signal plus noise), filters W1(k,l) ... WM(k,l), output Z(k,l)]
Frequency-domain microphone signals,
X(k,l) = Xs(k,l) + Xn(k,l)   (1)
MWF MMSE criterion,
W∗(k,l) = arg min_W ε{ |Xs1(k,l) − W^H X(k,l)|² }   (2)
SDW-MWF MMSE criterion,
W∗(k,l) = arg min_W ε{ |Xs1(k,l) − W^H Xs(k,l)|² } + µ ε{ |W^H Xn(k,l)|² }   (3)
Optimal SDW-MWF,
W∗(k,l) = ( Rs(k,l) + µ Rn(k,l) )^−1 Rs(k,l) e1   (4)
Output of the SDW-MWF can be written as
Z(k,l) = W∗,H(k,l) X(k,l).   (5)
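Equations (4)-(5) can be sketched per frequency bin with NumPy (a toy two-microphone example; the steering vector and noise level are made up for illustration):

```python
import numpy as np

def sdw_mwf(Rx, Rn, mu=1.0):
    """SDW-MWF of Eq. (4): W* = (Rs + mu*Rn)^-1 Rs e1, with the speech
    correlation matrix Rs = Rx - Rn as in Eq. (6)."""
    Rs = Rx - Rn
    e1 = np.zeros(Rx.shape[0], dtype=complex)
    e1[0] = 1.0
    return np.linalg.solve(Rs + mu * Rn, Rs @ e1)

# Toy bin: coherent speech at both mics, spatially white noise
d = np.array([1.0, 1.0], dtype=complex)   # speech steering vector
Rs = np.outer(d, d.conj())                # rank-1 speech correlation
Rn = 0.1 * np.eye(2, dtype=complex)
W = sdw_mwf(Rs + Rn, Rn, mu=1.0)
print(W)                  # equal weights: the coherent mics are summed
print(W.conj() @ d)       # response toward the speech source, close to 1
```

Increasing µ puts more weight on noise reduction at the cost of extra speech distortion, which is the trade-off the SDW parameter controls.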
Concept of SDW-MWFµ
◮ Second-order statistics of the noise are assumed to be stationary
Rs(k,l) = Rx(k,l) − Rn(k,l)   (6)
◮ For the estimation of Rx(k,l) and Rn(k,l), an averaging time window of 2-3 seconds is typically used.
[Figure: time-domain signal indicating the periods where the speech+noise and the noise-only correlation matrices are updated]
Properties of SDW-MWF
◮ SDW-MWF depends on long-term averages of spectral and spatial characteristics.
◮ Eliminates short-time effects, such as musical noise.
◮ The SDW parameter µ is a fixed value for all frequencies.
Properties not included in SDW-MWF
◮ Speech and noise can be non-stationary, spectrally and temporally.
◮ Speech contains many pauses, while noise can be continuously present.
◮ Different weights for speech-dominant and for noise-dominant segments.
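The long-term averaging of the correlation matrices can be sketched as an exponentially weighted update (a minimal illustration; the forgetting factor and frame count are hypothetical, chosen to mimic a window of a couple of seconds at typical frame rates):

```python
import numpy as np

def update_corr(R, x, lam=0.995):
    """Exponentially weighted correlation-matrix update,
    R <- lam*R + (1-lam)*x x^H, i.e. a sliding long-term average
    with an effective window of roughly 1/(1-lam) frames."""
    return lam * R + (1.0 - lam) * np.outer(x, x.conj())

# Update Rn during noise-only frames (and Rx during speech+noise frames)
rng = np.random.default_rng(1)
Rn = np.zeros((2, 2), dtype=complex)
for _ in range(5000):
    n = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
    Rn = update_corr(Rn, n)
print(np.round(Rn.real, 2))   # close to the identity for white noise
```

This is why short-time effects such as musical noise are suppressed: individual frames barely move the long-term estimate.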
Speech Presence Probability
Two-state speech model
H0(k,l) :Xi(k,l) = Xin(k,l)
H1(k,l) :Xi(k,l) = Xis(k,l) +Xin(k,l) (7)
Conditional Probability Density Functions of the observed signals
p(Xi(k,l)|H0(k,l)) = (1/(π λn,i(k,l))) exp( −|Xi(k,l)|² / λn,i(k,l) )
p(Xi(k,l)|H1(k,l)) = (1/(π(λs,i(k,l) + λn,i(k,l)))) exp( −|Xi(k,l)|² / (λs,i(k,l) + λn,i(k,l)) )   (8)
Speech Presence Probability
p(k,l) = { 1 + (q(k,l)/(1 − q(k,l))) (1 + ξ(k,l)) exp(−υ(k,l)) }^−1   (9)
◮ Conditional SPP is estimated for each frequency bin and each frame
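Equation (9) can be evaluated directly (a sketch; the a priori SNR ξ and a posteriori SNR γ values below are made-up inputs, with υ = γξ/(1+ξ) as in the standard MMSE estimators):

```python
import numpy as np

def speech_presence_prob(xi, v, q=0.5):
    """Conditional SPP of Eq. (9):
    p = {1 + q/(1-q) * (1+xi) * exp(-v)}^-1, where q is the a priori
    speech absence probability, xi the a priori SNR, v = gamma*xi/(1+xi)."""
    return 1.0 / (1.0 + q / (1.0 - q) * (1.0 + xi) * np.exp(-v))

xi = np.array([10.0, 0.01])      # a priori SNR per bin (hypothetical)
gamma = np.array([12.0, 0.5])    # a posteriori SNR per bin (hypothetical)
v = gamma * xi / (1.0 + xi)
p = speech_presence_prob(xi, v)
print(np.round(p, 3))   # high-SNR bin -> p near 1; low-SNR bin -> p near 1-q
```

The estimate is computed independently per frequency bin and per frame, which is what lets the SPP-driven filter react much faster than the 2-3 s correlation averages.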
Extension of SDW-MWFµ into SDW-MWFSPP
Incorporating the conditional Speech Presence Probability into SDW-MWF,
W∗ = arg min_W  p ε{ |Xs1 − W^H X|² | H1 } + (1 − p) ε{ |W^H Xn|² }   (10)
The SDW-MWF incorporating the conditional SPP can then be written as
W∗SPP = ( Rs + (1/p) Rn )^−1 Rs e1.   (11)
◮ If p = 0, the SDW-MWFSPP attenuates the noise by applying W∗ ← 0.
◮ If p = 1, the SDW-MWFSPP solution corresponds to the MWF solution (µ = 1).
◮ If 0 < p < 1, there is a trade-off between noise reduction and speech distortion.
The combined solution can then be written as
W∗SPP = ( Rs + [ 1/(α(1/µ) + (1 − α)p) ] Rn )^−1 Rs e1   (12)
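The combined weighting of Eq. (12) can be sketched as follows (a minimal illustration of the interpolation; note that p = 0 with α = 0 gives an infinite noise weight, matching W∗ ← 0):

```python
def spp_noise_weight(p, mu=1.0, alpha=0.0):
    """Noise weighting factor of Eq. (12): 1 / (alpha*(1/mu) + (1-alpha)*p).
    alpha=1 recovers the fixed SDW-MWF_mu weight mu; alpha=0 gives the
    purely SPP-driven weight 1/p."""
    return 1.0 / (alpha * (1.0 / mu) + (1.0 - alpha) * p)

print(spp_noise_weight(1.0, mu=4.0, alpha=1.0))   # 4.0: fixed SDW weight
print(spp_noise_weight(0.5, mu=4.0, alpha=0.0))   # 2.0: 1/p
print(spp_noise_weight(0.5, mu=4.0, alpha=0.5))   # ≈2.67: in between
```

The parameter α thus trades the robustness of the fixed weighting against the responsiveness of the SPP-driven weighting.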
Extension of SDW-MWFµ into SDW-MWFSPP
◮ Example of SPP for a frame
[Figure: weighting factor 1/p vs. conditional SPP, compared with fixed µ = 1, 2, 3, 4]
◮ SDW based on SPP
[Figure: weighting factor vs. conditional SPP for µ = 1, 2, 3, 4 with α = 0.5]
Extension of SDW-MWFµ into SDW-MWFSPP
◮ Example of SPP for a frame (modified)
[Figure: conditional SPP vs. frequency (Hz) for ζmin = 0.1 with ζmax = 0.3162 and ζmax = 0.60]
◮ SDW based on SPP (modified)
[Figure: conditional SPP vs. frequency (Hz) for α = 0, α = 1 (µ = 2), and α = 0.25, 0.50, 0.75 (µ = 2)]
Extension of SDW-MWFµ into SDW-MWFSPP
[Figure: frequency-domain processing scheme — analysis filterbank (FFT), a priori SNR, a priori SAP and a posteriori SNR estimation, conditional SPP (q̂ = P(H0), p = P(H1|X)), correlation matrices, SPP-SDW-MWF filtering Z = W∗,H SPP X, and synthesis (IFFT); the existing signal processing path is extended with an additional signal processing path]
Challenges on the additional signal processing path
◮ Incorporating a psychoacoustic model
◮ Masking properties of the human auditory system
◮ Auditory properties of speech perception
◮ Defining perceptually relevant criteria
◮ Making residual noise perceptually inaudible
Challenges on the existing signal processing path
◮ Limited opportunities
◮ Continuously updating the correlation matrices in speech+noise periods.
◮ Development of Voice Activity Detection
Experimental Set-up for SDW-MWFµ and SDW-MWFSPP
Simulations have been performed with
◮ A 2-microphone behind-the-ear hearing aid mounted on a CORTEX MK2 manikin. ◮ The loudspeakers (FOSTEX 6301B) are positioned at 1 meter from the center of
the head.
◮ The reverberation time T60 = 0.21 s.
◮ The speech is located at 0◦ and the two multi-talker babble noise sources are located at 120◦ and 180◦.
◮ The speech signals consist of male sentences from the HINT database.
◮ The noise signal consists of multi-talker babble from Auditec.
◮ The speech signals are sampled at 16 kHz.
Experimental results for SDW-MWFµ and SDW-MWFSPP
[Figure: ∆SNR intellig (dB) and speech distortion SD (dB) vs. input SNR (dB), for α = 0 and for µ = 1 with α = 1, 0.75, 0.50, 0.25]
Integration of MWF-based NR and DRC
Motivation
◮ When NR and DRC are serially concatenated, undesired interaction effects occur.
◮ DRC can counteract NR by amplifying the residual noise after NR.
◮ Degrades the SNR and defeats the purpose of using NR
[Figure: serial concatenation — SDW-MWFµ filter Wµ(k,l) followed by the speech DRC with gain Gs dB, producing the NR output Zs(k,l) and the compressed output Ẑs(k,l)]
◮ DRC does not distinguish between speech-dominant and noise-dominant segments.
◮ Low-intensity segments are amplified equally (including residual noise).
Z(k,l) = W∗,H(k,l) X(k,l)   (13)
Z(k,l) = X̂s
Extension of DRC into Dual-DRC
[Figure: speech-DRC and noise-DRC input-output curves — output SPL (dB) vs. input SPL (dB)]
◮ Reusing the conditional SPP estimated in SDW-MWFSPP.
◮ A dual-DRC concept can be introduced.
◮ Using a switchable compression characteristic.
Dual-DRC concept
◮ If p(k,l) = 1, the speech DRC is applied.
◮ If p(k,l) = 0, it is undesirable to amplify and the noise DRC is applied.
◮ For the in-between cases, a weighted sum of the two DRC curves is used.
[Figure: combined scheme — conditional SPP estimation p(k,l) driving the dual-DRC (gains Gn dual,dB and Gs dB) and SDW-MWFSPP (WSPP(k,l)), producing Zs(k,l) and Ẑs(k,l)]
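The in-between case can be sketched as an SPP-weighted sum of the two gain curves (an assumption of this sketch is that the weighting is done on the gains in dB; the gain values are hypothetical):

```python
def dual_drc_gain_db(p, g_speech_db, g_noise_db):
    """Dual-DRC: SPP-weighted combination of the speech-DRC and
    noise-DRC gains; p=1 selects the speech curve, p=0 the noise curve."""
    return p * g_speech_db + (1.0 - p) * g_noise_db

# Hypothetical gains where the two curves differ by Delta G = 15 dB
print(dual_drc_gain_db(1.0, 20.0, 5.0))   # 20.0: speech DRC applied
print(dual_drc_gain_db(0.0, 20.0, 5.0))   # 5.0: noise DRC applied
print(dual_drc_gain_db(0.4, 20.0, 5.0))   # 11.0: weighted sum
```

Because p(k,l) is computed per bin and frame, residual noise in noise-dominant bins no longer receives the full speech gain, which is the interaction effect the serial concatenation suffered from.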
Experimental Set-up for SDW-MWF based NR and DRC
Simulations have been performed with
◮ A 2-microphone behind-the-ear hearing aid mounted on a CORTEX MK2 manikin. ◮ The loudspeakers (FOSTEX 6301B) are positioned at 1 meter from the center of
the head.
◮ The reverberation time T60 = 0.21 s.
◮ The speech is located at 0◦ and the two multi-talker babble noise sources are located at 120◦ and 180◦.
◮ The speech signals consist of male sentences from the HINT database.
◮ The noise signals consist of multi-talker babble from Auditec.
◮ The signals are sampled at 16 kHz.
◮ An FFT length of 128 with half-overlapping frames.
◮ The DRC is implemented based on 20 critical bands.
The following parameters are fixed during all simulations:
◮ The input level is set to 65 dB SPL at the hearing aid microphones.
◮ The DRC attack and release times are set to at = 10 ms and rt = 150 ms.
◮ The compression threshold is set to CT = 30 dB.
◮ The hearing aid gain is set to G dB = 30 dB.
Experimental results for SDW-MWFµ-based NR and DRC
[Figure: ∆SNR intellig (dB) and SD (dB) vs. input SNR for 1/p+DRC and for µ = 1, 2, 3 + DRC]
Experimental results for SDW-MWFSPP-based NR and dual-DRC
[Figure: ∆SNR intellig (dB) and SD (dB) vs. input SNR for 1/p+DRC and for ∆GdB = 5, 10, 15]
Combined NR and DRC and AFC
Long-term objective:
◮ Integration of NR, DRC and AFC into one signal processing scheme.
[Figure: combined scheme — conditional SPP estimation p(k,l) driving the dual-DRC (Gn dual,dB, Gs dB) and SDW-MWFSPP (WSPP(k,l)), producing Zs(k,l) and Ẑs(k,l)]
Adaptive Feedback Cancellation
[Figure: AFC scheme — forward path G, acoustic feedback path F, feedback cancellation path F̂, with near-end signal v(t), loudspeaker signal u(t), microphone signal y(t), feedback signal x(t), feedback-compensated signal d[t, f̂(t)], and feedback estimate ŷ[t|f̂(t)]]
The microphone signal,
y(t) = v(t) + x(t) = v(t) + F(q,t)u(t)   (15)
The feedback-compensated signal,
d(t) = v(t) + [F(q,t) − F̂(q,t)]u(t).   (16)
◮ Adaptively model the feedback path.
◮ Estimate the feedback signal.
◮ Correlation between the near-end signal and the loudspeaker signal.
◮ Caused by the closed signal loop.
Main Challenge
◮ Reduce the correlation between the near-end signal and the loudspeaker signal.
◮ Prediction error method-based AFC (PEM-based AFC).
PEM-based AFC (single near-end signal model)
[Figure: PEM-based AFC — forward path G, acoustic feedback path F, feedback cancellation path F̂, near-end source signal model H, and decorrelating prefilters Ĥ−1 applied to the loudspeaker signal u(t) and the feedback-compensated signal d[t, f̂(t)]]
Microphone signal
◮ y(t) = F(q,t)u(t) + H(q,t)e(t)
◮ Prefiltering of loudspeaker and microphone signals.
◮ Inverse near-end signal model.
The all-pole model can be written as
H(q,t) = 1 / C(q,t) = 1 / (1 + c1(t)q^−1 + ... + cnc(t)q^−nc)   (17)
Prediction error,
ε[t, ξ(t)] = H^−1(q,t)[y(t) − F(q,t)u(t)]   (18)
Minimize the prediction error,
min_ξ(t)  (1/(2N)) Σk=1..t ε²[k, ξ(t)]   (19)
◮ A single all-pole model (short-term predictor) fails to remove the periodicity.
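The adaptation loop of Eqs. (17)-(19) can be sketched with an NLMS update on the prefiltered signals. In this toy example the loudspeaker signal is already white, standing in for perfect Ĥ−1 prefiltering, and there is no near-end signal, so the feedback path estimate converges to the true path; all names and sizes are hypothetical:

```python
import numpy as np

def nlms_step(f_hat, u_buf, err, mu=0.5, eps=1e-6):
    """One NLMS update of the feedback-path estimate from the
    prefiltered loudspeaker regressor and the prediction error."""
    return f_hat + mu * err * u_buf / (u_buf @ u_buf + eps)

rng = np.random.default_rng(2)
nF = 8
f_true = 0.1 * rng.standard_normal(nF)    # toy acoustic feedback path
f_hat = np.zeros(nF)
u = rng.standard_normal(5000)             # white = "already prefiltered"
y = np.convolve(u, f_true)[:len(u)]       # microphone picks up pure feedback
for t in range(nF, len(u)):
    u_buf = u[t - nF + 1:t + 1][::-1]     # newest sample first
    e = y[t] - f_hat @ u_buf              # prediction error, cf. Eq. (18)
    f_hat = nlms_step(f_hat, u_buf, e)
print(np.max(np.abs(f_hat - f_true)))     # near zero: unbiased estimate
```

With a colored, correlated near-end signal instead of white noise, the same loop would converge to a biased solution, which is exactly why the PEM prefilters are needed.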
PEM-based AFC (cascaded near-end signal model)
◮ A cascade of near-end signal models removes the coloring and the periodicity.
Sinusoidal model,
d(t) = Σn=1..P An cos(ωn t + φn) + r(t),  t = 1, ..., M   (20)
Cascaded near-end signal model
y(t) =F(q,t)u(t) +H1(q,t)H2(q,t)e(t) (21)
The CPZLP model can be written as
d(t) = ∏n=1..P [ (1 − 2ρ cos(ωn) z^−1 + ρ² z^−2) / (1 − 2 cos(ωn) z^−1 + z^−2) ] e(t)   (22)
The output of the prediction error filter is
e(t, ω) = ∏n=1..P [ (1 − 2 cos(ωn) z^−1 + z^−2) / (1 − 2ρ cos(ωn) z^−1 + ρ² z^−2) ] d(t)   (23)
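The prediction-error filter of Eq. (23) is a cascade of constrained notch biquads; a minimal direct-form implementation (the pure-Python loop is for illustration only):

```python
import numpy as np

def cpzlp_prediction_error(d, freqs, rho=0.9):
    """Apply the CPZLP prediction-error filter of Eq. (23): for each
    frequency w (rad/sample), a notch (1 - 2cos(w)z^-1 + z^-2) with
    constrained poles (1 - 2*rho*cos(w)z^-1 + rho^2 z^-2)."""
    x = np.asarray(d, dtype=float)
    for w in freqs:
        b1, b2 = -2.0 * np.cos(w), 1.0
        a1, a2 = -2.0 * rho * np.cos(w), rho ** 2
        y = np.zeros_like(x)
        for t in range(len(x)):
            acc = x[t]                               # direct form I biquad
            if t >= 1:
                acc += b1 * x[t - 1] - a1 * y[t - 1]
            if t >= 2:
                acc += b2 * x[t - 2] - a2 * y[t - 2]
            y[t] = acc
        x = y
    return x

# A sinusoid at a notch frequency is removed once the transient decays
n = np.arange(4000)
w0 = 0.3                                  # rad/sample (hypothetical)
e = cpzlp_prediction_error(np.cos(w0 * n), [w0])
print(np.max(np.abs(e[500:])))            # ≈ 0: periodicity removed
```

The zeros sit exactly on the unit circle, so the sinusoid is annihilated, while the pole radius ρ < 1 keeps the filter's frequency response flat away from the notches.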
Incorporating pitch estimation in PEM-based AFC
◮ Speech signals are usually considered voiced or unvoiced.
◮ Voiced sounds consist of a fundamental frequency ω0 and its harmonic components.
◮ CPZLP estimates all frequencies independently.
◮ Does not exploit the harmonicity of speech.
Fundamental frequency estimation (pitch estimation)
◮ The sinusoids have frequencies that are integer multiples of ω0, i.e., ωn = ω0·n.
◮ This follows naturally from voiced speech being quasi-periodic.
Applying pitch estimation in PEM-based AFC
e(t, ω0) = ∏n=1..P [ (1 − 2 cos(nω0) z^−1 + z^−2) / (1 − 2ρ cos(nω0) z^−1 + ρ² z^−2) ] d(t)   (24)
Pitch estimation methods considered:
◮ Subspace-orthogonality-based pitch estimation.
◮ Subspace-shift-invariance-based pitch estimation.
◮ Optimal-filtering-based pitch estimation.
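The harmonic constraint ωn = nω0 can be illustrated with a simple harmonic-summation estimator (a stand-in sketch, not the subspace or optimal-filtering methods listed above; all parameter values are hypothetical):

```python
import numpy as np

def harmonic_sum_pitch(d, fs, n_harm=3, f0_grid=None, nfft=16384):
    """Pick the f0 whose harmonics n*f0 carry the most spectral energy,
    exploiting that voiced speech has sinusoids at integer multiples
    of a single fundamental frequency."""
    if f0_grid is None:
        f0_grid = np.arange(100.0, 300.0, 5.0)
    spec = np.abs(np.fft.rfft(d * np.hanning(len(d)), n=nfft))
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    scores = [sum(spec[np.argmin(np.abs(freqs - n * f0))]
                  for n in range(1, n_harm + 1)) for f0 in f0_grid]
    return float(f0_grid[int(np.argmax(scores))])

# Synthetic voiced frame at fs = 16 kHz with f0 = 150 Hz and 3 harmonics
fs = 16000
t = np.arange(2048) / fs
d = sum(np.cos(2 * np.pi * 150 * n * t) / n for n in range(1, 4))
print(harmonic_sum_pitch(d, fs))   # estimated f0, ≈ 150 Hz on this frame
```

Estimating the single parameter ω0 instead of P independent frequencies is what makes the harmonically constrained prediction-error filter more robust.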
Experimental Set-up for PEM-based AFC
Simulations have been performed with
◮ The near-end sinusoidal model order is set to P = 15.
◮ The near-end noise model order is set to 30.
◮ Both near-end signal models are estimated using 50% overlapping data windows of length M = 320 samples.
◮ The NLMS adaptive filter length is set to nF = 200.
◮ The near-end signal is a 30 s speech signal at fs = 16 kHz.
◮ The forward path gain K(t) is set 3 dB below the maximum stable gain (MSG) without feedback cancellation.
Experimental results for PEM-based AFC
[Figure: MSG (dB) vs. time and misadjustment MAF (dB) vs. samples for AFC-LP, AFC-CPZLP, AFC-shiftinv, AFC-orth, and AFC-optfilt, together with 20 log10 K(t) and the MSG of F(q)]
Conclusion
◮ SDW-MWFµ can be further improved by using SDW-MWFSPP.
◮ SDW-MWFSPP can be further extended with more advanced SDW criteria.
◮ An integration of MWF-based NR and Dynamic Range Compression has been proposed.
◮ A dual-DRC concept using a switchable compression characteristic.
Future and current work
Combined MWF and AFC
[Figure: two configurations for combined multi-channel NR (W1(k,l) ... WM(k,l)), DRC and AFC — a single feedback canceller WF(k,l) after the NR, or one feedback canceller WF,m(k,l) per microphone before the NR]
◮ Applying NR before AFC ...
◮ Applying AFC before NR ...
Future and current work
Combined Single-channel NR and AFC
[Figure: two configurations for combined single-channel NR and AFC — NR applied before the feedback canceller, or the feedback canceller applied before the NR]
◮ Applying NR before AFC ...
◮ Applying AFC before NR ...
Future and current work
Further exploiting PEM-based AFC using pitch estimation
◮ Using pitch estimation, the amplitudes can also be estimated.
◮ Possibility to design a more accurate CPZLP model (better decorrelation).
Removing the CPZLP model
◮ By re-using the optimal filter for both fundamental frequency estimation and filtering.
◮ Decorrelation by sinusoidal subtraction
◮ Further investigate the use of subspace methods.
Combined Speech Coding and AFC
[Figure: two schemes — AFC with a generic decorrelation device in the forward path, and AFC with speech coding in the forward path]