Double-Talk Robust Acoustic Echo Cancellation with Continuous Near-End Activity

Toon van Waterschoot and Marc Moonen

Katholieke Universiteit Leuven, ESAT-SCD, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium

toon.vanwaterschoot@esat.kuleuven.be

1 Introduction

• The acoustic echo cancellation (AEC) problem is defined by

– the far-end signal $u(t)$,

– the near-end signal $v(t)$,

– the room impulse response (RIR) $F(q,t) = f_0(t) + f_1(t)q^{-1} + \dots + f_{n_F}(t)q^{-n_F}$,

– the echo signal $x(t) = F(q,t)u(t)$,

– the microphone signal $y(t) = x(t) + v(t)$,

– the adaptive filter $\hat{F}(q,t) = \hat{f}_0(t) + \hat{f}_1(t)q^{-1} + \dots + \hat{f}_{n_F}(t)q^{-n_F}$,

– the echo-compensated signal $d(t) = y(t) - \hat{F}(q,t)u(t)$.

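[Figure: AEC block diagram with the signals $u(t)$ (from far end), $x(t)$, $v(t)$, $e(t)$, $y(t)$ and $d(t)$ (to far end), the acoustic echo path $F$, the adaptive filter $\hat{F}$, and the near-end signal model $H$.]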
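The signal definitions above can be simulated in a few lines. The following is a minimal sketch, assuming a short time-invariant RIR, white Gaussian far-end and near-end signals, and zero initial filter state; the helper `fir` and all numerical values are illustrative choices, not taken from the poster.

```python
import numpy as np

rng = np.random.default_rng(0)

n_F = 7      # RIR order (assumed small for illustration)
T = 200      # record length in samples (assumed)

u = rng.standard_normal(T)                                     # far-end signal u(t)
v = 0.1 * rng.standard_normal(T)                               # near-end signal v(t)
f = rng.standard_normal(n_F + 1) * 0.5 ** np.arange(n_F + 1)   # time-invariant RIR

def fir(coeffs, signal):
    """Apply c_0 + c_1 q^-1 + ... to a signal, assuming zero initial state."""
    return np.convolve(signal, coeffs)[: len(signal)]

x = fir(f, u)                 # echo signal x(t) = F(q) u(t)
y = x + v                     # microphone signal y(t) = x(t) + v(t)

f_hat = np.zeros(n_F + 1)     # adaptive filter before any adaptation
d = y - fir(f_hat, u)         # echo-compensated signal d(t)
```

With a perfect estimate (`f_hat = f`) the echo-compensated signal reduces to the near-end signal `v`; with the all-zero initial estimate it equals the microphone signal.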

• A double-talk situation occurs when both the far-end signal $u(t)$ and the near-end signal $v(t)$ are active. It may lead to

– slow convergence of the RLS algorithm,

– divergence of the (N)LMS algorithm.

The standard solution is to switch off the adaptation during double-talk using a double-talk detector (DTD). This approach has two shortcomings:

– during the time needed to detect double-talk, the convergence of the algorithm may already have been affected considerably,

– switching off the adaptation does not solve the continuous double-talk problem, in which a near-end signal is permanently active.

It is therefore desirable to add double-talk robustness to the adaptive algorithms themselves, which is possible by taking the near-end signal characteristics into account.
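To illustrate the effect of a permanently active near-end signal, the sketch below runs plain NLMS once with a silent near end and once under continuous double-talk. It assumes a hypothetical 16-tap echo path, white signals, and a near-end level 10 dB below the echo; it only shows one symptom (a raised misadjustment floor), not the divergence that can occur with coloured or speech-like near-end signals.

```python
import numpy as np

def nlms_misadjustment(u, y, f, mu=0.5, eps=1e-8):
    """Run NLMS over the record; return 20*log10(||f_hat - f|| / ||f||) at the end."""
    f_hat = np.zeros(len(f))
    u_vec = np.zeros(len(f))                      # [u(t), ..., u(t - n_F)]
    for t in range(len(u)):
        u_vec = np.concatenate(([u[t]], u_vec[:-1]))
        err = y[t] - u_vec @ f_hat                # a priori error
        f_hat = f_hat + mu * u_vec * err / (u_vec @ u_vec + eps)
    return 20 * np.log10(np.linalg.norm(f_hat - f) / np.linalg.norm(f))

rng = np.random.default_rng(1)
f = rng.standard_normal(16)                       # hypothetical short echo path
u = rng.standard_normal(5000)                     # white far-end signal
x = np.convolve(u, f)[: len(u)]                   # echo signal
w = rng.standard_normal(len(u))
v = w * np.sqrt(np.sum(x**2) / (10 * np.sum(w**2)))   # near-end energy 10 dB below echo

single_talk = nlms_misadjustment(u, x, f)         # near end silent
double_talk = nlms_misadjustment(u, x + v, f)     # near end permanently active
print(f"single-talk: {single_talk:.1f} dB, double-talk: {double_talk:.1f} dB")
```

Because the near-end signal never stops, a double-talk detector would have no interval in which adaptation could safely resume, which is the motivation for building robustness into the algorithm itself.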

2 Best linear unbiased estimate

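Let us consider the batch linear estimation problem that occurs in AEC, given a data record $\{u(k), y(k)\}_{k=1}^{t}$:

$$
\underbrace{\begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(t) \end{bmatrix}}_{\mathbf{y}}
=
\underbrace{\begin{bmatrix}
u(1) & \dots & u(1-n_F) \\
u(2) & \dots & u(2-n_F) \\
\vdots & \ddots & \vdots \\
u(t) & \dots & u(t-n_F)
\end{bmatrix}}_{\mathbf{U}}
\cdot
\underbrace{\begin{bmatrix} f_0 \\ \vdots \\ f_{n_F} \end{bmatrix}}_{\mathbf{f}}
+
\underbrace{\begin{bmatrix} v(1) \\ v(2) \\ \vdots \\ v(t) \end{bmatrix}}_{\mathbf{v}}.
\tag{1}
$$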
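The data matrix in (1) can be assembled directly from the far-end record. This is a minimal sketch; it assumes that samples before the start of the record are zero, which the poster does not specify, and the function name `data_matrix` is illustrative.

```python
import numpy as np

def data_matrix(u, n_F):
    """Return the t x (n_F + 1) matrix whose k-th row is [u(k), ..., u(k - n_F)].

    Samples before the start of the record are assumed to be zero.
    """
    t = len(u)
    padded = np.concatenate((np.zeros(n_F), u))
    # padded[k : k + n_F + 1] holds u(k - n_F) ... u(k); reverse to newest-first.
    return np.stack([padded[k : k + n_F + 1][::-1] for k in range(t)])

u = np.arange(1.0, 6.0)        # toy far-end record: 1, 2, 3, 4, 5
U = data_matrix(u, n_F=2)
print(U)
```

Under the zero-prehistory assumption, the product `U @ f` coincides with the first `t` samples of the convolution of `u` with `f`.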

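Any linear estimate of the parameter vector $\mathbf{f}$ can be written as a linear function of the data vector $\mathbf{y}$:

$$
\hat{\mathbf{f}} = \mathbf{Z}^T \mathbf{y}. \tag{2}
$$

For this estimate to be unbiased, the $t \times (n_F+1)$ matrix $\mathbf{Z}$ must satisfy two constraints:

$$
\begin{cases}
\mathbf{Z}^T \mathbf{U} = \mathbf{I}_{n_F+1} & \text{(a)} \\
E\{\mathbf{Z}^T \mathbf{v}\} = \mathbf{0}_{(n_F+1)\times 1} & \text{(b)}
\end{cases}
\tag{3}
$$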
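To see why these two constraints give an unbiased estimate, substitute the data model (1) into the linear estimate (2) and take expectations. This short derivation is added for clarity and uses only the symbols defined above:

$$
E\{\hat{\mathbf{f}}\}
= E\{\mathbf{Z}^T(\mathbf{U}\mathbf{f} + \mathbf{v})\}
= \underbrace{\mathbf{Z}^T\mathbf{U}}_{=\,\mathbf{I}\ \text{by (a)}}\mathbf{f}
+ \underbrace{E\{\mathbf{Z}^T\mathbf{v}\}}_{=\,\mathbf{0}\ \text{by (b)}}
= \mathbf{f}.
$$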

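• A typical AEC adaptive algorithm is based on the least-squares (LS) estimator,

$$
\hat{\mathbf{f}}_{LS} = (\mathbf{U}^T \mathbf{U})^{-1} \mathbf{U}^T \mathbf{y}, \tag{4}
$$

which is unbiased but not necessarily minimum-variance.

• Minimizing the variance $E\{(\hat{\mathbf{f}} - E\hat{\mathbf{f}})(\hat{\mathbf{f}} - E\hat{\mathbf{f}})^T\}$ of the estimate (2) under the unbiasedness constraint (3a) yields the best linear unbiased estimate (BLUE):

$$
\hat{\mathbf{f}}_{BLUE} = (\mathbf{U}^T \mathbf{R}^{-1} \mathbf{U})^{-1} \mathbf{U}^T \mathbf{R}^{-1} \mathbf{y}, \tag{5}
$$

with $\mathbf{R}$ the near-end signal correlation matrix, defined by

$$
\mathbf{R} \triangleq E\{\mathbf{v}\mathbf{v}^T\}. \tag{6}
$$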
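The LS estimator (4) and the BLUE (5) can be compared numerically when $\mathbf{R}$ is known. The sketch below assumes a stationary AR(1) near-end signal with coefficient `rho`, and uses a random matrix as a stand-in for the data matrix $\mathbf{U}$; both choices, and the function names, are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
t, n_F = 400, 3
f = np.array([1.0, -0.5, 0.25, 0.1])
U = rng.standard_normal((t, n_F + 1))       # stand-in for the data matrix in (1)

# Assumed near-end model: AR(1) with R[i, j] = rho^|i-j| / (1 - rho^2).
rho = 0.9
idx = np.arange(t)
R = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho**2)
v = np.linalg.cholesky(R) @ rng.standard_normal(t)    # sample with E{v v^T} = R
y = U @ f + v

def ls(U, y):
    """Least-squares estimate (4)."""
    return np.linalg.solve(U.T @ U, U.T @ y)

def blue(U, y, R):
    """Best linear unbiased estimate (5) for a known correlation matrix R."""
    Ri_U = np.linalg.solve(R, U)            # R^{-1} U; R is symmetric
    return np.linalg.solve(Ri_U.T @ U, Ri_U.T @ y)

f_ls, f_blue = ls(U, y), blue(U, y, R)

# Theoretical covariance matrices of both estimators for this U and R.
G = np.linalg.inv(U.T @ U)
C_ls = G @ U.T @ R @ U @ G
C_blue = np.linalg.inv(U.T @ np.linalg.solve(R, U))
print(np.trace(C_ls), np.trace(C_blue))
```

For $\mathbf{R} = \mathbf{I}$ the two estimators coincide; for coloured near-end signals the total variance `np.trace(C_blue)` never exceeds `np.trace(C_ls)`.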

References

[1] G. Rombouts, T. van Waterschoot, K. Struyve, and M. Moonen, “Acoustic feedback cancellation for long acoustic paths using a nonstationary source model,” in Proceedings of the 13th European Signal Processing Conference (EUSIPCO-2005), Antalya, Turkey, September 4-8, 2005.

3 Prediction error identification

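• Assume that the near-end signal $v(t)$ is generated as

$$
v(t) = H(q,t)e(t) \quad \text{with} \quad E\{e(t)e(t-k)\} = \delta(k)\sigma_t^2. \tag{7}
$$

The BLUE in (5) can then be realized as

$$
\hat{\mathbf{f}}_{BLUE} = (\mathbf{U}^T \mathbf{H}^{-T} \mathbf{\Lambda}^{-1} \mathbf{H}^{-1} \mathbf{U})^{-1} \mathbf{U}^T \mathbf{H}^{-T} \mathbf{\Lambda}^{-1} \mathbf{H}^{-1} \mathbf{y}, \tag{8}
$$

with the prefiltering matrix $\mathbf{H}$ and weighting matrix $\mathbf{\Lambda}$ defined as

$$
\mathbf{H} \triangleq
\begin{bmatrix}
H(q,1) & \dots & 0 \\
\vdots & \ddots & \vdots \\
0 & \dots & H(q,t)
\end{bmatrix}
\quad \text{and} \quad
\mathbf{\Lambda} \triangleq
\begin{bmatrix}
\sigma_1^2 & \dots & 0 \\
\vdots & \ddots & \vdots \\
0 & \dots & \sigma_t^2
\end{bmatrix}.
$$

If the near-end signal is described using an autoregressive model,

$$
H(q,t) = \frac{1}{A(q,t)} = \frac{1}{1 + a_1(t)q^{-1} + \dots + a_{n_A}(t)q^{-n_A}},
$$

then the prefilters $H^{-1}(q,k) = A(q,k)$, $k = 1, \dots, t$, are FIR filters of order $n_A$.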
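The whitening role of the FIR prefilter $A(q)$ can be checked on synthetic data. This is a minimal sketch, assuming a time-invariant, stable AR(2) model and zero initial conditions; the coefficients and function names are illustrative.

```python
import numpy as np

def ar_synthesize(e, a):
    """Generate v = (1/A(q)) e, i.e. v(t) = e(t) - a_1 v(t-1) - ... - a_nA v(t-nA)."""
    v = np.zeros(len(e))
    for t in range(len(e)):
        acc = e[t]
        for i, a_i in enumerate(a, start=1):
            if t - i >= 0:
                acc -= a_i * v[t - i]
        v[t] = acc
    return v

def ar_prefilter(s, a):
    """Apply the FIR prefilter A(q) = 1 + a_1 q^-1 + ... + a_nA q^-nA."""
    return np.convolve(s, np.concatenate(([1.0], a)))[: len(s)]

rng = np.random.default_rng(3)
a = np.array([-1.5, 0.7])          # stable AR(2) coefficients (assumed)
e = rng.standard_normal(1000)      # white excitation e(t)
v = ar_synthesize(e, a)            # coloured near-end signal v(t)
e_rec = ar_prefilter(v, a)         # prefiltering recovers the white excitation
```

Applying the same prefilter to both $y(t)$ and $u(t)$ leaves the echo path unchanged while turning the near-end disturbance into white noise, which is the situation in which least squares is already minimum-variance.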

• The BLUE is then also the minimizer of the prediction error criterion

$$
V_{PE}(t, \mathbf{f}(t), \mathbf{a}(t), \sigma_t^2) = \frac{1}{2N} \sum_{k=1}^{t} \frac{\lambda^{t-k}}{\sigma_k^2} \Big\{ A(q,k)\big[y(k) - F(q,t)u(k)\big] \Big\}^2,
$$

with $\mathbf{a}(t) \triangleq [a_1(t) \dots a_{n_A}(t)]^T$. This criterion can be minimized recursively using the two-stage PEM-AFROW algorithm [1]:

First stage: linear prediction of the echo-compensated signal $d(t, \hat{\mathbf{f}}(t-1))$, calculated using the previous estimate $\hat{\mathbf{f}}(t-1)$ on a rectangular hopping window of length $M$ that "looks ahead" $P - 1$ samples:

$$
\mathbf{d}(t) =
\begin{bmatrix} y(t+P-1) \\ \vdots \\ y(t+P-M) \end{bmatrix}
-
\begin{bmatrix}
u(t+P-1) & \dots & u(t+P-1-n_F) \\
\vdots & \ddots & \vdots \\
u(t+P-M) & \dots & u(t+P-M-n_F)
\end{bmatrix}
\hat{\mathbf{f}}(t-1).
$$

The autocorrelation function values $\phi_{dd}(\tau)$, $\tau = 0, \dots, n_A$, of $d(t, \hat{\mathbf{f}}(t-1))$ are estimated using the autocorrelation method:

$$
\begin{bmatrix}
\hat{\phi}_{dd}(0) \\ \hat{\phi}_{dd}(1) \\ \vdots \\ \hat{\phi}_{dd}(n_A)
\end{bmatrix}
=
\begin{bmatrix}
0 & \dots & d(t) & \dots & d(t-M+1) \\
0 & \dots & d(t-1) & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d(t) & \dots & d(t-n_A) & \dots & 0
\end{bmatrix}
\begin{bmatrix}
0 \\ \vdots \\ d(t) \\ \vdots \\ d(t-M+1)
\end{bmatrix}
$$

The near-end signal AR coefficients $\mathbf{a}(t)$ and the near-end excitation signal variance $\sigma_t^2$ are then estimated from $\hat{\phi}_{dd}(\tau)$, $\tau = 0, \dots, n_A$, using the Levinson-Durbin recursion.
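The first stage can be sketched as an autocorrelation estimate followed by a textbook Levinson-Durbin recursion. The code below is an illustration under assumptions, not the PEM-AFROW implementation of [1]: the window is a random stand-in for the echo-compensated samples, the function names are hypothetical, and the normalization of the variance estimate by the window length $M$ is an assumed convention.

```python
import numpy as np

def autocorr(d, n_A):
    """Autocorrelation method: phi(tau) = sum_k d(k) d(k - tau) on the zero-padded window."""
    M = len(d)
    return np.array([d[tau:] @ d[: M - tau] for tau in range(n_A + 1)])

def levinson_durbin(phi):
    """Solve the Yule-Walker equations for A(q) = 1 + a_1 q^-1 + ... + a_nA q^-nA.

    Returns (a, err): a = [a_1, ..., a_nA] and err, the prediction error energy.
    """
    n_A = len(phi) - 1
    a = np.zeros(n_A)
    err = phi[0]
    for m in range(n_A):
        # Reflection coefficient for model order m + 1.
        k = -(phi[m + 1] + a[:m] @ phi[m:0:-1]) / err
        a[:m] = a[:m] + k * a[:m][::-1]
        a[m] = k
        err *= 1.0 - k * k
    return a, err

rng = np.random.default_rng(4)
M, n_A = 215, 12
d = rng.standard_normal(M)          # stand-in for one window of d(t)
phi = autocorr(d, n_A)
a_hat, err = levinson_durbin(phi)
sigma2_hat = err / M                # assumed normalization by the window length
```

The recursion solves the same Toeplitz normal equations as a direct linear solve, but in $O(n_A^2)$ operations instead of $O(n_A^3)$, which matters when it is repeated for every window.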

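Second stage: recursive update of the RIR estimate using the loudspeaker and microphone signals prefiltered with the estimated coefficients $\hat{\mathbf{a}}(t)$ from the first stage:

$$
y_A(t) =
\begin{bmatrix} y(t) & \dots & y(t-n_A) \end{bmatrix}
\begin{bmatrix} 1 \\ \hat{\mathbf{a}}(t) \end{bmatrix},
\qquad
\mathbf{u}_A(t) =
\begin{bmatrix}
u(t) & \dots & u(t-n_A) \\
\vdots & \ddots & \vdots \\
u(t-n_F) & \dots & u(t-n_F-n_A)
\end{bmatrix}
\begin{bmatrix} 1 \\ \hat{\mathbf{a}}(t) \end{bmatrix}.
$$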
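The prefiltered quantities can be computed sample by sample as written above. This is a minimal sketch with a hypothetical helper `prefilter_sample`; it uses 0-based array indexing and assumes $t \geq n_F + n_A$ so that all required past samples exist.

```python
import numpy as np

def prefilter_sample(y, u, t, a_hat, n_F):
    """Return (y_A(t), u_A(t)) for the 0-based time index t.

    Requires t >= n_F + n_A so that every past sample is available.
    """
    n_A = len(a_hat)
    w = np.concatenate(([1.0], a_hat))              # [1, a_1, ..., a_nA]
    y_A = y[t - n_A : t + 1][::-1] @ w              # [y(t) ... y(t - n_A)] [1; a]
    # Row i of the matrix is [u(t - i), ..., u(t - i - n_A)], i = 0 ... n_F.
    rows = np.stack([u[t - i - n_A : t - i + 1][::-1] for i in range(n_F + 1)])
    u_A = rows @ w
    return y_A, u_A

rng = np.random.default_rng(6)
y = rng.standard_normal(50)
u = rng.standard_normal(50)
a_hat = np.array([-0.9, 0.2])       # illustrative AR coefficients
y_A, u_A = prefilter_sample(y, u, t=20, a_hat=a_hat, n_F=3)
```

Each entry of `u_A` is simply the far-end signal filtered with $A(q)$ at a different delay, so a streaming implementation could filter $u(t)$ once per sample and shift the result into a delay line, provided $\hat{\mathbf{a}}(t)$ is held fixed over the window.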

The RIR estimate $\hat{\mathbf{f}}(t-1)$ can then be updated recursively, either with the Gauss-Newton method:

$$
\hat{\mathbf{f}}(t) = \hat{\mathbf{f}}(t-1) + \frac{1}{\hat{\sigma}_t^2} \mathbf{R}_f^{-1}(t)\mathbf{u}_A(t)\varepsilon_p(t),
\qquad
\mathbf{R}_f(t) = \lambda \mathbf{R}_f(t-1) + \frac{1}{\hat{\sigma}_t^2} \mathbf{u}_A(t)\mathbf{u}_A^T(t),
\tag{9}
$$

or with the stochastic gradient method:

$$
\hat{\mathbf{f}}(t) = \hat{\mathbf{f}}(t-1) + \mu \frac{\mathbf{u}_A(t)\varepsilon_p(t)}{\mathbf{u}_A^T(t)\mathbf{u}_A(t) + (n_F+1)\hat{\sigma}_t^2},
\tag{10}
$$

where in both cases weighting is performed using the estimated variance $\hat{\sigma}_t^2$ from the first stage, and the a priori prediction error is calculated as

$$
\varepsilon_p(t) = \varepsilon(t, \hat{\mathbf{f}}(t-1), \hat{\mathbf{a}}(t)) = y_A(t) - \mathbf{u}_A^T(t)\hat{\mathbf{f}}(t-1).
$$
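Both update rules translate almost line by line into code. The sketch below takes the prefiltered quantities $y_A(t)$, $\mathbf{u}_A(t)$ and the variance estimate as given inputs; the function names are hypothetical, and the Gauss-Newton step solves a linear system explicitly for readability, whereas a practical implementation would more likely propagate the inverse of $\mathbf{R}_f(t)$ recursively.

```python
import numpy as np

def gauss_newton_step(f_hat, R_f, u_A, y_A, sigma2, lam=0.9997):
    """One Gauss-Newton update; returns the new (f_hat, R_f)."""
    eps_p = y_A - u_A @ f_hat                         # a priori prediction error
    R_f = lam * R_f + np.outer(u_A, u_A) / sigma2     # weighted correlation update
    f_hat = f_hat + np.linalg.solve(R_f, u_A) * eps_p / sigma2
    return f_hat, R_f

def stochastic_gradient_step(f_hat, u_A, y_A, sigma2, mu=0.5):
    """One stochastic gradient update; len(f_hat) equals n_F + 1."""
    eps_p = y_A - u_A @ f_hat                         # a priori prediction error
    return f_hat + mu * u_A * eps_p / (u_A @ u_A + len(f_hat) * sigma2)

# Toy usage on noise-free synthetic data (illustrative values only).
rng = np.random.default_rng(5)
f_true = np.array([0.5, -0.3, 0.2, 0.1])
f_gn, R_f = np.zeros(4), 1e-3 * np.eye(4)             # small initial regularization
f_sg = np.zeros(4)
for _ in range(500):
    u_A = rng.standard_normal(4)
    y_A = u_A @ f_true
    f_gn, R_f = gauss_newton_step(f_gn, R_f, u_A, y_A, sigma2=1.0)
    f_sg = stochastic_gradient_step(f_sg, u_A, y_A, sigma2=1.0)
```

The term $(n_F+1)\hat{\sigma}_t^2$ in the stochastic gradient denominator acts as a variance-dependent regularization: when the estimated near-end excitation is strong, the effective step size shrinks.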

4 Simulation results

• Simulation parameters: $f_s = 8$ kHz, $n_F + 1 = 1000$, $n_A = 12$ or $55$, $\lambda = 0.9997$, $\mu = 0.5$, $M = 215$.

• Echo-to-background ratio: $\mathrm{EBR} \triangleq \dfrac{\sum_{k=1}^{N} |x(k)|^2}{\sum_{k=1}^{N} |v(k)|^2} = 10$ dB.

• Performance measure: $\delta(t) = 20 \log_{10} \dfrac{\|\hat{\mathbf{f}}(t) - \mathbf{f}\|}{\|\mathbf{f}\|}$.

• 'TRUE': knowledge of the clean near-end signal is assumed.
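The two quantities used in the simulations are straightforward to compute. The sketch below assumes the usual convention that the EBR in dB is $10\log_{10}$ of the energy ratio; the function names and the scaling helper are illustrative additions.

```python
import numpy as np

def ebr_db(x, v):
    """Echo-to-background ratio in dB (10*log10 of the energy ratio, assumed convention)."""
    return 10 * np.log10(np.sum(np.abs(x) ** 2) / np.sum(np.abs(v) ** 2))

def scale_to_ebr(x, v, target_db):
    """Scale the near-end signal v so that ebr_db(x, scaled v) equals target_db."""
    gain = np.sqrt(np.sum(x**2) / (np.sum(v**2) * 10 ** (target_db / 10)))
    return gain * v

def misadjustment_db(f_hat, f):
    """Performance measure delta = 20*log10(||f_hat - f|| / ||f||)."""
    return 20 * np.log10(np.linalg.norm(f_hat - f) / np.linalg.norm(f))

rng = np.random.default_rng(7)
x = rng.standard_normal(1000)                  # stand-in echo signal
v = scale_to_ebr(x, rng.standard_normal(1000), target_db=10.0)
f = rng.standard_normal(8)                     # stand-in RIR
print(ebr_db(x, v), misadjustment_db(np.zeros(8), f))
```

An all-zero estimate gives $\delta = 0$ dB, so negative values on the vertical axis of the convergence curves indicate an estimate that is closer to the true RIR than no cancellation at all.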

4.1 Gauss-Newton type algorithms

• Sliding window: P = 1

[Figure: $\delta(t)$ (dB) versus $t/T_s$ (s) for RLS and SW-PEM-AFROW with $n_A = 55$, $n_A = 55$ TRUE, $n_A = 12$, and $n_A = 12$ TRUE.]

• Hopping window: P = M − n _A

[Figure: $\delta(t)$ (dB) versus $t/T_s$ (s) for RLS and HW-PEM-AFROW with $n_A = 55$, $n_A = 55$ TRUE, $n_A = 12$, and $n_A = 12$ TRUE.]

4.2 Stochastic gradient algorithms

• Sliding window: P = 1

[Figure: $\delta(t)$ (dB) versus $t/T_s$ (s) for NLMS and SW-PEM-AFROW with $n_A = 55$, $n_A = 55$ TRUE, $n_A = 12$, and $n_A = 12$ TRUE.]

• Hopping window: P = M − n _A

[Figure: $\delta(t)$ (dB) versus $t/T_s$ (s) for NLMS and HW-PEM-AFROW with $n_A = 55$, $n_A = 55$ TRUE, $n_A = 12$, and $n_A = 12$ TRUE.]
