Double-Talk Robust Acoustic Echo Cancellation with Continuous Near-End Activity
Toon van Waterschoot and Marc Moonen
Katholieke Universiteit Leuven, ESAT-SCD, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium toon.vanwaterschoot@esat.kuleuven.be
1 Introduction
• The acoustic echo cancellation (AEC) problem is defined by
– the far-end signal $u(t)$,
– the near-end signal $v(t)$,
– the room impulse response (RIR) $F(q,t) = f_0(t) + f_1(t)q^{-1} + \ldots + f_{n_F}(t)q^{-n_F}$,
– the echo signal $x(t) = F(q,t)u(t)$,
– the microphone signal $y(t) = x(t) + v(t)$,
– the adaptive filter $\hat{F}(q,t) = \hat{f}_0(t) + \hat{f}_1(t)q^{-1} + \ldots + \hat{f}_{n_F}(t)q^{-n_F}$,
– the echo-compensated signal $d(t) = y(t) - \hat{F}(q,t)u(t)$.
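[Figure: AEC block diagram showing the signals $u(t)$, $x(t)$, $y(t)$, $d(t)$, $v(t)$ and $e(t)$, the acoustic echo path $F$, the adaptive filter $\hat{F}$ and the near-end model $H$, with connections from and to the far end.]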
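The signal definitions above can be sketched numerically as follows. This is a minimal illustration, not the authors' simulation setup: it assumes a time-invariant RIR, white far-end and near-end signals, and toy sizes; the helper `fir` and all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_F = 7          # RIR order (hypothetical, tiny for illustration)
T = 200          # number of samples (hypothetical)

# Toy time-invariant RIR with decaying taps (assumption: F(q, t) = F(q)).
f = rng.standard_normal(n_F + 1) * 0.5 ** np.arange(n_F + 1)
u = rng.standard_normal(T)          # far-end signal u(t)
v = 0.1 * rng.standard_normal(T)    # near-end signal v(t)

def fir(coeffs, signal):
    """Apply c_0 + c_1 q^-1 + ... to a signal with zero initial conditions."""
    return np.convolve(signal, coeffs)[: len(signal)]

x = fir(f, u)                # echo signal x(t) = F(q) u(t)
y = x + v                    # microphone signal y(t) = x(t) + v(t)

f_hat = np.zeros(n_F + 1)    # adaptive filter before any adaptation
d = y - fir(f_hat, u)        # echo-compensated signal d(t)
d_ideal = y - fir(f, u)      # echo-compensated signal for a perfect estimate
```

With a perfect estimate the echo-compensated signal reduces to the near-end signal, which is why the near-end signal acts as the disturbance in the identification problem.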
• A double-talk situation occurs when both the far-end signal $u(t)$ and the near-end signal $v(t)$ are active, and may lead to
– slow convergence of the RLS algorithm,
– divergence of the (N)LMS algorithm.
The standard solution is to switch off the adaptation during double-talk using a double-talk detector (DTD), which has two shortcomings:
– during the time needed to detect double-talk, the convergence of the algorithm may already have been affected considerably,
– switching off the adaptation does not solve the continuous double-talk problem, in which a near-end signal is permanently active.
It is therefore desirable to add double-talk robustness to the adaptive algorithms themselves, which is possible by taking the near-end signal characteristics into account.
2 Best linear unbiased estimate
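Let us consider the batch linear estimation problem that occurs in AEC, given a data record $\{u(k), y(k)\}_{k=1}^{t}$:
$$
\underbrace{\begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(t) \end{bmatrix}}_{\mathbf{y}}
=
\underbrace{\begin{bmatrix} u(1) & \ldots & u(1-n_F) \\ u(2) & \ldots & u(2-n_F) \\ \vdots & \ddots & \vdots \\ u(t) & \ldots & u(t-n_F) \end{bmatrix}}_{\mathbf{U}}
\cdot
\underbrace{\begin{bmatrix} f_0 \\ \vdots \\ f_{n_F} \end{bmatrix}}_{\mathbf{f}}
+
\underbrace{\begin{bmatrix} v(1) \\ v(2) \\ \vdots \\ v(t) \end{bmatrix}}_{\mathbf{v}}.
\tag{1}
$$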
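Any linear estimate of the parameter vector $\mathbf{f}$ can be written as a linear function of the data vector $\mathbf{y}$:
$$\hat{\mathbf{f}} = \mathbf{Z}^T \mathbf{y}. \tag{2}$$
For this estimate to be unbiased, the $t \times (n_F+1)$ matrix $\mathbf{Z}$ should be subject to two constraints:
$$
\begin{cases}
\mathbf{Z}^T \mathbf{U} = \mathbf{I}_{n_F+1} & \text{(a)} \\
E\,\mathbf{Z}^T \mathbf{v} = \mathbf{0}_{(n_F+1)\times 1} & \text{(b)}
\end{cases}
\tag{3}
$$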
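• A typical AEC adaptive algorithm is based on the least-squares (LS) estimator,
$$\hat{\mathbf{f}}_{LS} = (\mathbf{U}^T \mathbf{U})^{-1} \mathbf{U}^T \mathbf{y}, \tag{4}$$
which is unbiased but not necessarily minimum-variance.
• Minimizing the variance $E(\hat{\mathbf{f}} - E\hat{\mathbf{f}})(\hat{\mathbf{f}} - E\hat{\mathbf{f}})^T$ of the estimate (2) under the unbiasedness constraint (3(a)) yields the best linear unbiased estimate (BLUE):
$$\hat{\mathbf{f}}_{BLUE} = (\mathbf{U}^T \mathbf{R}^{-1} \mathbf{U})^{-1} \mathbf{U}^T \mathbf{R}^{-1} \mathbf{y}, \tag{5}$$
with $\mathbf{R}$ the near-end signal correlation matrix, defined by
$$\mathbf{R} \triangleq E\,\mathbf{v}\mathbf{v}^T. \tag{6}$$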
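The difference between the LS estimator (4) and the BLUE (5) can be checked numerically. The sketch below assumes a deterministic data matrix, a zero-mean near-end signal, and a hypothetical coloured correlation matrix $\mathbf{R}$ with entries $0.9^{|i-j|}$; sizes and values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
t, n_F = 60, 3                      # hypothetical sizes

# Data matrix U of (1): row k holds u(k), u(k-1), ..., u(k-n_F), with u(k) = 0 for k <= 0.
u = rng.standard_normal(t)
U = np.column_stack(
    [np.concatenate([np.zeros(i), u[: t - i]]) for i in range(n_F + 1)]
)

# Hypothetical coloured near-end correlation matrix R (entries 0.9^|i-j|).
idx = np.arange(t)
R = 0.9 ** np.abs(idx[:, None] - idx[None, :])
R_inv = np.linalg.inv(R)

Zt_ls = np.linalg.solve(U.T @ U, U.T)                     # Z^T of the LS estimator (4)
Zt_blue = np.linalg.solve(U.T @ R_inv @ U, U.T @ R_inv)   # Z^T of the BLUE (5)

# For an unbiased linear estimate with zero-mean v, the covariance is Z^T R Z.
cov_ls = Zt_ls @ R @ Zt_ls.T
cov_blue = Zt_blue @ R @ Zt_blue.T

# Eigenvalues of the covariance gap; non-negative up to rounding if the BLUE is minimum-variance.
gap_eigs = np.linalg.eigvalsh(cov_ls - cov_blue)
```

Both estimators satisfy constraint (3(a)), but only the BLUE uses $\mathbf{R}$, so its covariance is never larger than that of LS. When $\mathbf{R}$ is a scaled identity (white near-end signal) the two estimators coincide.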
References
[1] G. Rombouts, T. van Waterschoot, K. Struyve, and M. Moonen, “Acoustic feedback cancellation for long acoustic paths using a nonstationary source model,” in Proceedings of the 13th European Signal Processing Conference (EUSIPCO-2005), Antalya, Turkey, September 4-8, 2005.
3 Prediction error identification
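• Assume that the near-end signal $v(t)$ is generated as
$$v(t) = H(q,t)e(t) \quad \text{with} \quad E\,e(t)e(t-k) = \delta(k)\sigma_t^2. \tag{7}$$
The BLUE in (5) can then be realized as
$$\hat{\mathbf{f}}_{BLUE} = (\mathbf{U}^T \mathbf{H}^{-T} \boldsymbol{\Lambda}^{-1} \mathbf{H}^{-1} \mathbf{U})^{-1} \mathbf{U}^T \mathbf{H}^{-T} \boldsymbol{\Lambda}^{-1} \mathbf{H}^{-1} \mathbf{y}, \tag{8}$$
with the prefiltering matrix $\mathbf{H}$ and weighting matrix $\boldsymbol{\Lambda}$ defined as
$$
\mathbf{H} = \mathbf{H}^T \triangleq
\begin{bmatrix} H(q,1) & \ldots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \ldots & H(q,t) \end{bmatrix}
\quad \text{and} \quad
\boldsymbol{\Lambda} \triangleq
\begin{bmatrix} \sigma_1^2 & \ldots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \ldots & \sigma_t^2 \end{bmatrix}.
$$
If the near-end signal is described using an autoregressive model,
$$H(q,t) = \frac{1}{A(q,t)} = \frac{1}{1 + a_1(t)q^{-1} + \ldots + a_{n_A}(t)q^{-n_A}},$$
then the prefilters $H^{-1}(q,k) = A(q,k)$, $k = 1 \ldots t$, are FIR filters of order $n_A$.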
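The role of the FIR prefilter can be illustrated for a time-invariant AR model with zero initial conditions. Under these assumptions (not stated on the poster, which writes $\mathbf{H}$ as a diagonal matrix of filter operators), applying $H(q)$ to a stacked signal vector corresponds to a lower-triangular Toeplitz matrix, and the weighting in (8) equals $\mathbf{R}^{-1}$ in (5). The AR coefficients and variance below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
t = 50
a = np.array([-0.9, 0.5])      # hypothetical stable AR(2) coefficients a_1, a_2
sigma2 = 0.3                   # hypothetical constant excitation variance

# FIR prefilter A(q) written as a banded lower-triangular Toeplitz matrix.
A_coeffs = np.concatenate([[1.0], a])
A_mat = np.zeros((t, t))
for lag, c in enumerate(A_coeffs):
    A_mat += c * np.eye(t, k=-lag)

# H(q) = 1/A(q): lower-triangular matrix built from the AR impulse response.
H_mat = np.linalg.inv(A_mat)

e = np.sqrt(sigma2) * rng.standard_normal(t)   # white excitation e(t)
v = H_mat @ e                                  # near-end signal v = H e
e_rec = np.convolve(v, A_coeffs)[:t]           # prefiltering with A(q) whitens v

R = sigma2 * H_mat @ H_mat.T                   # R = H Lambda H^T with Lambda = sigma2 * I
R_inv_factored = A_mat.T @ A_mat / sigma2      # H^{-T} Lambda^{-1} H^{-1}
```

The practical point is that the inverse correlation matrix never has to be formed: filtering the signals with the short FIR filter $A(q)$ and weighting by $1/\sigma^2$ has the same effect.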
• The BLUE is then also the minimizing estimator of the prediction error criterion
$$
V_{PE}(t, \mathbf{f}(t), \mathbf{a}(t), \sigma_t^2) = \frac{1}{2N} \sum_{k=1}^{t} \frac{\lambda^{t-k}}{\sigma_k^2} \Big\{ A(q,k)\big[y(k) - F(q,t)u(k)\big] \Big\}^2,
$$
with $\mathbf{a}(t) \triangleq [a_1(t) \ldots a_{n_A}(t)]^T$. This criterion can be minimized recursively using the two-stage PEM-AFROW algorithm [1]:
First stage: linear prediction of the echo-compensated signal $d(t, \hat{\mathbf{f}}(t-1))$, calculated using the previous estimate $\hat{\mathbf{f}}(t-1)$ on a rectangular hopping window of length $M$ that 'looks ahead' $P-1$ samples:
$$
\mathbf{d}(t) =
\begin{bmatrix} y(t+P-1) \\ \vdots \\ y(t+P-M) \end{bmatrix}
-
\begin{bmatrix} u(t+P-1) & \ldots & u(t+P-1-n_F) \\ \vdots & \ddots & \vdots \\ u(t+P-M) & \ldots & u(t+P-M-n_F) \end{bmatrix}
\hat{\mathbf{f}}(t-1).
$$
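The autocorrelation functions $\phi_{dd}(\tau)$, $\tau = 0 \ldots n_A$, of $d(t, \hat{\mathbf{f}}(t-1))$ are estimated using the autocorrelation method:
$$
\begin{bmatrix} \hat{\phi}_{dd}(0) \\ \hat{\phi}_{dd}(1) \\ \vdots \\ \hat{\phi}_{dd}(n_A) \end{bmatrix}
=
\begin{bmatrix}
0 & \ldots & d(t) & \ldots & d(t-M+1) \\
0 & \ldots & d(t-1) & \ldots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d(t) & \ldots & d(t-n_A) & \ldots & 0
\end{bmatrix}
\begin{bmatrix} 0 \\ \vdots \\ d(t) \\ \vdots \\ d(t-M+1) \end{bmatrix}.
$$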
The near-end signal AR coefficients $\mathbf{a}(t)$ and the near-end excitation signal variance $\sigma_t^2$ are then estimated from $\hat{\phi}_{dd}(\tau)$, $\tau = 0 \ldots n_A$, using the Levinson-Durbin recursion.
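The first stage can be sketched as below. This is an illustrative implementation of the autocorrelation method and a textbook Levinson-Durbin recursion, not the authors' code. The function names are hypothetical, the window is filled with random numbers as a stand-in for $\mathbf{d}(t)$, and normalising the residual energy by the window length $M$ to obtain a variance is an assumption of this sketch.

```python
import numpy as np

def autocorr(d, n_A):
    """Autocorrelation method on a rectangular window: phi(tau) = sum_k d(k) d(k - tau)."""
    d = np.asarray(d, dtype=float)
    return np.array([d[tau:] @ d[: len(d) - tau] for tau in range(n_A + 1)])

def levinson_durbin(phi):
    """Return a_1..a_n of A(q) = 1 + a_1 q^-1 + ... + a_n q^-n and the residual energy."""
    n = len(phi) - 1
    a = np.zeros(n)
    err = phi[0]
    for m in range(n):
        # Reflection coefficient computed from the current order-m predictor.
        k = -(phi[m + 1] + a[:m] @ phi[m:0:-1]) / err
        a_prev = a[:m].copy()
        a[:m] = a_prev + k * a_prev[::-1]   # order update of the lower coefficients
        a[m] = k                            # new highest-order coefficient
        err *= 1.0 - k * k                  # residual energy update
    return a, err

rng = np.random.default_rng(3)
M, n_A = 215, 4                      # window length from the poster, small hypothetical order
d_win = rng.standard_normal(M)       # stand-in for the echo-compensated window d(t)

phi = autocorr(d_win, n_A)
a_hat, err = levinson_durbin(phi)
sigma2_hat = err / M                 # assumption: normalise the energy by the window length
```

The recursion solves the Toeplitz normal equations in $O(n_A^2)$ operations instead of the $O(n_A^3)$ of a general linear solve, which matters when the model is re-estimated on every window.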
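Second stage: recursive update of the RIR estimate using the loudspeaker and microphone signals prefiltered with the estimated coefficients $\hat{\mathbf{a}}(t)$ from the first stage:
$$
y_A(t) = \begin{bmatrix} y(t) & \ldots & y(t-n_A) \end{bmatrix} \begin{bmatrix} 1 \\ \hat{\mathbf{a}}(t) \end{bmatrix},
\qquad
\mathbf{u}_A(t) = \begin{bmatrix} u(t) & \ldots & u(t-n_A) \\ \vdots & \ddots & \vdots \\ u(t-n_F) & \ldots & u(t-n_F-n_A) \end{bmatrix} \begin{bmatrix} 1 \\ \hat{\mathbf{a}}(t) \end{bmatrix}.
$$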
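The RIR estimate $\hat{\mathbf{f}}(t-1)$ can then be updated recursively, either with the Gauss-Newton method:
$$
\hat{\mathbf{f}}(t) = \hat{\mathbf{f}}(t-1) + \frac{1}{\hat{\sigma}_t^2} \mathbf{R}_f^{-1}(t)\,\mathbf{u}_A(t)\,\varepsilon_p(t),
\qquad
\mathbf{R}_f(t) = \lambda \mathbf{R}_f(t-1) + \frac{1}{\hat{\sigma}_t^2} \mathbf{u}_A(t)\mathbf{u}_A^T(t),
\tag{8}
$$
or with the stochastic gradient method:
$$
\hat{\mathbf{f}}(t) = \hat{\mathbf{f}}(t-1) + \mu \frac{\mathbf{u}_A(t)\,\varepsilon_p(t)}{\mathbf{u}_A^T(t)\mathbf{u}_A(t) + (n_F+1)\hat{\sigma}_t^2},
\tag{9}
$$
where in both cases weighting is performed using the estimated variance $\hat{\sigma}_t^2$ from the first stage, and the a priori prediction error is calculated as
$$
\varepsilon_p(t) = \varepsilon(t, \hat{\mathbf{f}}(t-1), \hat{\mathbf{a}}(t)) = y_A(t) - \mathbf{u}_A^T(t)\hat{\mathbf{f}}(t-1).
$$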
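The second stage with the stochastic gradient update (9) can be sketched on a toy problem. To keep the example short, it assumes that the AR coefficients and the excitation variance are known and time-invariant, so the first stage is skipped and the prefiltering is done once up front. All sizes, the AR(1) model and the variance are hypothetical and much smaller than the poster's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_F, mu = 5000, 7, 0.5                 # hypothetical toy settings
a = np.array([-0.9])                      # assumed-known AR(1) near-end model
sigma2 = 0.019                            # assumed-known excitation variance
n_A = len(a)
A_coeffs = np.concatenate([[1.0], a])     # [1, a_1, ..., a_nA]

f = rng.standard_normal(n_F + 1)
f /= np.linalg.norm(f)                    # true RIR, scaled to unit norm
u = rng.standard_normal(T)                # far-end signal

# Near-end signal from A(q) v(t) = e(t), i.e. v(t) = e(t) - a_1 v(t-1) - ...
e = np.sqrt(sigma2) * rng.standard_normal(T)
v = np.zeros(T)
for k in range(T):
    past = v[max(k - n_A, 0):k][::-1]     # v(k-1), v(k-2), ...
    v[k] = e[k] - a[: len(past)] @ past

y = np.convolve(u, f)[:T] + v             # microphone signal (continuous double-talk)

# Prefilter both signals with A(q); time-invariant here, so done once.
y_A = np.convolve(y, A_coeffs)[:T]
u_Af = np.convolve(u, A_coeffs)[:T]

f_hat = np.zeros(n_F + 1)
for k in range(n_F, T):
    u_A = u_Af[k - n_F:k + 1][::-1]       # [u_A(k), ..., u_A(k - n_F)]
    eps = y_A[k] - u_A @ f_hat            # a priori prediction error
    f_hat = f_hat + mu * u_A * eps / (u_A @ u_A + (n_F + 1) * sigma2)   # update (9)

delta_dB = 20 * np.log10(np.linalg.norm(f_hat - f) / np.linalg.norm(f))
```

Because the prefilter whitens the near-end signal, the update sees the white excitation $e(t)$ as its disturbance instead of the coloured and more powerful $v(t)$. In the full algorithm the coefficients and variance would be re-estimated by the first stage on each window.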
4 Simulation results
• Simulation parameters: $f_s = 8$ kHz, $n_F + 1 = 1000$, $n_A = 12$ or $55$, $\lambda = 0.9997$, $\mu = 0.5$, $M = 215$.
• Echo-to-background ratio: $\mathrm{EBR} \triangleq \dfrac{\sum_{k=1}^{N} |x(k)|^2}{\sum_{k=1}^{N} |v(k)|^2} = 10$ dB.
• Performance measure: $\delta(t) = 20 \log_{10} \dfrac{\|\hat{\mathbf{f}}(t) - \mathbf{f}\|}{\|\mathbf{f}\|}$.
• 'TRUE': knowledge of the clean near-end signal is assumed.
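The two quantities above can be computed with a few helper functions. The function names are hypothetical, and converting the EBR power ratio to decibels with $10 \log_{10}$ is an assumption of this sketch, since the poster only states the value in dB.

```python
import numpy as np

def ebr_db(x, v):
    """Echo-to-background ratio in dB (assumed 10*log10 of the energy ratio)."""
    return 10 * np.log10(np.sum(np.abs(x) ** 2) / np.sum(np.abs(v) ** 2))

def misadjustment_db(f_hat, f):
    """Performance measure delta = 20*log10(||f_hat - f|| / ||f||)."""
    return 20 * np.log10(np.linalg.norm(f_hat - f) / np.linalg.norm(f))

def scale_to_ebr(x, v, target_db):
    """Rescale v so that the EBR of (x, scaled v) equals target_db (hypothetical helper)."""
    gain = 10 ** ((ebr_db(x, v) - target_db) / 20)
    return gain * v
```

For instance, `scale_to_ebr(x, v, 10.0)` returns a near-end signal whose energy is one tenth of the echo energy, and an estimate that deviates from the true RIR by 10% of its norm gives a performance measure of -20 dB.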
4.1 Gauss-Newton type algorithms
• Sliding window: P = 1
[Figure: $\delta(t)$ (dB) versus $t/T_s$ (s). Curves: RLS; SW-PEM-AFROW $n_A = 55$; SW-PEM-AFROW $n_A = 55$ TRUE; SW-PEM-AFROW $n_A = 12$; SW-PEM-AFROW $n_A = 12$ TRUE.]
• Hopping window: P = M − n A
[Figure: $\delta(t)$ (dB) versus $t/T_s$ (s). Curves: RLS; HW-PEM-AFROW $n_A = 55$; HW-PEM-AFROW $n_A = 55$ TRUE; HW-PEM-AFROW $n_A = 12$; HW-PEM-AFROW $n_A = 12$ TRUE.]
4.2 Stochastic gradient algorithms
• Sliding window: P = 1
[Figure: $\delta(t)$ (dB) versus $t/T_s$ (s), time axis scaled by $10^5$. Curves: NLMS; SW-PEM-AFROW $n_A = 55$; SW-PEM-AFROW $n_A = 55$ TRUE; SW-PEM-AFROW $n_A = 12$; SW-PEM-AFROW $n_A = 12$ TRUE.]
• Hopping window: P = M − n A
[Figure: $\delta(t)$ (dB) versus $t/T_s$ (s), time axis scaled by $10^5$. Curves: NLMS; HW-PEM-AFROW $n_A = 55$; HW-PEM-AFROW $n_A = 55$ TRUE; HW-PEM-AFROW $n_A = 12$; HW-PEM-AFROW $n_A$ …]