Acoustic Echo Cancellation in the Presence of Continuous Double-Talk

Toon van Waterschoot∗, Geert Rombouts∗, Kris Struyve† and Marc Moonen∗

∗ Katholieke Universiteit Leuven, ESAT-SCD,

Kasteelpark Arenberg 10, B-3001 Leuven, Belgium,

http://www.esat.kuleuven.ac.be/scd

toon.vanwaterschoot@esat.kuleuven.ac.be

† Televic N.V., L. Bekaertlaan 1,

B-8870 Izegem, Belgium, http://www.televic.com/

k.struyve@televic.com

Abstract

The use of a double-talk detector in acoustic echo cancellation cannot improve the adaptive algorithm’s performance if near-end noise is continuously present. We propose a new way of dealing with such a continuous double-talk situation, which may occur, e.g., in an automatic gain adjustment application. If the microphone and loudspeaker signals are prefiltered with the inverse near-end signal model, a minimum variance room impulse response estimate can be obtained. However, the near-end signal model is unknown and time-varying, and has to be estimated concurrently with the room impulse response. We apply three prediction error identification algorithms, originally developed for adaptive feedback cancellation, to this problem. Simulation results indicate that only the prediction error method based adaptive filtering algorithm applying row operations (PEM-AFROW) outperforms the standard RLS or NLMS adaptive algorithms.

1 Introduction

Consider the acoustic echo cancellation (AEC) problem

$$y(t) = F(q,t)\,u(t) + v(t), \tag{1}$$

in which an echo-compensated signal based on an estimate of the room impulse response, $\hat{F}(q,t)$, is sent to the far-end side:

$$d(t) = y(t) - \hat{F}(q,t)\,u(t). \tag{2}$$

[Figure: block diagram of the AEC set-up. Labels: from far-end, to far-end, u(t), x(t), y(t), d(t), v(t), e(t), acoustic echo path F, echo path estimate F̂, near-end signal model 1/A.]

We focus on the continuous double-talk situation (simultaneous far-end and near-end activity 100% of the time), which may occur in

• a noisy teleconferencing scenario,

• an automatic gain adjustment application,

• an acoustic feedback scenario.

The use of a double-talk detector is irrelevant in this scenario since it would freeze the adaptation continuously.

Our aim is to develop a double-talk robust adaptive filtering algorithm for estimating the room impulse response (RIR) $F(q,t)$. This will be accomplished by including a model of the near-end signal:

$$y(t) = F(q,t)\,u(t) + \frac{1}{A(q,t)}\,e(t). \tag{3}$$

The RIR model $F(q,t)$ and the inverse near-end signal model $A(q,t)$ have to be identified concurrently. We propose three algorithms for concurrent identification based on prediction error (PE) identification theory.
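As a brief sketch of why the inverse model $A(q,t)$ helps, assume for illustration (this assumption is not part of the model above) that $A$ and $F$ vary slowly enough to commute. Multiplying (3) by $A(q,t)$ then gives

$$A(q,t)\,y(t) = F(q,t)\,\big[A(q,t)\,u(t)\big] + e(t).$$

The prefiltered microphone signal is therefore the prefiltered loudspeaker signal passed through the RIR, disturbed only by the excitation $e(t)$. If $e(t)$ is white, as is usual in prediction error identification, the disturbance is uncorrelated over time, which is the setting in which a least-squares estimate is the minimum variance linear unbiased estimate.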

Concatenating the parameter vectors of $F(q,t)$ and $A(q,t)$,

$$f(t) \triangleq \big[\, f_0(t) \;\; f_1(t) \;\; \dots \;\; f_{n_F}(t) \,\big]^T, \tag{4}$$

$$a(t) \triangleq \big[\, a_1(t) \;\; a_2(t) \;\; \dots \;\; a_{n_A}(t) \,\big]^T, \tag{5}$$

leads to a vector containing all to-be-identified parameters:

$$\theta(t) \triangleq \begin{bmatrix} f(t) \\ a(t) \end{bmatrix}. \tag{6}$$

The prediction error criterion for estimating $\theta(t)$ may be defined as

$$V_{PE}(t,\theta(t)) = \frac{1}{2} \sum_{k=1}^{t} \frac{\lambda_\theta^{\,t-k}}{\hat{\sigma}_k^2} \Big\{ A(q,t)\big[y(k) - F(q,t)\,u(k)\big] \Big\}^2 \tag{7}$$

with:

• exponential weighting with forgetting factor $\lambda_\theta$ to allow tracking of a time-varying RIR

• weighting with the inverse variance of the near-end excitation signal $e(t)$, estimated as

$$\hat{\sigma}_k^2 = \lambda_\sigma\,\hat{\sigma}_{k-1}^2 + (1 - \lambda_\sigma)\,\varepsilon^2\big(k,\hat{\theta}(k-1)\big), \tag{8}$$

using the prediction error

$$\varepsilon(t,\theta(t)) = A(q,t)\big[y(t) - F(q,t)\,u(t)\big]. \tag{9}$$
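To make (8) and (9) concrete, the sketch below evaluates the prediction error for time-invariant FIR models and tracks its variance. It is an illustration under stated assumptions, not the authors' implementation: it assumes the monic convention A(q) = 1 + a_1 q^-1 + ... + a_nA q^-nA, zero initial conditions, and synthetic data; the helper names `fir`, `prediction_error` and `track_variance` are hypothetical.

```python
import random

def fir(coeffs, x, t):
    """Return sum_j coeffs[j] * x[t - j], treating samples before t = 0 as zero."""
    return sum(c * x[t - j] for j, c in enumerate(coeffs) if t - j >= 0)

def prediction_error(f, a, u, y):
    """Eq. (9) for time-invariant models: eps(t) = A(q)[y(t) - F(q)u(t)].

    Assumption: A(q) = 1 + a[0] q^-1 + ... (monic), f = [f_0, ..., f_nF].
    """
    xi = [y[t] - fir(f, u, t) for t in range(len(y))]  # echo-compensated signal
    A = [1.0] + list(a)
    return [fir(A, xi, t) for t in range(len(y))]

def track_variance(eps, lam_sigma=0.99, sigma2_init=1.0):
    """Eq. (8): sigma2_k = lam * sigma2_{k-1} + (1 - lam) * eps_k^2."""
    sigma2, out = sigma2_init, []
    for value in eps:
        sigma2 = lam_sigma * sigma2 + (1.0 - lam_sigma) * value * value
        out.append(sigma2)
    return out

# Synthetic data following model (3); all values are made up for illustration.
random.seed(0)
N = 2000
f_true = [0.5, -0.3, 0.2]   # toy room impulse response
a_true = [-0.8]             # near-end model 1/A(q) is an AR(1) filter
u = [random.gauss(0.0, 1.0) for _ in range(N)]  # far-end signal
e = [random.gauss(0.0, 0.5) for _ in range(N)]  # near-end excitation
v = []
for t in range(N):          # A(q)v(t) = e(t), i.e. v(t) = e(t) - a_1 v(t-1) - ...
    v.append(e[t] - sum(a_true[i] * v[t - 1 - i]
                        for i in range(len(a_true)) if t - 1 - i >= 0))
y = [fir(f_true, u, t) + v[t] for t in range(N)]

eps = prediction_error(f_true, a_true, u, y)
sigma2 = track_variance(eps)
max_dev = max(abs(eps[t] - e[t]) for t in range(N))
```

With the true parameters, `eps` reproduces the excitation `e` up to rounding, so `max_dev` is tiny and `sigma2` settles near the excitation variance. With wrong parameters the prediction error is larger and coloured, which is what criterion (7) penalizes.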

2 Recursive PE identification

2.1 Two-channel adaptive filtering (2ch-AF)

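By modelling the convolution of the inverse near-end signal model and the RIR as one FIR filter,

$$B(q,t) \triangleq -A(q,t)\,F(q,t), \tag{10}$$

the prediction error criterion can be linearized as follows:

$$V_{2ch}(t,\theta_{2ch}(t)) = \frac{1}{2} \sum_{k=1}^{t} \frac{\lambda_\theta^{\,t-k}}{\hat{\sigma}_k^2} \big[ A(q,t)\,y(k) + B(q,t)\,u(k) \big]^2 \tag{11}$$

$${} = \frac{1}{2} \sum_{k=1}^{t} \frac{\lambda_\theta^{\,t-k}}{\hat{\sigma}_k^2} \big[ y(k) + \varphi_{2ch}^T(k)\,\theta_{2ch}(t) \big]^2, \tag{12}$$

with

$$\theta_{2ch}(t) \triangleq \big[\, b_0(t) \;\dots\; b_{n_B}(t) \;\; a_1(t) \;\dots\; a_{n_A}(t) \,\big]^T, \tag{13}$$

$$\varphi_{2ch}(k) \triangleq \big[\, u(k) \;\dots\; u(k-n_B) \;\; y(k-1) \;\dots\; y(k-n_A) \,\big]^T. \tag{14}$$

An ordinary RLS algorithm can be applied to estimate $\theta_{2ch}(t)$:

$$\hat{\theta}_{2ch}(t) = \hat{\theta}_{2ch}(t-1) - \frac{1}{\hat{\sigma}_t^2}\,R^{-1}(t)\,\varphi_{2ch}(t)\,\varepsilon\big(t,\hat{\theta}_{2ch}(t-1)\big),$$

$$R(t) = \lambda_\theta\,R(t-1) + \frac{1}{\hat{\sigma}_t^2}\,\varphi_{2ch}(t)\,\varphi_{2ch}^T(t), \tag{15}$$

$$\varepsilon\big(t,\hat{\theta}_{2ch}(t-1)\big) = y(t) + \varphi_{2ch}^T(t)\,\hat{\theta}_{2ch}(t-1). \tag{16}$$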

Pro: → convex cost function (no local minima)

Contra: → estimation of F (q, t) and A(q, t) is performed on the same data window,

→ deconvolution needed to obtain F (q, t) from B (q, t) (complexity)
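The recursion (15)-(16) and the deconvolution step can be sketched in plain Python as follows. This is a minimal illustration under several assumptions: R^-1(t) is propagated with the matrix inversion lemma instead of being inverted explicitly, the variance estimate is held fixed (`sigma2`), the filters are time-invariant, and the deconvolution is done by power-series division. The function names and the toy data are hypothetical.

```python
import random

def two_channel_rls(u, y, n_b, n_a, lam=0.999, sigma2=1.0, delta=100.0):
    """Estimate [b_0..b_nB] and [a_1..a_nA] with the recursion (15)-(16).

    P tracks R^{-1}(t) via the matrix inversion lemma. sigma2 is held fixed
    here (assumption); recursion (8) could supply a time-varying value.
    """
    n = n_b + 1 + n_a
    theta = [0.0] * n
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for t in range(len(y)):
        # Regressor (14): [u(t) ... u(t-nB)  y(t-1) ... y(t-nA)]
        phi = [u[t - j] if t - j >= 0 else 0.0 for j in range(n_b + 1)]
        phi += [y[t - i] if t - i >= 0 else 0.0 for i in range(1, n_a + 1)]
        # A priori prediction error (16)
        eps = y[t] + sum(p * th for p, th in zip(phi, theta))
        # Gain = R^{-1}(t) phi / sigma2 = P phi / (lam * sigma2 + phi^T P phi)
        Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = lam * sigma2 + sum(phi[i] * Pphi[i] for i in range(n))
        gain = [x / denom for x in Pphi]
        # Parameter update (15); note the minus sign
        theta = [th - g * eps for th, g in zip(theta, gain)]
        # P(t) = (P(t-1) - gain * (P phi)^T) / lam
        P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta[:n_b + 1], theta[n_b + 1:]

def deconvolve(b, a, n_f):
    """First n_f + 1 coefficients of F(q) = -B(q)/A(q), with A(q) monic."""
    A = [1.0] + list(a)
    f = []
    for k in range(n_f + 1):
        acc = -b[k] if k < len(b) else 0.0
        acc -= sum(A[i] * f[k - i] for i in range(1, min(k, len(A) - 1) + 1))
        f.append(acc)
    return f

# Toy data following model (3); every number here is made up for illustration.
random.seed(1)
N = 3000
f_true, a_true = [0.5, -0.3], [-0.8]
u = [random.gauss(0.0, 1.0) for _ in range(N)]
e = [random.gauss(0.0, 0.5) for _ in range(N)]
v, y = [], []
for t in range(N):
    v.append(e[t] - (a_true[0] * v[t - 1] if t > 0 else 0.0))
    echo = sum(f_true[j] * u[t - j] for j in range(len(f_true)) if t - j >= 0)
    y.append(echo + v[t])

b_hat, a_hat = two_channel_rls(u, y, n_b=2, n_a=1)
f_hat = deconvolve(b_hat, a_hat, n_f=1)
```

For this toy model B(q) = -A(q)F(q) = -0.5 + 0.7 q^-1 - 0.24 q^-2, so `b_hat` has three taps even though the RIR has two. The extra division needed to get `f_hat` back is the deconvolution cost listed above.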

2.2 Prediction error method based adaptive filtering (PEM-AF)

Recursive prediction error identification of $F(q,t)$ and $A(q,t)$ from the non-linear PE criterion leads to a recursion for the parameter vector $\hat{\theta}(t)$:

$$\hat{\theta}(t) = \hat{\theta}(t-1) + \frac{1}{\hat{\sigma}_t^2}\,R^{-1}(t)\,\psi\big(t,\hat{\theta}(t-1)\big)\,\varepsilon\big(t,\hat{\theta}(t-1)\big). \tag{17}$$

The gradient vector $\psi(t,\theta(t))$ is defined as

$$\psi(t,\theta(t)) = \begin{bmatrix} \psi_f(t,a(t)) \\ \psi_a(t,f(t)) \end{bmatrix} \triangleq - \begin{bmatrix} \dfrac{\partial}{\partial f(t)}\,\varepsilon(t,\theta(t)) \\[2mm] \dfrac{\partial}{\partial a(t)}\,\varepsilon(t,\theta(t)) \end{bmatrix}. \tag{18}$$

Decoupling of the data windows on which $F(q,t)$ and $A(q,t)$ are estimated can be achieved by block-diagonalizing $R(t)$ as

$$R(t) = \begin{bmatrix} R_f(t) & 0_{(n_F+1)\times n_A} \\ 0_{n_A\times(n_F+1)} & R_a(t) \end{bmatrix}. \tag{19}$$

The resulting recursions are

$$\hat{f}(t) = \hat{f}(t-1) + \frac{1}{\hat{\sigma}_t^2}\,R_f^{-1}(t)\,\psi_f(t)\,\varepsilon(t), \tag{20}$$

$$\hat{a}(t) = \hat{a}(t-1) + \frac{1}{\hat{\sigma}_t^2}\,R_a^{-1}(t)\,\psi_a(t)\,\varepsilon(t), \tag{21}$$

$$R_f(t) = \lambda_f\,R_f(t-1) + \frac{1}{\hat{\sigma}_t^2}\,\psi_f(t)\,\psi_f^T(t), \tag{22}$$

$$R_a(t) = \lambda_a\,R_a(t-1) + \frac{1}{\hat{\sigma}_t^2}\,\psi_a(t)\,\psi_a^T(t), \tag{23}$$

with

$$\varepsilon(t) = \hat{A}(q,t-1)\,\xi(t), \tag{24}$$

$$\psi_f(t) = \big[\, \hat{A}(q,t-1)\,u(t) \;\dots\; \hat{A}(q,t-n_F-1)\,u(t-n_F) \,\big]^T, \tag{25}$$

$$\psi_a(t) = \big[\, \xi(t-1) \;\dots\; \xi(t-n_A) \,\big]^T, \tag{26}$$

$$\xi(t) \triangleq y(t) - \hat{F}(q,t-1)\,u(t). \tag{27}$$

Pro: → decoupling of the data windows for estimation of $F(q,t)$ and $A(q,t)$

Contra: → non-convex cost function (local minima),

→ approximate prediction error and gradient vectors, assuming that $A(q,t)$ remains stationary during $n_F$ samples

2.3 Prediction error method based adaptive filtering algorithm applying row operations (PEM-AFROW)

A two-stage prediction error identification algorithm:

Stage 1: linear prediction of the near-end signal model $\{a_i, \sigma_i^2\}$ on a frame of length $M$ of the echo-compensated signal vector

$$d_i = y_i - U_i\,\hat{f}\big((i-1)M\big), \tag{28}$$

using a previous estimate for the RIR $F(q,t)$.

Stage 2: exponentially windowed RLS algorithm to estimate $F(q,t)$ using the prefiltered data

$$y_{\hat{a}_j}(k) \triangleq \big[\, y(k) \;\dots\; y(k-n_A) \,\big] \begin{bmatrix} 1 \\ \hat{a}_j \end{bmatrix}, \tag{29}$$

$$u_{\hat{a}_j}(k) \triangleq \big[\, u(k) \;\dots\; u(k-n_A) \,\big] \begin{bmatrix} 1 \\ \hat{a}_j \end{bmatrix}, \tag{30}$$

and the exact gradient vector

$$\psi_f(t,\hat{a}_i) \triangleq \big[\, u_{\hat{a}_i}(t) \;\dots\; u_{\hat{a}_i}(t-n_F) \,\big]^T. \tag{31}$$

The RLS recursion is

$$\hat{f}(t) = \hat{f}(t-1) + \frac{1}{\hat{\sigma}_i^2}\,R^{-1}(t)\,\psi_f(t,\hat{a}_i)\,\varepsilon\big(t,\hat{f}(t-1),\hat{a}_i\big), \tag{32}$$

$$R(t) = \lambda_f\,R(t-1) + \frac{1}{\hat{\sigma}_i^2}\,\psi_f(t,\hat{a}_i)\,\psi_f^T(t,\hat{a}_i), \tag{33}$$

$$\varepsilon\big(t,\hat{f}(t-1),\hat{a}_i\big) = y_{\hat{a}_i}(t) - \psi_f^T(t,\hat{a}_i)\,\hat{f}(t-1). \tag{34}$$

Pro: → decoupling of the data windows for estimation of $F(q,t)$ and $A(q,t)$,

→ exact calculation of prediction error and gradient vectors (only row operations)

Contra: → non-convex cost function (local minima)
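Stage 1 and the row operations (29)-(30) can be sketched as below. The text only says "linear prediction"; solving the Yule-Walker equations with the Levinson-Durbin recursion on biased autocorrelation estimates is one common way to do it and is an assumption here, as are the function names, the frame length and the toy data. For clarity the example pretends that the RIR estimate in (28) is already perfect.

```python
import random

def echo_compensated_frame(u, y, f_hat, start, M):
    """Frame (28): d(k) = y(k) - sum_j f_hat[j] u(k-j) for k in the frame."""
    return [y[k] - sum(f_hat[j] * u[k - j]
                       for j in range(len(f_hat)) if k - j >= 0)
            for k in range(start, start + M)]

def autocorr(d, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of one frame."""
    M = len(d)
    return [sum(d[k] * d[k - lag] for k in range(lag, M)) / M
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, sigma2) with A(q) = 1 + a[0] q^-1 + ... + a[order-1] q^-order."""
    a, sigma2 = [], r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = -acc / sigma2                       # reflection coefficient
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
        sigma2 *= (1.0 - k * k)                 # residual variance
    return a, sigma2

def prefilter(x, a, k):
    """Row operation (29)/(30): [x(k) ... x(k-nA)] times [1, a_1, ..., a_nA]^T."""
    return x[k] + sum(a[i] * x[k - 1 - i]
                      for i in range(len(a)) if k - 1 - i >= 0)

# Toy data following model (3); every number here is made up for illustration.
random.seed(2)
N = 4000
f_true, a_true = [0.5, -0.3], [-0.8]
u = [random.gauss(0.0, 1.0) for _ in range(N)]
e = [random.gauss(0.0, 0.5) for _ in range(N)]
v, y = [], []
for t in range(N):
    v.append(e[t] - (a_true[0] * v[t - 1] if t > 0 else 0.0))
    echo = sum(f_true[j] * u[t - j] for j in range(len(f_true)) if t - j >= 0)
    y.append(echo + v[t])

# Stage 1: near-end model from one (long) frame, assuming a perfect RIR estimate.
d = echo_compensated_frame(u, y, f_true, start=0, M=N)
a_hat, sigma2_hat = levinson_durbin(autocorr(d, 1), order=1)

# Stage 2 input: prefiltered loudspeaker and microphone signals.
u_pre = [prefilter(u, a_hat, k) for k in range(N)]
y_pre = [prefilter(y, a_hat, k) for k in range(N)]
```

The prefiltered samples `u_pre` and `y_pre` are what the RLS recursion (32)-(34) would consume. Because they are obtained by multiplying stored data rows with the current coefficient vector, no approximation of past filter states is involved.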

3 Simulation results

[Figure: δ(t) (dB) versus t/T_s (s) for WRLS, 2ch-AF, PEM-AF and PEM-AFROW.]

[Figure: δ(t) (dB) versus t/T_s (s) for NLMS, stochastic gradient PEM-AF and stochastic gradient PEM-AFROW.]
