Towards optimal regularization

by incorporating prior knowledge in an acoustic echo canceller

Toon van Waterschoot, Geert Rombouts and Marc Moonen

Katholieke Universiteit Leuven, ESAT-SCD, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium toon.vanwaterschoot@esat.kuleuven.be

1 Regularization in least squares parameter estimation

Consider the linear estimation problem







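$$
\begin{bmatrix} y(1) \\ y(2) \\ \vdots \\ y(N) \end{bmatrix}
=
\begin{bmatrix}
u(1) & u(0) & \dots & u(1 - n_F) \\
u(2) & u(1) & \dots & u(2 - n_F) \\
\vdots & \vdots & \ddots & \vdots \\
u(N) & u(N - 1) & \dots & u(N - n_F)
\end{bmatrix}
\cdot
\begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_{n_F} \end{bmatrix}
+
\begin{bmatrix} v(1) \\ v(2) \\ \vdots \\ v(N) \end{bmatrix},
$$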

or in matrix notation

y = Uf + v. (1)
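As an illustration, the data matrix U in (1) can be assembled from a recorded input sequence. The sketch below assumes NumPy; the helper name `build_data_matrix` and the convention that the input array holds the samples u(1 − n_F), ..., u(N) in order are assumptions made for this example only.

```python
import numpy as np

def build_data_matrix(u, n_F, N):
    """Return the N x (n_F + 1) matrix U with entry u(t - k) in row t, column k.

    Rows correspond to t = 1..N and columns to k = 0..n_F, as in (1).
    `u` must hold the N + n_F samples u(1 - n_F), ..., u(N) in time order.
    """
    u = np.asarray(u, dtype=float)
    if u.shape[0] != N + n_F:
        raise ValueError("expected N + n_F input samples")
    U = np.empty((N, n_F + 1))
    for k in range(n_F + 1):
        # Time sample u(t) sits at array index t + n_F - 1,
        # so u(1 - k), ..., u(N - k) is the slice starting at n_F - k.
        U[:, k] = u[n_F - k : n_F - k + N]
    return U

# Small check with u(t) = t, so that row t, column k must equal t - k.
U_demo = build_data_matrix(np.arange(-1, 4), n_F=2, N=3)
```

With this matrix, a noisy observation vector follows as `y = U @ f + v` for any coefficient vector `f` of length n_F + 1.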

An estimator for f may be obtained by minimizing the least squares (LS) criterion

$$\min_{\hat{f}} V_{LS}(\hat{f}) = \min_{\hat{f}} \, (y - U\hat{f})^T (y - U\hat{f}) \tag{2}$$

which results in the well-known LS estimator

$$\hat{f}_{LS} = (U^T U)^{-1} U^T y. \tag{3}$$

When the input signal $u(t)$ is not (or hardly) persistently exciting, as is often the case when using audio signals, the matrix $U^T U$ may be ill-conditioned or even singular. A common solution is to apply Tikhonov regularization by adding a scaled identity matrix to $U^T U$:

$$\hat{f}_{RgLS} = (U^T U + \alpha I)^{-1} U^T y. \tag{4}$$

This is in fact the solution to a ridge regression problem, formulated by the criterion

$$\min_{\hat{f}} V_{RR}(\hat{f}) = \min_{\hat{f}} \, (y - U\hat{f})^T (y - U\hat{f}) + \alpha \|\hat{f}\|_2^2. \tag{5}$$

We address the question of whether it is worthwhile to replace the scaled identity matrix $\alpha I$ by a non-identity, possibly non-diagonal regularization matrix $P$:

$$\hat{f}_{RgLS} = (U^T U + P)^{-1} U^T y, \tag{6}$$

and if it is, what is the optimal choice for $P$?
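The estimators (3), (4) and (6) differ only in the matrix added to U^T U, so a single routine can cover all three. The following is a minimal sketch assuming NumPy; the function name `regularized_ls` and the toy data are hypothetical. The example uses a data matrix with two identical columns, for which U^T U is singular while the Tikhonov-regularized system remains solvable.

```python
import numpy as np

def regularized_ls(U, y, P=None):
    """Solve (U^T U + P) f = U^T y for f.

    P=None gives the plain LS estimate (computed with lstsq, which also
    copes with a rank-deficient U); P = alpha * I gives Tikhonov
    regularization; any other symmetric positive definite P gives the
    general regularized estimator.
    """
    if P is None:
        return np.linalg.lstsq(U, y, rcond=None)[0]
    return np.linalg.solve(U.T @ U + P, U.T @ y)

# Toy example: identical columns make U^T U singular.
U_toy = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y_toy = np.array([2.0, 4.0, 6.0])
alpha = 0.01
f_tikhonov = regularized_ls(U_toy, y_toy, alpha * np.eye(2))
```

Because the penalty in (5) treats both coefficients alike, the regularized solution splits the weight evenly between the two identical columns.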

2 Linear minimum mean square error estimation

Let us derive an expression for the minimum mean square error (MMSE) estimator of f :

$$\min_{\hat{f}} V_{MMSE}(\hat{f}) = \min_{\hat{f}} \, E\{(\hat{f} - f)^T (\hat{f} - f)\} \tag{7}$$

under the following assumptions:

• The estimator is a linear function of the data in y:

$$\hat{f}_{MMSE} = Z^T y. \tag{8}$$

• The measurement noise in v is drawn from a stationary white noise process with zero mean and variance $\sigma_v^2$:

$$\mu_v \triangleq E\{v\} = 0, \tag{9}$$

$$R_v \triangleq \mathrm{cov}(v) = E\{v v^T\} = \sigma_v^2 I. \tag{10}$$

• The true parameter vector f is considered as a random variable on which some prior knowledge may be available.

More specifically, let the prior probability density function (PDF) $p(f)$ be characterized by its first and second order moments:

$$\mu_f \triangleq E\{f\}, \tag{11}$$

$$R_f \triangleq \mathrm{cov}(f) = E\{(f - E\{f\})(f - E\{f\})^T\}. \tag{12}$$

Then the linear MMSE estimator can be obtained as the mean of the posterior PDF $p(f|y)$ after the data have been recorded:

$$\hat{f}_{MMSE} = E\{f|y\} = \mu_f + (U^T R_v^{-1} U + R_f^{-1})^{-1} U^T R_v^{-1} (y - U\mu_f).$$

We will construct the prior knowledge on f in such a way that $\mu_f = 0$. Then, also using the white noise assumption in (10), the expression for $\hat{f}_{MMSE}$ can be rewritten as

$$\hat{f}_{MMSE} = (U^T U + \sigma_v^2 R_f^{-1})^{-1} U^T y. \tag{13}$$

From this point of view, applying Tikhonov regularization as in (4) with $\alpha = \sigma_v^2$ is equivalent to assuming that the true parameter vector f is drawn from a stationary white noise process with unit variance ($R_f = I$). In room acoustics applications, however, more information on the true parameter vector may be available, and an appropriate non-identity covariance matrix $R_f$ can be constructed.
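The step from the posterior mean to (13) is short and can be written out as follows. Substituting the zero prior mean and the white noise covariance from (10) into the posterior mean and factoring out the noise variance gives:

$$
\begin{aligned}
\hat{f}_{MMSE} &= (\sigma_v^{-2} U^T U + R_f^{-1})^{-1} \, \sigma_v^{-2} U^T y \\
&= \left[ \sigma_v^{-2} (U^T U + \sigma_v^2 R_f^{-1}) \right]^{-1} \sigma_v^{-2} U^T y \\
&= (U^T U + \sigma_v^2 R_f^{-1})^{-1} U^T y.
\end{aligned}
$$

Comparing the last line with (6) identifies the regularization matrix as $P = \sigma_v^2 R_f^{-1}$, and setting $R_f = I$ recovers (4) with $\alpha = \sigma_v^2$.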

3 Gathering prior knowledge on room acoustics

Consider an acoustic echo cancellation (AEC) application:

[Figure: AEC block diagram with labels: from far-end, to far-end, u(t), x(t), y(t), d(t), v(t), the acoustic echo path F, and its estimate F̂.]

If an LS estimator with Tikhonov regularization is applied in the AEC problem, the regularization matrix P = αI (with the regularization parameter typically chosen as α = σ _v ² ) looks like this:

[Figure: the regularization matrix P = αI, plotted over row and column indices 0 to 100.]

A room impulse response (RIR) has a very typical form, which may be characterized by three parameters:

• the initial delay, which corresponds to the time needed by the loudspeaker sound wave to reach the microphone through a direct path (i.e. without reflections),

• the direct path attenuation, which determines the peak response in the RIR, and

• the exponential decay rate, which models the reverberant tail of the RIR.

[Figure: a typical room impulse response versus t (s), annotated with the initial delay, the direct path attenuation, and the exponential decay rate.]

These three parameters may be calculated from the acoustic setup (distance between loudspeaker and microphone, acoustic absorption of the walls, room volume, etc.). Hence they can be considered as prior knowledge. If these three parameters are taken into account, a diagonal regularization matrix may be constructed that looks like this:

[Figure: the diagonal regularization matrix based on the three RIR parameters, plotted over row and column indices 0 to 100.]
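One possible way to turn the three RIR parameters into a diagonal regularization matrix is sketched below. The source does not give the exact construction, so the envelope model here is an assumption: coefficient k is given a prior standard deviation of `peak * exp(-decay * (k - delay))` from the direct path onward and a small floor value before it. The function name, argument names and the `floor` parameter are hypothetical, and the scaling by the noise variance follows P = σ_v² R_f⁻¹.

```python
import numpy as np

def synthetic_regularization(n_taps, delay, peak, decay, sigma_v2, floor=1e-4):
    """Diagonal P = sigma_v2 * inv(R_f) for an assumed RIR envelope.

    Assumed prior (not taken from the source): the standard deviation of
    coefficient k is peak * exp(-decay * (k - delay)) for k >= delay and
    floor * peak for k < delay. The floor also bounds the tail from below
    so that R_f stays invertible.
    """
    k = np.arange(n_taps)
    lag = np.maximum(k - delay, 0)           # samples since the direct path
    tail = peak * np.exp(-decay * lag)       # exponentially decaying envelope
    envelope = np.where(k >= delay, tail, floor * peak)
    envelope = np.maximum(envelope, floor * peak)
    return np.diag(sigma_v2 / envelope**2)

# Example: 100 coefficients, direct path at index 10, unit peak response.
P_synth = synthetic_regularization(100, delay=10, peak=1.0, decay=0.05, sigma_v2=0.01)
```

Under this assumed model the penalty is largest before the initial delay, where the response is expected to be close to zero, smallest at the direct path, and grows again along the reverberant tail.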

An idealized case, which is interesting as a reference method, occurs when the true RIR is known. In this case a diagonal regularization matrix may be constructed with the diagonal elements equal to the inverse square values of the true parameter vector coefficients:

[Figure: the diagonal regularization matrix based on the true RIR, plotted over row and column indices 0 to 100.]
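The reference matrix for the known-RIR case follows directly from the description above. This is a minimal sketch assuming NumPy; the function name `true_rir_regularization` and the `eps` guard against exactly zero coefficients are assumptions, and the factor σ_v² follows the choice P = σ_v² R_f⁻¹ used for the regularized estimators.

```python
import numpy as np

def true_rir_regularization(f_true, sigma_v2, eps=1e-12):
    """Diagonal P with entries sigma_v2 / f_k^2 for a known RIR f_true.

    eps bounds f_k^2 from below so that zero-valued coefficients do not
    cause a division by zero (an assumption made for this sketch).
    """
    f_sq = np.maximum(np.asarray(f_true, dtype=float) ** 2, eps)
    return np.diag(sigma_v2 / f_sq)

# Example with three coefficients and noise variance 0.01.
P_true = true_rir_regularization([0.5, -0.25, 0.1], sigma_v2=0.01)
```

Small coefficients receive a large penalty and large coefficients a small one, which is why the matrix entries can span several orders of magnitude.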

4 Off-line simulation results

Four different least squares estimators were compared for the AEC application:

• the LS estimator $\hat{f}_{LS}$ without regularization,

• the regularized LS estimator $\hat{f}_{RgLS}$ with Tikhonov regularization: $P = \sigma_v^2 I$,

• the regularized LS estimator $\hat{f}_{RgLS}$ with regularization based on the three RIR parameters described above: $P = \sigma_v^2 R_{f,synth}^{-1}$, and

• the regularized LS estimator $\hat{f}_{RgLS}$ with regularization based on the true RIR: $P = \sigma_v^2 R_{f,true}^{-1}$.

For each estimator, 100 simulation runs were performed, each with a different near-end noise realization.

Two types of loudspeaker signals were used:

• a sum of $n_F - 1$ sinusoids with random frequencies, uniformly distributed in the interval between DC and the Nyquist frequency, and

• a male speech signal.

Two types of acoustic impulse responses were used:

• a hearing aid impulse response with $n_F + 1 = 100$ coefficients, and

• a room impulse response with $n_F + 1 = 1000$ coefficients.
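A scaled-down version of such a simulation can be sketched as follows. Everything specific here is an assumption made for illustration and not the authors' setup: the 20-coefficient exponentially decaying response, the random sinusoid phases, the noise level, and the number of runs. The sketch compares the relative error ‖f̂ − f‖₂ / ‖f‖₂ of the unregularized LS estimator, Tikhonov regularization, and the true-RIR reference.

```python
import numpy as np

def relative_error(f_hat, f):
    return np.linalg.norm(f_hat - f) / np.linalg.norm(f)

def sum_of_sinusoids(n_sines, n_samples, rng):
    # Frequencies uniform between DC and Nyquist (0 to pi rad/sample).
    omega = rng.uniform(0.0, np.pi, size=n_sines)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_sines)  # assumed random phases
    t = np.arange(n_samples)
    return np.sin(np.outer(t, omega) + phase).sum(axis=1)

def data_matrix(u, n_F):
    # u holds u(1 - n_F), ..., u(N); column k contains u(t - k), t = 1..N.
    N = len(u) - n_F
    return np.column_stack([u[n_F - k : n_F - k + N] for k in range(n_F + 1)])

def run(n_F=19, N=200, sigma_v=0.1, n_runs=20, seed=0):
    rng = np.random.default_rng(seed)
    n = n_F + 1
    # Hypothetical decaying response standing in for a measured RIR.
    f = rng.choice([-1.0, 1.0], size=n) * np.exp(-0.2 * np.arange(n))
    U = data_matrix(sum_of_sinusoids(n_F - 1, N + n_F, rng), n_F)
    priors = {
        "LS": None,
        "Tikhonov": sigma_v**2 * np.eye(n),
        "true RIR": sigma_v**2 * np.diag(1.0 / f**2),
    }
    errors = {name: [] for name in priors}
    for _ in range(n_runs):
        # New near-end noise realization in every run.
        y = U @ f + sigma_v * rng.standard_normal(N)
        for name, P in priors.items():
            if P is None:
                f_hat = np.linalg.lstsq(U, y, rcond=None)[0]
            else:
                f_hat = np.linalg.solve(U.T @ U + P, U.T @ y)
            errors[name].append(relative_error(f_hat, f))
    return {name: np.array(e) for name, e in errors.items()}

results = run()
```

The returned arrays hold one relative error per run and estimator, so they can be summarized with `np.median` or passed to a box plot routine.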

[Figure: four panels showing the relative estimation error $\|\hat{f} - f\|_2 / \|f\|_2$ for $\hat{f}_{LS}$ and for $\hat{f}_{RgLS}$ with $P = \sigma_v^2 I$, $P = \sigma_v^2 R_{f,synth}^{-1}$, and $P = \sigma_v^2 R_{f,true}^{-1}$. Panel titles: F(q) = hearing aid IR, u(t) = Σ sin; F(q) = hearing aid IR, u(t) = speech; F(q) = room IR, u(t) = Σ sin; fourth panel untitled.]
