by incorporating prior knowledge

(1)

Towards optimal regularization

by incorporating prior knowledge

Toon van Waterschoot and Marc Moonen

Katholieke Universiteit Leuven, ESAT-SCD, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium toon.vanwaterschoot@esat.kuleuven.ac.be

1 Regularization in least squares parameter estimation

Consider the linear estimation problem







y (1) y (2)

...

y (N )







=







u (1) u (0) . . . u (1 − n _F ) u (2) u (1) . . . u (2 − n _F )

... ... . .. ...

u(N ) u(N − 1) . . . u(N − n _F )







· 



 f ₀ f ₁ ...

f _n

_F





 +







v (1) v (2)

...

v(N )





 ,

or in matrix notation

y = Uf + v. (1)

An estimator for f may be obtained by minimizing the least squares (LS) criterion

min ˆ f

V _LS (ˆ f ) = min

ˆ f

(y − Uˆ f ) ^T (y − Uˆ f ) (2) which results in the well-known LS estimator

ˆ f _LS = (U ^T U ) ⁻¹ U ^T y. (3) When the input signal u(t) is not (or hardly) persistently ex- citing, the matrix U ^T U may be ill-conditioned or even singu- lar. A common solution is to apply Tikhonov regularization by adding a scaled identity matrix to U ^T U :

ˆ f _RgLS = (U ^T U + αI) ⁻¹ U ^T y. (4) This is in fact the solution to a ridge regression problem as for- mulated by the criterion

min ˆ f

V _RR (ˆ f ) = min

ˆ f

(y − Uˆ f ) ^T (y − Uˆ f ) + αkˆ f k ² ₂ . (5) We address the question whether it is convenient to replace the scaled unity matrix αI by a non-identity, possibly non- diagonal regularization matrix P:

ˆ f _RgLS = (U ^T U + P) ⁻¹ U ^T y, (6) and if it is, what is the optimal choice for P?

2 Linear mimimum mean square error estimation

Let us derive an expression for the minimum mean square error (MMSE) estimator of f :

min ˆ f

V _{M M SE} (ˆ f ) = min

ˆ f

E (ˆ f − f ) ^T (ˆ f − f ) (7) under the following assumptions:

• The estimator is a linear function of the data in y:

ˆ f _MMSE = Z ^T y. (8)

• The measurement noise in v is drawn from a stationary white noise process with zero mean and variance σ _v ² :

µ _v , Ev = 0, (9)

R _v , cov(v) = Evv ^T = σ _v ² I. (10)

• The true parameter vector f is considered as a random vari- able on which some prior knowledge may be available.

More specifically, let the prior probability density function (PDF) p(f ) be characterized by its first and second order mo- ments:

µ _f , Ef, (11)

R _f , cov(f) = E(f − Ef)(f − Ef) ^T . (12) Then the linear MMSE estimator can be obtained as the mean of the posterior PDF p(f |y) after the data have been recorded:

ˆ f _MMSE = E(f |y) = µ+(U ^T R _v ⁻¹ U +R _f ⁻¹ ) ⁻¹ U ^T R _v ⁻¹ (y−Uµ).

We will construct the prior knowledge on f in such a way that µ _f = 0. Then, also using the white noise assumption in (10), the expression for ˆf _MMSE can be rewritten as

ˆ f _MMSE = (U ^T U + σ _v ² R _f ⁻¹ ) ⁻¹ U ^T y. (13) From this point of view, applying Tikhonov regularization as in (4) is equivalent to assuming that the true parameter vector f is drawn from a stationary white noise process. In many ap- plications however, more information on the true parameter will be available and an appropriate non-identity covariance matrix R _f may be constructed.

3 An example application:

Acoustic echo cancellation

Consider an acoustic echo cancellation (AEC) application where:

• the loudspeaker signal u(t) is an audio signal (typically speech),

• the near-end noise signal v(t) is assumed to be drawn from a stationary white noise process,

• the acoustic impulse response (AIR) F (q) = f ₀ + f ₁ q ⁻¹ + . . . + f _n

_F

q ⁻ⁿ

^F

, with q the time shift operator, is assumed to be stationary, and

• the microphone signal is made up of the echo signal x(t) = F (q)u(t) and the near-end noise signal: y(t) = x(t) + v(t).

Typically the loudspeaker signal exhibits a tonal spectrum and the AIR is of high order. Therefore the lack of persistent excitation may indeed be a problem in the AEC application.

If an LS estimator with Tikhonov regularization is applied in the AEC problem, the regularization matrix P = αI (with the regularization parameter typically chosen as α = σ _v ² ) looks like this:

0

20

40

60

80

1000

20

40

60

80

100 0

0.5 1 1.5

An acoustic impulse response has a very typical form, which may be characterized by three parameters:

• the initial delay, which corresponds to the time needed by the loudspeaker sound wave to reach the microphone through a direct path (i.e. without reflections),

• the direct path attenuation, which determines the peak re- sponse in the AIR, and

• the exponential decay rate, which models the reverberant tail of the AIR.

0 0.02 0.04 0.06 0.08 0.1 0.12

−0.8

−0.6

−0.4

−0.2 0 0.2 0.4 0.6 0.8 1

t (s)

direct path attenuation

initial delay

exponential decay rate

These three parameters may be calculated from the acoustic setup (distance between loudspeaker and microphone, acous- tic absorption of the walls, room volume, etc.). Hence they can be considered as prior knowledge. If these three param- eters are taken into account, a diagonal regularization matrix may be constructed that looks like this:

0 20

40 60

80

100 0 10 20 30 40 50 60 70 80 90 100

0 20 40 60 80 100 120 140

An idealized case, which is interesting as a reference method, occurs when the true AIR is known. In this case a diagonal reg- ularization matrix may be constructed with the diagonal ele- ments equal to the inverse square values of the true parameter vector coefficients:

0

20

40

60 80

100 0

20

40

60

80

100 0

2000 4000 6000 8000 10000

4 Simulation results

Four different least squares estimators were compared for the AEC application: the LS estimator ˆf _LS and the regularized LS estimator ˆf _RgLS with Tikhonov regularization (P = σ _v ² I ), regu- larization based on the three AIR parameters described above (P = σ _v ² R ⁻¹ _f _,synth ), and on the true AIR (P = σ _v ² R ⁻¹ _f _,true ).

For each estimator, 100 simulation runs were performed with a different near-end noise realization.

Two types of loudspeaker signals were used: a sum of n _F − 1 sinusoids with random frequencies, uniformly distributed in the interval between DC and the Nyquist frequency, and a male speech signal.

Two types of acoustic impulse responses were used: a hear- ing aid impulse response with n _F + 1 = 100 coefficients, and a room impulse response with n _F + 1 = 1000 coefficients.

0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0.22 0.24

ˆf_RgLS

kˆf−fk₂ kfk₂

ˆf_RgLS ˆf_RgLS

P = σv²I

ˆf_LS

P = σv²R⁻¹_{f ,synth} P = σv²R⁻¹_{f ,true}

F (q) = hearing aid IR u(t) = Σ sin

0.05 0.1 0.15 0.2 0.25 0.3 0.35

kˆf−fk₂ kfk₂

ˆf_LS ˆf_RgLS

P = σv²I

ˆf_RgLS ˆf_RgLS

P = σv²R⁻¹_{f ,true} P = σv²R⁻¹_{f ,synth}

u (t) = speech F (q) = hearing aid IR

0.1 0.12 0.14 0.16 0.18 0.2 0.22

kˆf−fk₂ kfk₂

ˆf_LS ˆf_RgLS

P = σv²I

ˆf_RgLS

P = σv²R⁻¹_{f ,synth}

ˆf_RgLS

P = σv²R⁻¹_{f ,true}

u(t) = Σ sin F (q) = room IR

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2

kˆf−fk₂ kfk₂

ˆf_LS ˆf_RgLS

P = σv²I

ˆf_RgLS

P = σv²R⁻¹_{f ,synth}

ˆf_RgLS

P = σv²R⁻¹_{f ,true}

by incorporating prior knowledge

Towards optimal regularization