Towards optimal regularization
by incorporating prior knowledge
Toon van Waterschoot and Marc Moonen
Katholieke Universiteit Leuven, ESAT-SCD, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium toon.vanwaterschoot@esat.kuleuven.ac.be
1 Regularization in least squares parameter estimation
Consider the linear estimation problem
y (1) y (2)
...
y (N )
=
u (1) u (0) . . . u (1 − n F ) u (2) u (1) . . . u (2 − n F )
... ... . .. ...
u(N ) u(N − 1) . . . u(N − n F )
·
f 0 f 1 ...
f n
F
+
v (1) v (2)
...
v(N )
,
or in matrix notation
y = Uf + v. (1)
An estimator for f may be obtained by minimizing the least squares (LS) criterion
min ˆ f
V LS (ˆ f ) = min
ˆ f
(y − Uˆ f ) T (y − Uˆ f ) (2) which results in the well-known LS estimator
ˆ f LS = (U T U ) −1 U T y. (3) When the input signal u(t) is not (or hardly) persistently ex- citing, the matrix U T U may be ill-conditioned or even singu- lar. A common solution is to apply Tikhonov regularization by adding a scaled identity matrix to U T U :
ˆ f RgLS = (U T U + αI) −1 U T y. (4) This is in fact the solution to a ridge regression problem as for- mulated by the criterion
min ˆ f
V RR (ˆ f ) = min
ˆ f
(y − Uˆ f ) T (y − Uˆ f ) + αkˆ f k 2 2 . (5) We address the question whether it is convenient to replace the scaled unity matrix αI by a non-identity, possibly non- diagonal regularization matrix P:
ˆ f RgLS = (U T U + P) −1 U T y, (6) and if it is, what is the optimal choice for P?
2 Linear mimimum mean square error estimation
Let us derive an expression for the minimum mean square error (MMSE) estimator of f :
min ˆ f
V M M SE (ˆ f ) = min
ˆ f
E (ˆ f − f ) T (ˆ f − f ) (7) under the following assumptions:
• The estimator is a linear function of the data in y:
ˆ f MMSE = Z T y. (8)
• The measurement noise in v is drawn from a stationary white noise process with zero mean and variance σ v 2 :
µ v , Ev = 0, (9)
R v , cov(v) = Evv T = σ v 2 I. (10)
• The true parameter vector f is considered as a random vari- able on which some prior knowledge may be available.
More specifically, let the prior probability density function (PDF) p(f ) be characterized by its first and second order mo- ments:
µ f , Ef, (11)
R f , cov(f) = E(f − Ef)(f − Ef) T . (12) Then the linear MMSE estimator can be obtained as the mean of the posterior PDF p(f |y) after the data have been recorded:
ˆ f MMSE = E(f |y) = µ+(U T R v −1 U +R f −1 ) −1 U T R v −1 (y−Uµ).
We will construct the prior knowledge on f in such a way that µ f = 0. Then, also using the white noise assumption in (10), the expression for ˆf MMSE can be rewritten as
ˆ f MMSE = (U T U + σ v 2 R f −1 ) −1 U T y. (13) From this point of view, applying Tikhonov regularization as in (4) is equivalent to assuming that the true parameter vector f is drawn from a stationary white noise process. In many ap- plications however, more information on the true parameter will be available and an appropriate non-identity covariance matrix R f may be constructed.
3 An example application:
Acoustic echo cancellation
Consider an acoustic echo cancellation (AEC) application where:
• the loudspeaker signal u(t) is an audio signal (typically speech),
• the near-end noise signal v(t) is assumed to be drawn from a stationary white noise process,
• the acoustic impulse response (AIR) F (q) = f 0 + f 1 q −1 + . . . + f n
Fq −n
F, with q the time shift operator, is assumed to be stationary, and
• the microphone signal is made up of the echo signal x(t) = F (q)u(t) and the near-end noise signal: y(t) = x(t) + v(t).
Typically the loudspeaker signal exhibits a tonal spectrum and the AIR is of high order. Therefore the lack of persistent excitation may indeed be a problem in the AEC application.
If an LS estimator with Tikhonov regularization is applied in the AEC problem, the regularization matrix P = αI (with the regularization parameter typically chosen as α = σ v 2 ) looks like this:
0
20
40
60
80
1000
20
40
60
80
100 0
0.5 1 1.5
An acoustic impulse response has a very typical form, which may be characterized by three parameters:
• the initial delay, which corresponds to the time needed by the loudspeaker sound wave to reach the microphone through a direct path (i.e. without reflections),
• the direct path attenuation, which determines the peak re- sponse in the AIR, and
• the exponential decay rate, which models the reverberant tail of the AIR.
0 0.02 0.04 0.06 0.08 0.1 0.12
−0.8
−0.6
−0.4
−0.2 0 0.2 0.4 0.6 0.8 1
t (s)
direct path attenuation
initial delay
exponential decay rate
These three parameters may be calculated from the acoustic setup (distance between loudspeaker and microphone, acous- tic absorption of the walls, room volume, etc.). Hence they can be considered as prior knowledge. If these three param- eters are taken into account, a diagonal regularization matrix may be constructed that looks like this:
0 20
40 60
80
100 0 10 20 30 40 50 60 70 80 90 100
0 20 40 60 80 100 120 140
An idealized case, which is interesting as a reference method, occurs when the true AIR is known. In this case a diagonal reg- ularization matrix may be constructed with the diagonal ele- ments equal to the inverse square values of the true parameter vector coefficients:
0
20
40
60 80
100 0
20
40
60
80
100 0
2000 4000 6000 8000 10000
4 Simulation results
Four different least squares estimators were compared for the AEC application: the LS estimator ˆf LS and the regularized LS estimator ˆf RgLS with Tikhonov regularization (P = σ v 2 I ), regu- larization based on the three AIR parameters described above (P = σ v 2 R −1 f ,synth ), and on the true AIR (P = σ v 2 R −1 f ,true ).
For each estimator, 100 simulation runs were performed with a different near-end noise realization.
Two types of loudspeaker signals were used: a sum of n F − 1 sinusoids with random frequencies, uniformly distributed in the interval between DC and the Nyquist frequency, and a male speech signal.
Two types of acoustic impulse responses were used: a hear- ing aid impulse response with n F + 1 = 100 coefficients, and a room impulse response with n F + 1 = 1000 coefficients.
0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0.22 0.24
ˆfRgLS
kˆf−fk2 kfk2
ˆfRgLS ˆfRgLS
P = σv2I
ˆfLS
P = σv2R−1f ,synth P = σv2R−1f ,true
F (q) = hearing aid IR u(t) = Σ sin
0.05 0.1 0.15 0.2 0.25 0.3 0.35
kˆf−fk2 kfk2
ˆfLS ˆfRgLS
P = σv2I
ˆfRgLS ˆfRgLS
P = σv2R−1f ,true P = σv2R−1f ,synth
u (t) = speech F (q) = hearing aid IR
0.1 0.12 0.14 0.16 0.18 0.2 0.22
kˆf−fk2 kfk2
ˆfLS ˆfRgLS
P = σv2I
ˆfRgLS
P = σv2R−1f ,synth
ˆfRgLS
P = σv2R−1f ,true
u(t) = Σ sin F (q) = room IR
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2
kˆf−fk2 kfk2
ˆfLS ˆfRgLS
P = σv2I
ˆfRgLS
P = σv2R−1f ,synth
ˆfRgLS
P = σv2R−1f ,true