Towards optimal regularization
by incorporating prior knowledge in an acoustic echo canceller
Toon van Waterschoot, Geert Rombouts and Marc Moonen
Katholieke Universiteit Leuven, ESAT-SCD, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium toon.vanwaterschoot@esat.kuleuven.be
1 Regularization in least squares parameter estimation
Consider the linear estimation problem
y(1) y(2) ...
y (N )
=
u(1) u(0) . . . u(1 − n F ) u(2) u(1) . . . u(2 − n F )
... ... . .. ...
u (N ) u(N − 1) . . . u(N − n F )
·
f 0 f 1 ...
f n
F
+
v(1) v(2) ...
v (N )
,
or in matrix notation
y = Uf + v. (1)
An estimator for f may be obtained by minimizing the least squares (LS) criterion
min ˆ f
V LS (ˆ f ) = min
ˆ f
(y − Uˆ f ) T (y − Uˆ f ) (2) which results in the well-known LS estimator
ˆ f LS = (U T U ) −1 U T y. (3) When the input signal u(t) is not (or hardly) persistently excit- ing, as is often the case when using audio signals, the matrix U T U may be ill-conditioned or even singular. A common so- lution is to apply Tikhonov regularization by adding a scaled identity matrix to U T U :
ˆ f RgLS = (U T U + αI) −1 U T y. (4) This is in fact the solution to a ridge regression problem as for- mulated by the criterion
min ˆ f
V RR (ˆ f ) = min
ˆ f
(y − Uˆ f ) T (y − Uˆ f ) + αkˆ f k 2 2 . (5) We address the question whether it is convenient to replace the scaled identity matrix αI by a non-identity, possibly non- diagonal regularization matrix P:
ˆ f RgLS = (U T U + P) −1 U T y, (6) and if it is, what is the optimal choice for P?
2 Linear mimimum mean square error estimation
Let us derive an expression for the minimum mean square error (MMSE) estimator of f :
min ˆ f
V M M SE (ˆ f ) = min
ˆ f
E (ˆ f − f ) T (ˆ f − f ) (7) under the following assumptions:
• The estimator is a linear function of the data in y:
ˆ f MMSE = Z T y. (8)
• The measurement noise in v is drawn from a stationary white noise process with zero mean and variance σ v 2 :
µ v , Ev = 0, (9)
R v , cov(v) = Evv T = σ v 2 I. (10)
• The true parameter vector f is considered as a random vari- able on which some prior knowledge may be available.
More specifically, let the prior probability density function (PDF) p(f ) be characterized by its first and second order mo- ments:
µ f , Ef, (11)
R f , cov(f) = E(f − Ef)(f − Ef) T . (12) Then the linear MMSE estimator can be obtained as the mean of the posterior PDF p(f |y) after the data have been recorded:
ˆ f MMSE = E(f |y) = µ+(U T R v −1 U +R f −1 ) −1 U T R v −1 (y−Uµ).
We will construct the prior knowledge on f in such a way that µ f = 0. Then, also using the white noise assumption in (10), the expression for ˆf MMSE can be rewritten as
ˆ f MMSE = (U T U + σ v 2 R f −1 ) −1 U T y. (13) From this point of view, applying Tikhonov regularization as in (4) with α = σ v 2 is equivalent to assuming that the true pa- rameter vector f is drawn from a stationary white noise pro- cess. In room acoustics applications however, more informa- tion on the true parameter may be available and an appropri- ate non-identity covariance matrix R f can be constructed.
3 Gathering prior knowledge on room acoustics
Consider an acoustic echo cancellation (AEC) application:
far-end from
far-end to
x (t) y (t)
F ˆ u (t)
d (t)
acoustic echo path
F
v (t)
If an LS estimator with Tikhonov regularization is applied in the AEC problem, the regularization matrix P = αI (with the regularization parameter typically chosen as α = σ v 2 ) looks like this:
0
20
40
60
80
1000
20
40
60
80
100 0
0.5 1 1.5
A room impulse response (RIR) has a very typical form, which may be characterized by three parameters:
• the initial delay, which corresponds to the time needed by the loudspeaker sound wave to reach the microphone through a direct path (i.e. without reflections),
• the direct path attenuation, which determines the peak re- sponse in the RIR, and
• the exponential decay rate, which models the reverberant tail of the RIR.
0 0.02 0.04 0.06 0.08 0.1 0.12
−0.8
−0.6
−0.4
−0.2 0 0.2 0.4 0.6 0.8 1
t (s)
direct path attenuation
initial delay
exponential decay rate
These three parameters may be calculated from the acoustic setup (distance between loudspeaker and microphone, acous- tic absorption of the walls, room volume, etc.). Hence they can be considered as prior knowledge. If these three param- eters are taken into account, a diagonal regularization matrix may be constructed that looks like this:
0 20
40 60
80
100 0 10 20 30 40 50 60 70 80 90 100
0 20 40 60 80 100 120 140
An idealized case, which is interesting as a reference method, occurs when the true RIR is known. In this case a diagonal reg- ularization matrix may be constructed with the diagonal ele- ments equal to the inverse square values of the true parameter vector coefficients:
0
20
40
60
80
100 0
20
40
60
80
100 0
2000 4000 6000 8000 10000
4 Off-line simulation results
Four different least squares estimators were compared for the AEC application:
• the LS estimator ˆf LS without regularization,
• the regularized LS estimator ˆf RgLS with Tikhonov regular- ization: P = σ v 2 I ,
• the regularized LS estimator ˆf RgLS with regularization based on the three RIR parameters described above: P = σ v 2 R −1 f ,synth , and
• the regularized LS estimator ˆf RgLS with regularization based on the true RIR: P = σ v 2 R −1 f ,true .
For each estimator, 100 simulation runs were performed with a different near-end noise realization.
Two types of loudspeaker signals were used:
• a sum of n F − 1 sinusoids with random frequencies, uni- formly distributed in the interval between DC and the Nyquist frequency, and
• a male speech signal.
Two types of acoustic impulse responses were used:
• a hearing aid impulse response with n F + 1 = 100 coeffi- cients, and
• a room impulse response with n F + 1 = 1000 coefficients.
0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0.22 0.24
ˆfRgLS
kˆf−fk2 kfk2
ˆfRgLS ˆfRgLS
P= σv2I
ˆfLS
P= σv2R−1f ,synth P= σv2R−1f ,true
F (q) = hearing aid IR u (t) = Σ sin
0.05 0.1 0.15 0.2 0.25 0.3 0.35
kˆf−fk2 kfk2
ˆfLS ˆfRgLS
P= σv2I
ˆfRgLS ˆfRgLS
P= σv2R−1f ,true P= σv2R−1f ,synth
u (t) = speech F (q) = hearing aid IR
0.1 0.12 0.14 0.16 0.18 0.2 0.22
kˆf−fk2 kfk2
ˆfLS ˆfRgLS
P= σv2I
ˆfRgLS
P= σv2R−1f ,synth
ˆfRgLS
P= σv2R−1f ,true
u (t) = Σ sin F (q) = room IR
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2
kˆf−fk2 kfk2
ˆfLS ˆfRgLS
P= σv2I
ˆfRgLS
P= σv2R−1f ,synth
ˆfRgLS
P= σv2R−1f ,true