
On the a-posteriori SNR of the speech-distortion weighted Wiener filter for single-channel noise reduction

Simon Doclo
ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
E-mail: simon.doclo@esat.kuleuven.ac.be

Marc Moonen
ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
E-mail: marc.moonen@esat.kuleuven.ac.be

October 26, 2004

1 Introduction

Wiener filtering is a commonly applied technique for noise reduction in single-channel and multi-channel signals [1], e.g. in speech enhancement applications [2, 3, 4, 5, 6, 7]. The standard Wiener filter (WF) minimises the mean square error (MSE) between the output signal and the speech component of the input signal. The resulting MSE cost function consists of a term related to speech distortion and a term related to noise reduction. While the standard Wiener filter assigns equal importance to both terms, a generalised version, the so-called speech-distortion weighted Wiener filter (SDW-WF) [4, 6], provides a trade-off between the noise reduction term and the speech distortion term.

In this report we prove that the a-posteriori SNR after noise reduction with the single-channel (speech-distortion weighted) Wiener filter is always larger than or equal to the a-priori SNR. Although this may seem a trivial result, we have not found a proof of it in the literature. It is in fact not obvious that minimising the MSE leads to an SNR improvement, since no direct relationship exists between the MSE and the SNR.

2 Single-channel (SDW) Wiener filter

2.1 Wiener filter

Consider a noisy microphone signal $y(n)$ at time $n$, consisting of a zero-mean clean speech signal $x(n)$ and a zero-mean (white or coloured) noise signal $v(n)$, which is assumed to be uncorrelated with $x(n)$,
\[
y(n) = x(n) + v(n) . \tag{2.1}
\]

The goal of the Wiener filter $h_o$ is to provide an estimate of the clean speech signal $x(n)$, by minimising the MSE between the clean speech signal and the output signal,
\[
h_o = \arg\min_h J_x(h) = \arg\min_h E\big\{ \big[ x(n) - h^T y(n) \big]^2 \big\} , \tag{2.2}
\]
where $^T$ denotes the transpose of a vector or a matrix, $h$ is an FIR filter of length $L$, and $y(n)$ is an $L$-dimensional data vector,
\[
h = \big[\, h_0 \;\; h_1 \;\; \ldots \;\; h_{L-1} \,\big]^T \tag{2.3}
\]
\[
y(n) = \big[\, y(n) \;\; y(n-1) \;\; \ldots \;\; y(n-L+1) \,\big]^T . \tag{2.4}
\]
Solving (2.2) and using the assumption that speech and noise are uncorrelated, the Wiener filter can be written as
\[
h_o = R_y^{-1} R_x u_1 = \big[ R_x + R_v \big]^{-1} R_x u_1 , \tag{2.5}
\]
with
\[
R_y = E\{ y(n) y^T(n) \} \tag{2.6}
\]
\[
R_x = E\{ x(n) x^T(n) \} \tag{2.7}
\]
\[
R_v = E\{ v(n) v^T(n) \} \tag{2.8}
\]

the $L \times L$-dimensional symmetric correlation matrices of the noisy signal, the clean speech signal and the noise signal, respectively, and
\[
u_1 = \big[\, 1 \;\; 0 \;\; \ldots \;\; 0 \,\big]^T . \tag{2.9}
\]
In addition, we define the normalised speech and noise correlation matrices
\[
\tilde{R}_x = \frac{R_x}{\sigma_x^2} \tag{2.10}
\]
\[
\tilde{R}_v = \frac{R_v}{\sigma_v^2} , \tag{2.11}
\]
with $\sigma_x^2$ and $\sigma_v^2$ the power of the speech and the noise signal, and
\[
\mathrm{SNR} = \frac{\sigma_x^2}{\sigma_v^2} \tag{2.12}
\]
the a-priori signal-to-noise ratio (SNR). Note that the diagonal elements of the normalised correlation matrices $\tilde{R}_x$ and $\tilde{R}_v$ are equal to 1. Using (2.10), (2.11) and (2.12), the Wiener filter can be written as
\[
h_o = \Big[ \tilde{R}_x + \frac{\tilde{R}_v}{\mathrm{SNR}} \Big]^{-1} \tilde{R}_x u_1 . \tag{2.13}
\]
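As a numerical illustration of (2.5) and (2.13) (our sketch, not part of the original report), the following Python snippet estimates the correlation matrices from simulated signals and verifies that the two expressions for $h_o$ coincide; the signal models and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 8, 200_000

# Zero-mean AR(1) "speech" x(n) and uncorrelated white noise v(n), cf. (2.1)
x = rng.standard_normal(N)
for n in range(1, N):
    x[n] += 0.9 * x[n - 1]
v = 0.7 * rng.standard_normal(N)

def corr(s):
    """Biased estimate of the L x L Toeplitz correlation matrix E{s(n) s^T(n)}."""
    r = np.array([s[k:] @ s[: N - k] / N for k in range(L)])
    return np.array([[r[abs(i - j)] for j in range(L)] for i in range(L)])

Rx, Rv = corr(x), corr(v)
u1 = np.eye(L)[:, 0]

# Standard Wiener filter, cf. (2.5)
ho = np.linalg.solve(Rx + Rv, Rx @ u1)

# Normalised form, cf. (2.13): sigma_x^2 and sigma_v^2 are the diagonal values
sx2, sv2 = Rx[0, 0], Rv[0, 0]
snr = sx2 / sv2                          # a-priori SNR, cf. (2.12)
Rxn, Rvn = Rx / sx2, Rv / sv2            # normalised correlation matrices
ho_n = np.linalg.solve(Rxn + Rvn / snr, Rxn @ u1)

print(np.allclose(ho, ho_n))             # True: (2.5) and (2.13) agree
```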

2.2 SDW Wiener filter

The MSE cost function $J_x(h)$ in (2.2) can be written as
\[
J_x(h) = \underbrace{E\big\{ \big[ x(n) - h^T x(n) \big]^2 \big\}}_{\epsilon_x^2(n)} + \underbrace{E\big\{ \big[ h^T v(n) \big]^2 \big\}}_{\epsilon_v^2(n)} , \tag{2.14}
\]
consisting of a term $\epsilon_x^2(n)$ related to speech distortion and a term $\epsilon_v^2(n)$ related to noise reduction. While the standard Wiener filter assigns equal importance to both terms, a generalised version of the Wiener filter, the so-called speech-distortion weighted Wiener filter (SDW-WF), provides a trade-off between the noise reduction term and the speech distortion term [4, 6]. This generalised cost function can be written as
\[
J_\mu(h) = E\big\{ \big[ x(n) - h^T x(n) \big]^2 \big\} + \mu \, E\big\{ \big[ h^T v(n) \big]^2 \big\} , \tag{2.15}
\]
where $\mu \geq 0$ is the trade-off parameter between noise reduction and speech distortion. The SDW-WF $h_\mu$ minimising this generalised cost function is equal to
\[
h_\mu = \big[ R_x + \mu R_v \big]^{-1} R_x u_1 = \Big[ \tilde{R}_x + \frac{\mu \tilde{R}_v}{\mathrm{SNR}} \Big]^{-1} \tilde{R}_x u_1 . \tag{2.16}
\]
From this formula one can see that the SDW-WF for a noisy signal with a certain a-priori signal-to-noise ratio SNR is equal to the WF for the same signal with a-priori signal-to-noise ratio $\mathrm{SNR}/\mu$. If $\mu > 1$, the noise level is assumed to be higher than the actual level, such that the residual noise level is reduced at the expense of increased speech distortion. On the contrary, if $\mu < 1$, speech distortion is reduced at the expense of decreased noise reduction [8].
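The trade-off in (2.15)-(2.16) can be illustrated numerically. The sketch below (ours; the small synthetic correlation matrices are illustrative assumptions, not taken from the report) computes $h_\mu$ for several values of $\mu$ and evaluates the two terms of (2.14): as $\mu$ grows, the residual noise term $\epsilon_v^2$ shrinks while the speech distortion term $\epsilon_x^2$ grows.

```python
import numpy as np

# Small synthetic correlation matrices (symmetric, positive definite);
# any Rx, Rv of this form would do for the illustration.
Rx = np.array([[1.0, 0.8, 0.5],
               [0.8, 1.0, 0.8],
               [0.5, 0.8, 1.0]])
Rv = 0.4 * np.eye(3)
u1 = np.eye(3)[:, 0]

for mu in (0.2, 1.0, 5.0):
    h = np.linalg.solve(Rx + mu * Rv, Rx @ u1)   # SDW-WF, cf. (2.16)
    ex2 = (u1 - h) @ Rx @ (u1 - h)               # speech distortion eps_x^2
    ev2 = h @ Rv @ h                             # residual noise    eps_v^2
    print(f"mu={mu:4.1f}  eps_x^2={ex2:.4f}  eps_v^2={ev2:.4f}")
```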

2.3 A-posteriori SNR

The a-posteriori signal-to-noise ratio $\mathrm{SNR}_o$ after noise reduction with the SDW-WF $h_\mu$ is equal to
\[
\mathrm{SNR}_o = \frac{h_\mu^T R_x h_\mu}{h_\mu^T R_v h_\mu} = \mathrm{SNR} \, \frac{h_\mu^T \tilde{R}_x h_\mu}{h_\mu^T \tilde{R}_v h_\mu} . \tag{2.17}
\]
Hence, the ratio between the a-posteriori and the a-priori SNR is equal to
\[
\frac{\mathrm{SNR}_o}{\mathrm{SNR}} = \frac{u_1^T \tilde{R}_x \big[ \tilde{R}_x + \frac{\mu \tilde{R}_v}{\mathrm{SNR}} \big]^{-1} \tilde{R}_x \big[ \tilde{R}_x + \frac{\mu \tilde{R}_v}{\mathrm{SNR}} \big]^{-1} \tilde{R}_x u_1}{u_1^T \tilde{R}_x \big[ \tilde{R}_x + \frac{\mu \tilde{R}_v}{\mathrm{SNR}} \big]^{-1} \tilde{R}_v \big[ \tilde{R}_x + \frac{\mu \tilde{R}_v}{\mathrm{SNR}} \big]^{-1} \tilde{R}_x u_1} . \tag{2.18}
\]
The goal of this report is to prove that for every possible $\tilde{R}_x$, $\tilde{R}_v$, SNR and $\mu$ this ratio is larger than or equal to 1.
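Before the formal proof, the claim can be checked empirically. This sketch (our addition) draws random positive-definite $\tilde{R}_x$, $\tilde{R}_v$ with unit diagonals and random SNR and $\mu$, and evaluates the ratio (2.17); the sampling ranges are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 5

def random_unit_diag_corr(rng, L):
    """Random symmetric positive-definite matrix with ones on the diagonal."""
    A = rng.standard_normal((L, 2 * L))
    R = A @ A.T
    d = np.sqrt(np.diag(R))
    return R / np.outer(d, d)

u1 = np.eye(L)[:, 0]
worst = np.inf
for _ in range(10_000):
    Rxn, Rvn = random_unit_diag_corr(rng, L), random_unit_diag_corr(rng, L)
    snr, mu = 10 ** rng.uniform(-2, 2), 10 ** rng.uniform(-1, 1)
    h = np.linalg.solve(Rxn + mu * Rvn / snr, Rxn @ u1)   # SDW-WF, cf. (2.16)
    ratio = (h @ Rxn @ h) / (h @ Rvn @ h)                 # SNR_o / SNR, cf. (2.17)
    worst = min(worst, ratio)
print(f"minimum SNR_o/SNR over all trials: {worst:.6f}")  # always >= 1
```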

2.4 Generalised eigenvalue decomposition

The generalised eigenvalue decomposition of the normalised correlation matrices $\tilde{R}_x$ and $\tilde{R}_v$ is defined as [9]
\[
\tilde{R}_x = Q \Lambda Q^T , \qquad \tilde{R}_v = Q Q^T , \tag{2.19}
\]
with $Q$ an $L \times L$-dimensional invertible, but not necessarily orthogonal, matrix and $\Lambda$ an $L \times L$-dimensional diagonal matrix containing the generalised eigenvalues $\lambda_i$, $i = 1 \ldots L$.¹ Using (2.19), the SDW-WF in (2.16) can be written as
\[
h_\mu = Q^{-T} \big[ \Lambda + \delta I \big]^{-1} \Lambda q_1 , \tag{2.20}
\]
with $I$ the $L \times L$-dimensional identity matrix, $\delta = \mu/\mathrm{SNR} \geq 0$ and $q_1 = Q^T u_1$. The elements of the $L$-dimensional vector $q_1$ will be denoted by
\[
q_1 = \big[\, \alpha_1 \;\; \alpha_2 \;\; \ldots \;\; \alpha_L \,\big]^T . \tag{2.21}
\]
The ratio between the a-posteriori and the a-priori SNR can now be written as
\[
\frac{\mathrm{SNR}_o}{\mathrm{SNR}} = \frac{q_1^T \big[ \Lambda + \delta I \big]^{-2} \Lambda^3 q_1}{q_1^T \big[ \Lambda + \delta I \big]^{-2} \Lambda^2 q_1} = \frac{\sum_{i=1}^{L} \frac{\lambda_i^3}{(\lambda_i + \delta)^2} \, \alpha_i^2}{\sum_{i=1}^{L} \frac{\lambda_i^2}{(\lambda_i + \delta)^2} \, \alpha_i^2} . \tag{2.22}
\]
Since the diagonal elements of $\tilde{R}_v$ and $\tilde{R}_x$ are equal to 1, it can be easily shown that
\[
q_1^T q_1 = u_1^T Q Q^T u_1 = u_1^T \tilde{R}_v u_1 = 1 \tag{2.23}
\]
\[
q_1^T \Lambda q_1 = u_1^T Q \Lambda Q^T u_1 = u_1^T \tilde{R}_x u_1 = 1 , \tag{2.24}
\]
such that
\[
\sum_{i=1}^{L} \alpha_i^2 = 1 , \qquad \sum_{i=1}^{L} \lambda_i \alpha_i^2 = 1 . \tag{2.25}
\]
From these constraints it can be easily seen that
\[
0 \leq \alpha_i^2 \leq 1 , \qquad 0 \leq \lambda_i \alpha_i^2 \leq 1 , \qquad i = 1 \ldots L . \tag{2.26}
\]

¹ The generalised eigenvalues $\lambda_i$ are always real and $\lambda_i \geq 0$, since $\tilde{R}_x$ and $\tilde{R}_v$ are positive (semi-)definite.
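For concreteness, the following sketch (ours, not from the report) computes the decomposition (2.19) with scipy's generalised symmetric eigensolver and checks the constraints (2.25) and the ratio expression (2.22) against a direct evaluation of (2.17). Here $Q$ is recovered from the eigenvector matrix $V$ returned by `eigh`, for which $V^T \tilde{R}_v V = I$; the matrices and $\delta$ are arbitrary test values.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
L = 5

def random_unit_diag_corr(rng, L):
    A = rng.standard_normal((L, 2 * L))
    R = A @ A.T
    d = np.sqrt(np.diag(R))
    return R / np.outer(d, d)

Rxn, Rvn = random_unit_diag_corr(rng, L), random_unit_diag_corr(rng, L)
u1 = np.eye(L)[:, 0]
delta = 0.5                                      # delta = mu / SNR

# Generalised EVD: eigh solves Rxn V = Rvn V diag(lam) with V^T Rvn V = I,
# so Q = inv(V)^T satisfies (2.19).
lam, V = eigh(Rxn, Rvn)
Q = np.linalg.inv(V).T
q1 = Q.T @ u1
alpha2 = q1 ** 2

print(np.allclose(Q @ np.diag(lam) @ Q.T, Rxn))  # R~x = Q Lambda Q^T
print(np.allclose(Q @ Q.T, Rvn))                 # R~v = Q Q^T
print(np.allclose([alpha2.sum(), (lam * alpha2).sum()], 1.0))  # (2.25)

# Ratio (2.22) versus a direct evaluation of (2.17)
num = (lam ** 3 / (lam + delta) ** 2 * alpha2).sum()
den = (lam ** 2 / (lam + delta) ** 2 * alpha2).sum()
h = np.linalg.solve(Rxn + delta * Rvn, Rxn @ u1)
print(np.isclose(num / den, (h @ Rxn @ h) / (h @ Rvn @ h)))    # True
```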

3 Proof that $\mathrm{SNR}_o \geq \mathrm{SNR}$

Using (2.22), proving that the a-posteriori SNR is larger than or equal to the a-priori SNR is equivalent to proving that
\[
\sum_{i=1}^{L} \frac{\lambda_i^2 (\lambda_i - 1)}{(\lambda_i + \delta)^2} \, \alpha_i^2 \geq 0 . \tag{3.1}
\]
Incorporating the constraints formulated in (2.25), this comes down to proving that the solution of the constrained minimisation problem
\[
\min \sum_{i=1}^{L} \frac{\lambda_i^2 (\lambda_i - 1)}{(\lambda_i + \delta)^2} \, \alpha_i^2 , \quad \text{subject to} \quad \sum_{i=1}^{L} \alpha_i^2 = 1 , \quad \sum_{i=1}^{L} \lambda_i \alpha_i^2 = 1 \tag{3.2}
\]
is equal to 0.

CASE 1: $\delta = 0$ ($\mu = 0$ and/or $\mathrm{SNR} = \infty$)

In this case we can easily prove that
\[
\sum_{i=1}^{L} \frac{\lambda_i^2 (\lambda_i - 1)}{(\lambda_i + \delta)^2} \, \alpha_i^2 = \sum_{i=1}^{L} (\lambda_i - 1) \, \alpha_i^2 = 1 - 1 = 0 . \tag{3.3}
\]

CASE 2: $\delta > 0$

The constrained optimisation problem (3.2) can be solved by introducing the Lagrange multipliers $\beta$ and $\gamma$, defining the cost function
\[
J(\beta, \gamma) = \sum_{i=1}^{L} \frac{\lambda_i^2 (\lambda_i - 1)}{(\lambda_i + \delta)^2} \, \alpha_i^2 + \beta \Big[ \sum_{i=1}^{L} \alpha_i^2 - 1 \Big] + \gamma \Big[ \sum_{i=1}^{L} \lambda_i \alpha_i^2 - 1 \Big] , \tag{3.4}
\]
and setting the $(2L + 2)$ partial derivatives equal to zero, i.e.
\[
\frac{\partial J(\beta, \gamma)}{\partial \lambda_i} = \frac{\lambda_i (\lambda_i^2 + 3\delta\lambda_i - 2\delta)}{(\lambda_i + \delta)^3} \, \alpha_i^2 + \gamma \alpha_i^2 = 0 , \quad i = 1 \ldots L \tag{3.5}
\]
\[
\frac{\partial J(\beta, \gamma)}{\partial \alpha_i} = \frac{2 \lambda_i^2 (\lambda_i - 1)}{(\lambda_i + \delta)^2} \, \alpha_i + 2\beta \alpha_i + 2\gamma \lambda_i \alpha_i = 0 , \quad i = 1 \ldots L \tag{3.6}
\]
\[
\frac{\partial J(\beta, \gamma)}{\partial \beta} = \sum_{i=1}^{L} \alpha_i^2 - 1 = 0 \tag{3.7}
\]
\[
\frac{\partial J(\beta, \gamma)}{\partial \gamma} = \sum_{i=1}^{L} \lambda_i \alpha_i^2 - 1 = 0 . \tag{3.8}
\]
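The derivative in (3.5) is easy to get wrong by hand; this small symbolic check (our addition) confirms that $\partial/\partial\lambda \big[ \lambda^2(\lambda-1)/(\lambda+\delta)^2 \big] = \lambda(\lambda^2 + 3\delta\lambda - 2\delta)/(\lambda+\delta)^3$.

```python
import sympy as sp

lam, delta = sp.symbols("lambda delta", positive=True)

lhs = sp.diff(lam**2 * (lam - 1) / (lam + delta) ** 2, lam)
rhs = lam * (lam**2 + 3 * delta * lam - 2 * delta) / (lam + delta) ** 3

print(sp.simplify(lhs - rhs))   # 0, confirming the lambda-derivative in (3.5)
```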

The minimum cost function is found by solving this set of equations for the $(2L + 2)$ variables $\lambda_i$ ($i = 1 \ldots L$), $\alpha_i$ ($i = 1 \ldots L$), $\beta$ and $\gamma$. For any solution (provided that a solution exists), one can immediately verify that by multiplying (3.6) by $\alpha_i / 2$ and summing over all $i$, one obtains
\[
\sum_{i=1}^{L} \frac{\lambda_i^2 (\lambda_i - 1)}{(\lambda_i + \delta)^2} \, \alpha_i^2 + \beta \sum_{i=1}^{L} \alpha_i^2 + \gamma \sum_{i=1}^{L} \lambda_i \alpha_i^2 = 0 , \tag{3.9}
\]
such that the minimum cost function $J_{\min}$ is equal to
\[
J_{\min} = \sum_{i=1}^{L} \frac{\lambda_i^2 (\lambda_i - 1)}{(\lambda_i + \delta)^2} \, \alpha_i^2 = -\beta - \gamma . \tag{3.10}
\]
We will now simultaneously solve the set of equations (3.5)-(3.8). Since $\delta > 0$, equations (3.5) and (3.6) can be rewritten as
\[
\alpha_i^2 \underbrace{\big[ \lambda_i (\lambda_i^2 + 3\delta\lambda_i - 2\delta) + \gamma (\lambda_i + \delta)^3 \big]}_{f(\lambda_i, \gamma, \delta)} = 0 , \quad i = 1 \ldots L \tag{3.11}
\]
\[
2 \alpha_i \underbrace{\big[ \lambda_i^2 (\lambda_i - 1) + \beta (\lambda_i + \delta)^2 + \gamma \lambda_i (\lambda_i + \delta)^2 \big]}_{g(\lambda_i, \beta, \gamma, \delta)} = 0 , \quad i = 1 \ldots L \tag{3.12}
\]
where $f(\lambda, \gamma, \delta)$ and $g(\lambda, \beta, \gamma, \delta)$ are third-order polynomials in the variable $\lambda$. From (3.11) and (3.12), one can verify that the variables $\{\lambda_i, \alpha_i\}$, $i = 1 \ldots L$, either satisfy
\[
\text{(A)} \quad \alpha_i = 0 , \quad \lambda_i \geq 0 \tag{3.13}
\]
\[
\text{(B)} \quad 0 < \alpha_i^2 \leq 1 , \quad f(\lambda_i, \gamma, \delta) = 0 \;\text{ and }\; g(\lambda_i, \beta, \gamma, \delta) = 0 . \tag{3.14}
\]
It is not possible for all solutions $\{\lambda_i, \alpha_i\}$ to belong to (A), since in that case $\sum_{i=1}^{L} \alpha_i^2 = 0$. The variables $\lambda_i$, $i \in B$, can only take on a finite number of possible values, since for a given value of $\beta$, $\gamma$ and $\delta$, the polynomials $f(\lambda, \gamma, \delta)$ and $g(\lambda, \beta, \gamma, \delta)$ have a finite number of common roots. In Section 4 it is shown that the polynomials $f(\lambda, \gamma, \delta)$ and $g(\lambda, \beta, \gamma, \delta)$ have at most 2 common roots, which we will represent by $\lambda_{B1}$ and $\lambda_{B2}$. Moreover, it is shown that $\lambda_{B1}$ and $\lambda_{B2}$ satisfy one of the following conditions:
\[
\begin{cases}
\text{(a)} & \lambda_{B1}, \lambda_{B2} \text{ are complex} \\
\text{(b)} & 0 < \lambda_{B1} < \tfrac{2}{3} , \quad 0 < \lambda_{B2} \leq \tfrac{1}{3} \\
\text{(c)} & \lambda_{B1} > 0 , \quad \lambda_{B2} \leq 0 \\
\text{(d)} & \lambda_{B1} < 0 , \quad \lambda_{B2} < 0
\end{cases} \tag{3.15}
\]
Since $\lambda_{B1}$ and $\lambda_{B2}$ represent real and positive generalised eigenvalues, conditions (a) and (d) are impossible. Condition (b) corresponds to 2 possible positive solutions $\lambda_{B1} < \tfrac{2}{3}$, $\lambda_{B2} \leq \tfrac{1}{3}$, while condition (c) corresponds to 1 possible solution² $\lambda_{B1} > 0$.

² The case $\lambda_{B2} = 0$ is not feasible, since in order for $f(\lambda, \gamma, \delta)$ and $g(\lambda, \beta, \gamma, \delta)$ to share a common root $\lambda = 0$, one should have $f(0, \gamma, \delta) = \gamma \delta^3 = 0$ and $g(0, \beta, \gamma, \delta) = \beta \delta^2 = 0$, hence $\beta = 0$ and $\gamma = 0$ (since $\delta > 0$). If $\beta = 0$ and $\gamma = 0$, the variables $\{\lambda_i, \alpha_i\}$, $i = 1 \ldots L$, either satisfy (A) $\alpha_i = 0$, $\lambda_i \geq 0$, (B) $0 < \alpha_i^2 \leq 1$, $\lambda_i = 0$, or (C) $\lambda_i^2 + 3\delta\lambda_i - 2\delta = 0$, $\lambda_i - 1 = 0$ (which would imply $\delta = 0$). Moreover, it is not possible for all solutions $\{\lambda_i, \alpha_i\}$ to belong to (A) and (B), since in that case $\sum_{i=1}^{L} \lambda_i \alpha_i^2 = 0$.

In addition to satisfying (3.11) and (3.12), the variables $\{\lambda_i, \alpha_i\}$, $i = 1 \ldots L$, should also satisfy the constraints (3.7) and (3.8), i.e.
\[
\sum_{i=1}^{L} \alpha_i^2 = \sum_{i \in A} \alpha_i^2 + \sum_{i \in B} \alpha_i^2 = \sum_{i \in B} \alpha_i^2 = 1 \tag{3.16}
\]
\[
\sum_{i=1}^{L} \lambda_i \alpha_i^2 = \sum_{i \in A} \lambda_i \alpha_i^2 + \sum_{i \in B} \lambda_i \alpha_i^2 = \sum_{i \in B} \lambda_i \alpha_i^2 = 1 . \tag{3.17}
\]
However, using (3.16), one can show that for condition (b)
\[
\sum_{i=1}^{L} \lambda_i \alpha_i^2 = \lambda_{B1} \sum_{i \in B1} \alpha_i^2 + \lambda_{B2} \sum_{i \in B2} \alpha_i^2 < \sum_{i \in B1} \alpha_i^2 + \sum_{i \in B2} \alpha_i^2 = \sum_{i \in B} \alpha_i^2 = 1 , \tag{3.18}
\]
which is in contradiction with (3.17). Hence, condition (b) is also impossible, leaving condition (c) as the only possibility. Using (3.16), one can show that for condition (c)
\[
\sum_{i=1}^{L} \lambda_i \alpha_i^2 = \lambda_{B1} \sum_{i \in B} \alpha_i^2 = \lambda_{B1} , \tag{3.19}
\]
such that $\lambda_{B1}$ should be equal to 1 in order to satisfy (3.17). Hence, solving $f(\lambda_i, \gamma, \delta) = 0$ and $g(\lambda_i, \beta, \gamma, \delta) = 0$ for $\beta$ and $\gamma$ with $\lambda_i = \lambda_{B1} = 1$ leads to
\[
\gamma = -\frac{1}{(1 + \delta)^2} , \qquad \beta = \frac{1}{(1 + \delta)^2} , \tag{3.20}
\]
such that, using (3.10),
\[
J_{\min} = -\beta - \gamma = 0 . \tag{3.21}
\]
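As a sanity check on (3.20) (our verification, not part of the original report), one can confirm symbolically that $\lambda = 1$ is a common root of $f$ and $g$ exactly for these values of $\beta$ and $\gamma$, and that $\beta + \gamma = 0$, so that $J_{\min} = 0$.

```python
import sympy as sp

lam, delta, beta, gamma = sp.symbols("lambda delta beta gamma", real=True)

f = lam * (lam**2 + 3 * delta * lam - 2 * delta) + gamma * (lam + delta) ** 3
g = lam**2 * (lam - 1) + beta * (lam + delta) ** 2 + gamma * lam * (lam + delta) ** 2

# Solve f(1) = 0 and g(1) = 0 for beta and gamma, cf. (3.20)
sol = sp.solve([f.subs(lam, 1), g.subs(lam, 1)], [beta, gamma], dict=True)[0]
print(sol)                                   # beta = 1/(1+delta)^2, gamma = -1/(1+delta)^2
print(sp.simplify(sol[beta] + sol[gamma]))   # 0  =>  J_min = -beta - gamma = 0, cf. (3.21)
```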

4 Solutions for $f(\lambda, \gamma, \delta) = 0$ and $g(\lambda, \beta, \gamma, \delta) = 0$

The third-order polynomials $f(\lambda, \gamma, \delta)$ and $g(\lambda, \beta, \gamma, \delta)$, defined in (3.11) and (3.12), can be written as
\[
f(\lambda, \gamma, \delta) = (1 + \gamma)\lambda^3 + 3\delta(1 + \gamma)\lambda^2 + \delta(3\gamma\delta - 2)\lambda + \gamma\delta^3 \tag{4.1}
\]
\[
g(\lambda, \beta, \gamma, \delta) = (1 + \gamma)\lambda^3 + (2\gamma\delta + \beta - 1)\lambda^2 + \delta(2\beta + \gamma\delta)\lambda + \beta\delta^2 . \tag{4.2}
\]
For any $\lambda$ for which $f(\lambda, \gamma, \delta) = 0$ and $g(\lambda, \beta, \gamma, \delta) = 0$, it also holds that
\[
f(\lambda, \gamma, \delta) - g(\lambda, \beta, \gamma, \delta) = (3\delta + \gamma\delta - \beta + 1)\lambda^2 + 2\delta(\gamma\delta - \beta - 1)\lambda + \delta^2(\gamma\delta - \beta) = 0 , \tag{4.3}
\]
such that the polynomials $f(\lambda, \gamma, \delta)$ and $g(\lambda, \beta, \gamma, \delta)$ can have at most 2 common roots, which we will represent by $\lambda_{B1}$ and $\lambda_{B2}$,
\[
\lambda_{B1} = \frac{\big[ \rho + 1 + \sqrt{1 + 3(\delta + 1)\rho} \,\big] \, \delta}{3\delta + 1 - \rho} \tag{4.4}
\]
\[
\lambda_{B2} = \frac{\big[ \rho + 1 - \sqrt{1 + 3(\delta + 1)\rho} \,\big] \, \delta}{3\delta + 1 - \rho} , \tag{4.5}
\]
with $\rho = \beta - \gamma\delta$.
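The expansion (4.3) and the closed-form roots (4.4)-(4.5) can likewise be verified symbolically; the following check is our addition.

```python
import sympy as sp

lam, delta, beta, gamma = sp.symbols("lambda delta beta gamma", real=True)
rho = beta - gamma * delta

f = lam * (lam**2 + 3 * delta * lam - 2 * delta) + gamma * (lam + delta) ** 3
g = lam**2 * (lam - 1) + beta * (lam + delta) ** 2 + gamma * lam * (lam + delta) ** 2

# (4.3) in terms of rho: (3d+1-rho) lam^2 - 2d(rho+1) lam - d^2 rho
ref = (3 * delta + 1 - rho) * lam**2 - 2 * delta * (rho + 1) * lam - delta**2 * rho
print(sp.simplify(sp.expand(f - g) - sp.expand(ref)))    # 0

# (4.4)-(4.5): both closed-form roots make the quadratic vanish
for s in (+1, -1):
    lamB = ((rho + 1 + s * sp.sqrt(1 + 3 * (delta + 1) * rho)) * delta
            / (3 * delta + 1 - rho))
    print(sp.simplify(ref.subs(lam, lamB)))              # 0
```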

Depending on the value of $\rho$, different cases can be distinguished:

1. $\rho < \frac{-1}{3(\delta + 1)}$ : both $\lambda_{B1}$ and $\lambda_{B2}$ are complex

2. $\rho = \frac{-1}{3(\delta + 1)}$ : $\lambda_{B1} = \lambda_{B2} = \frac{\delta}{3\delta + 2}$ (for all possible $\delta$: $0 < \lambda_{B1} = \lambda_{B2} \leq \frac{1}{3}$)

3. $\frac{-1}{3(\delta + 1)} < \rho < 0$ : $\frac{\delta}{3\delta + 2} < \lambda_{B1} < \frac{2\delta}{3\delta + 1}$, $0 < \lambda_{B2} < \frac{\delta}{3\delta + 2}$ (for all possible $\delta$: $0 < \lambda_{B1} < \frac{2}{3}$, $0 < \lambda_{B2} < \frac{1}{3}$)

4. $\rho = 0$ : $\lambda_{B1} = \frac{2\delta}{3\delta + 1}$, $\lambda_{B2} = 0$ (for all possible $\delta$: $0 < \lambda_{B1} < \frac{2}{3}$)

5. $0 < \rho < 3\delta + 1$ : $\lambda_{B1} > \frac{2\delta}{3\delta + 1}$, $\frac{-\delta(3\delta + 1)}{2(3\delta + 2)} < \lambda_{B2} < 0$ (for all possible $\delta$: $\lambda_{B1} > 0$)

6. $\rho = 3\delta + 1$ : (4.3) reduces to a linear equation with one solution $\lambda_{B2} = \frac{-\delta(3\delta + 1)}{2(3\delta + 2)}$

7. $\rho > 3\delta + 1$ : $\lambda_{B1} < -\delta$, $-\delta < \lambda_{B2} < \frac{-\delta(3\delta + 1)}{2(3\delta + 2)}$

These cases can be summarised as follows:
\[
\begin{cases}
\text{(a)} & \lambda_{B1}, \lambda_{B2} \text{ are complex} \\
\text{(b)} & 0 < \lambda_{B1} < \tfrac{2}{3} , \quad 0 < \lambda_{B2} \leq \tfrac{1}{3} \\
\text{(c)} & \lambda_{B1} > 0 , \quad \lambda_{B2} \leq 0 \\
\text{(d)} & \lambda_{B1} < 0 , \quad \lambda_{B2} < 0
\end{cases} \tag{4.6}
\]
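A quick numeric scan (ours, with an arbitrary test value of $\delta$ and one sample $\rho$ per case) reproduces the sign pattern of each case by computing the roots of (4.3) directly:

```python
import numpy as np

delta = 0.8
# One sample rho per case 1..7 of the classification above
rhos = [-0.5, -1 / (3 * (delta + 1)), -0.1, 0.0, 1.0, 3 * delta + 1, 5.0]

for k, rho in enumerate(rhos, start=1):
    a = 3 * delta + 1 - rho          # quadratic (4.3): a lam^2 + b lam + c
    b = -2 * delta * (rho + 1)
    c = -delta**2 * rho
    roots = np.array([-c / b]) if a == 0 else np.roots([a, b, c])
    print(f"case {k}: rho={rho:+.3f}  lambda_B={np.round(roots, 3)}")
```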

5 Acknowledgements

Simon Doclo is a postdoctoral researcher supported by the Fund for Scientific Research - Flanders (FWO-Vlaanderen). This research work was carried out at the ESAT laboratory of the Katholieke Universiteit Leuven, in the frame of the F.W.O. Project G.0233.01, Signal processing and automatic patient fitting for advanced auditory prostheses, the I.W.T. Project 020540, Performance improvement of cochlear implants by innovative speech processing algorithms, the I.W.T. Project 020476, Sound Management System for Public Address systems (SMS4PA), the Concerted Research Action GOA-MEFISTO-666, and the Interuniversity Attraction Pole IUAP P5-22.

References

[1] L. L. Scharf, Statistical Signal Processing: Detection, Estimation and Time Series Analysis, Addison-Wesley, 1st edition, July 1991.

[2] S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Trans. Acoust., Speech, Signal Processing, vol. 27, no. 2, pp. 113–120, Apr. 1979.

[3] Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator," IEEE Trans. Acoust., Speech, Signal Processing, vol. 32, no. 6, pp. 1109–1121, Dec. 1984.

[4] Y. Ephraim and H. L. Van Trees, "A Signal Subspace Approach for Speech Enhancement," IEEE Trans. Speech and Audio Processing, vol. 3, no. 4, pp. 251–266, July 1995.

[5] E. J. Diethorn, Subband Noise Reduction Methods for Speech Enhancement, chapter 9 in "Acoustic Signal Processing for Telecommunication" (Gay, S. L. and Benesty, J., Eds.), pp. 155–178, Kluwer Academic Publishers, Boston, 2000.

[6] S. Doclo and M. Moonen, "GSVD-based optimal filtering for single and multi-microphone speech enhancement," IEEE Trans. Signal Processing, vol. 50, no. 9, pp. 2230–2244, Sept. 2002.

[7] J. Benesty, J. Chen, and A. Huang, Study of the Wiener Filter for Noise Reduction, chapter 2 in "Speech Enhancement: What's New?", Springer-Verlag, 2005.

[8] S. Doclo, Multi-microphone noise reduction and dereverberation techniques for speech applications, Ph.D. thesis, ESAT, Katholieke Universiteit Leuven, Belgium, May 2003.

[9] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 3rd edition, 1996.
