Unification of multi-microphone noise reduction systems
Ann Spriet, Simon Doclo, Marc Moonen, Jan Wouters 7th February 2006
1 General cost function
1.1 Signal model
Let X_i(f), i = 1, ..., M denote the frequency-domain microphone signals. Each microphone signal X_i(f) can be decomposed into a speech component X_i^s(f) and an additive noise component X_i^n(f) as:

X_i(f) = X_i^s(f) + X_i^n(f).   (1)
Defining H_i^s(f) as the acoustic transfer function from the speech source S(f) to the i-th microphone, the speech component X_i^s(f) equals:

X_i^s(f) = H_i^s(f)\, S(f) = \frac{H_i^s(f)}{H_1^s(f)}\, X_1^s(f),   (2)

where \tilde{H}_i^s(f) = H_i^s(f)/H_1^s(f) denotes the relative transfer function of the i-th to the first microphone.
Let X(f) ∈ C^{M×1} be defined as the stacked vector

X(f) = \left[ X_1(f)\;\; X_2(f)\;\; \cdots\;\; X_M(f) \right]^T.   (3)
Then, (1) can be written as

X(f) = X^s(f) + X^n(f) = H^s(f)\, S(f) + X^n(f) = \tilde{H}^s(f)\, X_1^s(f) + X^n(f),   (4)

with X^s(f) and X^n(f) defined similarly to (3) and

\tilde{H}^s(f) = H^s(f)/H_1^s(f) = \left[ 1\;\; \frac{H_2^s(f)}{H_1^s(f)}\;\; \ldots\;\; \frac{H_M^s(f)}{H_1^s(f)} \right]^T.   (5)
To simplify notation, we define the power spectral densities (PSDs) of the speech and noise components in the i-th microphone signal as

P_{X_i^s}(f) = \mathcal{E}\{X_i^s(f)\, X_i^{s,*}(f)\}   (6)

P_{X_i^n}(f) = \mathcal{E}\{X_i^n(f)\, X_i^{n,*}(f)\}   (7)

and the PSD of the speech source S(f) as

P_S(f) = \mathcal{E}\{S(f)\, S^*(f)\}.   (8)

In addition, we define the noise and speech correlation matrices as:

R^n(f) = \mathcal{E}\{X^n(f)\, X^{n,H}(f)\},   (9)

R^s(f) = \mathcal{E}\{X^s(f)\, X^{s,H}(f)\} = P_{X_1^s}(f)\, \tilde{H}^s(f)\, \tilde{H}^{s,H}(f).   (10)
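The rank-one structure of the speech correlation matrix in (10) is easy to verify numerically. A minimal sketch in NumPy, using an arbitrary, hypothetical relative transfer function vector and speech PSD:

```python
import numpy as np

# Sketch: the speech correlation matrix R^s(f) of Eq. (10) for one frequency
# bin, built from a hypothetical relative transfer function vector H_tilde
# (first entry 1 by definition) and a hypothetical speech PSD P_x1s.
M = 4
rng = np.random.default_rng(0)
H_tilde = np.concatenate(
    ([1.0 + 0j], rng.standard_normal(M - 1) + 1j * rng.standard_normal(M - 1)))
P_x1s = 2.5  # PSD of the speech component in the reference microphone

Rs = P_x1s * np.outer(H_tilde, H_tilde.conj())  # Eq. (10)

# R^s is Hermitian and rank one
print(np.linalg.matrix_rank(Rs))  # -> 1
```

The rank-one property is what later makes the matrix inversion lemma applicable in Sections 2 and 3.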
1.2 Free-field propagation model
Single point source
Assuming free-field propagation, mathematical expressions can be derived for the acoustic transfer function from a point source S(f, p) to the M microphones. Let S(f, p) be defined as a point source at location p with Cartesian coordinates p = (x, y, z) and spherical coordinates (R, θ, φ) (where R is the distance, θ the azimuth and φ the elevation), as defined in Figure 1. Without loss of generality, we define the origin of the coordinate system at the position of the first microphone of the microphone array. The contribution X_i(f, p) of the point source S(f, p) to the i-th microphone signal (with coordinates p_i) equals
X_i(f,p) = A_i(f,p)\, a_i(p)\, e^{-j 2\pi f \tau_i(p)}\, S(f,p),   (11)

where A_i(f,p) = A_i(f,θ,φ) includes the microphone characteristics of the i-th microphone (and, in the case of a hearing aid, the head-related transfer function to the i-th microphone), a_i(p) is the attenuation of the point source S(f,p) at the position of the i-th microphone (near-field effect) and

\tau_i(p) = \frac{\|p - p_i\|}{c},   (12)

with c the speed of sound (340 m/s), is the propagation delay from the point source S(f,p) to the i-th microphone. Defining the first microphone signal X_1(f,p) as the reference signal, X(f,p) can be written as:
X(f,p) = \tilde{d}(f,p)\, X_1(f,p),   (13)

where \tilde{d}(f,p) is the steering vector

\tilde{d}(f,p) = \left[ 1\;\; \frac{A_2(f,p)}{A_1(f,p)}\frac{a_2(p)}{a_1(p)}\, e^{-j 2\pi f (\tau_2(p)-\tau_1(p))}\;\; \ldots\;\; \frac{A_M(f,p)}{A_1(f,p)}\frac{a_M(p)}{a_1(p)}\, e^{-j 2\pi f (\tau_M(p)-\tau_1(p))} \right]^T.   (14)
Using (11), it can be shown that the PSD P_{X_1(f,p)}(f) of the first microphone signal X_1(f,p) (i.e., the reference signal) equals:

P_{X_1(f,p)}(f) = |A_1(f,p)\, a_1(p)|^2\, P_{S(f,p)}(f),   (15)

where P_{S(f,p)}(f) is the PSD of the source S(f,p). If the first microphone is omnidirectional (i.e., A_1(f,p) = 1), the PSD of the first microphone signal equals the PSD of the source signal S(f,p) up to a scalar |a_1(p)|^2. An estimate of the first microphone signal is then a scaled and delayed version of an estimate of the source signal.
Multiple point sources
If several point sources S(f,p) at positions p ∈ P are propagating, the microphone signals X(f) can be modeled as:

X(f) = \int_{p \in P} \tilde{d}(f,p)\, X_1(f,p)\, dp,   (16)

with X_1(f,p) defined by (11).

For uncorrelated point sources,

\mathcal{E}\{X_1(f,p_k)\, X_1^*(f,p_l)\} = P_{X_1}(f)\, \delta_{kl}.   (17)

The model (16)-(17) can be used when the speech/noise sources cover a certain known region in space or when an approximate position of the speech source is known.
Remark: The free-field propagation model assumes that there is no reverberation and that the microphone characteristics and positions, and (in the case of hearing aids) the HRTFs, are known. In practice, these assumptions will often be violated (e.g., microphone mismatch, reverberation), such that the true model (4) deviates from the free-field model (13)-(14), i.e.,

\tilde{H}(f) = \tilde{d}(f) + \delta\tilde{d}(f).   (18)

Because of this deviation, techniques assuming a free-field propagation model may suffer a performance degradation in practice. The amount of degradation depends on the deviation \delta\tilde{d}(f).
1.2.1 Far-field propagation
For far-field propagation, (14) equals

\tilde{d}(f,p) = \left[ 1\;\; \frac{A_2(f,p)}{A_1(f,p)}\, e^{-j 2\pi f (\tau_2(p)-\tau_1(p))}\;\; \ldots\;\; \frac{A_M(f,p)}{A_1(f,p)}\, e^{-j 2\pi f (\tau_M(p)-\tau_1(p))} \right]^T,   (19)

where

\tau_j(p) - \tau_1(p) = \frac{-x_j \sin\phi \cos\theta - y_j \sin\phi \sin\theta - z_j \cos\phi}{c}.   (20)

For the special case of a linear array (i.e., φ = 90°, y_j = 0), τ_j(p) − τ_1(p) reduces to:

\tau_j(p) - \tau_1(p) = \frac{-x_j \cos\theta}{c}.   (21)
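As a quick numerical check of (20)-(21), the delay difference can be evaluated in code; the array geometry below is a hypothetical example:

```python
import numpy as np

c = 340.0  # speed of sound [m/s]

def delay_diff(p_j, theta, phi):
    """Far-field delay difference tau_j - tau_1 of Eq. (20); p_j = (x, y, z)
    is the position of microphone j relative to the reference microphone,
    theta the azimuth and phi the elevation (both in radians)."""
    x, y, z = p_j
    return (-x * np.sin(phi) * np.cos(theta)
            - y * np.sin(phi) * np.sin(theta)
            - z * np.cos(phi)) / c

# Linear array along the x-axis (phi = 90 deg, y_j = z_j = 0): Eq. (21)
dtau = delay_diff((0.02, 0.0, 0.0), theta=0.0, phi=np.pi / 2)
print(dtau)  # -> -0.02/340, i.e. about -5.88e-5 s
```

For a source at azimuth 0° the second microphone, 2 cm further along x, receives the wavefront earlier, hence the negative delay difference.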
Figure 1: Coordinate system with cartesian and spherical coordinates. The origin corresponds to the position of the first microphone.
1.2.2 Near-field propagation
For near-field propagation, the steering vector \tilde{d}(f,p) equals

\tilde{d}(f,p) = \left[ 1\;\; \frac{A_2(f,p)}{A_1(f,p)}\frac{a_2(p)}{a_1(p)}\, e^{-j 2\pi f (\tau_2(p)-\tau_1(p))}\;\; \ldots\;\; \frac{A_M(f,p)}{A_1(f,p)}\frac{a_M(p)}{a_1(p)}\, e^{-j 2\pi f (\tau_M(p)-\tau_1(p))} \right]^T,   (22)

with

a_i(p) = \frac{1}{\|p - p_i\|}.   (23)
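For omnidirectional microphones (A_i = 1), the near-field steering vector (22)-(23) can be sketched as follows; the microphone and source positions are hypothetical:

```python
import numpy as np

c = 340.0  # speed of sound [m/s]

def steering_vector(f, p, mics):
    """Near-field steering vector (22)-(23), assuming omnidirectional
    microphones (A_i = 1); p is the source position and mics an (M, 3)
    array of microphone positions (row 0 is the reference microphone)."""
    dist = np.linalg.norm(p - mics, axis=1)  # source-microphone distances
    a = 1.0 / dist                           # attenuation a_i(p), Eq. (23)
    tau = dist / c                           # propagation delays, Eq. (12)
    return (a / a[0]) * np.exp(-2j * np.pi * f * (tau - tau[0]))

mics = np.array([[0.0, 0.0, 0.0], [0.02, 0.0, 0.0], [0.04, 0.0, 0.0]])
d_tilde = steering_vector(1000.0, np.array([1.0, 0.5, 0.0]), mics)
print(d_tilde[0])  # -> (1+0j): the reference entry is 1 by construction
```

In the far field the attenuation ratios a_i(p)/a_1(p) tend to 1 and the vector reduces to the phase-only form (19).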
1.3 Multi-microphone noise reduction
In a multi-microphone noise reduction system, the microphone signals X_i(f) are filtered by (adaptive or fixed) filters W_i(f) and combined in order to obtain an enhanced speech signal Z(f). To simplify the formulation of the noise reduction algorithms, we define

W(f) = \left[ W_1(f)\;\; W_2(f)\;\; \cdots\;\; W_M(f) \right]^H,   (24)

with W_l(f) = \sum_{m=0}^{L-1} w_{l,m}\, e^{-j 2\pi \frac{f}{f_s} m}. The output Z(f) of the multi-channel noise reduction algorithm can then be expressed as

Z(f) = W^H(f)\, X(f) = \underbrace{W^H(f)\, X^s(f)}_{Z^s(f)} + \underbrace{W^H(f)\, X^n(f)}_{Z^n(f)}.   (25)
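The filter-and-sum operation in (25) amounts to one inner product per frequency bin. A minimal sketch with random data and a simple channel-averaging filter (all values hypothetical):

```python
import numpy as np

# Filter-and-sum output (25): Z(f) = W^H(f) X(f), evaluated per frequency bin.
rng = np.random.default_rng(1)
M, F = 4, 257  # number of microphones, number of frequency bins
X = rng.standard_normal((M, F)) + 1j * rng.standard_normal((M, F))
W = np.full((M, F), 1.0 / M, dtype=complex)  # toy filter: average the channels

Z = np.einsum('mf,mf->f', W.conj(), X)  # W^H(f) X(f) for every bin
print(Z.shape)  # -> (257,)
```

With this averaging filter the output is simply the mean of the microphone channels; the algorithms below replace W with noise-suppressing designs.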
The goal of the filter W(f) is to minimize the output noise energy as much as possible without severely distorting the speech signal. The amount of speech distortion is measured with respect to a reference speech signal D^s(f). This reference signal can be the speech component X_1^s(f) in the first microphone, the speech source signal S(f) or the speech component in the output of a fixed beamformer (e.g., the speech reference in the spatially pre-processed SDW-MWF [?]).
A general cost function J(W( f )) for the filter W( f ) is:
J(W) = (1-\lambda)\, W^H(f)\, R^n(f)\, W(f) + \lambda\, W^H(f)\, R_m^n(f)\, W(f)
     + \mu_1\, \mathcal{E}\{(D^s(f) - W^H(f) X^s(f))(D^s(f) - W^H(f) X^s(f))^H\}
     + \mu_2\, \mathcal{E}\{(D_m^s(f) - W^H(f) X_m^s(f))(D_m^s(f) - W^H(f) X_m^s(f))^H\}.   (26)

The first two terms in J(W) correspond to the output noise power. This output noise power can be:
• estimated online (i.e., the term W^H(f) R^n(f) W(f)), or
| Speech model | Noise model | µ1 | µ2 | Technique |
|---|---|---|---|---|
| Fixed beamforming (Section 2) |||||
| A-priori | A-priori | 0 | ∞ | LCMV-based |
| A-priori | A-priori | 0 | ≠ ∞ | Weighted LS criterion |
| None | A-priori | 0 | 0 | Differential microphone array (constraint on W(f)) |
| Adaptive beamforming (Section 3) |||||
| A-priori | Online | 0 | ∞ | GSC |
| A-priori | Online | 0 | ≠ ∞ | Soft-constrained MWF (Nordholm) |
| A-priori | Combination | 0 | ∞ | Sensitivity-constrained GSC |
| A-priori | Combination | 0 | ≠ ∞ | Soft-constrained with partial noise model (e.g., Nordholm calibration data) |
| Adaptive beamforming (Section 4) |||||
| Online | Online | ∞ | 0 | TF-LCMV/ABM |
| Online | Online | ≠ ∞ | 0 | SDW-MWF |
| Online | Combination | ∞ | 0 | TF-LCMV with (partial) noise model |
| Online | Combination | ≠ ∞ | 0 | SDW-MWF with (partial) noise model |
| Online | Fixed | ∞ | 0 | TF-LCMV with noise model |
| Online | Fixed | ≠ ∞ | 0 | SDW-MWF with noise model (not useful) |
| Adaptive beamforming (Section 5) |||||
| Combination | Online | ≠ ∞ | ∞ | SDR-GSC |
| Combination | Online | ≠ ∞ | ≠ ∞ | Combination SDW-MWF/soft-constrained |
| Combination | Online | ∞ | ≠ ∞ | Combination TF-LCMV/GSC (cf. Kates) |
| Combination | Combination | ≠ ∞ | ∞ | SDR-GSC with partial noise model |
| Combination | Combination | ≠ ∞ | ≠ ∞ | SDW-MWF/soft-constrained + partial noise model |
| Combination | Combination | ∞ | ≠ ∞ | TF-LCMV/GSC with partial noise model |
| Combination | Fixed | ∞ | ≠ ∞ | TF-LCMV/GSC with noise model |
Table 1: Classification of multi-microphone noise reduction techniques.
• based on a pre-defined model R_m^n(f) of the noise correlation matrix, which is constructed through calibration measurements or mathematical models.
The last two terms in J(W) denote the distortion energy between the output speech component W^H(f) X^s(f) (or W^H(f) X_m^s(f)) and a reference speech signal D^s(f) (or D_m^s(f)). Again, the output speech distortion energy may be

• estimated online (i.e., as \mathcal{E}\{(D^s(f) - W^H(f) X^s(f))(D^s(f) - W^H(f) X^s(f))^H\}), or

• based on a pre-defined model X_m^s(f) for the microphone signals (i.e., as \mathcal{E}\{(D_m^s(f) - W^H(f) X_m^s(f))(D_m^s(f) - W^H(f) X_m^s(f))^H\}). Again, this model can be constructed based on calibration data or on mathematical models.

Depending on the use of a-priori knowledge of the speech and/or noise correlation matrix and the use of a hard constraint on the speech distortion term (i.e., µ_{1,2} = ∞ or µ_{1,2} ≠ ∞), different existing multi-microphone noise reduction techniques are obtained, as indicated in Table 1. When using a hard constraint (µ1 = ∞ or µ2 = ∞), noise suppression is only achieved in the subspace orthogonal to the defined or actual speech subspace. Signals in the (defined or actual) speech subspace are passed through undistorted by the noise reduction algorithm. The use of a soft constraint (µ1 ≠ ∞ or µ2 ≠ ∞) typically results in spectral filtering of the desired speech component D^s(f), since the speech and noise subspaces are generally not orthogonal (often, the noise subspace spans the complete space).
Below, the different techniques are explained in more detail.
2 A-priori speech and noise model: fixed beamforming
2.1 Hard constraint on speech distortion (µ2=∞)
Cost function
J(W) = W^H(f)\, R_m^n(f)\, W(f) + \mu_2\, \mathcal{E}\{(D_m^s(f) - W^H(f) X_m^s(f))(D_m^s(f) - W^H(f) X_m^s(f))^H\},   (27)

with µ2 = ∞.
Assumed speech model The free-field propagation model in (13)-(14) is assumed:

X_m^s(f) = \tilde{d}^s(f, p_s)\, X_{m,1}^s(f),   (28)

where p_s refers to the position of the speech source. The reference signal D_m^s(f) = X_{m,1}^s(f).
Assumed noise model Different noise models have been used in the literature. The most well-known are:
• Delay-and-sum beamformer: a homogeneous, spatially uncorrelated noise field is assumed, i.e.,

R_m^n(f) = P_N(f)\, I_M,   (29)

with P_{X_i^n}(f) = P_N(f), i = 1, ..., M.
• Superdirective beamformer [?]: a homogeneous, diffuse (spherically isotropic) noise field is assumed, i.e.,

R_m^n(f) = P_N(f)\, \Gamma^n(f),   (30)

with P_{X_i^n}(f) = P_N(f), i = 1, ..., M, and \Gamma^n(f) the coherence matrix of diffuse noise, i.e.,

\Gamma(f) = \begin{bmatrix} 1 & \Gamma_{12}(f) & \cdots & \Gamma_{1M}(f) \\ \Gamma_{21}(f) & 1 & \cdots & \Gamma_{2M}(f) \\ \vdots & \vdots & \ddots & \vdots \\ \Gamma_{M1}(f) & \Gamma_{M2}(f) & \cdots & 1 \end{bmatrix},   (31)

\Gamma_{kl}(f) = \frac{\sin(2\pi f \|p_l - p_k\| / c)}{2\pi f \|p_l - p_k\| / c} = \mathrm{sinc}(2\pi f \|p_l - p_k\| / c),   (32)

with \|p_l - p_k\| the spacing between microphones l and k.
• Combination of diffuse and spatially uncorrelated noise fields: sensitivity-constrained superdirective beamformer [?, ?]

R_m^n(f) = P_N(f)\, \left( \Gamma_m^n(f) + \eta(f)\, I_M \right),

with \Gamma_m^n(f) the coherence matrix of diffuse noise and \eta(f) a (frequency-dependent) weighting factor.
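The diffuse-noise coherence matrix of (31)-(32) follows directly from the microphone geometry; a sketch for a hypothetical 3-microphone linear array (note that `np.sinc` computes the normalized sinc sin(πt)/(πt), so the argument must be divided by π):

```python
import numpy as np

def diffuse_coherence(f, mics, c=340.0):
    """Coherence matrix Gamma(f) of a diffuse (spherically isotropic) noise
    field, Eqs. (31)-(32); mics is an (M, 3) array of microphone positions.
    Eq. (32) uses the unnormalized sinc sin(x)/x, while np.sinc(t) computes
    sin(pi t)/(pi t), hence the division by pi below."""
    dist = np.linalg.norm(mics[:, None, :] - mics[None, :, :], axis=2)
    x = 2 * np.pi * f * dist / c
    return np.sinc(x / np.pi)

mics = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [0.10, 0.0, 0.0]])
Gamma = diffuse_coherence(1000.0, mics)
print(np.diag(Gamma))  # -> [1. 1. 1.]: full coherence at zero spacing
```

Off-diagonal entries shrink with increasing frequency and spacing, which is why superdirective gains are largest at low frequencies and small apertures.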
Solution The filter that minimizes (27) equals

W(f) = \left( R_m^n(f) + \mu_2 P_{X_{m,1}^s}(f)\, \tilde{d}^s(f,p_s)\, \tilde{d}^{s,H}(f,p_s) \right)^{-1} \mu_2 P_{X_{m,1}^s}(f)\, \tilde{d}^s(f,p_s).   (33)

Since \tilde{d}^s(f,p_s)\, \tilde{d}^{s,H}(f,p_s) is a rank-one matrix and R_m^n(f) + \mu_2 P_{X_{m,1}^s}(f)\, \tilde{d}^s(f,p_s)\, \tilde{d}^{s,H}(f,p_s) is assumed to be full rank, the matrix inversion lemma can be applied (dropping the arguments (f,p_s) of \tilde{d}^s for brevity):

\left( R_m^n(f) + \mu_2 P_{X_{m,1}^s}(f)\, \tilde{d}^s \tilde{d}^{s,H} \right)^{-1} = R_m^{n,-1}(f) - \frac{R_m^{n,-1}(f)\, \mu_2 P_{X_{m,1}^s}(f)\, \tilde{d}^s \tilde{d}^{s,H}\, R_m^{n,-1}(f)}{1 + \mu_2 P_{X_{m,1}^s}(f)\, \tilde{d}^{s,H}\, R_m^{n,-1}(f)\, \tilde{d}^s},   (34)

such that

W(f) = \frac{R_m^{n,-1}(f)}{1 + \mu_2 P_{X_{m,1}^s}(f)\, \tilde{d}^{s,H}(f,p_s)\, R_m^{n,-1}(f)\, \tilde{d}^s(f,p_s)}\, \mu_2 P_{X_{m,1}^s}(f)\, \tilde{d}^s(f,p_s).   (35)

Using µ2 = ∞ (i.e., a hard constraint on the speech distortion term),

W(f) = \frac{R_m^{n,-1}(f)\, \tilde{d}^s(f,p_s)}{\tilde{d}^{s,H}(f,p_s)\, R_m^{n,-1}(f)\, \tilde{d}^s(f,p_s)} = \frac{\Gamma_m^{n,-1}(f)\, \tilde{d}^s(f,p_s)}{\tilde{d}^{s,H}(f,p_s)\, \Gamma_m^{n,-1}(f)\, \tilde{d}^s(f,p_s)}.   (36)
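The distortionless solution (36) can be checked numerically. The noise correlation matrix and steering vector below are toy values, not a calibrated model:

```python
import numpy as np

def distortionless_weights(Rn, d):
    """Hard-constraint (mu2 -> inf) solution of Eq. (36):
    W = Rn^{-1} d / (d^H Rn^{-1} d)."""
    Rinv_d = np.linalg.solve(Rn, d)
    return Rinv_d / (d.conj() @ Rinv_d)

M = 3
d = np.ones(M, dtype=complex)            # broadside far-field steering vector
Rn = np.eye(M) + 0.1 * np.ones((M, M))   # toy (diagonally loaded) noise model
W = distortionless_weights(Rn, d)

print((np.conj(W) @ d).real)  # -> 1.0: unit response in the look direction
```

Whatever Rn is used, the normalization by d^H Rn^{-1} d enforces W^H d = 1, i.e., the speech component defined by the steering vector passes undistorted.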
2.2 Soft constraint on speech distortion (µ2 ≠ ∞)
Cost function
J(W) = W^H(f)\, R_m^n(f)\, W(f) + \mu_2\, \mathcal{E}\{(D_m^s(f) - W^H(f) X_m^s(f))(D_m^s(f) - W^H(f) X_m^s(f))^H\},   (37)

with µ2 ≠ ∞.
Example The weighted least-squares (WLS) criterion [?, ?, ?, ?, ?, ?]

J_{WLS}(W(f)) = \int_{p \in P} L^2(f,p)\, \left| W^H(f)\, \tilde{d}(f,p) - D(f,p) \right|^2 dp,   (38)

with D(f,p) a desired directivity pattern, can be transformed into the cost function (37), namely when the desired directivity pattern D(f,p) is defined as:

D(f,p) ≠ 0 for p ∈ P_{pass},
D(f,p) = 0 for p ∈ P_{stop}.   (39)

Then

J(W(f)) = \int_{p \in P_{stop}} W^H(f)\, \tilde{d}(f,p)\, \tilde{d}^H(f,p)\, W(f)\, dp + \mu_2 \int_{p \in P_{pass}} (1 - W^H(f)\, \tilde{d}(f,p))(1 - W^H(f)\, \tilde{d}(f,p))^H\, dp.   (40)

This corresponds to the following speech and noise model:
• Speech model: The speech source is modeled as an infinite number of (uncorrelated) point sources with PSD L^2(f,p) in the angular region P_{pass}:

X_m^s(f) = \int_{p \in P_{pass}} \tilde{d}^s(f,p)\, X_1^s(f,p)\, dp,   (41)

with

\mathcal{E}\{X_1^s(f,p_k)\, X_1^{s,*}(f,p_l)\} = L^2(f,p_k)\, \delta_{kl} \quad \text{for } p_k, p_l \in P_{pass}.   (42)

The reference signal D_m^s(f) equals:

D_m^s(f) = \int_{p \in P_{pass}} D(f,p)\, X_1^s(f,p)\, dp.   (43)
• Noise model: The noise source is modeled as an infinite number of (uncorrelated) point sources with PSD L^2(f,p) in the angular region P_{stop}:

X_m^n(f) = \int_{p \in P_{stop}} \tilde{d}^n(f,p)\, X_1^n(f,p)\, dp,   (44)

with

\mathcal{E}\{X_1^n(f,p_k)\, X_1^{n,*}(f,p_l)\} = L^2(f,p_k)\, \delta_{kl} \quad \text{for } p_k, p_l \in P_{stop}.   (45)

From (44)-(45), it follows that:

R_m^n(f) = \int_{p \in P_{stop}} L^2(f,p)\, \tilde{d}^n(f,p)\, \tilde{d}^{n,H}(f,p)\, dp.   (46)
2.3 No constraint on speech distortion: differential microphone arrays [?, ?]
Cost function

J(W(f)) = W^H(f)\, R_m^n(f)\, W(f),   (47)

with W(f) = [1\;\; \alpha]^T to avoid the trivial solution W(f) = 0. The noise is modelled as M − 1 uncorrelated noise sources, i.e.,

X_m^n(f) = \sum_{i=1}^{M-1} X_1^n(f,p_i)\, \tilde{d}(f,p_i),   (48)

with

\mathcal{E}\{X_1^n(f,p_i)\, X_1^{n,*}(f,p_j)\} = P_{X_1^n}(f,p_i)\, \delta_{ij}.   (49)

Using (48)-(49), the noise correlation matrix R_m^n(f) equals

R_m^n(f) = \sum_{i=1}^{M-1} P_{X_1^n}(f,p_i)\, \tilde{d}(f,p_i)\, \tilde{d}^H(f,p_i),   (50)

where p_i are the coordinates of the noise sources. Typically, a linear array and far-field propagation are assumed, such that p_i is characterized by the azimuth θ_i^n of the noise source (cf. Section 1.2.1). Depending on θ_i^n, different directivity patterns are obtained (e.g., cardioid (θ_i^n = 180°), hypercardioid (θ_i^n = 90°), ...).
3 A-priori speech model
3.1 Online estimated noise model
3.1.1 Hard constraint (µ2=∞): GSC [?, ?]
Cost function
J(W) = W^H(f)\, R^n(f)\, W(f) + \mu_2\, \mathcal{E}\{(D_m^s(f) - W^H(f) X_m^s(f))(D_m^s(f) - W^H(f) X_m^s(f))^H\},   (51)

with µ2 = ∞.
Assumed speech model The free-field propagation model in (13)-(14) is assumed:

X_m^s(f) = \tilde{d}^s(f, p_s)\, X_{m,1}^s(f),   (52)

where p_s refers to the position of the speech source. The reference signal D_m^s(f) equals X_{m,1}^s(f).
Noise model The noise model is estimated online.
Solution The filter W(f) equals

W(f) = \left( R^n(f) + \mu_2 P_{X_1^s}(f)\, \tilde{d}^s(f,p_s)\, \tilde{d}^{s,H}(f,p_s) \right)^{-1} \mu_2 P_{X_1^s}(f)\, \tilde{d}^s(f,p_s).   (53)

Since \tilde{d}^s(f,p_s)\, \tilde{d}^{s,H}(f,p_s) is a rank-one matrix and R^n(f) + \mu_2 P_{X_1^s}(f)\, \tilde{d}^s(f,p_s)\, \tilde{d}^{s,H}(f,p_s) is assumed to be full rank, the matrix inversion lemma can be applied, resulting in (cf. Section 2.1):

W(f) = \frac{R^{n,-1}(f)}{1 + \mu_2 P_{X_1^s}(f)\, \tilde{d}^{s,H}(f,p_s)\, R^{n,-1}(f)\, \tilde{d}^s(f,p_s)}\, \mu_2 P_{X_1^s}(f)\, \tilde{d}^s(f,p_s).   (54)

Using µ2 = ∞ (i.e., a hard constraint on the speech distortion term),

W(f) = \frac{R^{n,-1}(f)\, \tilde{d}^s(f,p_s)}{\tilde{d}^{s,H}(f,p_s)\, R^{n,-1}(f)\, \tilde{d}^s(f,p_s)}.   (55)

In a GSC scheme, the hard constraint W^H(f)\, \tilde{d}^s(f,p_s) = 1 is imposed through the fixed beamformer and blocking matrix. The filter W(f) is then decomposed into a fixed filter W_q(f) (i.e., the so-called quiescent vector) and an adaptive filter W_a(f):

W(f) = W_q(f) + B(f)\, W_a(f),   (56)

with W_q(f) = \frac{1}{M}\, \tilde{d}^s(f,p_s) and B^H(f)\, \tilde{d}^s(f,p_s) = 0.
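The decomposition (56) can be sketched numerically. The blocking matrix below is an orthonormal basis of the subspace orthogonal to d̃^s (one valid choice among many), and the noise correlation matrix is a toy diagonal model:

```python
import numpy as np

def gsc(Rn, d):
    """GSC sketch for the decomposition (56): quiescent vector Wq = d/M,
    blocking matrix B with B^H d = 0, and the adaptive part Wa chosen to
    minimize the output noise power W^H Rn W."""
    M = len(d)
    Wq = d / M
    # B: orthonormal basis of the orthogonal complement of d (via SVD)
    _, _, Vh = np.linalg.svd(d[None, :].conj())
    B = Vh[1:].conj().T
    # optimal adaptive filter (sign chosen to match W = Wq + B Wa)
    Wa = -np.linalg.solve(B.conj().T @ Rn @ B, B.conj().T @ Rn @ Wq)
    return Wq + B @ Wa

M = 4
d = np.ones(M, dtype=complex)                     # toy steering vector
Rn = np.diag([1.0, 2.0, 3.0, 4.0]).astype(complex)  # toy noise estimate
W = gsc(Rn, d)
print(np.round((np.conj(W) @ d).real, 6))  # -> 1.0: hard constraint holds
```

Because B^H d = 0, adapting W_a can never touch the look-direction response; it only reduces the residual noise relative to the quiescent beamformer.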
3.1.2 Soft constraint (µ2 ≠ ∞): soft-constrained MWF techniques by Nordholm et al.
Cost function
J(W) = W^H(f)\, R^n(f)\, W(f) + \mu_2\, \mathcal{E}\{(D_m^s(f) - W^H(f) X_m^s(f))(D_m^s(f) - W^H(f) X_m^s(f))^H\},   (57)

with µ2 ≠ ∞.
Assumed speech model In [?], a fixed model is used for the spatial characteristics \tilde{H}^s(f) of the speech, while the speech PSD P_{X_1^s}(f) is estimated online. The speech source is modeled as an infinite number of (uncorrelated) point sources with true PSD P_{X_1^s}(f), clustered closely in space within a pre-defined area P:

X_m^s(f) = \int_{p \in P} X_{m,1}^s(f,p)\, \tilde{d}^s(f,p)\, dp,   (58)

D_m^s(f) = \int_{p \in P} X_{m,1}^s(f,p)\, dp,   (59)

with

\mathcal{E}\{X_{m,1}^s(f,p_k)\, X_{m,1}^{s,*}(f,p_l)\} = P_{X_1^s}(f)\, \delta_{kl} \quad \forall p_k, p_l \in P.   (60)

To separate the estimation of the spectral and spatial characteristics, the technique is implemented in the frequency domain.
Noise model The noise model is estimated online.
Solution The filter W(f) equals

W(f) = \left( \mu_2 R_m^s(f) + R^n(f) \right)^{-1} \mu_2\, \mathcal{E}\{X_m^s(f)\, D_m^{s,*}(f)\}.   (61)

In the soft-constrained MWF techniques by Nordholm et al., µ2 is set to 1 [?].

Assuming uncorrelated point sources, R_m^s(f) and \mathcal{E}\{X_m^s(f)\, D_m^{s,*}(f)\} in (61) can be computed as:

R_m^s(f) = \mathcal{E}\left\{ \int_{p \in P} X_{m,1}^s(f,p)\, \tilde{d}^s(f,p)\, dp \int_{p \in P} X_{m,1}^{s,*}(f,p)\, \tilde{d}^{s,H}(f,p)\, dp \right\}
         = \int_{p \in P} \tilde{d}^s(f,p)\, \tilde{d}^{s,H}(f,p)\, \mathcal{E}\{X_{m,1}^s(f,p)\, X_{m,1}^{s,*}(f,p)\}\, dp
         = P_{X_1^s}(f) \int_{p \in P} \tilde{d}^s(f,p)\, \tilde{d}^{s,H}(f,p)\, dp,   (62)

\mathcal{E}\{X_m^s(f)\, D_m^{s,*}(f)\} = P_{X_1^s}(f) \int_{p \in P} \tilde{d}^s(f,p)\, dp,   (63)

where P_{X_1^s}(f) is estimated online.
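The integrals in (62)-(63) can be approximated on a grid over the region P. The sketch below uses a hypothetical far-field grid around 90° azimuth, a unit speech PSD and a toy white-noise estimate:

```python
import numpy as np

c, f, mu2 = 340.0, 1000.0, 1.0  # mu2 = 1 as in the Nordholm-style techniques
mic_x = np.array([0.0, 0.02, 0.04])  # linear array along x [m]

def d_ff(theta):
    """Far-field steering vector (19), delays from Eq. (21)."""
    tau = -mic_x * np.cos(theta) / c
    return np.exp(-2j * np.pi * f * (tau - tau[0]))

P_x1s = 1.0                              # speech PSD (estimated online)
thetas = np.deg2rad([85.0, 90.0, 95.0])  # grid discretizing the region P

Rs_m = P_x1s * sum(np.outer(d_ff(t), d_ff(t).conj())
                   for t in thetas) / len(thetas)            # Eq. (62)
r_xd = P_x1s * sum(d_ff(t) for t in thetas) / len(thetas)    # Eq. (63)
Rn = np.eye(3, dtype=complex)            # toy online noise estimate

W = np.linalg.solve(mu2 * Rs_m + Rn, mu2 * r_xd)  # Eq. (61)
print(W.shape)  # -> (3,)
```

A finer grid (or calibration data, as discussed next) refines the spatial model without changing the closed form (61).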
Instead of using a mathematical speech model, the speech correlation matrix R_m^s(f) and the cross-correlation \mathcal{E}\{X_m^s(f)\, D_m^{s,*}(f)\} can also be computed based on calibration data [?]. This can, for example, be useful in hearing aid applications, where the head shadow effect should be taken into account.
3.2 Combination of online and fixed noise model
Instead of using only an online estimate of the noise model, the online estimated noise model can be combined with a pre-defined fixed noise model. This may be useful to increase the robustness of the noise reduction algorithm against model errors (e.g., sensitivity-constrained GSC [?]) and/or VAD failures, or when the location of some interfering sources is known a priori (e.g., echo or feedback in the case of a set-up with fixed loudspeaker and microphone positions [?]).
Cost function
J(W) = (1-\lambda)\, W^H(f)\, R^n(f)\, W(f) + \lambda\, W^H(f)\, R_m^n(f)\, W(f)
     + \mu_2\, \mathcal{E}\{(D_m^s(f) - W^H(f) X_m^s(f))(D_m^s(f) - W^H(f) X_m^s(f))^H\},   (64)

with λ > 0.
3.2.1 Hard constraint (µ2 = ∞)

Examples
• Sensitivity-constrained GSC [?, ?]
In [?, ?], the robustness of the GSC against model errors is increased by injecting spatially uncorrelated noise. This corresponds to a fixed noise model for spatially uncorrelated noise:

R_m^n(f) = P_N(f)\, I_M,   (65)

with P_{X_{m,i}^n}(f) = P_N(f), i = 1, ..., M.
Alternatives
• Alternatively, the online noise estimate of the GSC can be combined with a diffuse noise model or a model of noise in the back hemisphere to prevent amplification of sounds coming from the back. In addition, noise sources with a fixed, known location can be included (e.g., echo or feedback in the case of a fixed loudspeaker-microphone position).
3.2.2 Soft constraint (µ2 ≠ ∞)

Examples
• In [?], a model based on calibration signals is used for the speech signal and the echo signal, while the noise statistics are estimated online.