Kasteelpark Arenberg 10, 3001 Leuven (Heverlee)

SUBBAND AND FREQUENCY-DOMAIN

ADAPTIVE FILTERING TECHNIQUES

FOR SPEECH ENHANCEMENT IN

HANDS-FREE COMMUNICATION

Promoter:

Prof. dr. ir. M. Moonen

Dissertation presented to obtain the doctoral degree in applied sciences

by

All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher.

The telecommunications sector is characterized by an increasing demand for user-friendliness and interactivity. This explains the growing interest in hands-free communication systems. Signal quality in current hands-free systems is unsatisfactory. To overcome this, advanced signal processing techniques such as the subband and frequency-domain adaptive filter are employed to enhance the signal. These techniques are known to have computationally efficient solutions. Furthermore, thanks to the frequency-dependent processing and adaptivity, highly time-varying systems and signals with a continuously changing spectral content such as speech can be handled.

This thesis deals with subband and frequency-domain adaptive filtering techniques for speech enhancement in hands-free communication. The text consists of four parts. In the first part design methods for perfect and nearly perfect reconstruction DFT modulated filter banks are discussed. Part II deals with subband and frequency-domain adaptive filtering. The subband adaptive filter and the PBFDAF algorithm are discussed. Next, the interrelation between both approaches is studied and a novel subband adaptation scheme is proposed. In part III of the thesis an extension to the PBFDAF algorithm is presented, called the PBFDRAP adaptive filter. The algorithm is analyzed and fast implementation schemes are derived. In the final part we describe applications of our algorithms to the acoustic echo cancellation problem. It is seen that the algorithms discussed in parts I-III can be

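Mathematical Notation

$v$    vector $v$
$v(z)$    vector $v$, function of the z-transform variable
$M$    matrix $M$
$M(z)$    matrix $M$, function of the z-transform variable
$v$, $M$    frequency-domain equivalents of $v$ and $M$
$M^T$    transpose of matrix $M$
$M^*$    complex conjugate of matrix $M$
$M^H = (M^*)^T$    Hermitian transpose of matrix $M$
$M^{-1}$    inverse of matrix $M$
$M^\dagger$    pseudo-inverse of matrix $M$
$\det M$    determinant of matrix $M$
$\mathrm{adj}\, M = M^{-1} \cdot \det M$    adjugate of matrix $M$
$\mathrm{diag}\{v\}$    square diagonal matrix with vector $v$ as diagonal
$M_*(z)$    complex conjugation of the coefficients of $M(z)$ without changing $z$
$\tilde{M}(z) = M_*^T(z^{-1})$    paraconjugate of $M(z)$
$v(m)$    m-th element of vector $v$
$[v(z)]_m$    m-th element of vector function $v(z)$
$M(m,n)$    element on the m-th row and n-th column of matrix $M$
$[M(z)]_{m,n}$    element on the m-th row and n-th column of matrix function $M(z)$
$A \otimes B$    Kronecker product of matrix $A$ and $B$
$h[k]$    discrete-time filter or time sequence $h$
$H(z)$    z-transform of $h[k]$
$H(f)$    Discrete Fourier Transform of $h[k]$
$x \star y$    convolution of $x[k]$ and $y[k]$
$x \circledast y$    circular convolution of $x[k]$ and $y[k]$
x y    circular correlation of $x[k]$ and $y[k]$
$H_{l:L}(z)$    the l-th out of $L$ polyphase components of an FIR filter
$h[k] \downarrow N$    $h[k]$ N-fold downsampled
$h[k] \uparrow N$    $h[k]$ N-fold upsampled
$\mathbb{N}$    set of natural numbers
$\mathbb{N}_0 = \mathbb{N} \setminus \{0\}$    set of natural numbers larger than 0
$\mathbb{Z}$    set of integer numbers
$\mathbb{Z}_0 = \mathbb{Z} \setminus \{0\}$    set of integer numbers except 0
$\mathbb{Q}$    set of rational numbers
$\mathbb{R}$    set of real numbers
$\mathbb{R}_0 = \mathbb{R} \setminus \{0\}$    set of real numbers except 0
$\mathbb{R}^+$    set of positive real numbers
$\mathbb{C}$    set of complex numbers
$\mathbb{R}^M$    set of real M-dimensional vectors
$\mathbb{C}^M$    set of complex M-dimensional vectors
$\mathbb{C}^M_0 = \mathbb{C}^M \setminus \{0\}$    set of complex M-dimensional vectors except 0
$\Re\{x\}$    real part of $x \in \mathbb{C}$
$\Im\{x\}$    imaginary part of $x \in \mathbb{C}$
$x^*$    complex conjugate of $x$
$\mathrm{conj}(\cdot)$    complex conjugation
$\hat{x}$    estimate of $x$
$\lfloor x \rfloor$    largest integer smaller or equal to $x \in \mathbb{R}$
$\lceil x \rceil$    smallest integer larger or equal to $x \in \mathbb{R}$
$\mathrm{rnd}(x)$    round $x \in \mathbb{R}$ to the nearest integer
$|\cdot|$    absolute value
$\|\cdot\|_2$    2-norm
$E\{\cdot\}$    expectation operator
$\sigma_x^2$    variance of $x$
$\gcd(M,N)$    greatest common divisor of $M$ and $N$
$\mathrm{lcm}(M,N)$    least common multiple of $M$ and $N$
$x \bmod y$    remainder after division of $x \in \mathbb{N}$ by $y \in \mathbb{N}$
$p = a:b$    $p$ is an integer between $a \in \mathbb{Z}$ and $b \in \mathbb{Z}$, i.e. $a \le p \le b$, $p \in \mathbb{Z}$
$a \ll b$    $a$ is much smaller than $b$
$a \gg b$    $a$ is much larger than $b$
$a \approx b$    $a$ is approximately equal to $b$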

Fixed Symbols

$M$    number of subbands, DFT size
$N$    subsampling factor
$L$    block size
$P$    filter partitioning length
$K$    least common multiple
$f$    frequency-domain variable
$\omega = 2\pi f$    pulsation
$z$    z-domain variable
$n$    block time index
$f_s$    sampling frequency
$w[k]$    unknown FIR system, acoustic path
$\hat{w}[k]$, $\hat{w}^{(n)}[k]$    (equivalent) fullband adaptive filter, estimate of $w[k]$
$x$    far-end (loudspeaker) signal
$s$    local signal source-of-interest
$d = s + w \star x$    near-end (microphone) signal
$e$    error signal, output of the adaptive filter
$\cdot_i$    i-th subband error signal
$n_{rb}$    number of real subbands to be processed
$n_{cb}$    number of complex subbands to be processed
$\mu$    adaptive filter step size
$R_{xx} = E\{x x^T\}$    autocorrelation matrix of vector $x$
$L_{FB}$    length of the (equivalent) fullband adaptive filter
$L_{SB}$    length of the subband adaptive filters
$L_f$    length of the filter bank prototype
$L_f^a$    length of the analysis filters
$L_f^s$    length of the synthesis filters
$L_p$    length of the synthesis polyphase filters
$L_{ef}$    effective length of the analysis prototype filter
$L_{ac}$    number of anti-causal filtering taps
$L_c$    number of extra causal filtering taps
$0$    zero vector or zero matrix
$0_N$    $N \times N$ zero matrix
$0_{M \times N}$    $M \times N$ zero matrix
$I_N$    $N \times N$ identity matrix
$J$    exchange matrix with ones along the main anti-diagonal and zeros elsewhere
$F$    DFT matrix, $F(m,n) = e^{-j 2\pi mn / M}$, $0 \le m, n < M$
$H(z)$    analysis polyphase matrix
$G(z)$    synthesis polyphase matrix
$B(z)$    prototype polyphase matrix of a DFT modulated analysis filter bank
$C(z)$    prototype polyphase matrix of a DFT modulated synthesis filter bank
$h_0[k] \rightarrow H_0(z)$    analysis prototype filter
$g_0[k] \rightarrow G_0(z)$    synthesis prototype filter
$f_m[k] \rightarrow F_m(z)$    m-th subband adaptive filter
$j$    $\sqrt{-1}$

Acronyms and Abbreviations

A/D    Analog-to-Digital converter
AEC    Acoustic Echo Cancellation
ALU    Arithmetic Logic Unit
ANC    Adaptive Noise Cancellation
APA    Affine Projection Algorithm
ASIC    Application-Specific Integrated Circuit
BLMS    Block-LMS adaptive filter
CD    Compact Disk
cf.    confer: compare with
CPU    Central Processing Unit
D/A    Digital-to-Analog converter
DCT    Discrete Cosine Transform
DFT    Discrete Fourier Transform
DRAM    Dynamic Random Access Memory
DSP    Digital Signal Processor
e.g.    exempli gratia: for example
Eq.    equation
ERLE    Echo Return Loss Enhancement
FDAF    Frequency-Domain Adaptive Filter
FFT    Fast Fourier Transform
FIR    Finite Impulse Response filter
HiFi    High Fidelity
IDFT    Inverse Discrete Fourier Transform
i.e.    id est: that is
iff    if and only if
IFFT    Inverse Fast Fourier Transform
IIR    Infinite Impulse Response filter
LMS    Least Mean Square adaptive filter
MAC    Multiply-Accumulate operation
MFlops    Millions of Floating point Operations Per Second
MIMO    Multi-Input Multi-Output system
MIPS    Millions of Instructions Per Second
NLMS    Normalized Least Mean Square adaptive filter
op.    number of equivalent real Operations
ops.    number of equivalent real Operations per Second
P/S    Parallel-to-Serial converter
PBFDAF    Partitioned Block Frequency-Domain Adaptive Filter
PBFDRAP    Partitioned Block Frequency-Domain RAP adaptive filter
QMF    Quadrature Mirror Filters
RAP    Row Action Projection
RLS    Recursive Least Squares adaptive filter
S/P    Serial-to-Parallel converter
SNR    Signal-to-Noise Ratio
SPL    Sound Pressure Level
SRAM    Static Random Access Memory
SVD    Singular Value Decomposition
VME    VERSA Module Eurocard (IEEE 1014) computer architecture
vs.    versus
w.r.t.    with respect to
@    at

Preface
Abstract
Brief Contents
Glossary
Contents
Summary

1 Introduction
  1.1 Problem statement
  1.2 Hands-free communication
    1.2.1 Definition
    1.2.2 Examples of hands-free communication systems
    1.2.3 Signal deterioration
  1.3 Characteristics of speech and the acoustic environment
    1.3.1 Speech signals
  1.4 Enhancement techniques
    1.4.1 Acoustic echo cancellation
    1.4.2 Noise suppression and interference cancellation
    1.4.3 Dereverberation
  1.5 Outline of the thesis and contributions
    1.5.1 Motivation
    1.5.2 Chapter by chapter overview and contributions
  1.6 Conclusions

2 Basic Concepts
  2.1 Signal processing basics
    2.1.1 Representation of variables
    2.1.2 Multirate signal processing
    2.1.3 Some definitions related to matrix algebra
  2.2 Filter bank basics
    2.2.1 General subband scheme
    2.2.2 Modulated filter banks
    2.2.3 Polyphase implementation
    2.2.4 Perfect reconstruction
    2.2.5 Overview of filter bank design techniques
  2.3 Adaptive filtering techniques for speech enhancement
    2.3.1 Standard adaptive filtering techniques
    2.3.2 Block-based techniques
  2.4 Computational cost

I DFT Modulated Filter Bank Design for Oversampled Subband Systems

3 Perfect Reconstruction Oversampled DFT Modulated Filter Bank Design
  3.1 Oversampled DFT modulated subband systems
    3.1.1 DFT modulated analysis filter bank
    3.1.2 DFT modulated synthesis filter bank
    3.1.3 Implementation issues
  3.2 Perfect reconstruction
    3.2.1 Smith-McMillan decomposition based perfect reconstruction filter bank design
    3.2.2 Para-unitary filter banks
  3.3 Para-unitary filter bank design
    3.3.1 Imposing para-unitarity
    3.3.2 Para-unitary lattices
    3.3.3 Optimization of the para-unitary lattices
    3.3.4 Adjusting the prototype filter length
    3.3.5 Design examples

4 Nearly Perfect Reconstruction DFT Modulated Filter Bank Design
  4.1 Nearly perfect reconstruction DFT modulated filter banks
  4.2 Frequency-domain optimization
  4.3 Mixed time/frequency-domain optimization

II Subband and Frequency-Domain Adaptive Filtering

5 Subband Adaptive Filtering
  5.1 Subband adaptive systems
    5.1.1 General subband adaptive filtering setup
    5.1.2 Subband versus fullband adaptive filtering
    5.1.3 Filter bank selection
    5.1.4 Polyphase implementation
    5.1.5 DFT modulated subband adaptive filters
  5.2 Design criteria for subband adaptive systems
    5.2.1 Frequency selectivity
    5.2.2 Perfect reconstruction
    5.2.3 Perfect path modelling
  5.3 Downsampling and aliasing: two extreme cases
    5.3.1 Critically downsampled subband schemes
    5.3.2 Two-fold oversampled subband systems
  5.4 Subband adaptive filter length
    5.4.1 Infinite-length subband filters
    5.4.2 Introducing anti-causal filter taps
  5.5 Implementation cost and complexity gain with respect to LMS
    5.5.1 Rough cost estimate
    5.5.2 Detailed cost analysis
    5.5.3 Cost evaluation

6 Analysis of the Partitioned Block Frequency-Domain Adaptive Filter
    6.1.1 Derivation of the PBFDAF algorithm
    6.1.2 PBFDAF algorithm: equations and properties
    6.1.3 Normalization
    6.1.4 Constrained versus unconstrained updating
    6.1.5 Ambiguity compensation for $M > P + L - 1$
  6.2 The PBFDAF as a special case of subband adaptive filtering
  6.3 PBFDAF: design criteria
  6.4 Implementation cost
    6.4.1 Cost computation
    6.4.2 Cost evaluation and optimal parameter setting

7 Fullband Error Adaptation Scheme
  7.1 Fullband error adaptation
  7.2 Computational complexity
  7.3 PBFDAF weight updating revisited

III Iterated Partitioned Block Frequency-Domain Adaptive Filtering

8 Partitioned Block Frequency-Domain RAP
  8.1 Partitioned block frequency-domain RAP
    8.1.1 Definition
    8.1.2 Mechanism
  8.2 On iterating the PBFDRAP
    8.2.1 Computation of $\lim_{R \to \infty} w_p^{(n,R)}$
    8.2.2 Unconstrained PBFDRAP: $\lim_{R \to \infty} w_p^{(n,R)}$
    8.2.3 Constrained PBFDRAP: $\lim_{R \to \infty} w_p^{(n,R)}$
    8.2.4 Summary
  8.3 Simulation examples
  8.4 Conclusions

9 Fast Partitioned Block Frequency-Domain RAP
  9.1 Fast PBFDRAP
    9.1.1 Fast PBFDRAP, version 1
    9.1.4 Fast constrained PBFDRAP
    9.1.5 Summary
  9.2 Computational cost
    9.2.1 Unconstrained PBFDRAP
    9.2.2 Constrained PBFDRAP
    9.2.3 Unnormalized constrained PBFDRAP versus PRA

IV Acoustic Echo Cancellation, Implementation and Experiments

10 Acoustic Echo Cancellation, Implementation & Experiments
  10.1 Robust operation and control
    10.1.1 Short-time energy
    10.1.2 Far-end activity detection
    10.1.3 Double-talk detection
  10.3 A real-time implementation of an acoustic echo canceller on DSP
    10.3.1 DSP equipment
    10.3.2 Software
    10.3.3 Experiments

11 Conclusions and Further Research
  11.1 Conclusions
  11.2 Suggestions for further research

Bibliography

Appendices
  A Some definitions related to matrix algebra
  B Appendix to part I
    B.1 Proof of theorem 3.1
    B.2 Properties of $B(z)$
    B.3 Proof of theorem 3.2
    B.4 Proof of theorem 3.3
    B.5 Proof of theorem 3.4
    B.6 Inverse decomposition of para-unitary lattices
    B.7 Para-unitary parameterization for $M = 2N$
    B.8 Para-unitary DFT modulated filter banks revisited
  C Appendix to part II
    C.1 Proof of theorem 5.2
    C.6 "Time-reversed" PBFDAF
    C.9 Complexity analysis for the PBFDAF
  D Appendix to part III
    D.1 Proof of theorem 8.1
    D.2 Proof of theorem 8.2
    D.3 Proof of theorem 8.3
    D.4 Proof of theorem 8.4
    D.5 Proof of theorem 8.5
    D.6 Constrained PBFDRAP: $L_{FB} < L$
    D.7 Proof of theorem 8.7

Introduction

In the first section of this introductory chapter a motivation is given for the techniques that will be developed in the forthcoming chapters of the thesis, and we will present some future perspectives on hands-free communication, which is the application we have in mind.

In section 1.2 a few examples of hands-free communication systems are given and the different types of signal degradation that occur are identified.

It appears that the characteristics of speech and the properties of the acoustic environment impose specific constraints on the type of signal enhancement algorithm that can be used and on the way the algorithms are applied. Hence, in section 1.3 some basics of speech and acoustics are discussed.

For each type of signal degradation that can be identified in the hands-free communication setup, a number of enhancement techniques are known from the literature. In section 1.4 several signal enhancement algorithms are briefly addressed.

An outline and an overview of the different chapters and parts of the thesis will be presented in section 1.5. The main contributions are summarized and references will be given to the publications that resulted from this work. Some conclusions to this chapter are formulated in section 1.6.

1.1 Problem statement

The telecommunications market has rapidly expanded in recent years. This has

Figure 1.1: Number of worldwide cellular subscribers, in millions, per year from 1993 to 2005 [39][179]

annual revenue of the global telecommunications market in 1996 was estimated at US$ 645 billion and is expected to surpass US$ 1 trillion in 2002 [85]. This growth is partly due to the expansion of the mobile phone industry. As indicated in figure 1.1 the estimated number of worldwide cellular subscribers now exceeds one billion and it is expected that this number will continue to increase substantially in the near future.

The telecommunications industry is characterized by an ongoing tendency towards innovation and optimization. This implies, among other things, a focus on user-friendliness and interactivity and hence explains the increasing demand for hands-free communication systems today. As it is believed that more and more telecom applications will become hands-free in the near future, a large potential is expected for innovative and product-oriented research in the field of hands-free communication in the coming years. This is confirmed by the observation that the global hands-free market can grow from US$ 3 billion today to over US$ 9 billion in the next five years [151].

In present-day hands-free communication systems the signal quality is often unsatisfactory. Several types of signal deterioration can be distinguished, as will be

Figure 1.2: Hands-free communication setup

In this thesis subband and frequency-domain adaptive filtering techniques are studied. These signal processing algorithms can be used in a wide variety of applications where signal enhancement is required. In parts I, II and III of the thesis several signal processing algorithms will be considered. In part IV it will be shown that these signal processing techniques can be applied to enhance the signal quality in hands-free communication systems. We will concentrate on one form of degradation in particular, which is caused by so-called acoustic echoes, and illustrate how the algorithms discussed in parts I, II and III of the thesis can be employed.

1.2 Hands-free communication

1.2.1 Definition

Consider figure 1.2, which shows a typical hands-free communication setup. The conference room accommodates one or more correspondents, who interact with other people at a remote site via a wireless or wired communication channel. The room shown in figure 1.2 is called the near-end conference room as it accommodates the local or near-end speaker(s). At the remote site there is a similar room, called the far-end conference room, with the far-end speaker(s).

systems they are granted the freedom to walk around and to interact with each other in a natural way.

To establish hands-free communication, a number of microphones are installed in each conference room to record the local conversation. The recorded signals are then sent to the remote site where they are fed into a set of loudspeakers.

1.2.2 Examples of hands-free communication systems

Hands-free telephony

Different sorts of applications fit in the hands-free communication framework. Most important from an economic point of view is certainly hands-free telephony. Recently, in many countries all over the world, mobile telephony has been forbidden while driving. Mobile phone calls in cars are allowed only if hands-free kits are used. This is motivated by the observation that hand-held mobile phone calls distract the driver and increase the number of accidents. During a mobile phone call the driver misses 4 out of 10 road signs and fails to give way to other vehicles in 25% of the cases. It appears that the accident risk increases by 75%, which reduces to 24% if a hands-free kit is used [171].

It was found that people in North America spend a combined 500 million passenger-hours in their vehicles each week. Although 65 percent of all cell-phone conversations take place in a car or other form of transport, less than 15 percent of the cell-phone users in the US have hands-free accessories [25]. So, a huge market for hands-free kits is expected in the near future.

A side remark, however, is that cell-phone usage is responsible for only 1.5 percent of all accidents in the US. On the other hand, outside distraction was responsible for almost 30 percent of all crashes. Adjusting the radio or changing a tape or CD was the second-biggest cause of accidents, amounting to 11 percent. Furthermore, it appears that the conversations themselves lead to dangerous driving behavior, not the type of phone that is used [25]. It should be added, however, that in contrast to the US manual gear changes are still very popular in Europe. It is clear that it is almost impossible to change gear, use a mobile phone, and steer and drive safely at the same time.

The most common low-cost hands-free kits for mobile telephony in cars, such as the KX-TCA87 of Panasonic (US$ 25), are headsets with a (directional) microphone and headphone. The quality is satisfactory, but according to our definition of hands-free systems in section 1.2.1 these systems are not true hands-free solutions. A second class of products, such as the hands-free car adapter NTN1583 of Motorola (US$ 100), use a hands-free microphone and a built-in speaker, which are connected to the dashboard. These are hands-free systems, but the quality is

and guarantee a better sound quality. These systems however need to be built in and are integrated in the dashboard. The most advanced products rely on echo cancellation and noise suppression techniques. The Sonata III echo cancellation and voice enhancement system of NMS Communications was developed for service providers of E1 long distance and digital wireless technology. It is expected that in the near future smaller and more advanced solutions for hands-free telephony will be developed, which can be integrated in the hand-held mobile phones themselves and provide high quality wideband speech enhancement.

Teleconferencing

Apart from hands-free telephony, teleconferencing also fits in the hands-free communication framework presented in section 1.2.1. Teleconferencing systems are commonly used in business meetings today. Teleclassing, which enables students to attend classes and lectures from a remote classroom, is a special case of this. As the participants in a teleconferencing meeting can stay in their local office, unnecessary traveling is avoided. Hence, a large cost reduction is obtained and the loss of precious time is kept to a minimum. A market research report from Wainhouse Research states that the market for audio, video and web conferencing services will reach US$ 9.8 billion by 2006, up from US$ 2.8 billion in 2000 [135].

Powerful teleconferencing systems are already commercially available. Polycom, Inc., which acquired PictureTel Corporation in 2001, brings a range of full duplex audio conferencing equipment to the market. These solutions have a limited bandwidth and are suited for small business meetings. Larger systems are also available, such as the iPower™ 900 series of Polycom, Inc. They provide integrated audio and video conferencing and offer better audio quality. Future systems will have to cope with higher bandwidths and multi-channel signal enhancement, for which efficient signal processing algorithms are needed.

Domotic and voice-controlled systems

Nowadays there is an increasing interest in so-called domotic systems. More and more voice controlled systems are encountered in daily life at home and at work. These hands-free systems can be used for the automatic conditioning of a living room or the office at work (switching the light or the central heating on and off, opening the curtains, ...). Other examples are voice controlled electronic devices or HiFi systems, the on-board computer in your car, voice controlled PC software, ... .

Telematics seems to be the next big challenge in the automotive industry, providing cellular voice and internet services in vehicles. In North America alone the market for telematics equipment is expected to grow to US$ 7 billion in 2007 [180].

Figure 1.3: Full-duplex hands-free communication setup (near-end speaker, far-end speaker, acoustic far-end echo)

In 2001 Ford and Vodafone announced a strategic partnership to provide in-car telematic services. Within five years nearly all new Ford vehicles will be fitted with some telematics system. These systems will include voice recognition and text-to-speech technology to recognize spoken phone numbers as well as the names of previously entered contacts. Advanced signal processing techniques will be needed for adequate signal conditioning and preprocessing.

1.2.3 Signal deterioration

Consider again figure 1.2. Ideally, the desired near-end speech signal, which stems from a local correspondent, is sent to the remote site without any quality losses. It is clear that in a hands-free system the signal quality is degraded in many ways. Due to the large speaker-to-microphone distance undesired background signals are recorded and are transmitted to the correspondent as well.

A first type of disturbance is the so-called acoustic echo, which arises whenever a far-end loudspeaker signal is picked up by the near-end microphone(s) and is sent to the remote site. At the far-end site the same coupling might exist between loudspeaker and microphone and hence the signal can circle around in the system. The local speaker hears an echo or a delayed version of his/her own speech (figure 1.3). Such delayed signals hinder smooth conversation and lower the speech intelligibility. Delays could be quite long (several hundreds of milliseconds), especially when satellite links are involved. In the worst case the closed-loop gain might become too large and the echo gets unstable, resulting in a harmful sinusoidal tone. A number

A second source of signal deterioration is "background noise". This type of disturbance can e.g. be generated by a ventilator or a computer fan. It can also come from people in the conference room not participating in the discussion but having a discussion among themselves in the background (cf. cocktail party). In car applications noise is generated by the engine or by the car radio. It may also come from the wind passing around the car cabin or from the contact between road and tires [94] [160]. Signal processing techniques that are applied to reduce the background noise level are referred to as noise suppression or source separation algorithms. If a reference of the disturbing signal can be obtained, e.g. in the case of radio or engine noise, more specific enhancement techniques can be used. This is called interference cancellation and is very similar to acoustic echo cancellation.

Finally, note that all signals propagate through the recording room. As a consequence reverberation is added to the signals, which leads to another type of signal distortion. Although signals (especially music) may sound more pleasant when reverberation is added, in general the intelligibility is lowered. In order to cope with this kind of deformation dereverberation or deconvolution techniques are called for.

1.3 Characteristics of speech and the acoustic environment

The characteristics of speech and the properties of the acoustic environment have an influence on the type of algorithm that is used and on the way the algorithms are applied. In this section some characteristics and peculiarities of speech and acoustics are discussed. Only those properties are mentioned that are important for the algorithms and techniques considered in this thesis. More detailed information on speech and signal processing for speech signals can be found in [29] [124]. A good reference on acoustics is [93].

1.3.1 Speech signals

Very often in hands-free applications the signal to be enhanced is speech. Speech is a signal with highly time-varying characteristics. Sometimes speech is quasi-periodic (e.g. vowels), at other instances it acts like colored noise (fricatives) or it is impulse-like (plosives). For example, in the word "peace" there is a clear difference between the plosive /p/, the vowel /i:/ and the fricative /s/.

Speech is a wideband signal with frequency components between 100 and 8000 Hz, hence covering more than 6 octaves. For speech understanding frequencies between 300 and 3400 Hz, i.e. 3.5 octaves, are of most interest. Hence, a sampling rate of

so-called wideband speech systems for which higher sampling rates, e.g. 16 kHz, are used.
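The octave counts quoted for the speech band follow from the base-2 logarithm of the ratio of the band edges. As a short worked check:

$$\log_2\frac{8000}{100} = \log_2 80 \approx 6.32, \qquad \log_2\frac{3400}{300} \approx 3.50.$$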

It is observed that both the time envelope and the spectral content of speech are continuously changing: the energy of the speech signal is both time- and frequency-dependent. The mean frequency envelope of voiced speech is about -6 dB/octave. Signal enhancement algorithms have to cope with the changing frequency dependence and hence often rely on frequency-domain and subband techniques.

The time-domain evolution of the speech signal is characterized by its high dynamic range: speech pauses alternate with high energetic vowels or plosives, which significantly increase the short-time energy. This can e.g. be verified in figure 10.12 (chapter 10), where a speech signal is shown at the top. It is found that the amplitude of speech varies between 30 and 90 dB SPL [124]. In order to cope with these amplitude variations 12 to 16 bits linear quantization is commonly used for speech. Furthermore, due to the high dynamic range of the speech signal, signal enhancement algorithms have to be normalized by the actual signal energy. In this way the algorithm can be prevented from diverging and at the same time slow convergence can be avoided.
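The role of normalization by the signal energy can be illustrated with a normalized LMS (NLMS) update, one of the standard adaptive filters listed in the glossary. The sketch below is not taken from the thesis: the function names `nlms_step`, `identify` and `white_noise`, the step size `mu` and the regularization constant `delta` are assumptions chosen for illustration. Because the update is divided by the short-time input energy, scaling the input by a constant leaves the adaptation behavior essentially unchanged.

```python
import random
from typing import List, Tuple


def nlms_step(w_hat: List[float], x_buf: List[float], d: float,
              mu: float = 0.5, delta: float = 1e-8) -> Tuple[List[float], float]:
    """One NLMS iteration.

    w_hat : current filter estimate (length L)
    x_buf : the L most recent input samples, newest first
    d     : desired (microphone) sample
    mu    : step size (assumed value, 0 < mu < 2)
    delta : small regularization constant (assumption) avoiding division by zero
    """
    y = sum(w * x for w, x in zip(w_hat, x_buf))    # filter output
    e = d - y                                       # a priori error
    energy = sum(x * x for x in x_buf)              # short-time input energy
    gain = mu * e / (energy + delta)                # energy-normalized step
    w_new = [w + gain * x for w, x in zip(w_hat, x_buf)]
    return w_new, e


def white_noise(n: int, seed: int = 0, scale: float = 1.0) -> List[float]:
    """Reproducible Gaussian test signal with the given amplitude scale."""
    rng = random.Random(seed)
    return [scale * rng.gauss(0.0, 1.0) for _ in range(n)]


def identify(w_true: List[float], x: List[float], mu: float = 0.5) -> List[float]:
    """Identify an FIR system w_true from input x and its noise-free output."""
    length = len(w_true)
    w_hat = [0.0] * length
    x_buf = [0.0] * length
    for sample in x:
        x_buf = [sample] + x_buf[:-1]               # shift in the newest sample
        d = sum(w * xi for w, xi in zip(w_true, x_buf))
        w_hat, _ = nlms_step(w_hat, x_buf, d, mu)
    return w_hat
```

Running `identify` on the same noise sequence at very different input levels (for example `scale=0.01` and `scale=100.0`) gives practically the same estimate. An unnormalized LMS update with a fixed step size would instead converge slowly at the low level and risk divergence at the high level.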

1.3.2 The acoustic environment

It is observed from figure 1.2 that acoustic waves travel from source to listener and thereby propagate through the recording room. This propagation results in a signal attenuation and spectral distortion. It appears that the attenuation and the distortion can be modelled quite well by a linear filter. Nonlinear effects are typically of second order and mainly stem from the nonlinear characteristics of the loudspeakers. The linear filter that characterizes the acoustics and relates the emitted signal to the received signal is called the acoustic impulse response and plays an important role in many signal enhancement techniques.

Acoustic impulse responses can be measured quite easily, an example of which is given in figure 1.4. Observe that the acoustic impulse response is characterized by a dead time. The dead time is the time needed for the acoustic wave to propagate from source to listener via the shortest, direct acoustic path. After the direct path impulse a set of early reflections is encountered, whose amplitude and delay are strongly determined by the shape of the recording room and the position of source and listener. Next comes a set of late reflections, also called reverberation, which decay exponentially in time. These impulses stem from multi-path propagation as acoustic waves reflect off walls and objects in the recording room. Acoustic impulse responses are typically highly time-varying, as shown by the following experiment.

Experiment 1.1 Consider the acoustic impulse response $w_1$ shown in figure 1.4. A signal $x$ was played through a loudspeaker. The response $y = w_1 \star x$ was recorded with a microphone. The distance between loudspeaker and microphone was approximately 180 cm. Based on the loudspeaker and microphone signal, $w_1$ could be determined. Then the experiment was repeated. The configuration was slightly changed, moving the microphone 1 cm to the left and leaving the position of the loudspeaker and the rest of the environment unchanged. Again the acoustic impulse response was computed, resulting in $w_2$. Despite the small change in microphone position the impulse response changed substantially: it was found that

$$\frac{\|w_1 - w_2\|_2}{\|w_1\|_2} = 72\%.$$

To simulate the effect of moving correspondents in the recording room a dummy was placed between loudspeaker and microphone and the impulse response ($w_3$) was computed. Then the dummy was moved approximately 1 cm. All other objects were left unchanged. Again the acoustic impulse response $w_4$ was determined. In this case

$$\frac{\|w_3 - w_4\|_2}{\|w_3\|_2} = 34\%.$$

Figure 1.4: Acoustic impulse response of the ESAT speech laboratory (amplitude versus time in seconds)

are called for. Thanks to the continuous updating these algorithms are more or less robust against possible system variations.
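The relative change between two measured impulse responses, as reported in Experiment 1.1, can be computed as sketched below. This is a minimal illustration rather than the author's measurement code: the function name `relative_change` and the toy responses are hypothetical, and a ratio of 2-norms is assumed, since the extracted formula does not make clear whether squared norms were intended.

```python
import math
from typing import Sequence


def relative_change(w_ref: Sequence[float], w_new: Sequence[float]) -> float:
    """Return ||w_ref - w_new||_2 / ||w_ref||_2 for equal-length responses."""
    if len(w_ref) != len(w_new):
        raise ValueError("impulse responses must have the same length")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_ref, w_new)))
    ref = math.sqrt(sum(a * a for a in w_ref))
    if ref == 0.0:
        raise ValueError("reference response must be non-zero")
    return diff / ref


# Toy example (made-up numbers): a one-sample shift of a short response.
w_a = [0.0, 1.0, 0.5, 0.25, 0.0]
w_b = [0.0, 0.0, 1.0, 0.5, 0.25]
print(f"relative change: {100 * relative_change(w_a, w_b):.0f}%")
```

Even a one-sample shift of an otherwise identical toy response yields a relative change on the order of 100%, which is in line with the observation that small displacements change the impulse response substantially.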

To characterize the amount of reverberation in a recording room the reverberation time ($RT_{60}$) is defined as the time that the sound pressure level or the intensity needs to decay to -60 dB relative to its original value. It is therefore a measure of the decay and of the duration of the acoustic impulse response. It appears to be independent of the actual position of source and listener. The reverberation time was computed for the impulse response shown in figure 1.4 following the method described in [60]. It appeared that $RT_{60} \approx 240$ ms.

Typical reverberation times are in the order of hundreds or even thousands of milliseconds. For a typical office room $RT_{60}$ is between 100 and 400 ms, for a church $RT_{60}$ can be several seconds long. If therefore in a digital signal enhancement application the acoustic impulse responses are characterized by FIR filters, many hundreds or several thousands of filter taps are needed, depending on the sampling rate. Hence, computationally efficient algorithms are required.
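As a rough worked example of this dependence, the number of FIR taps needed to span a reverberation time can be estimated as the product of RT60 and the sampling rate. The helper below is only a back-of-the-envelope sketch: the name `fir_taps`, the choice to cover the full RT60 interval, and the 8 kHz rate used next to the 16 kHz wideband rate are assumptions, and practical designs may model a shorter part of the response.

```python
import math


def fir_taps(rt60_seconds: float, fs_hz: float) -> int:
    """Number of FIR taps needed to span rt60_seconds at sampling rate fs_hz."""
    if rt60_seconds < 0 or fs_hz <= 0:
        raise ValueError("rt60 must be non-negative and fs positive")
    # Small tolerance guards against floating-point round-off before ceil().
    return math.ceil(rt60_seconds * fs_hz - 1e-9)


# RT60 values mentioned in the text, at two speech sampling rates.
for rt60 in (0.1, 0.24, 0.4):
    for fs in (8000, 16000):
        taps = fir_taps(rt60, fs)
        print(f"RT60 = {rt60 * 1000:.0f} ms, fs = {fs} Hz -> {taps} taps")
```

Under these assumptions the measured reverberation time of about 240 ms corresponds to 1920 taps at 8 kHz and 3840 taps at 16 kHz, which matches the "many hundreds or several thousands" of taps mentioned above.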

In order to reduce the filter order, i.e. the number of delay elements, IIR models could be used. It appears that although the order can be reduced in this way it still remains quite large, i.e. in the order of several hundreds [75] [108]. IIR-based enhancement techniques have to be relied on in that case, typically leading to either an increased computational load, or stability problems and convergence to local minima [108] [141].

In order to optimally control the experiments carried out in the frame of this thesis, simulated room impulse responses were often used. These simulated acoustic impulse responses were designed following the method described in [4] [129] [154].

1.4 Enhancement techniques

Each of the three forms of signal degradation that arise in hands-free communication is now discussed in more detail, emphasizing existing algorithmic solutions that are known from the literature.

1.4.1 Acoustic echo cancellation

Experiments have shown that suppressing the acoustic echoes by 45 dB leads to satisfactory perceptual results, as long as the overall delay introduced by the echo canceller does not exceed a certain upper bound. The input-output delay is

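Figure 1.5: Adaptive acoustic echo cancellation. [Block diagram: the far-end signal $x$ feeds the loudspeaker and the adaptive filter $\hat{w}$; the local acoustic path $w$ produces the far-end echo, which adds to the near-end signal $s$ to give $d$; the adaptive filter output $y$ is subtracted from $d$ to give the output $e$.]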

with respect to echo cancellation are contained in the ITU-T recommendations (G.167) on acoustic echo controllers [86]. For instance, the end-to-end delay is recommended not to exceed 16 ms for wideband teleconferencing. The far-end signal suppression (when no near-end signal is present) should reach 40 dB for teleconferencing systems and 45 dB in hands-free telephony. In the presence of near-end signals (double talk) the suppression should be at least 25 dB. Convergence to a 3 dB attenuation level should last less than 20 ms in the case of single talk.

To suppress the echo, several conventional acoustic echo cancellation techniques can be applied [77]. For instance, highly directional loudspeakers and microphones and sound absorbing materials can be used to avoid reflections. Another popular technique is voice controlled switching or loss control, which mutes channels in which no or very low-energy activity is measured. It is clear that these techniques rely on accurate voice activity detection and hence degrade quickly when that detection is unreliable. Further, the stability margin of the closed-loop system can be improved using so-called howling control. Thereto almost inaudible nonlinear operations are inserted in the signal path to avoid instability of the closed-loop system, as this would result in a harmful sinusoidal tone circling around in the network. Frequency shifting, comb filters and resonant peak removal are often used. Finally, nonlinear post-processing devices can be added to remove residual error signals and to make the signal more pleasant to listen to.

In practice, acoustic echo cancellers are nowadays based on adaptive filtering techniques [76] [77] [106] [176]. Adaptive filters will be discussed in section 2.3. A general adaptive acoustic echo cancellation setup is shown in figure 1.5. If the adaptive filter $\hat{w}$ is a good estimate of the acoustic impulse response $w$, it is observed that

$$
\begin{aligned}
e[k] &= d[k] - y[k] && (1.1) \\
     &= (s[k] + w \star x) - \hat{w} \star x && (1.2) \\
     &\approx s[k], && (1.3)
\end{aligned}
$$

Figure 1.6: Stereo acoustic echo cancellation setup

hence the echo can be removed. The adaptive filter $\hat{w}$ is a self-designing system that uses a gradient algorithm that minimizes the error signal energy. In this way a good replica of the unknown system $w$ can be obtained. Apart from the ability to obtain a good echo path replica, time variations of the acoustic impulse response can be tracked as well, thanks to the adaptivity. However, accurate tracking of the acoustic impulse response $w$ is still a challenge, even if fast and hence expensive adaptive filtering structures are applied [62] [162], as acoustic impulse responses are known to be highly time-varying (cf. experiment 1.1).
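The setup of Eqs. 1.1 to 1.3 can be sketched with a normalized LMS (NLMS) update, which is one common gradient-type algorithm. NLMS is chosen here purely for illustration and is not claimed to be the algorithm used in this thesis; the function name, the step size `mu`, the regularization `eps` and the synthetic echo path are assumptions.

```python
import numpy as np

def nlms_echo_canceller(x, d, num_taps, mu=0.5, eps=1e-8):
    """Return the error signal e[k] = d[k] - y[k] and the final filter estimate w_hat."""
    w_hat = np.zeros(num_taps)
    x_buf = np.zeros(num_taps)            # most recent far-end samples, newest first
    e = np.zeros(len(x))
    for k in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[k]
        y = w_hat @ x_buf                 # echo estimate y[k]
        e[k] = d[k] - y                   # Eq. 1.1
        # Normalized gradient step that reduces the error energy.
        w_hat += mu * e[k] * x_buf / (x_buf @ x_buf + eps)
    return e, w_hat

# Synthetic single-talk experiment (s[k] = 0): white far-end signal, decaying echo path.
rng = np.random.default_rng(1)
num_taps = 64
w = rng.standard_normal(num_taps) * np.exp(-0.1 * np.arange(num_taps))
x = rng.standard_normal(8000)
d = np.convolve(x, w)[:len(x)]
e, w_hat = nlms_echo_canceller(x, d, num_taps)
# Echo suppression over the last 1000 samples, in dB.
erle_db = 10 * np.log10(np.sum(d[-1000:] ** 2) / (np.sum(e[-1000:] ** 2) + 1e-30))
print(f"echo suppression after convergence: {erle_db:.0f} dB")
```

With a white far-end signal and no near-end signal the filter converges quickly; coloured signals such as speech, double talk and a time-varying path make the problem considerably harder, as the text points out.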

In more advanced systems two or more loudspeaker channels have to be cancelled, as shown in figure 1.6. It can be proven that stereo (or, in general, multi-channel) acoustic echo cancellation inherently suffers from a non-uniqueness problem [113]. In practice however, a unique solution to the stereo echo cancellation problem does exist, but the underlying optimization that drives the adaptive filters appears to be severely ill-conditioned. Several techniques were developed that cope with this issue. They try to decorrelate the stereo channels by insertion of nonlinearities in the signal paths or by applying psycho-acoustic noise masking techniques [58] [68] [87] [121].

Although commercial adaptive echo controllers are available on the market nowadays, providing a merely satisfactory solution to the single-channel acoustic echo cancellation problem, further improvement and research will be necessary in the coming years. It is for instance clear that in the near future there will be a need for N-channel acoustic echo controllers (e.g. for stereo, surround systems, Dolby Digital 5.1). Remark that the number of adaptive filters in an N-channel echo cancellation system equals $N^2$


mostly operate at rather low sampling rates (8 kHz), higher quality will be required in the near future (16 kHz, or even higher). As the complexity of an echo cancellation system using a linear adaptive filtering algorithm changes quadratically with the sampling rate, again efficient adaptive structures will be needed. Finally, there will be a request for a better overall performance and more robustness in highly non-stationary and complex acoustic environments. This requires reliable control software, which is added to the adaptive filtering scheme.
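The quadratic dependence on the sampling rate can be made explicit with a short worked derivation. It is a sketch under two assumptions that are not spelled out in the text: the modelled impulse response has a fixed duration $T$ (in seconds), and the adaptive algorithm needs a number of operations per input sample that is proportional to the filter length $L$, as is the case for LMS-type time-domain filters.

```latex
L = T f_s, \qquad
C_{\text{per sample}} \propto L, \qquad
C_{\text{per second}} \propto L \, f_s = T f_s^{2}
\quad\Longrightarrow\quad
\frac{C(16\,\text{kHz})}{C(8\,\text{kHz})} = \left(\frac{16}{8}\right)^{2} = 4 .
```

Under these assumptions, doubling the sampling rate from 8 kHz to 16 kHz multiplies the cost per second by four.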

1.4.2 Noise suppression and interference cancellation

Single-channel noise reduction methods have been known for a long time now. They exploit the characteristics of speech and the noise and enhance the SNR by appropriate (matched or Wiener) filtering operations [149]. More advanced techniques, commonly used today, rely on spectral subtraction [11] [182].

Noise suppression is a difficult problem. It is observed that the signal of interest and the background noise typically overlap both in the time and in the frequency domain. This is certainly true when both signals are speech. The signal of interest and the "noise" are therefore difficult to separate if classical spectro-temporal enhancement techniques are employed.

It is observed however that the correspondent and the background noise source are typically at a different position in the conference room. Hence, multi-microphone techniques can be called for, which exploit the spatial information present in the different microphone signals. This in general leads to spatio-temporal filtering operations and increases the performance.

A first class of enhancement techniques that rely on this spatial diversity is beamforming. The beamforming idea comes from telecommunications, where it was introduced to design antenna arrays. Later it was successfully applied to acoustic applications as well. As the acoustic environment is inherently time-varying, adaptive beamforming techniques are often called for. Broadband beamforming for speech enhancement is still a topic of ongoing research [17] [18] [34] [66] [74] [91] [92] [97] [122] [123] [125] [136] [150] [158] [161] [164] [165] [172] [174] [175].

More recently optimal filtering techniques have been proposed for the suppression of additive broadband noise. These techniques rely on powerful matrix decompositions such as the SVD and the Quotient SVD [33] [148]. They show a superior performance compared to classical beamforming approaches but are computationally more demanding.

If a reference of the noise signal can be obtained, more specific signal enhancement techniques can be applied. For instance, in the case of engine noise in a car the spark signal can be measured and used to suppress the noise in the car cabin. Adaptive


Echo cancellation and noise suppression have been addressed independently for many years now. Recently, it has been recognized that both problems are better tackled in a combined approach, especially if multi-microphone settings are being used. Initial results indicate that the combined approach yields a better performance at a lower computational cost [1] [31] [63] [102] [103] [104] [105].

Multi-microphone noise reduction schemes are being commercialized nowadays. The systems that are available on the market however are typically rather basic solutions with a limited number of microphones, often relying on simple, not fully adaptive signal processing tools. There is certainly a need for more powerful and robust systems with a higher performance at an acceptable cost in the forthcoming years.

1.4.3 Dereverberation

Of the three types of signal deterioration that occur in hands-free communication, reverberation is the least prominent. However, in rooms with a high reflectivity reverberation effects have a clearly negative impact on the intelligibility. Dereverberation techniques have been developed over the last years but the solutions available today are not yet satisfactory.

Single-channel dereverberation techniques were reported first. Inverse filtering can be called for, by trying to invert the acoustic impulse response. However, as the impulse responses are known to be non-minimum phase systems they have an unstable causal inverse [112] [120]. Cepstrum-based techniques are more promising [6] [126] [131] and rely on the separability of speech and the acoustics in the cepstral domain.

Through multi-channel processing the spatial diversity of the hands-free setup can be exploited, in general leading to a better performance. Acoustic beamforming techniques are being used, as apart from noise suppression they are known to partially dereverberate the signals as well. A second class of multi-channel dereverberation techniques is based on cepstral processing. It was shown that the single-channel cepstral based dereverberation algorithms can be extended to the multi-channel case [96].

Matched filtering algorithms were reported in [2] [167]. They rely on subspace tracking techniques. These algorithms show an improved dereverberation capability with respect to classical approaches, but as some environmental parameters are assumed to be known in advance these approaches may be less suitable in practical applications.

During the last years MIMO blind system identification techniques have been developed for equalization in digital communications [80] [118] [163] [166]. These


1.5 Outline of the thesis and contributions

In this section an outline and an overview of the thesis can be found. The main contributions are summarized and references will be given to the publications that were brought about in the frame of this work.

1.5.1 Motivation

In this thesis subband and frequency-domain adaptive filtering techniques are studied, putting forward acoustic echo cancellation as a possible and straightforward application.

Acoustic echo cancellation, as well as other signal enhancement problems in hands-free communication, deals with the retrieval of degraded speech embedded in "noise". To enhance the speech signal the acoustics of the recording room need to be estimated. In section 1.3 we discussed some properties of speech and the acoustic environment that impose specific constraints on the signal enhancement algorithm that is used. It was for instance observed that acoustic impulse responses are time-varying high-order systems. It was further indicated in section 1.4.1 that there will be a need for (multi-channel) acoustic echo controllers in the near future that offer a high performance at increasing sampling rates. Hence, computationally efficient and adaptive algorithmic solutions should be called for. Finally, as the time envelope and the spectral content of speech are continuously changing, time- and frequency-dependent signal processing is required.

It will appear in the forthcoming chapters of the thesis that subband and frequency-domain adaptive filtering techniques meet all the requirements specified above, combining adaptivity and frequency-dependent processing, and offering a high performance at a low cost. Hence, subband and frequency-domain adaptive filters will be put forward as being appropriate approaches to solve the acoustic echo cancellation problem.

It is not only our objective to present existing and novel subband and frequency-domain adaptive solutions for acoustic echo cancellation; we will also dwell on the structures and principles that lie behind these techniques, in an attempt to get more insight in the underlying fundamentals. Whereas acoustic echo cancellation was presented as the starting point and a motive for this research, the main part of the text deals with signal processing as such. The presented techniques can be employed in many applications, going far beyond acoustic echo cancellation alone.

1.5.2 Chapter by chapter overview and contributions


The introductory and concluding chapter are omitted however in the figure.

In chapter 2 some basic concepts are discussed and the necessary signal processing tools will be presented to understand the main part of the text.

Part I : DFT modulated filter bank design for oversampled subband systems

It was motivated in this introductory chapter that frequency-dependent adaptive signal processing is required for adequate acoustic echo suppression. Frequency dependency can be achieved through the use of digital filter banks and the integration of these structures in existing adaptive filtering schemes, leading to so-called subband adaptive filters. In general however, digital filter banks introduce considerable signal and aliasing distortion. In part I of the thesis design methods for perfect and nearly perfect reconstruction DFT modulated filter banks are discussed. These filter banks introduce no or almost no signal distortion and are easily integrated in subband adaptive filtering structures.

In chapter 3 design methods for perfect reconstruction oversampled DFT modulated filter banks are presented. A para-unitary filter bank design method is discussed, which was presented in [22]. With this method however the order of the filter banks cannot be adjusted accurately. We present an extension to this method, which basically allows any desired filter length to be chosen. Further, we show that based on the inverse parametrization of the filter bank parameters appropriate starting values can be obtained, which reduces the optimization time.

The stopband attenuation of perfect reconstruction filter banks is typically unsatisfactory if intermediate operations, such as adaptive filtering, are performed on the subband signals. In chapter 4, the perfect reconstruction condition is relaxed to nearly perfect reconstruction. Both a frequency-domain and a mixed time/frequency-domain based design method are presented for nearly perfect reconstruction DFT modulated filter banks. Subband adaptive filtering is taken as an example to illustrate that, thanks to their lower stopband level, nearly perfect reconstruction filter banks outperform perfect reconstruction systems.

Publications related to the first part of the thesis are [43] [45] [52].

Part II : Subband and frequency-domain adaptive filtering

In section 1.5.1 subband and frequency-domain adaptive filters were put forward

[Figure: overview of the thesis structure.
Chapter 2: Basic Concepts.
Part I: Chapter 3, Filter Bank Design: Perfect Reconstruction; Chapter 4, Filter Bank Design: Nearly Perfect Reconstruction.
Part II: Chapter 5, Subband Adaptive Filtering; Chapter 6, Partitioned Block Frequency-Domain Adaptive Filtering; Chapter 7, Fullband Error Adaptation.
Part III: Chapter 8, Partitioned Block Frequency-Domain RAP; Chapter 9, Fast Partitioned Block Frequency-Domain RAP.
Part IV: Chapter 10, Experiments: Acoustic Echo Cancellation.]

in more detail and discuss some of their properties. Although both approaches were developed independently in the literature they are strongly connected to each other. We will focus on the interrelation between both techniques and combine their mechanisms to obtain improved algorithmic structures.

The subband adaptive filter is discussed in chapter 5. A comparison is made between the subband approach and standard fullband adaptive filters in terms of complexity and performance. It will be shown that subband adaptive filtering structures suffer from a considerable residual undermodelling error unless extra (anti-)causal subband filter taps are inserted. Although the complexity gain w.r.t. the fullband approach is less than expected, still a considerable cost reduction can be obtained. Next, we formulate three design criteria for subband adaptive systems, which deal with frequency selectivity, perfect reconstruction and perfect path modelling. These conditions are necessary requirements to ensure satisfactory performance of the subband adaptive filter.

In chapter 6 the partitioned block frequency-domain adaptive filter (PBFDAF) is studied. It appears that this algorithm, which has been known from the literature for some years now, outperforms standard subband systems in terms of convergence behavior and modelling capabilities. It will be proven that the PBFDAF can be considered as a special subband adaptive filtering structure, which fulfills two out of the three design criteria for subband adaptive systems that are specified in chapter 5. It is further shown that the frequency-domain adaptive filter relies on a special error correction mechanism. Thanks to the error correction the filter coefficients can be updated with aliasing-free error signals, which leads to improved performance.

In an attempt to generalize the error correction mechanism of the frequency-domain approach to subband adaptive systems we propose a novel fullband error adaptation scheme for subband adaptive filters in chapter 7. The alternative adaptation scheme adjusts the subband filters based on the fullband error instead of using the subband errors, as is done in a classical subband adaptive system. In this way improved performance is obtained. It is shown that for some common parameter settings the weight update mechanism of the so-called unconstrained PBFDAF corresponds to that of the fullband error adaptation algorithm presented in this chapter. This proves that the fullband error adaptation algorithm can be considered as an extension of the frequency-domain error correction mechanism to a more general class of subband adaptive filters.


Part III : Iterated partitioned block frequency-domain adaptive filtering

In part III an extension to the PBFDAF is proposed, called the PBFDRAP, which is an adaptive filtering algorithm combining partitioned block frequency-domain adaptive filtering with so-called row action projection. The algorithm is presented and analyzed and fast implementation schemes are derived.

In chapter 8 the PBFDRAP is defined and it is explained how extra error suppression can be achieved w.r.t. the PBFDAF. Further, the asymptotic properties of the algorithm are analyzed: for some parameter settings the PBFDRAP algorithm approaches well-known adaptive filtering algorithms. Finally, it is shown that the PBFDRAP outperforms the PBFDAF in a realistic echo cancellation setup.

Fast implementations are derived for the PBFDRAP algorithm in chapter 9. The different fast implementation schemes are compared with the standard implementation of the PBFDRAP for different parameter settings. It appears that a significant complexity reduction can be obtained. The PBFDRAP adaptive filter is then compared with the PRA algorithm from a computational complexity point of view. It is seen that for large block lengths the PBFDRAP is a cheaper alternative to the PRA.

Publications related to this part are [50] [54] [55].

Part IV : Acoustic echo cancellation, implementation and experiments

In the final part of the thesis the acoustic echo cancellation problem is revisited. It was pointed out in section 1.4.1 that in the near future there will be a request for more robust acoustic echo cancellation schemes offering a better overall performance in highly non-stationary and complex acoustic environments. This requires reliable control software, which is added to the adaptive filtering scheme to monitor the adaptation speed.

Chapter 10 illustrates how the different adaptive filters developed in the preceding chapters can be applied to an acoustic echo cancellation setup, providing them with control and so-called double-talk detection techniques, known from the literature. Several experiments are discussed, different adaptive filtering solutions are compared and some observations concerning a real-time implementation of an acoustic echo canceller on DSP are presented.

Publications related to this part are [44] [46].


1.6 Conclusions

In the first section of this chapter the economic impact of telecommunication technology and hands-free communication in particular was highlighted and a motivation was given for the work that was performed in the frame of this thesis.

In section 1.2 hands-free communication was defined, examples were given and it was pointed out that different sorts of signal degradation do occur.

In section 1.3 some basics of speech and acoustics were studied.

It was shown in section 1.4 that a large variety of signal enhancement techniques are known from the literature. They can be employed in present-day hands-free communication systems to obtain a better signal quality.

In section 1.5 an outline and an overview was given of the different chapters and


Basic Concepts

In this chapter some basic concepts are discussed and the necessary signal processing tools are presented to fully understand the forthcoming chapters of the thesis.

Many of the algorithms described in this thesis are so-called block based adaptive filters. Often in this kind of algorithms signals with different sampling rates coexist, hence the name multirate systems. In section 2.1 some basics of signal processing and of multirate signal processing in particular are therefore discussed.

Part I of the thesis focuses on digital filter bank design. These filter banks can then be integrated in the subband adaptive filtering structures that are discussed in part II. Section 2.2 therefore discusses some filter bank fundamentals and presents the necessary background information that is needed to fully understand part I and II of this work.

The algorithms presented in part II and III are adaptive filters. A brief overview of existing adaptive filtering techniques will be given in section 2.3.

For many of the algorithms that are discussed further on, a cost analysis will be performed. The assumptions we will make for these cost analyses are summarized in section 2.4.


2.1 Signal processing basics

2.1.1 Representation of variables

Most of the signals, filters and systems that are referred to in this thesis are discrete-time variables. They are represented in the time, the frequency or in the z-domain. The time-domain representation of a variable $h$

$$h[k] = \{\, \ldots \;\; h[-1] \;\; h[0] \;\; h[1] \;\; h[2] \;\; \ldots \,\} \qquad (2.1)$$

depends on the discrete time $k$, which relates to the actual time $t = k/f_s$ via the sampling frequency $f_s$.

$H(z)$ is the z-transform of $h[k]$ and is defined as

$$H(z) = \sum_{k=-\infty}^{\infty} h[k]\, z^{-k}. \qquad (2.2)$$

An in-depth discussion of the use and validity of the z-transform can be found in many books on signal processing [126] [134] or control theory [65] [110].

By evaluating $H(z)$ on the unit circle, i.e. replacing $z$ by $e^{j 2\pi f}$ in Eq. 2.2, the frequency-domain representation of $h[k]$ is obtained:

$$H(f) = \sum_{k=-\infty}^{\infty} h[k]\, e^{-j 2\pi k f}. \qquad (2.3)$$

$H(f)$ is periodic in the frequency $f \in \mathbb{R}$. For the evaluation of the frequency-domain characteristics the fundamental interval (1 period) is usually considered, i.e. $-\frac{1}{2} < f \leq \frac{1}{2}$, in which $f = 0.5$ corresponds to the Nyquist frequency. The inverse frequency-domain transformation

$$h[k] = \int_{-1/2}^{1/2} H(f)\, e^{j 2\pi k f}\, df \qquad (2.4)$$

computes $h[k]$ from $H(f)$.
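Equations 2.3 and 2.4 can be checked numerically for a finite-length sequence. The sketch below is illustrative only: the helper name `dtft`, the 3-tap moving-average example and the frequency grid are assumptions. The integral of Eq. 2.4 is approximated by an average over a uniform grid covering one period, which is exact here because the sequence is shorter than the grid.

```python
import numpy as np

def dtft(h, k, f):
    """Evaluate H(f) = sum_k h[k] exp(-j 2 pi k f) (Eq. 2.3) for a finite-length h."""
    f = np.atleast_1d(f)
    return np.exp(-2j * np.pi * np.outer(f, k)) @ h

# Hypothetical example: 3-tap moving average, h[0] = h[1] = h[2] = 1/3.
k = np.arange(3)
h = np.ones(3) / 3.0

# Uniform grid over one period, [-1/2, 1/2).
num_f = 64
f = (np.arange(num_f) - num_f // 2) / num_f
H = dtft(h, k, f)

# Eq. 2.4: integrate H(f) exp(j 2 pi k f) over one period (here: grid average).
h_rec = np.array([np.mean(H * np.exp(2j * np.pi * kk * f)) for kk in k])

print("max reconstruction error:", np.max(np.abs(h_rec - h)))
print("DC gain |H(0)|:", abs(dtft(h, k, 0.0)[0]))
```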

2.1.2 Multirate signal processing

In many of the adaptive filtering algorithms discussed in this thesis signals with different sampling rates are encountered. As different sampling rates coexist within the same algorithm these adaptive filtering structures are called multirate systems.


To fully describe a multirate system in the time domain several discrete-time variables should be defined and used in parallel. It is however more convenient to represent these systems in the z-domain.

To describe the conversion from one sampling rate to another, two operations will be discussed here: the reduction of the sampling rate with an integer factor, called downsampling, and the increase of the sampling rate with an integer factor, which is referred to as upsampling.

Downsampling

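$f[m]$ is an N-fold downsampled version of $h[k]$ if

$$f[m] = h[k]_{N\downarrow} = h[mN], \qquad \forall m \in \mathbb{Z},\; N \in \mathbb{N}_0. \qquad (2.5)$$

In this way the sampling rate is reduced by a factor $N$. In signal flow graphs N-fold decimators or downsamplers are represented as $N\downarrow$. It can be shown [156] that

$$F(z) = \frac{1}{N} \sum_{n=0}^{N-1} H\!\left(z^{1/N}\, e^{-j \frac{2\pi n}{N}}\right) \qquad (2.6)$$

holds, in which $f[k] \leftrightarrow F(z)$ and $h[k] \leftrightarrow H(z)$ are z-transform pairs.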

Upsampling

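$f[m]$ is an N-fold upsampled version of $h[k]$ if

$$f[m] = h[k]_{N\uparrow} = \begin{cases} h[m/N] & \text{if } m = pN,\; \forall p \in \mathbb{Z},\; N \in \mathbb{N}_0 \\ 0 & \text{otherwise,} \end{cases} \qquad (2.7)$$

which increases the sampling rate by a factor $N$. In signal flow graphs N-fold expanders or upsamplers are marked as $N\uparrow$. It can be shown [156] that

$$F(z) = H(z^N). \qquad (2.8)$$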

Both operations introduce artifacts. In the case of downsampling aliasing is added to the signal. Upsampling invokes so-called mirror frequencies. More information about this and how to get rid of the artifacts can be found in any good book on (multirate) signal processing, e.g. [156].
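The aliasing of Eq. 2.6 and the mirror frequencies of Eq. 2.8 can be made visible on a DFT grid. The sketch below assumes a finite-length sequence whose length $K$ is a multiple of $N$, so that the DFT samples the z-transform on the unit circle; the helper names and the random test signal are hypothetical.

```python
import numpy as np

def downsample(h, N):
    """Eq. 2.5: keep every N-th sample, f[m] = h[mN]."""
    return h[::N]

def upsample(h, N):
    """Eq. 2.7: insert N - 1 zeros between consecutive samples."""
    f = np.zeros(len(h) * N, dtype=h.dtype)
    f[::N] = h
    return f

rng = np.random.default_rng(2)
N, K = 4, 32
h = rng.standard_normal(K)
H = np.fft.fft(h)

# Eq. 2.6 on the unit circle: N shifted copies of the spectrum are summed (aliasing).
F_down = np.fft.fft(downsample(h, N))
F_alias = H.reshape(N, K // N).sum(axis=0) / N

# Eq. 2.8 on the unit circle: the spectrum is repeated N times (mirror frequencies).
F_up = np.fft.fft(upsample(h, N))
F_image = np.tile(H, N)

print("downsampling matches Eq. 2.6:", np.allclose(F_down, F_alias))
print("upsampling matches Eq. 2.8:  ", np.allclose(F_up, F_image))
```

The usual remedy is a lowpass filter before the downsampler and after the upsampler, which is the role played by the analysis and synthesis filters in a subband scheme.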

2.1.3 Some definitions related to matrix algebra

In appendix A a few matrix algebra definitions and properties are combined that will be used and referred to in the forthcoming chapters of the thesis. A good

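Figure 2.1: A general subband scheme: all intermediate operations can be done at the downsampled rate, which typically leads to a reduced implementation cost and improved performance. [Block diagram: the input $x$ passes through the analysis filter bank $H_0, H_1, \ldots, H_{M-1}$, N-fold downsamplers, the intermediate processing, N-fold upsamplers and the synthesis filter bank $G_0, G_1, \ldots, G_{M-1}$, whose outputs are summed to give $y$.]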

2.2 Filter bank basics

Filter banks are widely used in digital signal processing [156]. Typical applications are subband coding [87] [169] and subband adaptive filtering [53] [142]. Subband techniques can improve the performance of standard fullband algorithms for speech, audio or image processing, as they allow an optimal tuning of the algorithm in each subband. In this way subband algorithms often outperform their globally tuned fullband counterparts. Furthermore, by using multirate techniques the implementation cost can typically be reduced.

2.2.1 General subband scheme

A general subband scheme is shown in figure 2.1. The so-called analysis filter bank splits the input $x$ into a set of narrowband signals: the input is filtered with each of the $M$ analysis filters $H_0, \ldots, H_{M-1}$ and each subband signal is N-fold downsampled. Hence, intermediate operations can be performed on the subband signals at the downsampled rate and this typically leads to a cheaper implementation. Finally, a recombination operation takes place in the synthesis filter bank $G_0, \ldots, G_{M-1}$, which operates on the N-fold upsampled subband channels and results in the output $y$.

A filter bank is a set of parallel filters, which each filter out a part of the frequency spectrum. If all filters have the same bandwidth the filter bank is said to be


perception and therefore the bandwidth of the different filters is changed in a logarithmic way. Non-uniformly spaced filter banks are often tree- or wavelet-based [156] [169] and might be more suitable for applications such as audio or video. In this thesis only uniformly spaced filter banks will be considered.

Uniformly spaced filter banks are typically obtained by modulating, i.e. frequency shifting, a well-designed lowpass prototype filter. Hence, they are called modulated filter banks. Each of the analysis filters $H_0, \ldots, H_{M-1}$ then filters out a part of the frequency spectrum and as there are $M$ subbands in total the bandwidth of each of the analysis filters is equal to (or larger than) $f_s/M$, in which $f_s$ represents the sampling frequency corresponding to the input signal $x$. Modulated filter banks can easily be implemented by decomposing the prototype filter in polyphase components and applying a DFT or DCT operation (see sections 2.2.3, 3.1 and [156]). The latter operations can be implemented efficiently using fast signal transforms.

Critically downsampled subband schemes

For a uniformly spaced modulated filter bank the bandwidth of each of the filters is larger than or equal to $f_s/M$. Hence, if the downsampling factor $N$ is larger than $M$ a considerable amount of aliasing will be inserted in the subbands by the downsampling operation. Aliasing is often detrimental for the performance of the intermediate subband operations (e.g. subband adaptive filtering, see also figure 5.10). Hence in practice, $N$ is restricted to be smaller than or equal to $M$, with $M = N$ being an upper bound for $N$. A subband system for which $M = N$ is called a critically downsampled subband scheme. In this case the implementation cost can be optimally reduced: the intermediate operations and in many cases also the filter bank operations (see section 2.2.3) can be done at the lowest sampling rate, as $N$ is as large as possible.

Oversampled subband schemes

In practice however, finite order filter banks have to be used in order to limit the processing delay and the computational complexity. Finite order filters have a non-negligible transition bandwidth and therefore aliasing will be inserted in the subbands even if the downsampling factor $N$ is smaller than $M$. Critically downsampled finite order subband schemes are strongly sensitive to aliasing and in many cases the loss in performance due to the critical downsampling is not acceptable. From this point of view oversampled subband schemes ($M > N$) are more attractive as they trade off between complexity reduction and aliasing distortion.

2.2.2 Modulated filter banks

Figure 2.2: 4-band DFT modulated filter bank: frequency amplitude response. [Plot: frequency amplitude response versus digital frequency from -0.5 to 0.5, with one curve per subband $m = 0, \ldots, 3$.]

The $M$ subband filters of a DFT modulated filter bank are derived by frequency shifting a well-designed lowpass prototype filter $h_0[k]$ of length $L_f$ in the following way:

$$
\begin{aligned}
h_m[k] &= h_0[k]\, e^{-j \frac{2\pi k m}{M}}, \qquad m = 0 : M-1,\; k = 0 : L_f - 1 && (2.9) \\
\Longleftrightarrow\quad H_m(z) &= H_0\!\left(e^{j \frac{2\pi m}{M}} z\right) && (2.10) \\
\Longleftrightarrow\quad H_m(f) &= H_0\!\left(f + \frac{m}{M}\right). && (2.11)
\end{aligned}
$$

Equation 2.10 follows from Eqs. 2.2 and 2.9, and Eq. 2.11 can be obtained from Eq. 2.10 by replacing $z$ by $e^{j 2\pi f}$. The filters are frequency shifted versions of each other and the complete set of $M$ filters covers the whole frequency spectrum. An example of a DFT modulated filter bank with $M = 4$ is shown in figure 2.2.
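A DFT modulated analysis bank according to Eq. 2.9 can be sketched in a few lines. The prototype used below (a Hamming-windowed sinc with cutoff $1/(2M)$) is only a hypothetical stand-in for the well-designed prototypes of part I, and the FFT size is an arbitrary choice. The check at the end verifies Eq. 2.11 on the FFT grid.

```python
import numpy as np

M, L_f = 4, 32
k = np.arange(L_f)

# Hypothetical lowpass prototype: Hamming-windowed sinc with cutoff 1/(2M).
h0 = np.sinc((k - (L_f - 1) / 2) / M) / M * np.hamming(L_f)

# Eq. 2.9: h_m[k] = h0[k] exp(-j 2 pi k m / M), one row per subband m.
m = np.arange(M)[:, None]
h = h0 * np.exp(-2j * np.pi * k * m / M)

# Eq. 2.11 on an FFT grid: H_m(f) = H_0(f + m/M), i.e. a circular shift of H_0.
n_fft = 256
H = np.fft.fft(h, n_fft, axis=1)
shift = n_fft // M
shifted_ok = all(
    np.allclose(H[mm], np.roll(H[0], -mm * shift)) for mm in range(M)
)
print("all subband filters are shifted copies of the prototype:", shifted_ok)
```

Because every subband response is a shifted copy of the same prototype, only $h_0[k]$ needs to be designed, which is what makes the polyphase/DFT implementation possible.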

A variant of this is the Inverse DFT modulated filter bank, which is defined as

$$h_m[k] = h_0[k]\, e^{j \frac{2\pi k m}{M}}, \qquad m = 0 : M-1,\; k = 0 : L_f - 1. \qquad (2.12)$$

Figure 2.3: 4-band DCT modulated filter bank: frequency amplitude response. [Plot: frequency amplitude response versus digital frequency from -0.5 to 0.5; each subband $m = 0, \ldots, 3$ appears at both positive and negative frequencies.]

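The z-transform and frequency-domain representation are given by

$$
\begin{aligned}
H_m(z) &= H_0\!\left(e^{-j \frac{2\pi m}{M}} z\right) && (2.13) \\
\Longleftrightarrow\quad H_m(f) &= H_0\!\left(f - \frac{m}{M}\right). && (2.14)
\end{aligned}
$$

IDFT modulation differs from DFT modulation only in the order in which the filters are frequency shifted.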

Apart from DFT modulated filter banks, cosine or DCT modulated filter banks are often used. Based on a well-designed FIR prototype filter $p[k]$ of length $L_f$ the different DCT analysis filters $h_m[k]$ can be derived as follows [156]:

$$h_m[k] = 2\,p[k] \cos\!\left( \frac{\pi}{M}\left(m + \frac{1}{2}\right)\left(k - \frac{L_f - 1}{2}\right) + (-1)^m \frac{\pi}{4} \right), \qquad m = 0 : M-1. \qquad (2.15)$$

An example of a DCT modulated filter bank with $M = 4$ is shown in figure 2.3. Whereas DCT modulated filter banks are typically used in critically downsampled


modu-frequent subbands tend to overlap when the subband signals are not critically downsampled. In this way a large amount of aliasing is inserted in the subbands. As it is our goal to design oversampled filter banks ($M > N$) that introduce only a small quantity of subband aliasing, standard DCT modulated filter banks are not applicable. Some schemes have been proposed that combine real filter banks with unequal subsampling in different bands to overcome this problem [78].

In the case of DFT modulated filter banks on the other hand the subband filters filter out a single contiguous frequency region. If there are $M$ subbands and the lowpass prototype filter has a good stopband rejection, the bandwidth of the bandpassed signals that are filtered out by each of the subband filters is approximately $f_s/M$. Hence, the subband signals are correctly projected into the new fundamental interval $\left[-\frac{f_s}{2N}, \frac{f_s}{2N}\right]$ by the downsampling operation if $M > N$, avoiding severe aliasing distortion. As a consequence, oversampled subband schemes are often based on DFT modulated filter banks because of their aliasing robustness and ease of implementation.

2.2.3 Polyphase implementation

The analysis and the synthesis filter bank are immediately followed, respectively preceded, by downsampling or upsampling units (see figure 2.1). Hence, it is cheaper to do not only the intermediate processing, but also the filter bank operations at the downsampled rate, which can be achieved through polyphase decomposition.

Analysis bank

If the signals passing through the M-band analysis bank are subsequently N-fold downsampled, each subband filter $h_m[k]$ can be decomposed in its N-th order polyphase components:

$$H_m(z) = \sum_{n=0}^{N-1} z^{-n}\, H_{m\,n:N}(z^N), \qquad (2.16)$$

in which $H_{m\,n:N}(z)$ is the n-th out of $N$ polyphase components of the m-th subband filter $h_m[k]$, in other words the z-transform of $h_m[n + Nk]$, $k = 0, 1, \ldots$.

Swapping the polyphase components and the downsamplers leads to a more efficient implementation. The filtering operations can now be done at the lower sampling rate, as shown in figure 2.4 for the m-th subband.
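The equivalence between filtering at the high rate followed by downsampling and the polyphase structure of Eq. 2.16 can be verified numerically. The sketch below handles a single subband filter, assumes zero initial conditions and a filter with at least $N$ taps; the function names and the random test data are hypothetical.

```python
import numpy as np

def filter_then_downsample(x, h, N):
    """Reference: filter at the input rate, then keep every N-th output sample."""
    return np.convolve(x, h)[:len(x)][::N]

def polyphase_analysis(x, h, N):
    """Same output, with all filtering done at the downsampled rate (Eq. 2.16)."""
    num_out = -(-len(x) // N)                 # ceil(len(x) / N)
    y = np.zeros(num_out)
    for n in range(N):
        e_n = h[n::N]                         # n-th polyphase component h[n + Nk]
        # Delay the input by n samples, then downsample: x_n[m] = x[mN - n].
        x_delayed = np.concatenate([np.zeros(n), x])[:len(x)]
        x_n = x_delayed[::N]
        y += np.convolve(x_n, e_n)[:num_out]  # branch filter runs at the low rate
    return y

rng = np.random.default_rng(3)
N = 4
x = rng.standard_normal(50)
h = rng.standard_normal(16)
y_ref = filter_then_downsample(x, h, N)
y_poly = polyphase_analysis(x, h, N)
print("polyphase output matches direct output:", np.allclose(y_ref, y_poly))
```

Each branch filter has roughly $L_f/N$ taps and runs at $1/N$ of the input rate, so the multiplication count per input sample drops by about a factor $N$ compared with the direct structure.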

The analysis bank can now schematically be represented as shown in figure 2.5. $\mathbf{H}(z)$ is called the analysis polyphase matrix [156]. Element $(m, n)$ of $\mathbf{H}(z)$ is

$$[\mathbf{H}(z)]_{m,n} = H_{m\,n:N}(z), \qquad m = 0 : M-1,\; n = 0 : N-1. \qquad (2.17)$$

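Figure 2.4: Analysis filter polyphase decomposition. [Block diagram: the filter $H_m(z)$ followed by an N-fold downsampler is redrawn as a delay chain feeding the branches $H_{m\,0:N}(z^N), \ldots, H_{m\,N-1:N}(z^N)$, and then with the downsamplers moved in front of the branch filters $H_{m\,0:N}(z), \ldots, H_{m\,N-1:N}(z)$, whose outputs are summed to give $x_m$.]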

Synthesis part

For the synthesis part a similar derivation can be made. By polyphase decomposition

$$G_m(z) = \sum_{n=0}^{N-1} z^{-n}\, G_{m\,n:N}(z^N), \qquad (2.18)$$

and swapping the polyphase components and the upsamplers, figure 2.6 is obtained. All N-th order polyphase components are contained in the synthesis polyphase matrix $\mathbf{G}(z)$ [156]:

$$[\mathbf{G}(z)]_{m,n} = G_{m\,n:N}(z), \qquad m = 0 : M-1,\; n = 0 : N-1. \qquad (2.19)$$

By combining the analysis and synthesis part, figure 2.1 can be re-arranged, resulting in figure 2.7 (omitting the intermediate processing for a while). It is observed that for the analysis part the analysis polyphase matrix $\mathbf{H}(z)$ is used whereas at the synthesis side $\mathbf{J}\mathbf{G}^T(z)$ is found. $\mathbf{J}$ is the exchange matrix with ones along its main