Kasteelpark Arenberg 10, 3001 Leuven (Heverlee)
SUBBAND AND FREQUENCY-DOMAIN
ADAPTIVE FILTERING TECHNIQUES
FOR SPEECH ENHANCEMENT IN
HANDS-FREE COMMUNICATION
Promotor:
Prof. dr. ir. M. Moonen
Dissertation submitted to obtain the degree
of Doctor in Applied Sciences
by
All rights reserved. No part of the publication may be reproduced in any form by
print, photoprint, microfilm or any other means without written permission from
the publisher.
The telecommunications sector is characterized by an increasing demand for user-friendliness and interactivity. This explains the growing interest in hands-free communication systems. Signal quality in current hands-free systems is unsatisfactory. To overcome this, advanced signal processing techniques such as the subband and frequency-domain adaptive filter are employed to enhance the signal. These techniques are known to have computationally efficient solutions. Furthermore, thanks to the frequency-dependent processing and adaptivity, highly time-varying systems and signals with a continuously changing spectral content, such as speech, can be handled.
This thesis deals with subband and frequency-domain adaptive filtering techniques for speech enhancement in hands-free communication. The text consists of four parts. In the first part, design methods for perfect and nearly perfect reconstruction DFT modulated filter banks are discussed. Part II deals with subband and frequency-domain adaptive filtering. The subband adaptive filter and the PBFDAF algorithm are discussed. Next, the interrelation between both approaches is studied and a novel subband adaptation scheme is proposed. In part III of the thesis an extension to the PBFDAF algorithm is presented, called the PBFDRAP adaptive filter. The algorithm is analyzed and fast implementation schemes are derived. In the final part we describe applications of our algorithms to the acoustic echo cancellation problem. It is seen that the algorithms discussed in parts I-III can be
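Mathematical Notation
$\mathbf{v}$    vector v
$\mathbf{v}(z)$    vector v, function of the z-transform variable
$\mathbf{M}$    matrix M
$\mathbf{M}(z)$    matrix M, function of the z-transform variable
$\underline{\mathbf{v}}$, $\underline{\mathbf{M}}$    frequency-domain equivalents of v and M
$\mathbf{M}^T$    transpose of matrix M
$\mathbf{M}^*$    complex conjugate of matrix M
$\mathbf{M}^H = (\mathbf{M}^*)^T$    Hermitian transpose of matrix M
$\mathbf{M}^{-1}$    inverse of matrix M
$\mathbf{M}^{\dagger}$    pseudo-inverse of matrix M
$\det \mathbf{M}$    determinant of matrix M
$\operatorname{adj}\mathbf{M} = \mathbf{M}^{-1} \det \mathbf{M}$    adjugate of matrix M
$\operatorname{diag}\{\mathbf{v}\}$    square diagonal matrix with vector v as diagonal
$\mathbf{M}_*(z)$    complex conjugation of the coefficients of M(z) without changing z
$\tilde{\mathbf{M}}(z) = \mathbf{M}_*^T(z^{-1})$    paraconjugate of M(z)
$v(m)$    m-th element of vector v
$[\mathbf{v}(z)]_m$    m-th element of vector function v(z)
$M(m,n)$    element on the m-th row and n-th column of matrix M
$[\mathbf{M}(z)]_{m,n}$    element on the m-th row and n-th column of matrix function M(z)
$\mathbf{A} \otimes \mathbf{B}$    Kronecker product of matrices A and B
$h[k]$    discrete-time filter or time sequence h
$H(z)$    z-transform of h[k]
$H(f)$    Discrete Fourier Transform of h[k]
$x \star y$    convolution of x[k] and y[k]
$x \circledast y$    circular convolution of x[k] and y[k]
$x\;y$    circular correlation of x[k] and y[k]
$H_{l:L}(z)$    the l-th out of L polyphase components of FIR h[k]
$h[k]\downarrow_N$    h[k] N-fold downsampled
$h[k]\uparrow_N$    h[k] N-fold upsampled
$\mathbb{N}$    set of natural numbers
$\mathbb{N}_0 = \mathbb{N} \setminus \{0\}$    set of natural numbers larger than 0
$\mathbb{Z}$    set of integer numbers
$\mathbb{Z}_0 = \mathbb{Z} \setminus \{0\}$    set of integer numbers except 0
$\mathbb{Q}$    set of rational numbers
$\mathbb{R}$    set of real numbers
$\mathbb{R}_0 = \mathbb{R} \setminus \{0\}$    set of real numbers except 0
$\mathbb{R}^+$    set of positive real numbers
$\mathbb{C}$    set of complex numbers
$\mathbb{R}^M$    set of real M-dimensional vectors
$\mathbb{C}^M$    set of complex M-dimensional vectors
$\mathbb{C}^M_0 = \mathbb{C}^M \setminus \{\mathbf{0}\}$    set of complex M-dimensional vectors except 0
$\Re\{x\}$    real part of $x \in \mathbb{C}$
$\Im\{x\}$    imaginary part of $x \in \mathbb{C}$
$x^*$    complex conjugate of x
$\operatorname{conj}(\cdot)$    complex conjugation
$\hat{x}$    estimate of x
$\lfloor x \rfloor$    largest integer smaller than or equal to $x \in \mathbb{R}$
$\lceil x \rceil$    smallest integer larger than or equal to $x \in \mathbb{R}$
$\operatorname{rnd}(x)$    round $x \in \mathbb{R}$ to the nearest integer
$|\cdot|$    absolute value
$\|\cdot\|_2$    2-norm
$E\{\cdot\}$    expectation operator
$\sigma_x^2$    variance of x
$\gcd(M,N)$    greatest common divisor of M and N
$\operatorname{lcm}(M,N)$    least common multiple of M and N
$x \bmod y$    remainder after division of $x \in \mathbb{N}$ by $y \in \mathbb{N}$
$p = a{:}b$    p is an integer between $a \in \mathbb{Z}$ and $b \in \mathbb{Z}$, i.e. $a \le p \le b$, $p \in \mathbb{Z}$
$a \ll b$    a is much smaller than b
$a \gg b$    a is much larger than b
$a \approx b$    a is approximately equal to b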
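The multirate operations in the notation above can be sketched in a few lines of Python. This is an illustration only: it assumes 0-based indexing and the type-1 polyphase convention $h_l[k] = h[kL + l]$, i.e. $H(z) = \sum_{l=0}^{L-1} z^{-l} H_{l:L}(z^L)$; the thesis may index the components differently, and all function names are hypothetical.

```python
def downsample(h, n):
    """h[k] N-fold downsampled: y[k] = h[k * n]."""
    return list(h[::n])


def upsample(h, n):
    """h[k] N-fold upsampled: y[k] = h[k / n] if n divides k, else 0."""
    y = [0.0] * (len(h) * n)   # output length len(h) * n, ends with n - 1 zeros
    y[::n] = h
    return y


def polyphase_component(h, l, big_l):
    """l-th out of big_l type-1 polyphase components: h_l[k] = h[k * big_l + l]."""
    return list(h[l::big_l])


def from_polyphase(components):
    """Interleave the polyphase components again to recover h[k]."""
    big_l = len(components)
    h = [0.0] * sum(len(c) for c in components)
    for l, c in enumerate(components):
        h[l::big_l] = c
    return h


h = [1, 2, 3, 4, 5, 6, 7]
parts = [polyphase_component(h, l, 3) for l in range(3)]   # [[1, 4, 7], [2, 5], [3, 6]]
assert from_polyphase(parts) == h
assert downsample(h, 3) == parts[0]   # downsampling keeps the 0-th polyphase component
```

The last assertion makes the link between the two notions explicit: N-fold downsampling of h[k] returns exactly its 0-th polyphase component for L = N.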
Fixed Symbols
$M$    number of subbands, DFT size
$N$    subsampling factor
$L$    block size
$P$    filter partitioning length
$K$    least common multiple
$f$    frequency-domain variable
$\omega = 2\pi f$    pulsation
$z$    z-domain variable
$n$    block time index
$f_s$    sampling frequency
$w[k]$    unknown FIR system, acoustic path
$\hat{w}[k]$, $\hat{w}^{(n)}[k]$    (equivalent) fullband adaptive filter, estimate of w[k]
$x$    far-end (loudspeaker) signal
$s$    local signal source of interest
$d = s + w \star x$    near-end (microphone) signal
$e$    error signal, output of the adaptive filter
$e_i$    i-th subband error signal
$n_{rb}$    number of real subbands to be processed
$n_{cb}$    number of complex subbands to be processed
$\mu$    adaptive filter step size
$\mathbf{R}_{xx} = E\{\mathbf{x}\mathbf{x}^T\}$    autocorrelation matrix of vector x
$L_{FB}$    length of the (equivalent) fullband adaptive filter
$L_{SB}$    length of the subband adaptive filters
$L_f$    length of the filter bank prototype
$L_f^a$    length of the analysis filters
$L_f^s$    length of the synthesis filters
$L_p$    length of the synthesis polyphase filters
$L_{ef}$    effective length of the analysis prototype filter
$L_{ac}$    number of anti-causal filtering taps
$L_c$    number of extra causal filtering taps
$\mathbf{0}$    zero vector or zero matrix
$\mathbf{0}_N$    $N \times N$ zero matrix
$\mathbf{0}_{M \times N}$    $M \times N$ zero matrix
$\mathbf{I}_N$    $N \times N$ identity matrix
$\mathbf{J}$    exchange matrix with ones along the main anti-diagonal and zeros elsewhere
$\mathbf{F}$    DFT matrix, $F(m,n) = e^{-j 2\pi mn/M}$, $0 \le m,n < M$
$\mathbf{H}(z)$    analysis polyphase matrix
$\mathbf{G}(z)$    synthesis polyphase matrix
$\mathbf{B}(z)$    prototype polyphase matrix of a DFT modulated analysis filter bank
$\mathbf{C}(z)$    prototype polyphase matrix of a DFT modulated synthesis filter bank
$h_0[k] \leftrightarrow H_0(z)$    analysis prototype filter
$g_0[k] \leftrightarrow G_0(z)$    synthesis prototype filter
$f_m[k] \leftrightarrow F_m(z)$    m-th subband adaptive filter
$j$    $\sqrt{-1}$
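The DFT matrix F defined above can be checked numerically. The plain-Python sketch below (helper names are hypothetical) builds F for a given M and verifies the circular convolution property: the DFT of the M-point circular convolution of x and y equals the element-wise product of their DFTs. Frequency-domain block adaptive filters rely on this fast-convolution property, but the code is only an illustration, not the implementation used in the thesis.

```python
import cmath


def dft_matrix(m_size):
    """DFT matrix F with F(m, n) = exp(-j 2 pi m n / M), 0 <= m, n < M."""
    return [[cmath.exp(-2j * cmath.pi * m * n / m_size) for n in range(m_size)]
            for m in range(m_size)]


def mat_vec(f, v):
    """Matrix-vector product F v."""
    return [sum(a * b for a, b in zip(row, v)) for row in f]


def circular_convolution(x, y):
    """M-point circular convolution: sum_i x[i] * y[(k - i) mod M]."""
    m_size = len(x)
    return [sum(x[i] * y[(k - i) % m_size] for i in range(m_size))
            for k in range(m_size)]


x = [1.0, 2.0, 0.0, -1.0]
y = [0.5, 0.0, 1.0, 0.0]
F = dft_matrix(4)
lhs = mat_vec(F, circular_convolution(x, y))                 # F (x circularly convolved with y)
rhs = [a * b for a, b in zip(mat_vec(F, x), mat_vec(F, y))]  # (F x) times (F y), element-wise
assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```

An explicit matrix-vector product costs on the order of $M^2$ operations; in practice the same result is obtained with an FFT.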
Acronyms and Abbreviations
A/D    Analog-to-Digital converter
AEC    Acoustic Echo Cancellation
ALU    Arithmetic Logic Unit
ANC    Adaptive Noise Cancellation
APA    Affine Projection Algorithm
ASIC    Application-Specific Integrated Circuit
BLMS    Block-LMS adaptive filter
CD    Compact Disk
cf.    confer: compare with
CPU    Central Processing Unit
D/A    Digital-to-Analog converter
DCT    Discrete Cosine Transform
DFT    Discrete Fourier Transform
DRAM    Dynamic Random Access Memory
DSP    Digital Signal Processor
e.g.    exempli gratia: for example
Eq.    equation
ERLE    Echo Return Loss Enhancement
FDAF    Frequency-Domain Adaptive Filter
FFT    Fast Fourier Transform
FIR    Finite Impulse Response filter
HiFi    High Fidelity
IDFT    Inverse Discrete Fourier Transform
i.e.    id est: that is
iff    if and only if
IFFT    Inverse Fast Fourier Transform
IIR    Infinite Impulse Response filter
LMS    Least Mean Square adaptive filter
MAC    Multiply-Accumulate operation
MFlops    Millions of Floating point Operations Per Second
MIMO    Multi-Input Multi-Output system
MIPS    Millions of Instructions Per Second
NLMS    Normalized Least Mean Square adaptive filter
op.    number of equivalent real operations
ops.    number of equivalent real operations per second
P/S    Parallel-to-Serial converter
PBFDAF    Partitioned Block Frequency-Domain Adaptive Filter
PBFDRAP    Partitioned Block Frequency-Domain RAP adaptive filter
QMF    Quadrature Mirror Filters
RAP    Row Action Projection
RLS    Recursive Least Squares adaptive filter
S/P    Serial-to-Parallel converter
SNR    Signal-to-Noise Ratio
SPL    Sound Pressure Level
SRAM    Static Random Access Memory
SVD    Singular Value Decomposition
VME    VERSA Module Eurocard (IEEE 1014) computer architecture
vs.    versus
w.r.t.    with respect to
@    at
Preface
Abstract
Abstract (in Dutch)
Glossary
Contents
Summary (in Dutch)
1 Introduction
1.1 Problem statement
1.2 Hands-free communication
1.2.1 Definition
1.2.2 Examples of hands-free communication systems
1.2.3 Signal deterioration
1.3 Characteristics of speech and the acoustic environment
1.3.1 Speech signals
1.3.2 The acoustic environment
1.4 Enhancement techniques
1.4.1 Acoustic echo cancellation
1.4.2 Noise suppression and interference cancellation
1.4.3 Dereverberation
1.5 Outline of the thesis and contributions
1.5.1 Motivation
1.5.2 Chapter by chapter overview and contributions
1.6 Conclusions
2 Basic Concepts
2.1 Signal processing basics
2.1.1 Representation of variables
2.1.2 Multirate signal processing
2.1.3 Some definitions related to matrix algebra
2.2 Filter bank basics
2.2.1 General subband scheme
2.2.2 Modulated filter banks
2.2.3 Polyphase implementation
2.2.4 Perfect reconstruction
2.2.5 Overview of filter bank design techniques
2.3 Adaptive filtering techniques for speech enhancement
2.3.1 Standard adaptive filtering techniques
2.3.2 Block-based techniques
2.4 Computational cost
I DFT Modulated Filter Bank Design for Oversampled Subband Systems
3 Perfect Reconstruction Oversampled DFT Modulated Filter Bank Design
3.1 Oversampled DFT modulated subband systems
3.1.1 DFT modulated analysis filter bank
3.1.2 DFT modulated synthesis filter bank
3.1.3 Implementation issues
3.2 Perfect reconstruction
3.2.1 Smith-McMillan decomposition based perfect reconstruction filter bank design
3.2.2 Para-unitary filter banks
3.3 Para-unitary filter bank design
3.3.1 Imposing para-unitarity
3.3.2 Para-unitary lattices
3.3.3 Optimization of the para-unitary lattices
3.3.4 Adjusting the prototype filter length
3.3.5 Design examples
3.4 Conclusions
4 Nearly Perfect Reconstruction DFT Modulated Filter Bank Design
4.1 Nearly perfect reconstruction DFT modulated filter banks
4.2 Frequency-domain optimization
4.3 Mixed time/frequency-domain optimization
II Subband and Frequency-Domain Adaptive Filtering
5 Subband Adaptive Filtering
5.1 Subband adaptive systems
5.1.1 General subband adaptive filtering setup
5.1.2 Subband versus fullband adaptive filtering
5.1.3 Filter bank selection
5.1.4 Polyphase implementation
5.1.5 DFT modulated subband adaptive filters
5.2 Design criteria for subband adaptive systems
5.2.1 Frequency selectivity
5.2.2 Perfect reconstruction
5.2.3 Perfect path modelling
5.3 Downsampling and aliasing: two extreme cases
5.3.1 Critically downsampled subband schemes
5.3.2 Two-fold oversampled subband systems
5.4 Subband adaptive filter length
5.4.1 Infinite-length subband filters
5.4.2 Introducing anti-causal filter taps
5.5 Implementation cost and complexity gain with respect to LMS
5.5.1 Rough cost estimate
5.5.2 Detailed cost analysis
5.5.3 Cost evaluation
5.6 Conclusions
6 Analysis of the Partitioned Block Frequency-Domain Adaptive Filter
6.1.1 Derivation of the PBFDAF algorithm
6.1.2 PBFDAF algorithm: equations and properties
6.1.3 Normalization
6.1.4 Constrained versus unconstrained updating
6.1.5 Ambiguity compensation for $M > P + L - 1$
6.2 The PBFDAF as a special case of subband adaptive filtering
6.3 PBFDAF: design criteria
6.4 Implementation cost
6.4.1 Cost computation
6.4.2 Cost evaluation and optimal parameter setting
6.5 Conclusions
7 Fullband Error Adaptation Scheme
7.1 Fullband error adaptation
7.2 Computational complexity
7.3 PBFDAF weight updating revisited
7.4 Conclusions
III Iterated Partitioned Block Frequency-Domain Adaptive Filtering
8 Partitioned Block Frequency-Domain RAP
8.1 Partitioned block frequency-domain RAP
8.1.1 Definition
8.1.2 Mechanism
8.2 On iterating the PBFDRAP
8.2.1 Computation of $\lim_{R\to\infty} w_p^{(n,R)}$
8.2.2 Unconstrained PBFDRAP: $\lim_{R\to\infty} w_p^{(n,R)}$
8.2.3 Constrained PBFDRAP: $\lim_{R\to\infty} w_p^{(n,R)}$
8.2.4 Summary
8.3 Simulation examples
8.4 Conclusions
9 Fast Partitioned Block Frequency-Domain RAP
9.1 Fast PBFDRAP
9.1.1 Fast PBFDRAP, version 1
9.1.2 Fast PBFDRAP, version 2
9.1.3 Fast PBFDRAP, version 3
9.1.4 Fast constrained PBFDRAP
9.1.5 Summary
9.2 Computational cost
9.2.1 Unconstrained PBFDRAP
9.2.2 Constrained PBFDRAP
9.2.3 Unnormalized constrained PBFDRAP versus PRA
9.3 Conclusions
IV Acoustic Echo Cancellation, Implementation and Experiments
10 Acoustic Echo Cancellation, Implementation & Experiments
10.1 Robust operation and control
10.1.1 Short-time energy
10.1.2 Far-end activity detection
10.1.3 Double-talk detection
10.3 A real-time implementation of an acoustic echo canceller on DSP
10.3.1 DSP equipment
10.3.2 Software
10.3.3 Experiments
10.4 Conclusions
11 Conclusions and Further Research
11.1 Conclusions
11.2 Suggestions for further research
Bibliography
Appendices
A Some definitions related to matrix algebra
B Appendix to part I
B.1 Proof of theorem 3.1
B.2 Properties of B(z)
B.3 Proof of theorem 3.2
B.4 Proof of theorem 3.3
B.5 Proof of theorem 3.4
B.6 Inverse decomposition of para-unitary lattices
B.7 Para-unitary parameterization for $M = 2N$
B.8 Para-unitary DFT modulated filter banks revisited
C Appendix to part II
C.1 Proof of theorem 5.2
C.2 Proof of theorem 5.3
C.5 Proof of theorem 6.2
C.6 "Time-reversed" PBFDAF
C.7 Proof of theorem 6.3
C.8 Proof of theorem 6.4
C.9 Complexity analysis for the PBFDAF
C.10 Proof of theorem 7.1
D Appendix to part III
D.1 Proof of theorem 8.1
D.2 Proof of theorem 8.2
D.3 Proof of theorem 8.3
D.4 Proof of theorem 8.4
D.5 Proof of theorem 8.5
D.6 Constrained PBFDRAP: $L_{FB} < L$
D.7 Proof of theorem 8.7
Introduction
In the first section of this introductory chapter a motivation is given for the techniques that will be developed in the forthcoming chapters of the thesis, and we present some future perspectives on hands-free communication, which is the application we have in mind.
In section 1.2 a few examples of hands-free communication systems are given and the different types of signal degradation that occur are identified.
The characteristics of speech and the properties of the acoustic environment impose specific constraints on the type of signal enhancement algorithm that can be used and on the way the algorithms are applied. Hence, in section 1.3 some basics of speech and acoustics are discussed.
For each type of signal degradation that can be identified in the hands-free communication setup, a number of enhancement techniques are known from the literature. In section 1.4 several signal enhancement algorithms are briefly addressed.
An outline and an overview of the different chapters and parts of the thesis are presented in section 1.5. The main contributions are summarized and references are given to the publications that resulted from this work.
Some conclusions to this chapter are formulated in section 1.6.
1.1 Problem statement
The telecommunications market has rapidly expanded in recent years. This has
Figure 1.1: Number of worldwide cellular subscribers [39] [179] (plot of millions of worldwide cellular subscribers versus year, 1993 to 2005)
annual revenue of the global telecommunications market in 1996 was estimated at US$645 billion and is expected to surpass US$1 trillion in 2002 [85]. This growth is partly due to the expansion of the mobile phone industry. As indicated in figure 1.1, the estimated number of worldwide cellular subscribers now exceeds one billion, and this number is expected to continue to increase substantially in the near future.
The telecommunications industry is characterized by an ongoing tendency towards innovation and optimization. This implies, among other things, a focus on user-friendliness and interactivity, and hence explains the increasing demand for hands-free communication systems today. As more and more telecom applications are expected to become hands-free in the near future, a large potential is foreseen for innovative and product-oriented research in the field of hands-free communication in the coming years. This is supported by the forecast that the global hands-free market can grow from US$3 billion today to over US$9 billion in the next five years [151].
In present-day hands-free communication systems the signal quality is often unsatisfactory. Several types of signal deterioration can be distinguished, as will be
.
Figure 1.2: Hands-free communication setup
In this thesis subband and frequency-domain adaptive filtering techniques are studied. These signal processing algorithms can be used in a wide variety of applications where signal enhancement is required. In parts I, II and III of the thesis several signal processing algorithms will be considered. In part IV it will be shown that these signal processing techniques can be applied to enhance the signal quality in hands-free communication systems. We will concentrate on one form of degradation in particular, which is caused by so-called acoustic echoes, and illustrate how the algorithms discussed in parts I, II and III of the thesis can be employed.
1.2 Hands-free communication
1.2.1 Definition
Consider figure 1.2, which shows a typical hands-free communication setup. The conference room accommodates one or more correspondents, who interact with other people at a remote site via a wireless or wired communication channel. The room shown in figure 1.2 is called the near-end conference room, as it accommodates the local or near-end speaker(s). At the remote site there is a similar room, called the far-end conference room, with the far-end speaker(s).
systems they are granted the freedom to walk around and to interact with each other in a natural way.
To establish hands-free communication, a number of microphones are installed in each conference room to record the local conversation. The recorded signals are then sent to the remote site, where they are fed into a set of loudspeakers.
1.2.2 Examples of hands-free communication systems
Hands-free telephony
Different sorts of applications fit in the hands-free communication framework. Most important from an economic point of view is certainly hands-free telephony. Recently, in many countries all over the world, mobile telephony has been forbidden while driving: mobile phone calls in cars are allowed only if hands-free kits are used. This is motivated by the observation that hand-held mobile phone calls distract the driver and increase the number of accidents. During a mobile phone call the driver misses 4 out of 10 road signs and fails to give way to other vehicles in 25% of the cases. It appears that the accident risk increases by 75%, which reduces to 24% if a hands-free kit is used [171].
It was found that people in North America spend a combined 500 million passenger-hours in their vehicles each week. Although 65 percent of all cell-phone conversations take place in a car or other form of transport, less than 15 percent of the cell-phone users in the US have hands-free accessories [25]. So, a huge market for hands-free kits is expected in the near future.
A side remark, however, is that cell-phone usage is responsible for only 1.5 percent of all accidents in the US. On the other hand, outside distraction was responsible for almost 30 percent of all crashes. Adjusting the radio or changing a tape or CD was the second-biggest cause of accidents, amounting to 11 percent. Furthermore, it appears that the conversations themselves lead to dangerous driving behavior, not the type of phone that is used [25]. It should be added, however, that in contrast to the US, manual gear changes are still very popular in Europe. It is clear that it is almost impossible to change gear, use a mobile phone, and steer and drive safely at the same time.
The most common low-cost hands-free kits for mobile telephony in cars, such as the KX-TCA87 of Panasonic (US$25), are headsets with a (directional) microphone and headphone. The quality is satisfactory, but according to our definition of hands-free systems in section 1.2.1 these systems are not true hands-free solutions. A second class of products, such as the hands-free car adapter NTN1583 of Motorola (US$100), use a hands-free microphone and a built-in speaker, which are connected to the dashboard. These are hands-free systems, but the quality is
and guarantee a better sound quality. These systems, however, need to be built in and are integrated in the dashboard. The most advanced products rely on echo cancellation and noise suppression techniques. The Sonata III echo cancellation and voice enhancement system of NMS Communications was developed for service providers of E1 long distance and digital wireless technology. It is expected that in the near future smaller and more advanced solutions for hands-free telephony will be developed, which can be integrated in the hand-held mobile phones themselves and provide high-quality wideband speech enhancement.
Teleconferencing
Apart from hands-free telephony, teleconferencing also fits in the hands-free communication framework presented in section 1.2.1. Teleconferencing systems are commonly used in business meetings today. Teleclassing, which enables students to attend classes and lectures from a remote classroom, is a special case of this. As the participants in a teleconferencing meeting can stay in their local office, unnecessary traveling is avoided. Hence, a large cost reduction is obtained and the loss of precious time is kept to a minimum. A market research report from Wainhouse Research states that the market for audio, video and web conferencing services will reach US$9.8 billion by 2006, up from US$2.8 billion in 2000 [135].
Powerful teleconferencing systems are already commercially available. Polycom, Inc., which acquired PictureTel Corporation in 2001, brings a range of full-duplex audio conferencing equipment to the market. These solutions have a limited bandwidth and are suited for small business meetings. Larger systems are also available, such as the iPower™ 900 series of Polycom, Inc. They provide integrated audio and video conferencing and offer better audio quality. Future systems will have to cope with higher bandwidths and multi-channel signal enhancement, for which efficient signal processing algorithms are needed.
Domotic and voice-controlled systems
Nowadays there is an increasing interest in so-called domotic systems. More and more voice-controlled systems are encountered in daily life at home and at work. These hands-free systems can be used for the automatic conditioning of a living room or the office at work (switching the light or the central heating on and off, opening the curtains, ...). Other examples are voice-controlled electronic devices or HiFi systems, the on-board computer in your car, voice-controlled PC software, and so on. Telematics seems to be the next big challenge in the automotive industry, providing cellular voice and internet services in vehicles. In North America alone the market for telematics equipment is expected to grow to US$7 billion in 2007 [180].
Figure 1.3: Full-duplex hands-free communication setup (near-end speaker, far-end speaker, acoustic far-end echo)
In 2001 Ford and Vodafone announced a strategic partnership to provide in-car telematic services. Within five years nearly all new Ford vehicles will be fitted with some telematics system. These systems will include voice recognition and text-to-speech technology to recognize spoken phone numbers as well as the names of previously entered contacts. Advanced signal processing techniques will be needed for adequate signal conditioning and preprocessing.
1.2.3 Signal deterioration
Consider again figure 1.2. Ideally, the desired near-end speech signal, which stems from a local correspondent, is sent to the remote site without any quality losses. In a hands-free system, however, the signal quality is degraded in many ways. Due to the large speaker-to-microphone distance, undesired background signals are recorded and are transmitted to the correspondent as well.
A first type of disturbance is formed by so-called acoustic echoes, which arise whenever a far-end loudspeaker signal is picked up by the near-end microphone(s) and is sent to the remote site. At the far-end site the same coupling might exist between loudspeaker and microphone, and hence the signal can circle around in the system. The local speaker hears an echo, i.e. a delayed version of his/her own speech (figure 1.3). Such delayed signals hinder smooth conversation and lower the speech intelligibility. Delays can be quite long (several hundreds of milliseconds), especially when satellite links are involved. In the worst case the closed-loop gain might become too large and the echo becomes unstable, resulting in a harmful sinusoidal tone. A number
A second source of signal deterioration is "background noise". This type of disturbance can e.g. be generated by a ventilator or a computer fan. It can also come from people in the conference room who do not participate in the discussion but talk among themselves in the background (cf. cocktail party). In car applications noise is generated by the engine or by the car radio. It may also come from the wind passing around the car cabin or from the contact between road and tires [94] [160]. Signal processing techniques that are applied to reduce the background noise level are referred to as noise suppression or source separation algorithms. If a reference of the disturbing signal can be obtained, e.g. in the case of radio or engine noise, more specific enhancement techniques can be used. This is called interference cancellation and is very similar to acoustic echo cancellation.
Finally, note that all signals propagate through the recording room. As a consequence, reverberation is added to the signals, which leads to another type of signal distortion. Although signals (especially music) may sound more pleasant when reverberation is added, in general the intelligibility is lowered. In order to cope with this kind of deformation, dereverberation or deconvolution techniques are called for.
1.3 Characteristics of speech and the acoustic environment
The characteristics of speech and the properties of the acoustic environment influence the type of algorithm that is used and the way the algorithms are applied. In this section some characteristics and peculiarities of speech and acoustics are discussed. Only those properties are mentioned that are important for the algorithms and techniques considered in this thesis. More detailed information on speech and signal processing for speech signals can be found in [29] [124]. A good reference on acoustics is [93].
1.3.1 Speech signals
Very often in hands-free applications the signal to be enhanced is speech. Speech is a signal with highly time-varying characteristics. Sometimes speech is quasi-periodic (e.g. vowels), at other instances it acts like colored noise (fricatives) or it is impulse-like (plosives). For example, in the word "peace" there is a clear difference between the plosive /p/, the vowel /i:/ and the fricative /s/.
Speech is a wideband signal with frequency components between 100 and 8000 Hz, hence covering more than 6 octaves. For speech understanding, frequencies between 300 and 3400 Hz, i.e. 3.5 octaves, are of most interest. Hence, a sampling rate of
so-called wideband speech systems, for which higher sampling rates, e.g. 16 kHz, are used.
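The octave counts quoted above follow from $\log_2(f_{high}/f_{low})$, and the sampling theorem gives the minimum sampling rate as twice the highest frequency of interest. A small Python check (the function names are illustrative, not from the thesis):

```python
import math


def octaves(f_low_hz, f_high_hz):
    """Number of octaves spanned by a band: log2(f_high / f_low)."""
    return math.log2(f_high_hz / f_low_hz)


def nyquist_rate(f_max_hz):
    """Minimum sampling rate for a signal band-limited to f_max_hz."""
    return 2.0 * f_max_hz


full_band = octaves(100.0, 8000.0)         # about 6.3 octaves
telephone_band = octaves(300.0, 3400.0)    # about 3.5 octaves
min_rate_telephone = nyquist_rate(3400.0)  # 6800 Hz
min_rate_full = nyquist_rate(8000.0)       # 16000 Hz, matching the 16 kHz wideband rate
```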
Both the time envelope and the spectral content of speech change continuously: the energy of the speech signal is both time- and frequency-dependent. The mean frequency envelope of voiced speech falls off at about -6 dB/octave. Signal enhancement algorithms have to cope with this changing frequency dependence and hence often rely on frequency-domain and subband techniques.
The time-domain evolution of the speech signal is characterized by its high dynamic range: speech pauses alternate with highly energetic vowels or plosives, which significantly increase the short-time energy. This can e.g. be verified in figure 10.12 (chapter 10), where a speech signal is shown at the top. It is found that the amplitude of speech varies between 30 and 90 dB SPL [124]. In order to cope with these amplitude variations, 12 to 16 bit linear quantization is commonly used for speech.
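A rough worked check of the 12 to 16 bit figure uses the rule of thumb that each bit of a linear quantizer adds $20\log_{10}2 \approx 6.02$ dB of range. The sketch below assumes, for illustration, that the loudest speech is mapped to full scale. It then computes how far the quietest speech (60 dB below the loudest, following the 30 to 90 dB SPL range above) still lies above one quantization step.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)   # about 6.02 dB of range per extra bit


def quantizer_range_db(bits):
    """Ratio between full scale and one quantization step of a linear quantizer, in dB."""
    return bits * DB_PER_BIT


speech_span_db = 90.0 - 30.0   # amplitude range of speech quoted in the text (dB SPL)

# dB left between the quietest speech and one quantization step,
# assuming the loudest speech is mapped to full scale
margin_db = {bits: quantizer_range_db(bits) - speech_span_db for bits in (12, 16)}
```

Under these assumptions 12 bits leave roughly 12 dB and 16 bits roughly 36 dB between the quietest speech and the quantization step.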
Furthermore, due to the high dynamic range of the speech signal, signal enhancement algorithms have to be normalized by the actual signal energy. In this way the algorithm can be prevented from diverging, and at the same time slow convergence can be avoided.
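The normalization argument can be made concrete with the normalized LMS (NLMS) algorithm listed in the glossary. The minimal sketch below illustrates the update rule and is not the implementation used in this thesis. The step size $\mu$ is divided by the energy $\mathbf{x}^T\mathbf{x}$ of the current input regressor. The regularization constant `eps`, the toy 4-tap path and the white-noise input are assumptions. Because of the normalization, scaling the input by a factor 1000 leaves the adaptation essentially unchanged.

```python
import random


def fir_filter(w, x):
    """Causal FIR filtering (w * x)[k], truncated to len(x) samples."""
    return [sum(w[i] * x[k - i] for i in range(len(w)) if k - i >= 0)
            for k in range(len(x))]


def nlms(x, d, num_taps, mu=0.5, eps=1e-8):
    """Normalized LMS: w <- w + mu * e * x_vec / (eps + x_vec^T x_vec)."""
    w = [0.0] * num_taps
    regressor = [0.0] * num_taps          # [x[k], x[k-1], ..., x[k-num_taps+1]]
    errors = []
    for xk, dk in zip(x, d):
        regressor = [xk] + regressor[:-1]
        y = sum(wi * ri for wi, ri in zip(w, regressor))   # adaptive filter output
        e = dk - y                                          # error signal
        energy = eps + sum(ri * ri for ri in regressor)     # normalization term
        w = [wi + mu * e * ri / energy for wi, ri in zip(w, regressor)]
        errors.append(e)
    return w, errors


random.seed(1)
w_true = [0.5, -0.3, 0.2, 0.1]            # toy "acoustic path", not a measured one
for scale in (1.0, 1000.0):               # very different signal levels
    x = [scale * random.gauss(0.0, 1.0) for _ in range(2000)]
    d = fir_filter(w_true, x)
    w_hat, _ = nlms(x, d, num_taps=4)
    assert max(abs(a - b) for a, b in zip(w_hat, w_true)) < 1e-6
```

Without the division by the regressor energy, the same fixed $\mu$ would converge slowly for quiet input and diverge for loud input.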
1.3.2 The acoustic environment
It is observed from figure 1.2 that acoustic waves travel from source to listener and thereby propagate through the recording room. This propagation results in signal attenuation and spectral distortion. It appears that the attenuation and the distortion can be modelled quite well by a linear filter. Nonlinear effects are typically a second-order effect and mainly stem from the nonlinear characteristics of the loudspeakers. The linear filter that characterizes the acoustics and relates the emitted signal to the received signal is called the acoustic impulse response and plays an important role in many signal enhancement techniques.
Acoustic impulse responses can be measured quite easily; an example is given in figure 1.4. Observe that the acoustic impulse response is characterized by a dead time. The dead time is the time needed for the acoustic wave to propagate from source to listener via the shortest, direct acoustic path. After the direct-path impulse a set of early reflections is encountered, whose amplitude and delay are strongly determined by the shape of the recording room and the position of source and listener. Next comes a set of late reflections, also called reverberation, which decay exponentially in time. These impulses stem from multi-path propagation as acoustic waves reflect on walls and objects in the recording room. Acoustic impulse responses are typically highly time-varying, as shown by the following experiment.
Experiment 1.1 Consider the acoustic impulse response $w_1$ shown in figure 1.4. [...] loudspeaker. The response $y = w_1 \star x$ was recorded with a microphone. The distance between loudspeaker and microphone was approximately 180 cm. Based on the loudspeaker and microphone signals, $w_1$ could be determined. Then the experiment was repeated. The configuration was slightly changed: the microphone was moved 1 cm to the left, leaving the position of the loudspeaker and the rest of the environment unchanged. Again the acoustic impulse response was computed, resulting in $w_2$. Despite the small change in microphone position, the impulse response changed substantially: it was found that
$$\frac{\|w_1 - w_2\|_2}{\|w_1\|_2} = 72\%.$$
To simulate the effect of moving correspondents in the recording room, a dummy was placed between loudspeaker and microphone and the impulse response ($w_3$) was computed. Then the dummy was moved approximately 1 cm. All other objects were left unchanged. Again the acoustic impulse response $w_4$ was determined. In this case
$$\frac{\|w_3 - w_4\|_2}{\|w_3\|_2} = 34\%.$$
Figure 1.4: Acoustic impulse response of the ESAT speech laboratory (amplitude versus time, 0 to 0.25 s)
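The relative change used in Experiment 1.1 is simple to compute. The toy vectors below are hypothetical and are not the measured responses $w_1$ to $w_4$. They only illustrate how sensitive this measure is to small time shifts: delaying a short, oscillating response by a single sample already yields a relative change above 100%.

```python
import math


def norm2(v):
    """Euclidean (2-)norm of a vector."""
    return math.sqrt(sum(c * c for c in v))


def relative_change(w_a, w_b):
    """||w_a - w_b||_2 / ||w_a||_2, as in Experiment 1.1."""
    return norm2([a - b for a, b in zip(w_a, w_b)]) / norm2(w_a)


w_a = [0.0, 1.0, 0.6, -0.4, 0.2]   # toy response
w_b = [0.0, 0.0, 1.0, 0.6, -0.4]   # same response delayed by one sample (last tap dropped)
change = relative_change(w_a, w_b)  # about 1.27, i.e. 127 %
```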
are called for. Thanks to the continuous updating, these algorithms are more or less robust against possible system variations.
To characterize the amount of reverberation in a recording room, the reverberation time ($RT_{60}$) is defined as the time that the sound pressure level or the intensity needs to decay by 60 dB with respect to its original value. It is therefore a measure of the decay and of the duration of the acoustic impulse response. It appears to be largely independent of the actual position of source and listener. The reverberation time was computed for the impulse response shown in figure 1.4 following the method described in [60]. It appeared that $RT_{60} \approx 240$ ms.
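The method of [60] is not detailed here. As an illustration only, the sketch below uses Schroeder backward integration, a widely used way to estimate $RT_{60}$, and it is not necessarily the procedure of [60]. The energy decay curve is obtained by integrating the squared impulse response backwards in time, a straight line is fitted between -5 and -35 dB, and the fitted slope is extrapolated to a 60 dB drop. The fitting range and the purely exponential test response are assumptions. Real responses also contain a noise-like fine structure and a measurement noise floor, which this sketch does not handle.

```python
import math


def schroeder_rt60(h, fs, upper_db=-5.0, lower_db=-35.0):
    """Estimate RT60 in seconds from an impulse response by Schroeder backward integration."""
    # Energy decay curve: edc[k] = sum_{i >= k} h[i]^2 (backward integration)
    edc = [0.0] * len(h)
    acc = 0.0
    for k in range(len(h) - 1, -1, -1):
        acc += h[k] * h[k]
        edc[k] = acc
    total = edc[0]
    # Keep the part of the decay curve between upper_db and lower_db
    points = []
    for k, energy in enumerate(edc):
        if energy <= 0.0:
            break
        level_db = 10.0 * math.log10(energy / total)
        if lower_db <= level_db <= upper_db:
            points.append((k / fs, level_db))
    # Least-squares straight line level_db = slope * t + offset
    n = len(points)
    t_mean = sum(t for t, _ in points) / n
    lv_mean = sum(lv for _, lv in points) / n
    slope = (sum((t - t_mean) * (lv - lv_mean) for t, lv in points)
             / sum((t - t_mean) ** 2 for t, _ in points))   # dB per second, negative
    # Extrapolate the fitted decay rate to a 60 dB drop
    return -60.0 / slope


# Synthetic response whose energy decays by exactly 60 dB in 0.24 s at 8 kHz
fs = 8000
decay = 3.0 * math.log(10.0) / (0.24 * fs)     # amplitude decay rate per sample
h = [math.exp(-decay * k) for k in range(2 * 1920)]
rt60 = schroeder_rt60(h, fs)                    # close to 0.24 s
```

Since the test response is constructed to decay by 60 dB in 240 ms, the example only checks the estimator itself. It says nothing about the measurement in figure 1.4.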
Typical reverberation times are in the order of hundreds or even thousands of milliseconds. For a typical office room $RT_{60}$ is between 100 and 400 ms; for a church $RT_{60}$ can be several seconds long. If, therefore, the acoustic impulse responses in a digital signal enhancement application are modelled by FIR filters, many hundreds or several thousands of filter taps are needed, depending on the sampling rate. Hence, computationally efficient algorithms are required.
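As a rough worked example, assume that the FIR model must span the full reverberation time, so that the number of taps is approximately $RT_{60} f_s$. Taking the office-room values above and sampling rates of 8 kHz and 16 kHz (chosen here for illustration) gives:

```latex
L_{FB} \approx RT_{60}\, f_s
\quad\Rightarrow\quad
\begin{cases}
RT_{60} = 0.1\ \mathrm{s},\ f_s = 8\ \mathrm{kHz}: & L_{FB} \approx 800 \\
RT_{60} = 0.24\ \mathrm{s},\ f_s = 8\ \mathrm{kHz}: & L_{FB} \approx 1920 \\
RT_{60} = 0.4\ \mathrm{s},\ f_s = 16\ \mathrm{kHz}: & L_{FB} \approx 6400
\end{cases}
```

The 240 ms case corresponds to the reverberation time computed for the laboratory response of figure 1.4.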
In order to reduce the filter order, i.e. the number of delay elements, IIR models could be used. Although the order can be reduced in this way, it still remains quite large, i.e. in the order of several hundreds [75] [108]. IIR-based enhancement techniques then have to be relied on, typically leading to either an increased computational load, or stability problems and convergence to local minima [108] [141].
In order to optimally control the experiments carried out in the frame of this thesis, simulated room impulse responses were often used. These simulated acoustic impulse responses were designed following the method described in [4] [129] [154].
1.4 Enhancement techniques
Each of the three forms of signal degradation that arise in hands-free communication is now discussed in more detail, emphasizing existing algorithmic solutions that are known from the literature.
1.4.1 Acoustic echo cancellation
Experiments have shown that suppressing the acoustic echoes by 45 dB leads to satisfactory perceptual results, as long as the overall delay introduced by the echo canceller does not exceed a certain upper bound. The input-output delay is
Figure 1.5: Adaptive acoustic echo cancellation (far-end signal x, acoustic path w, adaptive filter $\hat{w}$ with output y, local near-end signal s, microphone signal d, error signal e)
with respect to echo cancellation are contained in the ITU-T recommendations (G.167) on acoustic echo controllers [86]. For instance, the end-to-end delay is recommended not to exceed 16 ms for wideband teleconferencing. The far-end signal suppression (when no near-end signal is present) should reach 40 dB for teleconferencing systems and 45 dB in hands-free telephony. In the presence of near-end signals (double talk) the suppression should be at least 25 dB. Convergence to a 3 dB attenuation level should take less than 20 ms in the case of single talk.
To suppress the echo, several conventional acoustic echo cancellation techniques can be applied [77]. For instance, highly directional loudspeakers and microphones and sound-absorbing materials can be used to avoid reflections. Another popular technique is voice-controlled switching or loss control, which mutes channels in which no or only low-energy activity is measured. Clearly, these techniques rely on accurate voice activity detection and hence degrade quickly when this detection is unreliable. Further, the stability margin of the closed-loop system can be improved using so-called howling control. To this end, almost inaudible nonlinear operations are inserted in the signal path to avoid instability of the closed-loop system, as this would result in a harmful sinusoidal tone circling around in the network. Frequency shifting, comb filters and resonant peak removal are often used. Finally, nonlinear post-processing devices can be added to remove residual error signals and to make the signal more pleasant to listen to.
In practice, acoustic echo cancellers are nowadays based on adaptive filtering techniques [76] [77] [106] [176]. Adaptive filters will be discussed in section 2.3. A general adaptive acoustic echo cancellation setup is shown in figure 1.5. If the adaptive filter ŵ is a good estimate of the acoustic impulse response w, it is observed that

$$
\begin{aligned}
e[k] &= d[k] - y[k] && (1.1)\\
     &= \bigl(s[k] + (w \ast x)[k]\bigr) - (\hat{w} \ast x)[k] && (1.2)\\
     &\approx s[k], && (1.3)
\end{aligned}
$$
Figure 1.6: Stereo acoustic echo cancellation setup (near-end speaker and far-end speaker).
hence the echo can be removed. The adaptive filter ŵ is a self-designing system that uses a gradient algorithm to minimize the error signal energy. In this way a good replica of the unknown system w can be obtained. Apart from providing a good echo path replica, the adaptive filter can also track time variations of the acoustic impulse response. However, accurate tracking of the acoustic impulse response w remains a challenge, even if fast and hence expensive adaptive filtering structures are applied [62] [162], as acoustic impulse responses are known to be highly time-varying (cf. experiment 1.1).
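The gradient-based adaptation described above can be illustrated with the normalized LMS (NLMS) algorithm, one common choice among the adaptive filters discussed in section 2.3; the text does not commit to a specific algorithm at this point. The sketch assumes a time-invariant 64-tap FIR echo path, a white far-end signal and a weak near-end signal s. The function name, step size and test signals are illustrative. It follows Eqs. 1.1 to 1.3: y[k] = (ŵ ⋆ x)[k] and e[k] = d[k] - y[k].

```python
import numpy as np

def nlms_echo_canceller(x, d, L, mu=0.5, eps=1e-6):
    """Adapt an L-tap FIR estimate w_hat of the echo path with NLMS.

    Returns the error (echo-cancelled) signal e and the final w_hat.
    """
    w_hat = np.zeros(L)
    x_buf = np.zeros(L)              # x[k], x[k-1], ..., x[k-L+1]
    e = np.zeros(len(d))
    for k in range(len(d)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[k]
        y = w_hat @ x_buf            # echo estimate y[k]
        e[k] = d[k] - y              # e[k] = d[k] - y[k]
        # Normalized gradient step that reduces the instantaneous error energy
        w_hat += mu * e[k] * x_buf / (x_buf @ x_buf + eps)
    return e, w_hat

# Hypothetical setup: 64-tap decaying echo path, white far-end signal
rng = np.random.default_rng(1)
L = 64
w = rng.standard_normal(L) * np.exp(-np.arange(L) / 10.0)
x = rng.standard_normal(8000)
s = 1e-3 * rng.standard_normal(8000)             # weak near-end signal
d = np.convolve(x, w)[:len(x)] + s               # microphone signal d = s + w * x
e, w_hat = nlms_echo_canceller(x, d, L)
misalignment_db = 10 * np.log10(np.sum((w - w_hat) ** 2) / np.sum(w ** 2))
print(f"misalignment: {misalignment_db:.1f} dB")
```

After convergence e[k] is close to s[k], as in Eq. 1.3. A real echo path is much longer and time-varying, which is exactly what makes the tracking problem mentioned above hard.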
In more advanced systems two or more loudspeaker channels have to be cancelled, as shown in figure 1.6. It can be proven that stereo, or in general multi-channel, acoustic echo cancellation inherently suffers from a non-uniqueness problem [113]. In practice, however, a unique solution to the stereo echo cancellation problem does exist, but the underlying optimization that drives the adaptive filters appears to be severely ill-conditioned. Several techniques were developed that cope with this issue. They try to decorrelate the stereo channels by inserting nonlinearities in the signal paths or by applying psycho-acoustic noise masking techniques [58] [68] [87] [121].
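To illustrate the decorrelation idea, the sketch below adds a scaled half-wave rectified copy of each loudspeaker signal to itself, positive half-waves in one channel and negative half-waves in the other. This is one nonlinearity of the kind reported in the stereo echo cancellation literature, chosen here as an assumption rather than taken from the cited references. The parameter alpha (illustrative value 0.5) trades decorrelation against audible distortion. As an extreme test case both channels carry the same signal, so their correlation is 1 before processing.

```python
import numpy as np

def half_wave_decorrelate(x1, x2, alpha=0.5):
    """Add opposite half-wave rectified components to two loudspeaker signals."""
    x1_nl = x1 + alpha * (x1 + np.abs(x1)) / 2.0   # adds alpha * max(x1, 0)
    x2_nl = x2 + alpha * (x2 - np.abs(x2)) / 2.0   # adds alpha * min(x2, 0)
    return x1_nl, x2_nl

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)
x1, x2 = x.copy(), x.copy()                        # fully correlated channels
y1, y2 = half_wave_decorrelate(x1, x2)
print(np.corrcoef(x1, x2)[0, 1], np.corrcoef(y1, y2)[0, 1])
```

For this Gaussian input the correlation coefficient drops only moderately (to roughly 0.97), but the two processed channels are no longer linearly related, which is what improves the conditioning of the stereo adaptation problem.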
Although commercial adaptive echo controllers are available on the market nowadays, providing a merely satisfactory solution to the single-channel acoustic echo cancellation problem, further improvement and research will be necessary in the coming years. It is, for instance, clear that in the near future there will be a need for N-channel acoustic echo controllers (e.g. for stereo, surround systems, Dolby Digital 5.1). Note that the number of adaptive filters in an N-channel echo cancellation system equals N². [...] mostly operate at rather low sampling rates (8 kHz), higher quality will be required in the near future (16 kHz, or even higher). As the complexity of an echo cancellation system using a linear adaptive filtering algorithm grows quadratically with the sampling rate, again efficient adaptive structures will be needed. Finally, there will be a demand for a better overall performance and more robustness in highly non-stationary and complex acoustic environments. This requires reliable control software, which is added to the adaptive filtering scheme.
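The quadratic growth follows because the required number of taps L is proportional to f_s (L ≈ RT60 · f_s), the per-sample cost of a linear adaptive filter is proportional to L, and the number of samples per second is itself f_s. The sketch below counts multiplications per second under the assumption of roughly 2L multiplications per sample per adaptive filter (filtering plus an LMS-type update); the function name and constants are illustrative.

```python
def adaptive_filter_mults_per_second(rt60_s, fs_hz, n_filters=1, mults_per_tap=2):
    """Rough multiplication count per second for n_filters FIR adaptive filters.

    Assumes filter length L = RT60 * fs and mults_per_tap * L multiplications
    per sample per filter.
    """
    taps = round(rt60_s * fs_hz)
    return n_filters * mults_per_tap * taps * fs_hz

narrowband = adaptive_filter_mults_per_second(0.24, 8000)
wideband = adaptive_filter_mults_per_second(0.24, 16000)
print(narrowband, wideband, wideband / narrowband)   # 30720000 122880000 4.0
```

Doubling the sampling rate doubles both the filter length and the number of samples to process, so the cost is multiplied by four; the count scales further with the number of adaptive filters.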
1.4.2 Noise suppression and interference cancellation
Single-channel noise reduction methods have been known for a long time now. They exploit the characteristics of speech and noise and enhance the SNR by appropriate (matched or Wiener) filtering operations [149]. More advanced techniques, commonly used today, rely on spectral subtraction [11] [182].
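Spectral subtraction can be sketched as follows: the noisy signal is split into windowed frames, an estimate of the noise power spectrum is subtracted from each frame's power spectrum, the noisy phase is kept, and the frames are recombined by overlap-add. Here the noise estimate is taken from a separate noise-only recording, which is an assumption; the frame length, spectral floor and test signal are illustrative, and the variants of [11] [182] may differ.

```python
import numpy as np

FRAME = 256                                   # illustrative frame length (samples)
HOP = FRAME // 2                              # 50 % overlap
WIN = np.hanning(FRAME + 1)[:-1]              # periodic Hann: shifted copies sum to 1

def estimate_noise_psd(noise):
    """Average windowed power spectrum of a noise-only signal."""
    spectra = [np.abs(np.fft.rfft(WIN * noise[s:s + FRAME])) ** 2
               for s in range(0, len(noise) - FRAME + 1, HOP)]
    return np.mean(spectra, axis=0)

def spectral_subtraction(y, noise_psd, floor=0.01):
    """Power spectral subtraction per frame, noisy phase kept, overlap-add output."""
    out = np.zeros(len(y))
    for s in range(0, len(y) - FRAME + 1, HOP):
        Y = np.fft.rfft(WIN * y[s:s + FRAME])
        power = np.maximum(np.abs(Y) ** 2, 1e-12)
        # Real gain from subtracting the noise power; the floor limits musical noise
        gain = np.sqrt(np.maximum(1.0 - noise_psd / power, floor))
        out[s:s + FRAME] += np.fft.irfft(gain * Y, n=FRAME)
    return out

def snr_db(x, ref):
    core = slice(FRAME, len(ref) - FRAME)     # skip edges not covered by two frames
    return 10 * np.log10(np.sum(ref[core] ** 2) / np.sum((x[core] - ref[core]) ** 2))

fs = 8000
t = np.arange(2 * fs) / fs
clean = np.sin(2 * np.pi * 440 * t)           # stand-in for the desired signal
rng = np.random.default_rng(3)
noisy = clean + 0.3 * rng.standard_normal(len(t))
noise_psd = estimate_noise_psd(0.3 * rng.standard_normal(fs))   # noise-only recording
enhanced = spectral_subtraction(noisy, noise_psd)
print(f"SNR in: {snr_db(noisy, clean):.1f} dB, out: {snr_db(enhanced, clean):.1f} dB")
```

A tone in white noise is the easy case. With speech as the "noise", signal and interference overlap in time and frequency, and this single-channel approach loses much of its advantage, as the next paragraph explains.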
Noise suppression is a difficult problem. The signal of interest and the background noise typically overlap both in time and in frequency. This is certainly true when both signals are speech. The signal of interest and the "noise" are therefore difficult to separate if classical spectro-temporal enhancement techniques are employed.
It is observed, however, that the desired speaker and the background noise source are typically at different positions in the conference room. Hence, multi-microphone techniques can be called for, which exploit the spatial information present in the different microphone signals. This in general leads to spatio-temporal filtering operations and increases the performance.
A first class of enhancement techniques that rely on this spatial diversity is beamforming. The beamforming idea comes from telecommunications, where it was introduced to design antenna arrays. Later it was successfully applied to acoustic applications as well. As the acoustic environment is inherently time-varying, adaptive beamforming techniques are often called for. Broadband beamforming for speech enhancement is still a topic of ongoing research [17] [18] [34] [66] [74] [91] [92] [97] [122] [123] [125] [136] [150] [158] [161] [164] [165] [172] [174] [175].
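The simplest member of this family, a fixed delay-and-sum beamformer, can be sketched in a few lines: each microphone signal is advanced by the propagation delay of the desired source and the aligned signals are averaged. The sketch assumes known integer-sample delays and spatially white sensor noise, in which case M microphones reduce the noise power by a factor M; the array geometry and noise level are hypothetical, and the adaptive broadband beamformers cited above are considerably more elaborate.

```python
import numpy as np

def delay_and_sum(mics, delays):
    """Align microphone signals by known integer delays (samples) and average them."""
    n = mics.shape[1] - max(delays)
    aligned = np.stack([m[d:d + n] for m, d in zip(mics, delays)])
    return aligned.mean(axis=0)

rng = np.random.default_rng(4)
s = rng.standard_normal(20_000)                   # desired source signal
delays = [0, 3, 5, 8]                             # hypothetical array geometry
mics = np.stack([np.concatenate([np.zeros(d), s])[:len(s)]
                 + 0.5 * rng.standard_normal(len(s)) for d in delays])
out = delay_and_sum(mics, delays)
n = len(out)
gain_db = 10 * np.log10(np.mean((mics[0, :n] - s[:n]) ** 2) / np.mean((out - s[:n]) ** 2))
print(f"SNR gain: {gain_db:.2f} dB")              # close to 10*log10(4) = 6.02 dB
```

The gain relies on the noise being uncorrelated across microphones; a diffuse or coherent interferer, as well as reverberation, reduces it, which is one reason adaptive beamformers are used in practice.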
More recently, optimal filtering techniques have been proposed for the suppression of additive broadband noise. These techniques rely on powerful matrix decompositions such as the SVD and the Quotient SVD [33] [148]. They show a superior performance compared to classical beamforming approaches but are computationally more demanding.
If a reference of the noise signal can be obtained, more specific signal enhancement techniques can be applied. For instance, in the case of engine noise in a car, the spark signal can be measured and used to suppress the noise in the car cabin. Adaptive [...]
Echo cancellation and noise suppression have been addressed independently for many years now. Recently, it has been recognized that both problems are better tackled in a combined approach, especially if multi-microphone settings are being used. Initial results indicate that the combined approach yields a better performance at a lower computational cost [1] [31] [63] [102] [103] [104] [105].
Multi-microphone noise reduction schemes are being commercialized nowadays. The systems that are available on the market, however, are typically rather basic solutions with a limited number of microphones, often relying on simple, not fully adaptive signal processing tools. There is certainly a need for more powerful and robust systems with a higher performance at an acceptable cost in the forthcoming years.
1.4.3 Dereverberation
Of the three types of signal deterioration that occur in hands-free communication, reverberation is the least prominent. However, in rooms with a high reflectivity, reverberation effects have a clearly negative impact on intelligibility. Dereverberation techniques have been developed over the last years, but the solutions available today are not yet satisfactory.
Single-channel dereverberation techniques were reported first. Inverse filtering can be called for, by trying to invert the acoustic impulse response. However, as the impulse responses are known to be non-minimum phase systems, they have no stable causal inverse [112] [120]. Cepstrum-based techniques are more promising [6] [126] [131] and rely on the separability of speech and the acoustics in the cepstral domain.
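The separability exploited by cepstrum-based methods stems from the fact that convolution becomes addition in the cepstral domain. For the real cepstrum (the inverse DFT of the log magnitude spectrum) this holds exactly when the DFT length is at least the length of the linear convolution. The sketch below only checks this identity for a random "speech" segment and a short decaying "room" response, both synthetic stand-ins; the methods of [6] [126] [131] build on the property but are not reproduced here.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Real cepstrum: inverse DFT of the log magnitude of the nfft-point DFT."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x, nfft)))).real

rng = np.random.default_rng(5)
speech = rng.standard_normal(200)                              # synthetic source segment
room = rng.standard_normal(50) * np.exp(-np.arange(50) / 8.0)  # synthetic short response
observed = np.convolve(speech, room)                           # length 249
nfft = 512                                                     # >= 249: no wrap-around
c_sum = real_cepstrum(speech, nfft) + real_cepstrum(room, nfft)
print(np.max(np.abs(real_cepstrum(observed, nfft) - c_sum)))
```

The printed difference is at the level of floating-point rounding. The practical difficulty lies in separating the two additive cepstral components when only their sum is observed.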
Through multi-channel processing the spatial diversity of the hands-free setup can be exploited, in general leading to a better performance. Acoustic beamforming techniques are being used since, apart from suppressing noise, they are known to partially dereverberate the signals as well. A second class of multi-channel dereverberation techniques is based on cepstral processing. It was shown that single-channel cepstrum-based dereverberation algorithms can be extended to the multi-channel case [96].
Matched filtering algorithms were reported in [2] [167]. They rely on subspace tracking techniques. These algorithms show an improved dereverberation capability with respect to classical approaches, but as some environmental parameters are assumed to be known in advance, these approaches may be less suitable in practical applications.
During the last years MIMO blind system identification techniques have been developed for equalization in digital communications [80] [118] [163] [166]. These [...]
1.5 Outline of the thesis and contributions
In this section an outline and an overview of the thesis are given. The main contributions are summarized and references are given to the publications that were produced in the frame of this work.
1.5.1 Motivation
In this thesis subband and frequency-domain adaptive filtering techniques are studied, putting forward acoustic echo cancellation as a possible and straightforward application.
Acoustic echo cancellation, as well as other signal enhancement problems in hands-free communication, deals with the retrieval of degraded speech embedded in "noise". To enhance the speech signal, the acoustics of the recording room need to be estimated. In section 1.3 we discussed some properties of speech and the acoustic environment that impose specific constraints on the signal enhancement algorithm that is used. It was, for instance, observed that acoustic impulse responses are time-varying high-order systems. It was further indicated in section 1.4.1 that in the near future there will be a need for (multi-channel) acoustic echo controllers that offer a high performance at increasing sampling rates. Hence, computationally efficient and adaptive algorithmic solutions are called for. Finally, as the time envelope and the spectral content of speech are continuously changing, time- and frequency-dependent signal processing is required.
It will appear in the forthcoming chapters of the thesis that subband and frequency-domain adaptive filtering techniques meet all the requirements specified above, combining adaptivity and frequency-dependent processing, and offering a high performance at a low cost. Hence, subband and frequency-domain adaptive filters will be put forward as appropriate approaches to solve the acoustic echo cancellation problem.
It is not only our objective to present existing and novel subband and frequency-domain adaptive solutions for acoustic echo cancellation; we will also dwell on the structures and principles that lie behind these techniques, in an attempt to gain more insight into the underlying fundamentals. Whereas acoustic echo cancellation was presented as the starting point and a motive for this research, the main part of the text deals with signal processing as such. The presented techniques can be employed in many applications, going far beyond acoustic echo cancellation alone.
1.5.2 Chapter by chapter overview and contributions
The introductory and concluding chapters are, however, omitted in the figure.
In chapter 2 some basic concepts are discussed and the signal processing tools necessary to understand the main part of the text are presented.
Part I: DFT modulated filter bank design for oversampled subband systems
It was motivated in this introductory chapter that frequency-dependent adaptive signal processing is required for adequate acoustic echo suppression. Frequency dependency can be achieved through the use of digital filter banks and the integration of these structures in existing adaptive filtering schemes, leading to so-called subband adaptive filters. In general, however, digital filter banks introduce considerable signal and aliasing distortion. In part I of the thesis design methods for perfect and nearly perfect reconstruction DFT modulated filter banks are discussed. These filter banks introduce no or almost no signal distortion and are easily integrated in subband adaptive filtering structures.
In chapter 3 design methods for perfect reconstruction oversampled DFT modulated filter banks are presented. A para-unitary filter bank design method, which was presented in [22], is discussed. With this method, however, the order of the filter banks cannot be adjusted accurately. We present an extension to this method, which basically allows any desired filter length to be chosen. Further, we show that appropriate starting values can be obtained based on the inverse parametrization of the filter bank parameters, which reduces the optimization time.
The stopband attenuation of perfect reconstruction filter banks is typically unsatisfactory if intermediate operations, such as adaptive filtering, are performed on the subband signals. In chapter 4, the perfect reconstruction condition is relaxed to nearly perfect reconstruction. Both a frequency-domain and a mixed time/frequency-domain design method are presented for nearly perfect reconstruction DFT modulated filter banks. Subband adaptive filtering is taken as an example to illustrate that, thanks to their lower stopband level, nearly perfect reconstruction filter banks outperform perfect reconstruction systems.
Publications related to the first part of the thesis are [43] [45] [52].
Part II: Subband and frequency-domain adaptive filtering
In section 1.5.1 subband and frequency-domain adaptive filters were put forward [...] in more detail and discuss some of their properties. Although both approaches were developed independently in the literature, they are strongly connected to each other. We will focus on the interrelation between both techniques and combine their mechanisms to obtain improved algorithmic structures.

Figure: Overview of the thesis structure. Chapter 2: Basic Concepts. Part I: Chapter 3, Filter Bank Design, Perfect Reconstruction; Chapter 4, Filter Bank Design, Nearly Perfect Reconstruction. Part II: Chapter 5, Subband Adaptive Filtering; Chapter 6, Partitioned Block Frequency-Domain Adaptive Filtering; Chapter 7, Fullband Error Adaptation. Part III: Chapter 8, Partitioned Block Frequency-Domain RAP; Chapter 9, Fast Partitioned Block Frequency-Domain RAP. Part IV: Chapter 10, Acoustic Echo Cancellation Experiments.
The subband adaptive filter is discussed in chapter 5. A comparison is made between the subband approach and standard fullband adaptive filters in terms of complexity and performance. It will be shown that subband adaptive filtering structures suffer from a considerable residual undermodelling error unless extra (anti-)causal subband filter taps are inserted. Although the complexity gain w.r.t. the fullband approach is less than expected, a considerable cost reduction can still be obtained. Next, we formulate three design criteria for subband adaptive systems, which deal with frequency selectivity, perfect reconstruction and perfect path modelling. These conditions are necessary requirements to ensure satisfactory performance of the subband adaptive filter.
In chapter 6 the partitioned block frequency-domain adaptive filter (PBFDAF) is studied. It appears that this algorithm, which has been known from the literature for some years now, outperforms standard subband systems in terms of convergence behavior and modelling capabilities. It will be proven that the PBFDAF can be considered as a special subband adaptive filtering structure, which fulfills two out of the three design criteria for subband adaptive systems that are specified in chapter 5. It is further shown that the frequency-domain adaptive filter relies on a special error correction mechanism. Thanks to the error correction, the filter coefficients can be updated with aliasing-free error signals, which leads to improved performance.
In an attempt to generalize the error correction mechanism of the frequency-domain approach to subband adaptive systems, we propose a novel fullband error adaptation scheme for subband adaptive filters in chapter 7. The alternative adaptation scheme adjusts the subband filters based on the fullband error instead of the subband errors, as is done in a classical subband adaptive system. In this way improved performance is obtained. It is shown that for some common parameter settings the weight update mechanism of the so-called unconstrained PBFDAF corresponds to that of the fullband error adaptation algorithm presented in this chapter. This proves that the fullband error adaptation algorithm can be considered as an extension of the frequency-domain error correction mechanism to a more general class of subband adaptive filters.
Part III: Iterated partitioned block frequency-domain adaptive filtering
In part III an extension to the PBFDAF is proposed, called the PBFDRAP, which is an adaptive filtering algorithm combining partitioned block frequency-domain adaptive filtering with so-called row action projection. The algorithm is presented and analyzed, and fast implementation schemes are derived.
In chapter 8 the PBFDRAP is defined and it is explained how extra error suppression can be achieved w.r.t. the PBFDAF. Further, the asymptotic properties of the algorithm are analyzed: for some parameter settings the PBFDRAP algorithm approaches well-known adaptive filtering algorithms. Finally, it is shown that the PBFDRAP outperforms the PBFDAF in a realistic echo cancellation setup.
Fast implementations are derived for the PBFDRAP algorithm in chapter 9. The different fast implementation schemes are compared with the standard implementation of the PBFDRAP for different parameter settings. It appears that a significant complexity reduction can be obtained. The PBFDRAP adaptive filter is then compared with the PRA algorithm from a computational complexity point of view. It is seen that for large block lengths the PBFDRAP is a cheaper alternative to the PRA.
Publications related to this part are [50] [54] [55].
Part IV: Acoustic echo cancellation, implementation and experiments
In the final part of the thesis the acoustic echo cancellation problem is revisited. It was pointed out in section 1.4.1 that in the near future there will be a demand for more robust acoustic echo cancellation schemes offering a better overall performance in highly non-stationary and complex acoustic environments. This requires reliable control software, which is added to the adaptive filtering scheme to monitor the adaptation speed.
Chapter 10 illustrates how the different adaptive filters developed in the preceding chapters can be applied to an acoustic echo cancellation setup, providing them with control and so-called double-talk detection techniques known from the literature. Several experiments are discussed, different adaptive filtering solutions are compared and some observations concerning a real-time implementation of an acoustic echo canceller on a DSP are presented.
Publications related to this part are [44] [46].
1.6 Conclusions
In the first section of this chapter the economic impact of telecommunication technology, and of hands-free communication in particular, was highlighted and a motivation was given for the work that was performed in the frame of this thesis.
In section 1.2 hands-free communication was defined, examples were given and it was pointed out that different sorts of signal degradation occur.
In section 1.3 some basics of speech and acoustics were studied.
It was shown in section 1.4 that a large variety of signal enhancement techniques are known from the literature. They can be employed in present-day hands-free communication systems to obtain a better signal quality.
In section 1.5 an outline and an overview were given of the different chapters and [...]
Basic Concepts
In this chapter some basic concepts are discussed and the necessary signal processing tools are presented to fully understand the forthcoming chapters of the thesis.
Many of the algorithms described in this thesis are so-called block-based adaptive filters. In this kind of algorithm signals with different sampling rates often coexist, hence the name multirate systems. In section 2.1 some basics of signal processing, and of multirate signal processing in particular, are therefore discussed.
Part I of the thesis focuses on digital filter bank design. These filter banks can then be integrated in the subband adaptive filtering structures that are discussed in part II. Section 2.2 therefore discusses some filter bank fundamentals and presents the background information that is needed to fully understand parts I and II of this work.
The algorithms presented in parts II and III are adaptive filters. A brief overview of existing adaptive filtering techniques will be given in section 2.3.
For many of the algorithms that are discussed further on, a cost analysis will be performed. The assumptions we will make for these cost analyses are summarized in section 2.4.
2.1 Signal processing basics
2.1.1 Representation of variables
Most of the signals, filters and systems that are referred to in this thesis are discrete-time variables. They are represented in the time, the frequency or the z-domain.
The time-domain representation of a variable h,

$$ h[k] = \{\ \ldots\ \ h[-1]\ \ h[0]\ \ h[1]\ \ h[2]\ \ \ldots\ \}, \qquad (2.1) $$

depends on the discrete time $k$, which relates to the actual time $t = k/f_s$ via the sampling frequency $f_s$.
$H(z)$ is the z-transform of $h[k]$ and is defined as

$$ H(z) = \sum_{k=-\infty}^{\infty} h[k]\, z^{-k}. \qquad (2.2) $$
An in-depth discussion of the use and validity of the z-transform can be found in many books on signal processing [126] [134] or control theory [65] [110].
By evaluating $H(z)$ on the unit circle, i.e. replacing $z$ by $e^{j2\pi f}$ in Eq. 2.2, the frequency-domain representation of $h[k]$ is obtained:

$$ H(f) = \sum_{k=-\infty}^{\infty} h[k]\, e^{-j2\pi k f}. \qquad (2.3) $$
$H(f)$ is periodic in the frequency $f \in \mathbb{R}$ with period 1. For the evaluation of the frequency-domain characteristics the fundamental interval (1 period) is usually considered, i.e. $-\tfrac{1}{2} < f \le \tfrac{1}{2}$, in which $f = 0.5$ corresponds to the Nyquist frequency. The inverse frequency-domain transformation

$$ h[k] = \int_{-1/2}^{1/2} H(f)\, e^{j2\pi k f}\, df \qquad (2.4) $$

computes $h[k]$ from $H(f)$.
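For a finite-length causal sequence, Eqs. 2.3 and 2.4 can be evaluated numerically. The sketch below (the helper name dtft is illustrative) evaluates $H(f)$ by direct summation, checks the unit periodicity, checks that sampling $H(f)$ at $f = n/N$ reproduces the N-point DFT, and recovers $h[k]$ from Eq. 2.4 by replacing the integral with an average over N equally spaced frequencies, which is exact when N is at least the sequence length.

```python
import numpy as np

def dtft(h, f):
    """Evaluate H(f) = sum_k h[k] exp(-j 2 pi k f) for causal h[0..L-1] (Eq. 2.3)."""
    k = np.arange(len(h))
    return np.exp(-2j * np.pi * np.outer(np.atleast_1d(f), k)) @ h

rng = np.random.default_rng(6)
h = rng.standard_normal(12)
N = 16
f_grid = np.arange(N) / N                           # N frequencies covering one period
H = dtft(h, f_grid)

periodic = np.allclose(dtft(h, f_grid + 1.0), H)    # H(f + 1) = H(f)
matches_dft = np.allclose(H, np.fft.fft(h, N))      # samples of H(f) = N-point DFT
# Eq. 2.4 with the integral replaced by an average over the frequency grid
h_rec = np.array([np.mean(H * np.exp(2j * np.pi * k * f_grid)) for k in range(len(h))])
print(periodic, matches_dft, np.allclose(h_rec, h))   # True True True
```

Averaging over $[0, 1)$ instead of $(-\tfrac{1}{2}, \tfrac{1}{2}]$ is allowed because the integrand in Eq. 2.4 has period 1 for integer $k$.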
2.1.2 Multirate signal processing
In many of the adaptive filtering algorithms discussed in this thesis, signals with different sampling rates are encountered. As different sampling rates coexist within the same algorithm, these adaptive filtering structures are called multirate systems. To fully describe a multirate system in the time domain, several discrete-time variables should be defined and used in parallel. It is, however, more convenient to represent these systems in the z-domain.
To describe the conversion from one sampling rate to another, two operations will be discussed here: the reduction of the sampling rate by an integer factor, called downsampling, and the increase of the sampling rate by an integer factor, which is referred to as upsampling.
Downsampling
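$f[m]$ is an N-fold downsampled version of $h[k]$ if

$$ f[m] = h[k]_{\downarrow N} = h[mN], \qquad \forall m \in \mathbb{Z},\ N \in \mathbb{N}_0. \qquad (2.5) $$

In this way the sampling rate is reduced by a factor N. In signal flow graphs N-fold decimators or downsamplers are represented as $\downarrow N$. It can be shown [156] that

$$ F(z) = \frac{1}{N} \sum_{n=0}^{N-1} H\!\left(z^{1/N} e^{-j\frac{2\pi n}{N}}\right) \qquad (2.6) $$

holds, in which $f[m] \leftrightarrow F(z)$ and $h[k] \leftrightarrow H(z)$ are z-transform pairs.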
Upsampling
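$f[m]$ is an N-fold upsampled version of $h[k]$ if

$$ f[m] = h[k]_{\uparrow N} = \begin{cases} h[m/N] & \text{if } m = pN,\ \forall p \in \mathbb{Z},\ N \in \mathbb{N}_0,\\ 0 & \text{otherwise,} \end{cases} \qquad (2.7) $$

which increases the sampling rate by a factor N. In signal flow graphs N-fold expanders or upsamplers are marked as $\uparrow N$. It can be shown [156] that

$$ F(z) = H(z^N). \qquad (2.8) $$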
Both operations introduce artifacts. In the case of downsampling, aliasing is added to the signal. Upsampling introduces so-called mirror frequencies (images). More information about this, and about how to remove the artifacts, can be found in any good book on (multirate) signal processing, e.g. [156].
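On the unit circle, Eqs. 2.6 and 2.8 read $F(f) = \frac{1}{N}\sum_{n=0}^{N-1} H\big(\frac{f-n}{N}\big)$ and $F(f) = H(Nf)$. Downsampling stretches the spectrum and adds $N-1$ shifted copies (aliasing), while upsampling compresses it, so that $N-1$ images (mirror frequencies) appear in the fundamental interval. Both relations can be checked numerically for a random finite sequence. The helper dtft is an illustrative direct summation of Eq. 2.3 and is not part of the text.

```python
import numpy as np

def dtft(h, f):
    """Direct evaluation of Eq. 2.3 for a causal finite sequence h[0..L-1]."""
    k = np.arange(len(h))
    return np.exp(-2j * np.pi * np.outer(np.atleast_1d(f), k)) @ h

rng = np.random.default_rng(7)
h = rng.standard_normal(40)
N = 3
f = np.linspace(-0.5, 0.5, 101)

# Downsampling (Eq. 2.5): f[m] = h[mN]
h_down = h[::N]
alias_sum = np.mean([dtft(h, (f - n) / N) for n in range(N)], axis=0)
down_ok = np.allclose(dtft(h_down, f), alias_sum)

# Upsampling (Eq. 2.7): insert N - 1 zeros between samples
h_up = np.zeros(N * len(h))
h_up[::N] = h
up_ok = np.allclose(dtft(h_up, f), dtft(h, N * f))
print(down_ok, up_ok)   # True True
```

The first check is exactly the aliasing sum of Eq. 2.6; the second shows that $F(f)$ has period $1/N$, i.e. the images introduced by upsampling.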
2.1.3 Some definitions related to matrix algebra
In appendix A a few matrix algebra definitions and properties are collected that will be used and referred to in the forthcoming chapters of the thesis. A good [...]
Figure 2.1: A general subband scheme (input x, analysis filter bank $H_0, \ldots, H_{M-1}$, N-fold downsampling, intermediate processing, N-fold upsampling, synthesis filter bank $G_0, \ldots, G_{M-1}$, output y): all intermediate operations can be done at the downsampled rate, which typically leads to a reduced implementation cost and improved performance.
2.2 Filter bank basics
Filter banks are widely used in digital signal processing [156]. Typical applications are subband coding [87] [169] and subband adaptive filtering [53] [142]. Subband techniques can improve the performance of standard fullband algorithms for speech, audio or image processing, as they allow an optimal tuning of the algorithm in each subband. In this way subband algorithms often outperform their globally tuned fullband counterparts. Furthermore, by using multirate techniques the implementation cost can typically be reduced.
2.2.1 General subband scheme
A general subband scheme is shown in figure 2.1. The so-called analysis filter bank splits the input x into a set of narrowband signals: the input is filtered with each of the M analysis filters $H_0, \ldots, H_{M-1}$ and each subband signal is N-fold downsampled. Hence, intermediate operations can be performed on the subband signals at the downsampled rate, and this typically leads to a cheaper implementation. Finally, a recombination operation takes place in the synthesis filter bank $G_0, \ldots, G_{M-1}$, which operates on the N-fold upsampled subband channels and results in the output y.
A filter bank is a set of parallel filters, each of which filters out a part of the frequency spectrum. If all filters have the same bandwidth, the filter bank is said to be uniformly spaced. [...] perception, and therefore the bandwidth of the different filters is changed in a logarithmic way. Non-uniformly spaced filter banks are often tree- or wavelet-based [156] [169] and might be more suitable for applications such as audio or video. In this thesis only uniformly spaced filter banks will be considered.
Uniformly spaced filter banks are typically obtained by modulating, i.e. frequency shifting, a well-designed lowpass prototype filter. Hence, they are called modulated filter banks. Each of the analysis filters $H_0, \ldots, H_{M-1}$ then filters out a part of the frequency spectrum, and as there are M subbands in total, the bandwidth of each of the analysis filters is equal to (or larger than) $f_s/M$, in which $f_s$ represents the sampling frequency of the input signal x. Modulated filter banks can easily be implemented by decomposing the prototype filter into polyphase components and applying a DFT or DCT operation (see sections 2.2.3, 3.1 and [156]). The latter operations can be implemented efficiently using fast signal transforms.
Critically downsampledsubbandschemes
For a uniformly spaced modulated filter bank the bandwidth of each of the filters is larger than or equal to $f_s/M$. Hence, if the downsampling factor N is larger than M, a considerable amount of aliasing will be inserted in the subbands by the downsampling operation. Aliasing is often detrimental to the performance of the intermediate subband operations (e.g. subband adaptive filtering, see also figure 5.10). Hence, in practice N is restricted to be smaller than or equal to M, with M = N being an upper bound for N. A subband system for which M = N is called a critically downsampled subband scheme. In this case the implementation cost can be optimally reduced: the intermediate operations, and in many cases also the filter bank operations (see section 2.2.3), can be done at the lowest sampling rate, as N is as large as possible.
Oversampledsubband schemes
In practice, however, finite order filter banks have to be used in order to limit the processing delay and the computational complexity. Finite order filters have a non-negligible transition bandwidth, and therefore aliasing will be inserted in the subbands even if the downsampling factor N does not exceed M. Critically downsampled finite order subband schemes are strongly sensitive to aliasing, and in many cases the loss in performance due to critical downsampling is not acceptable. From this point of view, oversampled subband schemes (M > N) are more attractive, as they trade off complexity reduction against aliasing distortion.
2.2.2 Modulated lter banks
Figure 2.2: 4-band DFT modulated filter bank: frequency amplitude response (amplitude versus digital frequency from -0.5 to 0.5, subbands m = 0, ..., 3).
The M subband filters of a DFT modulated filter bank are derived by frequency shifting a well-designed lowpass prototype filter $h_0[k]$ of length $L_f$ in the following way:

$$ h_m[k] = h_0[k]\, e^{-j\frac{2\pi k m}{M}}, \qquad m = 0:M-1,\ k = 0:L_f-1 \qquad (2.9) $$

$$ \Longleftrightarrow \quad H_m(z) = H_0\!\left(e^{j\frac{2\pi m}{M}} z\right) \qquad (2.10) $$

$$ \Longleftrightarrow \quad H_m(f) = H_0\!\left(f + \frac{m}{M}\right). \qquad (2.11) $$

Equation 2.10 follows from Eqs. 2.2 and 2.9, and Eq. 2.11 can be obtained from Eq. 2.10 by replacing $z$ by $e^{j2\pi f}$. The filters are frequency-shifted versions of each other and the complete set of M filters covers the whole frequency spectrum. An example of a DFT modulated filter bank with M = 4 is shown in figure 2.2.
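The modulation of Eq. 2.9 and the shift property of Eq. 2.11 can be checked numerically. The sketch below assumes a Hann-windowed sinc lowpass prototype with cutoff $1/(2M)$ as an illustrative stand-in for the well-designed prototypes of chapters 3 and 4; the function names are hypothetical.

```python
import numpy as np

def dtft(h, f):
    """Direct evaluation of Eq. 2.3 for a causal finite sequence h[0..L-1]."""
    k = np.arange(len(h))
    return np.exp(-2j * np.pi * np.outer(np.atleast_1d(f), k)) @ h

def dft_modulated_bank(h0, M):
    """Analysis filters h_m[k] = h0[k] exp(-j 2 pi k m / M), m = 0..M-1 (Eq. 2.9)."""
    k = np.arange(len(h0))
    return np.array([h0 * np.exp(-2j * np.pi * k * m / M) for m in range(M)])

# Illustrative prototype: Hann-windowed sinc lowpass with cutoff 1/(2M)
M, Lf = 4, 64
n = np.arange(Lf) - (Lf - 1) / 2
h0 = np.sinc(n / M) / M * np.hanning(Lf)
bank = dft_modulated_bank(h0, M)

f = np.linspace(-0.5, 0.5, 201)
shift_ok = all(np.allclose(dtft(bank[m], f), dtft(h0, f + m / M)) for m in range(M))
print(shift_ok)
```

Because of the minus sign in Eq. 2.9, filter m is centred at $f = -m/M$ (modulo 1); the IDFT variant below only reverses this order.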
A variant of this is the inverse DFT modulated filter bank, which is defined as

$$ h_m[k] = h_0[k]\, e^{j\frac{2\pi k m}{M}}, \qquad m = 0:M-1,\ k = 0:L_f-1. \qquad (2.12) $$
Figure 2.3: 4-band DCT modulated filter bank: frequency amplitude response (amplitude versus digital frequency from -0.5 to 0.5, subbands m = 0, ..., 3, each with a passband at positive and at negative frequencies).
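The z-transform and frequency-domain representation are given by

$$ H_m(z) = H_0\!\left(e^{-j\frac{2\pi m}{M}} z\right) \qquad (2.13) $$

$$ \Longleftrightarrow \quad H_m(f) = H_0\!\left(f - \frac{m}{M}\right). \qquad (2.14) $$

IDFT modulation differs from DFT modulation only in the order in which the filters are frequency shifted.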
Apart from DFT modulated filter banks, cosine or DCT modulated filter banks are often used. Based on a well-designed FIR prototype filter $p[k]$ of length $L_f$, the different DCT analysis filters $h_m[k]$ can be derived as follows [156]:

$$ h_m[k] = 2\,p[k] \cos\!\left( \frac{\pi}{M}\left(m + \frac{1}{2}\right)\left(k - \frac{L_f - 1}{2}\right) + (-1)^m \frac{\pi}{4} \right), \qquad m = 0:M-1. \qquad (2.15) $$

An example of a DCT modulated filter bank with M = 4 is shown in figure 2.3.
Whereas DCT modulated filter banks are typically used in critically downsampled [...] subbands tend to overlap when the subband signals are not critically downsampled. In this way a large amount of aliasing is inserted in the subbands. As it is our goal to design oversampled filter banks (M > N) that introduce only a small amount of subband aliasing, standard DCT modulated filter banks are not applicable. Some schemes have been proposed that combine real filter banks with unequal subsampling in different bands to overcome this problem [78].
In the case of DFT modulated filter banks, on the other hand, each subband filter filters out a single contiguous frequency region. If there are M subbands and the lowpass prototype filter has a good stopband rejection, the bandwidth of the bandpass signals that are filtered out by each of the subband filters is approximately $f_s/M$. Hence, the subband signals are correctly projected into the new fundamental interval $\left[-\frac{f_s}{2N}, \frac{f_s}{2N}\right]$ by the downsampling operation if M > N, avoiding severe aliasing distortion. As a consequence, oversampled subband schemes are often based on DFT modulated filter banks because of their aliasing robustness and ease of implementation.
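The ease of implementation can be made concrete for the filters of Eq. 2.9. Writing $k = r + lM$ and using $e^{-j2\pi (r+lM)m/M} = e^{-j2\pi rm/M}$, the M subband outputs at one low-rate instant $t = jN$ are the M-point DFT of the partial sums $u[r] = \sum_l h_0[r+lM]\, x[t-r-lM]$, i.e. of the M-th order polyphase components of the prototype. The sketch compares this FFT-based computation with direct filtering followed by N-fold downsampling; the function names, the Hann-windowed sinc prototype and the parameter values are illustrative assumptions.

```python
import numpy as np

def analysis_direct(x, h0, M, N):
    """Filter x with each DFT modulated filter (Eq. 2.9), then keep every N-th sample."""
    k = np.arange(len(h0))
    return np.array([np.convolve(x, h0 * np.exp(-2j * np.pi * k * m / M))[:len(x)][::N]
                     for m in range(M)])

def analysis_fft(x, h0, M, N):
    """Same outputs computed only at the low rate: M polyphase sums + one M-point FFT."""
    L = len(h0)
    xpad = np.concatenate([np.zeros(L - 1), x])       # zeros for x[t - k] with t - k < 0
    n_out = (len(x) + N - 1) // N
    rows = -(-L // M)                                  # ceil(L / M)
    out = np.empty((M, n_out), dtype=complex)
    for j in range(n_out):
        t = j * N
        seg = xpad[t:t + L][::-1]                      # x[t], x[t-1], ..., x[t-L+1]
        prod = np.zeros(rows * M)
        prod[:L] = h0 * seg                            # h0[k] x[t - k]
        u = prod.reshape(rows, M).sum(axis=0)          # u[r] = sum_l h0[r+lM] x[t-r-lM]
        out[:, j] = np.fft.fft(u)                      # sum_r u[r] exp(-j 2 pi r m / M)
    return out

rng = np.random.default_rng(8)
M, N, Lf = 8, 4, 48                                    # oversampled: M > N
h0 = np.sinc((np.arange(Lf) - (Lf - 1) / 2) / M) / M * np.hanning(Lf)
x = rng.standard_normal(500)
print(np.allclose(analysis_direct(x, h0, M, N), analysis_fft(x, h0, M, N)))
```

Per low-rate output instant the FFT-based version needs about $L_f$ multiplications plus one M-point FFT for all M subbands, instead of $M L_f$ multiplications per input sample for the direct version.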
2.2.3 Polyphase implementation
The analysis and the synthesis filter bank are immediately followed, respectively preceded, by downsampling or upsampling units (see figure 2.1). Hence, it is cheaper to perform not only the intermediate processing, but also the filter bank operations at the downsampled rate, which can be achieved through polyphase decomposition.
Analysis bank
If the signals passing through the M-band analysis bank are subsequently N-fold downsampled, each subband filter $h_m[k]$ can be decomposed into its N-th order polyphase components:

$$ H_m(z) = \sum_{n=0}^{N-1} z^{-n} H_m^{n:N}(z^N), \qquad (2.16) $$

in which $H_m^{n:N}(z)$ is the n-th out of N polyphase components of the m-th subband filter $h_m[k]$, in other words the z-transform of $h_m[n + Nk]$, $k = 0, 1, \ldots$
Swapping the polyphase components and the downsamplers leads to a more efficient implementation. The filtering operations can now be done at the lower sampling rate, as shown in figure 2.4 for the m-th subband.
The analysis bank can now schematically be represented as shown in figure 2.5. $\mathbf{H}(z)$ is called the analysis polyphase matrix [156]. Element $(m, n)$ of $\mathbf{H}(z)$ is

$$ [\mathbf{H}(z)]_{m,n} = H_m^{n:N}(z), \qquad m = 0:M-1,\ n = 0:N-1. \qquad (2.17) $$
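The equivalence behind figure 2.4 can be checked directly: filtering at the full rate and keeping every N-th output sample gives the same result as delaying the input by $n$ samples, downsampling it, filtering it at the low rate with the polyphase component $h_m[n + Nk]$, and summing over $n$. The sketch assumes $L_f \ge N$ so that every polyphase component is non-empty; the function names, prototype and parameters are illustrative.

```python
import numpy as np

def filter_then_downsample(x, h, N):
    """Reference: full-rate filtering followed by N-fold downsampling."""
    return np.convolve(x, h)[:len(x)][::N]

def polyphase_analysis(x, h, N):
    """Eq. 2.16 with the downsampler moved in front of the polyphase components.

    Assumes len(h) >= N so that every polyphase component is non-empty.
    """
    n_out = (len(x) + N - 1) // N
    y = np.zeros(n_out, dtype=np.result_type(h, x, float))
    for n in range(N):
        e_n = h[n::N]                                          # h[n + N k]
        x_n = np.concatenate([np.zeros(n), x])[:len(x)][::N]   # z^{-n}, then downsample: x[jN - n]
        y += np.convolve(x_n, e_n)[:n_out]                     # low-rate filtering by H^{n:N}(z)
    return y

rng = np.random.default_rng(9)
M, N, Lf = 4, 3, 24                                            # oversampled: M > N
k = np.arange(Lf)
h0 = np.sinc((k - (Lf - 1) / 2) / M) / M * np.hanning(Lf)
x = rng.standard_normal(300)
ok = all(np.allclose(filter_then_downsample(x, hm, N), polyphase_analysis(x, hm, N))
         for hm in (h0 * np.exp(-2j * np.pi * k * m / M) for m in range(M)))
print(ok)
```

Each low-rate output still costs $L_f$ multiplications, but only one output per N input samples is computed, which is the cost saving the polyphase structure provides.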
Figure 2.4: Analysis filter polyphase decomposition ($H_m(z)$ realized as delays $z^{-1}$, polyphase components $H_m^{n:N}$ and N-fold downsamplers, with the filtering moved to the low rate).
Synthesispart
For the synthesis part a similar derivation can be made. By polyphase decomposition,

$$ G_m(z) = \sum_{n=0}^{N-1} z^{-n} G_m^{n:N}(z^N), \qquad (2.18) $$

and by swapping the polyphase components and the upsamplers, figure 2.6 is obtained. All N-th order polyphase components are contained in the synthesis polyphase matrix $\mathbf{G}(z)$ [156]:

$$ [\mathbf{G}(z)]_{m,n} = G_m^{n:N}(z), \qquad m = 0:M-1,\ n = 0:N-1. \qquad (2.19) $$
By combining the analysis and synthesis parts, figure 2.1 can be rearranged, resulting in figure 2.7 (omitting the intermediate processing for the time being). It is observed that for the analysis part the analysis polyphase matrix $\mathbf{H}(z)$ is used, whereas at the synthesis side $\mathbf{J}\mathbf{G}^T(z)$ is found. $\mathbf{J}$ is the exchange matrix with ones along its main