audio archives
Willemijn Heeren
Human Media Interaction
Faculty of Electrical Engineering, Mathematics and Computer Science
University of Twente
w.f.l.heeren@ewi.utwente.nl
Abstract
Searching archived audiovisual collections will change in the near future. Instead of sifting through kilometers of analog tapes in archives' deposits, end users will be able to explore the collections from behind a personal computer, either at an archive or at home. A first step in the development of search technology and user interfaces suitable for supporting such access is finding out what users want and expect from the technology. Therefore, this report presents a requirements analysis conducted within the CHoral project, which is part of the NWO-CATCH program.
1 Introduction
The number of spoken word collections that can be searched via the Web or via internal networks is growing, on the one hand due to retrospective digitization, as for instance in radio archives, and on the other hand due to the increasing amount of digital-born audiovisual documents with a speech track, as for example in meeting and interview collections. These collections are being exploited for all kinds of purposes, e.g., in corporate environments (meeting recordings), by content producers (TV and radio archives), in research settings (oral history), for teaching (lecture recordings) and by the general public.
In the cultural heritage (CH) domain, however, audiovisual (A/V) collections in general are at risk of becoming inaccessible: the analog data carriers they are stored on are deteriorating, the corresponding playback devices are becoming obsolete, and the documents are insufficiently disclosed to allow fast and easy access, see e.g., [3]. The preservation issue has been taken up in retrospective digitization projects for historic A/V collections, large-scale examples of which are the EU IST PrestoSpace¹ project and the Dutch 'Beelden Voor De Toekomst' ('images for the future') project². The issue of improving disclosure of spoken word documents has been taken up in research projects. The general approach is to increase the granularity of annotations by automatically processing digitized A/V documents for index generation, by applying techniques
1 http://www.prestospace.org/
2
innovative user interfaces.
In this paper we will focus on accessibility of spoken word audio collections, i.e., on collections where speech is the main information layer. Research initiatives aiming to improve access to spoken word CH collections are, e.g., Multilingual Access to Large Spoken Archives (MALACH), which addressed the issue of automatic indexing and access technology for the Shoah Visual History Foundation interview collection [7], the National Gallery of the Spoken Word, the goal of which was to make American historical voices searchable online [8], and the CHoral project on spoken document retrieval (SDR) for Dutch historical collections [19]. In addition to the contribution these projects make to large vocabulary speech recognition (LVCSR) and SDR, the study of collection usage and user needs received attention in the MALACH project [7, 12, 22] and is receiving attention in the CHoral project and the Multimatch project, which aims to develop a multilingual search engine for CH content [1]. In the development of access technology for spoken heritage collections, it is important to gain insight into how the collections are being used and for which purposes.
An earlier requirements analysis of A/V archiving in the CH domain was undertaken as part of the EU PrestoSpace project³. The goal of PrestoSpace is to study and facilitate digital preservation of Europe's audiovisual heritage. A survey undertaken as part of that project aimed at providing "an overview of potential users' functional requirements for the PrestoSpace factory tools and services" [5]. Three types of end users were identified: producers (58%), cultural institutes (21%), and researchers and private persons (21%). As for access to the collections, remote access was reported to be virtually nonexistent and online access was very limited. Catalogs could generally be consulted online by querying the metadata, but one out of every two archives was not satisfied with their system's performance. The participating audiovisual archives were convinced that online viewing and listening access would increase their sales, but that rights management would remain a problem. Another problem for online listening is that the data are not stored on servers. Digitization currently means that A/V data are transferred from analog carriers to digital ones instead of onto Digital Mass Storage Systems (DMSSs), as proposed by PrestoSpace. The reasons for using DMSSs are that they support fast and easy access to audiovisual data, and that they can be used to check the integrity of the data in order to prevent data loss. Moreover, the exchangeability of materials is complicated by the current lack of standards for documentation. One of the challenges in changing access therefore lies with the management of collections at the archives themselves.
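The integrity check mentioned above is commonly implemented as a fixity check: a cryptographic checksum is recorded for every file when it is ingested and recomputed periodically. The following Python sketch illustrates the idea with SHA-256 from the standard library. It assumes files on an ordinary filesystem, and the function names are hypothetical; it is not the PrestoSpace design or any particular DMSS product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks so large audio files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths) -> dict:
    """Record a reference checksum for each file at ingest time."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def find_corrupted(manifest: dict) -> list:
    """Return files that are missing or whose checksum no longer matches the manifest."""
    bad = []
    for name, expected in manifest.items():
        p = Path(name)
        if not p.exists() or sha256_of(p) != expected:
            bad.append(name)
    return bad
```

A stored manifest can be re-verified on a schedule; any file reported by `find_corrupted` could then be restored from another copy before the damage spreads to backups.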
As part of the MALACH project, studies were conducted on the relevance criteria that users employ to judge search results for audio [12, 15]. The results of [12] indicated that topics and summaries are generally helpful, but that information on the genre of the audio (e.g., interview, debate, report) was also judged relevant, as was time information: both the time frame of the audio and its recency.
In this paper we will focus on how to improve access to collections from the CH domain from an A/V archive user's perspective. Since studies into usage of A/V heritage collections are scarce, the first step towards gathering information
3
of a requirements analysis. This was conducted to gain insight into user needs and the current state of disclosure and access of Dutch audiovisual documents from the CH domain. The results will be used to formulate recommendations for further research and development in section 3.
2 Requirements analysis
If SDR systems were already in use for retrieval of information from A/V archives, actual users could be asked for their opinions on current systems and users' search actions could be logged. To my knowledge, there are only two Dutch A/V collections that can be searched and accessed online: Academia⁴, access to which is restricted to paying clients from educational institutes, and the Media and Communication section of the Memory of the Netherlands, an online showcase repository of digital CH content⁵. Search logs for A/V collections are not available⁶.
In order to gather information on user needs and current usage of Dutch CH collections, our starting point was a requirements analysis with collection keepers, who deal with requests from users in their daily practice.
2.1 Methodology
This requirements analysis was done in the form of semi-structured interviews with maintainers of audiovisual collections from the Dutch cultural heritage domain. Its objective was to meet the following three goals:
1. Gain insight into the current practice of disclosure of, and the realization of access to, historically relevant audio collections;
2. Determine information needs for a variety of users of audiovisual collections;
3. Estimate the acceptance of automatic indexing technology from the point of view of collection maintainers.
Maintainers of different Dutch spoken word collections participated: two archivists from the Rotterdam Municipal Archives (GAR)⁷, an archivist and an editor-in-chief from the regional radio station Radio Rijnmond (RTV)⁸, a maintainer of an oral history collection at the Royal Netherlands Institute of Southeast Asian and Caribbean Studies (KITLV)⁹, a radio documentalist from the Netherlands Institute for Sound and Vision (SV)¹⁰, and the general affairs director and the audio expert of the Meertens Institute for Research and Documentation of the Dutch Language (MI)¹¹. In this selection, keepers of both
4 http://www.academia.nl
5 http://www.geheugenvannederland.nl
6 After the analysis reported in this paper was completed, some of the logs from the Netherlands Institute for Sound and Vision have become available for research.
7 http://www.gemeentearchief.rotterdam.nl
8 http://www.rijnmond.nl
9 http://www.kitlv.nl
10 http://www.beeldengeluid.nl
11 http://www.meertens.knaw.nl
[...]cluded.
The group of participants is far from exhaustive, though. Other CH spoken word collections in the Netherlands include the KomMissieMemoires¹², het Geheugen van Nederland ('memory of the Netherlands')¹³, the collection of interviews in Dutch of the Shoah Visual History Foundation¹⁴, Groningen dialect speech¹⁵, and (radio) podcasts¹⁶.
The questionnaire that was used for the semi-structured interview consisted of questions concerning:
- the audio collections and their maintenance;
- disclosure of the collections;
- searchability and accessibility of the collections;
- the types of users and their uses;
- acceptance of automatic indexing technology in the archiving workflow.
The full list of questions can be found in the appendix. As for information gathered on the types of users and the current practice of disclosure and access, there is some overlap with the PrestoSpace study. The current study furthermore aimed at gathering information on, e.g., requests from users and the frequency of requests, practices at smaller archives, and maintenance of non-broadcast collections, to get a more complete impression of audio archiving in Dutch CH.
The results of the interviews will be presented in the following subsections. First, the collections and their maintenance will be described. This is followed by a discussion of the disclosure of those collections, and of how users can gain access to the content of the collections at present. Next, the different user groups and their information needs will be presented. Finally, the role that audio indexing can play in the disclosure and accessibility of such collections will be discussed.
2.2 Audio collections and their maintenance
The types of collections maintained by the institutes that contributed to this analysis are broadcast collections and collections developed for research purposes, either in the oral history domain or for linguistic research.
Oral history collections consist of eyewitness reports on a certain event or period in history. The field of oral history is concerned with the relation between the history that has been recorded in books and the memories of individuals [16]. An example is the 'Memories of the East' collection on the end of the Netherlands' colonial presence in Asia [23]. Collections that are quite similar with respect to their audio and speech characteristics are other interview collections, for instance those gathered to make documentaries, such as the 'In mei, Rotterdam 1940' collection on the bombardment of the city of Rotterdam during World War
12 http://www.ru.nl/kdc/beeldengeluid/kommissiememoires/
13 http://www.geheugenvannederland.nl/indexen.html
14 http://www.jhm.nl/default.aspx
15 http://www.gava.nl/
16
the Meertens Institute. The approximately 5,000 hours of audio maintained there (80% of which has been digitized) are being used to investigate Dutch language variation.
The largest repository of historical audio in the Netherlands is SV, where the radio and TV archives of the Dutch national broadcasters are kept: it consists of over 700,000 hours of multimedia archives. Smaller broadcast collections, i.e., from regional radio and television stations, are mostly maintained at municipal archives and at the broadcasters themselves. The Rotterdam Municipal Archives, for instance, maintain over 2,000 hours of regional radio broadcasts from RTV Rijnmond that have been digitized and disclosed, and a multiple of that amount which remains undisclosed, mainly on analog data carriers.
Digitization is a trend in the cultural heritage domain, as analog carriers are decaying and their playback devices are becoming obsolete. Archivists and maintainers, however, tend to prefer the data carriers whose durability, reliability and quality are well known: analog carriers. When analog data are transferred to the digital realm, the content of analog carriers is, in the Netherlands, most often copied to digital data carriers such as CD(-ROM)s or DVDs. At the moment, this is the case at all institutes that participated in this study. Even though this may guarantee the collections' preservation for a while, it does not aid their accessibility. For that, digital mass storage systems (DMSSs) should be used, see http://www.prestospace.org. The Dutch maintainers recognize the importance and the added value of such a system, but not all institutes actually have plans to start using DMSSs in the near future (for instance due to lack of funding and expertise).
In addition to digital materials arising from the conversion of existing archives from analog to digital formats, radio and TV broadcasts and also oral history initiatives are nowadays being recorded digitally. The absence of digital standards is illustrated by an example from a born-digital collection of the KITLV. Its 'Memories of the East' collection was recorded on MiniDisc, but a new medium is needed since MiniDisc recorders are becoming obsolete. As digital archiving standards are still under development, archivists may tend to postpone the digitization process until interoperability, uniformity and quality can be guaranteed.
2.3 Disclosure of collections
There are basically two types of descriptors to disclose A/V documents: content annotations and context annotations. Content descriptors are, for instance, summaries, full transcripts and keywords. They describe what the document is about. Context descriptors concern production date, data carrier, etc., i.e., the technical details of the document. With respect to the content, descriptions are found at the level of tracks, i.e., coherent chunks of several minutes each, such as at GAR and at the KITLV, and at the level of programs, with a resolution of several minutes to an entire hour or more, such as at RTV Rijnmond and SV. These differences in the granularity of description directly impact the granularity with which access to the collection can be provided.
The amount of effort put into the descriptions differs greatly within as well as between collection types. Research collections, for instance, are generally more
at SV. This is mainly due to the different company goals, i.e., broadcasting and archiving, respectively. At RTV, until 1995 only a few keywords and sometimes a short content description were used to annotate a program. As a result, over a decade of material is relatively poorly disclosed (the channel started broadcasting in 1983). After 1995, RTV used program scripts and written news items to make more elaborate descriptions of broadcast news, but the content of interviews, for instance, was not elaborately disclosed. Radio broadcasts at SV are annotated much more elaborately. There is a protocol that prescribes the amount of time the description of a certain type of program may cost, the vocabulary to be used and the types of information to preserve. Not all radio broadcasts are annotated, however; news and sports are fully described, but only a selection of the other programs is annotated. If available, program scripts and other collateral documents are used for manual disclosure and description.
SV has documentalists specialized in archiving broadcast materials, but at RTV archiving is the responsibility of the program makers. As a result, descriptions may not be fully accurate, and programs may even not be archived at all. A simple example of inaccuracy is spelling mistakes, which may effectively make documents irretrievable, at least until more advanced search systems are implemented.
Research collections are generally described more elaborately, as is the case with the KITLV's Memories of the East collection¹⁷. The metadata per audio document have been entered into a database that, in addition to an elaborate summary of the content, contains fields for all kinds of information, such as personal data (name, birth date, family constitution, place of residence, occupation, etc.), and information on the interview itself (such as date and duration). Each audio document is partitioned into a number of 10-minute tracks, which enables the retrieval of relatively short fragments.
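Partitioning a recording into fixed-length tracks, as described for the KITLV collection, is what lets a description point to a fragment rather than to a whole interview. A minimal Python sketch of that bookkeeping, assuming fixed 10-minute tracks and times in seconds (the function names are hypothetical):

```python
TRACK_SECONDS = 10 * 60  # 10-minute tracks


def track_boundaries(duration_s: float, track_s: float = TRACK_SECONDS) -> list:
    """Split a recording into consecutive (start, end) tracks in seconds; the last track may be shorter."""
    duration_s = float(duration_s)
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + track_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def track_for_offset(offset_s: float, track_s: float = TRACK_SECONDS) -> int:
    """Return the 0-based index of the track that contains a given time offset."""
    if offset_s < 0:
        raise ValueError("offset must be non-negative")
    return int(offset_s // track_s)
```

For a 25-minute interview this yields two full tracks and a final 5-minute track, and a point of interest 20 minutes 50 seconds into the recording falls in the third track (index 2).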
There have been initiatives to make the usually lengthy description process more efficient. For instance, fairly recently SV started using a new catalog system with an online search interface, iMMix, and descriptions in the new system are structured such that program characteristics are annotated at one level only, preventing double inputs. In some cases, metadata databases may be delivered to archives together with the audiovisual data, and incorporating such digital metadata into the archival descriptions may save time. It is, however, not without problems, since in addition to format conversions, the sets of descriptors may differ between the metadata database and the archiving standards, and the (controlled) vocabularies used may also differ substantially.
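Incorporating delivered metadata typically requires a crosswalk: a mapping from the delivered fields to the archive's own fields, plus a mapping from delivered terms onto the archive's controlled vocabulary. The Python sketch below is illustrative only; all field names and vocabulary entries are hypothetical. It maps what it can and reports unmapped fields and unknown terms for manual review, which is where the time savings described above are limited.

```python
# Hypothetical mapping from fields in a delivered metadata database to the archive's fields.
FIELD_MAP = {"prog_title": "title", "air_date": "broadcast_date", "tags": "keywords"}
# Hypothetical mapping from delivered keywords to the archive's controlled vocabulary.
VOCAB_MAP = {"WW2": "World War II", "bombing": "bombardments"}


def crosswalk(record: dict):
    """Map one delivered record onto archive fields.

    Returns (mapped_record, unmapped_fields, unknown_terms); the last two need manual review.
    """
    mapped, unmapped_fields, unknown_terms = {}, [], []
    for source_field, value in record.items():
        target = FIELD_MAP.get(source_field)
        if target is None:
            unmapped_fields.append(source_field)
            continue
        if target == "keywords":
            terms = []
            for term in value:
                if term in VOCAB_MAP:
                    terms.append(VOCAB_MAP[term])
                else:
                    unknown_terms.append(term)
            value = terms
        mapped[target] = value
    return mapped, unmapped_fields, unknown_terms
```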
Not all Dutch audiovisual CH collections are being disclosed. Factors that contribute to the decision of whether materials should be disclosed are the (historic) importance of the recording, the cost of maintenance and materials, copyright issues and the collection's uniqueness. Apart from conscious decisions on whether or not a collection should be disclosed, lack of time, manpower and resources further prevent collections from being annotated.
17
This section will discuss the different collections' current accessibility and searchability. Access and search will first be discussed for the scenario in which the user is at the institute where the collection is maintained, and second for the case in which the user searches from home using the Web.
If an archive is open to the general public (the MI, for instance, is not), its catalog can generally be searched through some user interface in the institute's reading room. Standard search options allow keyword search and free text search, and more specific options may be available. Keywords are inherently restricted to those terms that appear in a controlled vocabulary (which often remains unknown to the user). As a result, the user cannot decide how a query could be improved if no results or unsatisfying results are found. Even though there are methods in information retrieval to work around such problems, e.g., thesauri and spelling correction/suggestion, these do not seem to be widely used in search engines providing access to databases on historical audio collections. Often, collection keepers are consulted. Since they know the contents of the collections, they are able to find fragments that could not be found by a relatively naive searcher or that have not been documented as such in the database. Moreover, these specialists can give detailed information on data formats, conversion possibilities, copyright issues, etc.
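As an illustration of the thesaurus and spelling-suggestion methods mentioned above, the following Python sketch maps a user's query term onto a controlled vocabulary: exact matches first, then a thesaurus lookup for non-preferred synonyms, then close spellings via the standard library's `difflib.get_close_matches`. The vocabulary and thesaurus entries are hypothetical, and the 0.6 similarity cutoff is simply `difflib`'s default.

```python
import difflib

# Hypothetical controlled vocabulary and thesaurus (non-preferred synonym -> preferred term).
CONTROLLED_VOCABULARY = ["bombardment", "elections", "flood", "harbour", "reconstruction"]
THESAURUS = {"inundation": "flood", "voting": "elections"}


def suggest_terms(query_term: str, n: int = 3) -> list:
    """Map a free-text query term onto controlled-vocabulary keywords."""
    term = query_term.strip().lower()
    if term in CONTROLLED_VOCABULARY:
        return [term]
    if term in THESAURUS:
        # Non-preferred synonym: redirect to the preferred vocabulary term.
        return [THESAURUS[term]]
    # Otherwise offer spelling suggestions drawn from the vocabulary itself.
    return difflib.get_close_matches(term, CONTROLLED_VOCABULARY, n=n, cutoff=0.6)
```

Shown in the interface as "did you mean", such suggestions would also expose parts of the controlled vocabulary that otherwise remain unknown to the user.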
If a catalog search has a number of results, it lists (part of) the descriptions, but does not give a link to the audio itself. This is in contrast with search actions for photos, maps, texts or other 2D media, which can often be shown directly. So with respect to audio or video documents, the user ends up with several IDs that refer to the actual audio documents. At the GAR, users can then listen to the documents in the audiovisual self-service room, where copies on CD or VHS tape are available for exploration. If the user wants a copy of the materials for his/her own use, it can be requested from the service desk. For the oral history collection of the KITLV, access to the audio is organized similarly: after identifying relevant audio descriptions, the audio documents are requested from a counter clerk. In contrast with the audio maintained at the municipal archives, however, this collection can only be listened to and studied at the KITLV. Copies are not distributed, to prevent privacy breaches and out-of-context presentation of the sometimes sensitive materials. Another, very practical reason that was given for not linking the audio directly to the search results is that this would require much memory capacity of the institute's network.
The numbers of requests for searches in spoken word collections are generally small. At RTV there are several requests per day, the KITLV oral history collection receives about one request per day, and at the Rotterdam Municipal Archives and at the Meertens Institute there are only a few per month. The numbers of requests at SV are unknown to the author, but one of its documentalists mentioned that there were few requests for radio broadcasts specifically.
In many cases, users nowadays do not initiate a search by visiting an archive, but start at home behind their personal computer and search either the Web, using one of the well-known search engines, or a particular institute's website. From those institutes' websites the catalogs can usually be searched online. Search options are generally the same as from within the institute, and the audio documents that search results refer to cannot be listened to. A visit to
uation relate to the descriptions, the user interface and the use of a DMSS server. Firstly, maintainers remarked that the more detail is described, the easier it is to fulfill requests. In the case of archiving broadcasts, annotating quotes and remarkable background sounds such as barrel organs has proven to be valuable. Moreover, maintainers learn from experience which topics they encounter in user requests; this can make them adapt the way in which they disclose new materials. Secondly, search interfaces could in some instances be more user-friendly. One way of realizing this, according to the participants, is by offering a standard and an advanced search screen: the user can either enter a number of search terms into the general search field, or specify the nature of a number of search terms to retrieve documents more precisely. Thirdly, direct access to audio and video documents is thought to improve on the current situation. It would significantly reduce the time that elapses between entering a request and listening to actual results. This would be very beneficial when producing news items in response to unexpected events. Moreover, users could assess the usefulness of the materials faster and new user groups could be reached. Audio collections stored on digital data carriers, as they are now, cannot support this scenario; therefore, DMSSs should be used. For most collections, however, online access should be carefully designed to prevent copyright violations and misuse.
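The distinction between a standard and an advanced search screen can be illustrated with a small Python sketch over hypothetical catalog records: the standard screen matches query terms in any field, while the advanced screen ties each term to a named field, which retrieves documents more precisely. The record structure and field names are assumptions, and real catalog search would add stemming and ranking.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    doc_id: str
    fields: dict = field(default_factory=dict)  # e.g. {"person": ..., "location": ..., "summary": ...}


def _tokens(text: str) -> set:
    return set(text.lower().split())


def standard_search(records, query: str) -> list:
    """Standard screen: every query term must occur in some field of the record."""
    terms = _tokens(query)
    return [r.doc_id for r in records
            if terms <= set().union(*(_tokens(v) for v in r.fields.values()))]


def advanced_search(records, constraints: dict) -> list:
    """Advanced screen: each term is tied to a field, e.g. {"location": "Rotterdam"}."""
    return [r.doc_id for r in records
            if all(_tokens(term) <= _tokens(r.fields.get(name, ""))
                   for name, term in constraints.items())]
```

With one record about a person named Jan van Rotterdam interviewed in Amsterdam and another recorded in Rotterdam, a standard query for "Rotterdam" returns both, whereas restricting the term to the location field returns only the second.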
2.5 Users and uses of historical audio collections
Users of historical audio archives can be divided into two main groups: professional users and the general public. At GAR about 75% of the users are professionals (e.g., makers of new content, researchers¹⁸). SV, too, is mainly searched by professional users. The collection of the MI is exclusively for research purposes. The KITLV collection's users are mainly researchers, students and content producers, but it is also consulted by the general public.
Professional users: An important user group for audiovisual (broadcast) archives is the makers of new content. This group is very diverse: e.g., exhibition makers, event organizers, companies, makers of films or documentaries, artists, and both local and national broadcasters. Moreover, the keepers of the archives themselves also function as makers of new content, especially in the case of SV and RTV. As a whole, this user group has two main uses for the materials: (i) research during the preparation of a production, and (ii) content for a production. Mainly in the second case, time pressure may be high, as news producers want to react as soon as possible to sudden events such as accidents and disasters. Content producers tend to search audio collections for (combinations of) events, keywords, locations and persons. In some cases, they look for sound impressions, such as the sound of a harbor or city, but these may be very difficult to find as they are normally not annotated.
Another group of professional users consists of researchers, students and teachers. These users usually pose very specific research questions in comparison with
18 In contrast to the PrestoSpace survey, we include researchers amongst the group of
the elaborate summaries works relatively well. For most of the other collections, keyword search is the most promising according to the collection keepers. Researchers may be interested in all kinds of subjects and will search for (combinations of) names, keywords, events, periods and locations in order to find results to incorporate into their research, writing and teaching. The MI's collection, which was specifically built for linguistic research purposes, is somewhat different. When studying language structure, the question of how things are said (e.g., pronunciation, grammatical structure, intonation) is often much more important than the question of what is said. The users of this collection often make their own, time-consuming annotations.
General public: The second type of users is the general public. They typically search for audio documents that fit their personal interests, such as a hobby or their family history. Their requests are mostly for names of persons, companies, locations and/or events, in the case of both the broadcast and the oral history collections.
2.6 A role for automatic indexing
The cultural heritage domain has been characterized as somewhat hesitant towards technological development concerning the automatic indexing of collections. This is understandable, for instance because automatically generated transcripts are certainly not error-free (as opposed to manually checked descriptions). Still, given the vast size of several audio collections that have not been disclosed, and the manual labor of one to ten times real time that disclosure would cost, archivists and collection maintainers understand the added value of automatic indexing. They furthermore suggested another use: automatic indexing could be employed to provide archivists with an impression of a collection, on the basis of which a selection for full disclosure can be made. A number of participants explained that their deposits held tapes for which the exact content was unknown.
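To make the one-to-ten-times-real-time estimate concrete, the sketch below applies it to a collection of 5,000 hours (the approximate size of the MI collection, used here only as an example size). The figure of 1,600 working hours per person-year is an assumption for illustration.

```python
WORK_HOURS_PER_YEAR = 1600  # assumption: roughly one full-time person-year


def manual_effort_hours(audio_hours: float, real_time_factor: float) -> float:
    """Annotation effort in person-hours; a factor of 1 means as long as the audio itself."""
    return audio_hours * real_time_factor


for factor in (1, 10):
    hours = manual_effort_hours(5000, factor)
    print(f"{factor:>2}x real time: {hours:,.0f} person-hours, "
          f"about {hours / WORK_HOURS_PER_YEAR:.1f} person-years")
```

Under these assumptions, full manual disclosure of such a collection would take between 5,000 and 50,000 person-hours, i.e., roughly 3 to 31 person-years.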
According to the interviewees, there are certain restrictions on what can be expected from automatic indexing. First, since human interpretation is lacking, certain abstractions, i.e., higher-level annotations, cannot easily be made. As a result, it was expected that users looking for journalistic content (i.e., facts) would have fewer problems retrieving relevant automatically annotated documents than users looking for artistic content (i.e., sound impressions). Moreover, there is the (partial) mismatch between the words that are spoken and the more abstract topic that is being talked about. Second, when collections are to be used for certain types of research, automatic transcription may not be suitable at all, since researchers need manually checked indexes at layers of information that may abstract from the words (e.g., communicative acts, prosody). To generate a first version of an index, however, speech technology might be employed to reduce the amount of work (which is exactly what has been proposed in the PrestoSpace project). Thirdly, searchers are unfamiliar with a situation in which the annotations on which search is based are not manually checked. They must therefore receive instructions on the probabilistic
preferably at multiple levels of abstraction.
3 Discussion
In this report we set out with three goals in mind: (i) to gain insight into the current practice of disclosure and the realization of accessibility of Dutch historical audio collections, (ii) to gather information on the users of audiovisual collections and their needs, and (iii) to receive feedback from collection maintainers on the potential of automatic indexing technology in the audiovisual archiving workflow. Together these goals aimed at gathering user requirements for spoken document retrieval systems in CH. Those requirements, in turn, will be used to determine a research agenda for improving automatic disclosure of and access to spoken word collections from CH. In the rest of this section we will discuss the main findings of our requirements analysis, and also how those requirements can be met or addressed in future research.
Now that archives increasingly acknowledge the need to make their collections accessible to end users, in addition to the traditional tasks of describing and maintaining collections, the question of how to make collections available to the general public is being addressed. We found that access to Dutch audiovisual collections, in line with its European counterparts [5], is relatively slow and cumbersome at present: e.g., direct, online access to audiovisual content is basically nonexistent, short content descriptions seem insufficient to meet the wide variety in users' information needs, and many collections have not yet been annotated, which makes them almost inaccessible. Problems in disclosure, which is a prerequisite for access, are mainly caused by the cost, both in time and in manpower, of producing elaborate, high-quality annotations. Access is furthermore complicated by the fact that the digital infrastructure in archives is in many cases not yet ready for online presentation of audiovisual content (provided that IPR issues etc. allow publication).
The first requirement is to make access faster. This is expected to be realizable (i) by presenting content online, and (ii) by making search results more focused, i.e., by retrieving pointers to relevant locations within documents instead of to entire documents. For online presentation, audio sources should be digitally available and linked to the results from catalog search. If necessary, this could be arranged via a log-in procedure to prevent IPR violations and/or misuse. Online presentation moreover entails the development of an infrastructure that supports data management, and also retrieval and presentation of both metadata and time-labeled spoken content. Most of these developments fall outside the scope of the CHoral project, but are being taken up at the CH institutes themselves and in other research projects.
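Retrieving pointers to locations within documents presupposes time-labeled spoken content, such as the word-level time stamps an ASR system produces. Under that assumption, the Python sketch below returns (document, time) entry points where query terms are spoken and merges hits that lie close together; the transcript format, document IDs and the 30-second merge window are hypothetical choices for illustration.

```python
from typing import NamedTuple


class Word(NamedTuple):
    token: str
    start: float  # seconds from the start of the recording


def jump_in_points(transcripts: dict, query: str, merge_window: float = 30.0) -> list:
    """Return (doc_id, start_time) entry points where query terms are spoken.

    transcripts maps a document ID to its time-aligned words in temporal order.
    A hit within merge_window seconds of the previous hit in the same document
    is treated as part of the same passage and does not create a new entry point.
    """
    terms = set(query.lower().split())
    points = []
    for doc_id, words in transcripts.items():
        last_hit = None
        for w in words:
            if w.token.lower() in terms:
                if last_hit is None or w.start - last_hit > merge_window:
                    points.append((doc_id, w.start))
                last_hit = w.start
    return points
```

A player can then start playback at each returned time instead of at the beginning of an hour-long program.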
The second way of making access faster is being researched in the CHoral project, and has been addressed in other spoken document retrieval projects such as MALACH and the National Gallery of the Spoken Word. Automatic content indexing and audio processing tools are being developed to increase the time resolution of search results through the addition of time labels to the spoken content, or to highlights within the documents. Since current descriptions may lack much detail, such technology has the potential to increase the numbers
automatic annotation, mainly caused by the statistical nature of the techniques employed. Content-based annotation of spoken word documents is usually done using automatic speech recognition (ASR). Word Error Rates on documents with spontaneous conversational speech lie in the range of 40-60% for a number of languages, see e.g., [3, 8, 11]. In principle, only correctly recognized words can be successfully retrieved. Correct recognition depends on the suitability of the acoustic and language models for the recognition task at hand.
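Word Error Rate is conventionally computed from a word-level Levenshtein alignment between a reference transcript and the recognizer output: WER = (S + D + I) / N, with S substitutions, D deletions, I insertions and N reference words. A minimal Python sketch of that standard definition (without the text normalization that scoring tools usually apply):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimal edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For instance, if the six-word reference "het bombardement op rotterdam in mei" is recognized as "het bombarderen op rotterdam mei", there is one substitution and one deletion, giving a WER of 2/6 (about 33%); a query for "bombardement" would not find this passage, which is the retrieval consequence noted above.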
Audio pre-processing tools such as Speech Activity Detection (SAD) and speaker segmentation are generally referred to as audio diarization. Diarization aims at determining which audio intervals contain speech and of which type, so that the ASR engine is only fed speech and not music, and so that models can be adapted to the data. Performance on broadcast news audio is high, with miss and false alarm rates around 1% (see [24] for an overview), but the SAD error rate is significantly higher, i.e., around 11%, on more heterogeneous data sets [10].
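Miss and false alarm rates for speech activity detection can be computed by comparing reference and hypothesized speech/non-speech labels frame by frame. The Python sketch below uses one common convention (misses normalized by reference speech frames, false alarms by reference non-speech frames); evaluation campaigns define their own normalizations and scoring collars, so this is illustrative and not the scoring used in [10] or [24].

```python
def sad_rates(reference, hypothesis):
    """Frame-level miss and false-alarm rates for speech activity detection.

    reference, hypothesis: equal-length sequences of booleans, True = frame contains speech.
    Assumed convention: miss rate over reference speech frames,
    false-alarm rate over reference non-speech frames.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have equal length")
    speech = sum(1 for r in reference if r)
    nonspeech = len(reference) - speech
    misses = sum(1 for r, h in zip(reference, hypothesis) if r and not h)
    false_alarms = sum(1 for r, h in zip(reference, hypothesis) if h and not r)
    miss_rate = misses / speech if speech else 0.0
    fa_rate = false_alarms / nonspeech if nonspeech else 0.0
    return miss_rate, fa_rate
```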
In line with findings from the PrestoSpace requirements study, Dutch audiovisual collections are mostly used by professionals, both content producers and researchers/students¹⁹. The second requirement therefore is to support search by these user groups. A logical first step in further research would be to study how these users are supported by the current state of the art in spoken document retrieval, in comparison with running demonstrator systems such as the Radio Oranje demonstrator [25]. A next step would then be to adapt the user interfaces based on the users' preferences and feedback.
Another requirement pertains to the types of information users want to find. They mainly search for (combinations of) events, keywords, locations and names of persons/companies. A significant proportion of requests therefore contains named entities, which, however, are not straightforwardly extracted from spoken content automatically. Both approaches to named entity recognition and optimal use of information on named entities present in manual metadata should be investigated further in order to support searchers. To enhance recognition of named entities, or Out-Of-Vocabulary (OOV) terms in general, several approaches have been put forward, e.g., multi-pass recognition [6], query and document expansion [27], and subword approaches [2, 17]. As part of the CHoral project, search in phoneme lattices derived from word lattices will be investigated further with the goal of improving retrieval of named entities and OOVs in Dutch. The current analysis further showed that users do not often search for time-related concepts such as dates or periods, and if they do, it is mainly in combination with other types of terms.
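The idea behind subword search for OOV terms can be sketched as approximate matching of a query's phoneme sequence against recognized phonemes. The Python sketch below deliberately simplifies: it searches a single best phoneme string rather than a lattice, uses hypothetical SAMPA-like symbols, and allows a fixed number of edit errors using semi-global edit distance (Sellers' algorithm), so a match may start anywhere. It is not the CHoral implementation.

```python
def fuzzy_phone_hits(query: list, phones: list, max_errors: int = 1) -> list:
    """Return end positions j (exclusive index into `phones`) where `query` matches with <= max_errors edits.

    D[i][j] is the cheapest alignment of the first i query phones against some
    substring of `phones` ending at position j; only one column is kept at a time.
    """
    m = len(query)
    prev = list(range(m + 1))  # column j = 0: unmatched query phones cost one each
    hits = []
    for j, phone in enumerate(phones, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == phone else 1
            cur[i] = min(cur[i - 1] + 1,      # query phone not matched (deletion)
                         prev[i] + 1,         # extra recognized phone (insertion)
                         prev[i - 1] + cost)  # match or substitution
        if cur[m] <= max_errors:
            hits.append(j)
        prev = cur
    return hits
```

Because the query is matched as a phoneme sequence, a name that is absent from the recognizer's word vocabulary can still be found as long as its phonemes were recognized approximately; a lattice-based version would additionally consider alternative phoneme hypotheses.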
A seemingly obvious requirement is that search interfaces should be made more user-friendly. In both the PrestoSpace survey and the present requirements analysis, search interfaces for exploring an archive's catalog were often judged insufficiently user-friendly by collection keepers. To improve on the current situation, tools that have demonstrated added functionality in research systems should be tested on systems in public use. Moreover, additional tools to support searching, browsing, and selection of audiovisual documents are needed. For instance, thanks to the increased granularity of automatic indexing, time-aligned metadata can be presented while the A/V documents are being
19 Given the popularity of websites such as YouTube (http://www.youtube.com/), however, the interest of the general public in online interactivity with AV documents may increase in
assessmentof A/Vdo uments byrepresentingthe ontentstextually, [18,20℄,
and/orvisually,[21,26℄.
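The following Python sketch shows one way time-aligned metadata could be kept in sync with playback. It assumes the metadata comes as segments with start and end times (for example, derived from speech recognition word timings) that do not overlap. A binary search then finds the segment to highlight at the current playback position. The class, field and label names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the recording
    end: float    # exclusive end time in seconds
    text: str     # e.g. an ASR transcript fragment or a keyword label

class TimeAlignedIndex:
    """Find the metadata segment active at a playback position.

    Assumes segments do not overlap; they are sorted by start time once.
    """

    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda s: s.start)
        self.starts = [s.start for s in self.segments]

    def at(self, t):
        """Return the segment covering time t, or None if t falls in a gap."""
        i = bisect_right(self.starts, t) - 1  # last segment starting at or before t
        if i >= 0 and t < self.segments[i].end:
            return self.segments[i]
        return None

    def between(self, t0, t1):
        """Return all segments overlapping [t0, t1), e.g. for a scrolling transcript."""
        return [s for s in self.segments if s.start < t1 and s.end > t0]

# Hypothetical segments for one recording.
index = TimeAlignedIndex([
    Segment(0.0, 4.2, "announcement"),
    Segment(4.2, 9.0, "speech"),
    Segment(10.0, 12.5, "music"),
])
print(index.at(5.0).text)  # speech
print(index.at(9.5))       # None (gap between segments)
```

A player interface would typically call `at()` whenever the playback position changes and use `between()` to fill a transcript pane around the current position. Sorting once and searching with `bisect` keeps each lookup logarithmic in the number of segments.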
Many user support tools developed so far were intended for use on specific collections with specific document types. Audiovisual archives typically hold a large variety of document types, which complicates the situation for the user. Retrieved results should not only be on-topic; information such as their genre should, for instance, also be available to searchers. As far as we know, automatic detection of genre in spoken word documents is an open research question, as is the way such information should be presented in the user interface. First steps towards such a classification could be made by using diarization technology to estimate the number of speakers and their turn patterns.
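As a first step of the kind described above, the following Python sketch turns diarization output into turn statistics that could serve as coarse genre cues. It assumes the diarization system delivers `(start, end, speaker)` tuples. Merging consecutive same-speaker segments into turns is straightforward. The 0.9 dominance threshold and the three format labels are hypothetical illustrations, not a validated genre classifier.

```python
from collections import defaultdict

def turn_statistics(segments):
    """Summarise speaker turns from diarization output.

    segments: iterable of (start_sec, end_sec, speaker_label) tuples.
    Consecutive segments by the same speaker are merged into one turn.
    """
    turns = []  # list of [speaker, accumulated duration]
    for start, end, spk in sorted(segments):
        if turns and turns[-1][0] == spk:
            turns[-1][1] += end - start
        else:
            turns.append([spk, end - start])
    talk_time = defaultdict(float)
    for spk, dur in turns:
        talk_time[spk] += dur
    total = sum(talk_time.values())
    return {
        "num_speakers": len(talk_time),
        "num_turns": len(turns),
        "mean_turn_sec": total / len(turns) if turns else 0.0,
        "dominant_share": max(talk_time.values()) / total if total else 0.0,
    }

def guess_format(stats, dominance=0.9):
    """Toy rule of thumb (hypothetical threshold) mapping turn stats to a label."""
    if stats["num_speakers"] <= 1 or stats["dominant_share"] >= dominance:
        return "monologue-like"
    if stats["num_speakers"] == 2:
        return "dialogue-like"
    return "multi-party"

# Hypothetical diarization output (seconds, speaker label).
interview = [(0, 10, "A"), (10, 25, "B"), (25, 30, "A"), (30, 50, "B")]
speech = [(0, 60, "A"), (60, 62, "B"), (62, 120, "A")]
print(guess_format(turn_statistics(interview)))  # dialogue-like
print(guess_format(turn_statistics(speech)))     # monologue-like
```

Turn counts and speaker dominance are only weak cues. A real classifier would be trained and evaluated on labelled archive material, and would probably combine these cues with lexical and acoustic features.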
An earlier report that investigated the attitude of archivists towards audio indexing technology showed that archivists are ambivalent towards the technology, [28]. On the one hand, they acknowledged its potential added value. On the other hand, when confronted with, for example, imperfect speech transcripts, archivists may become skeptical about the usefulness of the automatically generated metadata. The present analysis showed that most participants welcomed automatic solutions for annotation, and they also put forward new ideas for applying the technology. Keepers of research collections, however, were understandably more hesitant, since their collections need exact annotations for scientific analysis that in most cases cannot (yet) be generated automatically.
4 Conclusion
The findings of this study provide more specific instructions on which lines of research to pursue in order to improve search in A/V archives:
- Focus on two target groups in user and usability studies: producers and researchers;
- Further automatic indexing technology for spoken documents, i.e., diarization and speech recognition;
- Research and develop ways to deal with OOV queries, mainly with named entities;
- Develop classification tools for generating higher-level semantics, e.g., topic classification;
- Optimally use the index for content representation in the user interface;
- Test the usability of the UI and its components in ecologically valid settings.
Most of these issues have been taken up in the CHoral project.
Acknowledgements. This paper is based on research funded by the CATCH program (http://www.nwo.nl/catch) of the Netherlands Organisation for
views with collection keepers were:
1. In 1999, Kooijman published an inventory of the audiovisual collections in Dutch archives. Which changes/additions can you report for your institute?
2. Which types of collections do you maintain?
◦ radio/TV
◦ oral history
◦ other types of interviews
◦ speeches/monologues
◦ other
3. Does your archive contain digital audio or video materials? If so, are the files kept on hard drives or on data carriers?
4. How have the materials been disclosed? Which type of metadata/description of these materials is available?
5. How can a user get access to the materials from within the archive?
6. How can a user get access to the materials from outside of the archive?
7. How often do you receive requests for audio documents?
8. Which groups of users do those requests come from?
9. Which types of queries do users have? Are there any topics that are asked for regularly?
10. What type of information do users search for?
◦ names of persons or places
◦ dates or periods
◦ events
◦ keywords
◦ a particular topic
◦ speaker profile
◦ other
11. What do searchers want to use the information for?
12. Could disclosure and access of the collection(s) you maintain be improved? If so, how?
13. What is your opinion on developments in speech and language technology for spoken document retrieval? (This question was always preceded by a short explanation of the state-of-the-art in SDR.)
14. Do you have any further comments?
References
[1] Giuseppe Amato, Juan Cigarran, Julio Gonzalo, and Carol Peters. Multimatch - multilingual/multimedia access to cultural heritage. In Proceedings of the 2nd Italian Research Conference on Digital Library Management Systems, 2006.
[2] M.G. Brown, J.T. Foote, Gareth J.F. Jones, Karen Sparck Jones, and S.J. Young. Open-vocabulary speech indexing for voice and video mail retrieval. In
J. Psutka, B. Ramabhadran, D. Soergel, T. Ward, and W-J. Zhu. Automatic recognition of spontaneous speech for access to multilingual oral history archives. IEEE Trans. Speech Audio Proc., 12(4), 2004.
[4] M.G. Christel. Evaluation and user studies with respect to video summarization and browsing. In Proceedings of IS&T/SPIE Symposium on Electronic Imaging, 2006. San Jose, CA.
[5] B. Delaney and B. Hoomans. PrestoSpace deliverable 2.1 User Requirements Final Report, 2004.
[6] P. Geutner, M. Finke, and A. Waibel. Selection criteria for hypothesis driven lexical adaptation. In ICASSP '99: Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference, pages 617–620, Washington, DC, USA, 1999. IEEE Computer Society.
[7] S. Gustman, D. Soergel, D. Oard, W. Byrne, M. Picheny, B. Ramabhadran, and D. Greenberg. Supporting access to large digital oral history archives. page 18.
[8] J.H.L. Hansen, R. Huang, B. Zhou, M. Deadle, J.R. Deller, A.R. Gurijala, M. Kurimo, and P. Angkititrakul. SpeechFind: Advances in spoken document retrieval for a national gallery of the spoken word. IEEE Transactions on Speech and Audio Processing, 13(5):712, 2005.
[9] W.F.L. Heeren, L.B. van der Werff, R.J.F. Ordelman, A.J. van Hessen, and F.M.G. de Jong. Radio Oranje: Searching the queen's speech(es). In C.L.A. Clarke, N. Fuhr, N. Kando, W. Kraaij, and A. de Vries, editors, Proceedings of the 30th ACM SIGIR, pages 903–903, New York, 2007. ACM. ISBN=978-1-59593-597-7.
[10] M.A.H. Huijbregts and C. Wooters. The blame game: Performance analysis of speaker diarization system components. In Proceedings of Interspeech 2007, page 4, Antwerp, 2007. International Speech Communication Association. ISSN=1990-9772.
[11] Marijn Huijbregts, Roeland Ordelman, and Franciska de Jong. Annotation of heterogeneous multimedia content using automatic speech recognition. In Proceedings of SAMT, 2007.
[12] J. Kim, D.W. Oard, and D. Soergel. Searching large collections of recorded speech: A preliminary study. In Proceedings of the Annual Conference of the American Society for Information Science and Technology, Long Beach, CA, 2003.
[13] S.R. Klemmer, J. Graham, G.J. Wolff, and J.A. Landay. Books with voices: paper transcripts as a tangible interface to oral histories. In Proceedings of CHI 2003, 2003. Ft. Lauderdale, Florida.
[14] T. Kouwenhoven. Zoeken Navigeren Vinden. Over zoekers, zoekgedrag, zoekmachines en hun interfaces bij het zoeken naar audiovisuele content, chapter 3, page 116.
[15] Katy Newton Lawley, Soergel Dagobert, and Xiaoli Huang. Relevance criteria used by teachers in selecting oral history materials. In Proceedings of the 68th Annual Meeting of the American Society for Information Science and Technology (ASIST), 2005.
[16] S. Leydesdorff. De mensen en de woorden. Meulenhoff, 2004.
[17] Beth Logan, Pedro Moreno, and Om Deshmukh. Word and sub-word indexing approaches for reducing the effects of OOV queries on spoken audio. In Proceedings of the second international conference on Human Language Technology Research,
recognition accuracy rates on the usefulness and usability of webcast archives. In Proceedings of CHI 2006, page 493.
[19] R.J.F. Ordelman, F.M.G. de Jong, and W.F.L. Heeren. Exploration of audiovisual heritage using audio indexing technology. In L. Bordoni, A. Krueger, and M. Zancanaro, editors, Proceedings of the first workshop on intelligent technologies for cultural heritage exploitation, pages 36–39, Trento, 2006. Università di Trento. ISBN=not assigned.
[20] A. Ranjan, R. Balakrishnan, and M. Chignell. Searching in audio: the utility of transcripts, dichotic presentation, and time-compression. In Proceedings of CHI 2006, 2006.
[21] Laura Slaughter, Douglas W. Oard, Vernon L. Warnick, Julie L. Harding, and Galen J. Wilkerson. A graphical interface for speech-based retrieval. In ACM DL, pages 305–306, 1998.
[22] D. Soergel, D. Oard, S. Gustman, L. Fraser, J. Kim, J. Meyer, E. Proen, and T. Sartori. The many uses of digitized oral history collections: Implications for design. Malach technical report, College of Information Studies, University of Maryland, June 2002.
[23] F. Steijlen. Memories of The East. KITLV Press, 2002.
[24] S.E. Tranter and D.A. Reynolds. An overview of automatic diarization systems. IEEE Transactions on Audio, Speech and Language Processing, 14(5):1557, 2006.
[25] L.B. van der Werff, W.F.L. Heeren, R.J.F. Ordelman, and F.M.G. de Jong. Radio Oranje: Enhanced access to a historical spoken word collection. In P. Dirx, I. Schuurman, V. Vandeghinste, and F. Van Eynde, editors, Proceedings of the 17th Meeting of Computational Linguistics in the Netherlands, pages 207–218, Utrecht, 2007. Landelijke Onderzoekschool Taalwetenschap.
[26] Steve Whittaker, Julia Hirschberg, John Choi, Donald Hindle, Fernando C.N. Pereira, and Amit Singhal. SCAN: Designing and evaluating user interfaces to support retrieval from speech archives. In Proceedings of SIGIR 99 Conference on Research and Development in Information Retrieval, pages 26–33, 1999.
[27] P.C. Woodland, S.E. Johnson, P. Jourlin, and K. Sparck Jones. Effects of out of vocabulary words in spoken document retrieval. In SIGIR 2000, 2000. Athens, Greece.
[28] E. Zuurbier. Onderzoek naar de haalbaarheid van Spoken Document Retrieval.