User requirements for access to Dutch spoken audio archives


Willemijn Heeren

Human Media Interaction

Faculty of Electrical Engineering, Mathematics and Computer Science

University of Twente

w.f.l.heeren@ewi.utwente.nl

Abstract

Searching archived audiovisual collections will change in the near future. Instead of sifting through kilometers of analog tapes in archives' deposits, end users will be able to explore the collections from behind a personal computer, either at an archive or at their home. A first step in the development of search technology and user interfaces suitable for supporting such access is finding out what users want and expect from the technology. Therefore, this report presents a requirements analysis conducted within the CHoral project, which is part of the NWO-CATCH program.

1 Introduction

The number of spoken word collections that can be searched via the Web or via internal networks is growing, on the one hand due to retrospective digitization, as for instance in radio archives, and on the other hand due to the increasing amount of digital-born audiovisual documents with a speech track, as for example in meeting and interview collections. These collections are being exploited for all kinds of purposes, e.g., in corporate environments (meeting recordings), by content producers (TV and radio archives), in research settings (oral history), for teaching (lecture recordings) and by the general public.

In the cultural heritage (CH) domain, however, audiovisual (A/V) collections in general are at risk of becoming inaccessible, because both the analog data carriers they are stored on are deteriorating and corresponding playback devices are becoming obsolete, and the documents are insufficiently disclosed to allow fast and easy access, see e.g., [3]. The preservation issue has been taken up in retrospective digitization projects for historic A/V collections, large-scale examples of which are the EU IST PrestoSpace¹ project and the Dutch 'Beelden Voor De Toekomst' ('images for the future')². The issue of improving disclosure of spoken word documents has been taken up in research projects. The general approach is to increase the granularity of annotations by automatically processing digitized A/V documents for index generation, by applying techniques

¹ http://www.prestospace.org/

²


innovative user interfaces.

In this paper we will focus on accessibility of spoken word audio collections, i.e. on collections where the speech is the main information layer. Research initiatives aiming to improve access to spoken word CH collections are, e.g., Multilingual Access to Large Spoken Archives (MALACH), which addressed the issue of automatic indexing and access technology for the Shoah Visual History Foundation interview collection [7], the National Gallery of the Spoken Word, the goal of which was to make American historical voices searchable online [8], and the CHoral project on spoken document retrieval (SDR) for Dutch historical collections [19]. In addition to the contribution these projects make to large vocabulary continuous speech recognition (LVCSR) and SDR, the study of collection usage and user needs received attention in the MALACH project [7, 12, 22] and is receiving attention in the CHoral project and the Multimatch project, which aims to develop a multilingual search engine for CH content [1]. In the development of access technology for spoken heritage collections, it is important to gain insight into how the collections are being used and for which purposes.

An earlier requirements analysis of A/V archiving in the CH domain was undertaken as part of the EU PrestoSpace project³. The goal of PrestoSpace is to study and facilitate digital preservation of Europe's audiovisual heritage. A survey undertaken as part of that project aimed at providing "an overview of potential users' functional requirements for the PrestoSpace factory tools and services" [5]. Three types of end users were identified: producers (58%), cultural institutes (21%), and researchers and private persons (21%). As for access to the collections, remote access was reported to be virtually nonexistent and on-line access was very limited. Catalogs could generally be consulted on-line by querying the metadata, but one out of every two archives was not satisfied with their system's performance. The participating audiovisual archives were convinced that online viewing and listening access would increase their sales, but that rights management would remain a problem. Another problem for online listening is that the data are not stored on servers. Digitization currently means that A/V data is transferred from analog carriers to digital ones instead of onto Digital Mass Storage Systems (DMSSs), as proposed by PrestoSpace. The reasons for using DMSSs are that they support fast and easy access to audiovisual data, and they can be used to check the integrity of the data in order to prevent data loss. Moreover, the exchangeability of materials is complicated by the current lack of standards for documentation. One of the challenges in changing access therefore lies with the management of collections at the archives themselves.

As part of the MALACH project, studies were conducted on the relevance criteria that users employ to judge search results for audio [12, 15]. The results of [12] indicated that generally topics and summaries are helpful, but that information on the genre of the audio (e.g., interview, debate, report) was also judged relevant, as well as time information: both the time frame of the audio and its recency.

In this paper we will focus on how to improve access to collections from the CH domain from an A/V archive user's perspective. Since studies into usage of A/V heritage collections are scarce, the first step towards gathering information

³


of a requirements analysis. This was conducted to gain insight into user needs and the current state of disclosure and access of Dutch audiovisual documents from the CH domain. The results will be used to formulate recommendations for further research and development in section 3.

2 Requirements analysis

If SDR systems were already in use for retrieval of information from A/V archives, actual users could be asked for their opinions on current systems and users' search actions could be logged. To my knowledge, there are only two Dutch A/V collections that can be searched and accessed online: Academia⁴, access to which is restricted to paying clients from educational institutes, and the Media and Communication section of the Memory of the Netherlands, an online showcase repository of digital CH content⁵. Search logs for A/V collections are not available⁶.

In order to gather information on user needs and current usage of Dutch CH collections, our starting point was a requirements analysis with collection keepers, who deal with requests from users in their daily practice.

2.1 Methodology

This requirements analysis was done in the form of semi-structured interviews with maintainers of audiovisual collections from the Dutch cultural heritage domain. Its objective was to meet the following three goals:

1. Gain insight into the current practice of disclosure of and the realization of access to historically relevant audio collections;

2. Determine information needs for a variety of users of audiovisual collections;

3. Estimate the acceptance of automatic indexing technology from the point of view of collection maintainers.

Maintainers of different Dutch spoken word collections participated: two archivists from the Rotterdam Municipal Archives (GAR)⁷, an archivist and an editor-in-chief from the regional radio station Radio Rijnmond (RTV)⁸, a maintainer of an oral history collection at the Royal Netherlands Institute of Southeast Asian and Caribbean Studies (KITLV)⁹, a radio documentalist from the Netherlands Institute for Sound and Vision (SV)¹⁰, and the general affairs director and the audio expert of the Meertens Institute for Research and Documentation of the Dutch Language (MI)¹¹. In this selection, keepers of both

⁴ http://www.academia.nl

⁵ http://www.geheugenvannederland.nl

⁶ After the analysis reported in this paper was completed, some of the logs from the Netherlands Institute for Sound and Vision have become available for research.

⁷ http://www.gemeentearchief.rotterdam.nl

⁸ http://www.rijnmond.nl

⁹ http://www.kitlv.nl

¹⁰ http://www.beeldengeluid.nl

¹¹ http://www.meertens.knaw.nl


included.

The group of participants is far from exhaustive, though. Other CH spoken word collections in the Netherlands include the KomMissieMemoires¹², het Geheugen van Nederland ('memory of the Netherlands')¹³, the collection of interviews in Dutch of the Shoah Visual History Foundation¹⁴, Groningen dialect speech¹⁵, and (radio) podcasts¹⁶.

The questionnaire that was used for the semi-structured interview consisted of questions concerning:

• the audio collections and their maintenance;

• disclosure of the collections;

• searchability and accessibility of the collections;

• the types of users and their uses;

• acceptance of automatic indexing technology in the archiving workflow.

The full list of questions can be found in the appendix. As for information gathered on the types of users and the current practice of disclosure and access, there is some overlap with the PrestoSpace study. The current study furthermore aimed at gathering information on, e.g., requests from users and the frequency of requests, practices at smaller archives, and maintenance of non-broadcast collections, to get a more complete impression of audio archiving in Dutch CH.

The results of the interviews will be presented in the following subsections. First, the collections and their maintenance will be described. This is followed by a discussion on the disclosure of those collections, and on how users can gain access to the content of the collections at present. Thirdly, the different user groups and their information needs will be presented. Finally, the role that audio indexing can play in the disclosure and accessibility of such collections will be discussed.

2.2 Audio collections and their maintenance

The types of collections maintained by the institutes that contributed to this analysis are broadcast collections and collections developed for research purposes, either in the oral history domain or for linguistic research.

Oral history collections consist of eyewitness reports on a certain event or period in history. The field of oral history is concerned with the relation between the history that has been recorded in books and the memories of individuals [16]. An example is the 'Memories of the East' collection on the end of the Netherlands colonial presence in Asia [23]. Collections that are quite similar with respect to their audio and speech characteristics are other interview collections, for instance gathered to make documentaries, such as the 'In mei, Rotterdam 1940' collection on the bombardment of the city of Rotterdam during World War

¹² http://www.ru.nl/kdc/beeldengeluid/kommissiememoires/

¹³ http://www.geheugenvannederland.nl/indexen.html

¹⁴ http://www.jhm.nl/default.aspx

¹⁵ http://www.gava.nl/

¹⁶


the Meertens Institute. The approximately 5,000 hours of audio maintained there (80% of which has been digitized) are being used to investigate Dutch language variation.

The largest repository of historical audio in the Netherlands is the SV, where the radio and TV archives of Dutch national television are being kept: it consists of over 700,000 hours of multimedia archives. Smaller broadcast collections, i.e. from regional radio and television stations, are mostly being maintained at municipal archives and at the broadcasters themselves. The Rotterdam Municipal Archives, for instance, maintain over 2,000 hours of regional radio broadcasts from RTV Rijnmond that have been digitized and disclosed, and a multiple of that amount which remains undisclosed, mainly on analog data carriers.

Digitization is a trend in the cultural heritage domain as analog carriers are decaying and their playback devices are going out of fashion. Archivists and maintainers, however, tend to prefer the use of those data carriers for which the durability, reliability and quality are well known: analog carriers. When analog data are being translated to the digital realm, the content of analog carriers is in the Netherlands most often transferred to digital data carriers, such as CD(-ROM)s or DVDs. At the moment, this is the case at all institutes that participated in this study. Even though this may guarantee the collections' preservation for a while, it does not aid their accessibility. For that to be the case, digital mass storage systems (DMSSs) should be used, see http://www.prestospace.org. The Dutch maintainers recognize the importance and the added value of such a system, but not all institutes actually have plans to start using DMSSs in the near future (for instance due to lack of funding and expertise).

In addition to digital materials arising from the conversion of existing archives from analog to digital formats, radio and TV broadcasts and also oral history initiatives are nowadays being recorded digitally. The absence of digital standards is illustrated by an example given by a born-digital collection of the KITLV. Its 'Memories of the East' collection was recorded on minidisc, but a new medium is needed since minidisc recorders are going out of fashion. As digital archiving standards are still under development, archivists may tend to postpone the digitization process until interoperability, uniformity and quality can be guaranteed.

2.3 Disclosure of collections

There are basically two types of descriptors to disclose A/V documents: content annotations and context annotations. Content descriptors are for instance summaries, full transcripts and keywords. They describe what the document is about. Context descriptors concern production date, data carrier etc., i.e. the technical details of the document. With respect to the content, descriptions are found at the level of tracks, i.e. coherent chunks of several minutes each, such as at GAR and at the KITLV, and at the level of programs with a resolution of several minutes to an entire hour or more, such as at RTV Rijnmond and SV. These differences in the granularity of description directly impact the granularity with which access to the collection can be provided.

The amount of effort put into the descriptions greatly differs within as well as between collection types. Research collections, for instance, are generally more


at SV. This is mainly due to the different company goals, i.e. broadcasting and archiving, respectively. At RTV only a few keywords and sometimes a short content description were used to annotate a program until 1995. As a result, over a decade of material is relatively badly disclosed (the channel started broadcasting in 1983). After 1995, RTV used program scripts and written news items to make more elaborate descriptions of broadcast news, but the content of interviews, for instance, was not elaborately disclosed. Radio broadcasts at SV are being annotated much more elaborately. There is a protocol that prescribes the amount of time the description of a certain type of program may cost, the vocabulary to be used and the types of information to preserve. Not all radio broadcasts are being annotated, however; news and sports are fully described, but only a selection of the other programs is being annotated. If available, program scripts and other collateral documents are used for manual disclosure and description.

The SV has documentalists specialized in archiving broadcast materials, but at RTV archiving is the responsibility of the program makers. The result is that descriptions may not be fully accurate, and even that programs may not be archived at all. Simple examples of inaccuracy are spelling mistakes, which may effectively make documents irretrievable, at least until more advanced search systems are implemented.
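To illustrate how a more advanced search system might recover a document despite a spelling mistake in its description, the following is a minimal sketch using approximate string matching from Python's standard library. The keyword list, the misspelled entry and the similarity cutoff are assumptions made for illustration; none of the archives discussed here is known to use this approach.

```python
from difflib import get_close_matches

# Hypothetical catalog keywords; "Roterdam" is a deliberately misspelled entry.
catalog_keywords = ["Roterdam", "harbor", "bombardment", "interview"]

def tolerant_lookup(query, keywords, cutoff=0.8):
    """Return catalog keywords whose spelling is close to the query."""
    # Compare case-insensitively, but report the keyword as stored.
    lowered = {k.lower(): k for k in keywords}
    hits = get_close_matches(query.lower(), list(lowered), n=5, cutoff=cutoff)
    return [lowered[h] for h in hits]

# Exact matching misses the misspelled entry ...
exact = [k for k in catalog_keywords if k == "Rotterdam"]
# ... whereas approximate matching still retrieves it.
fuzzy = tolerant_lookup("Rotterdam", catalog_keywords)
```

The cutoff trades recall against false matches; a suitable value would have to be tuned on real catalog data.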

Research collections are generally described more elaborately, as is the case with the KITLV's Memories of the East collection¹⁷. The metadata per audio document has been entered into a database that, in addition to an elaborate summary of the content, contains fields for all kinds of information, such as personal data (name, birth date, family constitution, place of residence, occupation, etc.), and information on the interview itself (such as date and duration). Each audio document is partitioned into a number of 10-minute tracks, which enables the retrieval of relatively short fragments.
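The record structure described above can be sketched as a small data class. This is an illustration only: the field names, the placeholder values and the 1-based track numbering are assumptions and do not reflect the actual KITLV database schema; only the 10-minute track length comes from the text.

```python
from dataclasses import dataclass, field

TRACK_SECONDS = 10 * 60  # 10-minute tracks, as described in the text

@dataclass
class InterviewRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    interviewee: str
    interview_date: str
    duration_seconds: int
    summary: str = ""
    keywords: list = field(default_factory=list)

    def track_count(self):
        # Ceiling division: a partial final track still counts as a track.
        return -(-self.duration_seconds // TRACK_SECONDS)

    def track_for(self, offset_seconds):
        """Return the 1-based track number containing the given offset."""
        if not 0 <= offset_seconds < self.duration_seconds:
            raise ValueError("offset lies outside the recording")
        return offset_seconds // TRACK_SECONDS + 1

# Placeholder values, not taken from any real collection.
record = InterviewRecord("N.N.", "date unknown", duration_seconds=95 * 60)
tracks = record.track_count()            # a 95-minute interview spans 10 tracks
track = record.track_for(25 * 60)        # minute 25 falls in track 3
```

With track-level descriptions, a search hit can point a user to a fragment of at most ten minutes rather than to the whole interview.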

There have been initiatives to make the usually lengthy description process more efficient. For instance, fairly recently, the SV started using a new catalog system with an on-line search interface, iMMix, and descriptions in the new system are structured such that program characteristics are annotated at one level only, preventing double inputs. In some cases, metadata databases may be delivered to archives together with the audiovisual data, and incorporating such digital metadata into the archival descriptions may save time. It is, however, not without problems, since in addition to format conversions the sets of descriptors may differ between the metadata database and the archiving standards, and the (controlled) vocabularies used may also differ substantially.

Not all Dutch audiovisual CH collections are being disclosed. Factors that contribute to the decision of whether materials should be disclosed are the (historic) importance of the recording, the cost of maintenance and materials, copyright issues and the collection's uniqueness. Apart from conscious decisions on whether or not a collection should be disclosed, lack of time, manpower and resources further prevents collections from being annotated.

¹⁷


This section will discuss the different collections' current accessibility and searchability. Access and search will first be discussed given the scenario that the user is at the institute where the collection is maintained, and second in the case that the user searches from home using the Web.

If an archive is open to the general public (the MI, for instance, is not), its catalog can generally be searched through some user interface at the institute's reading room. Standard search options allow keyword search and free text search, and more specific options may be available. Keywords are inherently restricted to those terms that appear in a controlled vocabulary (which often remains unknown to the user). As a result, the user cannot decide how a query could be improved if no results or unsatisfying results are found. Even though there are methods in information retrieval to work around such problems, e.g., thesauri and spelling correction/suggestion, these do not seem to be widely used in search engines providing access to databases on historical audio collections. Often, collection keepers are being consulted. Since they know the contents of the collections, they are able to find fragments that could not be found by a relatively naive searcher or that have not been documented as such in the database. Moreover, those specialists can give detailed information on data formats, conversion possibilities, copyright issues etc.
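The thesaurus-based workaround mentioned above can be sketched as a mapping from free-text terms to controlled-vocabulary keywords. The vocabulary and thesaurus entries below are hypothetical examples, not taken from any of the catalogs discussed.

```python
# Hypothetical thesaurus: free-text term -> preferred controlled-vocabulary term.
THESAURUS = {
    "port": "harbor",
    "docks": "harbor",
    "air raid": "bombardment",
}
# Hypothetical controlled vocabulary, normally hidden from the user.
CONTROLLED_VOCABULARY = {"harbor", "bombardment", "interview"}

def map_to_controlled(term):
    """Map a user term to a controlled keyword, or None if no mapping is known."""
    normalized = term.strip().lower()
    if normalized in CONTROLLED_VOCABULARY:
        return normalized
    return THESAURUS.get(normalized)

suggestion = map_to_controlled("Port")       # maps to the keyword "harbor"
unknown = map_to_controlled("zeppelin")      # no mapping available
```

Returning None for unmapped terms, rather than silently searching on them, would let an interface tell the user that a term is outside the vocabulary, which addresses the problem that users cannot see why a query fails.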

If a catalog search has a number of results, it lists (a part of) the descriptions, but does not give a link to the audio itself. This is in contrast with search actions for photos, maps, texts or other 2D media that can often be shown directly. So with respect to audio or video documents, the user ends up with several IDs that refer to the actual audio documents. At the GAR, users can then listen to the documents in the audiovisual self-service room, where copies on CD or VHS tape are available for exploration. If the user wants a copy of the materials for his/her own use, it can be requested from the service desk. As for the oral history collection of the KITLV, access to the audio is similarly organized: after identifying relevant audio descriptions, the audio documents are requested from a counter clerk. In contrast with the audio maintained at the municipal archives, however, this collection can only be listened to and studied at the KITLV. Copies are not distributed, to prevent privacy breaches and out-of-context presentation of the sometimes sensitive materials. Another, very practical reason that was given for not linking the audio directly to the search results is the fact that this would require much memory capacity of the institute's network.

The numbers of requests for searches in spoken word collections are generally small. At RTV there are several requests per day, the KITLV oral history collection receives about one request per day, and at the Rotterdam Municipal Archives and at the Meertens Institute there are only a few per month. From SV the numbers of requests are unknown to the author, but one of its documentalists mentioned that there were few requests for radio broadcasts specifically.

In many cases, users nowadays do not initiate a search by visiting an archive, but start at home behind their personal computer and search either the Web using one of the well-known search engines, or a particular institute's website. From those institutes' websites the catalogs can usually be searched online. Search options are generally the same as from within the institute, and the audio documents that search results refer to cannot be listened to. A visit to


situation relate to the descriptions, the user interface and the use of a DMSS server. Firstly, maintainers remarked that the more detail is being described, the easier it is to fulfill requests. In the case of archiving broadcasts, annotating quotes and remarkable background sounds such as barrel organs has proven to be valuable. Moreover, maintainers learn from experience which topics they encounter in user requests; this can make them adapt the way in which they disclose new materials. Secondly, search interfaces could in some instances be more user-friendly. One way of realizing this, according to the participants, is by offering a standard and an advanced search screen; the user can either enter a number of search terms into the general search field, or specify the nature of a number of search terms to retrieve documents more precisely. Thirdly, direct access to audio and video documents is thought to improve on the current situation. It would significantly reduce the time that lapses between entering a request and listening to actual results. This would be very beneficial in the case of producing news items in response to unexpected events. Moreover, users could assess the use of the materials faster and new user groups could be reached. The current situation of audio collections on digital data carriers cannot support this scenario: therefore, DMSSs should be used. In the case of most collections, however, on-line access should be carefully designed to prevent copyright violations and misuse.

2.5 Users and uses of historical audio collections

Users of historical audio archives can be divided into two main groups: professional users and the general public. At GAR about 75% of the users are professionals (e.g., makers of new content, researchers¹⁸). Also SV is mainly being searched by professional users. The collection of the MI is exclusively for research purposes. The KITLV collection's users are mainly researchers, students and content producers, but it is also being consulted by the general public.

Professional users: An important user group for audiovisual (broadcast) archives are makers of new content. This group is very diverse: e.g., exhibition makers, event organizers, companies, makers of films or documentaries, artists and both local and national broadcasters. Moreover, the keepers of the archives themselves also function as makers of new content, especially in the case of the SV and RTV. As a whole, this user group has two main uses for the materials: (i) research during the preparation of a production, and (ii) content for a production. Mainly in the second case, time pressure may be high as news producers want to react as soon as possible to sudden events such as accidents and disasters. Content producers tend to search audio collections looking for (combinations of) events, keywords, locations and persons. In some cases, they look for sound impressions, such as the sound of a harbor or city, but these may be very difficult to find as they are normally not being annotated.

Another group of professional users are researchers, students and teachers. These users usually pose very specific research questions in comparison with

¹⁸ In contrast to the PrestoSpace survey we include researchers amongst the group of


the elaborate summaries works relatively well. For most of the other collections, keyword search is the most promising according to the collection keepers. Researchers may be interested in all kinds of subjects and will search for (combinations of) names, keywords, events, periods and locations in order to find results to incorporate into their research, writing and teaching. The MI's collection, which was specifically built for linguistic research purposes, is somewhat different. When studying language structure, the question of how things are being said (e.g., pronunciation, grammatical structure, intonation) is often much more important than the question of what is being said. The users of this collection often make their own, time-consuming annotations.

General public: The second type of users is the general public. They typically search for audio documents that fit their personal interests, such as a hobby or their family history. Their requests are mostly for names of persons, companies, locations and/or events in the case of both the broadcast and the oral history collections.

2.6 A role for automatic indexing

The cultural heritage domain has been characterized as somewhat hesitant towards technological development concerning the automatic indexing of collections. This is understandable, for instance because automatically generated transcripts are certainly not error-free (as opposed to manually checked descriptions). Still, given the vast size of several audio collections that have not been disclosed, and the manual labor of one to ten times real time that disclosure would cost, archivists and collection maintainers understand the added value of automatic indexing. They furthermore suggested another use. Automatic indexing could be employed to provide archivists with an impression of a collection on the basis of which a selection for full disclosure can be made. A number of participants explained that their deposits held tapes for which the exact content was unknown.
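The scale of the manual effort follows directly from the factor of one to ten times real time. As a rough sketch, the calculation below applies that range to a hypothetical undisclosed collection of 2,000 hours of audio; the collection size and the figure of 1,600 working hours per person-year are assumptions made for illustration and were not reported by the participants.

```python
def disclosure_effort(audio_hours, min_factor=1.0, max_factor=10.0):
    """Return (lowest, highest) manual annotation effort in hours.

    The default factors reflect the 'one to ten times real time' range
    mentioned in the text.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return audio_hours * min_factor, audio_hours * max_factor

HOURS_PER_PERSON_YEAR = 1600  # assumption for illustration only

low, high = disclosure_effort(2000)          # hypothetical collection size
low_years = low / HOURS_PER_PERSON_YEAR      # lower bound in person-years
high_years = high / HOURS_PER_PERSON_YEAR    # upper bound in person-years
```

Under these assumptions, manual disclosure would take between roughly one and more than twelve person-years, which makes the interest in an automatically generated first impression easy to understand.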

According to the interviewees there are certain restrictions on what can be expected from automatic indexing. First, since human interpretation is lacking, certain abstractions, i.e. higher-level annotations, cannot easily be made. As a result it was expected that users looking for journalistic content (i.e. facts) would have fewer problems retrieving relevant documents that were automatically annotated than users looking for artistic content (i.e. sound impressions). Moreover, there is the (partial) mismatch between the words that are being spoken and the more abstract topic that is being talked about. Second, when collections are to be used for certain types of research, automatic transcription may not be suitable at all, since researchers need manually checked indexes at layers of information that may abstract from the words (e.g., communicative acts, prosody). To generate a first version of an index, however, speech technology might be employed to reduce the amount of work (which is exactly what has been proposed in the PrestoSpace project). Third, searchers are unfamiliar with a situation in which the annotations on which search is based are not manually checked. They must therefore receive instructions on the probabilistic


preferably at multiple levels of abstraction.

3 Discussion

In this report we set out with three goals in mind: (i) gain insight into the current practice of disclosure and the realization of accessibility of Dutch historical audio collections, (ii) gather information on the users of audiovisual collections and their needs, and (iii) receive feedback from collection maintainers on the potential of automatic indexing technology in the audiovisual archiving workflow. Together these goals aimed at gathering user requirements for spoken document retrieval systems in CH. Those requirements, in turn, will be used to determine a research agenda for improving automatic disclosure and access for spoken word collections from CH. In the rest of this section we will discuss the main findings of our requirements analysis, and also how those requirements can be met or addressed in future research.

Now that archives increasingly acknowledge the need to make their collections accessible to end users, in addition to the traditional tasks of describing and maintaining collections, the question of how to make collections available to the general public is being addressed. We found that access to Dutch audiovisual collections, in line with their European counterparts [5], is relatively slow and cumbersome at present: e.g., direct, on-line access to audiovisual content is basically nonexistent, short content descriptions seem insufficient to meet the wide variety in users' information needs, and many collections have not yet been annotated, which makes them almost inaccessible. Problems in disclosure, which is a prerequisite for access, are mainly caused by the costliness, both in time and in manpower, of producing elaborate, high-quality annotations. Access is furthermore complicated by the fact that the digital infrastructure in archives is in many cases not yet ready for on-line presentation of audiovisual content (provided that IPR issues etc. enable publication).

The first requirement is to make access faster. This is expected to be realizable (i) by presenting content online, and (ii) by making search results more focused, i.e. by retrieving pointers to relevant locations within documents instead of to entire documents. For online presentation, audio sources should be digitally available and linked to the results from catalog search. If necessary, this could be arranged via a log-in procedure to prevent IPR violations and/or misuse. Online presentation moreover entails the development of an infrastructure that supports data management, and also retrieval and presentation of both metadata and time-labeled spoken content. Most of these developments fall outside the scope of the CHoral project, but are being taken up at the CH institutes themselves and in other research projects.
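Retrieving pointers to locations within documents, rather than whole documents, can be sketched as an inverted index over time-labeled words. The document identifiers, words and time stamps below are invented for illustration, and the structure is an assumption rather than the design of any system mentioned in this report.

```python
from collections import defaultdict

def build_time_index(transcripts):
    """Build an index from each word to its (document, start time) pointers.

    transcripts: {doc_id: [(word, start_seconds), ...]}
    """
    index = defaultdict(list)
    for doc_id, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((doc_id, start))
    return index

# Hypothetical time-labeled transcripts of two audio documents.
transcripts = {
    "doc-001": [("the", 0.0), ("harbor", 0.4), ("was", 0.9), ("bombed", 1.2)],
    "doc-002": [("Harbor", 312.5), ("workers", 313.1)],
}

index = build_time_index(transcripts)
hits = index["harbor"]   # pointers into both documents, with start times
```

Each hit identifies both the document and the moment at which the query word is spoken, so playback could start at that point instead of at the beginning of the recording.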

The second way of making access faster is being researched in the CHoral project, and has been addressed in other spoken document retrieval projects such as MALACH and the National Gallery of the Spoken Word. Automatic content indexing and audio processing tools are being developed to increase the time resolution of search results through the addition of time labels to the spoken content, or to highlights within the documents. Since current descriptions may lack much detail, such technology has the potential to increase the numbers


automatic annotation, mainly caused by the statistical nature of the techniques employed. Content-based annotation of spoken word documents is usually done using automatic speech recognition (ASR). The word error rates on documents with spontaneous conversational speech lie in the range of 40-60% for a number of languages, see e.g., [3, 8, 11]. Only correctly recognized words can in principle be successfully retrieved. Correct recognition depends on the suitability of the acoustic and language models for the recognition task at hand.
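For readers unfamiliar with the measure, word error rate is conventionally computed as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words in the recognizer output and N is the number of words in the reference transcript. The sketch below computes it with a word-level edit distance; the example sentences are invented, and the evaluation tools used in the cited studies may apply additional text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# Invented example: two substitutions and one deletion in six reference words.
wer = word_error_rate("the harbor of rotterdam was bombed",
                      "the arbor rotterdam was bond")
```

In this example half of the reference words are affected, which is the order of magnitude reported above for spontaneous speech; note that both misrecognized content words ("harbor", "bombed") would be unfindable by a keyword search.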

Audio pre-processing tools such as Speech Activity Detection (SAD) and speaker segmentation are generally referred to as audio diarization. Diarization aims at determining which audio intervals contain speech and of which type, so that the ASR engine is only fed speech, and not music, and models can be adapted to the data. Performance on broadcast news audio is high, with miss and false alarm rates around 1% (see [24] for an overview), but the SAD error rate is significantly higher, i.e. around 11%, on more heterogeneous data sets [10].
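Miss and false alarm rates for speech activity detection can be illustrated on frame-level labels. The definitions used below (misses relative to reference speech frames, false alarms relative to reference non-speech frames) are one common choice and are an assumption here; evaluation campaigns differ in how they normalize these errors, so the figures cited above are not necessarily computed this way. The label sequences are invented.

```python
def sad_rates(reference, predicted):
    """Return (miss_rate, false_alarm_rate) for frame-level speech labels.

    Labels are 1 for speech and 0 for non-speech, one per audio frame.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have equal length")
    speech = sum(reference)
    nonspeech = len(reference) - speech
    # Missed speech: reference says speech, detector says non-speech.
    missed = sum(1 for r, p in zip(reference, predicted) if r and not p)
    # False alarm: reference says non-speech, detector says speech.
    false_alarms = sum(1 for r, p in zip(reference, predicted) if p and not r)
    miss_rate = missed / speech if speech else 0.0
    false_alarm_rate = false_alarms / nonspeech if nonspeech else 0.0
    return miss_rate, false_alarm_rate

reference = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]   # invented ground truth
predicted = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]   # invented detector output
miss_rate, false_alarm_rate = sad_rates(reference, predicted)
```

A missed speech frame means that words never reach the recognizer, whereas a false alarm feeds music or noise to it, so the two error types affect retrieval in different ways.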

In line with findings from the PrestoSpace requirements study, Dutch audiovisual collections are mostly being used by professionals, both content producers and researchers/students¹⁹. The second requirement therefore is to support search by these user groups. A logical first step in further research would be to study how these users are supported by the current state of the art in spoken document retrieval in comparison with running demonstrator systems, such as the Radio Oranje demonstrator [25]. A next step would then be to make adaptations to the user interfaces given the users' preferences and feedback.

Another requirement pertains to the types of information users want to find. They mainly search for (combinations of) events, keywords, locations and names of persons/companies. A significant proportion of requests therefore contains named entities that are, however, not straightforwardly extracted from spoken content automatically. Both approaches to named entity recognition and optimal use of information on named entities present in manual metadata should be investigated further in order to support searchers. As solutions to enhanced recognition of named entities, or Out-Of-Vocabulary (OOV) terms in general, several approaches have been put forward, such as multi-pass recognition, e.g., [6], query and document expansion, e.g., [27], and subword approaches, e.g., [2, 17]. As part of the CHoral project, search in phoneme lattices derived from word lattices will be investigated further with the goal of improving retrieval of named entities and OOVs in Dutch. The current analysis further showed that users do not often search for time-related concepts such as dates or periods, and if they do, it is mainly in combination with other types of terms.

A seemingly obvious requirement is that search interfaces should be made more user-friendly. In both the PrestoSpace survey and the present requirements analysis, search interfaces for exploring an archive's catalog were often judged as insufficiently user-friendly by collection keepers. To improve on the current situation, different tools that have demonstrated added functionality in research systems should be tested on systems in public use. Moreover, additional tools to support searching, browsing, and selection of audiovisual documents are needed. For instance, thanks to the increased granularity of automatic indexing, time-aligned metadata can be presented while the A/V documents are being

19. Given the popularity of websites such as YouTube (http://www.youtube.com/), however, the interest of the general public in online interactivity with AV documents may increase in


assessment of A/V documents by representing the contents textually, [18,20], and/or visually, [21,26].

Many user support tools developed so far were intended for use on specific collections with specific document types. Audiovisual archives typically hold a large variation in document types, which complicates the situation for the user. Retrieved results should not only be on-topic, but their genres should, for instance, also be available to searchers. As far as we know, automatic detection of genre in spoken word documents is an open research question, as is the way such information should be presented in the user interface. First steps towards such a classification could be made by using diarization technology to estimate the numbers and turn patterns of speakers.
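As an illustration of that last suggestion, the sketch below derives a few simple features from diarization output. The input format (a list of `(start_sec, end_sec, speaker)` tuples), the feature names and the example recordings are assumptions made for this example; they do not describe any particular diarization system or a validated genre classifier.

```python
from collections import Counter


def turn_features(segments):
    """Summarise diarization output given as (start_sec, end_sec, speaker) tuples."""
    segments = sorted(segments)
    if not segments:
        return {"speakers": 0, "turns": 0, "mean_turn_sec": 0.0, "dominant_share": 0.0}
    talk_time = Counter()
    turns = 0
    previous = None
    for start, end, speaker in segments:
        talk_time[speaker] += end - start
        # A new turn starts whenever the speaker label changes.
        if speaker != previous:
            turns += 1
            previous = speaker
    total = sum(talk_time.values())
    return {
        "speakers": len(talk_time),
        "turns": turns,
        "mean_turn_sec": total / turns,
        "dominant_share": max(talk_time.values()) / total if total else 0.0,
    }


# Hypothetical diarization results for two recordings.
interview = [(0.0, 10.0, "A"), (10.0, 40.0, "B"), (40.0, 45.0, "A"), (45.0, 80.0, "B")]
monologue = [(0.0, 60.0, "A"), (60.0, 120.0, "A")]

interview_features = turn_features(interview)
monologue_features = turn_features(monologue)
```

A single speaker with one long turn is suggestive of a speech or monologue, while two speakers with frequent alternation is more typical of an interview. Whether such features separate genres reliably in heterogeneous archives would still have to be tested.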

An earlier report that investigated the attitude of archivists towards audio indexing technology showed that archivists are ambivalent towards the technology, [28]. On the one hand, they acknowledged the potential added value. On the other hand, confronted for example with imperfect speech transcripts, archivists may become skeptical about the usefulness of the automatically generated metadata. The present analysis showed that most participants welcomed automatic solutions for annotations, and they also put forward new ideas for applying the technology. Keepers of research collections, however, were understandably more hesitant, since their collections need exact annotations for scientific analysis that in most cases cannot (yet) be generated automatically.

4 Conclusion

The findings of this study provide more specific instructions on which lines of research to pursue in order to improve search in A/V archives:

- Focus on two target groups in user and usability studies: producers and researchers;

- Further automatic indexing technology for spoken documents, i.e. diarization and speech recognition;

- Research and develop ways to deal with OOV queries, mainly with named entities;

- Develop classification tools for generating higher-level semantics, e.g., topic classification;

- Optimally use the index for content representation in the user interface;

- Test the usability of the UI and its components in ecologically valid settings.

Most of these issues have been taken up in the CHoral project.

Acknowledgements This paper is based on research funded by the CATCH program (http://www.nwo.nl/catch) of the Netherlands Organisation for


interviews with collection keepers were:

1. In 1999, Kooijman published an inventory of the audiovisual collections in Dutch archives. Which changes/additions can you report for your institute?

2. Which types of collections do you maintain?

   - radio/TV

   - oral history

   - other types of interviews

   - speeches/monologues

   - other

3. Does your archive contain digital audio or video materials? If so, are the files kept on hard drives or on data carriers?

4. How have the materials been disclosed? Which type of metadata/description of these materials is available?

5. How can a user get access to the materials from within the archive?

6. How can a user get access to the materials from outside of the archive?

7. How often do you receive requests for audio documents?

8. Which groups of users do those requests come from?

9. Which types of queries do users have? Are there any topics that are asked for regularly?

10. What type of information do users search for?

   - names of persons or places

   - dates or periods

   - events

   - keywords

   - a particular topic

   - speaker profile

   - other

11. What do searchers want to use the information for?

12. Could disclosure and access of the collection(s) you maintain be improved? If so, how?

13. What is your opinion on developments in speech and language technology for spoken document retrieval? (This question was always preceded by a short explanation of the state-of-the-art in SDR)

14. Do you have any further comments?

References

[1] Giuseppe Amato, Juan Cigarran, Julio Gonzalo, and Carol Peters. Multimatch - multilingual/multimedia access to cultural heritage. In Proceedings of the 2nd Italian Research Conference on Digital Library Management Systems, 2006.

[2] M. G. Brown, J. T. Foote, Gareth J. F. Jones, Karen Sparck Jones, and S. J. Young. Open-vocabulary speech indexing for voice and video mail retrieval. In


J. Psutka, B. Ramabhadran, D. Soergel, T. Ward, and W-J. Zhu. Automatic recognition of spontaneous speech for access to multilingual oral history archives. IEEE Trans. Speech Audio Proc., 12(4), 2004.

[4] M. G. Christel. Evaluation and user studies with respect to video summarization and browsing. In Proceedings of IS&T/SPIE Symposium on Electronic Imaging, 2006. San Jose, CA.

[5] B. Delaney and B. Hoomans. Prestospace deliverable 2.1 User Requirements Final Report, 2004.

[6] P. Geutner, M. Finke, and A. Waibel. Selection criteria for hypothesis driven lexical adaptation. In ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 617–620, Washington, DC, USA, 1999. IEEE Computer Society.

[7] S. Gustman, D. Soergel, D. Oard, W. Byrne, M. Picheny, B. Ramabhadran, and D. Greenberg. Supporting access to large digital oral history archives. page 18.

[8] J.H.L. Hansen, R. Huang, B. Zhou, M. Deadle, J.R. Deller, A.R. Gurijala, M. Kurimo, and P. Angkititrakul. Speechfind: Advances in spoken document retrieval for a national gallery of the spoken word. IEEE Transactions on Speech and Audio Processing, 13(5):712, 2005.

[9] W.F.L. Heeren, L.B. van der Werff, R.J.F. Ordelman, A.J. van Hessen, and F.M.G. de Jong. Radio oranje: Searching the queen's speech(es). In C.L.A. Clarke, N. Fuhr, N. Kando, W. Kraaij, and A. de Vries, editors, Proceedings of the 30th ACM SIGIR, pages 903–903, New York, 2007. ACM. ISBN=978-1-59593-597-7.

[10] M.A.H. Huijbregts and C. Wooters. The blame game: Performance analysis of speaker diarization system components. In Proceedings of Interspeech 2007, page 4, Antwerp, 2007. International Speech Communication Association. ISSN=1990-9772.

[11] Marijn Huijbregts, Roeland Ordelman, and Franciska de Jong. Annotation of heterogeneous multimedia content using automatic speech recognition. In Proceedings of SAMT, 2007.

[12] J. Kim, D.W. Oard, and D. Soergel. Searching large collections of recorded speech: A preliminary study. In Proceedings of the Annual Conference of the American Society for Information Science and Technology, Long Beach, CA, 2003.

[13] S.R. Klemmer, J. Graham, G.J. Wolff, and J.A. Landay. Books with voices: paper transcripts as a tangible interface to oral histories. In Proceedings of CHI 2003, 2003. Ft. Lauderdale, Florida.

[14] T. Kouwenhoven. Zoeken Navigeren Vinden. Over zoekers, zoekgedrag, zoekmachines en hun interfaces bij het zoeken naar audiovisuele content, chapter 3, page 116.

[15] Katy Newton Lawley, Dagobert Soergel, and Xiaoli Huang. Relevance criteria used by teachers in selecting oral history materials. In Proceedings of the 68th Annual Meeting of the American Society for Information Science and Technology (ASIST), 2005.

[16] S. Leydesdorff. De mensen en de woorden. Meulenhoff, 2004.

[17] Beth Logan, Pedro Moreno, and Om Deshmukh. Word and sub-word indexing approaches for reducing the effects of OOV queries on spoken audio. In Proceedings of the second international conference on Human Language Technology Research,


recognition accuracy rates on the usefulness and usability of webcast archives. In Proceedings of CHI 2006, page 493.

[19] R.J.F. Ordelman, F.M.G. de Jong, and W.F.L. Heeren. Exploration of audiovisual heritage using audio indexing technology. In L. Bordoni, A. Krueger, and M. Zancanaro, editors, Proceedings of the first workshop on intelligent technologies for cultural heritage exploitation, pages 36–39, Trento, 2006. Universita di Trento. ISBN=not assigned.

[20] A. Ranjan, R. Balakrishnan, and M. Chignell. Searching in audio: the utility of transcripts, dichotic presentation and time-compression. In Proceedings of CHI 2006, 2006.

[21] Laura Slaughter, Douglas W. Oard, Vernon L. Warnick, Julie L. Harding, and Galen J. Wilkerson. A graphical interface for speech-based retrieval. In ACM DL, pages 305–306, 1998.

[22] D. Soergel, D. Oard, S. Gustman, L. Fraser, J. Kim, J. Meyer, E. Proffen, and T. Sartori. The many uses of digitized oral history collections: Implications for design. Malach technical report, College of Information Studies. University of Maryland, June 2002.

[23] F. Steijlen. Memories of The East. KITLV press, 2002.

[24] S.E. Tranter and D.A. Reynolds. An overview of automatic diarization systems. IEEE Transactions on Audio, Speech and Language Processing, 14(5):1557, 2006.

[25] L.B. van der Werff, W.F.L. Heeren, R.J.F. Ordelman, and F.M.G. de Jong. Radio oranje: Enhanced access to a historical spoken word collection. In P. Dirx, I. Schuurman, V. Vandeghinste, and F. Van Eynde, editors, Proceedings of the 17th Meeting of Computational Linguistics in the Netherlands, pages 207–218, Utrecht, 2007. Landelijke Onderzoekschool Taalwetenschap.

[26] Steve Whittaker, Julia Hirschberg, John Choi, Donald Hindle, Fernando C.N. Pereira, and Amit Singhal. SCAN: Designing and evaluating user interfaces to support retrieval from speech archives. In Proceedings of SIGIR99 Conference on Research and Development in Information Retrieval, pages 26–33, 1999.

[27] P.C. Woodland, S.E. Johnson, P. Jourlin, and K. Sparck Jones. Effects of out of vocabulary words in spoken document retrieval. In SIGIR 2000, 2000. Athens, Greece.

[28] E. Zuurbier. Onderzoek naar de haalbaarheid van Spoken Document Retrieval.