Explorations in multimodal information presentation

Tilburg University

van Hooijdonk, C.M.J.

Publication date: 2008

Document version: Publisher's PDF, also known as Version of record

Citation for published version (APA):
van Hooijdonk, C. M. J. (2008). Explorations in multimodal information presentation. PrintPartners Ipskamp.


ISBN: 978-90-9022855-6

Printing: PrintPartners Ipskamp, Enschede

Cover design: Lennard van de Laar

No part of this thesis may be reproduced, stored in a retrieval system or transmitted

Explorations in Multimodal Information Presentation

Doctoral thesis (Proefschrift)

submitted to obtain the degree of doctor at Tilburg University, on the authority of the rector magnificus, prof. dr. F. A. van der Duyn Schouten, to be defended in public before a committee appointed by the doctorate board, in the auditorium of the University on Wednesday 19 March 2008 at 16:15

by

Charlotte Miriam Joyce van Hooijdonk

Prof. Dr. E. Krahmer

Members of the doctoral committee:
Prof. Dr. J. Bateman
Dr. H. van Oostendorp
Prof. Dr. W. Spooren
Prof. Dr. M. Steehouder
Prof. Dr. M. Swerts
Dr. M. Theune

Acknowledgements
1 General introduction
1.1 What is multimodal information presentation?
1.2 Research questions addressed in this thesis
1.3 Research approach
1.4 Thesis overview
2 Production and evaluation of multimodal information presentations
2.1 Introduction
2.2 Experiment I: Production
2.2.1 Research method
2.2.2 Results
2.2.3 Conclusion
2.3 Experiment II: Evaluation
2.3.1 Research method
2.3.2 Results
2.3.3 Conclusion
2.4 Discussion
Appendix A
3 Spatial conceptualization in multimodal information presentations
3.1 Introduction
3.1.1 Effective navigation in hypertext: navigation maps
3.1.2 The role of space in conceptualizing hypertext and hypertext tasks
3.1.3 The investigation of spatial conceptualization in
3.2.2 Coding system
3.2.3 Coding procedure
3.3 Results
3.3.1 Overall results
3.3.2 Spatial verbalizations related to action type and action level
3.3.3 Spatial verbalizations related to other performance data
3.4 Discussion
4 Modalities for procedural instructions
4.1 Introduction
4.1.1 The effectiveness of different information modalities
4.1.2 Expectations concerning the effectiveness of information modalities
4.2 Effectiveness and subjective satisfaction of information modalities
4.2.1 Research method
4.2.3 Conclusion
4.3 Subjective preference for information modalities
4.3.1 Research method
4.3.2 Results
4.3.3 Conclusion
4.4 Discussion
4.4.1 Which information modality was most effective?
4.4.2 Research limitations
Appendix B
5 Evaluating the speech modality with eye movements
5.1 Introduction
5.2 Research method
5.2.2 Stimuli
5.2.3 Procedure
5.2.4 Coding procedure and data processing
5.3 Results
5.3.1 Results of the eye movement data
5.3.2 Intelligibility and naturalness of the three speech conditions
5.3.3 Conclusion
5.4 Discussion
5.4.1 Comparing the intelligibility of synthetic and natural speech
5.4.2 Research limitations and directions for future research
Appendix C
6 General conclusion and discussion
6.1 Conclusion
6.2 Discussion
6.2.1 Characteristics of the task
6.2.2 Characteristics within the same information modality
6.2.3 Characteristics of the research methodology
6.2.4 Characteristics of the user
6.3 Studying multimodal information presentation: pitfalls and caveats
6.3.1 Comparing apples and oranges
6.3.2 The redundancy of multimodal information presentations
References
Summary
Samenvatting (summary in Dutch)

Many people contributed to the completion of this thesis, and I would like to thank them here.

First of all, Fons Maes and Emiel Krahmer as my promotores and Nicole Ummelen as my supervisor. The four of us devoted many discussions to the direction of this thesis. Nicole was my first daily supervisor and made an important contribution to the research described in Chapter 3. After her departure, Emiel became my daily supervisor. Emiel is a cheerful, enthusiastic, and inspiring researcher. I would especially like to thank him for his commitment, patience, and optimism. Fons is a friendly and inspiring researcher with whom I gladly exchanged thoughts about the ongoing research while enjoying an apple. I would especially like to thank him for his support and his faith in me.

In the Communication & Cognition section I found a place where I worked on my thesis with great pleasure. I therefore thank my colleagues for their support and good company. In particular, I would like to thank the following people:

• Carel van Wijk for his statistical advice,

• Reinier Cozijn and Edwin Commandeur for their help in setting up and carrying out the eye-tracking experiment,

• Lennard van de Laar for his technical support during the experiments and his help in making the cover of this thesis.

I shared an office with Anja Arts and Pashiera Barkhuysen. I would like to thank Anja for her support and good advice. With Pashiera, my paranymph, I was able to share joys and sorrows for four years. With Ingemarie Sam and Lauraine Sinay I could tell my story over a cup of coffee or during a lunchtime walk in the Warandebos.

I carried out most of the research described in this thesis within the IMOGEN project. I would therefore like to thank Wauter Bosma, Erwin Marsi, and Mariet Theune for the pleasant collaboration.

Finally, I would like to thank my home front: my parents, Jos and Marie-José, my sister Elise and my brother Olivier. Thank you for your support and interest and for the


Chapter 1

1.1 What is multimodal information presentation?

The cover of this thesis is inspired by the London Underground Map. This map has not only been a guide for travellers going from point A to point B, but it has also become a symbol for London itself (Roberts, 2005). The London Underground Map is a good example of a multimodal information presentation because it presents the information of the London Underground by combining several presentation modes, i.e., text and visual representations of the tube lines. Moreover, the London Underground Map is an example of a good multimodal information presentation because the use of multimodal means matches the map's goal, i.e., guiding travellers in the right direction in a complex network of lines, stations, and zones.

A multimodal information presentation can be classified on the basis of three criteria or perspectives, i.e., the delivery medium, the presentation mode, and the sensory modality (Mayer, 2001). The first distinguishes presentations based on the devices used to deliver the information (e.g., paper, computer screen, loudspeaker). The second classifies presentations on the basis of the format of the message or the sign system used, like text or visuals. Finally, the third perspective starts from the human senses employed to process information, such as the auditory and visual senses. Note that these different views are highly related and often show different sides of the same coin (Maybury, 1993). For example, a particular medium may restrict the sensory modes involved (e.g., information on paper only serves the visual sensory mode), or a single medium may support several presentation modes (e.g., a piece of paper supports both text and visuals). Also, a single mode, like language, may be processed through different human senses (e.g., spoken text is processed aurally, while written text is processed visually). Although different distinctions can be made between 'mode' and 'modality', the ones formulated by Mayer (2001) enable us to define the modes and modalities discussed in this thesis. According to Mayer's tripartition, Chapters 2, 3, and 4 focus on different modes (e.g., text, graphics, and film clips) presented on a computer screen, while Chapter 5 focuses on the modality of the auditory sense.
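As a rough illustration of the three perspectives described above, the following Python sketch describes a presentation by its delivery medium, its presentation modes, and the sensory modalities it addresses. The class name, field names, method name, and example values are hypothetical and chosen only for this illustration; they are not taken from Mayer (2001) or from the studies in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    """A presentation described along three perspectives (hypothetical names)."""
    medium: str        # delivery medium, e.g. "paper" or "loudspeaker"
    modes: frozenset   # presentation modes, e.g. {"text", "visual"}
    senses: frozenset  # sensory modalities addressed, e.g. {"visual"}

    def uses_multiple_modes(self) -> bool:
        # Assumption for this sketch: a presentation combines modes
        # when it lists more than one presentation mode.
        return len(self.modes) > 1


# A printed map: one medium, two presentation modes, one sense.
tube_map = Presentation("paper", frozenset({"text", "visual"}), frozenset({"visual"}))

# A spoken answer: the mode is still language (text), but it is processed aurally.
spoken_answer = Presentation("loudspeaker", frozenset({"text"}), frozenset({"auditory"}))
```

Keeping the three perspectives as separate fields reflects the point made above that they are related but not interchangeable: the same mode (language) can reach the user through different senses, and one medium can carry several modes.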

[In written language, multimodality refers] to the combination of verbal and nonverbal elements presenting information in documents. Examples of nonverbal elements are the visual vocabulary to organize text in lines, on a page, or in a document (see Kostelnick & Roberts, 1998, for an overview), but also static (e.g., photos) and dynamic (e.g., animations) visuals. In spoken language, multimodality refers to the different modes with which spoken messages are communicated, such as intonation, speech quality, and facial expressions (Knapp, 1978). In this thesis, both research perspectives on multimodality are discussed: Chapters 2, 3, and 4 start from written language research, whereas Chapter 5 starts from spoken language research.

In this thesis, we speak of a multimodal information presentation if a chunk of information is presented through several presentation modes, like a combination of written or spoken text and visuals. There are reasons to believe that presenting information using multiple modalities is more effective than presenting information using a single modality (e.g., Mayer, 2001; Oviatt, 1999). Recent developments in computer technology have led to new possibilities of presenting information and to a renewed interest in the effects of different presentation modes. Naturally, this raises questions, like "Which presentation modes are most suitable in which situation?" and "How should different presentation modes be combined?" A research project which addresses these questions is the IMOGEN (Interactive Multimodal Output GENeration) project. This project is embedded in the IMIX (Interactive Multimodal Information eXtraction) research programme in the field of Dutch speech and language technology and is sponsored by the Netherlands Organisation for Scientific Research (NWO).

Within the IMIX research programme a multimodal medical question answering (QA) system is being developed. A QA system is an automatic system that can answer a user's question posed in natural language (e.g., "What does RSI stand for?") with an answer formulated in natural language (e.g., "Repetitive Strain Injury"). Nowadays, QA systems are not only expected to give answers to these simple questions, but also to more complex questions, like "How should I organize my workspace in order to prevent RSI?" or "What is a good exercise to prevent RSI in my hands?" The answers to these questions might be more informative and effective if they contained multiple modalities, like text and a picture (Theune et al., 2007). In

1.2 Research questions addressed in this thesis

Presenting information in a multimodal way is not trivial. It involves a complicated mixture of characteristics of communicative tasks and goals, user characteristics and preferences, characteristics of sensory modalities, and qualities of presentation modes. One of the first questions that arises when presenting information in a multimodal way is which presentation mode(s) should be used. For example, suppose someone wants information on how to organize his or her workspace to prevent Repetitive Strain Injury. How should this information be presented to the user? A possibility would be to present the information through text (see Figure 1.1). However, the presentation would probably be more informative if it contained a visual, as it would clarify the relations between the objects (e.g., chair, desk, and computer screen) within an ergonomic workspace in one glance (see Figure 1.1). Another possibility would be a multimodal information presentation in which a text and a visual are combined (see Figure 1.1). Note that the relation between the text and the visual should be considered when presenting them together (e.g., Carney & Levin, 2002; Twyman, 1987). For example, the visual can have a low or high informative value, e.g., the visual represents the information mentioned in the text or the visual explains the information mentioned in the text as in Figure 1.1. According to research by Glenberg & Robertson (1999), informative visuals allow readers to 'index' information presented in text to the information presented in a visual, hence helping readers to derive relevant "affordances" (Gibson, 1972). The term affordances refers to the actions that an individual can potentially perform in their environment. Thus, in this example, when a multimodal information presentation is well-designed, users will be able to derive the proper actions in organizing an ergonomic workspace. Chapter 2 discusses these basic issues around multimodal information presentation through the following research questions:

• When and how do people present information in a multimodal way?

• How do people evaluate unimodal and multimodal information presentations?

Well-designed multimodal information presentations not only facilitate comprehension, they can also help users find the appropriate information quickly. This is especially important in large multimodal information presentations, like web sites. Users often experience problems when searching for information in web sites, like disorientation and cognitive overload (e.g., Ahuja & Webster, 2001; Conklin, 1987; Elm & Woods, 1985). Therefore, several multimodal navigational aids (e.g., site maps, bread crumbs) have been developed, aimed at helping users to create a representation of the structure or content of the web site or to clarify the users' position within the web site (Maes, Van Geel & Cozijn, 2006). However, studies on the effectiveness of these navigational aids show equivocal results (e.g., Dias & Sousa, 1997; Hofman & Van Oostendorp, 1999). In order to help users find the information they need, we first have to investigate how they conceptualize web sites. There are several indications that the spatial character of web sites plays an important role in users' conceptualization (e.g., Boechler, 2001; Maglio & Matlock, 2003). Therefore, Chapter 3 sets out to explore how users conceptualize their actions when navigating a web site through the following research question:

• How do users conceptualize their actions when navigating in multimodal information environments?

Figure 1.1 [three panels, each headed "How to organize my workspace to prevent RSI?"]

Another question that arises in multimodal information presentation is which presentation mode is most effective for a particular learning task (e.g., learning how to organize an ergonomic workspace). For instance, it might be that a text is most effective in expressing abstract matters, whereas a static visual (e.g., photo or graphic) might be most effective in representing perceptual information. A dynamic visual (e.g., film clip or animation) is argued to be best in representing temporal aspects (Park & Hopkins, 1993). Moreover, much of the empirical research on the effectiveness of different presentation modes has focused on declarative tasks, where a learner acquires knowledge about a certain topic (e.g., meteorological changes as in Lowe, 2004). It is unclear to what extent findings for learning declarative tasks carry over to learning procedural tasks, where a learner acquires a certain skill (e.g., bandaging a hand as in Michas & Berry, 2000). Chapter 4 focuses on the effectiveness [...] procedural instructions. The characteristics of the presentation modes (i.e., text, photo, and film clip) as well as learners' preferences are taken into account through the following research questions:

• Which presentation modes are most effective for learning and executing procedural instructions?

• Which presentation modes do people prefer when learning procedural instructions?

Textual instructions on organizing an ergonomic workspace could be presented visually but also auditorily. In fact, the modality principle states that when a multimodal information presentation consists of text and visuals, the text should be presented as spoken text rather than as visual text (Mayer & Moreno, 1998; Moreno & Mayer, 1999). But when following the modality principle and using spoken text instead of written text, the question arises which kind of voice should be used. Mayer, Sobko, and Mautone (2003) investigated the effectiveness of a human voice and a machine-synthesized voice that accompanied an animation that explained how lightning storms develop. They found that people learned better with a human voice than with a machine-synthesized voice. However, developments in speech technology have led to a frequent use of synthetic speech in computer applications, like computer-aided instructions and consumer products (e.g., navigational aids and mobile telephones) (Paris, Thomas, Gilson & Kincaid, 2000). There are two reasons why synthetic speech is harder to comprehend than human speech. First, synthetic speech is less intelligible than human speech as the acoustic signals of synthesized speech are impoverished (e.g., Luce, Feustel, & Pisoni, 1983; Nusbaum & Pisoni, 1985). Second, synthetic speech sounds unnatural compared to human speech due to the limited modeling of prosodic cues, like intonation, stress, and durational patterns (Nusbaum, Francis & Henly, 1995). Currently, there are two common ways to create speech synthesis. The first is diphone synthesis, which is based on concatenating

In sum, evaluating multimodal information presentations not only implies evaluating different presentation modes, but also the quality differences within the same modality. Chapter 5 focuses on the quality differences between synthetic speech and human speech using the following research question:

• How do quality differences within the speech modality influence its incremental processing and how can we assess these quality differences?

1.3 Research approach

In the previous section, we mentioned that several factors should be considered when presenting information in a multimodal way. In this section, we will argue that knowledge on multimodal information presentation can be obtained using different research methodologies. In this thesis, each chapter discusses a different research methodology used to evaluate multimodal information presentations.

In the research field of speech and language technology there is a growing interest in multimodal human-computer interaction. Past research in human-computer interaction has shown that the use of multiple output modalities makes systems more robust and efficient to use (Oviatt, 1999). Also, in the area of computational linguistics, research has been done on multimodal document analysis and generation (e.g., Bateman, Kamps, Reichenberger & Kleinz, 2001). In multimodal systems, guidelines are needed to combine the different modalities in such a way that each bit of information is presented in the most appropriate manner. A way to generate optimal multimodal presentations is investigating when and how human users present information in a multimodal way. Chapter 2 starts from multimodal human-computer interaction and describes two experiments using the cognitive engineering approach (Tversky et al., 2006). In this approach, human users are asked to produce information presentations, which are then rated by other users (e.g., Agrawala & Stolte, 2001; Heiser, Phan, Agrawala, Tversky & Hanrahan, 2004).

conceptual map) using performance measures, like the number of opened pages and the number of pages recalled. However, the relation between these performance measures and how users mentally conceptualize a web site is unclear. For instance, suppose users are presented with the spatial map and open many web pages. Does this mean that they have a clear overview of the web site or that they are disoriented? Users' representation of a web site can be investigated using other methods, like protocol analysis. In this research method, participants are asked to carry out a task while verbalizing their thoughts. These verbalizations are written down in a verbal report and analyzed in a way that depends on the research question (Ericsson & Simon, 1993). Chapter 3 discusses an exploratory study in which protocol analysis is used to get a fine-grained view of how users conceptualize their actions when navigating a web site.

In the field of cognitive and instructional psychology, research has been done on the influence of different presentation modes on the users' understanding, recall, and processing efficiency of the presented material (e.g., Mayer, 2005; Tversky, Morrison & Bétrancourt, 2002). Several studies compared the effectiveness of different presentation modes, however with mixed results (e.g., Bétrancourt & Tversky, 2000; Lewalter, 2003; Tversky et al., 2002). Various reasons have been mentioned for these findings: the lack of equivalence of information in the different presentation modes (Tversky et al., 2002), differences in learning tasks (Hegarty, 2004), or in learning performance measures (Brünken, Plass & Leutner, 2003). Apart from the objective effectiveness of different presentation modes, users' 'subjective satisfaction' (Nielsen, 1993) should also be taken into account, as an attractive and motivating presentation format could also influence its effectiveness. Chapter 4 describes two experiments. In the first experiment, the effectiveness of three presentation modes (i.e., text, photo, and film clip) was evaluated using several objective measures, like learning times and recall. In the second experiment, we investigated whether users subjectively preferred one of these three presentation modes.

Research in speech synthesis has evaluated the intelligibility and naturalness of synthetic speech with offline research methods. For example, in the Modified Rhyme [...] (Schmidt-Nielsen, 1995) in which listeners have to rate the quality of spoken sentences on scales (i.e., excellent - bad). Yet, these research methods do not consider that speech is transient: spoken instructions are "gone" once they have been uttered. Online research methods, like eye tracking, give a direct insight into how speech is processed incrementally. Chapter 5 describes an eye tracking experiment in which the visual world paradigm (e.g., Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995) is used to evaluate the processing of synthetic speech and human speech.

1.4 Thesis overview

Figure 1.2 gives a 'multimodal' overview of this thesis. Chapter 2 offers a general introduction into multimodal information presentation and presents two studies. The first study, a production experiment, was conducted to investigate when and how users present medical information in a multimodal way. The second study, an evaluation experiment, was done to investigate how users evaluate the informativity and attractiveness of unimodal and multimodal information presentations. The later chapters are more detailed case studies looking into multimodal information presentation from different perspectives.

Chapter 3 focuses on the research question of how users conceptualize their actions when navigating a web site. Thinking-aloud protocols were analyzed to distinguish users' actions involved in web site navigation and the type of expressions used to verbalize these actions.

Chapter 4 also presents two studies. The first study describes an experiment investigating a specific kind of procedural instructions, i.e., RSI exercises, taking presentation mode (text vs. photo vs. film clip) and difficulty degree of the exercises (easy vs. difficult) as independent variables. The second study describes an experiment concentrating on which presentation mode people prefer when learning RSI exercises.

Chapter 5 takes a closer look at the speech modality. An eye tracking experiment was conducted to study the incremental processing of two forms of speech synthesis

Finally, Chapter 6 presents a review of the results found as well as a general discussion of the most interesting findings of this thesis.

Figure 1.2 [diagram: Chapter 2, Production and evaluation of multimodal information presentations; Chapter 3, Spatial conceptualization in multimodal information presentations; Chapter 4, Modalities for procedural instructions; Chapter 5, Evaluating the speech modality with eye movements]

Chapter 2

Production and evaluation of multimodal information presentations

A journal paper based on this chapter is submitted for publication. Earlier versions of this chapter appeared as Van Hooijdonk, C.M.J., De Vos, J., Krahmer, E.J., Maes, A., Theune, M., & Bosma, W. (2007). On the role of visuals in multimodal answers to medical questions. Proceedings of the International Professional Communication Conference (IPCC), Seattle, USA: IEEE and as Van Hooijdonk, C.M.J.,


2.1 Introduction

This chapter offers a first exploration into multimodal information presentation from the perspective of human-computer interaction. More specifically, we take the perspective of multimodal presentation of answers in question answering (QA). Early research in the field of QA concentrated on answering factoid questions, i.e., questions that have one word or phrase as their answer, such as "Amsterdam" in response to the question "What is the capital of the Netherlands?" The output modality for the answers to these questions will typically be text. However, there is currently a growing interest in moving beyond factoid questions and purely textual answers, and then output generation becomes an important issue. Questions that arise are: how to determine, for a given question, what the best combination of modalities for the answer is? And related to this: what is the proper length of a non-factoid answer? In this chapter, we address these basic issues around multimodal information presentation in the context of medical question answering.

In the medical domain several question types occur, such as definition questions and procedural questions, which require different types of answers. For example, the answer to the definition question "What does RSI stand for?" would probably be a brief textual answer, like "RSI stands for Repetitive Strain Injury". However, a text-only answer may not be the best choice for every type of information. In some cases other modalities (e.g., pictures, film clips, etc.) or modality combinations (e.g., text and a picture) may be more suitable (Theune et al., 2007). For example, the answer to the procedural question "How to organize a workspace in order to prevent RSI?" would probably be more informative if it contained a picture. Moreover, the length of the answer could also play an important role in the answer presentation. For example, the answer to the question "What does RSI stand for?" could be an extended one: "RSI stands for Repetitive Strain Injury. This disorder involves damage to muscles, tendons and nerves caused by overuse or misuse, and affects the hands, wrists, elbows, arms, shoulders, back, or neck". This answer provides the user with relevant background information about the topic of the question. In addition, including informative text in the answer may allow the user to assess the answer's


modalities for the answer is. And related to this: what is the proper length of an answer?

Much research has been done in the field of cognitive and educational psychology on the influence of (combinations of) different modalities on the users' understanding, recall and processing efficiency of the presented material (e.g., Carney & Levin, 2002; Mayer, 2005; Tversky et al., 2002). This research has resulted in several guidelines on how to present (multimodal) information to the user, such as the multimedia principle (i.e., instructions should be presented using both text and pictures, rather than text only) and the spatial contiguity principle (i.e., when presenting a combination of text and pictures, the text should be close to or embedded within the pictures) (Mayer, 2005). However, these guidelines are based on specific types of information used in specific domains, in particular descriptions of cause-and-effect chains which explain how systems work (e.g., Mayer, 1989; Mayer & Gallini, 1990; Mayer & Moreno, 2002) and procedural information describing how to acquire a certain skill (e.g., Marcus, Cooper & Sweller, 1996; Michas & Berry, 2000; Schwan & Riempp, 2004). Yet, these guidelines do not tell us which modalities are most suited for which information types, as each learning domain has its own characteristics (Van Hooijdonk & Krahmer, in press).

Several researchers have tried to make an overview of the characteristics of modalities, information types, and the matches between them. For example, Bernsen (1994) focused on the features of modalities in his Modality Theory, i.e., "Given any particular set of information which needs to be exchanged between user and system during task performance in context, identify the input/output modalities, which, from the user's point of view, constitute an optimal solution to the representation and exchange of that information" (Bernsen, 1994, p. 348). He proposed a taxonomy to define generic unimodalities consisting of various features. Other researchers proposed taxonomies of information types such as dynamic, static, conceptual, concrete, spatial, and temporal in order to select the appropriate modalities (e.g., Heller, Martin, Haneef & Gievska-Krliu, 2001; Sutcliffe, 1997).

Other research has been concerned with the so-called "media allocation problem": "[...]" ([...] Hovy & Vossers, 1993, p. 280). According to Arens et al. (1993) the characteristics of the media used are not the only features that play a role in media allocation. The characteristics of the information to be conveyed, the goals and characteristics of the producer, and the characteristics of the perceiver and the communicative situation are also important. In order to create a multimodal information presentation, modalities should be integrated dynamically based on a general communication theory (e.g., Arens et al., 1993; André, 2000; Maybury & Lee, 2000; Oviatt et al., 2003).

In short, attempts have been made to generate optimal multimodal information presentations, resulting in several guidelines, frameworks, and taxonomies. However, what is needed in addition is knowledge on when and how people produce multimodal information presentations and how other people evaluate such presentations. To achieve this goal, we carried out two experiments following the cognitive engineering approach as used by Heiser et al. (2004). In this approach, people are asked to produce information presentations (e.g., route maps, assembly instructions, etc.), which are then rated by other people. Based on the results, design principles are identified and used to improve these information presentations.

This chapter describes two experiments carried out in order to investigate the role of visuals in multimodal answer presentations for a medical question answering system. First, a production experiment is described that focuses on which modalities users choose to answer medical questions. Participants were instructed to create a brief and an extended answer to different medical question types (i.e., definition questions, like "Where is progesterone produced?" vs. procedural questions, like "How is a SPECT scan made?"). Next, an evaluation experiment is described that concentrates on how users evaluate different types of answer presentations. Participants were instructed to carefully study answer presentations that were either unimodal (i.e., consisting of text only) or multimodal (i.e., consisting of text and a picture), and that were based on the answer presentations collected in the production experiment. After the participants had studied these answer presentations, they had to [...]

Production & evaluation of multimodal information presentations

2.2 Experiment I: Production

2.2.1 Research method

Participants

One hundred and eleven students of Tilburg University participated for course credits (65 female and 46 male, between 19 and 33 years old). All participants were native speakers of Dutch.

Stimuli

The participants were given one of four sets of eight general medical questions (see appendix A) for which the answers could be found on the Internet. The participants had to give two types of answers per question, i.e., a brief answer and an extended answer. Besides, different (combinations of) modalities could be used to answer the questions. The participants had to assess for themselves which (combinations of) modalities were best for a given question, and they were specifically asked to present the answers as they would prefer to find them in a QA system. To make sure they could carry out this task, they were instructed about the working of QA systems in advance. Questions and answers had to be presented in a fixed format in PowerPoint™ with areas for the question ("vraag") and the answer ("antwoord"). This programme was chosen because it has the possibility to insert pictures, film clips, and sound fragments in an answer presentation. All participants were familiar with PowerPoint™ and most of them used it on a monthly basis (51.4%).

Of the eight questions in each set, four were randomly chosen from one hundred medical questions formulated to test the IMIX QA system (e.g., "How many X chromosomes does a female body cell have?"). Of the remaining four questions, two were definition questions and two were procedural questions. Orthogonal to this, two questions referred explicitly or implicitly to body parts and two did not. These four question types were given to the participants in a random order. Examples of the questions were:

• Definition question referring to body parts: "Where is progesterone produced?" or "Where are red blood cells produced?"
• Procedural question referring to body parts: "How to apply a sling to the left arm?" or "What should be done when having a nosebleed?"
• Procedural question not referring to body parts: "What happens when a myelogram is taken?" or "How is a SPECT scan made?"

Coding system

Each answer was coded on the following variables: the presence of photos, graphics, animations, and the function of these visual media related to the text of the answer. The coding criteria for these variables are discussed below. To determine the reliability of the coding system, Cohen's κ (Krippendorff, 1980) was calculated.

• Photos: We distinguished whether the answer contained no photo, one photo or several photos.
• Graphics: We defined graphics as non-photographic, static depictions of concepts (e.g., diagrams, charts, and line drawings). We distinguished answers with no graphics, one graphic, or several graphics.
• Animations: We defined animations as dynamic visuals possibly with sound (e.g., film clips and animated pictures). We distinguished answers without animations, with one animation, or several animations.

• Function of visual media: We distinguished three functions of visuals in relation to text, loosely based on Carney & Levin (2002):

1. Decorational function: a visual has a decorational function if removing it from the answer presentation does not alter the informativity of the answer in any way. Figure 2.1 shows examples of answer presentations with a decorational visual. The example on the left shows an answer to the question: "What are the side effects of a vaccination for diphtheria, whooping cough, tetanus, and polio?" The answer consists of a combination of text and a graphic. The text describes the side effects of the vaccination, while the graphic only shows a syringe. The graphic does not add any information to the answer. The example on the right shows an answer to the question: "How many X chromosomes does a female body cell have?" The answer consists of a combination of text and a graphic.


2. Representational function: a visual has a representational function if removing it from the answer presentation does not alter the informativity of the answer, but its presence clarifies the text. Figure 2.2 shows two examples of answer presentations with a representational visual. The example on the left shows an answer to the question: "What types of colitis can be distinguished?" The answer consists of a combination of text and a graphic. The text describes the four types of colitis and their occurrence in the intestines. This information is visualized in the graphics. The example on the right shows an answer to the question: "How to apply a sling to the left arm?" The answer consists of three photos illustrating the procedure, which is described in more detail in the text on the right.

3. Informative function: a visual has an informative function if removing it from the answer presentation decreases the informativity of the answer. If an answer only consists of a visual, it automatically has an informative function. Figure 2.3 shows two examples of answer presentations with informative visuals. The example on the left shows the answer to the question: "How to apply a sling to the left arm?" The answer consists of four graphics illustrating the procedure. The example on the right shows an answer to the question: "How can I strengthen my abdominal muscles?" The text describes some general information about abdominal exercises (i.e., an exercise program should be well balanced and train all abdominal muscles). The photos represent four exercises that can be done to strengthen the abdominal muscles.

Coding procedure

In total 1776 answers were collected (111 participants × 8 questions × 2 answers). However, one participant gave 15 answers, resulting in one missing value. Thus, the coded corpus consisted of 1775 answers. The coding scheme was given to six analysts. The annotation was done in two steps. First, each analyst independently coded a part of the corpus to determine the adequacy of the coding scheme. Differences between the analysts were discussed, which resulted in some adjustments of the coding system. Subsequently, every analyst independently coded the same set of 112 [...]

To compute agreement we used Cohen's κ measure. Following standard practice, Cohen's κ scores between .81 and 1.00 signify an almost perfect agreement, scores between .61 and .80 a substantial agreement, scores between .41 and .60 a moderate agreement, and scores between .21 and .40 a fair agreement (Rietveld & Van Hout, 1993). It turned out that the analysts almost perfectly agreed in judging the occurrence of photos (κ = .81), graphics (κ = .83), and animations (κ = .92). Moreover, an almost perfect agreement was reached in assigning the function of the visual media (κ = .83).
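As an illustration of how such agreement scores are obtained, the following minimal sketch computes Cohen's κ for two annotators and maps the score onto the bands quoted above. The label sequences and the function names `cohens_kappa` and `agreement_band` are hypothetical and not part of the study. How the scores of the six analysts were combined is not stated here, so the sketch covers the two-annotator case only.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: proportion of items with identical labels.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label
    return (p_observed - p_chance) / (1.0 - p_chance)


def agreement_band(kappa):
    """Verbal label for a kappa score, using the bands quoted in the text."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    return "below the bands quoted in the text"


# Hypothetical photo codes ("none", "one", "several") for ten answers.
analyst_1 = ["none", "one", "one", "several", "none",
             "none", "one", "none", "several", "none"]
analyst_2 = ["none", "one", "none", "several", "none",
             "none", "one", "none", "one", "none"]

kappa = cohens_kappa(analyst_1, analyst_2)
print(f"kappa = {kappa:.2f} ({agreement_band(kappa)})")
```

In this made-up example the two analysts agree on 8 of 10 answers, but because "none" dominates both codings, a fair share of that agreement is expected by chance, which is why κ is lower than the raw agreement.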

[Two answer slides. Left: answer to "What are the side effects of a vaccination for diphtheria, whooping cough, tetanus, and polio?", a list of side effects with a graphic of a syringe. Right: answer to "How many X chromosomes does a female body cell have?", a one-sentence answer with a graphic.]

Figure 2.1
Examples of answer presentations with decorational visuals

[Two answer slides. Left: answer to "What types of colitis can be distinguished?", a description of the four types with a graphic. Right: answer to "How to apply a sling to the left arm?", photos of the procedure with text.]

Figure 2.2


[Two answer slides. Left: answer to "How to apply a sling to the left arm?", four graphics labelled step 1 to step 4. Right: answer to "How can I strengthen my abdominal muscles?", general text with photos of exercises.]

Figure 2.3
Examples of answer presentations with informative visuals

2.2.2 Results

Descriptive statistics

Table 2.1 shows the percentages of visual media (overall), photos, graphics, and animations in the complete corpus of coded answer presentations. Inspection of Table 2.1 reveals that almost one in four answers contained one or more visual media, of which graphics were most frequent and animations were least frequent. The presence of photos was between these two. In some answers several visual media occurred (i.e., photos, graphics, and animations). These instances were counted as one occurrence of visual media. Thus, the sum of the percentages of photos, graphics, and animations in the corpus exceeded the percentage of the variable visual media.

Table 2.1
Percentages of answer presentations containing text only (no visual media) and visual media (overall) divided into photos, graphics and animations in the complete corpus of coded answers (n = 1775).

No visual media    75.1
Visual media       24.9
  Photos            8.6
  Graphics         14.9

Table 2.2 shows the percentages of photos, graphics, and animations related to their function. Note that in some answers several visuals occurred (i.e., photos, graphics, and animations). These instances were counted as one occurrence of visual media. Thus, the sum of the percentages of photos, graphics, and animations in the corpus exceeded the percentage of the overall occurrence of visual media. Table 2.2 reveals that the distribution of photos related to their function differed significantly from chance (χ²(2) = 41.30, p < .001). Most photos had a representational function. Also, there was an association between graphics and their function (χ²(2) = 38.09, p < .001). Most graphics had a representational function. Finally, there was a relation between animations and the function of visual media (χ²(2) = 67.52, p < .001). Most animations had an informative function.

Table 2.2
Percentages of photos, graphics, and animations related to their function.

                      Function of visual media
                      Decorational   Representational   Informative   Totals
Photos (n = 152)          20.4            57.9             21.7       100.0
Graphics (n = 265)        15.8            45.3             38.9       100.0
Animations (n = 67)        7.5            11.9             80.6       100.0
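The comparison "against chance" reported for Table 2.2 can be read as a chi-square goodness-of-fit test against equal frequencies for the three functions. The sketch below illustrates this reading for photos and graphics. The raw counts are not given in the text; they are reconstructed here by rounding the percentages in Table 2.2 to whole answers, which is an assumption, as are the function names.

```python
import math


def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def p_value_df2(chi2):
    """Upper-tail p-value for a chi-square statistic with exactly 2 degrees
    of freedom, where the survival function reduces to exp(-chi2 / 2)."""
    return math.exp(-chi2 / 2.0)


# Counts per function (decorational, representational, informative),
# reconstructed from the percentages in Table 2.2 (assumption).
photos = [31, 88, 33]       # 20.4%, 57.9%, 21.7% of 152
graphics = [42, 120, 103]   # 15.8%, 45.3%, 38.9% of 265

photos_chi2 = chi_square_uniform(photos)
graphics_chi2 = chi_square_uniform(graphics)

for name, chi2 in (("photos", photos_chi2), ("graphics", graphics_chi2)):
    print(f"{name}: chi2(2) = {chi2:.2f}, p = {p_value_df2(chi2):.2g}")
```

Under these reconstructed counts the statistics agree with the reported values for photos and graphics to two decimals, which supports the goodness-of-fit reading.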

Within the corpus of collected answer presentations different types of photos and graphics occurred. It turned out that some photos and graphics contained text and some did not. Therefore, a sub-analysis was done to investigate whether the distribution of the functions of visual media differed between photos with and without text and between graphics with and without text. Table 2.3 shows the results. It turned out that photos without text occurred significantly more often than photos with text (χ²(1) = 60.63, p < .001). The reverse was found for graphics: graphics with text occurred significantly more often than graphics without text (χ²(1) = 38.49, p < .001).

There was a dependence between the function of visual media and photos with and without text (χ²(2) = 5.97, p = .05). Most photos without text were associated with [...]


Table 2.3
Percentages of types of photos and types of graphics related to their function.

                                  Function of visual media
                                  Decorational   Representational   Informative   Totals
Photos without text (n = 124)         16.9            58.9             24.2       100.0
Photos with text (n = 28)             35.7            53.6             10.7       100.0
Graphics without text (n = 82)        30.5            40.2             29.3       100.0
Graphics with text (n = 183)           9.3            47.5             43.2       100.0

However, most photos with text were associated with a representational function or a decorational function (χ²(2) = 7.79, p < .025). Also, the distribution of the functions of visual media differed significantly between the graphics with and without text (χ²(2) = 19.54, p < .001). There was no association between graphics without text and their function (χ²(2) = 1.78, p = .41). Graphics without text were evenly associated with the three functions of visual media. However, there was an association between graphics with text and their function (χ²(2) = 48.13, p < .001). Most graphics with text had a representational or an informative function.

Answer length

The brief and the extended answers were related to different answer presentations.

Table 2.4
Percentages and χ² statistics of the presence of visual media (overall) divided into photos, graphics, and animations related to the brief and the extended answers (Scores are percentages of answers: n = 1775).

                 Length of the answer
                 Brief (n = 888)   Extended (n = 887)   χ² statistics
Visual media          11.4               38.4           χ²(1) = 173.89, p < .001
Photos                 4.6               12.5           χ²(1) = 35.34, p < .001
Graphics               6.3               23.6           χ²(1) = 104.04, p < .001
Animations              .9                6.7           χ²(1) = 40.40, p < .001
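The tests in Table 2.4 compare the presence of a visual (yes or no) across brief and extended answers, i.e., a 2 × 2 table of counts. The sketch below computes the Pearson chi-square statistic for such a table. Two assumptions are made: the counts are reconstructed by rounding the percentages in Table 2.4 to whole answers, and no continuity correction is applied (the text does not state which variant was used). The function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2 x 2 table
    [[a, b], [c, d]], with one degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


BRIEF_N, EXTENDED_N = 888, 887

# Answers containing the visual, reconstructed from Table 2.4 (assumption).
with_visual_media = (101, 341)   # 11.4% of 888, 38.4% of 887
with_photos = (41, 111)          # 4.6% of 888, 12.5% of 887


def brief_vs_extended(counts):
    """Chi-square for presence vs. absence of a visual by answer length."""
    brief, extended = counts
    return chi_square_2x2(brief, BRIEF_N - brief,
                          extended, EXTENDED_N - extended)


visual_media_chi2 = brief_vs_extended(with_visual_media)
photos_chi2 = brief_vs_extended(with_photos)
print(f"visual media: chi2(1) = {visual_media_chi2:.2f}")
print(f"photos:       chi2(1) = {photos_chi2:.2f}")
```

With these reconstructed counts the statistics for visual media and photos agree with the values in Table 2.4 to two decimals, which suggests that the uncorrected Pearson statistic was reported.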

Table 2.5 shows the percentages and χ² statistics of the functions of visual media related to brief and extended answers. The results showed that the overall distribution of the functions of visual media across the answer types differed significantly (χ²(2) = 34.31, p < .001). Decorational visuals occurred more often in brief answers, whereas representational visuals occurred more often in extended answers. Finally, informative visuals occurred more often in brief answers.

Table 2.5
Percentages of the function of visual media related to brief and extended answers (n = 444).

                            Length of the answer
                            Brief (n = 102)   Extended (n = 342)   χ² statistics
Decorational function            26.5               12.9           χ²(1) = 4.07, p < .05
Representational function        20.6               52.9           χ²(1) = 126.73, p < .001
Informative function             52.9               34.2           χ²(1) = 23.21, p < .001

Type of question

We were interested in whether different types of questions were related to different answer presentations. Therefore we analyzed a subset of the medical questions (i.e., the definition and procedural questions with and without reference to body parts). Table 2.6 shows the percentages and χ² statistics of the presence of visual media (overall), photos, graphics, and animations within the definition and procedural questions and within questions with and without reference to body parts. The distribution of all variables differed significantly across the question types. In general, [...] body parts. Looking at specific types of visual media, we see that graphics occurred more often in answers to definition questions with reference to body parts, but that photos and animations occurred more often in answers to procedural questions with reference to body parts.

Table 2.6
Percentages and χ² statistics of the presence of visual media (overall) divided into photos, graphics, and animations related to the four question types.

                 Definition questions (n = 443)   Procedural questions (n = 444)
                 +Body parts    -Body parts       +Body parts    -Body parts
                 (n = 222)      (n = 221)         (n = 222)      (n = 222)       χ² statistics
Visual media        31.1           10.0              47.7           33.3         χ²(3) = 53.09, p < .001
Photos               4.1            5.4              22.1           19.8         χ²(3) = 46.07, p < .001
Graphics            28.8            5.0              15.3           12.6         χ²(3) = 42.77, p < .001
Animations            .5             .9              14.9            5.4         χ²(3) = 55.17, p < .001

Table 2.7 shows the percentages and χ² statistics of the functions of visual media within definition and procedural questions and within questions with and without reference to body parts. The results show that the distribution of the functions of visual media differed significantly within the question types (χ²(6) = 91.84, p < .001). Decorational visuals occurred more often in definition questions without reference [...]

Table 2.7
Percentages and χ² statistics of the functions of visual media related to the four question types (n = 272).

                            Definition questions (n = 91)   Procedural questions (n = 181)
                            +Body parts   -Body parts       +Body parts   -Body parts
                            (n = 69)      (n = 22)          (n = 106)     (n = 75)        χ² statistics
Decorational function          5.8          63.6               3.8           8.0          χ²(3) = 9.71, p < .025
Representational function     63.8          22.7              39.6          52.0          χ²(3) = 31.42, p < .001
Informative function          30.4          13.6              56.6          40.0          χ²(3) = 59.68, p < .001

2.2.3 Conclusion

The results of the production experiment showed that users do make use of multiple media in their answer presentations and that the design of these presentations is affected by the answer length and question type. However, what is not clear is how users evaluate different types of answer presentations (i.e., unimodal vs. multimodal). In the next section, an evaluation experiment is discussed in which users were instructed to assess answer presentations on their informativity and attractiveness.

2.3 Experiment II: Evaluation

2.3.1 Research method

Participants

Participants were 108 native speakers of Dutch (66 female and 42 male, between 18 and 64 years old). None had participated in the production experiment.

Design

[...] an extended answer with an informative visual) as between participants variable and question type as within participants variable. The dependent variables were the participants' assessment of the informativity and the attractiveness of the text and visual combinations and the number of correct answers in the post-test. The participants were randomly assigned to an experimental condition.

Stimuli

For the evaluation experiment, 16 medical questions were selected from the set of 32 medical questions of the production experiment. We selected questions for which the production corpus contained two relevant types of visuals: informative visuals and decorational or representational visuals. For the purpose of this experiment, decorational and representational visuals were combined into illustrative visuals. An illustrative visual did not add any more information to the textual answer, whereas an informative visual did add information to the textual answer.

The selected set of medical questions consisted of eight definition questions and eight procedural questions. In both question types, half of the questions referred to body parts and half did not. Examples of the questions used in the evaluation experiment were:

• Definition questions: "Where is testosterone produced?" or "What does ADHD stand for?"
• Procedural questions: "How to apply a sling to the left arm?" or "How to organize a workspace in order to prevent RSI?"

The 16 medical questions were presented in four different answer presentation formats: a brief textual answer with an illustrative visual, an extended textual answer with an illustrative visual, a brief textual answer with an informative visual, and an extended textual answer with an informative visual. For the sake of comparison, two unimodal answer presentation formats were added: a brief textual answer and an extended textual answer.
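The six answer presentation formats can be viewed as the crossing of answer length with type of visual. The sketch below enumerates them and deals participants evenly over the formats. Note the assumptions: the text only states that participants were randomly assigned, so the balanced round-robin scheme, the participant numbering, the fixed seed, and all names in the code are illustrative and not taken from the experiment software. An even split of the 108 participants would give 18 per format, which would be consistent with the error degrees of freedom reported in the results.

```python
import random
from collections import Counter
from itertools import product

LENGTHS = ("brief", "extended")
VISUALS = ("none", "illustrative", "informative")

# Six formats: two unimodal (visual "none") and four multimodal.
FORMATS = list(product(LENGTHS, VISUALS))


def assign_balanced(participant_ids, formats, seed=0):
    """Shuffle the participants, then deal them round-robin over the formats
    so that group sizes differ by at most one."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: formats[i % len(formats)] for i, pid in enumerate(ids)}


# Hypothetical participant numbers 1..108.
assignment = assign_balanced(range(1, 109), FORMATS)
group_sizes = Counter(assignment.values())

for fmt in FORMATS:
    print(fmt, group_sizes[fmt])
```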

For every question a brief and an extended textual answer was formulated. The brief and the extended textual answers were based on the answers found in the corpus [...] provided some relevant background information about the topic of the question. The average length of the brief answers was almost 26 words and the average length of the extended answers was almost 66 words. The same brief and extended answers were also used in the text with an illustrative visual condition and in the text with an informative visual condition.

In the two text with an illustrative visual conditions, the brief and the extended textual answers were presented together with an illustrative visual. An illustrative visual had been given a decorational or a representational function in the production experiment (see section 2.2.1). Figure 2.4 shows an example of a brief textual answer and an extended textual answer with an illustrative photo. Both examples show the answer to the question: "How to organize a workspace in order to prevent RSI?" The answer presentation on the left contains a brief textual answer describing three tips for organizing a workspace in order to prevent RSI. The answer presentation on the right contains an extended textual answer describing an ergonomic workspace. Both answer presentations contain a photo illustrating a workspace. This photo represents an element (i.e., a desk) mentioned in the textual answers. However, the answers would not be less informative if the photo was not present.

[Two answer slides for the question "How to organize a workspace in order to prevent RSI?": a brief textual answer (left) and an extended textual answer (right), each with a photo of a workspace.]

Figure 2.4

[Two answer slides for the question "How to organize a workspace in order to prevent RSI?": a brief textual answer (left) and an extended textual answer (right), each with a detailed graphic of an ergonomic workspace.]

Figure 2.5
Examples of a brief textual answer (left) and an extended textual answer (right) with an informative visual

In the two text with an informative visual conditions, we presented the brief and extended textual answers together with an informative visual. A visual was informative if it had been given an informative function in the production experiment. Figure 2.5 illustrates a brief textual answer and an extended textual answer with an informative graphic to the question: "How to organize a workspace in order to prevent RSI?" Both answer presentations include a graphic depicting in detail an ergonomic workspace. Both answer presentations would contain less information if the graphic was not present.

We made sure that the type of question did not affect the answer length for brief textual answers (F [1,14] = 3.59, p = .08), nor for extended textual answers (F < 1). The illustrative and informative visuals were taken from the corpus of answer presentations collected in the production experiment. In a few cases, a visual was used from the Internet, when the corpus did not contain a suitable visual. Moreover, in a few cases the text within the visuals was enlarged to make it more readable.

The experiment was conducted using WWSTIM (Veenker, 2005), a CGI-based script that automatically presents stimuli to the participants and transfers all data to a database. This enabled us to run the experiment via the Internet. The answer presentations of procedural and definition questions were presented in one random [...]

Procedure

The participants received an e-mail inviting them to take part in the experiment. This e-mail shortly stated the goal of the experiment, the amount of time it would take to participate, the possibility to win a gift certificate, and the URL. Figure 2.6 illustrates the procedure of the evaluation experiment.

When the participants accessed the experiment, they first received instructions about the procedure of the experiment. In these instructions, the participants were told that they would receive the answer presentations of 16 medical questions. They had to study these answer presentations carefully, after which they had to assess them on their informativity and on their attractiveness. Next, the participants entered their personal data (i.e., age, gender, level of education, and optionally their e-mail to win a gift certificate).

After the participants had filled out their personal data, they practiced the procedure of the actual experiment in a practice session: they were presented with the medical question "Where are red blood cells produced?" together with an answer presentation. The participants studied the answer presentation until they thought that they could assess its informativity and attractiveness. Subsequently, the participants were shown the medical question, the answer presentation, and a questionnaire.

In the unimodal (i.e., text only) conditions, this questionnaire consisted of three questions addressing the formulation of the answer presentation, the informativity of the answer presentation, and the attractiveness of the answer presentation. In the four text with a visual conditions, the participants filled out the above-mentioned questions and two other questions addressing the informativity and the attractiveness of the text and visual combination. The participants could indicate their assessment on a seven-point Likert scale, implemented as radio buttons. After completing the practice session, the participants started with the actual experiment, proceeding in the same way as during the practice session.

After completing the assessment of the answer presentations to the 16 medical questions, the participants received a post-test: they had to answer the same 16 medical questions by means of a multiple choice test, in which each medical question was provided with four textual answer possibilities. Of these four answer possibilities, one answer was correct and the other three were plausible incorrect [...]

[Flow chart: Instructions → Personal data → Practice session → Experiment → Post-test. Within the experiment, each of the 16 trials first shows the question with its answer presentation, and then the question and answer presentation together with the informativity and attractiveness questionnaire.]

Figure 2.6

a. Testosterone is a sex hormone that is produced by males and females in the adrenal glands. Besides, males produce testosterone in the testes. (correct answer)
b. Testosterone is a sex hormone that is only produced by males. Testosterone is produced in the testes and in the adrenal glands. (incorrect answer)
c. Testosterone is a sex hormone produced by males and females. Testosterone is produced in the pancreas and in the hypothalamus. (incorrect answer)
d. Testosterone is a sex hormone produced by males and females. Testosterone is produced in the adrenal glands. (incorrect answer)

The order in which the medical questions were presented in the post-test was the same as in the actual experiment. Note that the information mentioned in the extended textual answers, and illustrated in the informative visuals, was not necessary to answer the question in the post-test correctly.

Data processing

The following data were collected: the informativity and the attractiveness of the text and visual combination of the answer presentations, and the number of correctly answered questions of the post-test. Tests for significance were performed using a 4 (answer presentation) x 2 (question type) repeated measures analysis of variance (ANOVA), with a significance threshold of .05. For post hoc tests, the Bonferroni method was used. The participants were randomly assigned to an experimental condition. Note that inconclusive results were found for answer presentations to questions with and without reference to body parts. Therefore, we do not report on this any further.

2.3.2 Results

Informativity of the text and visual combinations

Table 2.8 shows the mean results of the assessment on the informativity of the text and visual combinations. A main effect was found of answer presentation format on the perceived informativity of the text and visual combinations, F [3,68] = 9.32, [...] not differ significantly from extended answers with an illustrative visual (p = 1.00). However, brief answers with an illustrative visual differed significantly from both brief (p < .001) and extended (p < .005) answers with an informative visual. Also, extended answers with an illustrative visual differed significantly from brief (p < .025) and extended (p < .025) answers with an informative visual. No significant differences were found between brief and extended answers with an informative visual (p = 1.00).

Table 2.8
Mean results of the assessment on the informativity and the attractiveness of the four text and visual combinations (Scores range from 1 = "very negative" to 7 = "very positive"; standard deviations in parentheses).

                                     Text with an illustrative visual   Text with an informative visual
Factor               Question type   Brief          Extended            Brief          Extended
Informativity of     Definition      3.83 (1.13)    4.01 (1.30)         4.91 (.81)     4.97 (1.20)
the text and visual  Procedural      3.70 (1.26)    4.27 (1.18)         5.53 (.70)     5.40 (.84)
combination          Totals          3.76 (1.16)    4.14 (1.19)         5.22 (.69)     5.18 (1.00)
Attractiveness of    Definition      3.93 (.87)     3.76 (1.14)         4.43 (.88)     4.69 (1.01)
the text and visual  Procedural      4.18 (1.12)    4.18 (1.10)         4.95 (.84)     5.08 (.76)
combination          Totals          4.06 (.96)     3.97 (1.07)         4.69 (.75)     4.89 (.79)

Moreover, a main effect was found of question type on the perceived informativity of the text and visual combinations, F [1,68] = 15.13, p < .001, η²p = .18. The answer presentations of procedural questions were evaluated as more informative than the answer presentations of definition questions.
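The partial eta squared values reported with these F tests can be related to the F statistic and its degrees of freedom through the identity η²p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to the F values reported in this section; the function name is illustrative, and the computation is offered as a consistency check rather than as the procedure the statistics software followed.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic.

    Follows from eta_p^2 = SS_effect / (SS_effect + SS_error) and
    F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    explained = f_value * df_effect
    return explained / (explained + df_error)


# F values and degrees of freedom as reported in the text.
question_type = partial_eta_squared(15.13, 1, 68)   # reported as .18
interaction = partial_eta_squared(4.27, 3, 68)      # reported as .16

print(f"question type: {question_type:.2f}")
print(f"interaction:   {interaction:.2f}")
```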

Finally, an interaction was found between answer presentation format and question type, F [3,68] = 4.27, p < .01, η²p = .16. This interaction can be explained as follows: for both brief (F [1,17] = 17.12, p < .005, η²p = .50) and extended (F [1,17] = 7.31, [...]
