About maps and Gantt charts:

An empirical experiment that investigates the usability of a map-based diagram and a Gantt chart in spatio-temporal data exploration tasks

Master thesis Computer-Mediated Communication

Daniël Houben, S3473023

dr. L.M. Bosveld-de Smet

23 December 2019

Preface

Before you lies the thesis ‘About maps and Gantt charts: an empirical experiment that investigates the usability of a map-based diagram and a Gantt chart in spatio-temporal data exploration tasks’. This thesis was written as part of my graduation from the master's programme Computer-Mediated Communication at the University of Groningen.

Over the past year I have worked on setting up the experiment, exploring and analyzing the data, and writing the thesis. The focus of the thesis is on the visualization of spatio-temporal data. This type of data is fairly complex, which made setting up a reliable experiment a challenge. Together with Tessa Huisman, who used the data obtained from the experiment for her own thesis, I succeeded in setting up an extensive and reliable experiment that builds on existing studies to gain new insights.

To contribute to research on visualizations, I was given the unique opportunity to write a paper based on my thesis for the Diagrams 2020 conference, together with my thesis supervisor, Leonie Bosveld-de Smet. I would like to thank Leonie Bosveld-de Smet for her pleasant supervision throughout the entire thesis process. Her guidance and suggestions have made the result what it is now. I would also like to thank my friends, in particular Lisa, Tessa and Dewi, for their support, ideas and good company. Throughout the master's programme we were an inseparable group in which collaboration went flawlessly. Finally, I would like to thank my parents, little brother, little sister and girlfriend for the support and love I could always count on. Their motivating words helped me to get the best out of this thesis.

I hope you enjoy reading it.

Daniël Houben

Abstract

Visualizations of abstract data can help users understand the meaning of the data. Different types of data can be visualized in different ways. Spatio-temporal data are complex and involve data collected across both space and time. Due to this complexity, the data can be used for many different data exploration tasks. This study aims to find out which types of visualizations for spatio-temporal data support users best in different data exploration tasks. A comparison is made between two visualizations of spatio-temporal data: the map and the Gantt chart. An empirical experiment was set up to evaluate the usability of both visualizations with respect to different data exploration tasks. Thirty-two query statements corresponding to five different data exploration tasks were created. Forty individuals participated in the experiment and were asked to check whether each query statement was true or false with the help of the given visualization: the map or the Gantt chart. A within-subject design was chosen; hence all participants were exposed to both the map and the Gantt chart. The analysis of the results shows little difference in response accuracy between the map and the Gantt chart. Response times show more divergence. For some data exploration tasks, the map and the Gantt chart perform equally well regarding completion time. In most data exploration tasks, however, the map has a significantly shorter completion time than the Gantt chart. The results show that the map and the Gantt chart are both viable options for visualizing spatio-temporal data as far as response accuracy is concerned, but the map supports users better with respect to completion time in most data exploration tasks. An explanation for this result may be that map-based visual representations are cognitively less demanding than more abstract, timeline-based visualizations such as the Gantt chart.

Index

PREFACE

ABSTRACT

INDEX

LIST OF FIGURES AND TABLES

FIGURES

TABLES

1. INTRODUCTION

1.1 IMPORTANCE OF VISUAL REPRESENTATIONS

1.2 PROBLEM STATEMENT

1.3 RESEARCH QUESTION AND SUBQUESTIONS

1.4 READING GUIDE

2. THEORETICAL BACKGROUND

2.1 VISUAL REPRESENTATIONS

2.1.1 Visual communication: application and graphical domain

2.1.2 Taxonomies of visual representations

2.2 VISUALIZING TIME, SPACE AND OBJECTS

2.2.1 Visualizing time

2.2.2 Visualizing space

2.2.3 Visualizing objects

2.3 SPATIO-TEMPORAL DATA

2.3.1 Question types and search levels

2.3.2 Visualizing spatio-temporal data

2.4 HYPOTHESES

3. METHODOLOGY

3.1 DESIGN

3.1.1 Dependent and independent variables

3.1.2 Operationalization dependent variables

3.2 PARTICIPANTS

3.3 MATERIALS

3.3.1 Visualizations

3.3.2 Experimental query statements

3.3.3 Survey design

3.4 PROCEDURE

3.4.1 Controlled Setting

3.4.2 Procedure

3.5 PRE-PROCESSING OF RESULTS

3.5.1 Checking for outliers

3.5.2 Data preparation for statistical analysis

4. RESULTS

4.1 GLOBAL RESULTS

4.1.1 Basic global results

4.1.2 Global results for visualization complexity

4.2 RESULTS PER DATA EXPLORATION TASK

4.2.1 Response accuracy per data exploration task

4.2.2 Completion time per data exploration task

4.3 RESULTS PER COMPLEXITY AND DATA EXPLORATION TASK

4.3.1 Response accuracy per complexity and data exploration task

4.3.2 Completion time per complexity and data exploration task

4.4 COMPARING COMPLEXITY RESULTS FOR THE SAME VISUALIZATIONS

4.4.1 Response accuracy per complexity and visualization type

4.4.2 Completion time per complexity and visualization type

4.5 OVERALL PERFORMANCE

4.5.1 Global overall performance

4.5.2 Overall performance per data exploration task

4.6 SUMMARY OF RESULTS

5. CONCLUSION AND DISCUSSION

5.1 GLOBAL PERFORMANCE

5.1.1 Response accuracy

5.1.2 Completion time

5.1.3 Influence of a guess due to the query statements

5.2 PERFORMANCE PER DATA EXPLORATION TASK

5.3 DISCUSSING THE MAIN OUTCOMES OF THIS STUDY

5.4 LIMITATIONS AND FUTURE RESEARCH

5.4.1 Participants

5.4.2 Query statement types and their data exploration tasks

5.4.3 Data exploration strategies

5.4.4 Visualization limitations

REFERENCES

APPENDIX

APPENDIX A | HYPOTHESES

A.1: General hypotheses

A.2: Hypotheses per data exploration task

A.3: Hypotheses per complexity and data exploration task

A.4: General hypotheses (complexity differences between the same visual representations)

A.5: Overall performance

APPENDIX B | FINALIZED VISUALIZATIONS VARIANTS

B.1: Map

B.2: Gantt chart

APPENDIX C | STATEMENTS PER STATEMENT TYPE

C.1: Gantt chart

C.2: Map

APPENDIX D | INFORMED CONSENT (NL)

APPENDIX E | OUTLIERS

APPENDIX F | RESULTS PER HYPOTHESES

F.1: General hypotheses

F.2: Hypotheses per data exploration task

F.3: Hypotheses per complexity and data exploration task

F.4: General hypotheses (complexity differences between the same visual representations)

List of figures and tables

Figures:

Figure 1: Scatterplot representations of the four datasets of Anscombe’s Quartet

Figure 2: The visual communication model of Wang (1995) depicting the three parts of visual communication

Figure 3: Card’s Reference Model of Visual Communication (Card et al., 2007)

Figure 4: Scientific visualization of weather information

Figure 5: The structure of time. A: linear; B: cyclic

Figure 6: A simple Gantt chart for visualizing time intervals using the length of lines

Figure 7: Example of a more complex Gantt chart for tracking project progress

Figure 8: Abstract height data concretized by overlaying the data over a map

Figure 9: Google Maps; a popular map visualization

Figure 10: Both images share an iconic relation with the real physical object; a car

Figure 11: The blue dot is an index or symbol of an object it represents

Figure 12: Examples of the map (A) and Gantt chart (B) for visualizing spatio-temporal data (Kriglstein, 2016)

Figure 13: Within-group design of the study

Figure 14: Simplified map (A) and the simplified Gantt chart (B)

Figure 15: Data exploration task distribution

Figure 16: Example of the statement presentation in Qualtrics

Figure 17: Testing environment and schematic illustration of positioning

Figure 18: Both the map and Gantt chart datasets approach a normal distribution

Figure 19: Both the map and the Gantt chart response accuracy data have a normal distribution

Figure 20: Both the map and the Gantt chart completion time data have a normal distribution

Tables:

Table 1: A tabular representation of the four datasets of Anscombe’s Quartet sharing the same summary descriptive statistical characteristics

Table 2: Information visualization of weather information

Table 3: Finer-grained information visualization

Table 4: Question types, reading levels, cognitive operations and their focus

Table 5: Overview of the strengths and weaknesses of the map and Gantt chart

Table 6: Participant characteristics

Table 7: Independent variables in the visualizations

Table 8: Low complexity vs high complexity data variations

Table 9: All visualization variations

Table 10: Data exploration tasks and their question types, their reading levels, their cognitive operations and their focus as used in this thesis

Table 11: Query statement formulation per type: the number of statements in category, rules, sentence order and examples

Table 12: Global performance measures for Gantt and Map visualizations

Table 13: Global Response accuracy performance measures for Gantt and Map visualizations with low and high complexity

Table 14: Global Completion time performance measures for Gantt and Map visualizations with low and high complexity

Table 15: Response accuracy performance measures per data exploration task for both the Gantt and map visualizations

Table 16: Completion time performance measures per data exploration task for both the Gantt and map visualizations

Table 17: Response accuracy performance measures per data exploration task for both the Low complex Gantt and map visualizations

Table 18: Response accuracy performance measures per data exploration task for both the High complex Gantt and map visualizations

Table 19: Completion time performance measures per data exploration task for both the Low complex Gantt and map visualizations

Table 20: Completion time performance measures per data exploration task for both the High complex Gantt and Map visualizations

Table 21: Response accuracy performance measures between the low and high complex visualizations

Table 22: Completion time performance measures between the low and high complex visualizations

Table 23: Global overall performance measures for Gantt and map visualizations

Table 24: Overall performance measures per data exploration task for both the Map and the Gantt chart

Table 25: Overview of significant results

Table 26: IQR Response accuracy and Completion time

1. Introduction

1.1 Importance of visual representations

The history of communication abounds with examples of visual communication for telling stories or for providing information of any nature. The first people used cave paintings to tell stories, which later evolved into canvas paintings and stained-glass windows. Nowadays, stories can be told in numerous ways: from paintings in museums to pictures in handbooks and abstract diagrams in scientific journals. Visual communication always serves a purpose (Tversky, 2013). Visual communication provides a language for communication which has to be understood by others and oneself, making it a social phenomenon.

Nowadays, most complex visualizations for displaying information are computer generated. Computer support makes it easy to create multiple data views, i.e. different visual representations based on the same data set, with each visualization allowing for a different view on the data or a completely different communicative purpose. This approach to data visualization is adopted by Card, Mackinlay and Shneiderman (2007, 7), who define visualization as the “use of computer based, interactive visual representation of data to amplify cognition”. According to Card, Mackinlay and Shneiderman (2007), a visualization can enhance and amplify cognition and give an insight into the data that generates new ideas or an unexpected solution to a problem.

The importance of visualization can be clearly demonstrated by referring to Anscombe’s Quartet, a collection of four datasets of eleven pairs of data points X and Y. These four datasets yield virtually the same results with respect to the total sum, mean, variance and standard deviation (see Table 1). The data of Anscombe’s Quartet can be visualized in different ways. Two options for representing these data are a table and a scatterplot. Table 1 shows the data in a tabular representation, while Figure 1 shows the same data as a scatterplot. The table allows the viewer to focus on local numerical information of data points, and to compare these to other data points. In contrast, the scatterplot in Figure 1 allows the viewer to see patterns in the arrangement of all data points with respect to each other. Different visual presentations of the same data set provide different views on the data, leading to a more complete picture. Visualizations are essential for a correct interpretation of data (Anscombe, 1973), and may amplify cognition (Card, Mackinlay & Shneiderman, 2007).

Not all types of data are well suited for being mapped to certain visual representations. Different types of data require different approaches to visualization, in line with the message the resulting image is intended to convey to its viewers. The communicative purpose of a visualization determines which visual representation the data should be mapped to.

Table 1: A tabular representation of the four datasets of Anscombe’s Quartet sharing the same summary descriptive statistical characteristics

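| Set 1 X | Set 1 Y | Set 2 X | Set 2 Y | Set 3 X | Set 3 Y | Set 4 X | Set 4 Y |
|---|---|---|---|---|---|---|---|
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |

Descriptive statistical results of Anscombe’s Quartet

| Statistic | Set 1 X | Set 1 Y | Set 2 X | Set 2 Y | Set 3 X | Set 3 Y | Set 4 X | Set 4 Y |
|---|---|---|---|---|---|---|---|---|
| Sum | 99.00 | 82.51 | 99.00 | 82.51 | 99.00 | 82.50 | 99.00 | 82.51 |
| Mean | 9.00 | 7.50 | 9.00 | 7.50 | 9.00 | 7.50 | 9.00 | 7.50 |
| Variance | 11 | 4.125 | 11 | 4.125 | 11 | 4.125 | 11 | 4.125 |
| Standard deviation | 3.32 | 2.03 | 3.32 | 2.03 | 3.32 | 2.03 | 3.32 | 2.03 |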

Figure 1: Scatterplot representations of the four datasets of Anscombe’s Quartet
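The claim that the four datasets share their summary statistics can be checked directly. The following is a minimal sketch in Python that recomputes the sum, mean, sample variance and sample standard deviation from the values in Table 1; the helper name `summarize` and the rounding to two decimals are choices made for this illustration only.

```python
from statistics import mean, stdev, variance

# Anscombe's Quartet as listed in Table 1: (X values, Y values) per dataset.
x_shared = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
quartet = {
    "Set 1": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Set 2": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Set 3": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Set 4": ([8.0] * 7 + [19.0] + [8.0] * 3,
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(values):
    """Return the descriptive statistics used in Table 1, rounded to two decimals."""
    return {
        "sum": round(sum(values), 2),
        "mean": round(mean(values), 2),
        "variance": round(variance(values), 2),  # sample variance (n - 1)
        "stdev": round(stdev(values), 2),        # sample standard deviation
    }


summaries = {name: (summarize(xs), summarize(ys)) for name, (xs, ys) in quartet.items()}
for name, (x_stats, y_stats) in summaries.items():
    print(name, "X:", x_stats, "Y:", y_stats)
```

The printed summaries are nearly indistinguishable across the four sets, which is exactly why a plot of the raw points is needed to tell the datasets apart.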

Visualizations can help users with different tasks or queries. For example, a visualization of a ship moving from one part of the world to the other can help to answer queries about the route traveled. Anscombe’s Quartet depicts the same data in different ways, and shows that the type of task of the user is important for choosing the appropriate visualization. A table view gives a more focused and local view: it provides users with an ordered list of information and allows them to find specific, precise numeric information. A representation like the scatterplot in Figure 1, on the other hand, has so-called meaning derivation properties (Shimojima, 2004). The scatterplot has the capacity to express semantic contents that are not defined in the basic semantic properties of each data point, but are derivable from them. The scatterplot makes the variability of its data points clear at a glance, merely by showing the arrangement of the dots in the two-dimensional plane. It gives the viewer a direct overview of the data, and shows information (e.g. patterns) that would not be directly visible in the table, which depicts numbers in rows and columns. Which visualization of the data is best depends on the type of task to be accomplished and on the communicative purpose of the visualization.

An interesting combination of data types is formed by spatio-temporal data, i.e. data that relate to both space and time. A simple example of spatio-temporal data is the movement of a person to different locations over time. Visualizing spatio-temporal data can be done in numerous ways; each with different advantages and disadvantages. Two possible informationally-equivalent visualizations that support the exploration of spatio-temporal data are a map-based representation (that will be called map from now on) and a timeline-based one, the so-called Gantt chart. These two visualizations are very different in the way they depict spatio-temporal data, as will be shown in subsection 2.3.2 of Chapter 2. Multiple data exploration tasks can be performed with the map and the Gantt chart. We expect that some tasks will be better performed with the map than with the Gantt chart, and the other way around. In this thesis, we attempt to discover which visualization is best suited for which data exploration tasks.

1.2 Problem statement

Spatio-temporal data are complex data. They involve data collected across both space and time. These data are always linked to some object, entity, or agent. Due to this complexity, the data can be used for many different queries supporting different data exploration tasks (which will be elaborated in subsection 2.3.1). Many fields use spatio-temporal data in one way or another. Think, for example, of school or project schedules, or of delivery services that move packages between locations over time. A simple example for better understanding spatio-temporal data is the movement of a newspaper from one location to another over time. The newspaper is delivered by the deliverer early in the morning. At some point, it is picked up by its receiver, who puts it on a table and picks it up again for reading in the garden in the afternoon. After being read, the newspaper ends up in the wastepaper bin in the evening. This example shows the change of location of one object, a newspaper, over time; we are dealing here with spatio-temporal data.
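The newspaper example can be written down as a small data structure in which every record links an object to a location and a time interval. The sketch below is only an illustration: the class `Stay`, the two query helpers, the location label "front door" and the concrete hours are assumptions that do not come from the experiment.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stay:
    """One spatio-temporal record: an object at a location during an interval."""
    obj: str
    location: str
    start: int  # hour of the day, inclusive (illustrative values)
    end: int    # hour of the day, exclusive


# Hypothetical timing of the newspaper example; the hours are assumptions.
newspaper = [
    Stay("newspaper", "front door", 6, 8),
    Stay("newspaper", "table", 8, 14),
    Stay("newspaper", "garden", 14, 19),
    Stay("newspaper", "wastepaper bin", 19, 24),
]


def location_at(stays: List[Stay], obj: str, hour: int) -> Optional[str]:
    """Answer a 'where' question: where is the object at the given hour?"""
    for stay in stays:
        if stay.obj == obj and stay.start <= hour < stay.end:
            return stay.location
    return None


def intervals_at(stays: List[Stay], obj: str, location: str) -> List[Tuple[int, int]]:
    """Answer a 'when' question: during which intervals is the object at the location?"""
    return [(s.start, s.end) for s in stays if s.obj == obj and s.location == location]


print(location_at(newspaper, "newspaper", 15))
print(intervals_at(newspaper, "newspaper", "table"))
```

The same records can answer questions that start from time ("where is it at 15:00?") and questions that start from space ("when is it on the table?"), which is what makes this type of data usable for many different exploration tasks.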

While a lot of research has been done on spatio-temporal data, it is still not clear which types of visualizations for spatio-temporal data support users best in data exploration tasks. The study reported on in this thesis intends to contribute to this body of research by investigating two specific visualizations of spatio-temporal data: the map and the Gantt chart. This research also has practical relevance. Because spatio-temporal data form a complex concept, research on visualizing such data could help designers of visualization tools to recommend which visualization would be the best choice for users who need to complete certain tasks as effectively and efficiently as possible with this complex type of data.

1.3 Research question and subquestions

The main research question of this research is: What is the best performing visualization for spatio-temporal data exploration tasks: the map or the Gantt chart? To find an answer to this question, the following subquestions guide this research:

1) Which one of the two visualizations (map or Gantt chart) is most effective, i.e. induces the fewest errors?

2) Which one of the two visualizations (map or Gantt chart) is most efficient, i.e. allows tasks to be performed in the least time?

3) Does the amount of spatio-temporal data (i.e. its complexity) influence the effectiveness and efficiency of the visualizations?

4) Which data exploration tasks (formulated as query statements) are best supported by either the map or the Gantt chart?

1.4 Reading guide

Chapter 2 provides a theoretical background for this research. In 2.1, relevant information on information visualization, visual communication and the taxonomy of visualizations is presented as an introduction to the subject of this thesis. Next, visualization of time, space and objects will be introduced separately in 2.2 and in combination with each other in 2.3. Based on the strengths and weaknesses of the different visualizations of spatio-temporal data studied in this experiment, hypotheses will be formulated in 2.4.

An empirical experiment has been designed. The design process is explained in chapter 3. This chapter will elaborate on the methodology used to experimentally test the hypotheses and which choices have been made along the way.

The results of the empirical experiment are provided in chapter 4. Sections 4.1, 4.2 and 4.3 compare the two visualizations. Section 4.1 gives a broad and global perspective on the performance measures (response accuracy and efficiency) of both visualizations. Sections 4.2 and 4.3 take a narrower perspective by analyzing specific subsets of the data. The results given in 4.2 focus on comparing the performance measures per data exploration task, while 4.3 takes an even narrower perspective by comparing the performance measures per data exploration task and visualization complexity. Section 4.4 shows the influence of complexity on the performance measures within the same visualization type. Finally, the results for overall performance are summarized in 4.5. Answers to the research question (and subquestions) are given in Chapter 5. The main outcomes are discussed in the context of related studies, and the remarkable findings are discussed in this chapter as well. Chapter 5 also points out the limitations of various choices made in the experimental setup for the comparison of the map and the Gantt chart, and their potential effects on the outcomes of the experiment. Chapter 5 ends with suggestions for future research.

2. Theoretical background

This thesis attempts to find out which one of two possible static visualizations of spatio-temporal data is best suited for helping users to perform data exploration tasks effectively and efficiently: the map or the Gantt chart. The data involved in these exploration tasks are spatio-temporal data. The map and the Gantt chart both make it possible to communicate visually about this complex combination of data. Any visual communication involves visual representations that correspond to entities, and links between them, in some application domain. We will start this chapter by introducing visual representations in general, and some of the taxonomies proposed in the literature. Next, we will show how a specific set of application domain data is visualized. This set includes space, time, and objects. The visualization of each of these will be discussed individually, and in combination with each other. Finally, we will point out the strengths and weaknesses of two possible visual representations for spatio-temporal data, which are commonly used in practice: a space-oriented visualization, which we will call a map, and a time-space-oriented one, which is the Gantt chart. This comparison is guided by several studies discussing spatio-temporal data in general, and by studies of map-based visualizations and Gantt charts in particular.

2.1 Visual Representations

2.1.1 Visual communication: application and graphical domain

Wang (1995) introduces visual communication as the communication about something with the aid of pictures. She points out that visual communication involves three important components: the graphical domain, the application domain and the link between these two domains, as shown in Figure 2.

Figure 2: The visual communication model of Wang (1995) depicting the three parts of visual communication

The graphical domain contains the graphical objects making up the picture. The application domain holds the actual problem to be visualized, i.e. the data or information to be conveyed to the viewer of the picture. The link between these two domains is an essential component as well, as a picture without a link to its application domain is only able to depict spatial concepts. The link between the graphical and application domain is the mapping of the data in the application domain to graphical objects in the graphical domain. There are certain conditions that have to be met so that a viewer of the picture is able to successfully establish the link between these two domains. First, the picture should have a natural link with its application domain, which means that people are able to associate the picture with the information in the application domain in a way that feels natural to them. Second, the link should not be misleading. A misleading link could lead to misinterpretations due to a graphical representation which gives an incorrect depiction of the information in the application domain. Without a correct mapping, the visual representation in the graphical domain is not representative of the actual data in the application domain and could lead to miscommunication (Wang, 1995).

Card, Mackinlay and Shneiderman (2007) also highlight the importance of the link or mapping between a visualization and the data it represents. They focus on computer-supported visualizations and define these as mappings from data to visual form that can be adjusted by a human perceiver via interaction with the interface that presents some view of the data. They propose a reference model that gives a clear overview of the steps needed for mapping data to adjustable visual representations. This model is shown in Figure 3.

Figure 3: Card’s Reference Model of Visual Communication (Card et al., 2007)

The transformation from raw data and data tables to visual representations and views, and the importance of visual mappings, show overlap with the simpler model of Wang (1995). The raw data and data tables can be considered to belong to the application domain, and the visual representations and views are part of the graphical domain.
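The chain from raw data via data tables to visual representations and views can be sketched as a small pipeline. The code below is a simplified illustration under stated assumptions, not an implementation of the reference model itself: the function names, the choice of a bar as visual mark and the text rendering are hypothetical, and the three rows of input are taken from Table 2.

```python
# Raw data (application domain): plain text, one record per line.
raw_data = "Scotland,18,-1\nIreland,16,-2\nEngland,23,0"


def to_data_table(raw_text):
    """Data transformation: parse raw text into a table of typed rows."""
    table = []
    for line in raw_text.splitlines():
        country, temperature, deviation = line.split(",")
        table.append({
            "country": country,
            "temp_c": int(temperature),
            "deviation_c": int(deviation),
        })
    return table


def to_visual_structure(table):
    """Visual mapping: one bar per row, bar length encodes the temperature."""
    return [{"label": row["country"], "length": row["temp_c"]} for row in table]


def render_view(marks, max_width=20):
    """View transformation: scale the bars to the available width (graphical domain)."""
    longest = max(mark["length"] for mark in marks)
    lines = []
    for mark in marks:
        bar = "#" * round(mark["length"] / longest * max_width)
        lines.append(f"{mark['label']:<10}{bar}")
    return "\n".join(lines)


table = to_data_table(raw_data)
marks = to_visual_structure(table)
view = render_view(marks)
print(view)
```

Only the middle step decides how data values become graphical properties, which is the link between application domain and graphical domain that both models emphasize.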

2.1.2 Taxonomies of visual representations

Visual representations are classified according to different criteria. Hegarty (2004) makes a first distinction between external and internal visualizations. External visualizations can be further subdivided into iconic visualizations, schematic ones, or ones primarily based on numbers, such as bars and graphs (Novick, 2006, citing Hegarty, Carpenter, and Just, 1991). Card, Mackinlay and Shneiderman (2007), focusing on computer-supported visualizations, make a distinction between scientific and information visualizations. With respect to spatio-temporal data, it is also important to point out that visualizations can be static or dynamic. Computer-supported visualizations allow for interaction and are typically dynamic.

For this thesis, the difference between scientific and information visualizations is of particular interest. Scientific visualizations use the representation of the physical environment (for instance the globe) as a plane on which more abstract data (for instance temperatures) are projected. In a scientific visualization, “what is seen by the user primarily relates to a physical ‘thing’” (Spence, 2007, 12). Scientific visualizations share features with the “real world”. An example of a scientific visualization is a map that maps locations to a representation of the real world. Figure 4 is an example of a scientific visualization. In Figure 4, weather information (temperature deviations from normal mean temperatures at a specific time in different countries) is laid over a map that relates to the physical world. Color is used to show differences in temperature deviation between countries. The colors are chosen based on the physical perception (i.e. sensation) of temperatures, that is, according to the so-called hue-heat hypothesis (Ziat, Balcer, Shirtz & Rolison, 2016). The picture tells us that in the central European countries, temperatures deviate considerably from what is normally expected on a specific date.

Figure 4: Scientific visualization of weather information

While scientific visualizations focus on physical things in order to represent more abstract data, information visualizations are primarily concerned with the abstractness of data. An information visualization is based on abstract concepts, such as prices, family relations and countries, which do have an association with physical things such as pound coins, human beings, and (part of) the globe, but do not require these physical things to be portrayed. An information visualization does not include any connection to physical, real-world representations to make the data understandable (Card et al., 2007; Spence, 2007). Examples of information visualizations are tables, line graphs and bar charts. The information in the weather map in Figure 4 could also be shown in a simple table, as in Table 2. To keep a clear overview, Table 2 represents a subset of the data depicted in the scientific visualization given in Figure 4.

Table 2: Information visualization of weather information

| Country | Temperature (°C) | Hotter/colder relative to normal (°C) |
|---|---|---|
| Scotland | 18 | -1 |
| Ireland | 16 | -2 |
| England | 23 | 0 |
| France | 22 | -3 |
| Spain | 31 | -2 |
| Portugal | 32 | -2 |

When we compare Figure 4 and Table 2, it is clear that both visualizations have advantages and disadvantages. While the scientific visualization can gradually depict the different temperatures in small zones on the map, the information visualization in Table 2 restricts this information to one big zone: a single country. A finer-grained choice could, however, be made as well, as shown in Table 3. In contrast to the map-based view, the table view requires the selection of a zone.

Table 3: Finer-grained information visualization

| City (country) | Temperature (°C) | Hotter/colder relative to normal (°C) |
|---|---|---|
| Edinburgh (Scotland) | 17 | -1 |
| Glasgow (Scotland) | 16 | -2 |

An information visualization is more abstract and less iconic than a scientific visualization, but less abstract than a textual representation of information in the form of descriptive text or mathematical formulae.

Another interesting classification of visualizations worth mentioning is the one introduced by Hegarty (2004), between internal and external visualizations. Visualizations do not have to be present in the real world, and be tangible. People can also visualize internally, which means that they create mental images of the real world or abstract information, and that they infer new information using these mental images. Hegarty (2004) suggests that external visualizations are often supported by internal visualization activities, implying that both external and internal visualizations complement each other.

2.2 Visualizing Time, Space and Objects

This thesis focuses on the visualization of spatio-temporal data. These data involve time, space and objects. These concepts differ from each other in fundamental ways and allow for mappings to different visual structures. To be interpretable for humans, these mappings should be as natural as possible.

2.2.1 Visualizing time

Time is a highly abstract and complex concept (Kriglstein, Pohl & Smuc, 2013). The characteristics of temporal data in the application domain determine their mapping to graphical objects in the graphical domain. The most important characteristic of time is its structure. The structure of time can be interpreted as a linearly ordered set of time points, where a time point is considered as a specific moment in time (see Figure 5A). Time can also be considered to be cyclic and recurring (see Figure 5B) (Aigner, Miksch, Müller, Schumann & Tominski, 2007). The recurrence of the four seasons can be seen as a cyclic phenomenon, while in the natural perception of time, time is perceived as a linear phenomenon: “time proceeds from the past to the future” (Aigner et al., 2007, 403). Time is experienced as a continuous unidirectional phenomenon, involving change of events and objects (Boroditsky, 2000). This thesis approaches time as a linearly ordered set of time points, as shown in Figure 5A.



Figure 5: The structure of time. A: linear; B: cyclic

Kriglstein et al. (2013) point out that time can be represented either by space or by time. In the latter case the representation is dynamic: time visualized as time can only be shown as an animated picture, in which the course of time may modify objects in some way. In static visualizations, time can only be represented by space, which is then used as a metaphor. The most obvious spatial metaphor for time is the timeline. The metaphorical use of space is a way to display complex abstract notions, such as time, in a simplified way (Blandford, Faisal, & Attfield, 2013).

For the natural visual representation of time as a timeline, two options are generally available: the time point on the timeline and the interval between two time points on the timeline (Aigner et al., 2007; Kriglstein et al., 2013). The intervals between time points can differ considerably, from seconds, days and months to years, or any other possible interval. A timeline contains time points in chronological order, making it a robust way to reason about time. While time points occupy little space, time intervals can require more space in the visualization. A common way to visualize time intervals is to use lines that show their duration (see Figure 6). Figure 6 shows that each interval has a start time point and an end time point, marked by small circles. Each line represents a different project, and the colors of the lines correspond to the month in which the projects started. In the representation of time by time, this would be shown by different animation speeds (Kriglstein et al., 2013). Time intervals can also be visualized as ordered time categories, e.g. morning, noon, afternoon and evening.


Figure 6 is a version of a so-called Gantt chart. The Gantt chart was developed in 1910 by Henry Laurence Gantt and was meant to illustrate project schedules. The Gantt chart makes use of columns and rows. The columns represent time intervals, which are ordered on a timeline represented by the x-axis. This x-axis can be at the top or at the bottom of the chart. The rows represent activities or project phases (Wilson, 2003) and can be split into broad projects and their subprojects, which helps to keep an overview of the tasks. Another example of a project-scheduling visualization that integrates the elements of a Gantt chart is given in Figure 7. This example was downloaded from the Microsoft Excel templates. Here the rows on the y-axis are divided into broad projects and subprojects, and color labels are used to distinguish the projects even more clearly. In this more advanced Gantt chart, the progress of each task is also displayed as a percentage, which is color-coded in the Gantt chart bars. Additionally, the start and end of each task are shown in two ways: as a date and as a bar. These additions show that a Gantt chart can be as simple as in Figure 6, but can also be extended with more features, as in Figure 7.

Figure 7: Example of a more complex Gantt chart for tracking project progress

2.2.2 Visualizing space

A visualization can be seen as “a representation of information in a visual-spatial medium” (Hegarty, 2004, 1). This definition implies that the mapping of any data is to space.

To visualize space, a ‘map’ of any physical object is the most obvious visualization to choose. The object can be a room, building, city, or the globe. A map can show space geographically and provides a natural mapping to the physical world (Kessel & Tversky, 2011; Blandford, Faisal, & Attfield, 2013). Maps are not only helpful to depict geographical information, but can also help to show more abstract data. By overlaying data such as weather information (see subsection 2.1.2) or heightmaps (see Figure 8), abstract data become more concrete due to their mapping to the physical world (Skupin, 2004).


Figure 8: Abstract height data concretized by overlaying the data over a map

A widely known example of a map visualization is Google Maps, a web-based service that provides users with open access to a visual representation of the real world (Google Maps, n.d.). In Google Maps, geographical locations are marked by name and/or color, which helps to distinguish areas and ground structures, such as roads, grass and buildings (see Figure 9).

Figure 9: Google Maps; a popular map visualization

2.2.3 Visualizing objects

The visualization of objects can differ considerably, because objects themselves vary widely. Objects can be animate, such as humans and animals, or inanimate, such as cars and hats. When objects are humans, they are often referred to as agents (Kessel, 2013), people, or population (Andrienko & Andrienko, 2006).


One way to reason about the representation of objects is by means of semiotics. Semiotics is the study of signs: how they are created and how they are interpreted. A sign can be a word, a sound or an image. According to research in semiotics, signs derive their meaning from how they relate to the actual referent (Vickers, Faith, & Rossiter, 2013). Peirce (1902) differentiates between three types of relationship between the sign and the object it refers to: 1) an iconic relation, 2) a symbolic relation and 3) an indexical relation. An icon has a clear physical relation with the object. An iconic relation can be highly detailed, like a photograph, or schematic, like a simple drawing resembling the object. Figure 10 shows the same object as two icons: Figure 10A is a highly detailed picture, while Figure 10B is a simple drawing that still keeps a physical relation with the real object.


Figure 10: Both images share an iconic relation with the real physical object; a car

A symbol, on the other hand, does not have any physical relation with the object it refers to. Characteristic of symbols is that the relation between the symbol and the object has to be learned. The most recognizable example is the alphabet: there is no clear relation between the textual symbols used and the sounds or meanings these symbols represent. Due to conventions and the common use of symbols, people automatically connect the symbol to its meaning (Port, 2000). A name is a symbol for a person. Likewise, nothing about the cross symbol in the upper right corner of Windows computer interfaces relates to its meaning, “closing an open interface window”, but the consistent use of this symbol makes it a convention.

Lastly, an index “points” to the object it refers to. A graphical representation of a dark cloud with drops of water points to rain, and smoke can index fire. Unlike icons, indices do not need to have a physical relation with the object. An index can also be the representation of a person by his or her written name.

In this context it should be pointed out that it is not always clear what the graphical objects in a visual representation stand for. Consider for instance the blue dot in Figure 11. It may correspond to any object at a certain location on a map of part of the city of Groningen. As a consequence, it is not clear whether the blue dot is a symbol or an index. It is not an icon, as an icon would reveal the actual object referred to.


Figure 11: The blue dot is an index or symbol of the object it represents

While objects are often depicted as static elements, objects could be moving in space and time. For example, when an object represents a person, this person could have dinner at one place and go to the cinema at a later moment. When visualizing the movement of objects through space and time, we refer to this combination of data types as spatio-temporal data.

2.3 Spatio-temporal data

Spatio-temporal data are data that relate both to space and time. These data describe phenomena or objects in a certain location at a certain time. Consider for example the shipping movements across a geographic area over time, the daily prices of a stock on a stock market, the movement of a delivery van on its daily schedule, or the changes of clouds over time.

Andrienko, Andrienko, and Gatalsky (2005) propose a classification of spatio-temporal data that is broadly applicable. This classification is based on the kind of changes that phenomena or objects undergo over time. Andrienko et al. (2005) distinguish between:

(i) existential changes, that is appearance and disappearance;

(ii) changes of spatial properties: shape, size, orientation, altitude, height, gradient and volume;

(iii) changes of thematic properties expressed through values of attributes: qualitative changes and changes of ordinal or numeric characteristics.

Spatial objects undergoing existential changes are events, which may be momentary (e.g. earthquakes) or durable (e.g. wars).

In this thesis time and space are situational attributes that are linked to an object. Kessell and Tversky (2010) prefer to talk about agents rather than objects. An agent can move from one location to another across time, changing his or her spatial property of orientation (see (ii) above). As one agent cannot be at different locations at one single time, but can stay at one location at different times, time changes may but need not be linked to space changes.

Time values may be described in ways that are more or less fine-grained, i.e. they can be described at different levels of abstraction depending on the accuracy required or the knowledge available. Time information can be modelled with respect to differently grained temporal domains. One can use different time units, e.g. minutes, hours, days and years, to represent time quantities in a unique flat temporal model, but one can also express time from a layered view, switching from one temporal domain to a coarser or finer grained one. This ability to provide and relate temporal representations at different “grain levels” of the same temporal reality is not exploited in this research. A time moment or interval can be described numerically (at 14:00; from 2 to 4 p.m.; in 2019), but also by reference to temporal categories (in the evening, in summer). Space values can be described in a similar way, depending on the granularity chosen: as a precise location, a town, a district or a country. One can also use numbers referring to the latitude and longitude coordinates of a particular location. Locations are usually expressed in a nominal/categorical way (in Amsterdam, at my home, in the university library).

2.3.1 Question types and search levels

Bertin (1983) introduces a typology of potential information needs, questions, or tasks, based on components present in data and so-called reading levels. The different components present in data allow for asking different kinds of questions, depending on the component in focus. Bertin states that “there are as many types of questions as components in the information” (Bertin, 1983, p. 10). Each question focuses on a different unknown entity; hence a question type is defined by the unknown entity. Andrienko et al. (2005) name this unknown entity the search target. Bertin’s typology also includes a reading level: does the question refer to a single data element, to a group of elements, or to the whole phenomenon characterized by all elements together? The scope of the question determines its reading level. While Bertin (1983) introduces his typology for arbitrary data, Peuquet (1994) (cited by Andrienko et al., 2005) specifically considers spatio-temporal data. She distinguishes three search targets for spatio-temporal data:

(i) space – where

(ii) time – when

(iii) objects – what

Based on these three possible search targets, Peuquet (1994) (cited by Andrienko et al., 2005, 204) distinguishes the following three ways in which one can query spatio-temporal data:

• “when + where → what: Describe the object or set of objects that are present at a given location or set of locations at a given time or set of times.

• when + what → where: Describe the location or set of locations occupied by a given object or set of objects at a given time or set of times.

• where + what → when: Describe the time or set of times that a given object or set of objects occupied a given location or set of locations.”

For example, in the question type described as when + where → what, the object or set of objects constitute the unknown entity, the search target.
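The three query types can be illustrated with a minimal Python sketch. It assumes that spatio-temporal data are stored as simple records of the form (object, location, start hour, end hour); the `Stay` record, the function names and the toy data are hypothetical and are not part of Peuquet's or Andrienko et al.'s work, nor of the experiment in this thesis.

```python
from typing import NamedTuple


class Stay(NamedTuple):
    """One record: an object occupies a location during the hours [start, end)."""
    obj: str
    location: str
    start: int
    end: int


# Hypothetical toy records, not data from the experiment.
STAYS = [
    Stay("Ann", "library", 9, 11),
    Stay("Ben", "library", 10, 12),
    Stay("Ann", "cinema", 11, 13),
]


def overlaps(stay: Stay, start: int, end: int) -> bool:
    """True if the stay shares at least part of the interval [start, end)."""
    return stay.start < end and start < stay.end


def what(stays, location, start, end):
    """when + where -> what: objects present at a location in an interval."""
    return {s.obj for s in stays if s.location == location and overlaps(s, start, end)}


def where(stays, obj, start, end):
    """when + what -> where: locations occupied by an object in an interval."""
    return {s.location for s in stays if s.obj == obj and overlaps(s, start, end)}


def when(stays, obj, location):
    """where + what -> when: intervals during which an object was at a location."""
    return sorted((s.start, s.end) for s in stays if s.obj == obj and s.location == location)
```

In each function the two given entities are the parameters and the search target is the return value, which mirrors the idea that a question type is defined by its unknown entity.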

Bertin (1983) gives a further categorization of the question types. He states that there are different reading levels, which define the set of data a question applies to. The reading levels can be categorized as elementary, intermediate and overall. A question at the elementary reading level concerns only individual elements of the entities: one location, one time point and one object. An intermediate reading-level question, on the other hand, concerns a subset of the data, for example two objects. A question at the overall reading level considers the whole dataset, for example all locations.

Another division that can be made when reasoning about question types is based on the cognitive operations required to perform data exploration tasks. Andrienko et al. (2005) categorize the cognitive operations as either identifying or comparing. They state that the cognitive operation of comparing means “establishing relationship of various kinds, including temporal relations” (p. 206). This means that one could “compare” the order in which entities appear (sequence) or whether entities occur together (synchronization).

Based on terminologies and classifications just explained, Table 4 provides a complete overview of question types, reading levels and cognitive operations.


Table 4: Question types, reading levels, cognitive operations and their focus

| Question type | Reading level | Cognitive operation | Focus |
|---|---|---|---|
| when + what → where | Elementary | Identify | Location |
| where + what → when | Elementary | Identify | Time |
| when + where → what | Elementary | Identify | Object |
| when + what → where | Intermediate | Identify | More than one location |
| | Intermediate | Identify | More than one time |
| | Intermediate | Identify | More than one object |
| when + what → where | Overall | Compare | All locations |
| | Overall | Compare | All times |
| | Overall | Compare | All objects |

To experimentally test which visualizations best support which data exploration tasks, Kessell and Tversky (2010) use query statements that are either true or false. These query statements contain the entities of spatio-temporal data and describe a scenario. By changing the sentence order of a query statement or by generalizing an entity (e.g. today instead of a precise timeframe), the focus is put on different entities. The question types in Table 4 can be converted to query statements. A query statement can refer to the object entity specifically, for example by giving the object’s name. It is also possible to extract a specific feature that characterizes the object or set of objects and use, for example, the cardinality of a set of objects in a query statement.

2.3.2 Visualizing spatio-temporal data

Two commonly used options for the visualization of spatio-temporal data are a map-based representation and a timeline-based one, such as the Gantt chart. The way these visualizations map the data from the application domain to the graphical domain is substantially different.

The map-based visualization (map for short) includes the most natural visualization of space, while the Gantt chart incorporates the most natural visualization of time. The map places locations on a depiction of the real world. The timeline used in the Gantt chart represents temporal data in a linearly ordered way that feels natural to its viewers (Kriglstein, Haider, Wallner, & Pohl, 2016; Andrienko et al., 2006). A consequence of this focus is that time in the map-based representation, and space in the Gantt chart, still need to be represented visually in a way that is understandable to viewers. In both visualizations, objects also need a suitable visual representation.

Though there are more visualizations possible for visualizing spatio-temporal data, this thesis will focus on the difference between these two specific visualizations, which emphasize different data by their visualization of either space or time.

2.3.2.1 Mapping spatio-temporal data to the map and the Gantt chart

This subsection shows how spatio-temporal data can be represented in a map-based or Gantt chart visualization. Figure 12 illustrates the options that are used in the visualizations created for the experiment conducted in this study. These options are inspired by Kriglstein et al. (2016). The spatio-temporal data visualized in Figure 12 consist of three locations (A, B and C), three time intervals (5:00-6:00, 6:00-7:00 and 7:00-8:00) and two objects (object 1 encoded as dark blue, and object 2 encoded as light blue). The scenario depicted is the following:

(i) Object 1 and object 2 are both at location A from 5:00 to 6:00;

(ii) Object 1 is at location B from 6:00 to 8:00;

(iii) Object 2 is at location C from 6:00 to 8:00.


Figure 12: Examples of the map (A) and Gantt chart (B) for visualizing spatio-temporal data (Kriglstein et al., 2016)

Objects

Objects are most naturally depicted at the location where they are present during a certain time interval. Objects themselves can be represented in multiple ways, for instance as iconic pictures resembling the objects referred to, as simple dots, or as names. For both the map and the Gantt chart, the most suitable way to visualize objects is to give them a color code, together with a legend listing the names of the objects involved, as shown in Figures 12A and 12B and their legend. It would be possible to place pictures or symbols at the location of the objects, but this may easily lead to clutter. In the map, objects are shown as colored segments in the location circles, making the circles look like pie charts. In the Gantt chart, objects are represented as colored bars in cells corresponding to locations. Both choices are artificial rather than natural: they require a legend and block direct identification of objects.

Space

As shown in Figure 12A, the different locations are positioned on a map and marked with circles and letters. Each letter refers to the location’s name. In the Gantt chart, locations, indicated by the locations’ names (letters), are listed in rows in alphabetical order. This is an artificial choice, which disregards a natural mapping with the real world. There is no way to map the true geographical location visually in the Gantt chart.

Time

In the map-based representation, the circles corresponding to the geographical locations are converted to pie charts for the display of objects. Next to each colored segment representing an object the time spent at this location is indicated as a numerical time interval. This is not a natural way to represent time, as overlaps cannot be detected easily. In contrast, in the Gantt chart, these intervals are concatenated in a timeline at the top. The colored bars representing the objects at a specific location indicate the time spent at that location in a straightforward way.
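The difference between the two mappings can be made concrete with a small Python sketch that restructures the scenario of Figure 12 in two ways. This is only an illustration under stated assumptions: the record layout and the function names `gantt_grid` and `map_labels` are hypothetical, and the sketch produces data structures rather than actual drawings.

```python
from collections import defaultdict

# Scenario of Figure 12 as (object, location, start hour, end hour) records.
SCENARIO = [
    ("object 1", "A", 5, 6),
    ("object 2", "A", 5, 6),
    ("object 1", "B", 6, 8),
    ("object 2", "C", 6, 8),
]


def gantt_grid(records, first_hour, last_hour):
    """Gantt-style structure: one row per location (alphabetical order),
    one column per one-hour slot; each cell lists the objects present."""
    hours = range(first_hour, last_hour)
    locations = sorted({loc for _, loc, _, _ in records})
    grid = {loc: {hour: [] for hour in hours} for loc in locations}
    for obj, loc, start, end in records:
        for hour in range(start, end):
            grid[loc][hour].append(obj)
    for row in grid.values():
        for cell in row.values():
            cell.sort()
    return grid


def map_labels(records):
    """Map-style structure: per location, each object's textual interval
    labels, as they would be printed next to the pie-chart segments."""
    labels = defaultdict(lambda: defaultdict(list))
    for obj, loc, start, end in records:
        labels[loc][obj].append(f"{start}:00-{end}:00")
    return {loc: dict(objects) for loc, objects in labels.items()}
```

In the grid, co-presence in time can be read off a single column, whereas in the map-style structure the reader has to compare interval labels, which corresponds to the observation that overlaps cannot be detected easily in the map.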

Strengths and weaknesses

Table 5 gives an overview of the strengths and weaknesses of both the map and the Gantt chart for visualizing spatio-temporal data.

Table 5: Overview of the strengths and weaknesses of the map and Gantt chart

| Visualization type | Strengths | Weaknesses |
|---|---|---|
| Gantt | Natural mapping of time (using the timeline metaphor); easy identification of time intervals; easy identification of time intervals where objects reside at the same location | No natural mapping of space (lack of geographical information); difficult identification of locations; difficult identification of the locations where objects spent part of the same time |
| Map | Natural mapping of space (geographical mapping of locations); easy identification of locations; easy identification of objects residing at the same location in some time interval | No natural way to depict time; due to the lack of a timeline, there is no possibility to see the linearity of time intervals; difficult identification of objects residing at the same time at some location |


2.4 Hypotheses

Kriglstein et al. (2016) suggest that the Gantt chart performs better than the map for spatio-temporal data exploration tasks in general. So far, no research has investigated whether this holds for different question types, such as the ones introduced in Table 4 (see subsection 2.3.1). This study intends to gain more insight into the relation between two visual representations of spatio-temporal data, the map and the Gantt chart, and the specific types of data exploration tasks individuals are asked to solve with the aid of these visualizations. Differences in usability are measured. More specifically, effectiveness (can the task be solved?) and efficiency (how long does it take to solve the task?) are measured for each question type and each visual representation involved in the experiment. We formulate the following very general hypotheses:

General hypotheses

H1-0: The map is as effective as the Gantt chart for solving spatio-temporal data exploration tasks

H1-A: The map differs in effectiveness from the Gantt chart for solving spatio-temporal data exploration tasks

Simple notation: Effectiveness of Map ≠ Effectiveness of Gantt chart

H2-0: The map is as efficient as the Gantt chart for solving spatio-temporal data exploration tasks

H2-A: The map differs in efficiency from the Gantt chart for solving spatio-temporal data exploration tasks

Simple notation: Efficiency of Map ≠ Efficiency of Gantt chart

Based on how naturally space and time are linked to graphical objects in the map and the Gantt chart, we predict that the map supports the viewer better than the Gantt chart in data exploration tasks where the focus is on location (where). For exploration tasks where the focus is on time (when), however, we predict that the viewer is helped more by the Gantt chart than by the map.


3. Methodology

3.1 Design

In order to answer the research question and its associated subquestions, an empirical experiment was set up. A within-subject design (see Figure 13) was chosen to minimize random noise (Budiu, 2018; Lazar, Feng, & Hochheiser, 2015), i.e. all participants were exposed to all levels of the independent variables (see subsection 3.1.1). A digital questionnaire was created to generate data for quantitative analysis. The questionnaire focused on the usability, i.e. the effectiveness and efficiency, of both visualizations. It included thirty-two query statements that were either true or false. The participant had to determine, with the help of the given visualization, whether each query statement was true or false. The experiment took place on a one-to-one basis in a controlled setting (see subsection 3.4.1).

Figure 13: Within-group design of the study

3.1.1 Dependent and independent variables

This research contains three manipulated independent variables: the visualization type (Gantt/Map), the complexity (High Complex/Low Complex) and the data exploration task. Subsection 3.3.2 elaborates on this.

3.1.2 Operationalization dependent variables

The dependent variables effectiveness and efficiency are used to measure the usability of the visualizations. Effectiveness is measured by the number of correct answers: the more correct answers, the more effective the visualization. Effectiveness will be referred to as response accuracy. Efficiency is measured by the speed with which each task is completed: the faster a task is completed, the more efficient the visualization. The speed is measured using a page timer (see subsection 3.3.3.2) and will be referred to as completion time.
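As an illustration of how these two measures could be computed for one participant and one visualization, consider the following minimal Python sketch. The function names and the example values are hypothetical; the actual analysis in this study may have been carried out differently, and reporting the proportion next to the count is an assumption made here for convenience.

```python
from statistics import mean


def response_accuracy(answers, answer_key):
    """Return (number of correct answers, proportion correct) for a list of
    true/false answers compared against the answer key."""
    if not answer_key or len(answers) != len(answer_key):
        raise ValueError("answers and answer_key must be non-empty and equally long")
    correct = sum(given == expected for given, expected in zip(answers, answer_key))
    return correct, correct / len(answer_key)


def mean_completion_time(page_times_seconds):
    """Average time per query statement, based on the page-timer values."""
    return mean(page_times_seconds)


# Hypothetical example: four query statements answered by one participant.
count, proportion = response_accuracy(
    [True, False, True, True],   # answers given
    [True, True, True, False],   # answer key
)
average_time = mean_completion_time([10.0, 20.0, 30.0, 20.0])
```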

3.2 Participants

In total, forty individuals (N = 40), aged between 16 and 78, participated in the experiment. The participants were recruited through convenience sampling. Table 6 shows the participant characteristics.


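Table 6: Participant characteristics

| Characteristic | Category | N = 40 | % |
|---|---|---|---|
| Gender | Male | 18 | 45% |
| | Female | 22 | 55% |
| Age | <26 | 19 | 47.5% |
| | 26-65 | 19 | 47.5% |
| | 65+ | 2 | 5% |
| Level of education | None | 0 | - |
| | High school | 2 | 5% |
| | MBO | 9 | 22.5% |
| | HBO bachelor | 17 | 42.5% |
| | WO bachelor | 2 | 5% |
| | Master | 10 | 25% |
| | PhD | 0 | - |
| Familiarity with Groningen | Very unfamiliar | 1 | 2.5% |
| | Unfamiliar | 2 | 5% |
| | Neutral | 6 | 15% |
| | Familiar | 16 | 40% |
| | Very familiar | 15 | 37.5% |
| Current situation | Student | 16 | 40% |
| | Full-time job | 8 | 20% |
| | Part-time job | 12 | 30% |
| | Entrepreneur | 1 | 2.5% |
| | Looking for a job | 0 | - |
| | Housewife/man | 3 | 7.5% |
| Colour blindness | Yes | 0 | 0% |
| | No | 40 | 100% |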

To eliminate language as a confounding variable, all participants were required to be native Dutch speakers with at least normal reading proficiency. There were no other requirements for participation. The test for color blindness (see subsection 3.3.3.3) showed that none of the participants had any form of color blindness.


3.3 Materials

3.3.1 Visualizations

Two popular visual representations of spatio-temporal data were used to test the hypotheses, namely a map and a Gantt chart (Kriglstein, Haider, Wallner, & Pohl, 2016). As seen in subsection 2.3.2.1, these two visualizations use different methods to depict the location and time characteristics of spatio-temporal data. This subsection explains which elements of the entities are present in the application domain, how the application domain was linked to the graphical domain of the map and Gantt chart visualizations, and what kinds of variations were added.

3.3.1.1 Visualization data

Data in the application domain

The application domain of the created data consists of all three entities of spatio-temporal data: location, time and objects. The data created for the visualizations are based on plausible scenarios. The data consist of objects (persons) changing location over time and were entered randomly in Gantt charts. In this process, care was taken that no object was present at multiple locations at the same time.

The data were created in the following order:

- An empty Gantt table was created with all locations added on the y-axis.

- The colour corresponding to an object (see ‘Designing the visualizations’ at the end of this subsection) was entered in the first column (9 AM) of the row of a random location in the Gantt chart. This part of the process was done in alphabetical order to ensure that no object would be missing.

- The timeframe during which an object was present at a specific location was coloured with the corresponding colour.

- Depending on the variant, a change of location was added a single time (Low Complex) or multiple times (High Complex). This was repeated until the object was present from 9:00 to 21:00.
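The steps above can be approximated in code. The following Python sketch is not the procedure actually used (the data were entered by hand in Gantt tables); it merely illustrates one way to generate a random schedule per person that covers 9:00 to 21:00 with a bounded number of location changes. The function names, the `max_changes` parameter and the rule that a change always leads to a different location are assumptions. Because every person is assigned exactly one location per hour, no person can be at two locations at the same time.

```python
import random


def random_schedule(rng, locations, first_hour=9, last_hour=21, max_changes=2):
    """Return [(location, start, end), ...] covering first_hour to last_hour.
    The person changes location between 1 and max_changes times (assumption)."""
    n_changes = rng.randint(1, max_changes)
    # Pick distinct whole hours at which the person moves to another location.
    change_hours = sorted(rng.sample(range(first_hour + 1, last_hour), n_changes))
    bounds = [first_hour, *change_hours, last_hour]
    schedule = []
    previous = None
    for start, end in zip(bounds, bounds[1:]):
        # Assumption: a change of location always leads to a different location.
        location = rng.choice([loc for loc in locations if loc != previous])
        schedule.append((location, start, end))
        previous = location
    return schedule


def random_dataset(seed, persons, locations, max_changes):
    """Build one schedule per person, processing persons in alphabetical order."""
    rng = random.Random(seed)
    return {
        person: random_schedule(rng, locations, max_changes=max_changes)
        for person in sorted(persons)
    }


# Hypothetical usage with a few of the study's location and person names.
dataset = random_dataset(
    seed=1,
    persons=["Dewi", "Arie", "Bert"],
    locations=["Grote Markt", "Martinitoren", "Museum", "Station"],
    max_changes=6,
)
```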

An extra layer of complexity was added to one of the Low Complex and one of the High Complex variants by removing all people from one time column. This means that none of the persons was present at any of the given locations during that hour. This was done to prevent the Low Complex variants from being too easy. The same adjustment was made in the High Complex variants to keep the variants balanced.

Elements of the entities in the visualizations

Each of the three entities is assigned twelve elements to reduce the chance of cognitive overload (Garcia-Solorzano, Cobo, Santamaria, Morán, & Melenchon, 2011). Table 7 shows the three entities and their corresponding elements.


Locations

In total, twelve different locations were selected. All locations are in the city center of Groningen and were selected for their recognizability and their absolute location. Only relatively popular locations were selected, to see whether familiarity with the city center influences performance with one or the other visualization. The popularity of the locations is based partly on Tripadvisor (popular locations such as the Martinitoren are included) and partly on common sense (locations commonly known to locals, such as the Grote Markt, are included).

Another important factor in selecting the locations was that they should not be too close to one another. Too many locations on a small part of the map visualization could lead to clutter, making it harder to visualize the data in a 2D non-interactive space.

Time

The selected timeframe of twelve hours runs from 9:00 (9 A.M.) to 21:00 (9 P.M.). This timeframe approximates the time an average person would plausibly spend outside their house on a day.

Objects

In this research, persons were used as the object entity. Persons were selected because the scenario is easy to recognize: people move from location to location over time.

In total, twelve different persons are present in the visualized data. The names were selected for ease of use. Each name is short and easily distinguishable for native Dutch speakers, to eliminate name difficulty as a confounding variable.

Table 7: Independent variables in the visualizations

| Entities | Elements |
|---|---|
| Locations (location name) | Academiegebouw, A-Kerk, Concerthuis, Grote Markt, Harmoniegebouw, HEMA, Martinitoren, Museum, Pathé, Politiebureau, Station, Vismarkt |
| Time (timeframe of one hour) | 9-10, 10-11, 11-12, 12-13, 13-14, 14-15, 15-16, 16-17, 17-18, 18-19, 19-20, 20-21 |
| Objects (person name) | Arie, Bert, Dewi, Jim, Julia, Kim, Lieke, Lisa, Ron, Roos, Sam, Stefan |

Complexity difference in visualizations

To test whether data complexity has any effect on the performance of the visualizations, two levels of complexity were created in the data: Low Complexity and High Complexity. The Low Complex variant contains a maximum of twenty rows, and people can change locations at most twice. With this maximum of two location changes, the Low Complex variant has the least complexity possible. In the High Complex variant, the number of times a person can change locations is limited to six to limit cognitive overload. The limit of forty rows in the High Complex variant made it possible to show the full Gantt visualization on the computer screen without the need for scrolling (see subsection 3.3.3.2). The visualization requirements of these variations are shown in Table 8.

Table 8: Low complexity vs high complexity data variations

| Variant | Maximum rows | Maximum number of times a person can change locations |
|---|---|---|
| Low complexity | 20 | 2 |
| High complexity | 40 | 6 |

Designing the visualizations

Map

According to the Cambridge Dictionary, a conventional map visualization is ‘a drawing that gives you a particular type of information about a particular area’ (Map: meaning in the Cambridge English Dictionary, n.d.). For this research, the map visualization was extended to work with spatio-temporal data in the same way as in the research of Kriglstein et al. (2016), as seen in subsection 2.3.2.1. The locations are marked on the map with their names. Each location has its own pie chart that shows which persons are present and in what timeframe. If a person is present at a location, the person’s colour label and timeframe are added to that pie chart. If a person has visited the location multiple times, another timecode is added below the first timecode. If multiple persons have visited the location, the pie chart is split into different fragments.

The fragments are proportional to the time spent at a specific location by a person. If a person visits a location multiple times, the fragment shows the cumulative time spent at that location, as is the case for the person corresponding to the light green colour in Figure 14A. Figure 14A shows a simplified version of the map visualization used. As visible in this example, the person corresponding to the colour light blue is present at the Museum from 15:00 to 18:00 and does not change location within this timeframe.
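The proportional sizing of the pie-chart fragments can be sketched as follows. This Python fragment is only an illustration: the function name `pie_fractions` and the example visits are hypothetical, and the sketch assumes that a fragment's size is the person's cumulative time at the location divided by the total time all persons spent there.

```python
from collections import defaultdict


def pie_fractions(visits):
    """Compute the pie-chart share per person for ONE location.

    visits: list of (person, start hour, end hour) tuples for that location.
    Repeated visits by the same person are accumulated into one fragment.
    """
    hours_per_person = defaultdict(int)
    for person, start, end in visits:
        hours_per_person[person] += end - start
    total_hours = sum(hours_per_person.values())
    if total_hours == 0:
        return {}  # nobody visited this location
    return {person: hours / total_hours for person, hours in hours_per_person.items()}


# Hypothetical visits to one location: Kim comes twice, Sam once.
fractions = pie_fractions([("Kim", 9, 11), ("Sam", 11, 12), ("Kim", 15, 16)])
```

In this example Kim's two visits add up to three hours and Sam's visit to one hour, so Kim's fragment covers three quarters of the pie chart.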

The map visualizations were created with Adobe Illustrator CC. First, a high-resolution map of Groningen was obtained from Google Maps. To eliminate distracting elements, all labels and detailed features were removed with HTML. A screenshot of the city map was then taken and imported into Adobe Illustrator CC. To preserve a high resolution, the screenshot was vectorized by image tracing. The finalized maps were saved in an RGB colour space to eliminate variations in display colour between the Gantt and Map visualizations.

Gantt

The Gantt visualizations were designed with Microsoft Excel. The x-axis represents a linear timeline from 9:00 (9 A.M.) to 21:00 (9 P.M.) with intervals of one hour. The locations are listed in alphabetical order on the y-axis. The Gantt chart contains coloured bars that represent the time a person was present at a specific location. Figure 14B shows a simplified version of the Gantt chart used. In this example, the person with the colour red is at the Museum from 15:00 to 17:00 and then moves to the Pathé from 17:00 to 18:00.

Figure 14: Simplified map (A) and simplified Gantt chart (B)
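The Gantt encoding described above (hourly timeline from 9:00 to 21:00, locations in alphabetical order, one bar per stay) can be sketched as a mapping from stays to bar positions. This is an illustrative sketch only, since the charts in this study were made in Microsoft Excel; the function `gantt_bars` and the tuple layout are assumptions.

```python
AXIS_START, AXIS_END = 9, 21  # hours shown on the x-axis

def gantt_bars(stays):
    """Map (person, location, start_hour, end_hour) to bars per location row.

    Each bar is (person, offset from the axis start, width), both in hours.
    """
    rows = {}
    for person, location, start, end in stays:
        if not (AXIS_START <= start < end <= AXIS_END):
            raise ValueError(f"stay outside the {AXIS_START}:00-{AXIS_END}:00 axis")
        rows.setdefault(location, []).append((person, start - AXIS_START, end - start))
    # Locations are listed in alphabetical order on the y-axis.
    return dict(sorted(rows.items()))

# The example from the text: red is at the Museum 15:00-17:00, then at the Pathé 17:00-18:00.
bars = gantt_bars([("red", "Pathé", 17, 18), ("red", "Museum", 15, 17)])
print(bars)
```

In this representation the Museum bar starts six hours after the axis origin and spans two hours, and the Pathé bar follows directly after it.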

In both the Gantt and the Map visualization, a legend is present in the upper right corner. The legend contains the persons' names and their corresponding colours. The names are listed in alphabetical order, and the legend is the same for both visual representations. The colour scheme used in the visualizations is based on the Sashat colour palette (Trubetskoy, 2019). This palette contains easily distinguishable colours and was used to reduce the chance of participants mixing up the colours.

Two versions of each complexity variation were created to eliminate a learning effect. For the Gantt chart, this resulted in two High Complex versions (Gantt_HighComplexV1 and Gantt_HighComplexV2) and two Low Complex versions (Gantt_LowComplexV1 and Gantt_LowComplexV2), a total of four Gantt charts. Based on these four Gantt charts, four map visualizations were created with exactly the same data: two High Complex versions (Map_HighComplexV1 and Map_HighComplexV2) and two Low Complex versions (Map_LowComplexV1 and Map_LowComplexV2). This resulted in a total of eight visualizations. Table 9 gives an overview of the created visualizations. All finalized visualizations can be found in Appendix B.

Table 9: All visualization variations

Visualization | Low Complex | High Complex
Gantt | Gantt_LowComplexV1, Gantt_LowComplexV2 | Gantt_HighComplexV1, Gantt_HighComplexV2
Map | Map_LowComplexV1, Map_LowComplexV2 | Map_HighComplexV1, Map_HighComplexV2
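The eight visualizations in Table 9 follow from crossing two visualization types, two complexity levels and two versions. As a small illustration, the names can be generated as follows; the variable names are hypothetical, while the resulting identifiers are those listed in Table 9.

```python
from itertools import product

visualizations = ["Gantt", "Map"]
complexities = ["LowComplex", "HighComplex"]
versions = ["V1", "V2"]

# 2 visualization types x 2 complexity levels x 2 versions = 8 variants.
names = [
    f"{vis}_{cx}{ver}"
    for vis, cx, ver in product(visualizations, complexities, versions)
]
print(len(names))
print(names)
```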

3.3.2 Experimental query statements

The total experiment consisted of thirty-two data exploration tasks, formulated as query statements. With each visualization variant (e.g. Gantt_HighComplexV1), five different