Modelling the alignment gap between academic and practical information visualization design approaches


Universiteit van Amsterdam

Master Information Studies: Business Information Systems


Author:

Sammy Odenhoven

Student number: 6077269

Supervisor:

Marcel Worring

August 18th, 2014


Abstract

This thesis models the alignment gaps between academic approaches to information visualization design and the approaches actually taken in business practice. Information visualizations have great potential for businesses: they provide insight into relevant and aggregated business data, improve productivity and efficiency by speeding up analysis tasks, and create new insights into customer behaviour and market trends. The field of information visualization is a broad domain and consists of many aspects: technical aspects such as data formatting and visual mapping algorithms, social and cognitive aspects such as the user’s cognitive capabilities and the fit between a visual representation and the user’s perceptual system, and business-related aspects such as the role of visualizations in decision-making. This thesis starts by creating models based on academic literature. The domain of information visualization and all its aspects is divided into three parts: the pipeline (What needs to be designed?), the design methodology (How will it be designed?) and contextual factors (In which context will it be designed?). These models are evaluated by interviewees from business cases in order to determine to what extent the theoretical models represent practice. This feedback is used to create three models that represent information visualization design in practice. The academic models are compared against the practical models, after which several alignment gaps are identified. This thesis concludes with several recommendations on how these alignment gaps can be overcome and which areas are interesting for future research.

Table of contents

Abstract
Table of contents
1. Introduction
2. Domain description
    2.1. Information visualization pipeline
    2.2. Information visualization design methodology
    2.3. Contextual factors of information visualization
3. Methodology
    3.1. Research method
    3.2. Data collection
    3.3. Data sources
    3.4. Case studies
        3.4.1. Case 1: A large beer company
        3.4.2. Case 2: An international electronics company
        3.4.3. Case 3: A large soft drinks company
4. Results
    4.1. Results of case studies
    4.2. Case study models
        4.2.1. Information visualization pipeline
        4.2.2. Information visualization design methodology
        4.2.3. Contextual factors of information visualization
    4.3. Comparison of theory and practice
5. Overcoming the alignment gaps
6. Conclusions, limitations and future research
    6.1. Conclusions
    6.2. Limitations
    6.3. Future research
    6.4. In conclusion
References
Appendix
    Appendix A – Interview questionnaire
    Appendix B – Information visualization pipeline and design methodology based on case studies
    Appendix C – The information visualization domain in business cases with the detailed information visualization pipeline and design methodology models, and their related contextual factors.


1. Introduction

Information visualizations have great value and potential for businesses. They make it possible to summarize large amounts of data in order to gain insight and make decisions based on this data. Using information visualizations, business decision-makers can find relevant and aggregated information in the piles of business data available (Tegarden, 1999). Information visualizations also have the potential to improve productivity and efficiency by speeding up analysis tasks, which results in reduced costs. In addition, new insights into customer behaviour, market trends and operational efficiency become more readily accessible (Lurie & Mason, 2007). However, this business value and potential can only be achieved through effective design.

Despite the value of effective visualizations, there seems to be a lack of a formal and structured approach to designing visualizations in businesses, even though academic literature describes such approaches extensively. In practice, the choice of graphical representation, interaction technique, etc. is based on the previous experience of the designer or on the possibilities that a visualization tool has to offer; stepwise design and explicit exploration of the design space are lacking. This thesis addresses this problem by investigating the alignment gap between academic design approaches and the actual approaches used in businesses.

The academic field of information visualization is a broad domain and requires elements and techniques from several other domains. It is considered to be part of the human-computer interaction domain, where technical aspects on the computer side and social and cognitive aspects on the human side are relevant and need to be taken into account when designing visualizations. As Keim (Keim, 2002) describes: “The basic idea of visual data exploration is to present the data in some visual form, allowing the user to get insight from the data, draw conclusions and directly interact with the data”. This quote highlights some other aspects that are relevant for information visualizations: data formatting, visual representation, interaction, cognition, etc. Many research papers can be found that describe these aspects extensively. However, these papers each focus on specific aspects separately and do not look at the bigger picture. In order to give an impression of the breadth of information visualization as a research domain, an overview of a wide variety of papers follows. We will consider four topics, addressing a selection of the various aspects in information visualization design.

First of all, several researchers focus on classifications of visualizations based on various characterizing aspects. A famous paper by Shneiderman (Shneiderman, 1996) describes data types (1-, 2- and 3-dimensional data, temporal and multi-dimensional data, and tree and network data) and user tasks (overview, zoom, filter, details-on-demand, relate, history, and extract), and combines them in a task by data type taxonomy. Keim (Keim, 2002) provides a similar, but more extensive classification of information visualizations based on three criteria: data type (1-, 2- and multidimensional data, text and hypertext, hierarchies and graphs, and algorithms and software), visualization technique (2D/3D displays, geometrically transformed displays, icon-based displays, dense pixel displays and stacked displays) and interaction technique (interactive projection, interactive filtering, interactive zooming, interactive distortion, and interactive linking and brushing). Both classifications can be used to understand the rich set of visualizations in a detailed, low-level way.

Secondly, other research focuses on technical aspects of visualization design. An example of this is the paper by Heer and Agrawala (Heer & Agrawala, 2006), which describes design patterns of successful information visualization solutions in order to make re-use easier. They describe visualization design through software design patterns (which are defined as “descriptions of communicating objects and classes”) and present a set of twelve design patterns for information visualization software. This paper is useful for identifying and developing software design patterns in visualizations.

A third topic in information visualization is the analytical aspect. A paper by Amar and Stasko (Amar & Stasko, 2004) describes analytical gaps: obstacles faced by visualizations that aid in higher-level analytical tasks such as decision-making. Awareness of the existence of these gaps helps designers in creating more useful and effective visualizations. In other research Amar et al. (Amar, Eagan, & Stasko, 2005) go one step further and describe ten low-level analysis tasks that represent users’ activities while using visualization tools. Examples of these low-level analysis tasks are retrieving a value, sorting a set and finding anomalies. These papers are helpful when investigating the analytical tasks that a user can perform with visualizations.

Finally, another essential topic of study in information visualization is the perceptual system of the user. For visualizations to be understood effectively, attention must be given to the perceptual capabilities of humans, and visualizations must match these capabilities. Many papers and books have been written about this topic, of which Perception for Design by Ware (Ware, 2012) is an example. Ware describes the need for this topic as follows: “Although very flexible, the visual system is tuned to receiving data presented in certain ways, but not in others. If we can understand how the mechanism works, we can produce better displays and better thinking tools.” He discusses many perceptual aspects, such as contrast, the use of space and interaction. This book is useful for designing visualizations that have a proper fit with the user’s cognitive capabilities and preferences.

For practitioners, however, the reference work used consists for the most part of guidelines, “do’s and don’ts” papers and non-scientific books. An example of this is the work of Stephen Few (Few, 2004, 2006, 2009). These books give practical guidelines for the design of dashboards, infographics, tables and graphs, and provide simple principles for building effective visualizations. Other reference works in use are whitepapers written by visualization software companies, describing best practices and guidelines¹ ² ³. These articles mainly describe the final visual form and do not focus on the wider information visualization domain, such as the overall pipeline and the design methodology of information visualization in businesses, as described in academic literature.

This short overview of both academic literature and the reference work used by practitioners highlights a gap in the approaches taken when designing information visualizations. This gap is reinforced by a lack of research that uses case studies to evaluate the design process of visualizations and the occurrence of the wide set of sub-domains. It is useful to gain insight into the differences between academic research approaches and the approaches actually used. The reasons for these differences can indicate room for improvement in businesses, as well as shed new light on existing information visualization design research. This thesis aims to achieve just that by investigating the alignment gap between theoretical design approaches proposed by academic literature and actual design approaches used in practice. By comparing information visualization research with applied projects, this study shows room for improvement in both domains. By studying cases, behaviour can be described, through which this thesis aims to show why certain gaps between both domains exist, providing room for further research.

¹ http://www.tableausoftware.com/learn/whitepapers
² http://www.qlik.com/us/explore/resources/whitepapers
³ http://www.microstrategy.com/us/about-us/white-papers

Modelling the alignment gaps and finding room for improvement will be done by considering the main research question of this thesis:

“How can the alignment gaps between theory and practice be overcome?”

In order to answer this question we will focus on the following sub-questions:

1. What model best represents approaches of designing information visualization as described by academic literature?

2. What model best represents the approaches of designing information visualizations used in practical business cases?

3. What differences and alignment gaps can be found between the academic model and the practical model?

This thesis is organised as follows: based on a literature study, section 2 provides an overview of the components of the information visualization pipeline, the design methodology and the factors affecting the visualization and its design that are relevant to this study. Section 3 describes the methodology of this thesis, introduces the structure of the interviews and presents the cases that are studied. In section 4 the results of the case studies are presented, which lead to models describing information visualization in practice (section 4.2). Section 4.3 compares theory and practice based on these models and identifies the alignment gaps. Section 5 discusses how these alignment gaps can be overcome. Finally, section 6 presents the conclusions of this thesis, along with its limitations and directions for future research.


2. Domain description

As described in the introduction there are many aspects that need attention when designing visualizations. When considering all the aspects in this domain, three key questions arise:

1. What needs to be designed?

2. How will it be designed?

3. In which context will it be designed?

These questions define the division of the information visualization domain into three sub-domains. The first question “What needs to be designed?” describes the actual core of this domain: the information visualization pipeline. This pipeline describes the process of the transformation of data into a visual form, which is then perceived by the user. This pipeline needs to be designed using a design methodology, which is described by the second question. This methodology is in some ways very similar to regular software development methodologies, but there are several papers (North, 2005; Rogers, Sharp, & Preece, 2011; Sedlmair, Meyer, & Munzner, 2012) which specifically describe the design of information visualization. Finally, the pipeline and its design exist or happen in a certain context. This context addresses the third question: there are contextual factors that influence both the information visualization pipeline and the design methodology. Several papers (North, 2005; Pfitzner, Hobbs, & Powers, 2003; Zhang, Johnson, Malin, & Smith, 2002) have been written about these factors as well. In summary, these questions define the three sub-domains that are considered in this thesis: “What processes need to be designed that transform data into a visual form?”, “What methodologies can we use for this design process?” and “Which contextual factors play a role?”. These sub-domains are depicted in Figure 1.

For every sub-domain this thesis will provide a theoretical model based on literature reviews, which will be evaluated through business cases. The aim is to find generic models describing a general view on the visualization pipeline. We will look for aspects that correspond between multiple papers in order to increase the soundness of our model, as well as for unique aspects in papers in order to increase the completeness of our model. Since this thesis focusses on business information visualization, not every aspect of the literature review will be relevant. We will take into account that business information visualization design is carried out by experts in dedicated departments. The three sub-domains and their models are addressed in sections 2.1, 2.2 and 2.3, respectively.

Figure 1 The information visualization domain with its three sub-domains. The information visualization pipeline is the process of transforming business information into a visualization, which will be perceived by the user with the goal to create insights (see item 1). This pipeline needs to be designed and developed using a certain design methodology (see item 2). Lastly, the pipeline and its design operate in a certain context. This context influences these components and certain design decisions made (see item 3).

2.1. Information visualization pipeline

In order to evaluate design processes a clear view on what needs to be designed is needed. Therefore, a concrete definition and systematic view on the information visualization pipeline will be given. This section will provide an overview of a selection of research papers addressing this topic, after which the definition and scope used in this thesis will be provided.

Let us first look at Ware (Ware, 2012), who describes four stages in the process of information visualization:

1. Collection and storage of data.

2. Preprocessing the data in order to transform it into something understandable.

3. Hardware and algorithms that present an actual image on the screen.

4. The user’s perceptual and cognitive system that perceives the image on the screen.

Although these stages are rather high-level, they effectively describe four distinct and major components of the information visualization pipeline. The fourth component is a key element in information visualizations. It emphasizes the need for a visualization to fit the user’s perceptual and cognitive system, which needs consideration during the design (see section 2.2) and can to some extent be considered a contextual factor (see section 2.3). An example from visual perception is the set of Gestalt principles (Wertheimer, 1938), which describe how people perceive visual items as organized patterns and groups instead of as many separate parts. The perceptual and cognitive system is virtually the same for everyone, but it is also complex and needs to be taken into account in order to make an effective visualization.


Figure 2 The visualization pipeline (Card, Mackinlay, & Shneiderman, 1999)

A similar approach is described by North (North, 2005) in his adapted version of the visualization pipeline by Card et al. (Card et al., 1999), see Figure 2. The first step in this pipeline is the processing of raw information into a well-organized dataset, resulting in a set of data entities with associated data attribute values. The next step maps the dataset into a visual form. The third step transforms the visual form into views which display the visual form on the screen. The last step is the visual perception of the view through the human visual system. This approach is more detailed compared to Ware’s (Ware, 2012) approach.

In addition to the information visualization pipeline we need to consider how data flows, which is described by Haber and McNabb (Haber & McNabb, 1990) in their dataflow model. The first step is the data analysis step, where the data is prepared for visualization (e.g. adding missing values). After that a processing function selects the part of the data that will be visualized. In the visual mapping step the data is mapped to geometric primitives (e.g. points and lines) and their attributes (e.g. color). In the final step the geometric data is rendered to the screen and transformed into an actual image.
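The four steps of this dataflow model can be illustrated with a minimal Python sketch. It is not taken from Haber and McNabb; the function names, the mean-based filling of missing values, the value range used for selection and the text-based rendering are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Mark:
    """A geometric primitive (a point) with one visual attribute (color)."""
    x: float
    y: float
    color: str


def analyse(raw: List[Optional[float]]) -> List[float]:
    """Data analysis: prepare the data, here by filling missing values with the mean."""
    known = [v for v in raw if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in raw]


def select(data: List[float], low: float, high: float) -> List[Tuple[int, float]]:
    """Processing: keep only the part of the data that will be visualized."""
    return [(i, v) for i, v in enumerate(data) if low <= v <= high]


def map_to_marks(focus: List[Tuple[int, float]], threshold: float) -> List[Mark]:
    """Visual mapping: turn each value into a point and choose its color."""
    return [Mark(float(i), v, "red" if v > threshold else "blue") for i, v in focus]


def render(marks: List[Mark], width: int = 20) -> str:
    """Rendering: produce the final 'image', here one text bar per mark.

    Assumes at least one mark with a positive value.
    """
    top = max(m.y for m in marks)
    return "\n".join(
        f"{int(m.x)} {m.color:<4} " + "#" * round(m.y / top * width) for m in marks
    )


# Hypothetical input: four measurements, one of them missing.
raw_values = [2.0, None, 4.0, 12.0]
image = render(map_to_marks(select(analyse(raw_values), 0, 10), threshold=5))
print(image)
```

Each function corresponds to one step, so the output of one step is the input of the next. The user who perceives the resulting image is deliberately absent from this sketch, just as in the dataflow model itself.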

The three papers largely correspond in their description of the information visualization pipeline. North (North, 2005) and Haber and McNabb (Haber & McNabb, 1990) both describe a view transformation or rendering stage. This stage renders the visual form of the data to the actual screen using computer hardware such as GPUs. The rendering done by a GPU is a stage that designers have little influence on and is not very relevant for the design of visualizations: the goal of visualizations is to create insight, and the way a GPU processes data does not matter for this goal. Furthermore, Ware (Ware, 2012) describes the visual mapping through algorithms and the rendering through hardware as one stage. Therefore, the pipeline model used in this thesis will also combine these two stages into one. The model by Haber and McNabb (Haber & McNabb, 1990) focusses on the role and transformations of data within the information visualization pipeline and does not describe a stage in which the user is present. Since we aim to find a generic model that provides a wide view on the pipeline and takes into account all parties that have a role, we will add the user to Haber and McNabb’s pipeline, as is done by North (North, 2005) and Ware (Ware, 2012). We will use the visualization pipeline described by North (North, 2005) as the basis of our model, since this pipeline is the most detailed and makes a clear distinction between stages and transformation processes. North’s model will be adapted according to the review of the other papers, which will result in the final information visualization pipeline model.

The combination of these approaches that will be used in this thesis is depicted in Figure 3. This model clearly depicts several high-level stages and processes that together construct the information visualization pipeline, consisting of a system-side and a user-side. All of these stages need to be designed or taken into account while designing. The following section will provide a detailed description on the relationship between the information visualization pipeline and the design methodology.


Figure 3 Extension of Figure 1: the information visualization domain with the detailed information visualization pipeline model.

2.2. Information visualization design methodology

As described in the previous section, the information visualization pipeline consists of components that need to be designed or taken into account while designing. Literature describes several approaches for designing information visualizations. This section gives an overview of a selection of this literature.

A common form of HCI design is interaction design. This topic has been extensively described in books and research papers. An example is “Interaction design: beyond human-computer interaction” by Rogers et al. (Rogers et al., 2011). Here, interaction design is described as involving four basic activities: establishing requirements, designing alternatives, prototyping and evaluating.

Sedlmair et al. (Sedlmair et al., 2012) present a design approach that is comparable to, but more detailed than, that of Rogers et al. After observing that “there is a lack of specific guidance in the visualization literature that describes holistic methodological approaches for conducting design studies”, Sedlmair et al. developed a nine-stage framework describing the method for conducting design studies, consisting of the following stages:

• Learn: create solid knowledge on visualization literature, including data abstraction, visual forms, visual perception and interaction techniques.

• Select: select promising collaborations.

• Cast: identify collaboration roles.

• Discover: problem characterization, abstraction and requirements analysis through user analysis. This stage matches the first stage in Rogers’ approach.

• Design: data abstraction, visual encoding and interaction through explicit consideration of possibilities. This stage matches the second stage in Rogers’ approach.

• Implement: prototypes, tools and usability. This stage matches the third and fourth stage in Rogers’ approach.

• Deploy: release and gather feedback.

• Reflect: confirm, refine, reject and propose guidelines.

• Write: design study paper.

Although the model is linear, the stages overlap and jumping back to a previous step is common.
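The combination of a linear order with frequent jumps back can be made concrete in a small Python sketch. The stage names follow the list above, with Reflect taken from Table 1. The transition rule (move one stage forward, or jump back to any earlier stage) is an assumption made for illustration and is not a rule stated by Sedlmair et al.

```python
from enum import IntEnum
from typing import Sequence


class Stage(IntEnum):
    """The nine stages, numbered in their linear order."""
    LEARN = 1
    SELECT = 2
    CAST = 3
    DISCOVER = 4
    DESIGN = 5
    IMPLEMENT = 6
    DEPLOY = 7
    REFLECT = 8
    WRITE = 9


def is_allowed(current: Stage, following: Stage) -> bool:
    """Assumed rule: step one stage forward, or jump back to any earlier stage."""
    return following == current + 1 or following < current


def is_valid_walk(walk: Sequence[Stage]) -> bool:
    """A project history is valid if it starts at LEARN and uses only allowed moves."""
    if not walk or walk[0] != Stage.LEARN:
        return False
    return all(is_allowed(a, b) for a, b in zip(walk, walk[1:]))


# Hypothetical project: the team returns from Design to Discover once.
history = [
    Stage.LEARN, Stage.SELECT, Stage.CAST, Stage.DISCOVER,
    Stage.DESIGN, Stage.DISCOVER, Stage.DESIGN, Stage.IMPLEMENT,
]
print(is_valid_walk(history))
```

Skipping stages forward (for example from Learn directly to Design) is rejected under this assumed rule, whereas revisiting an earlier stage is always accepted.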

Several papers discuss a more user-centric approach. Zhang et al. (Zhang et al., 2002) describe a human-centered multilevel design, which covers four stages: functional analysis (identifying structure of the domain), user analysis (identifying user characteristics, such as expertise level), task analysis (identifying the tasks that the user performs with the system) and representation analysis (identifying the best graphical representation and flow structure).

A less explicitly defined design method is described by North (North, 2005). North discusses three iterative steps: requirement analysis, design and evaluation. In the requirements phase, two main inputs for design are identified: information characteristics (such as data schema, underlying structures and quantity) and types of insight that the visualization should create. The requirements analysis phase also consists of identifying user tasks, users’ domain knowledge, etc. In the design phase decisions regarding visual mapping, representations of information structures and overview, navigation and interaction techniques are made. Finally, the evaluation phase must be considered at all times providing constant feedback on the visualization design. Examples are analytical and empirical evaluations, usability-testing and benchmarking.

The four papers largely correspond in the design stages they describe. The stages described by Rogers et al. (Rogers et al., 2011) and North (North, 2005) fit into the model of Sedlmair et al. exactly. Zhang et al. describe a somewhat different approach, which focuses on different types of analyses. The functional analysis and representation analysis stages correspond with the ‘discover’ and ‘design’ stages of the model of Sedlmair et al., respectively. The user analysis and task analysis stages address contextual factors (such as expertise level) and will be incorporated in the model described in section 2.3. Furthermore, the ‘select’ and ‘cast’ stages in the model of Sedlmair et al. are not applicable to this thesis. As described in section 2, this thesis focusses on business information visualization, and in businesses certain business units are dedicated to designing and developing visualizations, in which the collaboration roles are mostly predefined. Therefore these two stages are left out of the final model used in this thesis. Additionally, Sedlmair et al. and North both define their model to be somewhat iterative; the stages are not clearly separated and going back to a previous stage is common. We will use the design methodology described by Sedlmair et al. (Sedlmair et al., 2012) as the basis of our model, since this method is the most extensive. Sedlmair’s model will be adapted according to the review of the other papers, which will result in the final information visualization design methodology model.

Finally, all four papers describe only the processes of the design methodology. In order to make this model consistent with the model in section 2.1 we will split up the stages into processes and results.

Figure 4 Extension of Figure 1 and 3: the information visualization domain with the detailed information visualization pipeline and design methodology models.

The combination of the described papers is presented in Figure 4, which shows the design methodology that will be used in this thesis. A description of the content of this methodology and its relation to the information visualization pipeline of section 2.1 is presented in Table 1:

Table 1 Description of the stages in the information visualization design methodology, divided into the process involved and the results of that process.

| Stage | Process | Result |
|---|---|---|
| Learn | Create solid knowledge on visualization literature, including data abstraction, visual forms, visual perception by the user and interaction techniques. This knowledge will inform other stages and will help in finding the best solutions. | This process results in an overview of the possibilities regarding structure and abstraction of focus data, visual forms representing the data, visual perception by the user and interaction techniques. |
| Discover | Establishing information visualization requirements, identify characteristics of the information and data, and identify desired insights obtained by the user. Analyze and discover contextual factors such as user analysis (see section 2.3). | Results in a list of requirements composed by the user, the characteristics and structure of focus data and an overview of the desired insights the user should obtain through the visualization. |
| Design | Identify the best graphical representation of the visual mapping and focus data. Perform representation analysis of information structure. Design overview, navigation and interaction techniques (while considering the available alternatives and literature). | This process results in a design for the whole information visualization pipeline: the architecture of the data, the visual mapping algorithms, the visual representation of the focus data and the interaction techniques. |
| Implement | Build prototypes, perform testing with the users and evaluate the prototypes. | Results in prototypes of the whole information visualization pipeline, along with test results and feedback from the user. |
| Deploy | Release the visualization into the organization and gather feedback from users, developers, designers, etc. about the project (management, results, etc.). | Results in a release of the whole information visualization pipeline, after which feedback will be gathered from team members and users involved in the project. |
| Reflect | Confirm, refine, reject and propose guidelines, based on evaluation and feedback obtained in the previous step. | Results in guidelines based on the evaluation and feedback from project members and users. |
| Write | Document the design method for future use, taking into account all the steps, decisions, feedback, evaluations, etc. | Results in documentation about the project for future use. |
| Overlapping stages & jumping back to previous stage | The stages are not clearly separated and going back to the previous stage is common. | |

2.3. Contextual factors of information visualization

Apart from the information visualization pipeline and approaches that can be taken when designing visualizations, there are several factors that influence the usefulness of a visualization. These factors can be considered variables that depend on the context. Several papers argue that the context of use is relevant for the effectiveness of visualizations (Pfitzner et al., 2003) (Lurie & Mason, 2007). This section will provide an overview of several research papers discussing this topic in order to provide a generic list of contextual factors of information visualization.

The first paper we consider is by Wassink et al. (Wassink, Kulyk, van Dijk, van der Veer, & van der Vet, 2009). They argue that design of visualizations should be user-centric and that user and domain analysis is essential. It should be evident what kind of insights the visualization should enable, who the intended users and their characteristics are and what the users’ tasks will be.

A more concrete overview of contextual factors is described by Pfitzner et al. (Pfitzner et al., 2003). They describe a unified taxonomic framework for information visualization outlining the major factors in interface development: user’s skill level, contextual factors, interactivity type, task type and data type. The user’s skill level can range from novice to expert. Contextual factors include the life experience of a user, need, history and device. The interactivity type can range from textual to graphic, and from static to dynamic. The tasks are defined as what the user aims to achieve and are listed by Shneiderman (Shneiderman, 1992) as follows: overview, zoom, filter, details on demand, relate, history and extract. Finally, data includes data relationship structures and data types.

As previously described, Zhang et al. (Zhang et al., 2002) describe several human-centered design stages of which two can be classified as contextual factor analysis: user analysis (identifying user characteristics, such as expertise level and cognitive capabilities) and task analysis (identifying the tasks that the user performs with the system). The cognitive capabilities relate to the user’s perceptual and cognitive system, which was introduced in section 2.1. While the cognitive system is (virtually) the same for everyone, a user’s cognitive capabilities can vary. People differ in attention-spans, memory, recognition and interpretation of visual stimuli. Therefore cognitive capabilities are considered a variable that depends on the context; a contextual factor.

In section 2.2 we described the design approach of North (North, 2005) in which requirements analysis is a design-stage. This stage, among other tasks, consists of identifying the user’s domain knowledge. In this thesis this factor is also considered a contextual factor, since the domain knowledge can vary depending on the user.

Another important contextual factor is the role of the visualization in decision-making, since one of the goals of business information visualization is to improve decision-making quality. We therefore argue that explicit attention must be given to how the visualization serves as a tool in the decision-making process. Citroen (Citroen, 2011) describes several roles that information (and thus visualizations) can have in the process of decision-making. Visualizations can have a role in the preparation phase, in which the issue is defined and the objectives of the decision are set. In the analysis phase the environment is reviewed, for example by analysing comparable developments in other organisations. In the specification phase alternatives are specified, based on the information gathered in the analysis phase. In the phases that follow, the alternatives are limited to those that are feasible, after which a final decision is made.

These papers describe a range of contextual factors relevant for information visualization, of which several overlap. The insights a visualization should enable, as described by Wassink et al. (Wassink et al., 2009), are considered to be part of the requirement analysis of the design approach (see Figure 4) and are therefore not included in this model. User and task analysis however will be included.

The data and interactivity factors described by Pfitzner et al. (Pfitzner et al., 2003) are included in the information visualization pipeline model in Figure 3 and will therefore not be included in this model. Contextual factors will be limited to device type, while life experience, need and history will be considered to be part of user analysis. The tasks described by this model will be extended with high-level analytical tasks such as decision-making and learning (Amar & Stasko, 2004).

Analyzing the user’s expertise level, as described by Zhang et al. (Zhang et al., 2002), matches the previously identified user’s skill level, as does the task analysis.

In conclusion, the contextual factors that influence the visualization of business information are as follows:

• User’s skill/expertise level: This can range from novice to expert, indicating the experience that a user has with similar visualizations. This is a general contextual factor and is related to the whole pipeline.

• User’s cognitive capabilities: Capabilities such as attention, memory, recognition and interpretation of visual stimuli. This is a factor related to the perceptual and cognitive system of the user.

• User’s domain knowledge: This can range from beginner (rather unfamiliar with the domain) to professional (user has been working in the domain for a significant time). This is a general contextual factor and is related to the whole pipeline.

• Task type: High-level tasks such as learning, as well as low-level tasks such as overview, zoom and details on demand. This is a factor related to the perceptual and cognitive system of the user.

• Device type: The type of device that is used to view the visualization, such as tablet, paper report,

• Role in decision-making process: Possible roles can be preparatory, analytical, specifying or limiting. This is a general contextual factor and is related to the whole pipeline.
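The factors listed above can be captured in a small record type, which may help when comparing cases side by side. The sketch below is illustrative only: the class, field and function names are assumptions introduced here and are not part of the academic models, and the enumerations contain only the values that the text names explicitly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class SkillLevel(Enum):
    NOVICE = "novice"
    EXPERT = "expert"


class DomainKnowledge(Enum):
    BEGINNER = "beginner"
    PROFESSIONAL = "professional"


class DecisionRole(Enum):
    PREPARATORY = "preparatory"
    ANALYTICAL = "analytical"
    SPECIFYING = "specifying"
    LIMITING = "limiting"


@dataclass(frozen=True)
class ContextProfile:
    """Hypothetical record of the contextual factors for one visualization."""
    skill_level: SkillLevel
    cognitive_capabilities: Tuple[str, ...]  # e.g. attention, memory
    domain_knowledge: DomainKnowledge
    task_types: Tuple[str, ...]              # high-level and low-level tasks
    device_type: str                         # e.g. "tablet", "paper report"
    decision_role: DecisionRole


# Part of the model each factor relates to, as stated in the list above.
# device_type is left out because its scope is not stated there.
FACTOR_SCOPE = {
    "skill_level": "whole pipeline",
    "cognitive_capabilities": "perceptual and cognitive system",
    "domain_knowledge": "whole pipeline",
    "task_types": "perceptual and cognitive system",
    "decision_role": "whole pipeline",
}


def factors_for(scope):
    """Return the factor names related to the given part of the model."""
    return sorted(name for name, s in FACTOR_SCOPE.items() if s == scope)


# Example values are made up for illustration.
profile = ContextProfile(
    skill_level=SkillLevel.NOVICE,
    cognitive_capabilities=("attention", "memory"),
    domain_knowledge=DomainKnowledge.PROFESSIONAL,
    task_types=("learning", "overview", "details on demand"),
    device_type="tablet",
    decision_role=DecisionRole.ANALYTICAL,
)
print(factors_for("whole pipeline"))
```

Keeping the scope in a separate mapping mirrors the distinction the list makes between general factors and factors tied to the user's perceptual and cognitive system.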

Figure 5 shows the extension of Figure 1, filled in with the models describing the information visualization pipeline, the design methodology and the contextual factors (from sections 2.1, 2.2 and 2.3 respectively) and their mutual relations.

Figure 5 Extension of Figure 1, 3 and 4: the information visualization domain with the detailed information


3. Methodology

To answer the research questions posed earlier in this thesis, a qualitative study is conducted. In order to investigate the alignment gap between design approaches proposed by academic literature and the actual approaches that are taken in practice, several cases of visualization design in practice will be studied. These cases are projects done in a consulting company. In these cases the consulting company designed and developed an information visualization tool for specific customers. Using semi-structured interviews, feedback will be gathered on the occurrence in practice of components of the academic models (proposed in sections 2.1, 2.2 and 2.3). This feedback will be structured and interpreted, resulting in models similar to the academic models, but based on practice. These practical and academic models will be compared, demonstrating alignment gaps between theory and practice.

3.1. Research method

Case studies are used in this thesis mainly because they focus on practical cases, which makes them suitable for finding alignment gaps between academic theory and practice. Furthermore, the context of information visualization design is of great importance. As described earlier in this thesis, there are many relevant aspects to information visualization design, and especially in businesses the boundaries between these aspects might not be very clear. This calls for analysing information visualization design in its context, which can be achieved through case studies (Yin, 2009). Case studies can furthermore be used to explain or describe behaviour (Yin, 2009), which is useful for identifying underlying reasons.

Since there is no explicit design process, as identified before, by definition there is no structured documentation of the design processes taken during the cases that are studied. Therefore the design approaches in the cases will be analysed through in-depth interviews with experts involved in these cases. These in-depth interviews furthermore provide the opportunity to investigate underlying reasons and motivations.

3.2. Data collection

The data sources used in this thesis are three case studies of projects done in the consulting company. In order to select the most useful cases several criteria are used:

1. The cases have to be part of a Business Intelligence/Business Information Management department in order to guarantee a strong focus on information visualization in a business context.

2. The cases should include a rich data set that needs to be visualized, resulting in an extensive tool rather than a simple visualization.

3. The experts involved in the cases should have several years of experience in the BI/BIM/information visualization domain, so that sufficient insight is gained in the design process, underlying reasons, etc.

3.3. Data sources

The cases are analyzed through at least two interviews with involved employees. The goal of the first interview is to become familiar with the case, learn about the context and determine whether the project qualifies as a data source for this thesis. The second interview is semi-structured. Its goal is to get specific and concrete feedback on the theoretical models created in section 2, in order to determine to what extent these models represent the practical approaches that are taken in information visualization design. The interviews cover several topics (see Appendix C for the full interview questionnaire) and are structured as follows:

1. Introduction of the project: several questions are asked in order to determine the context and content of the project.

2. Review of the visualization pipeline model: in order to determine the validity of this model (see Figure 3) questions are asked regarding the following:

a. The presence of specific components in practice which are described by the model.
b. The absence of specific components in the model which did occur in practice.
c. The hierarchy of the components of the model.

3. Review of the design approach model: in order to determine the validity of this model (see Figure 4) questions are asked regarding the following:

a. The presence of specific components in practice which are described by the model.
b. The absence of specific components in the model which did occur in practice.
c. The hierarchy of the components in the model.

4. Review of the contextual factors: in order to determine the validity of the proposed contextual factors (see Figure 5) questions are asked regarding the following:

a. The presence of factors in practice which are described by the model.
b. The absence of factors in the model which did occur in practice.

The first topic mainly serves as an introduction of the project’s context and content, while the second, third and fourth topic provide concrete feedback on the validity of the models. The grouping of the topics allows for effective comparison between the three cases and the theoretical models and will serve as a framework for interpretation of the results.

3.4. Case studies

This section introduces the three cases studied in this thesis, which were provided by a consulting company. The cases differ in two ways. First, cases 1 and 3 involve projects in which the visualizations were to be used by a board of directors, i.e. high-level visualizations for insight into performance and strategy planning, whereas case 2 involves a project in which the visualization was to be used by sales representatives, i.e. low-level visualizations for support in daily operations. Second, in cases 2 and 3 the user of the visualization was closely involved in the design and development process, while this was not so in case 1. We will now provide a short introduction to the cases.

3.4.1. Case 1: A large beer company

The first case covers a project during which a mobile BI dashboard representing KPIs was built. This application was to be used by the company’s board of directors, with subject area “Strategic business planning and control”. The board currently analyses KPIs and related data by reviewing large piles of printed Excel documents in physical folders, without much effective data aggregation. In an effort to make this way of working more efficient, the company decided to develop a mobile BI application. The main improvement of the mobile BI dashboard is that it allows for strategy planning based on exceptions, instead of requiring analysis of all the data. The dashboard provides insights regarding where and why exceptions are happening and what can be done to resolve them. This requires the visualization of a broad data set, including KPIs based on channels, geographical regions, business lines, brands, history, etc.

The expert involved in this project worked both as a business development lead and as a development lead for the first proof of concept application. He has many years of experience in information visualization and business intelligence, and has been working as a consultant for several years.

3.4.2. Case 2: An international electronics company

The second case covers a project during which a visualization application was built. When the project started, the organisation was in the middle of a large business intelligence tool selection procedure. Because of the duration of this procedure, a temporary visualization project was set up to meet the organisation’s information needs. During this project a temporary visualization application was built that represented certain KPIs. The application was to be used by sales representatives, account managers and other data analysts of the sales and marketing departments worldwide.

The main role of the visualizations was to create insight into the leads and opportunities process of this department. Leads can become opportunities after a follow-up by the company, which can turn into a sale once a deal is made. The types of data that are visualized include the run times of going from a lead to an opportunity to a sale, and the number of open leads per business unit per month.

The expert involved in this case has many years of experience as business intelligence consultant and has done many information visualization and business intelligence projects in different domains. His role in this project was as a solution architect.

3.4.3. Case 3: A large soft drinks company

The third case covers a project during which an information visualization application was built. This application was to be used on an iPad and included several dashboards representing KPIs. The project was run from the company’s headquarters in Brussels, and the application was to be used by all business unit presidents worldwide and some managers. The initial request was to replace the monthly 60-page reports consisting of tables with an iPad application that allowed for more effective data analysis.

The application shows the main KPIs, which indicate how the company is doing and why. The dashboards provide insights in the general performance. The dataset that is visualized is extensive and includes KPIs regarding sales volumes, customer appreciation, competitor market shares, etc.

Two experts were interviewed in this case. One had the role of general project leader and business analyst. The other was the delivery manager, who has worked as a business intelligence consultant for several years and has many years of experience in business intelligence and information visualization.

4. Results

This section describes the results of the three case studies that were conducted. These results will then be interpreted and practical models will be proposed, reflecting the practice of the case studies.

4.1. Results of case studies

In line with the domain division in section 2, the results have been divided into three groups: results regarding the information visualization pipeline, results regarding the information visualization design methodology and results regarding the contextual factors.

The results regarding the information visualization pipeline (Table 2) and design methodology (Table 3) are divided into three topics, which correspond to the structure of the interview as described in section 3: the presence of components in practice which are described by the model, the absence of components in the model which did occur in practice and the hierarchy of the model. The first topic addresses the occurrence of components from the model in practice. For every case and every component of the models in Figures 3 and 4, a short description is added regarding the occurrence or non-occurrence of the respective component in the practice of the case. Furthermore, a colour is added indicating the degree of consistency of the results with the theoretical models. The second topic addresses any components missing from the model which proved relevant in practice. This topic is divided into the types of results that arose from the interviews. If an item is applicable to the case, a short description is added. If not, this is indicated with ‘N.A.’. The final topic addresses the hierarchy of the components in the model and describes a possible different order of the components in practice. This topic too is divided into the types of results that arose from the interviews and is followed by either a short description or ‘N.A.’.

The results regarding the contextual factors (Table 4) are structured similarly to the results regarding the pipeline and design methodology. They are divided into two topics, consistent with the structure of the interview (section 3.3): the presence of components in practice which are described by the model and the absence of components in the model which did occur in practice.

Table 2 Results regarding the information visualization pipeline. A green colour indicates that the component is consistent with the theoretical model. A blue colour indicates that the component is consistent with the theoretical model, but in a slightly different way.

The presence of components in practice which are described by the model.

| Component | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| Raw data | Yes, but falls under a bigger BI domain (as data warehouse). | Yes, but falls under a bigger BI domain (as data warehouse). | |
| Processing | Yes, but falls under a bigger BI domain. | Yes, but falls under a bigger BI domain. | Yes, but falls under a bigger BI domain. |
| Focus data | Yes, but falls under a bigger BI domain (as data mart). | Yes, but falls under a bigger BI domain (as data mart). | Yes, but falls under a bigger BI domain (as data mart). |
| Visual mapping | Yes. | Yes. | Yes. |
| Visual form | Yes. | Yes. | Yes. |
| Visual perception | Yes. | Yes. | Yes. |
| Perceptual & cognitive system of the user | Yes. | Yes. | Yes. |
| Interaction | Yes. | Yes. | Yes. |

The absence of components in the model which did occur in practice.

| Component | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| Information visualization software tool. | MicroStrategy. | Excel. | MicroStrategy. |
| Distinctive subdomains in the pipeline. | There was a significant alignment gap between the information analyst (data structure) and the business analyst (visual forms and perception). | There was a clear distinction between BI back-end (data storage), BI front-end (visual form) and UX designers (visual perception). | The pipeline is considered to consist of three non-overlapping parts: the data structure, the BI visualization part and the UX part (visual perception). |
| More divided data storage components. | N.A. | From a general BI perspective, the raw data component consists of raw source data and integrated business data. | N.A. |

The hierarchy of the components of the model.

| Component | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| Interaction with data and visual mapping through tool. | | | |

Table 3 Results regarding the information visualization design methodology. A green colour indicates that the component is consistent with the theoretical model. A blue colour indicates that the component is consistent with the theoretical model, but in a slightly different way. A red colour indicates that the component did not occur in practice.

The presence of specific components in practice which are described by the model.

| Component | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| Learn | Yes. | No. | No. |
| Discover | Discovering requirements and data structure occurred, discovering required insights did not. | Yes. | Yes. |
| Design | Designing visual forms and interaction techniques occurred, designing data structures did not. | Designing visual forms and interaction techniques occurred, designing and developing data structures did not. | Designing visual forms and interaction techniques occurred, designing data structures did not. |
| Implement | Yes. | Yes. | Yes. |
| Deploy | Deploying the final product occurred, gathering feedback at this point did not. This was done at the evaluation phase (see below). | Deploying the final product occurred, gathering feedback at this point did not. This was done at the evaluation phase (see below). | |
| Reflect | No, not explicitly. | No. | Yes. |
| Write | No, not explicitly. | Yes, but as part of the evaluation (see below). | Yes. |

The absence of specific components in the model which did occur in practice.

| Component | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| Tool selection process | In order to select appropriate software, requirements were discovered, possibilities were learned, prototypes were designed and implemented, which were evaluated, based on which a tool was selected. | In order to select appropriate software, requirements were discovered, possibilities were learned, prototypes were designed and implemented, which were evaluated, based on which a tool was selected. | In order to select appropriate software, requirements were discovered, possibilities were learned, prototypes were designed and implemented, which were evaluated, based on which a tool was selected. |
| Explicit evaluation | Yes. | Yes. | Yes. |

The hierarchy of the components of the model.

| Component | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| Development iterations | Discovering requirements, designing and implementing visualizations and evaluating those happened in short iterations, instead of linearly. | Discovering requirements, designing and implementing visualizations and evaluating those happened in short iterations, instead of linearly. | Discovering requirements, designing and implementing visualizations and evaluating those happened in short iterations, instead of linearly. |
| First discover, then learn. | First the requirements were discovered and then the possibilities within that range of requirements were learned. | | |

Table 4 Results regarding the contextual factors. A green colour indicates that the component is consistent with the theoretical model. A blue colour indicates that the component is consistent with the theoretical model, but in a slightly different way. A red colour indicates that the component did not occur in practice.

The presence of components in practice which are described by the model.

| Factor | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| User’s skill level | No. | Yes. | Yes. |
| Cognitive capabilities | No. | No. | No. |
| Task type | No. | The task types were big drivers for the tool selection. | Yes. |
| Device type | Yes. | The device type was a big driver for the tool selection. | Yes, as well as during the tool selection. |
| Role in decision-making process | Yes. | Yes. | Yes. |
| User’s domain knowledge | No. | Yes. | Yes. |

The absence of components in the model which did occur in practice.

| Factor | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| Guidelines | N.A. | Many design decisions were made according to predefined Hichert guidelines. | N.A. |
| Factors influencing the tool selection. | There are many factors and requirements that play a role during the information visualization software tool selection process. | There are many factors and requirements that play a role during the information visualization software tool selection process. | |

4.2. Case study models

This section interprets the results and proposes corresponding models that reflect the practice of the case studies. The models are composed from the occurrence or non-occurrence of the results for the topics described in Tables 2, 3 and 4. For a result to be included in a model, it has to be consistent across all three cases. If it is not, the aspect is left out of the model.
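The inclusion rule described above can be illustrated with a short sketch. It is a simplification under stated assumptions: each result is reduced to a yes/no value per case, the function and variable names are hypothetical, and the four example rows are a simplified reading of Table 3. Qualitative judgements, such as treating an intended but unrealised step as present, are not modelled.

```python
# Occurrence per case (case 1, case 2, case 3); True means the interviewees
# reported the component. Values are a simplified reading of Table 3.
TABLE_3_SAMPLE = {
    "Learn": (True, False, False),
    "Implement": (True, True, True),
    "Reflect": (False, False, True),
    "Explicit evaluation": (True, True, True),
}


def split_by_consistency(results):
    """Return (kept, dropped): components reported in all cases vs. the rest."""
    kept, dropped = [], []
    for component, occurred in results.items():
        # A component enters the practical model only if every case reports it.
        (kept if all(occurred) else dropped).append(component)
    return kept, dropped


kept, dropped = split_by_consistency(TABLE_3_SAMPLE)
print("In practical model:", kept)
print("Left out:", dropped)
```

With these sample rows, Implement and Explicit evaluation are kept, while Learn and Reflect are left out because they were not reported in all three cases.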

4.2.1. Information visualization pipeline

Two major observations can be made regarding the results of the review of the information visualization pipeline (Table 2):

• The information visualization domain consists of several distinctive sub-domains.

• The information visualization software tool is the core of the visualization pipeline.

All of the interviewees indicated that information visualization consists of three clearly separated and non-overlapping parts: the back-end consisting of the data storage, the front-end part consisting of the information visualization software tool and the user experience part consisting of the visual perception. Furthermore, in all the cases information visualization was part of a much broader business intelligence domain and business unit. This corresponds with the fact that the pipeline consists of three parts. Literature (Golfarelli & Rizzi, 2009) describes business intelligence architecture as consisting of data warehouses and data marts, BI tools and users. In this same line of thought, another essential part of the business intelligence architecture is the external data sources. This external data is extracted, transformed and loaded (ETL) into internal data warehouses. This corresponds with an observation made by the interviewee of case 2 (see Table 2). Every subdomain requires different competencies, design and development procedures, requirements, etc. This was also reflected in the responses of the interviewees: different project teams worked on different parts of the information visualization pipeline, resulting in either alignment and communication gaps, or additional team members being required to actively connect the different projects.

In the studied cases the visualization of information was not built from scratch, but designed and developed using information visualization software and its built-in visualization functionalities. Visualization software serves as a “portal” through which the user interacts with the visualization; the user does not interact directly with the data, but through the tool. Interaction results in a different selection of the dataset being visualized (such as details on demand or filter, as described by Shneiderman (Shneiderman, 1996)) or the use of a different visualization type (such as a bar chart instead of a line chart). The use of information visualization software also has consequences for the design methodology, which will be addressed in the next section. The results of the case studies lead to the information visualization pipeline model described in Figure 6.
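The “portal” role of the tool can be made concrete with a minimal sketch. It does not reproduce MicroStrategy, Excel or any other product: the class, its methods and the sample rows are hypothetical and serve only to show that interaction changes the selection or the chart type, while the underlying data stays untouched.

```python
from dataclasses import dataclass, field

# Hypothetical focus data: open leads per business unit per month.
FOCUS_DATA = [
    {"business_unit": "A", "month": "Jan", "open_leads": 12},
    {"business_unit": "A", "month": "Feb", "open_leads": 9},
    {"business_unit": "B", "month": "Jan", "open_leads": 20},
]


@dataclass
class VisualizationTool:
    """Sits between the user and the data; the user never edits the data."""
    data: list
    chart_type: str = "bar"
    filters: dict = field(default_factory=dict)

    def filter(self, **criteria):
        # Interaction type 1: change which part of the dataset is shown.
        self.filters.update(criteria)

    def switch_chart(self, chart_type):
        # Interaction type 2: change the visualization type.
        self.chart_type = chart_type

    def view(self):
        # Build the current view without modifying the underlying data.
        rows = [
            row for row in self.data
            if all(row.get(key) == value for key, value in self.filters.items())
        ]
        return {"chart_type": self.chart_type, "rows": rows}


tool = VisualizationTool(FOCUS_DATA)
tool.filter(business_unit="A")
tool.switch_chart("line")
current_view = tool.view()
print(current_view["chart_type"], len(current_view["rows"]))
```

Because every view is derived on request, the data layer stays independent of the visualization that happens to be selected.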

Figure 6 Extension of Figure 1: Information visualization pipeline based on case studies

4.2.2. Information visualization design methodology

Several observations can be made regarding the results of the review of the information visualization design methodology (Table 3):

• The cases are not consistent in the occurrence or non-occurrence of the individual components of the model.

• The tool selection process is a significant part of the design process.

• Design and development happens in iterations.

• Evaluation of prototypes happens explicitly.

The case results differ considerably in the occurrence or non-occurrence of the components of the model. Learning the possibilities was an explicit stage in case 1, but not in the other cases. Therefore it will not be used in the final model. Discovering the requirements, data structure and required insights was mostly present in the cases, except in case 1. The interviewees indicated that the user was largely absent during the design and development of this visualization, due to scheduling and communication conflicts. The intention of discovering the required insights was present, therefore this result will be considered as being similar to the other two cases. Designing the visual forms and interaction techniques was part of the design process, but designing the data structure was not. This corresponds with the results described in section 4.2.1: designing the data architecture is a different domain and is done in other projects. Moreover, when designing the data architecture the final visualization form is not taken into account at all. The interviewee of case 2 even stated that it is considered bad practice to do so; the data architecture should support the widest possible range of visualization types and reporting applications, so that the data layer is independent of the actual visualization that is chosen to be implemented. Implementing prototypes and final products was clearly part of the design process, as was deploying the products. However, gathering feedback during or after the deployment did not happen, as this was mostly done through evaluations during the design and development process. Reflecting and writing were done explicitly in only one case, so these stages will not be used in the final model.

As described in section 4.2.1 the visualization tool is at the core of visualizing information. This is reflected by the fact that the tool selection process is a significant part of the information visualization design process. All the interviewees indicated that information visualization projects start with selecting the appropriate software. This process involves several steps: discovering the requirements (both technical and functional), learning the possibilities different software has to offer, designing and implementing prototypes using the software and finally evaluating the prototypes. This process results in the choice of a tool, which is then used to build the visualizations.

The visualizations in all the cases were designed and developed using the Agile/Scrum methodology. This mainly involves designing, building and evaluating in short iterations in order to improve the communication and alignment between the developers and the users. This corresponds with the interview results, in which the interviewees all clearly indicated that the design and development does not happen linearly, but iteratively. First the requirements of small functionalities are discovered, which are then designed and implemented. The functionalities are evaluated with the user, user representative or product owner, who gives feedback. Based on this feedback the requirements or the design are adjusted, after which the cycle starts again.

The interviewees of case 1 indicated that during their project the discovering of the requirements happened before learning the possibilities. Since this was not the case in the other two cases, this result will not be used in the final model.

The final observation concerns the evaluation of prototypes and functionalities and is related to the earlier described iterative design and development. All interviewees clearly stated that evaluation was an explicit step of the design process, which corresponds with the Agile/Scrum methodology in which continuous feedback is an essential element.

The results of the case studies lead to the information visualization design methodology model described in Figure 7, which extends Figure 6.

Figure 7 Extension of Figure 6: Information visualization pipeline and design methodology based on case studies (see Appendix B for larger image).

4.2.3. Contextual factors of information visualization

The results of the case studies regarding the review of contextual factors are mostly inconsistent. We will go through them one by one.

In case 1 the user’s skill level and the user’s domain knowledge were not taken into account. However, as described earlier, the interviewees indicated that the user was largely absent during this project, due to communication and scheduling conflicts. The interviewees also indicated that the intention to perform a thorough user analysis was present; therefore this result will be considered to be similar to the results of the other two cases. The cognitive capabilities of the user were not taken into account in any of the cases. Taking into account the user’s cognitive capabilities (through a lot of research and user testing) is a very user-centric process. During the three cases the user was not present enough to allow analysis of cognitive capabilities (due to costs and other resource constraints). The task types and device types were a contextual factor in only one of the cases, so they will not be used in the final model. The role in the decision-making process was taken into account in all three cases. The use of guidelines was a contextual factor in case 2, but not in the other cases. This factor will therefore not be used in the final model.
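The selection of contextual factors described above can be summarized as a simple filter. The following Python sketch is illustrative only and is not part of the case study method: the names `cases_reporting_factor` and `factors_for_final_model` are hypothetical, and the threshold (a factor is retained only if it was taken into account in all three cases) is an assumption inferred from the text, which does not state an explicit rule. The counts are those reported above, with case 1 counted for the user's skill level and domain knowledge on the basis of the stated intent.

```python
# Illustrative sketch: the thesis applies this selection informally.
TOTAL_CASES = 3

# Number of cases (out of three) in which each contextual factor was
# taken into account, as reported in the text.
cases_reporting_factor = {
    "user skill level": 3,       # case 1 counted on stated intent
    "user domain knowledge": 3,  # case 1 counted on stated intent
    "user cognitive capabilities": 0,
    "task types": 1,
    "device types": 1,
    "role in decision-making": 3,
    "use of guidelines": 1,      # case 2 only
}


def factors_for_final_model(counts, total_cases=TOTAL_CASES):
    """Keep only factors taken into account in every case (assumed threshold)."""
    return [factor for factor, n in counts.items() if n == total_cases]


retained = factors_for_final_model(cases_reporting_factor)
print(retained)
```

Under this assumed threshold the retained factors are the user's skill level, the user's domain knowledge and the role in the decision-making process. A majority threshold would give the same result for these data, since no factor was reported in exactly two cases.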

Furthermore, all interviewees indicated that many factors are relevant for selecting the appropriate information visualization tool, such as IT portfolio management, security and the organisation’s technical architecture. These factors are not specific to the information visualization domain, but apply to software requirements analysis in general.

Figure 8 shows the information visualization pipeline (Figure 6) and the design methodology (Figure 7), combined with the contextual factors and their mutual relations.

Figure 8 Extension of Figure 6 and 7: the information visualization domain in business cases with the detailed information visualization pipeline and design methodology models, and their related contextual factors (see Appendix C for larger image).

4.3. Comparison of theory and practice

As stated in the third sub-question in the Introduction, we aim to find differences and alignment gaps between the academic model in Figure 5 and the practical model in Figure 8. When comparing these two models, the following differences can be identified:

• The information visualization pipeline in the practical model consists of several distinct parts, instead of one holistic pipeline as described by the academic model:

o The back-end of the pipeline (the data storage components) belongs to an organization’s more general BI and data warehousing domain. This back-end consists of external data sources whose data is extracted, transformed and loaded (ETL) into data warehouses. Certain