
Summarizing Trailplots: A Visual Analytics Approach to Facilitate the Analysis of Two Dimensional Temporal Data

Master Thesis Information Science

Business Information Systems track

Faculty of Science

University of Amsterdam

Supervisor: Prof. Marcel Worring

Student Author: Athanasios Athanasiadis


Summarizing Trailplots: A Visual Analytics Approach to Facilitate the Analysis of Two Dimensional Temporal Data

Athanasios Athanasiadis

sakisp.athanasiadis@gmail.com

ABSTRACT

Businesses have an increasing need to analyze multidimensional data over time. However, visualizing such data is challenging due to the volume of information that has to be rendered to the screen. Trailplots are a visualization technique that shows promise in handling two dimensional temporal data and facilitating users' analytic tasks, yet they suffer from screen clutter for large datasets. Summarizing Trailplots is a visual analytics system that utilizes trailplots and unsupervised learning techniques to support analytic tasks and reduce screen clutter. To this end, we develop an interestingness detection algorithm that identifies the most interesting data items based on their behavior over time and is used in coordination with a self-organizing map for clustering. The system is evaluated through user tests that provide evidence of Summarizing Trailplots' usefulness and the algorithm's effectiveness, along with guidelines for further improving the system.

Keywords

Visual Analytics, Change Over Time, Clustering, Anomaly Detection, Temporal Data, Multidimensional Data

1. Introduction

The contemporary world is swarming with ever-increasing amounts of data, and companies that successfully harness them outperform those that do not. Data analytics enables companies to provide additional services, differentiate themselves from their competitors and operate more efficiently (McAfee and Brynjolfsson 2012). Therefore, data collection has become a strategic priority, with activities aiming at acquiring new types of data, relating the data to business goals and deploying technologies that handle, analyze and communicate the data (Glaser 2006).

While data generating and storing capabilities are developing rapidly, the capabilities for understanding and communicating information and knowledge are not keeping up. As a result, businesses and organizations end up sitting on piles of data from which they cannot extract value. Information technology has to become the enabler of extracting value from data by providing sufficient analytics and communication services (Maes et al. 2000). A recent survey (LaValle 2011) states that business executives desire tools that help them absorb large amounts of information quickly. These tools, among other properties, should be able to display historical, time oriented data, making insights more comprehensible and highlighting changes in critical values. To achieve this, information systems need to expand their analytical and reporting competencies.

One very important type of data that is challenging to analyze is time oriented multivariate data. A dataset's relation to time, and the way its values change over time, is one of its most important elements. Modern society perceives time as a commodity that is utilized to reach a goal state in the future, with objectives travelling linearly from the past, to the present, to the future.

Thus, financial and economic concepts that determine performance, such as the value of money or economic growth, are almost always studied over time. The dominant perception of time is a linear one, with a given past, an ephemeral present and a multi-aspect future (Hassard 2001). This is reflected in the fact that around 70% of all the graphs stored by businesses contain time series information (Few 2007). Time series provide information about the historical context of a value, enabling comprehension of the past, which in turn provides insights for the present and an estimation of what to expect from the future. According to Colin Ware, "It is not enough to focus on what's happening today. You must see what's happening in the context of history to understand it fully" (Ware 2004). What is more, the data in economic applications are not only time oriented but also multivariate (Keim et al. 2008). Datasets related to economic and business activities are inherently multivariate, since many factors are relevant for decision making (costs, market shares, interest rates, delivery times etc.). Managers and analysts need to comprehend various types of information, as well as the relationships between them, before they make decisions. Thus, there is a need for information technology systems that are able to communicate large volumes of multivariate temporal information to their users in order to lead them to insights.

Visual Analytics, "the science of analytical reasoning facilitated by visual interactive interfaces" (Thomas and Cook 2005), promises to bridge the gap between data and insights. It does so by combining the superior memory and unbiased computational capabilities of computers with the superior and highly adaptable reasoning of humans. These complementary strengths are bound together by interactive visual interfaces (Green et al. 2009). Therefore, visual analytics systems utilize automated data analysis techniques to abstract from the data and communicate them to humans through visualizations. Interaction with the visualizations, which adjusts the analysis and creates new visual outputs, enables a dialogue between the human user and the data (Keim et al. 2008). Combining automatic data analysis with informative visualizations can significantly reduce the gap between data and people's understanding of them.

Many visualizations are able to expose multivariate relationships or temporal information; however, few techniques can do both. Visual analytics systems should be able to present relationships between different variables across time to human users. Therefore, it is important for visualization systems to plot more than a single data dimension over time so they can reveal relationships between variables across different time periods (Lee and Shen 2009). The trailplot is a technique that can effectively visualize relationships of two variables over time (Robertson et al. 2008, Few 2007). Trailplots are essentially scatterplots that visualize the values of an item from different time periods simultaneously in a single plot. The points from the different time periods are connected with a line that represents the temporal order. A distinctive characteristic of this visualization is that time is not represented on either of the two axes, something that is uncommon for temporal visualizations. Time is deemphasized in order to put more weight on the relationship between the two variables, as is the case with a scatterplot. Although it is not the most common temporal visualization, it is receiving increasingly more attention from researchers (Rind et al. 2011, Aigner et al. 2011) and practitioners (Gapminder Foundation 2011, Camtasia Studio 5, Microstrategy software, Tableau Software, Giratikanon and Parlapiano 2013). Along with their effectiveness, though, trailplots have limitations related to screen clutter, as they are a space-hungry technique that scales poorly for large datasets. However, analytic methods can become the remedy for the clutter problems faced by trailplots and enrich them so they can visualize large datasets gracefully.

There are a number of interaction and distortion techniques that can be utilized by visual analytics systems in order to reduce clutter, some of which also preserve the context of the data (Keim 2002). These techniques, such as filtering, require the users or the designer of the visualization to explicitly state which data items should be distorted and how. However, users often do not know in advance what information should be visualized, and they are unlikely to learn that from a cluttered overview. Furthermore, not all information is interesting even if it is visualized, and not everything that is visualized can be perceived by the viewers. The famous TED presentation by Hans Rosling1 would not be so engaging if he were not guiding the audience's attention to the most interesting parts of the plot. Thus, an analytic method that channels the attention of the users to the most interesting data items while at the same time relieving the screen from clutter could significantly enhance the capabilities of trailplots.

The identification of the most interesting items in a dataset is achieved by utilizing techniques from the broader area of anomaly detection. Interestingness in anomaly detection is estimated using both subjective and objective measures. Subjective measures assign a bias to data items, similar to the way a human expert would filter a dataset based on their personal knowledge. This approach incorporates domain knowledge, but it may lead to overlooking important data items due to subjective biases. Objective measures, on the other hand, take a data driven approach that carries no bias and thus no domain knowledge. Instead, they treat all data items equally and highlight the most interesting of them according to their statistical properties. However, due to the lack of domain knowledge, objective measures can sometimes label an item as interesting although it may not be perceived as interesting by the users. Nonetheless, data driven interestingness detection is preferable for a general purpose system, and thus it is the anomaly detection approach we employ.

In this paper we develop Summarizing Trailplots, a system that identifies and visualizes the most interesting items while providing a summary view of the less interesting ones. This is achieved by an interestingness detection algorithm that we develop and test in this paper. The algorithm is part of a visual analytics system, along with the trailplot visualization, a clustering algorithm and user interaction. The system visualizes the relationship of a large number of two dimensional data items over several time periods with reduced clutter, facilitating analytic tasks. Finally, we test the effectiveness of the system and the algorithm through user tests and discuss their outcome.

The rest of the paper is organized as follows. We first study related work on the trailplot visualization. In the next two chapters we discuss the benefits and weak points of trailplots and formulate our solution. In chapter five we design the interestingness detection and clustering algorithms. In chapter six we design the Summarizing Trailplots system. In chapter seven we explain the results of the user test, and the final chapter closes with a discussion.

1 Hans Rosling presentation:
http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen

Figure 1. Thesis Overview

2. Related Work

The idea of plotting data from different time periods on a single scatterplot and connecting them with a line to represent time is a fairly old one. It was first used to describe the relationship between unemployment and inflation in Japan between 1960 and 1991 (Aigner et al. 2011). This method places a stronger emphasis on the correlation between the two variables and makes the visualization of time less prominent. H. Rosling used a similar technique along with animation during his famous TED talk, where he presented the evolution of economic and societal factors of the world's countries over time. The software that was utilized has been made available by the Gapminder Foundation (2011) along with various datasets, so users can apply the same visualization to different types of information. Users are able to pause the animation and add trails, lines connecting two different time points, to the visualization in order to track a data item's route over time. This enables the users to observe the path followed by the data on the plot, making comparison between the paths possible. Similarly, trailplots have been used in medicine to explore multivariate trends for patients (Rind et al. 2011). Visualization platforms (Camtasia Studio 5, Microstrategy software, Tableau Software, Spotfire software) also integrate trailplots as a functionality, while visualization professionals use them to facilitate raising awareness (Giratikanon and Parlapiano 2013) and performing analysis (Paul Mathewson 2014) on two dimensional datasets over time.

The usefulness of the technique was supported by Robertson et al. (2008), who compared trailplots, small multiples and animation in their ability to help users perceive and present trends. They concluded that trailplots perform similarly to small multiples for perceiving trends. Trailplots shine, though, in helping users perform more complex tasks, such as perceiving correlations. For complex tasks to be conducted, the users need to understand a large number of individual data points, and trailplots accommodate that gracefully. One of the users in their experiment made the following comment about the trailplot visualization: "That's confusing and blurry." However, after looking at it for some time, they concluded: "I can now see some trends that I couldn't see before." This shows the potential of trailplots to assist complex analytic tasks, but at the same time it highlights their outstanding problems.

Figure 2. The trailplots visualization depicting the relationship between inflation and unemployment over time for the United States.

Despite their usefulness, trailplots suffer from screen clutter when they visualize a large number of data items or time points, and they are not that intuitive when seen by users for the first time. Trailplots start to face problems with clutter when visualizing 200 items or more (Robertson et al. 2008). Therefore, trailplots usually depict a single data item or a small number of items which are pre-selected by the user or the designer of the visualization. In this manner only the selected items have a trail, while the rest of the items are visualized as single points. However, this implies that users already know what they want to examine, which is not always the case. Additionally, it is challenging to perform exploratory analysis in this manner. Visualization software packages have introduced different methods for dealing with clutter. Data points of previous time periods can become more transparent (Tableau software) or get aggregated (Microstrategy software) to offload space on the screen and attract less attention. These methods, although helpful, still have limited scalability or conceal important contextual information. This makes even moderately sized datasets hard to plot using this technique, especially if the users are not sure which items they wish to see. Furthermore, it is not easy to filter the data through a cluttered display. Therefore, in this thesis we employ analytic methods to automate the clutter reduction process and help users explore large temporal datasets through trailplots.

3. Trailplots: Benefits, Clutter and Reduction Techniques

In this chapter we elaborate on the strengths of trailplots and their ability to enable users to understand relationships in the data over time by helping them perform comparison tasks. Additionally, we explain the clutter problems that occur with trailplots and how clutter reduction methods can solve them.

3.1 Benefits for Understanding Temporal Data

In order to understand the effects of time in any situation, humans perform mental tasks related to how the values of different variables evolve through time and interact with each other. This involves perceiving and digesting large amounts of information, frequently too much, especially for someone unfamiliar with the respective domain (Ware 2004). Andrienko & Andrienko (2006) and Aigner et al. (2011) identify several simple to more complex perceptual tasks that are performed by users when analyzing temporal data. To do so they use two concepts: references and characteristics. References describe the time period of a data item and characteristics represent its value. Tasks can be elementary, concerning individual data points, or synoptic, concerning multiple data points. Synoptic tasks connect sets of values based on their overall behavior and distinguish whether they behave similarly or differently. This comparison is of primary importance for temporal data, since completing synoptic comparison tasks successfully enables users to reach higher level understandings of the data (Aigner et al. 2011). Trailplots excel in providing information about both values and time periods. Since they are essentially scatterplots, they are ideal for providing information about the values of an item over two dimensions (Few 2009). When it comes to reference information, the line connecting the data points provides sufficient information about the time periods. Combined, these two aspects greatly facilitate the viewers' understanding of two dimensional time oriented data.

Comparison is the most important task for temporal data, since comparing different time periods is the cognitive process that enables humans to understand the passing of time. Gleicher et al. (2011) constructed a taxonomy of techniques that assist viewers in making comparisons. Their taxonomy contains three main approaches for assisting comparisons through visualization:

• Juxtaposition places the compared items next to each other in separate screens and utilizes the viewers' memory to make the comparisons between them.

• Superposition places all items on the same screen. It utilizes the perceptual system to make comparisons between the objects through proximity.

• Explicit encoding utilizes computation to compare different items and then visualizes only their differing components.

Experiments show that superimposing data facilitates comparison tasks the most, compared to explicit encoding, with juxtaposition not falling far behind. According to Gleicher et al. (2011), superposition is the technique best suited to reducing the cognitive load of the viewer by passing it to the perceptual system. It also employs the pattern finding capabilities of perception, in contrast to the computational approach that is used when changes are explicitly highlighted. Similarly, Tominski et al. (2012) found that superposition is more effective for abstract comparison tasks but less useful for dense displays. Trailplots are a typical example of superposition, since data points from different time periods are visualized in the same plot. This facilitates comparison tasks, but at the same time overloads the screen with too much information.


3.2 Problems with Screen Clutter

Trailplots are great at facilitating comparisons; however, visual displays are frequently small and thus face problems with visual clutter when a large number of items is plotted. Clutter prevents the users from performing the visual cognitive tasks that enable them to understand the data, undermining the usefulness of the visualization. Clutter creates more problems for visualizations that use data points and lines to encode the data, by making it hard to understand individual items, which hinders the analysis procedure (Few 2008). Trailplots are a characteristic case of this problem, since they include several data points connected with several lines for every data item. Furthermore, the overlap of data points is very common: in addition to overlap between different data items plotted in the same area, a single item may retrace its own steps during different periods in time. This creates a view that can be complex even for a single data item, which is why the use cases for such visualizations usually include only a few data items. Therefore, techniques that reduce clutter are necessary if trailplots are to scale to large datasets.

Figure 3. Example of screen clutter in a trailplot, for a dataset of modest size.

3.3 Clutter Reduction Techniques

There is a variety of techniques that reduce screen clutter, and they are separated into three categories: appearance techniques, spatial distortion techniques and temporal techniques. Appearance techniques affect the way data items are depicted by changing their size or opacity, or by reducing their number through sampling, filtering or clustering. Spatial distortion techniques affect the space around the items or the way they fill this space. This is achieved by plotting points on pixels, adjusting the position of lines/points so they fill the space uniformly, or stretching the background behind the data items. These techniques, despite being useful, cannot scale to large datasets. On the contrary, appearance techniques such as sampling, filtering and clustering are scalable and enable clear distinction between individual points and lines (Ellis and Dix 2007). This is achieved by showing portions of the data or aggregates. However, it requires the users to provide some type of input in order to activate these techniques; for instance, the users have to choose which items they wish to filter by selecting what is interesting to them. Spatial distortion techniques, on the other hand, do not fare well with large datasets, and they are not very effective in helping identify individual points or lines in a busy display (Ellis and Dix 2007). Finally, animation is the temporal technique that reduces clutter by separating items from different time periods into separate snapshots (Ellis and Dix 2007). Animation may sound like the intuitive choice for reducing clutter in trailplots, since it uses time to separate different time periods into "time slices". However, animations are fleeting and are observed in motion, making the perception of changes challenging (Tversky 2002). The viewer sees only one "time slice" and therefore has to remember and recall from memory in order to perform the comparisons that are necessary to understand temporal changes (Gatalsky et al. 2004). Animations end up providing fairly poor support for analytic tasks and at times are not accurate enough (Gleicher et al. 2011), while they fail to provide detailed information when the number of time periods or data items increases even slightly. On the contrary, they are great for presentation, as they are aesthetically appealing and able to attract the attention of the users through motion. Aspects of all three clutter reduction techniques can be utilized in order to improve the visual output of trailplots.

Trailplots are great at helping users perform mental comparison tasks by passing the cognitive load to the perceptual system. This comes at the cost of large levels of screen clutter, which clutter reduction techniques do not handle effectively due to scalability problems. Furthermore, trailplots require clearer screens than most visualizations due to their complexity. Clutter reduction techniques can remedy this problem. Spatial distortion can help reduce the number of items presented on the screen, focusing on the most interesting ones. Appearance techniques, primarily opacity, are able to distort the way different time periods are plotted and reveal temporal overlap. Animation is able to channel the attention of the viewers through motion and help the users understand the flow of time. By reducing the effects of the clutter problem, trailplots can be used more effectively for the analysis of large datasets and can facilitate the users' analytic tasks with their superior comparison-supporting capabilities.

4. Solution Approach

Although clutter reduction techniques can improve the effectiveness of the trailplot visualization, they frequently require an action from the users in order to be initiated. For instance, users may have to specify which items they wish to filter out. However, the users are not always aware of which items they want to affect with the distortion technique, especially when they are exploring a dataset they are not familiar with. Further, a cluttered display can make it hard for users to select individual items even if they know which ones they wish to select. Therefore, the users should be supported by analytic methods that bring an aspect of intelligence into trailplots in order to reduce clutter and help the users apply them to large datasets.

Users who perform analysis with visualizations conduct cognitive analytic tasks that help them understand data. Performing several low level tasks leads to a higher level understanding of a domain (Amar et al. 2005). High level analytic understandings are more generic in nature, such as understanding how inflation affects unemployment in the United States. These understandings require someone to perform several low level tasks that constitute pieces of the broader, high level understanding, for instance retrieving the inflation and unemployment values during different time periods and determining how these values correlate over time. Additionally, reaching low and high level understandings is related to the discovery of spontaneous insight by the users. Spontaneous insights are out-of-the-box understandings that occur when humans discover novel knowledge by establishing new connections between a data source and their current knowledge (Chang et al. 2009). Visualizations that are mapped closely to analytic goals are more effective in helping users reach insights (Ellis and Dix 2007). Thus, our system utilizes visualizations and analytic methods in coordination in order to support the cognitive tasks of the users.

Amar et al. (2005) identify ten cognitive tasks that enable users to achieve analytic goals:

• Retrieve Value – identify the value of a data point
• Filter – view or remove selected data items
• Compute Derived Value – derive a value of a data item as a function of another value
• Find Extremum – identify the highest and lowest values of a variable
• Sort – create an ascending or descending ordered list of data items
• Determine Range – identify the variation between the highest and lowest values in a dataset
• Characterize Distribution – understand the probability distribution of the value items in a dataset
• Find Anomalies – identify outlier data items in a dataset
• Cluster – create groups of items that are similar and at the same time different from other groups of items
• Correlate – understand the relationship between the values of two variables

Figure 4. Low level analytic tasks. From left to right: Correlate, Characterize Distribution, Find Extremum

Figure 5. Low level analytic tasks. From left to right: Filter, Cluster, Find Anomalies

Compute Derived Value and Sort are not relevant for our system, since derived values change the data, while sorting would violate the order of time. The rest of the tasks are either passed to the perceptual system through the trailplots or performed automatically through computation. This approach performs selected cognitive tasks automatically for the users in order to reduce the clutter, and then uses the perceptual ease of trailplots to help users achieve high level analytic goals. More precisely, we use computation to automate the Find Anomalies and Cluster tasks, while Retrieve Value, Find Extremum, Correlate, Filter and Characterize Distribution are facilitated by the visualization. The next two parts of the paper describe how the visualizations and the automatic analysis are designed.

5. Analytic Methods

There are several analytic methods that could potentially be utilized in support of the visualizations, and they can be grouped into two main categories: supervised and unsupervised. Supervised learning techniques are prediction oriented, aiming to predict the value or the class of an item based on data from previously "learned" observations (Witten et al. 2011). Unsupervised learning techniques are exploratory, having no prior knowledge of the data. They attempt to understand the data's implicit structure based on its statistical properties. Since our aim is to provide data driven abstraction that requires no prior input, we focus on unsupervised techniques.

There are several unsupervised learning techniques, such as clustering, dimensionality reduction (Aigner et al. 2011), segmentation, anomaly/interestingness/novelty/outlier detection, motif discovery and summarization (Esling and Agon 2012). All of them have been utilized in the context of time oriented data, producing a variety of analytic outputs. Clustering and interestingness detection are the two techniques that are most relevant to building our system. Clustering is a representation of the cognitive task that is performed by humans, and it is also used for clutter reduction in information visualization. Interestingness detection assists in identifying interesting patterns in the data, as it is part of the cognitive task of identifying anomalies. Therefore, a combination of clustering and interestingness detection provides the unsupervised learning support we need to help the users understand the temporal context of a dataset.

5.1 Interestingness Detection and Clustering Algorithms

Anomaly detection is a group of analytic techniques that identify anomalous/novel/surprising/interesting patterns in the data (Lin et al. 2005). Interestingness is the part of the outlier detection hierarchy that identifies items falling in the space between clusters. Anomaly detection identifies outliers placed within clusters, whereas novelty detection concerns items that were not part of the information space before. Very extreme outliers are perceived as noise (Masood and Soong 2013). A piece of information is interesting when it challenges the existing model while still being part of it. "Interestingness depends on the observer's current knowledge and computational abilities. Things are boring if either too much or too little is known about them, if they appear trivial or random." (Schmidhuber 1997). Interestingness can be estimated based on both objective and subjective measures. Subjective measures of interestingness incorporate prior domain knowledge and use primarily supervised learning approaches. Objective measures are data driven and unsupervised in nature, and they identify the most interesting patterns in a dataset based on their statistical properties. They produce a ranking of the data items based on their anomaly scores (Chandola et al. 2009). The items that are less frequent are perceived to be the most interesting ones (Masood and Soong 2013). Measuring the right properties of the data is essential for an anomaly detection technique to identify interesting patterns.

Clustering methods group the data items into natural clusters using a similarity/dissimilarity measure. The goal is to minimize the variation within the clusters and maximize the variation between the clusters (Esling and Agon 2012). In other words, the clusters should contain similar data items, while data items in different clusters should be significantly different. There are several similarity/dissimilarity measures, each suitable for different classification needs. Furthermore, there are several clustering algorithmic approaches, such as k-means, fuzzy k-means, hierarchical clustering and Self-Organizing Maps, each with its own characteristics (Witten et al. 2011). These approaches produce different clustering outcomes, and in order to select the most appropriate one has to consider the problem at hand.

As stated before, the interestingness detection and clustering algorithms are used to reduce screen clutter and facilitate the respective cognitive tasks of the users. The items that are labeled as interesting are visualized independently, while the less interesting ones are visualized as clusters of similar items. These two types of visual objects help reduce the screen clutter while maintaining contextual information about the clustered items. As the first step towards achieving this, we need to identify the features based on which an item will be characterized as interesting or not. In the next chapters we further elaborate on the features and the design of the algorithms.

5.2 Dimensions of Temporal Change

In the context of time series, anomaly detection refers to finding interesting time points, subsequences, full time series or clusters of time series in the data based on their behavior over time. The passing of time is instinctively understood by observing changes on objects or changes in their relative positions in the spatial dimensions (Peuquet 1994). Various types of temporal change exist; however, researchers use the following four fundamental types:

• Magnitude of change is the actual amount of change, or the difference between the values of one data item across two different points in time (Few 2007, Few 2009, Gomez et al. 2013).

• Rate of change is the relative change, expressed as the percentage of change compared to the preceding time period (Few 2007, Few 2009, Gomez et al. 2013).

• Direction of change refers to the trend, or the overall tendency of a value to increase, decrease or remain stable across a specified time period (Few 2007, Few 2009, Gomez et al. 2013).

• Shape of change is the sum of the change in magnitude from consecutive time points across a specified timeline (Few 2007, Few 2009, Lin et al. 2005).

These dimensions form the basis upon which more complex temporal patterns emerge [7]. Magnitude of change refers to absolute changes: the GDP of the United States increased by 140 billion dollars during 2010. Rate of change, on the other hand, refers to percentage changes in terms of the prior value: the GDP of the United States increased by 1% during 2010. Both magnitude and rate of change concern the difference between two points in time. Therefore, they can refer to differences between consecutive time points, differences between the beginning and the end of a subsequence, or differences over the whole time series. Direction of change describes whether the GDP increases, decreases or remains stable over a period of time that contains several time points. Similarly, the shape of change represents cyclical movement and whether a variable fluctuates over a period of several time points. The descriptions above describe the dimensions of change of a single variable, for instance GDP.

The dimensions of change are interpreted differently when they are examined on the two dimensional plane, and translating them is necessary in order to quantify change in scatterplots. The magnitude change of a single data item is subject to changes on both the X and Y axes. Changes on the X axis represent left and right movement, while changes on the Y axis represent up and down movement in the trailplot. The same applies to rate changes, as changes in the X variable are relative to the initial X value and changes in the Y variable are relative to the initial Y value. The shape of change refers to how the values of a single item fluctuate over time. This dimension concerns the whole time series and not individual time points. Furthermore, it distinguishes between items whose values evolve linearly through time and those that create a curved path on the plot. Finally, the direction of change is related to the direction that individual items are taking over time, describing the two dimensional trend. The two variables might be trending together or have opposite trends, while also moving towards the sides or corners of the plot. Items that move towards a corner show an increase in positive or negative correlation, whereas those that move closer to a side move towards areas of lower correlation. Therefore, the distance between two points on the trailplot represents the magnitude change between two time periods. The rate of change is represented by the same distance divided by the distance of the initial time period from the origin of both axes. The shape of change is the total curvature of a data item over all time periods. Finally, the direction of change is interpreted as the largest reduction of the distance to a corner of the plot from one period to the next. This interpretation of change permits the measurement of each item's behavior over time and the identification of the most interesting items based on these properties.

Figure 6. High level representation of the interestingness detection algorithm: a) the data are normalized to a scale of 0 to 1; b) the dimensions of change for each data item are computed; c) the computed measures are turned into interestingness scores; d) the interestingness scores are turned into rankings; e) the individual rankings are merged into a single interestingness ranking.

Each data item is characterized by the four dimensions, which represent different types of interest for a user. While magnitude changes are straightforward and reflect how values change between two time periods, they are not the only change of interest. Frequently, changes are measured and interpreted in relation to their initial value: magnitude changes of similar size may have very different meanings if they have different starting points. Similarly, individual time point changes can form very different patterns when put together. Change can move in one direction with varying velocity, expressing a linear trend, or it can make cycles, revisiting values that were encountered in previous time periods. Therefore, a data item can be perceived as interesting or less interesting after it is examined under different dimensions of change, and the interestingness detection algorithm has to take all of them into account.

5.3 Interestingness Detection

Interestingness detection is performed by a five step algorithm that takes the two dimensional time series as input and provides interestingness rankings as output. The algorithm first normalizes the data to a scale from 0 to 1. Then the measures for each dimension of change are computed. The measures are translated into interestingness scores in order to identify which values are less frequent in the dataset and thus less likely to occur. The interestingness scores are used to create individual rankings of the data items. Finally, these individual rankings are combined into a global interestingness ranking.

Input: n data items with two variables X and Y over m time periods.

Algorithm:
1. Normalize the data to bring them to the same scale
2. Compute the measures of each dimension of change for each data item
3. Transform the measures into interestingness scores
4. Create item rankings based on the interestingness scores
5. Combine the rankings

Output: A ranking of the data items based on their interestingness scores.

5.3.1 Data normalization

The first step of the algorithm is to preprocess the data for both variables so that measurements are meaningful and comparable. The algorithm uses a Gaussian filter to smooth each time series, removing noise and making the data more Gaussian. The smoothed data are then normalized by transforming them to a range between 0 and 1. The normalization of the data is necessary for computing the shape and direction measures and prevents the algorithm from becoming biased by large scale variables. Stable scales also maintain consistency in the shapes of the data items. For instance, an item with a circular shape on the two dimensional plane would turn into an ellipse if one of the two dimensions were rescaled to a different range.

The normalized value for a figure d in a dataset D is computed as follows:

$$d_{normalized} = \frac{d - \min(D)}{\max(D) - \min(D)}$$
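As a concrete illustration, here is a minimal Python sketch of this preprocessing step. The function name, the use of SciPy's gaussian_filter1d and the default sigma are our own assumptions for illustration, not the thesis' actual implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def preprocess(variable, sigma=1.0):
    """variable: (n_items, m_periods) array holding one variable (X or Y).
    Smooth each item's series with a Gaussian filter, then min-max
    normalize over the whole dataset D so all items share one scale."""
    smoothed = gaussian_filter1d(variable.astype(float), sigma=sigma, axis=1)
    lo, hi = smoothed.min(), smoothed.max()
    return (smoothed - lo) / (hi - lo + 1e-12)  # guard against a constant dataset
```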

5.3.2 Computing measures of change

The normalized data are utilized in order to create measures of change for each data item according to the dimensions of change, introduced before. The measures for each dimension are computed in the two dimensional plane as it was explained above.

5.3.2.1 Magnitude of change

The magnitude of change is estimated as the first order difference from an initial time period to the subsequent one. On the two dimensional plane, the first order differences for every time point interval are estimated using the Euclidean distance between two consecutive points on the plot. For variables x and y in a dataset D, the magnitude of change between periods t and t−1 for data item i is computed as:

$$magnitude_i^{t,t-1} = \sqrt{(x_i^t - x_i^{t-1})^2 + (y_i^t - y_i^{t-1})^2}$$
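For illustration, the magnitude measure can be computed for all items and intervals at once; the NumPy formulation below is a sketch under our own naming assumptions.

```python
import numpy as np

def magnitude_of_change(x, y):
    """x, y: (n_items, m_periods) normalized arrays. Returns an
    (n_items, m_periods - 1) array of Euclidean distances between
    consecutive time points on the two dimensional plane."""
    return np.sqrt(np.diff(x, axis=1) ** 2 + np.diff(y, axis=1) ** 2)
```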

5.3.2.2 Rate of change

The rate of change is computed similarly to the magnitude dimension, with the addition that the magnitude change is made relative to the position of the initial time period. To do this we divide the magnitude change between two periods by the distance of the initial period from the origin of the two axes, point (0, 0). Again the Euclidean distance is used as the distance measure:

$$rate_i^{t,t-1} = \frac{\sqrt{(x_i^t - x_i^{t-1})^2 + (y_i^t - y_i^{t-1})^2}}{\sqrt{(x_i^{t-1} - 0)^2 + (y_i^{t-1} - 0)^2}}$$
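A corresponding sketch for the rate measure; the small epsilon guarding against items lying exactly at the origin is our addition, not part of the thesis' formula.

```python
import numpy as np

def rate_of_change(x, y):
    """Magnitude of change divided by the distance of the preceding
    time point from the origin (0, 0)."""
    magnitude = np.sqrt(np.diff(x, axis=1) ** 2 + np.diff(y, axis=1) ** 2)
    origin_dist = np.sqrt(x[:, :-1] ** 2 + y[:, :-1] ** 2)
    return magnitude / (origin_dist + 1e-12)  # epsilon avoids division by zero
```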

5.3.2.3 Direction of change

After computing the two dimensions related to the first order differences between time periods, we also measure the direction that the trails are taking in the plot. In other words, we identify which corner of the trailplot each item is moving closer to. To achieve this, we compute the distance of an item to each corner of the plot at a time period and subtract from it the corresponding distance of the preceding period. The corner with the minimum value is the one that the data item is travelling towards during this time period, since the distance to it has been reduced the most. The same measure is computed for all time periods:

$$direction_i^t = \min_{(c_x, c_y) \in \{(0,0),(0,1),(1,0),(1,1)\}} \left( \sqrt{(x_i^t - c_x)^2 + (y_i^t - c_y)^2} - \sqrt{(x_i^{t-1} - c_x)^2 + (y_i^{t-1} - c_y)^2} \right)$$
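The following sketch (our own vectorized formulation) returns, for every item and interval, the index of the corner whose distance shrank the most:

```python
import numpy as np

CORNERS = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

def direction_of_change(x, y):
    """x, y: (n_items, m_periods) normalized arrays. Returns an
    (n_items, m_periods - 1) array of corner indices (0..3), one per
    interval, identifying the corner the trail is moving towards."""
    points = np.stack([x, y], axis=-1)                     # (n, m, 2)
    dists = np.linalg.norm(points[:, :, None, :] - CORNERS[None, None], axis=-1)
    delta = dists[:, 1:, :] - dists[:, :-1, :]             # change per interval
    return delta.argmin(axis=-1)                           # most reduced distance wins
```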

5.3.2.4 Shape of change

The measure for the shape of change of a data item is computed as the aggregate curvature over all time periods. The curvature at time period t for item i is computed with the following formula (Worring and Smeulders 1995):


$$curvature_i^t = \frac{x_i'^t \, y_i''^t - x_i''^t \, y_i'^t}{\left((x_i'^t)^2 + (y_i'^t)^2\right)^{3/2}}$$

The derivative values are obtained by using a Gaussian derivative filter with the same parameters as the filter that was used for smoothing the original data. The shape of change of an item is the sum of the absolute curvature values over all time periods:

$$shape_i = \sum_{t=1}^{m} \left| curvature_i^t \right|$$

Once these computations are performed for all the data items in the dataset, we have measures per time period interval for the magnitude and rate dimensions, categorical information per interval for the direction dimension, and a single measure per item covering all time periods for the shape dimension. These figures are then transformed into interestingness scores by treating each measure as an instance of the distribution it derives from.
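A sketch of the shape measure, using SciPy's Gaussian derivative filters (order 1 and 2) to obtain the derivatives, as the text describes; the names and the denominator guard are our assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def shape_of_change(x, y, sigma=1.0):
    """x, y: (n_items, m_periods) normalized arrays.
    Returns one shape score per item: the sum of absolute curvatures."""
    dx = gaussian_filter1d(x, sigma, axis=1, order=1)   # first derivatives
    dy = gaussian_filter1d(y, sigma, axis=1, order=1)
    ddx = gaussian_filter1d(x, sigma, axis=1, order=2)  # second derivatives
    ddy = gaussian_filter1d(y, sigma, axis=1, order=2)
    denom = (dx ** 2 + dy ** 2) ** 1.5 + 1e-12          # guard against zero velocity
    curvature = (dx * ddy - ddx * dy) / denom
    return np.abs(curvature).sum(axis=1)
```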

Figure 7. Magnitude of change (left) and Rate of change (right). Magnitude of change is represented as the Euclidean distance between two time periods, while Rate of change is represented as the Magnitude change normalized by the distance of the initial time period from the point (0,0)

Figure 8. Shape of change (left) and Direction of change (right). Shape of change is represented as the aggregate curvature for each time period. Direction of change is represented as the corner of the plot that an item moves towards, from one time period to the next.

5.3.3 Transform the change measures into interestingness scores

We assume that the most interesting data items are the ones that are least likely to occur in a dataset. For instance, in a dataset with large values, small values occur less frequently and are therefore perceived as more interesting. Conversely, in a dataset with small values, the larger values are the most interesting. Hence, the items with the least frequent measures for each dimension of change are the most interesting ones.

The data are grouped per time period in order to identify the least probable values in each period. Thus, the same item may exhibit rare temporal behavior during some periods and common behavior during others. In this manner, an item's interestingness is evaluated taking into account the constraints of its temporal context, while values alien to a specific time period are excluded. The interestingness score for a single item $d_i$ in time period $t$ is estimated as:

$$density_i^t = \frac{1}{\sqrt{2\pi}\,\sigma_t} \, e^{-\frac{(d_i - \mu_t)^2}{2\sigma_t^2}}$$

where $\mu_t$ and $\sigma_t$ denote the mean and standard deviation of the corresponding measure across all items in time period t. For the shape dimension, for which individual time periods are aggregated, the density estimation is conducted over the whole time series. The interestingness scores are used to create interestingness rankings. Since the estimated scores are used only for the purpose of creating rankings, the precision of the estimations is of lesser importance; we utilize just the order of these values.
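A sketch of the scoring step, fitting a Gaussian per time period (column) and scoring every item under it; lower density means rarer, and thus more interesting, behavior. The per-column mean and standard deviation estimation is our reading of the grouping described above.

```python
import numpy as np

def interestingness_scores(measure):
    """measure: (n_items, n_periods) array of one change measure.
    Returns Gaussian density scores of the same shape."""
    mu = measure.mean(axis=0)                # per-period mean
    sigma = measure.std(axis=0) + 1e-12      # per-period std, guarded
    z = (measure - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (np.sqrt(2 * np.pi) * sigma)
```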

5.3.4 Create interestingness rankings

The data items are sorted in ascending order based on their interestingness score, so that the items with the lowest scores, the least probable ones, are at the top of the ranking. We thus create individual rankings of interestingness per time period for the magnitude and rate of change dimensions as follows:

$$ranking_{t=0}, \; ranking_{t=1}, \; \dots, \; ranking_{t=m}$$

In contrast, the shape dimension generates a single ranking for the whole series, since the curvature per time period has been aggregated.
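Turning scores into per-period rankings amounts to a double argsort; the sketch below is ours, not the thesis' code:

```python
import numpy as np

def rankings_per_period(scores):
    """scores: (n_items, n_periods) density scores. Returns, per period,
    each item's rank position: 0 = lowest score = most interesting."""
    return scores.argsort(axis=0).argsort(axis=0)
```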

The direction of change dimension is not used to create rankings on its own, since its four potential outcomes limit its capability to create a useful ranking, as many items would have similar values. Therefore, we use direction of change in combination with magnitude and rate of change to create derived rankings that take into account the direction the trails are taking over time. More precisely, each interestingness score for magnitude, $s(magnitude)_i^t$, and rate of change, $s(rate)_i^t$, is multiplied with the corresponding direction of change interestingness score, $s(direction)_i^t$, to derive the new scores. The new scores are then used to create rankings of interestingness per time period, similar to the original magnitude and rate rankings. The derived rankings aim to capture a dimension of interestingness that might be missed by the rankings that do not take direction of change into account. For instance, an item whose magnitude change is not so interesting might still be interesting if the direction of this change is uncommon for that specific time period.

Once all the rankings are created, we have a single ranking for shape interestingness and four groups of per-time-period rankings for magnitude, rate, magnitude*direction and rate*direction change. These rankings are merged into a single ranking in the last step of the algorithm.

5.3.5 Combining rankings

Once all the individual rankings are computed, they have to be merged into a single ranking that is representative of all four dimensions of change. This process is conducted in a hierarchy of steps: first, the individual rankings per time period are merged into a single whole-time-series ranking for each group; afterwards, the whole-time-series rankings are merged again into a single interestingness ranking.

In order to merge the rankings we use the Borda count algorithm. Borda count is a single-winner election method that combines individual rankings into a single ranking that reflects consensus (van Erp et al. 2002). Each individual ranking is weighted as follows: the highest ranked item is assigned a weight equal to the total number of items n, the second highest receives a weight equal to n−1, the third n−2, and so on. The items placed higher in the combined ranking are the most interesting ones across the whole time series. At the end of the merging process we have five rankings of the data items: magnitude, rate, shape, magnitude*direction and rate*direction. These five rankings are used to label data items as interesting or less interesting through a threshold: the items that are ranked higher than the threshold are labeled as interesting. The threshold setting procedure depends heavily on the size of the data, the number of time periods and the screen size, and it should therefore be adjusted to the specific case.
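A minimal Borda count sketch under the weighting just described; it assumes rank arrays in which position 0 marks the most interesting item.

```python
import numpy as np

def borda_merge(rankings, n_items):
    """rankings: list of 1-D arrays; rankings[r][i] is item i's rank
    position (0 = top) in ranking r. Returns the merged rank positions."""
    points = np.zeros(n_items)
    for rank in rankings:
        points += n_items - np.asarray(rank)  # top-ranked item earns n points
    order = np.argsort(-points)               # items sorted by total points
    merged = np.empty(n_items, dtype=int)
    merged[order] = np.arange(n_items)        # rank position per item
    return merged
```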

The Borda count method first combines all rankings per time period into a single ranking for the magnitude, rate, magnitude*direction and rate*direction measures. Thus, these three dimensions of change produce four different ranked features. The combined ranking weights all time periods equally, and the final ranking for the full period is a consensus of the interestingness during each time period. Once these four rankings are created, they are merged into a single ranking along with the shape of change ranking, which already reflects the full time period per item. Borda count merges the five rankings into a single interestingness ranking, creating consensus among the different dimensions of change. We identify three different ways of producing the final merged ranking that we wish to test with the users. The first approach combines all five ranked features in a single ranking: magnitude, rate, shape, magnitude*direction and rate*direction. The second approach includes only shape, magnitude*direction and rate*direction, placing more emphasis on the direction of change. Finally, the third approach excludes direction of change from the interestingness features by combining only the magnitude, rate and shape rankings. The motivation behind these three merging approaches is the fact that direction of change cannot produce a ranking on its own, but is used together with magnitude and rate. Our aim is to test the effect of this special case on the final ranking. Therefore, comparing these three merging approaches can help us discover insights into the effect of the direction of change and the most effective ranking merge.

Figure 9. Rankings of different time periods are merged into a whole time series ranking through Borda count.

5.4 Clustering

Once the data items have been labeled as interesting and less interesting, the clustering takes place. Since the purpose of clustering is to reduce the number of data items, only the less interesting items are part of the clustering procedure.

We use the Self-Organizing Map (SOM) algorithm in order to perform the clustering of the less interesting items. The SOM is an unsupervised neural network that is used for data compression, clustering and visualization (Esling and Agon 2012, Fu 2011). SOMs are great at preserving the topological order of the data while clustering (Liao 2005), which is very relevant for trailplots since the quantitative information is encoded in the topology of the data. SOMs consist of two layers: an input layer of any dimensionality and a Kohonen layer consisting of a number of artificial neurons, usually structured in a two dimensional grid. Data items are mapped from the input layer to the Kohonen layer using a similarity measure. Each neuron has a vector of weights that is the same length as the input layer and is initialized randomly (Mangiameli et al. 1997). SOMs utilize competitive learning in order to update the weights of the neurons based on their similarity to the input data: the neurons that are more similar to a data item have their weights adapted in such a manner that they move closer to it. For the clustering process, each neuron represents a distinct cluster, and a data item belongs to the cluster of the neuron whose weight vector is most similar to its input vector.

For our system we make a few design choices regarding the SOM, while some choices are left to be determined per application case. As the input vector for each data item we use the raw time series of both variables x and y, normalized to a range from 0 to 1. The reason for this is that the purpose of the clustering is to maintain the topology of the data as much as possible in order to generate clusters that summarize the data items on the trailplot. As a consequence, the input vector of the SOM, as well as the weight vector of each neuron, has a length equal to the number of time periods multiplied by two. In order to estimate the similarity between the different data items, the Euclidean Distance (ED) is employed as the similarity measure. ED is a shape based measure that uses the items' spatial locations to determine their similarity, and it is therefore more appropriate when the purpose of the clustering is visualization [51]. It is the geometric distance between two points and is intuitive, simple to use and computationally cheap (Lin et al. 2012). In terms of performance it can compete well enough with, or even outperform, more complex techniques, especially for larger datasets (Trajcevski et al. 2008). The rest of the parameters of the SOM (learning rate, number of neurons and neighborhood function) are not part of this design and are determined based on the specific application.
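To make the mechanics concrete, here is a from-scratch SOM sketch with an illustrative grid size, learning rate and neighborhood schedule (all assumptions; the thesis deliberately leaves these parameters to the application):

```python
import numpy as np

def train_som(data, grid=(3, 3), epochs=200, lr0=0.5, radius0=1.5, seed=0):
    """data: (n_items, 2 * m_periods) concatenated normalized x and y series
    of the less interesting items. Returns neuron weights and, per item,
    the index of its best matching unit, i.e. its cluster."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows * cols, data.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                # decaying learning rate
        radius = radius0 * (1.0 - frac) + 0.5  # shrinking neighborhood
        for i in rng.permutation(len(data)):
            bmu = np.linalg.norm(weights - data[i], axis=1).argmin()
            grid_dist = np.linalg.norm(coords - coords[bmu], axis=1)
            h = np.exp(-grid_dist ** 2 / (2.0 * radius ** 2))
            # competitive learning: the winner and its neighbors move closer
            weights += lr * h[:, None] * (data[i] - weights)
    clusters = np.array([np.linalg.norm(weights - d, axis=1).argmin() for d in data])
    return weights, clusters
```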

Once the data items are assigned to clusters, we use their cluster memberships in order to plot the data items on the screen. The items that belong to a cluster are aggregated in order to be visualized as a single item. The aggregation is performed by averaging the X values of all the items, as well as their Y values, deriving a single two dimensional data point with coordinates for X and Y. The same operation is performed for every time period, yielding the average cluster positions in the trailplot. All clusters are aggregated following the same approach.
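The averaging step itself is a per-cluster, per-period mean; the sketch below (our own helper) also records cluster sizes, which section 6.1 later maps to point size.

```python
import numpy as np

def aggregate_clusters(x, y, clusters):
    """x, y: (n_items, m_periods) arrays; clusters: (n_items,) labels.
    Returns one averaged x and y series per cluster, plus cluster sizes."""
    ids = np.unique(clusters)
    cx = np.stack([x[clusters == c].mean(axis=0) for c in ids])
    cy = np.stack([y[clusters == c].mean(axis=0) for c in ids])
    sizes = np.array([(clusters == c).sum() for c in ids])
    return ids, cx, cy, sizes
```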

5.5 Analytic Output

The final output of the interestingness detection and clustering is to provide the classes of the items that are visualized in the trailplots. The items that are labeled as interesting stay intact. The less interesting items are aggregated into an average item with respect to their cluster. The output of the analytic methods becomes the input for generating the summarizing trailplots visualization.

Figure 10. The less interesting items that belong to the same cluster are aggregated into a single data item, highlighted in a different color. This data item is generated as the average of the data items that belong to the cluster.

6. System Design

Summarizing Trailplots is a visual analytics system that follows the mantra introduced by Keim et al. (2008):

"Analyse First - Show the Important - Zoom, Filter and Analyse Further - Details on Demand"

It comprises two trailplot visualizations supported by the interestingness detection and clustering analytic methods. The main visualization plays the role of the overview, "show the important", while the second trailplot visualization shows more localized details on demand and is initiated by user interaction. The analytic methods are used to generate the overview, and they are then employed again to create new visualizations based on the users' informational needs.

6.1 Visualization Design

The summarizing trailplots visualization is generated after the interestingness detection algorithm identifies the interesting items and the clustering algorithm provides the generated cluster items. With this input, the interesting items and the cluster items are visualized as trailplots.

The summarizing trailplots share most of the characteristics of normal trailplots by superimposing all time periods in a single plot. To show the direction of time, and thus the temporal order of the data points, we use transparency. The oldest data point is visualized as the most transparent one, while the most recent data point is visualized without any transparency. This method helps the users understand the flow of time, as the past fades away while the present is more prominent. Additionally, transparency helps the users disentangle data items whose values overlap at a time period, and in the same manner it enables them to observe the path of a single item that moves cyclically and retraces its own steps. Further, transparency visually encodes periods of faster and slower change. Finally, the connecting line between the different time points of the same item, which is not always used in trailplots, is deemed necessary in order to maintain the temporal link, especially across time periods with large changes.
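The system's visualizations were built in Tableau (Section 7); the following matplotlib sketch merely illustrates the encoding described above, with an illustrative alpha ramp and marker size rather than the system's actual settings.

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_trail(ax, xs, ys, color="tab:blue", min_alpha=0.15):
        # connecting line keeps the temporal link between points
        ax.plot(xs, ys, color=color, alpha=0.3, linewidth=1)
        # transparency ramps from the oldest (faded) to the newest (opaque) point
        for x, y, a in zip(xs, ys, np.linspace(min_alpha, 1.0, len(xs))):
            ax.scatter(x, y, color=color, alpha=a, s=40)

    fig, ax = plt.subplots()
    rng = np.random.default_rng(1)
    plot_trail(ax, np.cumsum(rng.random(10)), np.cumsum(rng.random(10)))
    plt.show()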

The two types of items visualized in summarizing trailplots are single items, usually deemed interesting, and clusters of less interesting items. To distinguish between the two, we use the size of the circles to encode this information. More precisely, we use the frequency of the data items associated with every visual object to determine its size. Therefore, single items have a frequency of one, while cluster items have a frequency dependent on the size of the cluster, that is, how many items belong to it.
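The thesis does not fix a particular mapping from frequency to circle size; one plausible choice, in the terms of the matplotlib sketch above, is to scale marker area linearly with frequency so that a cluster of k items covers k times the area of a single item:

    def marker_size(frequency, base_area=40):
        # scatter's s parameter is an area in points^2, so a linear scaling
        # in frequency keeps areas, not radii, proportional to cluster size
        return base_area * frequency

Scaling area rather than radius avoids large clusters dominating the plot quadratically.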

Figure 11. Summarizing Trailplots: The interesting items are visualized individually, while the less interesting items are visualized as clusters. The size of each cluster is determined by the frequency of the items that are members of the respective cluster. Time points of older time periods are more transparent than more recent ones.

6.2 Interaction Design

Summarizing trailplots is supported by several interaction techniques that allow the users to understand and further analyze the data. These techniques enable the users to select, connect, reconfigure and elaborate (Yi et. al 2007) on items with the support of an additional trailplot and the analytic methods.

As mentioned above, the overview visualization is supported by an additional trailplot visualization focused on showing more detailed, localized information. The local trailplot visualization is identical to the overview plot, differing only in the items it visualizes, which are specified on user demand. More specifically, the local trailplot is initially not visible and is initiated when the user interacts with the main plot. Users can highlight items of interest in order to bring them into focus and track their behavior over time.

The users can reconfigure the visualization by selecting items in the overview plot, which are then visualized in the local plot. Interesting items are visualized identically to their representation in the overview plot; selected clusters, however, are represented by the individual items that belong to them. The axes of the plot remain unchanged, so that the selected items take up the position they would occupy in the overview plot, helping the users understand their topology. This property enables the users to examine items more precisely. Furthermore, it allows them to observe the structure of the clusters, giving them a better understanding of the underlying distribution. This increases the users' confidence in the clustered overview by confirming the similarity of the items within a cluster while spotting intra-cluster outliers and evaluating their effect. Further elaboration on the data is achieved by selecting items that are analyzed again for their interestingness, while the less interesting items are clustered once more, repeating the analyze-first step on the local range of the users' selection; a sketch of this loop is given below. The output of the analysis is visualized in the local plot and can become the new overview if the user wishes so. This enables users to elaborate the analysis within a cluster or a tailored selection of items, or simply to reduce intra-cluster clutter with the support of the analytic methods.
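Schematically, elaboration re-runs the analyze-first pipeline on the selected subset. The sketch below reuses train_som, assign_clusters and aggregate_clusters from the earlier sketches; detect_scores is a hypothetical stand-in for the interestingness detection algorithm, which is specified earlier in the thesis.

    import numpy as np

    def analyze(x_series, y_series, detect_scores, top_fraction=0.10):
        # analyze-first step on any subset of items: score, split, cluster, aggregate
        # (x_series, y_series assumed already normalized to [0, 1], as above)
        scores = detect_scores(x_series, y_series)          # hypothetical detector
        k = max(1, int(round(len(scores) * top_fraction)))
        order = np.argsort(scores)[::-1]                    # highest score first
        interesting, rest = order[:k], order[k:]
        data = np.hstack([x_series[rest], y_series[rest]])  # SOM input vectors
        labels = assign_clusters(data, train_som(data))     # sketches from above
        return interesting, aggregate_clusters(x_series[rest], y_series[rest], labels)

Elaborating on a user selection then amounts to calling analyze() on just the selected rows and rendering the result in the local plot.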

6.3 Analytic Flow

The summarizing trailplots system facilitates the majority of the low level cognitive tasks that users need to perform in order to reach higher level understandings. The analytic methods support the identify clusters and detect anomalies tasks. The visualizations help the users understand the correlation between two variables over time and identify extrema and ranges. Furthermore, the characterize distribution task is facilitated by the size of the plotted data points: the smallest points represent individual items, while larger points represent clusters, and the larger a cluster's size, the larger the frequency of items in it. Additionally, the users can spot intra-cluster outliers visually with the reconfigure technique and the supportive trailplot. The interaction techniques supported by the two trailplots and the analytic methods allow the users to apply these tasks to any subset of the data they want to analyze.

Trailplots facilitate comparisons by offloading them to the perceptual system of the users, superimposing the temporal information about two variables on the same plot. In this manner users do not need to absorb information for the two variables separately and then perform a mental comparison to complete an analytic task. Tasks that require information to be absorbed and then compared using the brain's working memory can become challenging for more than a couple of items (Ware 2004). Further, to perform high level analytic tasks and get an overall understanding of the temporal context, users need to memorize and compare several items (Galatsky et. al 2004). Thus, summarizing trailplots offloads the users' working memory so they can reach high level understandings and insights.

7. System Evaluation

To evaluate the performance of our system we conducted 5 user experiments that examined the effectiveness of the interestingness detection algorithm and the Summarizing Trailplots system. We conducted semi-structured user experiments with two types of users: users who were experts in working with time series data and users less accustomed to working with it. The interestingness detection algorithm was implemented in Python, while the SOM clustering algorithm was implemented in Matlab. The visualizations were implemented using Tableau software. The threshold for the interestingness detection algorithm was set to 10%, labeling those items as interesting. The number of clusters was set to 9.
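In the terms of the earlier sketches, these settings correspond to top_fraction=0.10 and a 3x3 Kohonen grid (9 neurons, hence 9 clusters). Hypothetical usage, assuming the data arrays and the detect_scores stand-in from above:

    # x_series, y_series: the dataset's two variables, shape (n_items, n_periods)
    interesting, trails = analyze(x_series, y_series, detect_scores,
                                  top_fraction=0.10)  # grid defaults 3 x 3 = 9 clusters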

7.1 The Datasets

We used two types of datasets for our experiments: one real world dataset to test the Summarizing Trailplots system and four artificial ones to test specific parts of the interestingness detection algorithm. The real world dataset contained GDP per capita and life expectancy at birth for 282 of the world's countries over a period of 42 years. The data were acquired from the Gapminder Foundation and the missing values were interpolated using averages. Each of the four artificial datasets contained 20 data items for two variables over 10 time periods. The first three datasets contained 19 items that behaved similarly over time and one outlying item, differing in magnitude, direction and shape of change respectively. The fourth artificial dataset contained 19 similar items and five outlying items along different dimensions of change.
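The thesis does not detail the interpolation scheme beyond "using averages"; one simple reading is linear interpolation along time, which for an isolated missing year equals the average of the two neighboring years. A pandas sketch on a toy frame:

    import numpy as np
    import pandas as pd

    # toy frame: one row per country, one column per year, NaN where a value is missing
    df = pd.DataFrame({1970: [1.0, 2.0], 1971: [np.nan, 2.5], 1972: [3.0, np.nan]},
                      index=["A", "B"])
    # linear interpolation along time; an isolated interior gap becomes the
    # average of its two neighboring years, edge gaps take the nearest value
    df = df.interpolate(axis=1, limit_direction="both")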

7.2 User Experiments

The user experiments were conducted in three stages and lasted between forty minutes and one hour. During the first stage the users were introduced to the trailplots visualization in order to understand how it works and what type of information is encoded in it. This introduction stage involved only the overview part of the system. Apart from familiarizing the users with the system, this stage also captured their spontaneous impressions. The second stage evaluated the performance of the interestingness detection algorithm.

Figure 12. The Summarizing Trailplots system and the analytic tasks it facilitates.
