
eSeeTrack - A Visualization Prototype for Exploration and Comparison of Sequential Fixation Patterns

by

Hoi Ying Tsang

B.Eng., McGill University, 2008

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE

in the Department of Computer Science

© Hoi Ying Tsang, 2010
University of Victoria

All rights reserved. This thesis may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author.


Hoi Ying Tsang

B.Eng., McGill University, 2008

Supervisory Committee

Dr. Melanie Tory, Supervisor (Department of Computer Science)

Dr. Micaela Serra, Departmental Member (Department of Computer Science)


ABSTRACT

eSeeTrack, an eye-tracking visualization, facilitates exploration and comparison of sequential gaze orderings in a static or a dynamic scene. It extends current eye-tracking data visualizations by extracting patterns of sequential gaze orderings, displaying these patterns in a way that does not depend on the number of fixations on a scene, and enabling users to compare patterns from two or more sets of eye-gaze data. Extracting such patterns was difficult, if not impossible, with previous visualization techniques. eSeeTrack combines a timeline and a tree-structured visual representation to embody three aspects of eye-tracking data that users are interested in: duration, frequency and orderings of fixations. It allows orderings of fixations to be rapidly queried, explored and compared. Two case studies, on surgical simulation and on a retail store chain, are discussed in this thesis to demonstrate the capabilities of eSeeTrack. Furthermore, eSeeTrack provides an effective and efficient mechanism for identifying pattern outliers. This approach can be effective for behavior analysis in a variety of other domains, as discussed in this thesis.


Table of Contents

Supervisory Committee
Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication

1 Introduction
  1.1 Attributes of Interest
  1.2 Thesis Organization

2 Eye-Tracking Background
  2.1 Eye Movements
  2.2 Eye-Trackers
  2.3 Visual Processing

3 Previous Work
  3.1 Approaches to Analyzing Eye-Tracking Data
  3.2 Shortcomings of Current Approaches
  3.3 Visualizing Event Sequences

4 eSeeTrack
  4.1 Overview
  4.2 Timeline
  4.3 Detailed Timeline
  4.4 Tree Visualization
  4.5 File Menu
  4.6 Implementation
    4.6.1 Choice of Language
    4.6.2 Customized Classes
    4.6.3 Client Side
    4.6.4 Server Side

5 Case Studies
  5.1 Case 1: Surgical Simulation
    5.1.1 Background description
    5.1.2 Data and Hypothesis
    5.1.3 Analysis
  5.2 Case 2: Retail Store Chain
    5.2.1 Background description
    5.2.2 Data and Hypothesis
    5.2.3 Analysis

6 Discussion

7 Conclusion
  7.1 Generalizations
  7.2 Future Work

A Design Decisions
  A.1 Paper Mockups
    A.1.1 Version I and II
    A.1.2 Version III
    A.1.3 Version IV


List of Tables

Table 3.1 Attributes Supported by the Current Eye-Tracking Analysis Approaches
Table 4.1 File Operations Supported by the Menu and Their Functionality
Table 5.1 Fixated Object Categories in the Surgical Simulation Study
Table 5.2 Fixation Count and Percentage of Each Fixation Object in Surgical Simulation
Table 5.3 Fixated Object Categories in the Retail Store Study
Table 5.4 Fixation Count and Percentage of Each Fixation Category in Retail Store Chain. The Top 3 Categories in Each Store Are Highlighted.


List of Figures

Figure 3.1 Example of development of a transition matrix (on the right) from the areas of interest on an image (on the left). While a matrix column represents the position where a saccade starts, a matrix row signifies the position where a saccade stops.
Figure 3.2 Smooth Graphs [3]. ©2009 IEEE. Copy permission given May 6, 2010.
Figure 3.3 ActiviTree [23]. A pair of numbers is associated with each node. The first number represents an activity number, while the second represents the number of occurrences of the activity. ©2009 IEEE. Copy permission given May 6, 2010.
Figure 4.1 Overview of eSeeTrack; Panel a (on the top) - timeline section highlights all the occurrences of red bands which represent “sales promotion”; Panel b - detailed timeline section with optional thumbnail images of fixated objects displayed; Panel c - tree visualization shows the observed fixation orderings that end with “sales promotion” for studies in two retail stores. Fixations of store 1 are shown in colored labels and those of store 2 are in gray-shadowed labels; Panel d - control section.
Figure 4.2 Detailed view of the control section.
Figure 4.3 A timeline displays fixations of two groups of participants. While group 1 (on the top) has 16 participants, group 2 (on the bottom) has 22 participants.
Figure 4.4 Time window is used to include the fixations to be analyzed further.
Figure 4.5 A tool tip displays detailed information about the 43rd fixation of group 1's participant.
Figure 4.6 A tool tip displays the start and the end time of two videos.
Figure 4.10 Comparison of non-highlighted and highlighted nodes in the tree visualizations and their effect in the timelines.
Figure 4.11 Tree visualization displays the tag sequences beginning with “poster” followed by “item description”.
Figure 4.12 File menu items.
Figure 4.13 Content of a fixation set file.
Figure 5.1 Timeline section with surgical simulation fixation data; expert surgeons are in group 1 (on the top) while novice users are in group 2 (on the bottom). Bands in red - lap screen, in green - vital screen, and in blue - other. Note that the object that was fixated the most in both groups was “lap screen”.
Figure 5.2 Tag filter and bar chart with surgical simulation data. In each category, the top bar shows the fixation counts of experts and the bottom bar represents those of novice users.
Figure 5.3 Tree visualizations of surgical simulation data; the trees are rooted at “vital screen”. Elements in the fixation patterns of experts are in red and green. Those of novices are in gray shadow.
Figure 5.4 Tree visualizations rooted at “other”. Note that experts (shown in red and blue) have only 1 pattern. Patterns in gray shadow belong to novices.
Figure 5.5 Tag filter and bar chart showing fixations in two retail stores.
Figure 5.6 Tree visualizations with gap duration up to 1 second. Colored labels belong to store A and gray labels belong to store B. Note that “sales condition” does not appear in any of these patterns in either store.
Figure 5.7 Tree visualizations with gap duration of up to 3 seconds. Colored labels belong to store A and grey labels belong to store B. Note that “sales condition” does not appear in any of these patterns in either store.
Figure 7.1 The paper prototype of a possible future version of eSeeTrack. À List of timelines; Á Groups of participants; Â Tag filter with bar chart; Ã List of thumbnail scene images; Ä Heatmap and tree visualization.
Figure A.1 The first two versions of the eSeeTrack user interface mockup; both consist of a timeline and a collage visualization. A band in the timeline signifies that a fixation occurred.
Figure A.2 UI mockup version III; À Timeline visualization with a time window (shown in red contour); Á Tag list; Â Tag-tree visualization; Ã Tag order; Ä Tag length; Å Gap duration threshold.
Figure A.3 A tag-tree visualization with labeled tags.
Figure A.4 UI mockup version IV; À Menu bar; Á Timeline visualization; Â Tag order; Ã Show tag image checkbox; Ä Tag list; Å Tag-tree visualization; Æ Orientation of the tag-tree; Ç Tag length; È Gap duration threshold.
Figure A.5 eSeeTrack prototype I; it followed the design of mockup UI version IV.
Figure A.6 eSeeTrack prototype II. Bands in the timeline were colored. The tag-tree was modified to a WordTree to represent the frequency and the orderings of fixations within the time window. A tag filter was added.
Figure A.7 Comparison of two highlight effects in the timeline; a blue-border tag block was selected in the detailed timeline (not shown).
Figure A.8 Tag filter with a bar chart; “color circle” was the most fixated and “slider” was the least fixated.
Figure A.9 eSeeTrack prototype III.


ACKNOWLEDGMENTS

My supervisor, Dr. Melanie Tory, deserves my deepest thanks; her continuous support and patience helped me get through my master's study. She was always there to listen and to give advice. I am sincerely grateful to her for her guidance and encouragement when I was lost and lacking confidence.

I would like to also thank Colin Swindells for providing the opportunity to work on the project of Locarna Systems Inc. as my research topic. His help, advice and suggestions improved my research.

Professor Micaela Serra deserves a special thanks as my thesis committee member. I admire her for her time, advice, suggestions and kindness. I will not forget the life lessons that she taught.

I am indebted to many of my colleagues for creating a joyful environment in which to learn and grow. Their encouragement and suggestions helped me through the difficult moments.

I also wish to thank Mr. Gary Zarta, one of my teachers in the Computer Science Department at Dawson College. It was he who opened the world of Computer Science to me. His continuous support and advice led me to persist with my studies.

Furthermore, I would like to express my appreciation to David who continuously corrected my English writing and speaking. I thank him from the bottom of my heart for tolerating my bad temperament, standing beside me and encouraging me constantly during down moments.

My parents deserve special mention for their unfailing support and prayers from afar. My Father, in the first place, is the person who built my learning character, showing me the importance of intellectual pursuit ever since I was a child. My Mother is the one who sincerely raised me with her caring and gentle love. My Grandma, thank you for your caring and love when I was little. My Brother, thank you for being a supportive and caring sibling.

Finally, I would like to thank everybody who was important to the successful realization of this thesis, as well as express my apology that I could not mention each of you personally one by one.


DEDICATION

For my parents

-Thank you for your tolerance toward my caprice. I am honored to have you as my parents. Thank you for giving me a chance to prove and improve myself.

Chapter 1

Introduction

In many domains, researchers use eye-tracking to explore where people look. Examples include cognitive science, psychology, market research, product design, human-computer interaction (HCI), and sport training. Typically, their research focuses on understanding a participant's thinking and behavior. For instance, in the study of scene perception, psychologists examine eye movements and fixation patterns in order to understand the acquisition, processing, and storage of visual information in different contexts [5]. In HCI, gaze data is studied to evaluate the usability of an interface (e.g., are users looking at too many places before finding the necessary function?) [2, 5, 7, 10, 16]. Hence, as a starting point, researchers are often interested in answering the following questions when they study eye movement data:

• What does an observer fixate on [9]?

• When does an observer gaze at each object, and in what order [9, 10]?

• How long or how often does the observer fixate on each object [10]?

The answers to these questions can lead to interesting implications in different contexts. For example, a high number of fixations on an object likely implies the viewer's interest. Fixating on an object for a long period of time may imply that the viewer is puzzling over the object or studying it. In addition, the order of fixations on different objects gives clues about the viewer's thinking process or strategy to complete a task. Thus, deciphering the ordering of eye movements and the gaze durations helps in understanding the viewer's cognitive process [8, 9, 10].

1.1

Attributes of Interest

Because the eye fixates on objects in different places over time, eye-tracking data varies with both space and time. However, the spatial and temporal attributes do not always need to be shown directly. Depending on the analysis goal, a user may wish to see:

• Number/duration of gazes across a spatial scene [10].

• Number/duration of gazes on specified regions, objects, or types of objects [10].

• Temporal ordering of gazes on specified regions, objects, or types of objects [4, 7, 10, 22].

In most analyses, users want to compare eye-gaze data between groups of participants or experimental conditions. We focus on enabling such comparisons for sequential orderings of fixations, rather than for simple counts on each object type.

For static scenes (e.g., a commercial poster), fixations can be easily summarized using techniques such as heat maps or by counting fixations on user-designated areas of interest. For dynamic scenes, fixations can be tagged (i.e., labeled with the name of the object or activity); then fixations for each tag can be counted and summarized.
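The tag-count summary just described amounts to a frequency table over tags. The following ActionScript 3 sketch (illustrative field and function names, not from this thesis) shows the idea, assuming each fixation object carries a tag string:

    // Count how many fixations carry each tag.
    function countByTag(fixations:Array):Object {
        var counts:Object = {};   // map: tag name -> fixation count
        for each (var f:Object in fixations) {
            counts[f.tag] = (counts[f.tag] == undefined) ? 1 : counts[f.tag] + 1;
        }
        return counts;
    }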


Our approach uses a timeline and a tree-structured visualization to provide a summary of serial fixations for exploration and analysis; this is explained fully in chapter 4. It enables users to identify both frequent and infrequent orderings, and to compare orderings between different data sets (e.g., participants or conditions). While our tool is designed to visualize eye-tracking data, we envision that it could also visualize other kinds of sequential items such as event-based data and state transitions.

1.2

Thesis Organization

The remaining chapters of this thesis are organized as follows:

Chapter 2 reviews the background of eye-tracking.

Chapter 3 discusses previous work on visualizing eye-tracking data and event-based data.

Chapter 4 presents our tool - eSeeTrack, its features, its design, and its implemen-tation.

Chapter 5 demonstrates the capabilities of eSeeTrack via case studies in two differ-ent applications: surgical simulation and retail store chain data.

Chapter 6 includes a discussion on the efficiency and effectiveness of our approach. Chapter 7 ends with the conclusions and generalizations of our work, and gives


Chapter 2

Eye-Tracking Background

2.1

Eye Movements

Eye movements have been studied scientifically for more than a century. In the early years, researchers discovered that the eyes move in several different ways. Two types of movement that have often interested scientists are fixations and saccades. A fixation happens when an observer gazes at a fixed spot in the world, defined as the point of regard, for at least 100 to 200 milliseconds [10]. A saccade is the rapid visual transition from one fixation to another. The combination of a series of fixations and saccades is called a scanpath. Due to the high velocities of saccadic movement, which can easily reach 500° per second for a large saccade [8, 16], there is almost no visual input to the brain during a saccade. Hence, our visual experience consists of a set of fixations on different objects.
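For illustration, one common family of algorithms for extracting fixations from raw gaze samples is dispersion-threshold identification. The ActionScript 3 sketch below is a simplified example under assumed inputs (samples with x, y in pixels and t in milliseconds; illustrative thresholds); eSeeTrack itself assumes fixations have already been extracted by other software:

    // Return {startIndex, endIndex} pairs for windows of samples that stay
    // within maxDisp pixels of dispersion for at least minDur milliseconds.
    function detectFixations(samples:Array, maxDisp:Number = 25,
                             minDur:Number = 100):Array {
        var fixations:Array = [];
        var i:int = 0;
        while (i < samples.length) {
            var j:int = i + 1;
            // Grow the window while all samples stay within the dispersion limit.
            while (j < samples.length && dispersion(samples.slice(i, j + 1)) <= maxDisp) j++;
            if (samples[j - 1].t - samples[i].t >= minDur) {
                fixations.push({startIndex: i, endIndex: j - 1});
                i = j;    // continue after the detected fixation
            } else {
                i++;      // too short to be a fixation; shift the window
            }
        }
        return fixations;
    }

    // Dispersion of a window: horizontal extent plus vertical extent.
    function dispersion(win:Array):Number {
        var minX:Number = Infinity, maxX:Number = -Infinity;
        var minY:Number = Infinity, maxY:Number = -Infinity;
        for each (var s:Object in win) {
            minX = Math.min(minX, s.x); maxX = Math.max(maxX, s.x);
            minY = Math.min(minY, s.y); maxY = Math.max(maxY, s.y);
        }
        return (maxX - minX) + (maxY - minY);
    }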

2.2

Eye-Trackers

Eye-tracking is the process of recording the point of regard using a video-based device called an eye-tracker. The two most common types of eye-trackers are head-mounted and table-mounted.


A participant wearing a head-mounted tracking device is thus free to move around the environment during the study. The table-mounted tracker is ideal when stimuli can be displayed on a computer screen. In this case, the participant must stay in front of the screen and make only minor head movements.

2.3

Visual Processing

Past research has confirmed that ocular motion is closely linked to the cognitive goal of the viewer [9, 16]. In addition, the gaze sequence of an individual depends highly on the nature of the task performed [17] and varies according to bottom-up or top-down processing. In bottom-up processing, the gist or layout of the objects in a scene is fixated first [10, 18]. In contrast, in top-down processing, eye movements are directed by prior knowledge and expectation of a scene [10, 18]. Furthermore, according to the eye-mind hypothesis, people are usually thinking about what they are looking at [13]. Thus, researchers may infer the thoughts of a participant via his or her fixations.


Chapter 3

Previous Work

3.1

Approaches to Analyzing Eye-Tracking Data

There are two main approaches to viewing eye-tracking results. The first approach is to tag each fixation with the name of the object or area of interest (AOI), and then count the fixations labeled with each tag. The second approach uses visualizations to display the results. The best-known visualizations are heat maps and gaze plots; less well-known ones are bee swarm, cluster, and area-of-interest visualizations. All of these assume a static scene and are composed of the scene plus an overlay showing fixations.

A heat map, also known as a fixation map or a saliency map, is the most popular visualization for presenting eye-tracking data [13, 26]. It visualizes relative frequencies or durations of fixations via a semi-transparent color map. Alternative versions of a heat map hide uninteresting parts by darkening or blurring them [21, 26]. Heat maps can combine the fixations of multiple participants. Bee swarm is a variation of a heat map that uses dots to indicate the point of regard. The visualization can contain gaze data of multiple participants by using different colored dots. Bee swarm can also be applied to dynamic scenes (i.e., video).


A gaze plot shows fixations as circles connected in temporal order. It can visualize the eye movements of multiple participants separately by assigning a different color to each chain of circles. Variations of a gaze plot either arrange the fixated objects in a more abstract way [6, 15] or include additional information about each fixation [11].

Cluster and AOI visualizations [7] are closely related. The cluster visualization groups gaze spots into clusters and shows each as a colored convex hull. AOI focuses on objects of interest which are often specified by the user. They are either encircled or color-shaded. Both usually show the percentage of fixations on each area of interest.


3.2

Shortcomings of Current Approaches

The visualizations presented in the previous section satisfy only some of the tasks identified in section 1.1. First, the visualizations typically handle only static scenes (e.g., a text, wall shelf display, or commercial poster). Bee swarm can integrate with a dynamic scene (i.e., video), but the user would need to view the entire video in order to see all the data. This process becomes very time-consuming for lengthy video. Tagging can also apply to video, but summarizing tag counts gives only limited information. Thus, these visualizations cannot handle dynamic scene content (e.g., interactive systems) or head-mounted eye-trackers (in which the scene changes because the head moves or the participant moves around).

Secondly, these visual representations are unable to show both fixation frequency and the duration of fixation at the same time. Most importantly, these approaches do not enable viewers to analyze sequential fixation patterns. Those that are able to display the order of fixations do so in only a limited way: bee swarm requires an entire video to be viewed and gaze plot becomes quickly cluttered when there are more than a few fixations. Patterns of fixation orderings are not readily apparent and comparison of such patterns is even more challenging. One technique that can be used to visualize fixation ordering is a transition matrix [7, 14] (see Figure 3.1); however, retrieving a sequence stored in a transition matrix is non-trivial since it requires visiting each element in the matrix at least once. In addition, comparison of patterns cannot be performed directly with a transition matrix, and frequency of patterns is not apparent. eSeeTrack addresses the need to visualize and compare patterns of fixation orderings.
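To make the transition-matrix idea concrete, the following ActionScript 3 sketch (an assumed helper, not from the thesis) builds such a matrix from a temporally ordered list of fixation tags, following the convention of Figure 3.1 that a column marks where a saccade starts and a row marks where it stops:

    // Build an n-by-n transition matrix over the given tag categories.
    function buildTransitionMatrix(tags:Array, categories:Array):Array {
        var index:Object = {};
        for (var c:int = 0; c < categories.length; c++) index[categories[c]] = c;
        var n:int = categories.length;
        var matrix:Array = [];
        for (var r:int = 0; r < n; r++) {
            matrix[r] = [];
            for (var k:int = 0; k < n; k++) matrix[r][k] = 0;
        }
        for (var i:int = 1; i < tags.length; i++) {
            matrix[index[tags[i]]][index[tags[i - 1]]]++;  // row = stop, column = start
        }
        return matrix;
    }

As the surrounding text notes, recovering an ordered sequence of length greater than two from this representation requires walking the matrix cell by cell, which is what eSeeTrack's tree view avoids.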

The main attributes necessary to support the desired tasks from section 1.1 are listed as rows in Table 3.1, with a summary of their inclusion in the existing eye-tracking paradigms. Table 3.1 demonstrates that eSeeTrack supports many types of analysis that are not supported by other techniques.


Figure 3.1: Example of development of a transition matrix (on the right) from the areas of interest on an image (on the left). While a matrix column represents the position where a saccade starts, a matrix row signifies the position where a saccade stops.

Table 3.1: Attributes Supported by the Current Eye-Tracking Analysis Approaches. Columns: heat map, gaze plot, bee swarm, cluster and AOI, tagging, transition matrix, and eSeeTrack. Rows: static scene (space, time), dynamic scene (space, time), frequency, duration, ordering, and comparison; each cell marks partial or complete support.


3.3

Visualizing Event Sequences

Sequences of fixations are similar to other types of time-ordered events such as state changes or sequential activities. Thus, visualizations for these kinds of data may be useful. Smooth Graphs [3] and ActiviTree [23] visualize sequences of states and events using a node-link graph and a tree, respectively (see Figures 3.2 and 3.3). Each state or event is represented by a node, with transitions between them represented by edges. However, these visualizations are insufficient for several reasons. First, ordering is difficult to discern without directional indicators such as arrow heads, which clutter the image. Second, nodes in a graph typically have several incoming and several outgoing edges, making it nearly impossible to determine which outgoing edge followed a given incoming edge without some sort of interaction. This makes it almost impossible to examine sequences of length greater than two, a problem encountered in a previous eye-tracking analysis (in which such graphs were generated manually) [20]. Smooth Graphs alleviate this problem somewhat by using smooth curves. However, individual sequences are still difficult to discern because of path overlap. Our approach removes this problem by replicating nodes in different paths, using a structure similar to a WordTree [25]. Unlike ActiviTree and Smooth Graphs, eSeeTrack also supports comparison of event sequences.


Figure 3.2: Smooth Graphs [3]. ©2009 IEEE. Copy permission given May 6, 2010.

Figure 3.3: ActiviTree [23]. A pair of numbers is associated with each node. The first number represents an activity number, while the second represents the number of occurrences of the activity. ©2009 IEEE. Copy permission given May 6, 2010.


Chapter 4

eSeeTrack

We designed eSeeTrack to visualize eye-tracking data based on a dynamic scene, to allow users to see all fixation patterns within a time frame without requiring them to watch the whole video, and to enable users to compare fixation patterns of multiple groups of participants. As our starting point, we assume that users have automatically extracted fixations from the eye-tracking video and tagged those fixations with up to ten keywords of their choice. Tagging is used to identify categories of interest to a user (e.g., the types of objects in the scene). These tasks can be accomplished with commercial eye-tracking software. Since each fixation has an associated tag, in the following sections a fixation pattern is referred to as a tag sequence. A detailed explanation of our design decisions can be found in Appendix A.

4.1

Overview

eSeeTrack consists of four components, shown in Figure 4.1: a timeline and detailed timeline on the top (panels a and b), a tree visualization at the bottom (panel c), and a control section on the right side (panel d). Our tool allows users to create up to six groups of eye fixations. Each group may contain the fixations of more than one participant.


We use a qualitative color map from ColorBrewer [1] and limit the number of tags to ten to ensure that the colors are easily distinguishable [24]. The tag filter, which is located at the top of the control section, labels all the tags and their associated color codes (see Figure 4.1, panel d, or Figure 4.2 for a detailed version). It also serves as a filter and a summary of fixation counts. If a specific tag is unselected, the related bands in the timeline will be grayed and the associated blocks and nodes will be excluded from the detailed timeline and tree visualization. The bar chart displays the relative frequency of each tag for each participant group.

Figure 4.1: Overview of eSeeTrack; Panel a (on the top) - timeline section highlights all the occurrences of red bands which represent “sales promotion”; Panel b - detailed timeline section with optional thumbnail images of fixated objects displayed; Panel c - tree visualization shows the observed fixation orderings that end with “sales promotion” for studies in two retail stores. Fixations of store 1 are shown in colored labels and those of store 2 are in gray-shadowed labels; Panel d - control section.


Figure 4.3: A timeline displays fixations of two groups of participants. While group 1 (on the top) has 16 participants, group 2 (on the bottom) has 22 participants.

(a) Create - dragging a box

(b) Move - mouse dragging either the top or the bottom edge of the time window (shown in yellow edges).

(c) Resize - mouse dragging either the left or the right edge of the time window (shown in yellow edges).

Figure 4.4: Time window is used to include the fixations to be analyzed further.

4.2

Timeline

Each timeline consists of colored bands of different widths, each representing a fixation. Blank spaces in a timeline denote no fixation. The width of a band denotes the duration of the fixation and the color signifies the associated tag. In Figure 4.3, red bands represent sales promotion. The tag filter in the control section is used as the legend to find the associated tag of a colored band. Detailed information about a fixation can be accessed via a tool tip, as in Figure 4.5.

Figure 4.5: A tool tip displays detailed information about the 43rd fixation of group 1’s participant.

Figure 4.6: A tool tip displays the start and the end time of two videos.

Each timeline is segmented, with one participant per segment. Segments are the same length because the relative pattern of fixations from the start to the end of an eye-tracking session is more important than variations in the time participants took to complete the task. Tiny triangular marks on the top or bottom indicate the separation between participants within a participant group (see Figure 4.3). When hovering over a triangle, as in Figure 4.6, information about the participants and their associated video lengths is displayed.
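The equal-segment layout can be sketched as follows in ActionScript 3 (illustrative names; this is not the eSeeTrack source). Each band's position and width are computed as proportions of that participant's own session length, so every segment renders at the same width regardless of how long the session took:

    // Compute the on-screen rectangle of one fixation band within a
    // participant's fixed-width segment, assuming frame-based timestamps.
    function bandRect(fix:Object, sessionStartFrame:Number, sessionEndFrame:Number,
                      segmentX:Number, segmentWidth:Number):Object {
        var span:Number = sessionEndFrame - sessionStartFrame;
        return {
            x: segmentX + segmentWidth * (fix.startFrame - sessionStartFrame) / span,
            width: segmentWidth * (fix.endFrame - fix.startFrame) / span
        };
    }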


Some fixation patterns among different participants can already be seen quickly in the timeline by observing the variation in the colors of the bands. In addition, frequently fixated objects are apparent because of the frequency of their color. In Figure 4.3, light green bands, which represent “poster”, are the most frequent in both groups 1 and 2. The fixation data can be investigated further by creating a time window, as in Figure 4.4, and viewing its contents in the detailed timeline and tree visualization. Only fixations within the time window will be shown in the detailed timeline and counted for the bar charts and tree visualization.

4.3

Detailed Timeline

A detailed timeline contains a list of square blocks labeled and colored by the fixation’s tag as seen in Figure 4.9a. Thin bands may be lost in the timeline but are visible in this view. Detailed information about a fixation can be viewed by pressing the magnifying glass on a selected block as in Figure 4.7. A user can also launch the video to see the exact object that a participant was looking at (see Figure 4.8). Each block can optionally display a thumbnail image of the fixated object as shown in Figure 4.9b.


(a) Two detailed timelines display two lists of square blocks. Each block represents a fixation; it is labeled with its associated fixated category and is color-contoured.

(b) Thumbnail image of a fixated object is displayed in its associated block in detailed timelines.


In Figure 4.1, a “sales promotion” block in the detailed timeline (panel b) is clicked, triggering the “sales promotion” bands to be highlighted in the timeline (panel a). In addition, selecting a tag launches a tree-structured visualization rooted at the selected tag. Thus, this tag is called the root tag.

4.4

Tree Visualization

Although the ordering of fixations can be observed from colors in the timeline, it is difficult to identify ordered patterns from this view since they may be lost in a sea of lines. It is also difficult to identify which orders are the most frequent and infrequent, and to compare the patterns of multiple groups. eSeeTrack overcomes these difficulties by extracting fixation patterns from the data and using a rooted-tree structure to visualize them.

Similar to a WordTree [25], our tree visualization combines the concepts of tag clouds and suffix trees. Each node of the tree denotes a tag. A node may include more than one label: the colored label represents the tag of group 1, while the labels of other groups are gray and displayed as shadows. The root of the tree is the tag that was selected in the detailed timeline section. Tree height represents the maximum allowable length of a fixation pattern, which is adjustable in the control section. With the exception of the root, each node is connected to its parent with a curve to create a smooth visual transition. Each path from the root to a leaf node is a unique observed sequence of fixations.

The tree can be either left-rooted or right-rooted in order to display fixation patterns starting from or ending with the root tag, respectively. This allows a user to answer the question “What items does a user fixate on before/after looking at object X?” In Figure 4.1, panel c, a right-rooted tree displays all the fixation sequences that end with the “sales promotion” category.

The size of a label at a node signifies the relative frequency (or count) of sequences containing that category at that position within the sequences. The fixation count of a node is always larger than or equal to the sum of its children's counts. To help users visually locate common tag sequences, especially in a large tree, the tree is ordered and displays the most frequent path at the top.

By highlighting nodes in the tree visualization, as in Figure 4.10b, users can relate the selected fixation pattern to the timelines to verify the moment(s) at which the participants fixated on this particular sequence of objects; the associated bands in the timelines will be highlighted by fading the colors of the other bands, as in Figure 4.10d. In addition, the tool allows users to create a multiple-root tree to answer questions such as “What items does a user fixate on before/after fixating on X then Y?” To accomplish this, a user clicks on a node of the tree visualization, which adds the node to the root along with any intervening nodes (see Figure 4.11). Similar to tags in the detailed timeline section, information about a particular category node can be accessed via the magnifying glass.

In a WordTree, a sequence is defined as a sentence finishing with an ending punctuation mark (i.e., period, question mark, or exclamation mark). A fixation sequence in our tree visualization is built according to two user-specified criteria: 1) the maximum length of the sequence and 2) the maximum gap duration between fixations. The system searches through all fixations within the time window to find all sequences that begin (or end) with the selected tag, as long as the gap duration between consecutive fixations is less than the threshold and the sequence length does not exceed the maximum sequence length. The gap duration threshold can be adjusted in the control section.
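The ActionScript 3 sketch below shows one simple reading of these two criteria for a left-rooted tree (sequences beginning with the root tag); it is illustrative rather than the thesis code, and it assumes frame-based timestamps with the gap threshold already converted to frames. For a right-rooted tree, the scan would run backwards from each root occurrence.

    // Extract every tag sequence that starts at an occurrence of rootTag,
    // stopping at maxLen tags or when the inter-fixation gap is too large.
    function extractSequences(fixations:Array, rootTag:String,
                              maxLen:int, maxGapFrames:Number):Array {
        var sequences:Array = [];
        for (var i:int = 0; i < fixations.length; i++) {
            if (fixations[i].tag != rootTag) continue;   // must begin with root tag
            var seq:Array = [fixations[i].tag];
            for (var j:int = i + 1; j < fixations.length && seq.length < maxLen; j++) {
                // Stop extending once the gap between consecutive fixations
                // exceeds the user-specified threshold.
                if (fixations[j].startFrame - fixations[j - 1].endFrame > maxGapFrames) break;
                seq.push(fixations[j].tag);
            }
            sequences.push(seq);
        }
        return sequences;
    }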


(a) A tree visualization with no node highlighted.

(b) The same tree visualization with “sales promotion - sales promotion” nodes highlighted.

Figure 4.10: Comparison of non-highlighted and highlighted nodes in the tree visualizations and their effect in the timelines.

Figure 4.11: Tree visualization displays the tag sequences beginning with “poster” followed by “item description”.

4.5

File Menu

Figure 4.12: File menu items.

The file menu, shown in Figure 4.12, is the first component users interact with. It provides functionality for importing data into and exporting data from eSeeTrack. Before analyzing the fixation data, two types of files have to be uploaded to the server with which eSeeTrack communicates. The two types of input files are:

1. an XML file (called a fixation set file), which contains a list of fixations made and ordered by participants of the study;

2. an Adobe Flash video file (called a video file), which includes the scenes seen from the world perspective of a participant during the eye-tracking session, plus the points of regard made by the participant.

The fixation set file consists of one or more participant logs. While the fixation file is used to analyze the pattern of fixations, the video files are used to extract thumbnail images of the fixated objects and can also be played back from any fixation point. Once these two types of files are stored on the server, the fixation data can be loaded into eSeeTrack through the menu. Besides the fixation set and video files, eSeeTrack also creates a customized XML file (called state file) to save the current state of findings for later analysis. Users can also download any video files and state files back to their desktop through the menu. Table 4.1 summarizes the associated functionality of the four file operations provided in the file menu.


Operation: Functionality
Load: Either read fixation data or load the last saved state file into eSeeTrack.
Save: Write the current state of findings to a file stored on the server.
Upload: Transfer a fixation set file or a video file from the user's local machine to the server.
Download: Transfer a video file or a state file from the server to the user's local machine.

Table 4.1: File Operations Supported by the Menu and Their Functionality

4.6

Implementation

eSeeTrack follows a two-tier architecture model, which is simple to implement for a prototype. The client side consists of the graphical user interface and the application logic. The server side provides a set of functionalities for clients to access data. In the following sections, the choice of language, the customized classes, and the implementation of the critical parts of both the client and server sides are described in detail.

4.6.1

Choice of Language

Since the target users of our tool were the clients of an eye-tracking company, the choice of language used to develop our tool had to satisfy the following criteria:

1. our tool should be accessible anytime and anywhere,

2. our tool should not require any installation, and

3. our tool has to be interactive.

A web-based application fulfills these requirements: it requires no installation and can be accessed easily via a browser. In addition, most people have access to an Internet connection.


eSeeTrack was developed on Windows XP Professional SP3. It is a web application that can be operated in browsers supporting Adobe Flash Player 10 or later. It communicates with an Apache HTTP Server supporting PHP to store and read fixation set, video and state files.

4.6.2

Customized Classes

As mentioned in section 4.5, one of the input files to eSeeTrack is a fixation set file, which contains a list of fixations made by participants. As shown in Figure 4.13, the data in this file is organized in a hierarchical structure: a fixation set file has at least one participant; each participant has information about his/her associated video file and a list of fixations; and each fixation has a set of attributes. To mirror this hierarchical structure, we implemented several classes to store the data, from the finest to the coarsest granularity: ETFixation, ETVideo, ETTagList, ETFixationList, ETUser, and ETUserList.

• ETFixation class represents a single fixation. It stores all the attributes of a fixation XML tag in a fixation set file, as well as the cropped image of a fixated object obtained from the associated video file.

• ETVideo class stores all the attributes of a video XML tag in a video file.

• ETTagList class extends the ActionScript ArrayCollection class by adding an attribute called “name”. It is used to group fixations (i.e., ETFixation objects) which share a common tag and are ordered temporally. The name of the tag is stored in the “name” attribute.


• ETFixationList class keeps track of the fixations made by a participant. It consists of three vectors and a dictionary object. The first vector stores fixations (i.e., ETFixation objects) in temporal order. The second vector contains ETTagList objects; it allows rapid access to fixations sharing the same tag when building the tree visualization, since it orders the fixations by tag name. The last vector keeps the names of the fixation tags. The dictionary object is used to track the count of each tag.

• ETUser class represents a participant. It consists of an ETFixationList object and an ETVideo object. It provides a set of methods to access the fixation data of the participant and the information of the corresponding video.

• ETUserList class corresponds to a group of participants. It contains a list of ETUser objects. It provides methods to obtain the tag names and the counts of each tag for the overall group, and to access the data of a participant.


<all_users>
  <user name="participant_01">
    ...
    <all_fixations>
      ...
      <fixation start_frame="1856" tag="Item description" y_pos="390" x_pos="579" end_frame="1875"/>
    </all_fixations>
  </user>
  <user name="participant_02"> ... </user>
  ...
  <user name="participant_16"> ... </user>
</all_users>

Figure 4.13: Content of a fixation set file (excerpt).
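As an illustration of how such a file maps onto the classes above, the ActionScript 3 sketch below (assumed code, not eSeeTrack's) uses the language's built-in E4X support to read the fixation elements into plain objects; the real tool builds ETUser and ETFixation instances instead:

    // Parse an <all_users> document into {name, fixations} objects.
    function parseFixationSet(xml:XML):Array {
        var users:Array = [];
        for each (var u:XML in xml.user) {
            var fixations:Array = [];
            for each (var f:XML in u.all_fixations.fixation) {
                fixations.push({
                    tag: String(f.@tag),
                    startFrame: int(f.@start_frame),
                    endFrame: int(f.@end_frame),
                    x: Number(f.@x_pos),
                    y: Number(f.@y_pos)
                });
            }
            users.push({name: String(u.@name), fixations: fixations});
        }
        return users;
    }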


4.6.3

Client Side

The main GUI components of eSeeTrack are closely related; an action performed on one component may trigger another component to perform a task. In this section, the linkage between the main components of eSeeTrack is explained in detail.

When fixation data is loaded into eSeeTrack, the system provides a list of participants to allow a user to form 1 to 6 groups of participants to be analyzed. The number and the order of participants in each group are also controlled by the user. Once the groups of participants are formed, the system loads the fixation data into an ETUserList vector object according to the parameters set previously by the user. Each element in the vector represents a group of participants. Next, the system sends the ETUserList vector to the ETTimelineContainer object, a container that holds the timelines for all the groups of participants. It creates a number of ETTimeline objects (i.e., timelines) according to the number of elements in the ETUserList vector. Then, it passes each vector element to the corresponding ETTimeline object to create a series of colored bands, each of which is a Sprite object. Besides loading fixation data into the timelines, the system passes the ETUserList vector to the ETTagFilterGrid (i.e., tag filter) object to create a filter and a bar chart for each tag.

When the time window, built with a Sprite object, is created inside the ETTimelineContainer object, the system first checks the indices of the first and the last fixations within the time window for each timeline and stores these indices in two ETEndRange vector objects (one for the starting indices and one for the ending indices). Then, it requests the fixation count of each tag from the ETUserList vector object. Next, the system passes the counts to the ETTagFilterGrid object to update the bar chart. The creation of the time window also triggers the ETTagListContainer objects to populate fixation blocks in the detailed timelines.
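A minimal sketch of this index computation for one participant's temporally ordered fixation list (hypothetical helper name; the real tool stores the results in the ETEndRange vectors):

    // Return the indices of the first and last fixations overlapping the
    // window [winStart, winEnd], or -1 for both if none overlap.
    function windowIndices(fixations:Array, winStart:Number, winEnd:Number):Object {
        var first:int = -1, last:int = -1;
        for (var i:int = 0; i < fixations.length; i++) {
            if (fixations[i].endFrame < winStart) continue;  // entirely before window
            if (fixations[i].startFrame > winEnd) break;     // entirely after window
            if (first == -1) first = i;
            last = i;
        }
        return {first: first, last: last};
    }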


When a tag block in the detailed timeline is selected, the system constructs a tree visualization rooted at the selected tag (i.e., the root tag) with the parameters set in the control section. Constructing a tree visualization involves several steps:

1. The system passes an empty tree (i.e., an ETTreeStructure object), the root tag, the control section's parameters, and the starting and ending fixation indices to each element in the ETUserList vector.

2. The system finds all the fixation sequences within the starting and ending indices that satisfy the two criteria mentioned in section 4.4, by going over the ETFixationList object of each ETUser object in an ETUserList object.

3. The system aggregates the sequences into the tree (a simplified sketch of this step follows).

Nodes of the tree are built with ETCompositeNode objects. An ETCompositeNode object holds a vector of ETNode objects. An ETNode object is an extension of a Sprite object. Once the tree is constructed, the system sends the ETTreeStructure object to the ETTreeVisContainer object. The ETTreeVisContainer object recursively goes over each node of the tree to set the relative label sizes of the ETNode objects, the x and y positions of the ETCompositeNode objects, and the edges between nodes.

4.6.4

Server Side

The server side was implemented in PHP. It provides four functionalities to handle the menu operations:

1. To retrieve all the filenames stored in the folders containing fixation set, video and/or state files.


2. To upload a fixation set file or a video file to the server.

3. To save a state file on the server.

4. To load a state file from the server.

When a user selects the “load”, “save” or “download” operation from the menu, the client side first sends a request to the server to fetch the names of the files stored on the server, and displays them in the GUI for selection.


Chapter 5

Case Studies

In order to demonstrate the capabilities of eSeeTrack, we worked on two case studies with different data sets. The first data set involves eye-tracking data from a surgical simulation with novice medical users and expert surgeons. The second set contains eye-tracking data of customers in a chain of retail clothing stores. Both data sets were collected for real eye-tracking studies. For the sake of client confidentiality, the second data set was revised to make it anonymous: the data was collected in a real retail chain that does not sell clothing, and the category (tag) names were then remapped to equivalent objects in a clothing store.

5.1

Case 1: Surgical Simulation

5.1.1

Background description

Laparoscopic surgery is a minimally invasive procedure in which the surgeon makes only small incisions and uses a camera and light mounted on the end of a tool (the laparoscope) to view the interior surgical site. This view is displayed on a monitor. Due to the instruments' complicated operation and the high cost of mistakes, physicians require a lot of specialized training. As a result, surgical simulations are used to help novices become adept with the tools. It is known that experts exhibit more systematic eye-gaze patterns than novices during surgical simulation [12, 19].

5.1.2

Data and Hypothesis

Two groups of four users participated in a surgical simulation study. A complete description of the study may be found in [19]. The first group consisted of expert surgeons and surgical residents, and the other group was made up of novice users. Each participant performed two simulations. Fixations were categorized as being on the lap screen, vital screen, or other objects. A description of these categories can be found in Table 5.1.

Tag: Description
Lap screen: Monitor displaying the laparoscope camera view
Vital screen: Display of a patient's vital signs
Other: Any other object

Table 5.1: Fixated Object Categories in the Surgical Simulation Study

The experimenters for this study had already completed their analysis before we began examining the data using eSeeTrack. Their analysis consisted of statistical computations as well as qualitative viewing of the eye-tracking videos. Our goal was to see whether our visualization tool could identify verifiable findings and patterns more quickly than these time-consuming forms of analysis. We were initially uninformed about the hypotheses and findings of the experimenters, except that we knew to expect different patterns among novices and experts, and thought that novice users might be more likely to focus on objects other than the lap and vital screens (e.g., their hands). We later verified our findings by comparing them to the experimenters’ hypotheses and analysis.


5.1.3

Analysis

Fixation patterns of length 5 were created for analysis. By observing the timelines and the bar chart in Figures 5.1 and 5.2, and the associated summary statistics (Table 5.2), we can see that more than 90% of fixations are on the lap screen in both groups. The bar chart indicates that novice users were more likely to look at “other” than expert users, but they rarely fixated on the “vital screen” while the expert users focused on it more often. These two observations demonstrate the novices’ lack of experience because they were less concerned about the vital status of the patient and they looked at unimportant things. This finding had been predicted by the experimenters. Thus, eSeeTrack enabled medical staff to quantify and validate their predictions.

Figure 5.2: Tag filter and bar chart with surgical simulation data. In each category, the top bar shows the fixation counts of experts and the bottom bar represents the ones of novice users.

The tree visualization showed that, among all the patterns that began or ended with “lap screen”, the most common fixation pattern for both expert and novice users was a sequence of five “lap screen” fixations (not shown). This result was expected, since the majority of fixations were on the lap screen and the user must focus on the lap screen to complete the primary surgical tasks.

Some interesting findings emerged in the visualization when analyzing the patterns that begin or end with “vital screen” or “other” (see Figures 5.3 and 5.4).

Tag            Experts            Novices
               Count    %         Count    %
Lap screen     896      90.4      865      97.2
Other          3        0.3       21       2.3
Vital screen   92       9.3       4        0.4
Total          991      100.0     890      100.0

Table 5.2: Fixation Count and Percentage of Each Fixation Object in Surgical Simulation

Expert surgeons switched back and forth between the “lap screen” and the “vital screen”, while novices looked at the “vital screen” for a while and then the “lap screen”, or vice versa. This implies that the expert surgeons were more conscious of the vital status of the patient and could gain that knowledge through a quick glance. These patterns can be seen in the tree visualizations shown in Figure 5.3, where there are many more paths for experts (shown in color) than for novices (shown in gray shadow). Only 0.3% of fixations (3 out of 991) were “other” in the expert group, as compared to 2.3% (21 out of 890) for novices. Although 2.3% does not seem very large, in a surgical situation this could have a huge impact on the patient. The tree visualizations (Figure 5.4) reveal that once the novice users fixated on “other”, they were more likely to revisit “other” again in subsequent fixations. By contrast, the expert users never looked at “other” more than once in a row. This finding demonstrated that the experts were more task-focused, with distractions being shorter and less frequent. In addition, it was interesting to note that “vital screen” never immediately preceded or followed “other”.


(a) Tree visualization of surgical simulation data showing the fixation patterns ending with “vital screen”.

Figure 5.3: Tree visualizations of surgical simulation data; the trees are rooted at “vital screen”. Elements in the fixation patterns of experts are in red and green. The ones of novices are in gray shadow.


(a) Tree visualization showing the fixation patterns ending with “other”.

Figure 5.4: Tree visualizations rooted at “other”. Note that experts (shown in red and blue) have only 1 pattern. Patterns in gray shadow belong to novices.


Most of our findings had been hypothesized by the experimenters and identified in their analysis. However, ordered patterns of fixation tags were very difficult and time-consuming for the experimenters to assess, since the only way to find them was to sequentially view the videos. Thus, the experimenters could only qualitatively observe that experts looked back and forth between the vital and lap screens more frequently. Specific ordered patterns could not be identified, and the relative frequency of patterns could not be quantified. We therefore expect that our visualization could be very helpful for initial exploratory analysis.

5.2

Case 2: Retail Store Chain

5.2.1

Background description

One of the main tasks of marketers is to assess consumer behaviors and to develop appropriate strategies to maximize profit. They use eye-tracking systems to study customer behavior in an attempt to gain a better understanding of buyers’ decision-making processes.

In retail stores, posters are used intensively to advertise new arrivals and/or articles on sale. Marketing teams may design printed ads that intentionally minimize legally required information or details such as the terms and conditions of a sale. For example, while promotional phrases on a poster are printed in a large font to attract a customer's attention, conditions of the sale are usually printed in tiny characters or in a color similar to the background. In order to verify whether the ads direct attention as expected, marketers may use eye-tracking systems.

5.2.2

Data and Hypothesis

Eye-tracking data were collected from two groups of customers while they were shopping in the stores. The first group consisted of 16 customers and the second group included 22 customers. A detailed explanation of the fixation categories is found in Table 5.3.

Tag: Description
Clothing: Clothing displayed on models or on posters
Item description: Information about a product such as its style, its make, the country of manufacture, the care labels, the fabric types, etc.
Poster: A large printed placard that advertises certain articles and/or their sales
Price tag: Price of a piece of merchandise shown on an attached label or on a poster
Sales condition: Terms and condition clauses for acquiring a sale (on a poster)
Sales promotion: Phrases on a poster to advertise a rebate
Salesperson: Person who sells merchandise in a store

Table 5.3: Fixated Object Categories in the Retail Store Study

The first goal of this case study was to examine whether two stores carrying the same merchandise but having two different display setups led to different customer behaviors. Store managers hoped that customer fixation patterns would be similar in both stores even though the stores’ arrangements were different.

The second goal of the case study was to verify that customers did not notice condition clauses associated with a promotion. This information was intentionally deemphasized so that customers would disregard it. Therefore, we expected to see few patterns containing sales conditions.


5.2.3

Analysis

After analyzing the fixation data with our tool, we concluded that there were only minimal differences in fixations for the two retail stores. This can be observed in the bar chart in Figure 5.5 and descriptive statistics of Table 5.4. As shown in the figure, most categories have a similar number of fixations in both stores, with the exceptions of clothing and sales promotion categories.

Figure 5.5: Tag filter and bar chart showing fixations in two retail stores.

We also constructed tree visualizations rooted at each of the top three categories. Four trees were constructed (not shown) because one of the top three categories was different in each store. The trees showed nearly all possible fixation patterns, suggesting that the store design and layout did not encourage users to gaze along any particular scanpath. In addition, there was no observable difference in the trees for the two different stores. Therefore, different store arrangements did not lead to different fixation patterns, addressing the first objective of the case study.

(58)

Fig-Item description 107 15.9 96 15.2 Poster 221 32.9 189 30.0 Price tag 86 12.8 66 10.5 Sales condition 14 2.1 2 0.3 Sales promotion 85 12.6 39 6.2 Salesperson 118 17.6 86 13.7 Total 672 100.0 630 100.0

Table 5.4: Fixation Count and Percentage of Each Fixation Category in Retail Store Chain. The Top 3 Categories in Each Store Are Highlighted.

ure 5.6. The first tree showed the fixation patterns ending in“sales promotion” and the second showed patterns beginning with “sales promotion”. In both trees, the most frequent pattern was a sequence of five “sales promotion” fixations as shown by the large labels that appear first in the tree. This demonstrates that the sales promotion was effective at maintaining a viewer’s attention for an extended period, as a marketer would hope. We also observed that none of the patterns in either tree contained “sales conditions”. In addition, incrementing the gap duration between fixations from 1 to 3 seconds (see Figure 5.7) did not show any patterns containing sales conditions. Therefore, this confirmed the hypothesis that customers did not notice condition clauses associated with a promotion.


(a) Tree visualization showing the fixation patterns before looking at “sales promotion”.

Figure 5.6: Tree visualizations with gap duration up to 1 second. Colored labels belong to store A and gray labels belong to store B. Note that “sales condition” does not appear in any of these patterns in either store.


(a) Tree visualization showing the fixation patterns before looking at “sales promotion”.

Figure 5.7: Tree visualizations with gap duration of up to 3 seconds. Colored labels belong to store A and grey labels belong to store B. Note that “sales condition” does not appear in any of these patterns in either store.


Chapter 6

Discussion

By combining a timeline and a WordTree-like visualization, our approach enables users to quickly explore short sequential patterns of eye-gaze fixations, and compare those patterns across multiple groups. The two case studies were conducted with real users in real environments to empirically test our visualization techniques. These results demonstrate that our approach can help users to quickly identify interesting and verifiable patterns. Frequent and infrequent tag sequences can be easily spotted in the tree visualization. The relative frequency of each fixated category can be easily obtained via the bar chart or the size of node labels. By adjusting the time window, users can quickly observe how these frequencies change across different times within a session.

During the case studies, we discovered some shortcomings in our tool, particularly as the number of participants or fixations increases. First, the detailed view can only show the ordering of fixations, not their durations or exact times. In addition, accessing the detailed information about a fixation via the magnifying glass can be cumbersome. These issues might be resolved by replacing the detailed timeline with a fisheye view or a zooming feature in the timeline, and by making more detailed information accessible on demand.


Secondly, a large tree may not fit within the available display space; this might be addressed by rearranging the tree layout so that non-leaf nodes are always visible or by using a zooming interface. Thirdly, it would be useful to align fixation patterns from different participants according to events or phases of their task (e.g., browsing vs. making a purchase in the retail scenario, or practice vs. real trials in a usability experiment), and to be able to select these time windows across all participants. Last, the number of possible fixation categories is limited with our approach because the number of colors that can be easily distinguished is small. In practice, we have found ten categories sufficient for most analyses we have done so far; however, analyses requiring substantially more categories may require a new approach.

Development of eSeeTrack was motivated by our research group’s previous challenges in examining sequences of eye-gaze fixations [20]. This type of analysis was difficult, time consuming, and very limited in scope with previous methods. Thus, by exposing time-ordered sequences of fixations, eSeeTrack enables a new type of analysis of eye-tracking data.


Chapter 7

Conclusion

eSeeTrack, an interactive prototype visualization, was designed to facilitate exploration and comparison of the sequential ordering of fixations in a static or a dynamic scene, which was difficult to perform with previous eye-tracking approaches. Three interesting aspects of eye-tracking data, namely duration, frequency and orderings of fixations, are integrated into eSeeTrack via the combination of a timeline and a tree-structured visualization. Through the size of node labels in the tree visualization, frequent patterns and outliers are easily determined. The capabilities of eSeeTrack were demonstrated via two case studies on surgical simulation and retail store chain data; eSeeTrack is an efficient and effective mechanism to rapidly query, explore and compare the orderings of fixations.

7.1 Generalizations

Our eye tracking example demonstrates that with some small modifications, WordTrees can visualize more than just text data. When combined with a timeline view, this approach can effectively identify common (and uncommon) sequences of events and relate those to exact time points when they occurred. Furthermore, shadowed labels allow patterns from two data sets to be compared within a single tree. This approach could be extended to other types of behavior analysis. Examples from previous work in visualization include observed states of animal behavior [3] and sequences of human activities extracted from diary studies [23].
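As a minimal sketch of this generalization (hypothetical Python with illustrative names, not the prototype's actual implementation), the structure behind such a tree can be built by merging categorical event sequences into a frequency-annotated prefix tree; a WordTree-like view would map each node's count to its label size:

class Node:
    """One event category; 'count' is how many sequences pass through this node."""
    def __init__(self, label):
        self.label = label
        self.count = 0
        self.children = {}  # child label -> Node

def build_prefix_tree(sequences):
    """Merge time-ordered event sequences into a frequency-annotated prefix tree.

    Sequences sharing a prefix share nodes, so each node's count is the
    frequency of the pattern leading to it.
    """
    root = Node(None)
    for seq in sequences:
        node = root
        for label in seq:
            if label not in node.children:
                node.children[label] = Node(label)
            node = node.children[label]
            node.count += 1
    return root

# Example with behavior-coded sequences (animal states, diary activities, ...):
seqs = [["rest", "groom", "feed"],
        ["rest", "groom", "move"],
        ["rest", "feed"]]
tree = build_prefix_tree(seqs)
print(tree.children["rest"].count)                    # 3: all sequences start with "rest"
print(tree.children["rest"].children["groom"].count)  # 2: two continue with "groom"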

7.2 Future Work

In future work, we plan to continue refining eSeeTrack, adding new features and solving the issues stated in Chapter 6. We would also like to extend eSeeTrack by developing a heat map-like display for dynamic scenes.

Figure 7.1 shows a possible future version of eSeeTrack. In this version, the detailed timelines are removed. Timelines are aligned vertically as shown in ①; each shows the fixation data of one participant. Sliders in a timeline serve as separators between the activities performed, or locations visited, by the participant during the eye-tracking session. A user can assign any participant to any group for further analysis of the fixation data by dragging the participant’s timeline to the lists in ②. Each group is assigned a color. The tag filter in ③ has the same functionality as in the current version of eSeeTrack. In ④, a list of panoramic scene thumbnail images is displayed. When an image is selected, it triggers a heatmap over that scene to be displayed in ⑤ under the “Frequency/Duration” tab. The saturation of the color assigned to a group represents the intensity of fixation frequency or duration in the color map. Similar to the current eSeeTrack, the “Ordering” tab displays the tree visualization and provides controls to adjust the tree. The tree visualization shows fixation orderings over the panoramic scene.
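As a rough sketch of how the proposed heatmap intensity could be computed (hypothetical Python; the function name, parameters, and Gaussian-splat choice are assumptions, not part of the prototype):

import numpy as np

def fixation_heatmap(fixations, width, height, sigma=25.0):
    """Accumulate a duration-weighted fixation density map over a scene.

    fixations: iterable of (x, y, duration) in pixel coordinates and seconds.
    Returns an intensity map normalized to [0, 1]; a group's hue would stay
    fixed while this intensity drives the color saturation, as proposed above.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width))
    for x, y, dur in fixations:
        # Gaussian splat centered on the fixation, weighted by its duration
        heat += dur * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    peak = heat.max()
    return heat / peak if peak > 0 else heat

# Two nearby fixations on a 320 x 240 panoramic thumbnail:
demo = fixation_heatmap([(120, 80, 0.4), (125, 82, 1.2)], width=320, height=240)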


Finally, we would like to deploy our tool for use in additional case studies to further evaluate its effectiveness, and consider its extension to other types of behavior analysis beyond eye tracking.


Appendix A

Design Decisions

Since the design of the eSeeTrack prototype went through several iterations of refinement, this appendix reviews the design choices made from the initial version to the current version of eSeeTrack. We begin by explaining the paper mockups that we created, then describe our implementation choices, and end with a description of the software prototypes.

A.1 Paper Mockups

The mockups of the eSeeTrack user interface were designed before the actual implementation took place. At this stage, we only considered visualizing the eye-tracking data of one participant. Thus, we focused on displaying three aspects of the data: duration, frequency and orderings of fixations. Since the raw eye-fixation data was stored as video, a timeline represented the length of a video containing the fixation data, similar to the timeline in a video player application. In addition, a timeline could display the temporal ordering of fixations in a video by placing bands in it. A band represents a fixation. The position of a band signified the moment the corresponding fixation occurred in the video, and the width of the band represented the duration of the fixation relative to the video length. In order to make the user aware of fixated objects, we considered displaying the associated thumbnail images. Each thumbnail image was extracted from the part of the video frame where the fixation occurred.

(a) UI mockup version I; the size of the thumbnail image represents the relative frequency of fixation; a bigger image size means a higher relative frequency of fixations.

(b) UI mockup version II; the contour color of the thumbnail image represents the relative frequency of fixation: red contour - high frequency of fixation, yellow - moderate frequency of fixation, and green - low frequency of fixation.

Figure A.1: The first two versions of the eSeeTrack user interface mockup; both consist of a timeline and a collage visualization. A band in the timeline signifies that a fixation occurred.
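The band mapping just described (position encodes onset time, width encodes duration relative to the video length) can be sketched in a few lines; this is a hypothetical Python helper under those assumptions, not code from the prototype:

def band_geometry(fixation_start, fixation_duration, video_length, timeline_width):
    """Map one fixation onto a timeline band, as described for the mockups.

    The band's x position encodes when the fixation occurred in the video
    and its width encodes the fixation's duration relative to the video
    length. Times are in seconds; the result is in pixels.
    """
    x = (fixation_start / video_length) * timeline_width
    w = (fixation_duration / video_length) * timeline_width
    return x, max(w, 1.0)  # keep very short fixations at least 1 px wide

# A 0.3 s fixation starting at 12 s into a 60 s video, on an 800 px timeline:
print(band_geometry(12.0, 0.3, 60.0, 800))  # (160.0, 4.0)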

A.1.1 Version I and II

At the earliest stage of design, two different versions of the user interface were designed, shown in Figure A.1. They included a timeline and a collage visualization with different layouts of thumbnail images. When a user selected a particular region in the timeline (i.e. a time window, shown with a red contour in the timeline), the collage visualization displayed the thumbnail images of all fixated objects within that time range. In mockup version I (see Figure A.1a), the relative frequency of fixation on an object was expressed via the size of the associated thumbnail image; the images were scattered all over the collage visualization. In mockup version II (see Figure A.1b), the contour color of a thumbnail image symbolized the relative frequency of fixation on the corresponding object; the collage visualization was organized as a matrix. In both mockup versions, hovering over a thumbnail image showed detailed information about the associated fixation in a tool tip.

Some issues were identified in these two initial designs. The ordering of fixations was not included in either version because there was no way to identify which object was fixated first. Although the ordering of fixations was somewhat revealed in version II by the placement of images in rows, this might lead to ambiguous interpretation: the ordering could be read in row or column order. Moreover, laying out the thumbnail images as shown in mockup version I might be problematic since it could require substantial computational effort.


Figure A.2: UI mockup version III; ① Timeline visualization with a time window (shown with a red contour); ② Tag list; ③ Tag-tree visualization; ④ Tag order; ⑤ Tag length; ⑥ Gap duration threshold.

A.1.2 Version III

Next, we came up with another mockup user interface design, shown in Figure A.2. A timeline was kept, but a tag list and a tag-tree visualization were added in this version. This version more closely resembled the current version of eSeeTrack. The tag list was similar to a detailed timeline and the tag-tree visualization was similar to a tree visualization. Once a user selected a desired region (i.e. a time window) in the timeline, the tag list was populated sequentially with the fixations within the time window. Each fixation was represented by a rectangular block, called a tag, which was labeled with the fixated object. The tags in the tag list could be ordered by the temporal ordering of fixations or by the frequency of fixation using ④ in Figure A.2. When a tag was selected in the tag list, the tag-tree visualization displayed a tree structure rooted at the selected tag (shown as the green-border rectangle in Figure A.2). This tag was called the root tag. All nodes of the tree were of equal size. Each node was represented by a tag. The tree showed the observed orderings of fixations both before and after the root tag.


The maximum allowable length of an ordering before or after fixating on the root tag could be specified with ⑤ in Figure A.2. The maximum gap duration between fixations could be adjusted using the slider shown at ⑥ in Figure A.2 to include more tags in the tag-tree visualization. All the tags in the tag list and the tag-tree visualization displayed the thumbnail images of their associated fixated objects.
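To make these two controls concrete, the following hypothetical Python sketch (illustrative names, not the prototype's code) extracts the orderings around a chosen root tag, truncating at a maximum length (⑤) and cutting sequences at gaps longer than a threshold (⑥):

def orderings_around_root(fixations, root, max_len=3, max_gap=1.0):
    """Collect the tag orderings before and after each occurrence of 'root'.

    fixations: time-ordered list of (tag, start_time, end_time) in seconds.
    A sequence is cut wherever the gap between consecutive fixations exceeds
    max_gap (the gap duration threshold) and truncated to max_len tags on
    each side of the root tag (the tag length).
    """
    before, after = [], []
    for i, (tag, _, _) in enumerate(fixations):
        if tag != root:
            continue
        seq, j = [], i
        # walk backwards while the inter-fixation gaps stay small enough
        while j > 0 and len(seq) < max_len and \
                fixations[j][1] - fixations[j - 1][2] <= max_gap:
            j -= 1
            seq.append(fixations[j][0])
        before.append(list(reversed(seq)))
        seq, j = [], i
        # walk forwards symmetrically
        while j < len(fixations) - 1 and len(seq) < max_len and \
                fixations[j + 1][1] - fixations[j][2] <= max_gap:
            j += 1
            seq.append(fixations[j][0])
        after.append(seq)
    return before, after

# Example: "signage" fixated between two "product" fixations, then a long pause:
fix = [("product", 0.0, 0.5), ("signage", 0.7, 1.1),
       ("product", 1.3, 1.8), ("price", 4.0, 4.4)]
print(orderings_around_root(fix, "signage"))
# ([['product']], [['product']]); the 2.2 s gap excludes "price"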

Problems existed with this mockup version. First, the different alignments of the ordering of fixations in the timeline and the tag list might burden users, since users would have to mentally translate the ordering from horizontal (in the timeline) to vertical (in the tag list). Second, the layout of the tag-tree visualization could mislead users in understanding the orderings of fixations because it combined the orderings of fixations ending with the root tag and the ones beginning with the root tag. Figure A.3 demonstrates the same tree as in the tag-tree visualization of Figure A.2, except with labeled tag nodes. The tree shows 2 orderings ending with the root tag and 3 orderings starting with the root tag. Since tags were connected, a user might interpret 6 different orderings whose second tag was the root tag: 1) A-root-C, 2) A-root-D-F, 3) A-root-E, 4) B-root-C, 5) B-root-D-F, and 6) B-root-E, while only 4 orderings existed: 1) A-root-C, 2) A-root-D-F, 3) B-root-D-F and 4) B-root-E.

Figure A.3: A tag-tree visualization with labeled tags.

A.1.3 Version IV

The version IV mockup solved the two problems identified with version III. As shown in Figure A.4, tags in the tag list were placed horizontally, and the tag-tree visualization displayed the orderings of fixations either before or after the root tag, with the orientation selectable via ⑦, rather than combining both directions in one tree. A menu bar (① in Figure A.4) provided support to import and export eye-tracking data to and from the program.

Figure A.4: UI mockup version IV; ① Menu bar; ② Timeline visualization; ③ Tag order; ④ Show tag image checkbox; ⑤ Tag list; ⑥ Tag-tree visualization; ⑦ Orientation of the tag-tree; ⑧ Tag length; ⑨ Gap duration threshold.

A.2 Software Prototypes

As shown in Figure A.5, the preliminary prototype was based on mockup UI version IV; some issues could be identified immediately. The timeline showed a series of narrow bands, and there was no way to find out what the fixated objects were unless the tag list displayed the associated tags within a time window, which was shown as a red-border rectangle in the timeline (here the labels were not inserted yet). The frequency of fixations on an object was completely lost in this version. Also, the widgets for adjusting the visual outputs were not located in the same panel.

Figure A.5: eSeeTrack prototype I; it followed the design of mockup UI version IV.

For these reasons, as shown in Figure A.6, we added color to the bands in the timeline and colored borders to the tags in the tag list; each color corresponded to a tag category. We transformed the tag-tree visualization into a WordTree to show frequency: each node became a colored label of a category, and the size of the label showed its relative frequency. All widgets for controlling outputs were placed on a panel located on the right. A filter was added, which served both as a legend, allowing users to identify the association between colors and tag categories, and as a filter to exclude undesired categories from the tag list and the tag-tree visualization. From this point on, we refer to a tag list as a detailed timeline and to a tag-tree visualization as a tree visualization.


Figure A.6: eSeeTrack prototype II. Bands in the timeline were colored. The tag-tree was modified to a WordTree to represent the frequency and the orderings of fixations within the time window. A tag filter was added.
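One plausible way to realize the label-size encoding introduced in prototype II, sketched in hypothetical Python (the scaling function actually used in eSeeTrack is not documented here, so this is only an illustrative choice):

def label_font_size(count, max_count, min_pt=9.0, max_pt=36.0):
    """Scale a tree-node label by its pattern frequency.

    A square-root scale keeps rare patterns legible while letting frequent
    ones dominate visually; the exact mapping in eSeeTrack may differ.
    """
    return min_pt + (max_pt - min_pt) * (count / max_count) ** 0.5

print(label_font_size(1, 25))   # about 14.4 pt for a rare pattern
print(label_font_size(25, 25))  # 36.0 pt for the most frequent pattern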


(a) Dot indicators below the timeline show the occurrences of a fixated object.

(b) Occurrences of a fixated object are highlighted by fading the colors of other bands.

Figure A.7: Comparison of two highlight effects in the timeline; a blue-border tag block was selected in the detailed timeline (not shown).
