
Does it work?

An evaluative study of usability evaluation methods

University of Groningen, Faculty of Arts

Communication & Information Sciences

Oude Kijk in 't Jatstraat 26

9712 EK Groningen

The Netherlands

Supervisor: dr. I.F. van der Sluis

Second assessor: M. Nissim, PhD

Manon van 't Hul

S2819562

June 30th, 2017

Master's thesis, MSc Computer Communication


Does it work?

Picture on cover page:

Drawing: H.C.M. van ‘t Hul (2017)

Quote: Pratchett, Terry (1996). Hogfather. UK: Victor Gollancz


Preface

Finally, it is finished! My masterpiece.. my magnum opus.. my chef d'oeuvre.. my tour de force.. A thrilling ride, which started in September 2014, when I thought, well, why not start with a Master's degree in Computer Communication? Easy to combine with family life and an almost full-time job, so it will take me no more than two years..

Well, so I thought.. But although the past three years were action packed, I have learned a lot, I have made new friends and I got to know the whole easy dinner segment at the supermarket up close. Only positive experiences!

When approaching the moment to start with the Master's thesis, I was fully immersed in being a student again, with deadlines that always seemed to be planned too early and way too much stuff to do. Thinking of a subject for my thesis was difficult. I knew I wanted to do something with usability, or user experience, but to my surprise that was not specific enough to start a thesis.

After dawdling around for a bit, concentrating on my courses first, I spotted a notification on Nestor. An internship opportunity at the Faculty of Arts, which could also be a subject for a Master’s thesis. Long story short, I got an internship and a Master’s thesis in one, lucky me. This is the end result, and I am proud of myself for achieving what I set out to do three years ago.

The last three years would not have been possible if not for the everlasting patience and support of my partner, Eelco, and my children, Aislynn, Dristan and Quinn. They had to cook and clean for themselves, the horror…

Also, without the flexibility of my employer, this whole exercise would not have been possible. I was lucky to be able to schedule my own working hours so I could ride my bike in between classes at the University of Groningen and my job at the Hanze University of Applied Sciences.

I would also like to thank Malvina Nissim, for always giving solid advice on usability theory and how to incorporate this in my thesis.

And last, lots of thanks to Ielka van der Sluis for supervising this project, her support and feedback and for answering all my questions while never letting me feel I asked a stupid one.

I hope you enjoy your reading (in general really, but specifically in this case: my thesis).

Manon van ‘t Hul


Summary

In Human-Computer Interaction, usability has been a well-known, tried and tested subject for almost 40 years. There are various methods available for conducting a usability evaluation of a system. Methods such as heuristic inspection, cognitive walkthrough and the think aloud protocol have become standards in the field of digital design. These methods are often combined and used at different moments in the stages of an iterative design process.

In this study, a selection of standard usability evaluation methods was used to assess the usability of a system, named the PAT Workbench, that was developed at the department of Communication and Information Science of the University of Groningen.

The goal of this study was to evaluate a specific combination of standard usability evaluation methods, within the iterative design process of design science research. The PAT Workbench was used as a case study, to test the added value of various usability evaluation methods. The study resulted in an overall evaluation method that consists of multiple usability evaluation methods and guidelines for evaluation of comparable systems.

The research question of this study is: "How do different usability evaluation methods, focussed on experts and users, contribute to the evaluation of a system during an iterative design process?" To answer the research question, two types of usability evaluations were conducted: an expert review and a longitudinal user study.

The expert review was conducted as part of the design cycle, where a system is developed, evaluated and improved in-house, before field testing with users. Part of this expert review was a heuristic inspection and a cognitive walkthrough. The heuristic inspection explored the interface of the PAT Workbench holistically while the cognitive walkthrough concentrated on three predefined top tasks of the PAT Workbench. After the expert review, improvements to the usability of the PAT Workbench were suggested.


The user tests were conducted twice. The first test was conducted after the beta release of the PAT Workbench, at the beginning of the course 'Multimodale instructieve teksten in de gezondheidscommunicatie'. Results of this test were used to make additional iterations. After the students had worked with the improved system for two months, the user study was repeated.

From this study it can be concluded that an expert review as part of the design cycle of design science research is a valuable usability evaluation method and has the following advantages:

- it is cheap,

- it offers a detailed system description,

- it helps to overcome obvious issues in a more expensive user evaluation.

The heuristic inspection yielded the most suggested improvements, while the cognitive walkthrough confirmed the findings from the heuristic inspection and did not reveal new issues.

During this study it became evident that testing with users reveals multiple aspects of the system which an expert may overlook, and can provide useful insights into time-based efficiency, error count and task completion. Testing with users is an effective and valuable method within an iterative design process.

However, some points of concern emerged:

- The test results were influenced by the iterative process (e.g. bugs, downtime due to updates), which resulted in a decrease in user satisfaction during the second test, where an improvement was expected.

- Implementation of the TAP was difficult because participants continuously needed reminders to verbalise their thoughts.

- The UMUX questionnaire returned results identical to the SUS questionnaire and did not add new insights.

In the end, the evaluation of the selected usability evaluation methods yielded a number of suggestions to take into consideration when choosing usability evaluation methods.

A heuristic inspection could be a sufficient expert method when evaluating a system on a smaller scale, as was the case for the PAT Workbench. Heuristic inspection is an extensive method that uncovers many usability problems; a cognitive walkthrough as an addition to a heuristic inspection delivered no new insights and had no added value during this study.

In longitudinal studies, added value in iterative tests may be gained from fresh participants in addition to the original ones.

In between tests, it is advised not to let participants use a beta version of the product for their own work. Test results concerning satisfaction were influenced during this study because participants were also actual users of the system, while the system was not fully functional.


For further evaluative research on comparable systems, a combination of the methods heuristic inspection, usability metrics, post-task questionnaires and a simple post-test questionnaire is suggested. In addition, an interview can be conducted to get more in-depth information from the participants. When testing a simple system on a small scale, those methods are sufficient to gather usability data.


Contents

Preface

Summary

List of tables and figures

1. Introduction ... 1

1.1 Reading guide ... 2

2. Theory ... 3

2.1 Design science research ... 3

2.2 Usability evaluation ... 4

2.3 The PAT Workbench ... 12

3. Case study: expert review ... 15

3.1 Methods expert review ... 15

3.2 Results expert review ... 17

3.3 Conclusion and suggested improvements expert review ... 24

3.4 Expectations for testing with users ... 28

4. Case study: testing with users ... 29

4.1 Methods testing with users ... 29

4.2 Results testing with users ... 34

4.3 Conclusion testing with users ... 44

5. Discussion ... 47

5.1 Conclusion ... 47

5.2 Recommendations for the PAT Workbench ... 49

5.3 Improvements for this study ... 50

Literature ... 51

Appendixes

Appendix A: Heuristic Inspection Checklist & Severity Rating PAT Workbench

Appendix B: Cognitive Walkthrough of the PAT Workbench

Appendix C: Suggested improvements

Appendix D: Task completion

Appendix E: Errors task 1, based on observation and TAP data

Appendix F: Errors task 2, based on observation and TAP data

Appendix G: Errors task 3, based on observation and TAP data

Appendix H: Task times

Appendix I + J: ASQ Questionnaires

Appendix K + L: SUS Questionnaires

Appendix M + N: UMUX Questionnaires

Appendix O: Interviews

Appendix P: Test scenario (in Dutch)


List of tables and figures

Figure 1: Design science research cycles ... 3

Table 1: Usability principles ... 6

Table 2: Four questions during a cognitive walkthrough, with original examples ... 7

Figure 2: The System Usability Scale ... 11

Figure 3: The Usability Metric for User Experience ... 11

Figure 4: The After Scenario Questionnaire ... 12

Figure 5: Menu structure of the PAT Workbench at the left side of a page ... 14

Table 3: Severity scores ... 15

Table 4: Actions for ‘add MI’ ... 16

Table 5: Actions for ‘Search MI’ ... 16

Table 6: Actions for ‘annotate MI’ ... 17

Table 7: Severity scores of problems in PAT Workbench ... 18

Table 8: Issues in the PAT Workbench with a severity score of 4 ... 19

Table 9: Issues in the PAT Workbench with a severity score of 3 ... 20

Table 10: Task 1: add MI ... 22

Table 11: Task 2: Search MI ... 23

Table 12: Task 3: Annotate MI ... 24

Table 13: Number of issues with a severity score of 3 or 4 ... 25

Table 14: Suggested improvements for issues with a severity score of 3 and 4 ... 27

Table 15: Operationalization of the construct usability in dimensions and indicators ... 30

Table 16: Materials ... 31

Table 17: Definitions of concepts within indicators ... 32

Table 18: which data is used to measure which indicator? ... 33

Table 19: Task completion Test 1 ... 35

Table 20: Task completion Test 2 ... 36

Table 21: Amount of errors Test 1 ... 37

Table 22: Amount of errors Test 2 ... 37

Table 23: Task times Test 1 ... 38

Table 24: Task times Test 2 ... 39

Table 25: SUS & UMUX scores for Test 1 & 2 ... 40

Table 26: ASQ scores Test 1 & Test 2 ... 41

Table 27: Positive and negative remarks during interviews after Test 1 ... 42

Table 28: Positive and negative remarks during interviews after Test 2 ... 43


1. Introduction

Usability is a subject that has been known in the field of Human-Computer Interaction since the 1980s (Sauro, 2013). According to Wikipedia: 'Usability is the ease of use and learnability of a human-made object such as a tool or device' (Usability, n.d.). As such, usability was first seen in the fields of psychology and ergonomics in the early 20th century. In 1943 Alphonse Chapanis introduced shape coding in the aircraft cockpit, after similarly shaped controls had caused runway crashes (Pew, 2010), and in 1954 Paul Fitts published a model that predicts human movement, which is known as Fitts' law and is still used in human-computer interaction today (Goktürk, n.d.). During the 80s and 90s, usability in system design became more prominent. The first CHI conference was held in 1983, and in 1987 Ben Shneiderman published the now standard work 'Designing the User Interface'. User questionnaires to assess usability, such as the SUS (Brooke, 1996) and the QUIS (Chin et al., 1988), were developed. In 1993 Nielsen published his book 'Usability Engineering', which quickly became a standard in the field of usability testing.

To evaluate the usability of a system, various methods such as heuristic inspection, cognitive walkthrough and the think aloud protocol have been developed since the early years of usability research. These methods are tried and tested and are used as standards in the digital design industry. Within an iterative design process, usability evaluation methods are used in different stages of the process, sometimes as a stand-alone method, sometimes combined with other methods.

In this study, a selection of standard usability evaluation methods is used to assess the usability of a system that was developed at the department of Communication and Information Science of the University of Groningen. The goal of this system, named the PAT Workbench, is to create a corpus of annotated multimodal instructions (MIs) for further research.

The PAT Workbench is a system that is still under development. According to Rubin & Chisnell (2008) there are various techniques or methods to evaluate usability, used at different phases while developing a system.

The goal of this study is to evaluate a specific combination of usability evaluation methods, within an iterative design process. In this study the PAT Workbench will be used as a case study, to test the added value of various usability evaluation methods. The study will result in an overall evaluation method that consists of multiple usability evaluation methods and guidelines for evaluation of comparable systems.


1.1 Reading guide

In Chapter 2 relevant theories are discussed, such as design science research, usability evaluation, expert review and testing with users.


2. Theory

In this chapter the theoretical framework of this study into interface evaluation methods is described. This framework builds on design science research as part of the iterative design process. In paragraph 2.1 design science research is explained. There are various methods to evaluate a system during the iterative design process. The objective of this study into usability evaluation methods is to assess what a selection of these evaluation methods contributes to the evaluation of a system. Paragraph 2.2 focusses on theory about this selection of usability evaluation methods, while paragraph 2.3 describes the PAT Workbench, which was used as a case study.

2.1 Design science research

Researching a system that is in development, and analysing the use of such a system during this development, is called design science research (Kuechler & Vaishnavi, 2008). Design science research is an iterative design process, in which improvements are made to a system under development. Hevner (2007) stated that there are three cycles within design science research: the design cycle, the relevance cycle and the rigor cycle (Fig. 1). These cycles will be discussed further in this paragraph.

Design cycle

The design cycle is the heart of the design science research process, in which the system is developed, from the first planning stages until the launch of the product. Within this design cycle, a system should be inspected before proceeding to field testing. Iterations, based on this primary inspection, should be executed before the system is released into the relevance cycle and the rigor cycle (Hevner, 2007). An example of a primary inspection could be an evaluation by experts.

Relevance cycle

The relevance cycle is rooted in the environment in which the developed system will operate. Requirements are formulated based on the application domain (e.g. what do users need, what does the client want, what are the technical specifications) and the system is field tested within this same environment.

According to Hevner (2007): "The results of the field testing will determine whether additional iterations are needed in this design science research project." A user evaluation with actual users is an example of field testing as part of the relevance cycle within design science research.

Rigor cycle

The rigor cycle is the scientific foundation of the development of a system with design science research. There are two aspects to the rigor cycle. First, scientific theories and methods are used in developing and evaluating a system. Second, experiences gained from the development and evaluation process contribute to new insights and are additions to the knowledge base.

2.2 Usability evaluation

When evaluating a system during the design science research process, the primary goal is to determine whether a system is usable by the intended user population to carry out the tasks for which it was designed; in other words, to test the usability. The ISO 9241-11 standard defines usability as "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use" (ISO, 1998). According to ISO/IEC 9126-4 the usability metrics consist of effectiveness, efficiency and satisfaction (ISO, 2016). The definitions ISO/IEC 9126-4 uses are:

- Effectiveness: The accuracy and completeness with which users achieve specified goals.

- Efficiency: The resources expended in relation to the accuracy and completeness with which users achieve goals.

Usability evaluation of a system in development is an important part of the iterative design science research process. Evaluation is part of the design cycle, as well as the relevance cycle. According to Nielsen (1994) there are four ways of evaluating a system's interface. Nielsen (1994) indicated the following four methods:

- "automatically (running an interface specification through a program);
- empirically (usability assessed by testing the interface with real users);
- formally (using exact models and formulas to calculate usability measures);
- informally (based on rules of thumb and the general skill and experience of the evaluators)."

Because of time, cost and technical constraints, the empirical and informal methods are in general the most used (Preece, Rogers & Sharp, 2015), and will be discussed further in this chapter. An often used empirical method is testing with users, while conducting an expert review is the most commonly used method for an informal evaluation. Several studies were conducted to see which method yielded the best results. Desurvire, Lawrence & Atwood (1991) studied expert reviews and concluded that these methods were the most reliable. Karat, Campbell & Fiegel (1992) compared empirical usability testing with the expert review method 'cognitive walkthrough' and discovered that empirical testing identified the largest number of problems. In their study into usability testing methods, Desurvire, Kondziela & Atwood (1992) researched the influence of the expertise of the evaluator, using the expert review methods 'heuristic inspection' and 'cognitive walkthrough', on predicting results of testing with users. They concluded that a heuristic inspection yielded better results when used by experts, compared to a cognitive walkthrough, due to the analysis of more dimensions of the interface. Jeffries, Miller, Wharton and Uyeda (1991) also compared different evaluation methods and discovered that the expert review method 'heuristic inspection' found more severe problems compared to testing with users or the expert review method 'cognitive walkthrough'. In this case study about the PAT Workbench, both methods 'expert review' and 'testing with users' are used. In the next paragraphs 'expert review' (2.2.1) and 'testing with users' (2.2.2) are discussed further.

2.2.1 Expert review

To identify potential usability problems without users, standard methods for an expert review are heuristic inspection and cognitive walkthrough, especially in the design and prototyping phase (Barnum, 2011). This phase of designing and prototyping corresponds with the design cycle of design science research, where a system will be developed, evaluated and improved, before testing in the field.

2.2.1.1 Heuristic inspection

A heuristic inspection assesses an interface against a set of ten usability principles (see Table 1). This set of principles was based on an analysis of 249 usability problems that Nielsen and his colleagues collected. According to renowned usability expert Jakob Nielsen, a "heuristic evaluation involves having a small set of evaluators examine the interface and judge its compliance with recognized usability principles (the 'heuristics')" (Nielsen, 1995b). When conducting a heuristic inspection, evaluators compare usability principles with a system interface while attempting to accomplish actual system tasks. This type of evaluation is particularly suitable when a system cannot be field tested, and is often used early in the design phase, to make a first assessment of the product and improve it accordingly. As the PAT Workbench is a system in development, especially at the start of this case study, a heuristic inspection will be conducted to assess usability problems and suggest improvements, as part of the iterative design process.

Table 1: Usability principles (Nielsen & Molich, 1990)

Visibility of system status: The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.

Match between system and the real world: The system should speak the users' language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.

User control and freedom: Users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.

Consistency and standards: Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.

Error prevention: Even better than good error messages is a careful design which prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.

Recognition rather than recall: Minimize the user's memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.

Flexibility and efficiency of use: Accelerators -- unseen by the novice user -- may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.

Aesthetic and minimalist design: Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.

Help users recognize, diagnose, and recover from errors: Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.

Help and documentation: Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large.


2.2.1.2 Cognitive Walkthrough

Besides heuristic inspection, a cognitive walkthrough is an expert review method which is also often used in usability evaluation (Preece, Rogers & Sharp, 2015). Just as a heuristic inspection, a cognitive walkthrough is well suited for evaluating a system during the design cycle, before field testing in the relevance cycle.

According to Nielsen & Mack (1994): "Cognitive walkthroughs involve simulating the user's problem-solving process at each step in the human-computer dialog, checking to see if the user's goals and memory for actions can be assumed to lead to the next correct action." A cognitive walkthrough is more focussed on specific tasks and explores user problems in more detail than a heuristic inspection (Preece, Rogers & Sharp, 2015). This method was designed by Wharton et al. (1994) to evaluate the learnability of a system interface, and consists of four established questions that are asked for each action the user has to perform to complete a task. In Table 2 these four questions are described and explained.

Table 2: Four questions during a cognitive walkthrough, with original examples (Wharton et al., 1994)

Will the user try to achieve the right effect?

For example, maybe their task is to print a document, but the first thing they have to do is select a printer. Will they know that they should be trying to get a printer selected?

Will the user notice that the correct action is available?

If the action is to select from a visible menu, no problem. But if it's to triple-click the printer icon, they may never think of it.

Will the user associate the correct action with the effect they are trying to achieve?

If there's a menu item that says, "select printer," things will go smoothly. Not so if the menu says "SysP."

If the correct action is performed, will the user see that progress is being made toward solution of their task?

If after selecting the printer a dialog box states that the "Printer is Laser in Room 105," great. Worst case is no feedback.

Although this method is widely used in usability evaluation, it is time consuming. Also, the first three questions are not always distinguishable for the users of this method; they can sometimes be experienced as ambiguous. To address this, a more streamlined method was developed by Spencer (2000), in which the first three questions have been collapsed into one question:

"Can you tell a credible story that the user will know what to do?"

Question four was also reformulated; "If the user does the step correctly, and <describe system response>, is there a credible story to explain that they knew they did the right thing?”

Since the four-question model by Wharton et al. (1994) costs more time, using the shorter version with two questions is preferable.

To conduct a cognitive walkthrough, the specific tasks that are explored have to be defined first. A way to evaluate the quality of a system is using top tasks (McGovern, 2010). Top tasks are the tasks that are paramount for users while operating the system. Top tasks are best identified by engaging customers or stakeholders (McGovern, 2015). When users or stakeholders are not available, another way to define the top tasks is to gather customer tasks from the organization. These top tasks can also be used when testing with users.

For this case study regarding the PAT Workbench, a cognitive walkthrough is also conducted, in order to assess the system on a user level. To do this, top tasks of the PAT Workbench will be defined and, while performing the tasks, for each action the researcher will answer two questions according to the method by Spencer (2000).

2.2.2 Testing with users

Rubin & Chisnell (2008) stated that "user testing refers to a process that employs people as testing participants who are representative of the target audience to evaluate the degree to which a product meets specific usability criteria". Testing with users in the field is a crucial part of the relevance cycle of design science research. The methods used with user testing are various and are often combined within a test. Nielsen lists the following available methods for user testing:

- Performance measures
- Think Aloud Protocol
- Observation
- Questionnaires
- Interviews
- Focus groups
- Logging actual use
- User feedback

Focus groups, logging actual use (of a fully functional installed system) and user feedback are methods that are difficult to organize and have technical and time constraints, due to which these methods are not feasible within the timeframe of the case study regarding the PAT Workbench. The other methods (performance measures, Think Aloud Protocol, observation, questionnaires and interviews) will be used in the PAT Workbench case study. Performance measures, Think Aloud Protocol and (usability) questionnaires are specific usability methods, whereas observation and interviews are more general methods.


2.2.2.1 Performance measures

When conducting a usability test with users, it is common to record objective performance data like completion rate, time to complete and errors. With these data, usability metrics like effectiveness and efficiency can be calculated.

Measuring effectiveness

Effectiveness is the accuracy and completeness with which users achieve specified goals. Task completion data can be used to quantify this metric. On his blog, usability researcher Jeff Sauro calls task completion rate the fundamental usability metric (Sauro, 2011a). He also indicates that, based on his own analysis of 1189 tasks from 115 usability tests with a total of 3472 users, the average task completion rate is 78% (Sauro, 2011a). For a successful system, one should aim for a task completion rate that is 78% or higher. The completion rate indicates the ISO/IEC 9126-4 metric effectiveness, as shown in the equation below (Mifsud, 2015). If many participants complete a task successfully, a system is effective.
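The completion-rate equation referenced here did not survive extraction; a reconstruction of the standard formulation from Mifsud (2015), assuming each task attempt is scored as either completed or not, is:

\[
\text{Effectiveness} = \frac{\text{number of tasks completed successfully}}{\text{total number of tasks undertaken}} \times 100\%
\]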

Another measure of effectiveness is the number of errors. Errors correlate strongly with other metrics such as task time, task completion and satisfaction, and provide the reasons behind these metrics (Sauro & Lewis, 2009). According to Rubin & Chisnell (2008): "an error is defined as any divergence by a user from an expected behaviour". Errors can be categorized based on data from the user test, i.e. from the think aloud data and observations. Based on extensive research of 719 tasks in software, Sauro discovered that the average number of errors per task is 0.7 (Sauro, 2012).

Measuring efficiency

Efficiency could be calculated by subtracting the start time from the end time, but since efficiency according to ISO/IEC 9126-4 is defined as "resources expended in relation to the accuracy and completeness with which users achieve goals", time-based efficiency should be calculated as user effectiveness divided by the time spent by the user (Mifsud, 2015).

N = the total number of tasks (goals)
R = the number of users
n_ij = the result of task i by user j; if the user successfully completes the task, then n_ij = 1, if not, then n_ij = 0
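The time-based efficiency equation itself also did not survive extraction; a reconstruction following Mifsud (2015), where t_ij is assumed to denote the time user j spends on task i (its definition appears to have been cut off above), is:

\[
\text{Time-based efficiency} = \frac{\sum_{j=1}^{R}\sum_{i=1}^{N}\frac{n_{ij}}{t_{ij}}}{N \times R}
\]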


2.2.2.2 Think aloud protocol (TAP)

Think Aloud (sometimes referred to as Thinking Aloud) is an evaluation protocol designed to study the cognitive processes of participants. This can give valuable insights into preconceptions and expectations of the participants about the system. The TAP can also help detect the causes of errors the participants make. The method involves participants verbalizing every thought throughout the evaluation process. Nielsen (1993) stated: "Thinking aloud may be the single most valuable usability engineering method". Different approaches to the TAP have been proposed, varying from no involvement of the researcher (Ericsson & Simon, 1993) to helpful guidance when a participant has some trouble (Boren & Ramey, 2000). In a study into these different approaches, Krahmer & Ummelen (2004) concluded that participants were able to complete more tasks with the protocol proposed by Boren & Ramey, where the researcher provides guidance. However, it seemed that participants were influenced by the help from the researcher and this influence could pose problems for validity. Krahmer & Ummelen (2004) found no differences between the two approaches in the final evaluation of the quality of the system by the participants or in the number of detected navigational problems. Therefore, in this case study of the PAT Workbench, the model of Ericsson & Simon (1993) with no involvement of the researcher will be used.

2.2.2.3 Post-test questionnaires

A questionnaire is a frequently used tool in usability evaluation to measure satisfaction. There are various standardized questionnaires available, e.g. QUIS (Chin et al., 1988), CSUQ (Lewis, 1995), SUS (Brooke, 1996) and USE (Lund, 2001). Tullis & Stetson (2004) compared five questionnaires (SUS, QUIS, CSUQ, Words and a questionnaire they developed themselves) to see how effective the questionnaires were with different sample sizes. They conducted an online study with 123 participants who performed two tasks on two assigned websites. After finishing the tasks, participants randomly received one of the five questionnaires. Tullis & Stetson concluded that, based on the frequency distributions from these five questionnaires, the System Usability Scale (SUS) was the most reliable questionnaire across sample sizes. This questionnaire will be used as a method to measure satisfaction in the case study of the PAT Workbench.

Data collected with SUS are ratings or rankings of a system, especially in regard to the ease of use of the application. According to www.usability.gov: "SUS yields a single number representing a composite measure of the overall usability of the system being studied." When the SUS is used, participants are asked to score the following ten items (Fig. 2) with one of five responses that range from Strongly Agree to Strongly Disagree.

1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.

Figure 2: The System Usability Scale (Brooke, 1996)
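To illustrate how the single SUS number is derived (standard Brooke scoring; this sketch is an editorial illustration and not part of the thesis), odd items contribute (response - 1), even items contribute (5 - response), and the sum is multiplied by 2.5 to yield a 0-100 score:

def sus_score(responses):
    """Compute a SUS score (0-100) from ten item responses coded 1-5.

    Standard Brooke (1996) scoring: odd items (positively worded) contribute
    response - 1, even items (negatively worded) contribute 5 - response;
    the sum of the contributions (0-40) is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so even index = odd item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# Example: one fairly positive participant
print(sus_score([4, 2, 4, 1, 4, 2, 5, 2, 4, 2]))  # 80.0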

In the case study about the PAT Workbench, usability is assessed according to the ISO definition. Finstad (2010) proposed a questionnaire, the Usability Metric for User Experience (UMUX), to assess the perceived usability of a system based on the ISO 9241-11 definition of usability. When the UMUX is used, participants are asked to score the following four items with one of seven responses that range from Strongly Agree to Strongly Disagree. Finstad tested the UMUX against the SUS and concluded that both scales correlated and that the UMUX was reliable in measuring the underlying construct. Because the UMUX questionnaire relates to the ISO definition of usability used in this study, this questionnaire will be applied as well.

1. This system's capabilities meet my requirements.
2. Using this system is a frustrating experience.
3. This system is easy to use.
4. I have to spend too much time correcting things with this system.

Figure 3: The Usability Metric for User Experience (Finstad, 2010)

2.2.2.4 Post-task questionnaire

Besides post-test questionnaires like the SUS and UMUX, post-task questionnaires are also used when testing with users. This type of questionnaire provides diagnostic information specifically about the task the participant executed using the system. Sauro and Dumas (2009) compared three post-task questionnaires: the Usability Magnitude Estimation (UME), the Subjective Mental Effort Question (SMEQ) and the After Scenario Questionnaire (ASQ). They concluded that the After Scenario Questionnaire (ASQ) had a high correlation with the other measured data, like task completion and task time. It was also the easiest to use for the participants and the easiest to set up, compared with the other two questionnaires, UME and SMEQ. The ASQ will be used in the PAT Workbench evaluation to measure task performance satisfaction.


The ASQ consists of three questions with a seven-point Likert scale. When the ASQ is used, participants are asked, after each task, to score the following three items with one of seven responses that range from Strongly Agree to Strongly Disagree (Fig. 4).

1. Overall, I am satisfied with the ease of completing the tasks in this scenario.
2. Overall, I am satisfied with the amount of time it took to complete the tasks in this scenario.
3. Overall, I am satisfied with the support information (on-line help, messages, documentation).

Figure 4: The After Scenario Questionnaire (Lewis, 1995)
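As a small editorial illustration (not part of the thesis), a participant's ASQ result for a single task is commonly summarised as the mean of the three seven-point items, following Lewis (1995):

from statistics import mean

def asq_score(item1, item2, item3):
    """Summarise one participant's ASQ for a single task as the mean of the
    three 7-point items (ease, time, support information), following the
    common scoring of Lewis (1995)."""
    return mean([item1, item2, item3])

# Example: ASQ responses for one task
print(asq_score(2, 3, 4))  # 3.0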

2.3 The PAT Workbench

To evaluate the different methods for evaluating the usability of a system, a case study of the PAT Workbench was conducted. In this paragraph the PAT Workbench is described.

2.3.1 Description of PAT Workbench

According to Van der Sluis, Kloppenburg & Redeker (2016): "the PAT Workbench is an online tool that was built to facilitate the annotation, storage and retrieval of multimodal instructions (MIs) collected by master students in Communication and Information Science at the University of Groningen." Multimodal instructions are instructions that consist of text as well as pictures. Each year approximately 190 annotated instructions are added to the corpus in the PAT Workbench. The system is used in a course about MIs, and as a tool to facilitate empirical studies on the effectiveness of text-picture combinations. With the PAT Workbench users can annotate, store and retrieve MIs, based on a coding scheme that was developed for this purpose. It is possible for annotators to collaborate within the PAT Workbench and inter-annotator agreement is supported.

The system is designed using Bootstrap and a MySQL database is used to facilitate data creation and manipulation (Van der Sluis, Kloppenburg & Redeker, 2016). The annotated documents are stored in a separate directory, where every document is linked by an identification code to the MySQL database. Metadata is also stored with each document entry in the database.

2.3.2 Goal of PAT Workbench

In the PAT Workbench, users can perform a small set of actions: find MIs, upload MIs, annotate MIs and collaborate with other users in doing this. In this case the goal of the PAT Workbench is: users of the PAT Workbench know how to use the application efficiently.

2.3.3 System specifications

Conceptually, the PAT workbench is employed to facilitate the process of annotating, storing and retrieving MIs. This process can be viewed as a pipeline in which different kinds of users can use the workbench for different purposes.


At the start of this pipeline, MIs are added to the PAT Workbench. Users search the internet or other media for MIs. These MIs are then uploaded to the workbench and enriched with functional metadata (such as the source, title, description, audience, organisation). This enables future users to re-use MIs that were added by others, effectively establishing a central 'hub' for MIs.

Users collaborate in the PAT Workbench to annotate MIs. The idea is that ownership is organised per MI (meaning that every time a user adds an MI to his collection, that user manages the MI) and that the MI is annotated by its manager and a collaboration partner. This results in two annotations. The manager can then specify the final annotation by comparing both annotations and making informed decisions, which is referred to as the gold standard.

Annotation categories are used to make the MI corpus searchable. Consider that one MI has various attributes (i.e. from the previously mentioned metadata, as well as numerous values to describe the text, pictures and text-picture relations of the multimodal instruction itself) and that every MI within the corpus of annotated MIs is annotated with the same set of categories. The PAT Workbench facilitates searching through the corpus of annotated MIs by filtering on these categories, allowing users to build detailed queries. MIs are added to the searchable corpus when a gold standard has been established for them. An administrator (system role) can unlock all of the MIs, making them available for other annotators again.

2.3.4 Interface structure of the system


Figure 5: Menu structure of the PAT Workbench at the left side of a page

2.3.5 Development of the PAT Workbench and design science research

The PAT Workbench was developed using an iterative process of develop – test – improve – test – etc. The different stages of evaluating and improving can be placed in the cycles of design science research.

Design cycle: part of the iterative design cycle of the PAT Workbench was a primary evaluation in the form of an expert review. Based on the expert review, alterations were made before conducting a user evaluation.

Relevance cycle: in this study two separate user tests will be conducted. The results from the conducted tests with actual users are used to improve the usability of PAT Workbench, both during and after the study, as a part of the relevance cycle within design science research.


3. Case study: expert review

To evaluate the interface of the PAT Workbench, two expert reviews were conducted: a heuristic inspection and a cognitive walkthrough. In this chapter the methodology of these evaluation methods and the results of these evaluations are described, and expectations for the user evaluation are formulated.

3.1 Methods expert review

This review was conducted as part of the design cycle, where the system was developed and evaluated, and subsequently improvements were made. The research question for the expert review was: "To what extent does the PAT Workbench suit its purpose and in what way can the design be improved?" To answer this question, a heuristic inspection and a cognitive walkthrough were conducted. The heuristic inspection explored the interface of the PAT Workbench holistically while the cognitive walkthrough concentrated on three predefined top tasks of the PAT Workbench.

3.1.1 Heuristic inspection

Because the PAT Workbench is a system in development, all ten stated usability principles (see Table 1) were used in the heuristic inspection, to get an overall view of the usability of the system. The inspection was done by one evaluator. The evaluator went through the whole system, trying out functionalities and system tasks, while filling in a heuristic inspection checklist (Appendix A). This checklist consisted of various checkpoints and problems, categorized by the ten usability principles. To further assess the encountered problems, severity scores (see Table 3) were added, and an assessment of the quality of the checkpoint ('good' or 'improve') was given.

Table 3: Severity scores (Nielsen, 1995c)

Score Description

0 I don't agree that this is a usability problem at all

1 Cosmetic problem only: need not be fixed unless extra time is available on project
2 Minor usability problem: fixing this should be given low priority
3 Major usability problem: important to fix, so should be given high priority
4 Usability catastrophe: imperative to fix this before product can be released

3.1.2 Top tasks

- Add MI
- Search MI
- Annotate MI

These tasks were then broken down into actions and used in a cognitive walkthrough.

3.1.3 Cognitive walkthrough

The cognitive walkthrough in this evaluation was conducted by one person who completed specific tasks, based on predefined top tasks for the application. For each top task, the actions the user had to perform to successfully accomplish the task were identified. The top tasks and sequence of actions the user has to perform to accomplish the top tasks are described in Table 4, Table 5 and Table 6.

Table 4: Actions for ‘add MI’

step Action

1 Go to homepage of PAT Workbench
2 Click on 'Add' in menu on the left
3 User clicks on tab 'upload'
4 Click on button 'choose'
5 Select file from drive
6 Click on button 'upload'
7 Fill out metadata on form
8 Click on button 'save'

Table 5: Actions for ‘Search MI’

step Action

1 Go to homepage of PAT Workbench
2 Click on 'Search' in menu on the left
3 Click on tab 'expansive search'
4 Enter keyword(s) in search box on page
5 Select filters (optional)
6 Click on button 'Search'
7 Click on tab 'result'


Table 6: Actions for ‘annotate MI’

step Action

1 Go to homepage of PAT Workbench
2 Click on 'Collection' in menu on the left
3 Choose MI from overview
4 Click on icon 'pencil'
5 Annotate MI with four tabs: function, text, picture and text-picture relation
6 Click on icon 'save'

During the walkthrough the evaluator verbalized her thoughts while performing the top tasks and recorded this on paper. This account of what happened was then used to ask two questions (Spencer, 2000) about every step in the task:

- 'Can you tell a credible story that the user will know what to do?'
- 'If the user does the step correctly, and <describe system response>, is there a credible story to explain that he knows he did the right thing?'

3.2 Results expert review

3.2.2 Heuristic inspection

For each problem encountered during the heuristic inspection, a severity score was assigned, according to the opinion of the evaluator (see Appendix A). In Table 7 the number of issues per heuristic principle is shown, including the number of issues for each severity score. The following severity scores were assigned:

0 = I don't agree that this is a usability problem at all
1 = Cosmetic problem only: need not be fixed unless extra time is available on project
2 = Minor usability problem: fixing this should be given low priority
3 = Major usability problem: important to fix, so should be given high priority
4 = Usability catastrophe: imperative to fix this before product can be released


Table 7: Severity scores of problems in the PAT Workbench (heuristics based on Nielsen & Molich, 1990)

Visibility of system status: 16 issues (score 0: 0, score 1: 2, score 2: 7, score 3: 6, score 4: 1), total severity score 38
Match between system and the real world: 1 issue (score 0: 0, score 1: 0, score 2: 1, score 3: 0, score 4: 0), total severity score 2
User control and freedom: 3 issues (score 0: 0, score 1: 0, score 2: 1, score 3: 1, score 4: 1), total severity score 9
Consistency and standards: 7 issues (score 0: 0, score 1: 1, score 2: 1, score 3: 1, score 4: 4), total severity score 22
Error prevention: 3 issues (score 0: 0, score 1: 1, score 2: 0, score 3: 0, score 4: 2), total severity score 9
Recognition rather than recall: 3 issues (score 0: 0, score 1: 0, score 2: 2, score 3: 1, score 4: 0), total severity score 7
Flexibility and efficiency of use: 1 issue (score 0: 1, score 1: 0, score 2: 0, score 3: 0, score 4: 0), total severity score 0
Aesthetic and minimalist design: 10 issues (score 0: 1, score 1: 4, score 2: 4, score 3: 1, score 4: 0), total severity score 15
Help users recognize, diagnose, and recover from errors: 4 issues (score 0: 0, score 1: 0, score 2: 3, score 3: 1, score 4: 0), total severity score 9

A severity score of 3 or more is an indication of a problem that should be fixed immediately. Lower severity scores have less priority or don't have to be fixed, because they don't pose an immediate problem for the user. In Table 8 and Table 9, the issues in the PAT Workbench with a severity score of 3 and 4 are described. For more information about the issues with a lower score, see Appendix A.
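The 'total severity score' column in Table 7 is consistent with a tally of issue counts weighted by their severity scores; a small editorial sketch (not part of the thesis) of that calculation:

def total_severity(counts_per_score):
    """Weighted tally: sum of (number of issues at a severity score * that score).

    counts_per_score maps severity score (0-4) to the number of issues found
    with that score for one heuristic.
    """
    return sum(score * count for score, count in counts_per_score.items())

# Example: 'Visibility of system status' from Table 7
visibility = {0: 0, 1: 2, 2: 7, 3: 6, 4: 1}
print(total_severity(visibility))  # 38, matching the table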


Table 8: Issues in the PAT Workbench with a severity score of 4

Usability heuristics, checkpoints and description of problems with a severity score of 4 (Usability catastrophe: imperative to fix this before product can be released)

Visibility of system status

It is always clear what is happening from each action you perform.

Page search: Does not seem to work, there are no results. There is little feedback for the user that a search action was successful. When the user types a keyword into the search box, the outer lines of the box change from blue to black. This is hardly noticeable, the user can think he did not execute the action and try again. There is no feedback that the search action delivered results.

User control and freedom

It is easy to access all major portions of the site from the Home Page.

When on the homepage, it is not clear how to go to the annotating page. This is not an item in the menu while it is a top task.

Consistency and standards

Overall, the site behaves like one expects a web site to behave.

At the bottom of each page there is an icon of a cogwheel, where users can change their settings. Clicking on the wheel results in a PHP error.

In the same place, at the bottom, a logout icon is shown. Hovering over the icon, the following url pops up: https://mi-werkbank.webhosting.rug.nl/home/logout. Clicking on the icon results in logging out, without warning. The user gets a redirect to a page with a warning that the connection is not private: https://mindwise-groningen.nl/

A search action cannot be started by pressing Enter; the user has to click on the button 'zoeken'. The button 'add a missing PDF' is not working.

Error prevention Error-prone conditions are eliminated or users are presented with confirmation options before committing

When logging out via the icon at the bottom left of the page there is no message to warn user or confirm action


Table 9: Issues in the PAT Workbench with a severity score of 3

Usability heuristics, checkpoints and description of problems with a severity score of 3 (Major usability problem: important to fix, so should be given high priority)

Visibility of system status

It is clear what information is available at the current location.

Search and browse seem to be done in different collections. Users can search in annotated MIs and browse through the whole database.

The current information matches what you expect to find.

Homepage: At the bottom of the overview of 'my MIs' there is an indicator that there is more: 'meer…(14)'. However, all 14 MIs in the user's collection are shown on the home page.

Page 'collection': At the top there is a box that states '1/13', 'previous', 'last', etc. 'Next' and 'last' are more visible because white letters are used instead of grey ones. This implies that there are 13 MIs in this collection. However, there are only 5, and when the user clicks on 'next' the message 'there are no MIs that meet this search' appears.

It is clear where you can go from the current location.

This is not always clear, mainly because a lot of actions are hidden behind icons that are not visible or where it is not obvious to the user what an icon does.

It is always clear what is happening from each action you perform.

Page add: After adding a document and clicking on save there is no clearly visible feedback. When on a smaller screen (laptop) the user has to scroll to the top to see the text: ’loading…one moment’.

Page annotate: After saving, the user can easily miss the feedback shown in the box next to the save icon. This box is also used by the user to describe the changes that are made, and is not directly associated with feedback about the success of saving. Also, the text is difficult to see, being light grey on a dark grey background.

User control and freedom

It is easy to access all major portions of the site from the Home Page.

To go to specific MI in the collection of the user, from the homepage, the user can only get there by clicking on the icon for annotating (pencil). User will then be directed to the annotation page. It is not possible to just see an MI or to select an MI by clicking on an MI panel or title.

Consistency and standards

Overall, the site behaves like one expects a web site to behave.

(29)

21

Recognition rather than recall

Available actions are always clearly presented.

Annotation page: When the user is finished with annotating, it is not directly clear whether he can navigate away or has to actively save the changes. The changes have to be saved, but the option to do this is not clearly visible. The user can save the changes by clicking on a small diskette icon on the other side of the page (at the top). This is not consistent with rules of proximity in web design.

Aesthetic and minimalist design

The site is aesthetically pleasing.

On various occasions text does not flow or wrap properly. It spills out over its container and over other text or graphic elements like panels, buttons or icons. Also, word-break does not function properly. This is seen on the page with limit options in search: stappencorres-pondentie, hoofdresultate-n, waarschuwing-en.

Help users recognize, diagnose and recover from errors

If necessary, error messages are clear and in plain language.

Page Add, uploading own file: When the user clicks on the upload button without selecting a file first, a new page opens with the message 'There were rejected files (1): You did not select a file to upload'. There is also still code visible on this page.

Help and documentation If needed, a FAQ is available.


3.2.3 Cognitive walkthrough

In this paragraph an account is given of the cognitive walkthrough for the three described top tasks. A summary of the findings of the cognitive walkthrough is presented in Tables 10, 11 and 12. Appendix B gives a complete overview of the cognitive walkthrough and the findings.

Table 10: Task 1: add MI

Step 1: Go to homepage of PAT Workbench. Will the user know what to do? Yes, the page opens automatically. Does the user know he did the right thing? Yes, the title of the page is 'PAT Workbench / start'.

Step 2: Click on 'Add' in menu on the left. Will the user know what to do? Yes, the item 'Add' is visible in the menu. Does the user know he did the right thing? Yes, the title of the page is 'PAT Workbench / Add' and there is an explanatory text on the page.

Step 3: User clicks on tab 'upload'. Will the user know what to do? Yes, the page opens with tab 'upload' active. Does the user know he did the right thing? Yes, the tab is highlighted so the user knows it is the active tab.

Step 4: Click on button 'choose'. Will the user know what to do? Yes, how to upload is explained in the text on the site. Does the user know he did the right thing? Yes, a new window opens with the file manager application of the computer.

Step 5: Select file from drive. Will the user know what to do? Not relevant, takes place in the setting of the user's own computer. Does the user know he did the right thing? Yes, the system goes automatically back to the 'add' screen in the PAT Workbench and the chosen file is visible.

Step 6: Click on button 'upload'. Will the user know what to do? Yes, the button 'upload' is visible for the user and it is also explained in the text on the page. Does the user know he did the right thing? Yes, the user is directed to a new page. When no file is selected the user will get an error message.

Step 7: Fill out metadata on form. Will the user know what to do? No, it is not always clear which fields are compulsory and what should be filled in. Does the user know he did the right thing? No, there is no feedback for the user when filling in the fields.

Step 8: Click on button 'save'. Will the user know what to do? Yes, the button is clearly visible beneath the form.


Table 11: Task 2: Search MI

Step 1: Go to homepage of PAT Workbench. Will the user know what to do? Yes, the page opens automatically. Does the user know he did the right thing? Yes, the title of the page is 'PAT Workbench / start'.

Step 2: Click on 'Search' in menu on the left. Will the user know what to do? Yes, the item 'Search' is visible in the menu. Does the user know he did the right thing? Yes, the title of the page is 'PAT Workbench / Search' and there is an explanatory text on the page.

Step 3: Click on tab 'expansive search'. Will the user know what to do? Yes, page opens with tab 'upload' active. Does the user know he did the right thing? Yes, the tab is highlighted so the user knows it is the active tab.

Step 4: Enter keyword(s) in search box on page. Will the user know what to do? Yes, the search box is clearly visible. Does the user know he did the right thing? Yes, typing is directly visible.

Step 5: Select filters (optional). Will the user know what to do? Yes, filters can be easily selected by clicking on the selection controls in the form of radio buttons. Does the user know he did the right thing? Yes, if a filter is selected, the radio button is changed and has a black centre.

Step 6: Click on button 'Search'. Will the user know what to do? Yes, the button 'search' is clearly visible next to the search box. Does the user know he did the right thing? No, it is not immediately evident for the user that the search action was successful. The tab with results does not stand out enough to catch the eye of the user.

Step 7: Click on tab 'result'. Will the user know what to do? No, the tab with results is not visible enough for the user.


Table 12: Task 3: Annotate MI

Step 1: Go to homepage of PAT Workbench. Will the user know what to do? Yes, the page opens automatically. Does the user know he did the right thing? Yes, the title of the page is 'PAT Workbench / start'.

Step 2: Click on 'Collection' in menu on the left. Will the user know what to do? Yes, the item 'Collection' is visible in the menu, but it is not immediately clear you have to go to 'Collection' to annotate an MI. Does the user know he did the right thing? Yes, the title of the page is 'PAT Workbench / Collection'.

Step 3: Choose MI from overview. At the moment of this cognitive walkthrough, this function of the PAT Workbench was not working properly. It was not possible to do a complete cognitive walkthrough for the remaining steps:

Step 4: Click on icon 'pencil'.

Step 5: Annotate MI with four tabs: function, text, picture and text-picture relation.

Step 6: Click on icon 'save'.

3.3 Conclusion and suggested improvements expert review

The research question for the expert review was: "To what extent does the PAT Workbench suit its purpose and in what way can the design be improved?" To answer this question, through a heuristic inspection and a cognitive walkthrough, problems occurring in the PAT Workbench were identified and improvements were suggested.

3.3.1 Heuristic inspection


Table 13: Number of issues with a severity score of 3 or 4

Visibility of system status: 6 issues with severity score 3, 1 issue with severity score 4
User control and freedom: 1 issue with severity score 3, 1 issue with severity score 4
Consistency and standards: 1 issue with severity score 3, 4 issues with severity score 4
Error prevention: 0 issues with severity score 3, 2 issues with severity score 4
Recognition rather than recall: 1 issue with severity score 3, 0 issues with severity score 4
Aesthetic and minimalist design: 1 issue with severity score 3, 0 issues with severity score 4
Help users recognize, diagnose, and recover from errors: 1 issue with severity score 3, 0 issues with severity score 4

Regarding the main problems for visibility of system status:

- Icons were not visible enough and could be missed by users.

- Search results opened in a tab on the same page where the search was conducted, not on a new page. This is not consistent with web design standards and users can get confused, not seeing the results immediately.

- There was not sufficient feedback, or it was not at the place where the user would expect it, e.g. when saving an annotation there was no system notification that the save was successful.

Issues with consistency and standards were mainly system errors or other elements that were not functional in the PAT Workbench.

A big issue with user control and freedom was the absence of the top task ‘Annotate MI’ in the menu of the PAT Workbench. Users have to click on a small icon in the MI tile to access the annotating page.

For error prevention, there were two fundamental problems;

- When clicking on the log out button, a user immediately exits the system. There is no notification or option to cancel the action.

- When adding an MI, it is not clear if the MI is saved, due to the message ‘loading…one moment’ that stays on screen. Users can get confused, will try again or exit the action, thinking it is not working, while this is not true

3.3.2 Cognitive Walkthrough

The cognitive walkthrough did not pose as many problems as the heuristic inspection. The top tasks are easy to accomplish by the user. The main problem for the user is, according to the walkthrough, a lack of feedback or visibility of feedback and guidance for the user.

Heuristics Number of issues with

severity score 3

Number of issues with severity score 4

Visibility of system status 6 1

User control and freedom 1 1

Consistency and standards 1 4

Error prevention 0 2

Recognition rather than recall 1 0

Aesthetic and minimalist design 1 0

Help users recognize, diagnose, and recover from errors 1 0


3.3.2.1 Feedback

After filling out metadata and saving it when adding an MI, there is no feedback: there is no new screen or message to indicate that saving was successful. It is advisable to give the user visible feedback about the success of saving the metadata.
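To illustrate what such feedback could look like, the sketch below shows one way a browser-based form could confirm a successful save. The element ids, the endpoint and the message texts are invented for this example and are not taken from the PAT Workbench.

```typescript
// Minimal sketch (assumed element ids and endpoint, not taken from the PAT Workbench):
// show explicit feedback near the save button once the metadata of an MI has been saved.
const form = document.getElementById('metadata-form') as HTMLFormElement | null;
const status = document.getElementById('save-status');

if (form && status) {
  form.addEventListener('submit', async (event) => {
    event.preventDefault();
    status.textContent = 'Saving…';
    try {
      // '/api/mi/metadata' is a hypothetical endpoint used for illustration only.
      const response = await fetch('/api/mi/metadata', {
        method: 'POST',
        body: new FormData(form),
      });
      status.textContent = response.ok
        ? 'Metadata saved successfully.'
        : 'Saving failed, please try again.';
    } catch {
      status.textContent = 'Saving failed, please check your connection.';
    }
  });
}
```

The essential point is that the user receives an explicit success or failure message at the moment, and in the place, where the result of the action is expected.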

When executing a search, it is not clear to the user that the action was successful; the results open in a tab on the same page that is not clearly visible. An improvement would be to conform to standard web conventions and open the results on a new page.
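For the search results, a minimal sketch of this convention is given below: after submitting a search, the user is taken to a dedicated results page rather than to a tab on the same page. The form id, field name and results URL are hypothetical.

```typescript
// Minimal sketch (assumed form id, field name and URL): after a search is submitted,
// navigate to a dedicated results page instead of revealing a tab on the same page.
const searchForm = document.getElementById('search-form') as HTMLFormElement | null;

searchForm?.addEventListener('submit', (event) => {
  event.preventDefault();
  const keywords = new FormData(searchForm!).get('keywords') ?? '';
  // '/search/results' is a hypothetical URL used for illustration only.
  window.location.href = `/search/results?keywords=${encodeURIComponent(String(keywords))}`;
});
```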

3.3.2.2 Guidance

When performing the top task 'Add MI', the user could get into trouble when filling out the metadata. It is not clear which fields are compulsory; there is only an asterisk after these fields.

Also, it is not clear what is meant by 'background information', what an OCR text is, or whether the user should do anything with it. A solution would be to give more information, e.g. to explicitly explain what the user should do.
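As a sketch of such guidance, the snippet below marks a field explicitly as compulsory and attaches a short explanation to it. The field id, the label handling and the explanation text are hypothetical and only meant to show the idea.

```typescript
// Minimal sketch (hypothetical field id, label and explanation text): mark a
// compulsory field explicitly and attach a short hint, so the user does not have to guess.
function markCompulsory(input: HTMLInputElement, explanation: string): void {
  input.required = true;

  // Say 'compulsory' in the label instead of relying on an asterisk alone.
  const label = document.querySelector<HTMLLabelElement>(`label[for="${input.id}"]`);
  if (label) {
    label.textContent += ' (compulsory)';
  }

  // A visible hint under the field explains what is expected.
  const hint = document.createElement('small');
  hint.textContent = explanation;
  input.insertAdjacentElement('afterend', hint);
}

const background = document.getElementById('background-information') as HTMLInputElement | null;
if (background) {
  markCompulsory(
    background,
    // Invented example text; the real instruction would come from the annotation manual.
    'Describe where the MI was found, for example a leaflet or a website.',
  );
}
```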

When searching for MIs, the user needs knowledge of the annotation scheme and of MIs in general. To help the user perform this action successfully, more guidance is advisable.

3.3.2.3 Does the PAT Workbench suit its purpose?

In conclusion, the PAT Workbench suits its purpose. It is possible to execute all the actions it was built for (annotation, storage and retrieval of MIs), and the top tasks should be relatively easy for the intended users to perform.

However, there are some problems which could influence the performance of the user. Mostly, these problems concern the heuristics 'visibility of system status' and 'consistency and standards'; most issues with a high severity score were found for these heuristics. Problems were also found during the cognitive walkthrough, mainly with regard to feedback and guidance. On these aspects the design of the PAT Workbench could be improved.

3.3.3 Suggested improvements

An important part of the design cycle, besides conducting a formative evaluation of the system, is to implement improvements based on that evaluation. The expert review identified several issues that are imperative for the usability of the PAT Workbench and should be fixed immediately. In Table 14 improvements are suggested for these issues. A more extensive report with suggestions for improvements on all the issues can be found in Appendix C. This report was shared with the developer of the PAT Workbench immediately after the expert review was conducted. Based on the suggestions in this report, some improvements were applied to the PAT Workbench, in particular:

- An annotation manual was added.

- Elements on pages were visually divided by adding coloured panels, also for better readability and clarity.

- The functionality of hovering over icons on MI tiles to access more information about an MI was added.

Unfortunately, it is not clear whether other changes were made to the PAT Workbench based on the suggested improvements, as this was not communicated by the developer.

Table 14: Suggested improvements for issues with a severity score of 3 and 4

Visibility of system status:

- Search: change the searchable collections to one (preferably), or make it clear to the user which collections are accessible via search or browse.

- Indicators of the available information, such as 'meer… (14)', '1/13', 'previous' and 'last', should be consistent with the information that is actually available.

- Make icons more visible, e.g. by using colours that have more contrast with the background.

- Give feedback that is consistent with web design standards: let results open on a new page after a search action is performed.

- Place the feedback ('loading… one moment') shown after adding a document and clicking save in the vicinity of the save button; this is where the user will be looking during this action.

- Give the user more visible feedback after saving an annotation, so he knows the action of saving was successful.

User control and freedom:

- Add the top task 'annotate' to the menu on the left of the page.

- Allow the user to view the MI completely before annotating, so he can make an informed decision whether he wants to annotate this MI. Make it possible to just view an MI or to select an MI by clicking on an MI panel or title.

Consistency and standards:

- System errors or other elements that are not working in the PAT Workbench should be fixed.

- Add the possibility to start a search with the enter key.

Error prevention:

- When logging out, give the user an option to change his mind, or let him actively confirm that this is really what he wants to do, instead of logging him out immediately.

- Add page: the message 'loading… one moment' shown after saving a document stays visible on screen. To the user it seems that saving is not working, although checking showed that MIs were saved regardless of the message. The user will try again or abandon the action, thinking it failed. The user needs appropriate feedback; this should be fixed immediately.

Recognition rather than recall:

- Make it clear to the user that he has to save after annotating an MI by moving the save icon to the spot where the user is looking when he is done annotating, where it is more visible.

Aesthetic and minimalist design:

- Fix the wrapping and breaking of text, and fix pages where code is still visible.

Help users recognize, diagnose and recover from errors:

- Make error messages clear: explain to users what they did wrong and how they can fix it.
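The two error prevention improvements in Table 14 can be illustrated with a short sketch: ask the user for confirmation before logging out, and replace the lingering 'loading… one moment' message with explicit feedback once saving has finished. The element ids, the endpoint and the wording are assumptions made for this example and do not reflect the actual implementation of the PAT Workbench.

```typescript
// Minimal sketch (assumed ids, endpoint and wording): confirm before logging out,
// and replace the lingering 'loading' message once saving has finished.
const logoutLink = document.getElementById('logout');
logoutLink?.addEventListener('click', (event) => {
  // Give the user a chance to cancel instead of logging out immediately.
  if (!window.confirm('Are you sure you want to log out?')) {
    event.preventDefault();
  }
});

async function saveDocument(formData: FormData): Promise<void> {
  const message = document.getElementById('save-message');
  if (!message) return;

  message.textContent = 'Loading… one moment';
  // '/api/mi' is a hypothetical endpoint used for illustration only.
  const response = await fetch('/api/mi', { method: 'POST', body: formData });

  // Do not leave the loading message on screen: tell the user what happened.
  message.textContent = response.ok
    ? 'The MI has been saved.'
    : 'Saving failed, please try again.';
}
```

Whether this is done with a browser dialog, as in this sketch, or with a custom notification is a design choice; the essential point is that the user can cancel the logout and is told explicitly whether saving succeeded.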


3.4 Expectations for testing with users

The results of testing with users are expected to give insight into the usability of the PAT Workbench as experienced by actual users, i.e. how well tasks are executed while working with the application. Based on the conclusions of the heuristic inspection (HI) and the cognitive walkthrough (CW), there are some expectations for the evaluative testing with users.

Visibility of system status (HI):

Participants will have trouble finding the page in the PAT Workbench where they can annotate an MI.

User control and freedom (HI) / visibility of feedback (CW):

Participants will have trouble with the placement and look-and-feel of icons, tabs, buttons and other elements. These will probably not be visible enough for them, making it difficult to locate functions.

Error prevention (HI) / visibility of feedback (CW):

Unclear or absent notifications will cause users to make errors.

Guidance (CW):

Users will have trouble filling in information, e.g. when adding or searching for an MI, due to the lack of guidance from the system.


4. Case study: testing with users

In this chapter, the methods used during the testing-with-users phase are described. Two user tests were conducted, based on the usability dimensions effectiveness, efficiency and satisfaction. The results of the two user tests are also described in this chapter.

4.1 Methods for testing with users

To evaluate the usability of a system on effectiveness, efficiency and satisfaction, testing with users is the principal method to discover user problems (Nielsen, 1993). Testing with users is also part of design science research and takes place in the relevance cycle, where a system is evaluated within the environment of the application domain.

4.1.1 Operationalisation & design

The test with users of the PAT Workbench has a within-subjects design. All participants were asked to perform the same three top tasks, in a realistic task order, to see if they encountered problems when executing these tasks. The user tests were conducted twice. The first test was conducted after the beta release of the PAT Workbench, at the beginning of the course 'Multimodale instructieve teksten in de gezondheidscommunicatie'. Results of this test were used to make additional iterations, after which the PAT Workbench was tested a second time with the same sample group, when the students had worked with the system for about two months.


Table 15: Operationalization of the construct usability in dimensions and indicators

Construct: Usability
Definition: The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use (ISO, 1998).

Dimension: Effectiveness
Definition: The accuracy and completeness with which users achieve specified goals.
Indicators: Number of errors the user makes when performing a task; number of tasks completed.

Dimension: Efficiency
Definition: The resources expended when users try to perform a task.
Indicator: Time it takes a user to complete (part of) a certain task successfully.

Dimension: Satisfaction
Definition: How much users like using the system.
Indicator: What users think of the PAT Workbench.
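To make these indicators concrete, the sketch below computes them for a small set of invented task observations. The numbers are purely illustrative and are not results of this study.

```typescript
// Illustrative only: invented observations, not data from the user tests.
interface TaskObservation {
  task: string;          // e.g. 'Add MI', 'Search MI', 'Annotate MI'
  completed: boolean;    // effectiveness: was the task finished?
  errors: number;        // effectiveness: number of errors made
  timeSeconds: number;   // efficiency: time needed for the task
}

const observations: TaskObservation[] = [
  { task: 'Add MI', completed: true, errors: 1, timeSeconds: 180 },
  { task: 'Search MI', completed: true, errors: 0, timeSeconds: 95 },
  { task: 'Annotate MI', completed: false, errors: 3, timeSeconds: 240 },
];

// Effectiveness: share of completed tasks and total number of errors.
const completionRate =
  observations.filter((o) => o.completed).length / observations.length;
const totalErrors = observations.reduce((sum, o) => sum + o.errors, 0);

// Efficiency: mean time on task, here taken over all attempts.
const meanTime =
  observations.reduce((sum, o) => sum + o.timeSeconds, 0) / observations.length;

console.log(`Completion rate: ${(completionRate * 100).toFixed(0)}%`); // 67%
console.log(`Total errors: ${totalErrors}`);                           // 4
console.log(`Mean time on task: ${meanTime.toFixed(0)} s`);            // 172 s

// Satisfaction (what users think of the PAT Workbench) is qualitative here
// and is therefore not reduced to a single number in this sketch.
```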

4.1.2 Participants

For this user study a purposive sample was used: nine students attending the course 'Multimodale instructieve teksten in de gezondheidscommunicatie' were asked to participate in the user tests. This year the PAT Workbench was used in this course for the first time, and the students became familiar with MIs during the course. An application form was available to sign up for the test, which took place in a lab room in the Harmonie building of the University of Groningen.

All participating students were female. In the first test, the participants were four bachelor students and five master students. In the second test, one bachelor student and three master students participated.

4.1.3 Materials
