
The influences of design paradigms and principles on user acceptance of an interactive system for adapting web applications – A case study

SUBMITTED IN PARTIAL FULFILLMENT FOR THE DEGREE OF MASTER OF SCIENCE

Nathalie Jansen

10890378

MASTER INFORMATION STUDIES: HUMAN-CENTERED MULTIMEDIA

FACULTY OF SCIENCE

UNIVERSITY OF AMSTERDAM

August 23, 2017

1st Supervisor: Dr. Radboud Winkels
2nd Supervisor: Dr. Frank Nack


The influences of design paradigms and principles on user acceptance of an interactive system for adapting web applications – A case study

Nathalie Jansen

University of Amsterdam
Student 10890378
nathaliejansen01@gmail.com

ABSTRACT

This research focuses on the influence of design paradigms and principles on the technical acceptance of an interactive system for adapting web applications. Through a usability test of three different interaction designs, it aims to provide a possible interaction framework that can be used for this kind of system.

Categories and Subject Descriptors

Human-centered computing

General Terms

Performance, Design, Human Factors

Keywords

User acceptance, design principles, graphical user interface, usability.

1. INTRODUCTION

According to CBS (Centraal Bureau voor de Statistiek), 92% of Dutch households had an internet connection in 2016. Of those people, 74% shopped online in 2016, and 27% bought an insurance or financial product (for instance a mortgage). Because more people buy online, more companies have started offering their products online. Ordering a product online is usually not complicated for the user: they select the product they want, fill in their shipping and billing information and proceed to the checkout. To offer a service online, such as an insurance or mortgage, it is necessary to ask more personal questions and calculate a premium based on the user's input. Once the premium is calculated, the user can decide whether or not to take out the insurance. This is a more complicated process for the user: users are concerned about their privacy and are more likely to abandon the order process before payment (TNO, 2015). This puts pressure on financial companies to optimize their online forms for better conversion.

In such a so-called 'application street', more is happening than the user perceives. For instance, the surface area and type of a house are retrieved via an API after the user fills in a zip code. The premium calculation and the processing of the application to the company's back-office also involve different services. An application street runs from the moment a customer starts the application to the (automatic) processing of the application in the company's back-office. All of this is developed either by the technicians within a company or by a third party. Bikkelhart is a company that makes these application streets available to companies. They see that marketers are constantly optimizing the application street for higher conversion, and marketers may not want to wait until the developers have time to implement the changes in the next release. For this problem Bikkelhart wants to create an interactive system with a graphical user interface, called the V-Next editor. The main function of V-Next is adapting web forms. This system makes it possible for marketers to adapt, optimize and test the already existing application streets without making changes to existing code. A small survey was conducted among the potential users about the requirements and goals they have for this editor. Based on those requirements, an interaction design was made based purely on human adaptation: the design is not based on underlying, proven design principles or theories, but on the experience of the company's user experience designer.
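To make the kind of service call described above concrete, the sketch below shows what such a zip-code lookup could look like. The endpoint, parameters and field names are invented for illustration; the paper does not describe the actual API.

```python
import requests

def lookup_house_data(zip_code: str, house_number: str) -> dict:
    """Illustrative only: fetch surface area and house type for an address.

    The URL and response fields below are hypothetical placeholders, not
    the real service used in the application street described above.
    """
    response = requests.get(
        "https://api.example.com/v1/addresses",  # placeholder endpoint
        params={"zipcode": zip_code, "number": house_number},
        timeout=5,
    )
    response.raise_for_status()
    data = response.json()
    return {"surface_m2": data.get("surface"), "house_type": data.get("type")}
```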

To arrive at an interaction framework with high usability and user acceptance, a usability test was conducted on three designs, each based on principles that should theoretically work best for a graphical user interface. According to Brooke (2013), the specific context of a system and its usability make it difficult to compare usability across different systems.

This paper is a case study that provides a possible interaction framework for an interactive system for adapting web applications. The method and research are explained, and finally an answer is given to the research question and hypothesis.


2. RELATED WORK

2.1 Design principles and paradigms

The focus of computer-centered design has shifted to human-centered design: a system needs to adapt to the user (Mandel, 1997). The interface of systems is also becoming more graphical. A user who is positive about an interactive system is more likely to use it and share the experience with others. Since the focus shifted to human-centered design, research has explored reusable design principles from other research fields and created new principles for specific types of interfaces and systems.

Hansen (1971) defined one of the first lists of principles for interface design, and other researchers have extended it, for example Galitz (2007), Mandel (1997), Shneiderman & Plaisant (2003) and Nielsen & Molich (1990). In addition to the principles defined by researchers, technology companies such as Apple and Microsoft publish their own guidelines and principles to take into account when designing for their platforms.

These principles are not always in line with each other, which makes it more difficult for interaction designers to pick the best principles for their own system. To provide international standards for the design process, ISO (International Organization for Standardization) metrics were introduced. For interaction design, ISO 9241-11 can be used, which defines usability as "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use". This is translated to:

• Effectiveness: whether users can complete predefined tasks using the application, and the quality of the output of those tasks.

• Efficiency: the level of resources users need when performing the tasks.

• Satisfaction: the subjective opinion of the user when using the application.

(ISO/IEC ISO 9241-11).
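To illustrate how the first two of these metrics could be quantified for this study's task-based test, the sketch below computes effectiveness (task completion rate) and efficiency (completed tasks per minute) for a few invented sessions; satisfaction is covered separately by the SUS questionnaire in section 5.3. The session numbers are invented.

```python
# Invented example sessions: (tasks completed, total tasks, minutes spent).
sessions = [(8, 9, 12.5), (9, 9, 10.0), (7, 9, 14.2)]

def effectiveness(done: int, total: int) -> float:
    """Share of the predefined tasks completed successfully."""
    return done / total

def efficiency(done: int, minutes: float) -> float:
    """Completed tasks per minute, a simple resource-use measure."""
    return done / minutes

for done, total, minutes in sessions:
    print(f"effectiveness={effectiveness(done, total):.0%}  "
          f"efficiency={efficiency(done, minutes):.2f} tasks/min")
```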

Note that using design principles does not guarantee the usability of a system. Because the principles are general, they need to be interpreted for the system being designed. Many principles have been defined, but the question is which of them can best be used for an interactive system whose main goal is adapting web forms. Loranger (2002) states that even though software changes fast, user behaviour does not, which keeps older, tested guidelines effective.

Dix et al. (2009) state that there are fifteen paradigms for interaction. For this research, the paradigms 'window systems and the WIMP interface' and 'direct manipulation' are of interest. WIMP (windows, icons, menus, pointer) is still the most used paradigm for user interaction (Hinckley, 2017). Direct manipulation is more of a 'what you see is what you get' interaction: the feedback on the user's action is immediate. A disadvantage is that there is no clear distinction between input and output.

2.2 Usability and context

When looking at the usability of an application, the context in which the application is used also needs to be taken into account. The context consists of the background and experience of the user, the environment in which the application is used, and the kind of task that needs to be achieved (Brooke, 2013). Loranger, Schade and Nielsen (2002) also mention that with a higher variety of applications, people spend less time with each one. The application must be understandable within moments; otherwise it will not be used effectively.

Therefore the application needs to be simple to use. Features must be presented in such a way that users instinctively know what to do with them. Users do not want to spend a lot of time learning how to use an application.

2.3 User requirements

Before further development of V-Next, a small survey was conducted among a handful of potential users about the requirements they have for this editor. The potential users were marketers in the financial branch and the e-commerce marketers of Bikkelhart. The following requirements came out of that survey:

• A/B testing
• Alter the texts of the question
• Alter the information textbox
• Alter the style of the form
• Alter the error/validation message
• Alter the order of questions
• Add or remove answer options
• Set a default answer
• Switch the type of control per question

3. RESEARCH QUESTION

The problem of not having a researched interface for a web-form-adapting application, combined with the literature study, has led to the following main research question: "What user interaction framework can best be used for an interface for adapting web forms, in order to gain user acceptance?"

3.1 Sub questions

These sub-questions need to be answered in order to answer the main question:

1. Which design paradigm and principles can best be used?

2. What are the pitfalls of the designs?

3. What is the technical acceptance for each paradigm and the principles used?


3.1.1 Definition of technical acceptance

Davis (1989) first defined the terms for a valid measurement scale to predict the user acceptance of new technology. Two main factors influence a person's decision to use a new information system or other technology:

• Perceived ease of use, defined as "the degree to which a person believes that using a particular system would be free of effort".

• Perceived usefulness, defined as "the degree to which a person believes that using a particular system would enhance his or her job performance".

3.2 Hypothesis

The Application Design Showcase held by Nielsen Norman Group in 2012 points out that, especially in an e-commerce environment, people need to see the difference between items. The main target group of the V-Next application is online marketers. The hypothesis is therefore that direct manipulation is the best paradigm to use.

4. PROTOTYPE DESIGN

The main goal of the research is to find an interaction framework for graphical user interfaces, specifically for adapting web forms. Based on the user requirements, three interaction designs based purely on human adaptation were made. The designs are hi-fi prototypes with the basic interaction functions needed to test the technology acceptance.

4.1 Technical Requirements

V-Next editor is a graphical user interface (GUI), which makes it possible to adapt an existing design.

Figure 1: flow of the tool V-Next

As figure 1 shows, the editor is only one small part of the application, and it is the only part visible to the potential users. The developer still needs to build the initial 'application street', in which the different servers are connected, the CSS is styled and the connection to the production website is made.

Then the user can edit the ‘application street’, as figure 2 shows.

Figure 2: flow for the user of the V-Next editor.

4.2 Design choices

All the prototypes are based on the same basic principles. The first two prototypes were built with two different paradigms, WIMP and direct manipulation, to test which of the two is best suited. The third prototype is based on the paradigm of prototype one or two, depending on which has the better usability according to this usability test. The difference between the paradigms is explained in paragraphs 4.2.3 and 4.2.4. The feedback of all the participants is also taken into account in the design of the third prototype, which focuses more on the principles than on the paradigm.

4.2.1 Basic principles

The Application Design Showcase lists design mistakes made in forms, drawn from cases. It mentions that people are accustomed to left-aligned labels and have more difficulty with right-aligned or unaligned labels. Therefore all labels in the prototypes are left-aligned. The research of Loranger et al. (2002) was done specifically for applications using Flash, but it states that many of the researched guidelines relate to the core essence of web-based functionality, so those guidelines can also be used for this application. One of the guidelines is: highlight selected areas. This guideline is applied when a question is clicked for editing; see appendix IV point 2 for how this looks in the prototypes. In all the prototypes the question that is editable is coloured grey; inactive questions and buttons stay white.

Another guideline of the Application Design Showcase is: provide scrollbars that have an indicator showing the relative position. This is used in all the prototypes to indicate the total length of the form, and it is only shown when a scrollbar is needed. The guideline 'use meaningful icons' provided by Loranger is also applied, to give the participants a faster understanding of the function of a button.

Galitz's (2007) principle of designing menus states that vertical menus are easier for the user to scan. However, vertical menus were not suitable for the top menu bar of the editor. Therefore the menu is horizontal but (as Galitz suggests) made more 'button-like' to give a better view of the possible choices. See appendix IV point 5.

4.2.2 Prototype one

The first prototype was made by the interaction designer of Bikkelhart and is based on the WIMP paradigm (windows, icons, menus and pointers); see appendix IV point 1. As Dix (2009) states, a human can do multiple things at the same time, so a system also needs to be able to manage multiple changes at the same time. In this prototype a question has multiple elements that can be changed within one session. The prototype is mostly based on menu selection.

According to Loranger et al. (2002), sound and animation in elements can have a positive effect on usability. Therefore, in the first prototype, when pressing the save button, the text 'saved' (with the time of saving) is bold for a couple of seconds. This guideline is described in their paper as: don't show gratuitous motion.

Galitz (2007) states that vertical menus are easier to scan for the right option. This principle is used in prototype one when activating a question; see appendix IV point 2. The left part of the screen shows the questions in the web form, with the grey-coloured question active for editing. The right part shows the editing options for that question, presented vertically for better scanning.

4.2.3 Prototype two

The second prototype is based on the direct manipulation paradigm; see for example appendix IV point 1. When making changes to questions, there are still menus and text boxes, and the user still needs to click with the mouse. The main addition is that users directly see what their changes affect and what the result is. Another addition is that all actions a user can take are legal and do not produce system errors (Dix, 2009).

As mentioned before, in an e-commerce environment people need to see the difference between items. Therefore the direct manipulation paradigm should be the best paradigm to use in this editor.

One of the guidelines stated by Shneiderman & Plaisant (2003) is that a minimal memory load on the user supports the underlying direct manipulation paradigm. In prototype one the user needs to memorize the changes and look them up on another screen. This is not necessary in prototype two, where the changes are shown immediately.

4.2.4 Prototype three

The third prototype is primarily based on the more useful of the first two prototypes: prototype two. It therefore uses the direct manipulation paradigm.

Besides the paradigm, the third prototype also contains changes based on the feedback on the first two prototypes. The focus is considerably more on the way the principles are applied; see appendix IV points 2, 3 and 4 for the changes. The guideline 'don't show gratuitous motion' used in prototype one is also used in prototype three, not in text but in a button that shows that the changes are saved; see appendix IV point 4. There is also more space when a question is active.

To give the participants more options to adjust a question, a settings menu is added to the active editing state of a question, based on Galitz's (2007) principle of designing menus. This menu shows the extra options, for making a question dependent and for making an A/B test, in a vertical menu. See appendix IV point 2.

The principle 'offer informative feedback' (Shneiderman & Plaisant, 2003) is especially used in the tooltip element. This element points out the adjustments in the different versions when making an A/B test. See appendix IV point 3.

Rosenholtz et al. (2005) provide the guideline 'avoid cluttered displays'. This guideline is used in the active state of editing a question: give more space and focus to the editing part than to the rest of the editor.

Nielsen & Tahir (2002) researched the guideline: minimize the number of clicks or pages. This is interpreted through the position of the activity log, which is changed in prototype three.

5. METHOD

5.1 The usability test setup

The usability test consists of three components: the usability test with predefined tasks, the questionnaire, and finally a short interview about the general experience and the positive and negative feedback on the editor.

The participants start directly with the predefined tasks. It is a performance measurement evaluation to assess which design scores better on acceptance. The participants get nine concrete user tasks to accomplish; see appendices I and II. If users had to accomplish more than ten tasks, the observation would take too long and the user would get distracted (Benyon, 2014). Task completion and completion time are measured by recording the usability test. The observation takes place in the office of Bikkelhart, where it is possible to use the usability-testing tool Lookback, which records the screen and the actions of the user. Vocal and facial expressions are also recorded.


During the observation, the thinking-aloud protocol is used to capture the reasoning behind the participants' actions. The participants also read each task aloud so that the duration of the tasks can be measured more precisely. The recordings make the data processing easier, and they can be analysed at a later moment.

The second part of the usability test is a questionnaire. Its first part contains the questions of Brooke's System Usability Scale (2013), used to calculate the user acceptance.

The second part is more general: age, gender, education level and skill in technology use. The skill in technology use could be an interesting influence on the acceptance of the editor. The survey is a structured online questionnaire, which makes it possible to compare the gathered data.

The last part is a short interview to capture the participants' general experience of working with the editor and their positive and negative feedback.

5.2 The observation

5.2.1 The tasks

There are nine different tasks, as shown in appendices I and II. These tasks are selected because they take the participant through the entire editor to give the full experience. The difficulty of the tasks builds up, letting the participants get to know the editor and gradually climb the learning curve.

Prototypes one and two use the task list in appendix I; prototype three uses the task list in appendix II. Because of the feedback on prototypes one and two, the place of the activity log was changed, so a sub-task was created to let the participants find the activity log and see whether this is a better place.

If participants were not able to perform a task, they got a hint. The hint was not given after a particular amount of time had passed, but when the participant said repeatedly that they really did not know where to look.

5.2.2 The data analysis

The observation is recorded in three ways: the voice recording, the screen with the application and the mouse movements, and the participant's facial expressions. The time to complete a task is measured from the moment the participant starts the task until they are certain they finished it. If, for example, they are not certain whether the task is done and are still looking for proof, that time is measured too. The participant is asked to read the task aloud to pinpoint the start of executing the task.

The time a participant needs for the usability test gives insight into the learning curve, and it can be used to see whether there is a correlation between fast task success and the perceived ease of use. The number of tasks completed successfully can also correlate with the perceived ease of use.

The recorded time also measures the effectiveness and efficiency mentioned in the ISO metrics.

Giving a hint is done to elicit the participants' opinion of the application, not only to measure task success. These opinions are used to improve the third prototype and to see the reasons behind a participant's System Usability Scale (SUS) score. The opinions also give insight into the third ISO metric: satisfaction. The SUS score is explained in paragraph 5.3.

5.3 The survey

5.3.1 SUS question

The SUS score can be calculated from the first ten questions of the survey; see appendix III for the questions. These questions are asked on a 5-point Likert scale. The odd-numbered questions are positively worded and the even-numbered questions are negatively worded.

5.3.2 General questions

The last five questions are more general, to see whether there is another factor that can influence the usability and the SUS score. The last question measures the software skills of the participant; this question is copied from a four-yearly study by TNO on ICT in households (TNO, 2015).

5.3.3 The data analysis

Bangor et al. (2008) identified six major usages for the SUS score. Three of those points are, according to them:

1. "Providing a point estimate measure of usability and customer satisfaction

2. Comparing different tasks within the same interface

3. Comparing iterative versions of the same system"

These points, and the fact that the SUS score can be used with a smaller target group and still be reliable according to Sauro (2011), are the reasons the usability test is analysed with the SUS score.

The SUS score is calculated by a formula that yields a score between 0 and 100; this is not a percentage. In SUS: A Retrospective (2013), Brooke summarized the following interpretation (figure 3) of SUS scores, based on different studies of the SUS score.
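For reference, the standard SUS scoring rule can be written as a short function: odd (positively worded) items contribute response − 1, even (negatively worded) items contribute 5 − response, and the summed contributions are multiplied by 2.5. The example responses below are invented.

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS scoring for ten 1-5 Likert responses.

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    The summed contributions (0-40) are scaled to a 0-100 score.
    """
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Invented example: a fairly positive participant.
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 4, 1]))  # 82.5
```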

Figure 3: the meaning of the SUS score (Brooke, 2013)

This means that when a prototype scores above 70, it is acceptable to be built.


5.4 Participants

To test the three different interaction designs, three groups were gathered, one for each design. Each group contains six participants. This number is based on Nielsen's study (2012), in which five participants find most of the findings; a larger group of participants does not mean more findings, as figure 4 shows.

Figure 4: how many users for testing usability? (Nielsen, 2012)

To make sure there were at least five participants, seven people were asked per prototype. For every group, six people agreed to take part in the test.

Sauro (2010) states that with five participants you find 85% of the findings that 31% to 100% of the users will encounter. This means that six participants are enough for a first insight into whether the interaction framework works. According to Nielsen, iterative usability testing is better: after the first test the problems that occur can be dealt with and prototype three can be altered. The second test can then validate whether the fixes for the first problems worked. In the second test the users are also more likely to discover the remaining problems of the original design, in this case prototype two. The downside is that there is no guarantee that the new design does not introduce new problems to overcome.
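The 85% figure follows from the standard problem-discovery model, in which the share of problems found by n users is 1 − (1 − p)^n, with p the probability that a single user encounters a given problem. A quick check with p = 0.31 reproduces Sauro's number:

```python
def discovery_rate(p: float, n: int) -> float:
    """Expected share of problems (per-user probability p) found by n users."""
    return 1 - (1 - p) ** n

print(f"{discovery_rate(0.31, 5):.0%}")  # ~84%: Sauro's 85% for p >= 0.31
print(f"{discovery_rate(0.31, 6):.0%}")  # ~89% with the sixth participant
```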

The participants are 'real users' of the editor once it is finished: e-commerce managers of the company Bikkelhart, and e-commerce managers, online marketers and product owners of different customers of Bikkelhart. They have no prior knowledge of what the editor is supposed to do. They are used as participants because they are among the main users of the finished editor. They are not randomly picked, because not every customer of Bikkelhart is going to use the new editor, and for work-political reasons not every potential user could be asked. The participants are evenly spread over the three prototypes by function, age and gender.

Because a small group of customers and a particular set of employees of Bikkelhart will use the application, the population is only thirty people. The target group of this research consists of eighteen people, which means that more than half tested the application, which gives a strong validity.

6. RESULTS

6.1 SUS score

The SUS scores of the prototypes are as follows:

            P1       P2       P3
Average     72.083   83.333   76.250

Table 1: average SUS score of the prototypes

Table 1 shows that in the first round, testing prototype one against prototype two, prototype two scored best. As Brooke (2013) mentions in his retrospective, a score of at least 70 is acceptable. Both prototypes are acceptable, but because both the opinions of the participants and the SUS score are better for prototype two, that prototype is used as the basis for prototype three.

After the adjustments were made, prototype three was tested; its average SUS score is lower than that of prototype two.

Figure 5 shows, per prototype, the participants' answers to question one of the SUS: 'I think that I would like to use this system frequently', where 1 is strongly disagree and 5 is strongly agree. This shows that the perceived usefulness is higher for prototype two than for prototype three.

Figure 5: want to use system more frequently

Technical acceptance consists of two parts: the perceived ease of use and the perceived usefulness. A lower SUS score indicates a lower technical acceptance. The technical acceptance of prototype two is higher than that of prototype three.

6.1.1 General data

Even though the population is small, correlations were calculated with the Pearson correlation coefficient and compared to larger studies. There is a slight negative, but not significant, correlation between age and SUS score: older participants tend to give lower scores. For gender and software skills no correlation was found. Bangor et al. (2008) also found a slight negative correlation between age and SUS score, and no correlation with gender.
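As an illustration of this analysis, the age-SUS correlation can be computed as below; the participant data shown are invented, not the study's actual measurements.

```python
from scipy.stats import pearsonr

# Invented example data: (age, SUS score) pairs for six participants.
ages = [24, 29, 31, 35, 42, 51]
sus_scores = [85.0, 82.5, 77.5, 75.0, 72.5, 70.0]

r, p_value = pearsonr(ages, sus_scores)
print(f"r = {r:.2f}, p = {p_value:.3f}")  # r < 0: older participants score lower
```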



6.2 Learnability

The SUS score measures not only the usability, it also measures the learnability of a system (Lewis and Sauro, 2009). The learnability score is measured by two of the ten questions, 4 and 10; a short scoring sketch follows table 2. For the prototypes the learnability is as follows:

Prototype   Average
1           66.667
2           85.417
3           83.333

Table 2: average learnability of the prototypes
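As announced above, a minimal sketch of this learnability sub-score, assuming the Lewis & Sauro (2009) two-item split and a rescaling of the two item contributions to the same 0-100 range as the full SUS; the paper does not state its exact formula, and the example responses are invented.

```python
def learnability(q4: int, q10: int) -> float:
    """Learnability subscale from SUS items 4 and 10 (both negatively worded).

    Each item contributes (5 - response), giving 0-8 in total, which is
    rescaled by 12.5 to the same 0-100 range as the overall SUS score.
    """
    return ((5 - q4) + (5 - q10)) * 12.5

print(learnability(1, 2))  # 87.5: little support or prior learning needed
```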

According to table 2, the learnability of prototypes two and three is excellent. Prototype two also scores higher here than prototype three, even though more tasks were not completed successfully; see appendix V for the task success and the time spent on every task (if a task was not completed and a hint was needed, the time is marked in red). So although fewer tasks were completed successfully, the perceived ease of use and usefulness is higher for prototype two.

In prototype two, not only were more hints needed, the average time for completing all the tasks is also higher than in prototype three: 13.54 minutes for prototype two versus 9.53 minutes for prototype three. The general feedback on the second prototype was that all participants needed time to get familiar with the editor.

The general feedback on the third prototype was that participants could find everything quickly and only had more trouble with tasks 7 and 8, the most difficult tasks in the test. A closer look at the time spent (appendix V) shows that the effectiveness and efficiency discussed in the ISO 9241-11 metric are higher in prototype three: less time is needed to complete a task, there are fewer unsuccessful tasks, and fewer hints were needed. The perception of the user is thus not in line with the task success.

6.3 The editor

6.3.1 Direct manipulation

As mentioned earlier, direct manipulation is chosen as the paradigm for prototype three. Appendix IV point 1 shows the difference between the WIMP and direct manipulation paradigms.

The main negative feedback on prototype one was that participants want to preview the changes they make before saving them.

Another point, mentioned by more than half of the participants, is that too much space is used for all the options that can be altered in a question. The third frequently mentioned point of feedback is that they do not want to click the save button for every alteration, which makes editing slower.

In prototype two the changes are shown immediately and the editor is presented as it would appear on a website. That makes the direct manipulation paradigm better suited for prototype three.

The more technically skilled participants stated that they had a lot of options to alter; the less technical participants stated that they would not use half of them.

Not all the feedback on the WIMP paradigm was negative: the animation for saving alterations was received positively.

Therefore prototype three uses the direct manipulation paradigm, with a menu for the settings that are used less often and an animation for the alterations.

6.3.2 Active state of a question

According to the feedback, too much space is used in prototype one when a question is active for editing. The feedback for prototype two was the opposite: the space used was too narrow; see appendix IV point 2. In prototype two not all the options that can be altered are shown, only the commonly used ones, and even two of those were hidden behind a button that opens the options. Therefore, in prototype three, more space is used for the active state of a question, and the two options, altering the question type and the answer options, are not behind a button. The feedback on this was that even though the options are open, the participants would still like more space when altering a question.

In all the prototypes, the way a question is highlighted when active was judged good.

6.3.3 Activity log

When testing prototype two, four of the six participants pointed out that even though they see the alterations, they want to know whether these are saved. In prototype two it is not necessary to save changes explicitly; every adjustment is saved automatically. There is an activity log that shows the changes (appendix IV point 4), but it is not reachable within one mouse click. In prototypes one and two the activity log has the same position, which the participants found strange: it is shown as a button in the website instead of a button in the editor. Especially the participants of prototype two expect that, because the changes are saved automatically, the activity log is reachable within one mouse click for their own assurance.

Therefore an activity log button was added in prototype three (appendix IV point 4). This button has an hourglass pictogram and lights up when an adjustment is made. The participants did not notice the animation until they reached task 3.b and had to look for it; after they knew where to look, they noticed the animation during the other tasks. The feedback of the participants was that the button is oddly placed: it is within the form frame, while they expected it outside, in the top menu. They also expected the button to be bigger and/or the animation to be more obvious.


6.3.4 A/B testing

Two of the tasks were to make an A/B test and to set the winner as the new basis form. These tasks were the most difficult and were placed at the end of the usability test. The overall feedback on all the prototypes was that this interaction was too complex, especially setting the winner of the A/B test as the basis form. Seven participants did not succeed in this task without a hint or two.

It was not clear to all users where the A/B test can be set up. Appendix IV point 2 shows the place of the A/B button ('test aanmaken'). In prototype one the button was placed at the end of the list of editable changes. Especially the marketers of Bikkelhart were negative about the place of this button: performing A/B tests is the most important option for the marketers, so it needs to be more prominent on screen.

In prototype two the button was placed as an icon that is shown immediately when activating a question for editing. The participants of prototype two were quite quick in finding the button (see appendix IV for the icon and its place). The feedback here was that if more options were possible and shown in the same way, for example making questions dependent on each other, this would become too chaotic.

In prototype three a menu behind a settings icon shows the options for making an A/B test and making a question dependent (see appendix IV). The A/B test has the same icon as in prototype two because of the positive feedback. The participants knew what the settings icon meant and found it a logical place for adding functionality to a question.

Their biggest negative feedback was that the difference between the test A question and the test B question was not clear; see appendix IV point 3 for how the difference is shown. They found that the icon difference between version A and version B was not strong enough, but they did like the tooltip that shows the difference between the questions without opening the question itself.

The vocabulary of the button that actually sets the winning test as the basis ('implement test') was too confusing; the participants found it almost deceptive. They thought that the button would start the test, even though the task description stated that it only delivers the two forms with the URL that is needed.

7. DISCUSSION

When usability testing with five users, only 85% of the problems that affect 31%-100% of the users will be detected for this population and set of tasks (Sauro, 2010). This means that there are other problems that can influence the usability of the application. The third prototype is based on the second prototype and can be seen as an iterative design, so twelve participants tested the interaction framework, which according to Nielsen (2012) finds almost 100% of the usability problems. It was expected that prototype three would have a better SUS score after the iteration; in this case it did not. This might be because the problems the participants found are not only superficial problems but more in-depth functional problems, which can have a stronger negative effect on the participants. The different problem in the A/B testing of prototype three, compared to prototypes one and two, is an example of such a more in-depth functional problem. When testing with prototypes there are always limitations. Another point is that the prototypes do not have a 'fancy' design; the colours used are white, black and grey. Some participants verbally acknowledged that this distracted them while using the editor, which might have had a negative influence on their overall opinion. Even when the SUS score is reliable and valid with a smaller sample size, one critical participant has a greater influence on the mean SUS score than in a larger sample. In combination with the limitations of testing with a prototype, this can lower the mean SUS score. Tullis and Stetson (2004) state that the SUS gives the best outcome with a sample group of about twelve. This is probably what happened in the third test group, where more participants said that the lack of a 'fancy' design distracted them.

8. CONCLUSION

The participants want to see the changes they make immediately; they said so in their feedback, and it shows in the SUS scores of prototypes two and three. This means that the hypothesis is confirmed and that the interaction framework of the application needs to be developed with the direct manipulation paradigm in mind.

The changes made for prototype three did not have the expected outcome. It was expected that prototype three would have better interaction and therefore a higher SUS score and more positive feedback. This means that the changes made to the third prototype did not fix things in the right way for the users.

As mentioned in the discussion, the target group of the third prototype exposed the limitations of testing with a prototype more clearly. This does not mean that the third prototype cannot be used as a basis for the next iteration: the effectiveness and efficiency are higher in prototype three than in prototype two, and after the iteration on prototype two more in-depth functional problems can come to light. There were principles with positive feedback in prototype three; only the manner in which they were implemented in the interaction was not always the best for higher user acceptance. One of the elements that was positively changed is the position of the activity log; thus the principle 'minimize the number of clicks or pages' is applied better in prototype three.

The button and its animation are, according to the feedback, 'okay' once you know their meaning, but they are not clear at first glance. The guideline 'don't show gratuitous motion' is needed in the interaction framework.


The guideline 'highlight selected areas' is used in the right way to indicate the active editing state of a question. Tasks 7 and 8 are the hardest tasks in the editor. The feedback was that the vocabulary used in the editor needs to be simpler and that the difference between version A and version B needs to be clearer. The vocabulary adjustment falls under Loranger's (2002) guideline 'speak the user's language'.

The tooltip that shows what differs between the A and B versions is a good addition. The interpretation of the menu design, where the A/B test button can be found in prototype three, also got the most positive feedback.

The space used for editing can be less cluttered, so the guideline 'avoid cluttered displays' can be applied better to the active editing state of a question. In sum, the paradigm of the interaction framework needs to be direct manipulation, and the principles and guidelines most important to the usability are:

• Minimize the number of clicks or pages
• Don't show gratuitous motion
• Highlight selected areas
• Speak the user's language
• Menu design
• Offer informative feedback
• Avoid cluttered displays

However, not all of these principles are applied in the right way yet. Even though all the prototypes scored well enough to be acceptable to build, none of them is the one that will be developed further.

9. FUTURE WORK

The next step is to test a fourth graphical prototype. The changes made in the fourth prototype need to lead to a higher user acceptance and at least the same effectiveness and efficiency as prototype three.

The feedback on prototype three revealed that the animation for saving was not in a prominent place for the user; this is one example of what can be improved in the editor. Another improvement is the manner of showing the different elements in the form when an A/B test is activated.

The usability test needs to follow the same method as this research. All the participants of the first three prototypes can test the fourth prototype; with a larger sample size the SUS score will be more reliable, and problems with a lower probability of occurring will also be encountered (Sauro, 2010). When the test result is at least 68, and preferably 83 (the result of prototype two), the editor can be built. From that point on the editor can be improved continuously, using the approach of prototyping before building (Dix et al., 2009) and an iterative design-and-test strategy. The different principles can be tested further to find the optimal mix for the target group.

In addition, new features can be added to the editor, for example for a second target group. Not only marketers can use the editor; information analysts and programmers can use it as well. This target group is more technically skilled, so the prototype needs to support more technical tasks, for instance changing the business rules of the form and adding questions and answer options, without breaking the main code of the web application.

10. ACKNOWLEDGMENTS

I would like to express my gratitude to my thesis mentor Radboud Winkels of the Universiteit van Amsterdam for his guidance and patience during the research and the writing of this paper.

I would also like to show my gratitude to the second reader, Frank Nack, who guided me in the beginning of the thesis design.

11. REFERENCES

Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An empirical evaluation of the System Usability Scale. International Journal of Human-Computer Interaction, 24(6), 574-594.

Benyon, D. (2014). Designing Interactive Systems: A Comprehensive Guide to HCI, UX and Interaction Design (3rd ed.).

Brooke, J. (2013). SUS: a retrospective. Journal of Usability Studies, 8(2), 29-40.

CBS (2017). ICT, kennis en economie 2017. Retrieved July 20, 2017, from: https://www.cbs.nl/-/media/_pdf/2017/26/ike2017-web.pdf

Davis, F. D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly, 13(3), 319-340. doi:10.2307/249008

Dix, A. (2009). Human-computer interaction (pp. 1327-1331). Springer US.

Galitz, W. O. (2007). The Essential Guide to User Interface Design: An Introduction to GUI Design Principles and Techniques. John Wiley & Sons.

Hansen, W. J. (1971). User engineering principles for interactive systems. In Proceedings of the November 16-18, 1971, Fall Joint Computer Conference (pp. 523-532). ACM.

Hinckley, K. (2017). A background perspective on touch as a multimodal (and multisensor) construct. In The Handbook of Multimodal-Multisensor Interfaces (pp. 143-199). Association for Computing Machinery and Morgan & Claypool.

ISO/IEC 9241-11. Retrieved January 30, 2016, from: http://usabilitygeek.com/usability-metrics-a-guide-to-quantify-system-usability/

Lewis, J., & Sauro, J. (2009). The factor structure of the System Usability Scale. In Human Centered Design (pp. 94-103).

Loranger, H., Schade, A., & Nielsen, J. (2002). Website Tools and Applications with Flash: Design Guidelines Based on User Testing of 46 Flash Tools. https://media.nngroup.com/media/reports/free/Website_Tools_and_Applications_with_Flash.pdf

Mandel, T. (1997). The Elements of User Interface Design (Vol. 20). New York: Wiley.

Nielsen, J. (2012). How Many Test Users in a Usability Study? Retrieved February 1, 2016, from: https://www.nngroup.com/articles/how-many-test-users/

Nielsen, J., & Molich, R. (1990, March). Heuristic evaluation of user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 249-256). ACM.

Nielsen, J., & Tahir, M. (2002). Homepage Usability: 50 Sites Deconstructed. Indianapolis, IN: New Riders Publishing.

Rosenholtz, R., Li, Y., Mansfield, J., & Jin, Z. (2005). Feature congestion: a measure of display clutter. In CHI 2005 Proceedings.

Sauro, J. (2010). Why you only need to test with five users (explained). Measuring Usability.

Sauro, J. (2011). Measuring usability with the System Usability Scale (SUS). Measuring Usability.

Shneiderman, B., & Plaisant, C. (2003). Designing the User Interface. Pearson Education India.

TNO (2015). Privacy beleving op het internet in Nederland. Retrieved February 2, 2016, from: https://www.tno.nl/nl/zoeken/?q=privacy%20beleving%20op%20het%20intern

Tullis, T. S., & Stetson, J. N. (2004). A comparison of questionnaires for assessing website usability. In Usability Professionals Association Conference (pp. 1-12).

U.S. Department of Health & Human Services. HHS Web Standards and Usability Guidelines. Retrieved from: https://webstandards.hhs.gov/guidelines/

Whitenton, K., et al. (2012). Application Design Showcase: 2012. Nielsen Norman Group.
APPENDIX I: Tasks prototype 1 & 2

Task 1: Log in
Log in with: test@testverzekering.nl.
(What would you do if you did not know the password?)

Task 2: Help text (tooltip)
a. Turn off the help text for 'geboortedatum partner' (partner's date of birth).
b. Then turn on the help text for the input field 'E-mail' with the description: 'Wij sturen alle informatie per e-mail' ('We send all information by e-mail').

Task 3: Adjust a question
The question 'Wat is uw geslacht' ('What is your gender') must be changed to 'Geslacht' ('Gender').

Task 4: General settings
In the general settings of the application street, change the variation name from 'Origineel' to 'Versie-A'.

Task 5: Remove an answer value
Remove the answer value 'Mijn kind' ('My child') from the question 'Wie rijdt het meest in de auto?' ('Who drives the car the most?').

Task 6: Question order
Swap the questions 'Aantal schadevrije jaren' ('Number of claim-free years') and 'Aantal kilometers per jaar' ('Number of kilometres per year').

Task 7: Make an A/B test
For the question 'Hoe wilt u betalen?' ('How do you want to pay?') an A/B test must be created with the name Versie-B. In that version the answer option must be changed to horizontal radio buttons.

Task 8: Implement
Implement the A/B test variation: Versie-A.

Task 9: Log out
Log out.

APPENDIX II: Tasks prototype 3

Task 1: Log in
Log in with: test@testverzekering.nl.
(What would you do if you did not know the password?)

Task 2: Help text (tooltip)
a. Turn off the help text for 'geboortedatum partner' (partner's date of birth).
b. Then turn on the help text for the input field 'E-mail' with the description: 'Wij sturen alle informatie per e-mail' ('We send all information by e-mail').

Task 3: Adjust a question
a. The question 'Wat is uw geslacht' ('What is your gender') must be changed to 'Geslacht' ('Gender').
b. Check in the activity log whether all changes so far have been applied.

Task 4: Settings
In the funnel settings, change the variation name from 'Origineel' to 'Versie-A'.

Task 5: Remove an answer option
Remove the answer option 'Mijn kind' ('My child') from the question 'Wie rijdt het meest in de auto?' ('Who drives the car the most?').

Task 6: Question order
Swap the questions 'Aantal schadevrije jaren' ('Number of claim-free years') and 'Aantal kilometers per jaar' ('Number of kilometres per year').

Task 7: Make an A/B test
For the question 'Hoe wilt u betalen?' ('How do you want to pay?') an A/B test must be created with the name Versie-B. In that version the answer category must be changed to horizontal radio buttons.
The A/B test has been run on the target group via VWO, and now there must again be one version of the funnel.

Task 8: Implement
Implement the A/B test variation: Versie-A.

Task 9: Log out
Log out.

APPENDIX III: Questionnaire

Usability test editor Flexfunnel

You have just completed the usability test, and I have a number of questions about it.

Part 2: Application use
These are ten statements to find out how you experienced using the application. Each statement is rated on a 5-point scale from 1 ('Sterk mee oneens', strongly disagree) to 5 ('Sterk mee eens', strongly agree).

1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident with the system.
10. I needed to learn a lot of things before I could get going with this system.

Part 3: General questions
The last five questions are general and are used to get a good picture of the target group.

What is your gender?
0 Male   0 Female

What is your age? …

What is your highest completed education?
0 Lower education (primary school, vmbo/mavo, lower years of havo/vwo)
0 Secondary education (upper years of havo/vwo, mbo 2/3/4)
0 Higher education (hbo, wo, PhD programmes)

What is your job function?

Indicate what you are able to do with software:
0 use a word-processing program, such as Word.
0 use a spreadsheet program, such as Excel.
0 use a program to edit photos, videos or sound recordings.
0 make presentations with software such as PowerPoint or Prezi, containing for example text, images, tables or charts.
0 use advanced Excel functions for data analysis, such as sorting or filtering data, using formulas or making charts.
0 write computer programs in a programming language.

End


APPENDIX IV: Examples of the prototypes

1. WIMP versus direct manipulation

Prototype 1: WIMP


2. Active state of a question and making an A/B test

In prototype 2, the button to make the A/B test is shown as an icon that appears as soon as the question is activated for editing.


3. Difference in A/B test


4. Activity log

Prototype 1 & 2: the activity log is on the right side of the screen. The left side shows how to set a version as the basis form.

Prototype 3
