Color in Usability of Mobile Applications

Abstract

This study examines the role of color in the usability of mobile applications. Although the usability of mobile applications is becoming increasingly important, and color has been proven to influence behavior and emotion and is therefore relevant to usability, little research has been done in this specific field. A literature review describes the various components and color theories through which color can play a role in usability. Hypotheses regarding consistency, feedback, representation, and visibility have been formed upon these findings and tested through A/B testing using usability metrics. The results show that color can affect the time it takes a user to finish a task, whether a user is able to complete that task, and the user’s satisfaction with the design. When designing an application, attention must be paid to the harmony, wavelength, associations, luminance ratio, and the number and distribution of colors. Since previous research is lacking and the utility of color in the usability of mobile applications has been demonstrated, this study forms the beginning of theory formation on this subject and a basis for further research.

Color in Usability of Mobile Applications

A.L. TOL

10720367

Bachelor Thesis

Information Science

Supervisor

DR. D. HEINHUIS

Second Examiner

DRS. A.W. ABCOUWER

University of Amsterdam

Faculty of Science

July 2018


Table of Contents

1. Introduction
2. Theoretical Framework
2.1 Usability
2.2 Usability of Mobile Applications
2.3 Color in Usability of Mobile Applications
2.3.1 Consistency
2.3.2 Feedback
2.3.3 Representation
2.3.4 Visibility
2.4 Hypotheses
3. Methodology
3.1 Participants
3.2 Materials
3.3 Designs
3.4 Usability Metrics
3.5 Procedure
4. Results
4.1 Consistent Color Use in UI Elements
4.2 Harmonious Color Scheme
4.3 Highlight of the Current Location
4.4 Red for Negative and Green for Positive Feedback
4.5 Arousing and Relaxing Colors
4.6 A 1:10 Luminance Ratio
4.7 Three Colors on a 6:3:1 Ratio
4.8 Summary of the Results
5. Conclusion and Discussion
6. Limitations and Further Research
References
Appendix A: Setup of the Usability Test
Appendix B: Descriptive Statistics


List of Figures

Figure 1, Primary Colors of the Visible Light
Figure 2, Overview of Harmonious Color Schemes

List of Tables

Table 1, Comparison of Design Principles into Six Categories
Table 2a, Group Statistics of Task Time and Satisfaction for Hypothesis 1
Table 2b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 1
Table 3, Chi-Square Test of Success Rate for Hypothesis 1
Table 4a, Group Statistics of Task Time and Satisfaction for Hypothesis 2
Table 4b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 2
Table 5, Fisher’s Exact Test of Success Rate for Hypothesis 2
Table 6a, Group Statistics of Task Time and Satisfaction for Hypothesis 3
Table 6b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 3
Table 7, Chi-Square Test of Success Rate for Hypothesis 3
Table 8a, Group Statistics of Task Time and Satisfaction for Hypothesis 4
Table 8b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 4
Table 9, Fisher’s Exact Test of Success Rate for Hypothesis 4
Table 10a, Group Statistics of Task Time and Satisfaction for Hypothesis 5
Table 10b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 5
Table 11, Fisher’s Exact Test of Success Rate for Hypothesis 5
Table 12a, Group Statistics of Task Time and Satisfaction for Hypothesis 6
Table 12b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 6
Table 13, Chi-Square Test of Success Rate for Hypothesis 6
Table 14a, Group Statistics of Task Time and Satisfaction for Hypothesis 7
Table 14b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 7
Table 15a, Chi-Square Test of Success Rate for Hypothesis 7
Table 15b, Crosstabulation of Success Rate for Hypothesis 7


1. Introduction

The most recent estimate of the number of smartphone users in the world comes from a media measurement company called Zenith. Their research, which covered 65% of the world population, concluded that 63% of adults are in possession of a smartphone (Zenith, 2017). According to this research, the Netherlands has the highest percentage of adult smartphone users in the world. CBS (2018) puts this number at 89% in 2017, including children in the age category from 12 to 18. Since smartphones are more portable than laptops, desktops and even tablets, internet browsing has shifted from desktop to smartphone, with the smartphone overtaking the desktop in October 2016 (StatCounter, 2016). The same applies to mobile commerce, which, according to Chen & Chen (2012), is rapidly increasing. This confirms the work of Luke Wroblewski (2011), who claimed as early as 2011 that the world was becoming ‘mobile first’.

Consequently, as internet usage shifts from desktop to smartphone, the usability of mobile information systems becomes more important. Nielsen (2003) calls usability on the web “a necessary condition for survival”. Usability benefits not only the users of a system but also its operators, often businesses. Optimizing usability has been shown to increase sales, customer loyalty, and satisfaction, improve products and services, and reduce variable costs (Speicher, 2017). Nielsen’s explanation for this phenomenon is rather simple: “if users cannot find the product, they cannot buy it either” (Nielsen, 2003).

The majority of usability research used today dates from before the first mobile phone with a color display came on the market (McCarty, 2001). Researchers who have taken color into account since then have mainly focussed on the issues of color-blindness and color-deficiency (Tognazzini, 2003; Chaparro & Chaparro, 2017), and color harmony and aesthetics (Brady & Phillips, 2003; Tokumaru et al., 2002). Color can affect emotion (Gong et al., 2017), perception (Spence et al., 2010) and behavior (Nitse et al., 2004). Poor color choices can therefore not only displease the user but also evoke unwanted behavior or emotions, causing usability to decrease. Despite its importance, little research has focussed on the color aspect of usability in mobile applications. The goal of this study is to bridge that gap.

The value of this study is twofold. On the one hand, it has academic value, since it deepens existing usability research by focussing on one specific aspect: color. On the other, it has practical value, since the results of this study can contribute to improving the usability of mobile applications to the benefit of both the user and the operator.

The research question that will be central to this thesis has been formulated as follows: to what extent can the usability of mobile applications be improved by the use of color? To answer this research question, a number of sub-questions must first be answered:

1. Which factors influence usability of user interfaces?

2. What specifications of mobile devices should be taken into account?

3. What color theories can be applied to the usability factors?

4. Can the advantage of color in usability be empirically proven?

In the theoretical framework, the first three sub-questions will be answered, and hypotheses will be formed upon these findings. Subsequently, these hypotheses will be confirmed or rejected through an empirical experiment. A conclusion will be drawn from the results.


2. Theoretical Framework

2.1 Usability

Nielsen & Molich (1998) define usability as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use”. The International Organization for Standardization, which creates standards for products, services, and processes to make sure they fit their purpose, agrees with this definition by naming effectiveness, efficiency, and satisfaction as factors for usability in its standard ISO 9241-11 (ISO, 1998).

Effectiveness, efficiency, and satisfaction can also be seen as goals. Many researchers have drawn up principles which should be taken into account while designing an interface to meet these goals, and multiple studies have compared these principles. Studies by Campos et al. (2015), Sivaji et al. (2011) and Koshiyama et al. (2015) have in common that they compare heuristics by Donald Norman, Ben Shneiderman, and Jakob Nielsen. In their 1986 book User Centered System Design: New Perspectives on Human-Computer Interaction, Norman and his co-author Draper describe six design principles to make interfaces easier to use and more intuitive. In that same year, Shneiderman published his eight golden rules for interface design, of which a sixth edition is currently available (Shneiderman et al., 2016). Years later, Nielsen released his 10 usability heuristics for user interface design (1995), which have been referenced the most over the following decades. Table 1 shows a comparison of these three sets of design principles, organized into six categories.

Table 1, Comparison of Design Principles into Six Categories

Category | Nielsen | Norman | Shneiderman
Consistency | Consistency and standards | Consistency | Strive for consistency
Error | Error prevention; Help users recognize, diagnose, and recover from errors | - | Offer simple error handling
Feedback | Visibility of system status; Help users recognize, diagnose, and recover from errors | Feedback | Design dialogs to yield closure; Offer informative feedback
Representation | Match between system and real world | Mapping; Affordances | -
User control | Flexibility and efficiency of use; User control and freedom | Constraints | Enable frequent users to use shortcuts; Permit easy reversal of actions; Support internal locus of control
Visibility | Recognition rather than recall; Aesthetic and minimalist design | Visibility | Reduce short-term memory load

However, Table 1 does not include Nielsen’s usability heuristic ‘help and documentation’, since this is an optional solution in case the usability design malfunctions. It includes ‘Help users recognize, diagnose, and recover from errors’ twice because this heuristic is twofold. On the one


hand, it implies feedback by helping the user to recognize and diagnose an error, and on the other, it belongs to the error category because of the recovery element. The principles will be further explained through the six categories.

Consistency. All visible and invisible structures of an application need to be consistent. Examples of visible structures are the elements on the screen, such as headers, links, typography, and corporate identity. Tognazzini (2003) emphasizes that the invisible structures, such as user flow, are just as important. Not only does an application need to be consistent throughout its domain, it also needs to be consistent with the user’s expectations (Blair-Early & Zender, 2008).

Error. Errors should be prevented at all times. However, this might not always be possible. In case of an error, the user should be able to recognize, diagnose and recover from the error (Nielsen, 1995). Norman (2013) divides errors into slips and mistakes, with slips being unintentional actions, mostly caused by inattention, and mistakes resulting from an incorrect plan or goal. Errors can be prevented by setting constraints, offering suggestions, choosing the right defaults and using forgiving formatting (Nielsen, 2015).

Feedback. According to Hogue (2013), feedback should give information about the user’s location, current state, previous state(s), outcome, and future state(s). Feedback should also provide information about errors. Feedback should be user-oriented and therefore be written in plain human language (Tognazzini, 2003). Feedback can be visual, verbal, tactile or a combination of these (Norman, 1986).

Representation. Representation refers to the relationship between the virtual and the physical world. Controls should match elements from the physical world (Norman, 1986), the appearance of an element should imply its function (Norman, 1999) and “words, phrases and concepts should be familiar to the user” (Nielsen, 1995).

User control. The user should be able to freely and efficiently use the application without entering a prohibited state. Both experienced and inexperienced users should be able to be in control of the system (Nielsen, 1995). An option to return to a previous state should always be available, and users should be able to tailor frequent actions, for example through shortcuts, to make the application more efficient for them (Nielsen, 1995; Shneiderman, 2016).

Visibility. Needed information, options, actions and objects should always be visible to the user. Likewise, the number of elements presented to the user should be minimized to reduce the user’s memory load. Shneiderman (2016) states that one “is only capable of maintaining around five items in short-term memory at one time”; therefore the items appearing on the screen should be reduced to a minimum. Irrelevant items diminish the visibility of relevant ones (Nielsen, 1995), and if items are not visible, users will not be able to use them (Norman, 1986).

It is essential to note that usability is culturally dependent; no single generic solution fits all systems and all users universally (Culley & Madhavan, 2013). This means that these principles form a tool rather than a solution. The tool can be used to create and evaluate a design with the user involved in the process. A more extensive test panel is necessary to develop a successful system (Faulkner, 2003).

In short, six categories of factors influence the usability of user interfaces. The six classes are consistency, error, feedback, representation, user control, and visibility. These categories can be used as a tool to design effective, efficient and satisfactory interfaces.


2.2 Usability of Mobile Applications

For decades the internet was only accessible from a desktop computer (Leiner et al., 2009). Consequently, websites were built to fit the proportions of a desktop screen. When the smartphone was introduced, users were forced to use sites that were not properly suited to their device. Punchoojit & Hongwarittorrn (2017) express their concern about this phenomenon, stating that “the limited screen size has caused a design challenge to information display patterns and effectiveness of applying desktop designs to mobile platform unsettled”. In his book Mobile First, Luke Wroblewski (2011) explores the consequences of a mobile-first world. He claims that ‘mobile forces you to focus’, meaning the smaller screen size of a smartphone forces the developer to restrict the elements on the screen. A smaller screen size also means a higher interaction cost (the amount of interaction needed for the user to reach their goal) and a greater reliance on the user’s short-term memory; not all information can be present without overloading the page (Budiu, 2015). As a result, setting priorities is one of the essential steps when designing a mobile application.

Moreover, setting priorities and minimizing content is also necessary for a touchscreen. The touchscreen is an example of natural mapping; the controls are directly attached to the system that users are controlling (Norman, 2013). This makes a touchscreen intuitive to use. However, the confined space for controls can make the controls complicated. On the iPhone, up to nine finger movements can be distinguished (Hutsko & Boyd, 2013). Touchscreens also complicate designing. The usage of a keyboard and a mouse can be precisely tracked, while touchscreens have finger controls that are hard to distinguish and can be less easy to use because of the crowdedness of buttons. Proper affordances and haptic feedback are therefore crucial (Budiu, 2015).

As discussed, the mobile-first world is a direct consequence of portability. People are online every moment of the day, everywhere they go. Since most mobile applications rely on the internet and the internet is now practically available everywhere, apps can be used at any given time. The downside of portability, from a developer’s standpoint, is the interruptions it causes. Since mobile devices are used in a variety of contexts, users are often interrupted by their environment. The average session on a desktop lasts twice as long as one on a smartphone. As a result, an application’s design needs to account for interruptions (Budiu, 2015).

In conclusion, a few specifications of mobile devices should be taken into account when designing a mobile application. First, the smaller screen size restricts the number of elements and therefore raises the interaction cost: since fewer elements can be present at the same time, users need to interact with the system to reach elements that are not visible. Second, touchscreens can both improve and deteriorate the usability of mobile applications. Last, mobile applications are used everywhere and at any time, with interruptions as a side effect.

2.3 Color in Usability of Mobile Applications

In this section, the previously discussed factors of usability will be linked to color, taking the benefits and limitations of mobile devices into consideration. This study will include consistency, feedback, representation, and visibility, leaving error and user control out. Error and user control are part of an application’s background processes and have no visible expression; since color is a visual aspect, it can play no role in them. To gain a better understanding, a few color principles will first be explained.


Color, or colour, can be defined as “the property possessed by an object of producing different sensations on the eye as a result of the way it reflects or emits light” (Colour, n.d.). In other words, when light hits a particular object, the object absorbs all colors except the one that humans perceive it as. Colors are an interpretation by the brain of electromagnetic radiation within the visible spectrum, which lies between 390 and 700 nanometers in wavelength (Starr & Starr, 2006). Each color has its own wavelength (see Figure 1). Light enters the eye through the cornea and is received by the two receptor types of the retina: rods and cones. Rods are used for the reception of light, whereas cones are used for the reception of the colors red, green and blue (Hecht, 1937). These three colors form the primary colors, from which all colors can be mixed (Hayter, 1830). Combining two primary colors results in secondary colors. Similarly, mixing two secondary colors produces tertiary colors.

Figure 1, Primary Colors of the Visible Light (wavelength in nanometers)

Color mixing can be divided into additive (light) and subtractive (pigments) coloring. Since the focus of this study is on mobile applications, only additive coloring will be discussed. A property of additive coloring is that mixing all colors creates white light. This principle forms the basis of the RGB color model, which is used for the pixels in a smartphone. RGB stands for Red Green Blue, and to mix other colors, each of the primary colors can be set to a value from 0 to 255. A limitation of this model is that colors darker than 0 and brighter than 255 cannot be represented.

Besides RGB, a few other concepts need to be explained to make an in-depth statement on color in the usability of mobile applications: luminance, hue, shades, and tints. First, luminance can be described as the “effectiveness of the various stimulus wavelengths in evoking the perception of brightness” (Kim, 2010). Luminance and brightness are strongly correlated, but not the same (Stone, 2016). Luminance can be measured in L*, which ranges from 0 to 100. In the additive color model RGB, an L* of 100 is white, whereas 0 is black. Colors like yellow (98), cyan (91), green (88), magenta (60), red (54) and blue (30) range in between (Kim, 2010). Second,

Fairchild (2013) describes hue as "the degree to which a stimulus can be described as similar to or different from stimuli that are described as red, green, blue, and yellow". Last, shades are colors mixed with a set quantity of black, while tints are mixed with white (Mollica, 2013).

2.3.1 Consistency

Chimera & Shneiderman (1993) have found that color consistency leads to fast performance and high satisfaction. To create color consistency in a design, a color scheme needs to be established and used consistently throughout the design. Two design principles should be followed. First, all elements of the same kind should be consistent with one another. User interface elements of a mobile application include headers, texts, titles, links, buttons and input fields. For example, research among the elderly showed that buttons were understood better and faster when monochrome than when multicolored (Sha et al., 2017).

The second principle states that the color combinations used in an application should feel harmonious. Much research has been done into color combinations, and over the years several theories about color schemes have been formed. First, the analogous color scheme


consists of hues that are adjacent to each other on the color wheel. Second, the complementary color scheme is based on colors on opposite sides of the wheel. The triadic color scheme forms a triangle on the color wheel, the tetradic scheme forms a rectangle, and the squared scheme forms a square. Last, the split-complementary scheme is a combination of the complementary and the triadic scheme; it takes one color from one side of the color wheel and two from the opposite side (Rhyne, 2016). For an overview of the color schemes, see Figure 2. Not deviating from the chosen color palette leads to consistency.

Figure 2, Overview of Harmonious Color Schemes (analogous, complementary, triadic, tetradic, squared, and split-complementary)
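These wheel relationships can be made concrete. The sketch below is an illustration only, not part of the thesis; the function names are invented, and the hue offsets are the conventional ones for each scheme:

```python
# Illustration only (function names invented, not from the thesis):
# deriving the six harmonious schemes by rotating a base hue (in degrees)
# around the color wheel.
import colorsys

SCHEME_OFFSETS = {
    "analogous": (0, 30, -30),             # adjacent hues
    "complementary": (0, 180),             # opposite hues
    "triadic": (0, 120, 240),              # equilateral triangle
    "tetradic": (0, 60, 180, 240),         # rectangle
    "squared": (0, 90, 180, 270),          # square
    "split-complementary": (0, 150, 210),  # one hue plus two opposite it
}

def scheme(base_hue, name):
    """Return the hues (0-359 degrees) of the named scheme for base_hue."""
    return [(base_hue + offset) % 360 for offset in SCHEME_OFFSETS[name]]

def hue_to_rgb(hue_deg, saturation=0.7, lightness=0.5):
    """Convert a hue to an 8-bit RGB triple via the HLS color model."""
    r, g, b = colorsys.hls_to_rgb(hue_deg / 360, lightness, saturation)
    return tuple(round(channel * 255) for channel in (r, g, b))

print(scheme(210, "triadic"))        # [210, 330, 90]
print(scheme(210, "complementary"))  # [210, 30]
```

For instance, a design built on a blue base hue of 210° gets the triadic companions 330° and 90°, which can then be converted to RGB values for the interface.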

2.3.2 Feedback

Feedback about the user’s location, system states and errors should always be available to the user. This section discusses these three aspects, starting with location. Since mobile users experience frequent interruptions, the current location should be visible immediately upon returning to the application. Highlighting the user’s location through color allows the user to process the information faster than text would. A frequently used method is to highlight the active page in the navigation. This method can only be used on a global menu containing no more than six items (Budiu, 2017). An expert in the field of user experience recommends one base color for all navigational items and the accent color, as discussed in section 2.3.1, as a highlight (Babich, 2016a).

Second, the system status can be visualized in a variety of ways. Regardless of the form, the visualization must meet two conditions: movement and determination (Babich, 2016b). In other


words, the visualization should have a beginning state and an end state, and the user must see the transition between states. Color hue can be used to represent a quantity (Norman, 1991) and can therefore be used to visualize the system status. Shneiderman (2016) suggests using a change of color to indicate a change of status. In addition, Gorn et al. (2004) found that showing users a relaxing color, such as blue, changes their time perception: relaxing colors make users think that the website is downloading faster. Consequently, they like the website better.
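Shneiderman’s suggestion of a color change to indicate a status change can be sketched as follows. This is a hypothetical illustration, not taken from the thesis: the hue shifts from red (0°) toward green (120°) as progress runs from 0 to 1:

```python
# Hypothetical illustration (not from the thesis): indicating a change of
# status through a change of color, with the hue moving from red (0 degrees)
# toward green (120 degrees) as progress runs from 0.0 to 1.0.
import colorsys

def status_rgb(progress):
    """Map progress in [0, 1] to an 8-bit RGB color from red to green."""
    progress = max(0.0, min(1.0, progress))
    hue_deg = 120 * progress  # 0 = red, 60 = yellow, 120 = green
    r, g, b = colorsys.hls_to_rgb(hue_deg / 360, 0.5, 1.0)
    return tuple(round(channel * 255) for channel in (r, g, b))

print(status_rgb(0.0))  # (255, 0, 0): red, just started
print(status_rgb(0.5))  # (255, 255, 0): yellow, halfway
print(status_rgb(1.0))  # (0, 255, 0): green, finished
```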

Last, color can be used in feedback in case of an error. An effective method, as used in the research of Novak et al. (2013), is the traffic light concept. In this case, green was associated with positivity, yellow showed caution and red implied fatal flaws. In interaction design, this concept can be found in Google’s mail sign-up interface (Google, n.d.). Password strength ranges from weak, to fair, to strong and is visually presented to the user using the colors of a traffic light: green, yellow and red. Also, invalid input results in a red border around the input area, accompanied by textual feedback in the same color. Color associations can explain the effectiveness of this concept. Due to its longer wavelength, red causes feelings of arousal and alerts the user, while green, with a shorter wavelength, causes people to relax (Walters, 1982).

2.3.3 Representation

Representation is about linking the real world to elements and structures in the digital world. Colors from nature can be used to create a harmonious color palette, a technique that can be added to those discussed in section 2.3.1. Colors in nature occur naturally, and the human mind associates them with naturalness and harmony (Rhyne, 2016).

Likewise, every individual color in a scheme has its own associations. Red, for example, can be linked to joy (Soldat et al., 1997), anger (Fetterman et al., 2011) and sexual emotions (Elliot et al., 2012). Blue, on the other hand, can be linked to trust (Lee et al., 2010) and relaxation (Gorn et al., 2004). Green is often associated with peacefulness (Clarke et al., 2008), safety (Kliger et al., 2012) and health (De Vries et al., 2012). Accordingly, shorter wavelengths are associated with relaxation and longer wavelengths with arousal (Walters, 1982).

Furthermore, representation can be applied through affordances. Sung-Euk (2012) created the concept of color affordances, combining color associations with the ideas of Gibson (1979) and Norman (1986) about affordances. Affordances are the features of an object’s appearance that communicate what is supposed to be done with that object. For example, in nature neon-colored animals and plants are often poisonous, so they should not be eaten. As discussed, a smartphone is controlled by two primary sources of input: tap and swipe. It needs to be clear to the user which movement is expected (Norman, 2013). Since hovering is not possible on a mobile device, the user must learn by trial and error. Immediate feedback, for example when a user places a finger on a button but has not yet removed it, can be seen as an affordance: it invites the user to click. This affordance can be realized by changing the touched element to a shade or tint of its original color (Norman, 2013). Sung-Euk’s research (2012) concludes that an element’s purpose needs to be aligned with an associated color to create the right affordance.
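The shade-or-tint press state described above can be sketched as follows; the helper names are hypothetical, not from the thesis or from any UI framework:

```python
# Hypothetical helpers (names not from the thesis): a shade mixes a color
# with black, a tint mixes it with white (Mollica, 2013), e.g. to restyle
# a button while a finger rests on it.
def _mix(rgb, target, amount):
    """Move each channel toward target by amount (0.0 = no change, 1.0 = target)."""
    return tuple(round(c + (t - c) * amount) for c, t in zip(rgb, target))

def shade(rgb, amount):
    return _mix(rgb, (0, 0, 0), amount)

def tint(rgb, amount):
    return _mix(rgb, (255, 255, 255), amount)

button = (0, 122, 255)        # a blue button in its resting state
pressed = shade(button, 0.2)  # 20% darker while touched
print(pressed)  # (0, 98, 204)
```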

2.3.4 Visibility

Visibility focuses on making the essential elements visible to the user to minimize the user’s memory load. Key aspects are readability, minimalism and emphasis. Readability depends on the contrast between the background and the text. Contrast can be expressed in relative luminance. Some studies put the minimum contrast for text legibility at a 4.5:1 ratio (Webster, 2014), while others recommend 5:1 (ISO, 2010). As discussed, the relatively small screen size of a


smartphone should be taken into account. Kim (2010) states that a 1:10 ratio is necessary to ensure readability for small text or screen sizes. Conversely, a lower contrast can also be useful: contrast can be used to indicate whether a function is active. An expert in user experience suggests using a lower contrast to show a button’s inactivity (Babich, 2016c).
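Contrast ratios like 4.5:1 and 5:1 can be computed from relative luminance. The exact definitions differ per source; the sketch below follows the WCAG formulation as one common example and is not taken from the thesis:

```python
# One common operationalization of luminance contrast (the WCAG definition;
# the sources cited above may define their ratios slightly differently).
def srgb_to_linear(channel):
    """Undo sRGB gamma; channel is a value in 0..255."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(r, g, b):
    """Relative luminance Y in 0..1 of an sRGB color (Rec. 709 weights)."""
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))

def contrast_ratio(color_a, color_b):
    """Contrast ratio between two sRGB colors, from 1:1 up to 21:1."""
    ya, yb = relative_luminance(*color_a), relative_luminance(*color_b)
    lighter, darker = max(ya, yb), min(ya, yb)
    return (lighter + 0.05) / (darker + 0.05)

# Black text on a white background gives the maximum ratio, 21:1.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255))))  # 21
```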

Second, a study of color use on websites found that users liked interfaces with 2-3 base colors, excluding black and white, the best (O'Donovan, 2011). Shneiderman (2016) puts this number at 4, including black and white, with the possibility of 3 additional accent colors. Both of these studies were based on desktop views of a website. A specialist in user experience states that the golden rule, the 6:3:1 ratio, also applies to color (Zieliński, 2017): 60% of the screen should be covered by the primary color, 30% by a secondary color and 10% by an accent color. This theory is in line with the interface design of the most prominent social media applications. In addition, the previously mentioned research showed that monochromatic icons are more easily recognizable than multicolored ones (Sha et al., 2017). This suggests that a lower variety of colors contributes to clearer visibility.
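The 6:3:1 guideline can also be checked mechanically. The sketch below is a toy illustration, not from the thesis: it measures the fraction of a screen covered by each color so the distribution can be compared against 60/30/10:

```python
# Toy illustration (not from the thesis): measuring what fraction of a
# screen each color covers, to compare against the 60/30/10 guideline.
from collections import Counter

def color_proportions(pixels):
    """Return (color, fraction) pairs, largest coverage first."""
    counts = Counter(pixels)
    total = len(pixels)
    return [(color, count / total) for color, count in counts.most_common()]

# A 10-pixel toy "screen": 6 primary, 3 secondary, 1 accent pixel.
screen = ["navy"] * 6 + ["white"] * 3 + ["orange"]
print(color_proportions(screen))
# [('navy', 0.6), ('white', 0.3), ('orange', 0.1)]
```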

Last, color can separate elements and draw attention. Related items can be grouped by corresponding colors, such as a different background color that distinguishes the items from the rest (Shneiderman, 2016). Since the screen of a mobile device shows only a portion of the full page, various background colors, provided they do not exceed the color limits, need not be bothersome. Also, through color coding, users can locate corresponding information faster (Ozcelik et al., 2009). Colors with longer wavelengths, which increase arousal, can be used to draw the user's attention to the objects, actions and options they need.

2.4 Hypotheses

The previous subsections have been concerned with the sub-question ‘what color theories can be applied to the usability factors?’. In this section, this question will be answered, and hypotheses will be drawn from these results, divided into the four categories.

From the first category, consistency, it can be concluded that all UI elements of the same kind need to be consistent in color with one another. A color scheme needs to be established and not deviated from to create consistency. A harmonious color scheme can be made through six techniques. Based upon these results, the following hypotheses were formed:

- H1: Consistent color use in UI elements increases usability.
- H2: The use of a harmonious color scheme increases usability.

As discussed in section 2.1, feedback about the current location, status and errors should always be available to the user. Color can be used to highlight the current location in the global navigation. Color wavelength can be used to arouse or relax the user, which is useful in error feedback. The system status can be accurately displayed to the user through a color-changing loading bar; blue would be best in this situation, due to its relaxing properties. Since a loading bar requires motion and clicks cannot be measured, a hypothesis based on these findings cannot be tested, so this study will not measure the effect of a loading bar. From the residual results, the following hypotheses were formed:

- H3: Highlighting the current location increases usability.
- H4: Using red for negative and green for positive feedback increases usability.


The representation category concludes that color associations should be taken into account when designing an interface. The purpose of an element should match the association of its color. Also, color affordances should be used to evoke the right behavior from the user. Color affordances are used for UI elements to inform the user on how the element should be used. Both of these findings will be left for future research. The reason for leaving color affordances out, is that color affordances for mobile applications require interaction. Unlike desktops where one can hover over an item, mobile devices can only show users their function by clicking on them. Since users will be clicking on static designs, interaction is not possible. Color associations are left out because they are case specific. This makes this subject not generalizable for usability. Color associations with arousing and relaxing colors, however, can be measured. For this subject the following hypothesis was formulated:

- H5: Using arousing and relaxing colors in the right situation increases usability.

From the last category, visibility, it can be concluded that readability, minimalism, and emphasis on essential elements lead to usability. Readability requires high contrast between content and background. Due to the smaller screen size, mobile devices require a contrast of 1:10 luminance ratio. Minimalism is needed to reduce the user's memory load. To achieve minimalism, only three colors on a 6:3:1 ratio should be used. Another method to minimize memory load is through emphasis: by using a highlight color to emphasize important elements, actions, and options, the user's memory load can be minimized. Since emphasis, like color associations, is case specific, it is not generalizable for usability and therefore falls outside the scope of this study. For the other two subjects, the following hypotheses were formed:

- H6: A 1:10 luminance ratio between the content and the background increases usability.
- H7: Using only three colors on a 6:3:1 ratio increases usability.

3. Methodology

This study was concerned with the research question ‘to what extent can the usability of mobile applications be improved by the use of color?’. To answer this question, four sub-questions had to be answered first. The answers to the first three sub-questions led to the formation of seven hypotheses. The answer to the fourth sub-question, ‘can the advantage of color in usability be empirically proven?’, was found through an experiment. The experiment included seven interface pairs, one pair (version A and version B) for each hypothesis. Version A was based on previous research as described in section 2, and version B was either a neutral or a contradicting design. Section 3.1 explains the demographics of the participants. Section 3.2 describes the tool that was used to execute the experiment, the setup of the survey, and how internal validity was guaranteed. How the designs came about is explained in section 3.3. Furthermore, section 3.4 describes the usability metrics and section 3.5 the procedure.

3.1 Participants

A total of 170 people participated in the experiment. People could join by clicking on the link to the test, which was available through the author's Facebook, LinkedIn, and Instagram. The participants came from 18 different nationalities, with Dutch (72.9%) and American (11.2%) being the most common. Each age category, ranging from <18 to 54-64, contained at least one participant. The age group with the highest frequency was 18-24 (74.7%). 51.2% of the participants identified as female and 44.7% as male; the remaining participants preferred not to disclose their gender, identified as another gender, or had missing data. A small majority of the participants (52.1%) was enrolled in or had graduated from a university bachelor's program. This percentage was 18.2% for university master's programs and 11.2% for universities of applied science. Other participants studied or had studied at an institution for practical education, high school, or another unspecified institution. The average participant rated their smartphone skills with an 8.2. 55.9% of the participants said they spend 90 minutes or more on their smartphone per day. To prevent threats to internal validity due to systematic differences in participant characteristics (Scholten, n.d.), each participant was randomly assigned version A or version B for every hypothesis.
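Per-hypothesis random assignment of this kind can be sketched as follows. This is an illustrative sketch only: in the actual experiment the survey tool handled the assignment, and seeding by a participant id (for reproducibility) is an assumption, not something the thesis describes.

```python
import random

def assign_versions(participant_id: int, n_hypotheses: int = 7) -> list[str]:
    """Independently assign version A or B for each of the seven hypotheses.

    Seeding per participant is a hypothetical choice that makes the
    assignment reproducible for the same participant id.
    """
    rng = random.Random(participant_id)
    return [rng.choice(["A", "B"]) for _ in range(n_hypotheses)]

assignments = assign_versions(participant_id=42)
print(assignments)  # e.g. a list of seven "A"/"B" labels
```

Because each hypothesis is assigned independently, a participant may see version A for one interface and version B for another, which keeps the two groups comparable per hypothesis.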

3.2 Materials

The tool used to execute the experiment was a survey application developed by ZURB inc. named Helio. Helio allows researchers to ask participants questions in the form of multiple choice, 5-point Likert scale, zero to ten scale and open answer. Participants can also be asked which design they prefer and where they would click in a static interface (Helio, 2018). Because of this last feature, Helio was considered suitable for this experiment.

The survey consisted of two parts: the usability test of the designed interfaces and a set of questions about the participant. The first part of the survey, the usability test, consisted of three screens per hypothesis. On the first screen, the participants were shown a task in the form of a question. An example task, which can also be found in Appendix A, was “Where would you click to solve the error?”. By clicking on continue, the second screen, with the interface on which the task could be fulfilled, was revealed. This research used static interfaces, in other words, images of the interfaces, as testing material. The given task could be performed by placing a click on the interface. After placing the click, and continuing, the participant was redirected to the third page, where he or she was asked to rate their satisfaction about the previously shown interface. Since participants could not interact with the interfaces or perform a task based on their own needs, satisfaction was interpreted as the extent to which the participants found the interface clear. During the second part, participants were asked to rate their smartphone skills and how much time they actively spend on their smartphones. The first question was measured on a zero to ten scale and the second was a multiple-choice question. The answer options of this multiple-choice question were based on research by Falaki et al. (2010), who found that 20% of their participants spent 30 minutes or less on their smartphones, 20% spent 90 minutes or more, and the remaining ones were equally distributed within these boundaries. Also, the participants were asked about personal information regarding their age, gender, nationality and highest education. All of these were multiple-choice questions, and the answer options, as well as the entire setup of the test, can be found in Appendix A.

To guarantee the internal validity, a prototype was created and tested before the start of the actual survey. This prototype was examined by a group of potential participants, who were asked to explain the difference between version A and version B. When their findings were in line with the intention of the designs, the designed interfaces were accepted as testing material for the survey. When their findings differed, the process of designing and testing was repeated. The corresponding tasks have been tested to avoid ambiguity. While testing the designs and tasks, a few possible threats to internal validity occurred.

First, survey length appeared to be an important issue. According to Blumberg et al. (2011), surveys face the same problems as mobile applications; they are often interrupted. People get bored, tired or hungry during the process. As discussed in section 2.2, people tend to multitask while using their smartphones. This was confirmed during the test phase. Because multitasking is natural behavior, instead of forbidding people to multitask, an additional question was added.


By adding the question “Did you multitask during this test?”, it was possible to investigate whether this influenced the results afterward. Also, the survey was designed to take no longer than 5 minutes to finish.

Second, mobile applications appear differently across browsers, software versions, and screen sizes. To make sure every participant rated the same content, images of interfaces were used as testing material. This meant that no interaction was possible. However, previous studies have shown that static interfaces can be used as reliable material to analyze that interface (Gorn et al., 2004; Lee et al., 2010). Since the focus of this study was on mobile applications, the size of the interfaces was chosen accordingly. Taking into consideration that the lifespan of a phone is estimated at around 2.5 years (Kanter, 2018), people were, at the time of the experiment, most likely to be in possession of an iPhone 7 or a Samsung Galaxy S7, since these were the most sold phones in 2016 (Lovejoy, 2017; Silver, 2017). To make sure that the interfaces fit on the majority of the participants' smartphones, the 16:9 ratio, which both of these phones have, was chosen for the designs. Participants took the survey on their smartphones to prevent the use of an unfamiliar device from affecting the results. Finally, the survey was not accessible through laptops and tablets, since the screen sizes of these devices deviate from the purpose of this study.

Last, during the test phase, there were issues with two browsers. Safari and Android Internet did not show the continue button, or showed it with a delay. This negatively influenced the results during the test phase. For this reason, the test was distributed mainly through social media to avoid these browsers. Also, a question about which browser the participant used was added, to be able to determine whether this problem caused significant differences.

3.3 Designs

The performed experiment followed the principles of the A/B testing method. Jiang et al. (2016) describe the goal of A/B testing as “to estimate or test the average treatment effect (ATE), which is defined as the difference between the treatment effects of the old and new variations”. In this case, the ATE is the difference in the degree of usability between version A and version B. Since A/B testing in usability research is always performed on existing designs, the interfaces made for this study were designed to be familiar to the participants. To do so, current features from different operating systems and applications were used and combined.

According to multiple sources (Laurinavicius, 2016; Phuong, 2013), Source Sans Pro is one of the best typefaces for readability in mobile applications. Therefore, this font was used to prevent the typeface from interfering with the results. For the same reason, the default text size of the operating system iOS was used (Apple, 2018). According to Harley (2014), unambiguous icons are rare, and therefore icons need a label. This has been applied to all interfaces to prevent confusion. All navigation icons are from Flaticon (2018). The interfaces were inspired by frequently used applications such as shopping, food delivery, social media, hotel booking, and phone apps. From these apps, interfaces such as settings, search, and sign-up were used as a starting point. In the following paragraphs, hypothesis-specific design choices will be explained.


[Screenshots: Version A (left) and Version B (right)]

Hypothesis 1: Consistent color use in UI elements increases usability.

Task: “Where would you click to order chicken nuggets?” Buttons were chosen to represent UI elements. Section 2.3.4 discussed the use of color in buttons. Research among the elderly has shown that monochrome buttons are understood more easily and faster than colorful ones. This theory was tested on a broader population. Both versions had the same icons and wording to make sure these factors did not influence the results. The icons are from Smashicons (2016).

Hypothesis 2: The use of a harmonious color scheme increases usability.

Task: “Click to finish off with a hashtag”

The analogous color scheme from figure 2 was used to create version A, while a random color combination that does not match any of the harmonious color schemes was used to create version B. Elements of the same kind have the same color in both designs to avoid confusion. The designs include features from both Apple and Android.


[Screenshots: Version A (left) and Version B (right)]

Hypothesis 3: Highlighting the current location increases usability.

Task: “Click on the current location in the navigation bar” All frequently used shopping applications have a navigation bar. For this reason, a specific screen from such an app, the search screen, was used. The navigation bar consists of icons that are frequently used in shopping apps. In version A, the current location is highlighted in a blue color.

Hypothesis 4: Using red for negative and green for positive feedback increases usability.

Task: “Where would you click to solve the error?”

Screens that require data entry, like sign-up forms, should give users feedback about their entered data. A good number of existing applications use colored feedback for these forms. For each design, the error is specified in words. Only version A has negative feedback circled in red and positive feedback in green.


[Screenshots: Version A (left) and Version B (right)]

Hypothesis 5: Using arousing and relaxing colors in the right situation increases usability.

Task: “Click on the hotel with the lowest amount of rooms available”

Applications that make use of arousing and relaxing colors are often retail apps. A widely known example is a booking application for, among other things, hotels. Versions A and B had the same content. In version B all the text was black, while version A used blue (relaxing) for hotels with sufficient rooms left and red (arousing) for hotels that were likely to sell out quickly.

Hypothesis 6: A 1:10 luminance ratio between the content and the background increases usability.

Task: “Where would you click to change your password?” Since luminance ratio is important in readability, a reading task was chosen. To ensure that differences in reading speed did not influence the results, words were chosen instead of sentences. The minimal luminance ratio for desktop (1:4.5) was compared to the suggested luminance ratio for mobile devices (1:10).

Version A made use of the color RGB(24, 67, 91) which has a 1 : 10.009 luminance ratio with the background, while version B made use of the color RGB(51, 123, 153) which has a 1 : 4.99 ratio.
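Luminance ratios like these can be checked with the WCAG 2.0 contrast formula. The sketch below assumes a white background; the small differences from the ratios reported above (1:10.009 and 1:4.99) presumably stem from a slightly different formula variant or background value used in the thesis.

```python
def relative_luminance(rgb):
    """WCAG 2.0 relative luminance of an sRGB color given as 0-255 ints."""
    def linearize(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), between 1 and 21."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

white = (255, 255, 255)
print(contrast_ratio((24, 67, 91), white))    # version A: roughly 10.5
print(contrast_ratio((51, 123, 153), white))  # version B: roughly 4.7
```

Either way, version A clearly exceeds the 1:10 target while version B sits near the 1:4.5 desktop minimum, which is the contrast the two designs were meant to represent.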


[Screenshots: Version A (left) and Version B (right)]

Hypothesis 7: Using only three colors on a 6:3:1 ratio increases usability.

Task: “Where would you click to review the second order?” A design with many UI elements was created to make a broad distribution of colors possible. For both designs, UI elements with the same function had the same color. Version A had three colors on a 6:3:1 ratio, while version B had six colors in a more even distribution. Version B used a harmonious (square) color scheme so that the color combination would not influence the results.
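Whether a design follows the 6:3:1 distribution can be checked by comparing per-color pixel areas. A minimal sketch with hypothetical area counts (the thesis does not report the actual pixel areas of its designs):

```python
# Hypothetical pixel areas per color role in a mocked-up 6:3:1 design
color_areas = {"dominant": 60_000, "secondary": 30_000, "accent": 10_000}

total = sum(color_areas.values())
shares = {name: round(100 * area / total) for name, area in color_areas.items()}

# Compare the shares against the 60/30/10 targets with a small tolerance
targets = {"dominant": 60, "secondary": 30, "accent": 10}
follows_rule = all(abs(shares[k] - targets[k]) <= 5 for k in targets)
print(shares, follows_rule)
```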

3.4 Usability Metrics

The seven hypotheses from the four categories were tested through Nielsen's usability metrics (2001). Nielsen states that the usability of a new interface can be tested by comparing it to an old one. Users should be asked to complete a task while the following metrics are collected:

1. Success rate: whether the user is able to complete the task.
2. Task time: the number of seconds it takes the user to complete a task.
3. Error rate: the number of errors the user faces before finishing the task.
4. Satisfaction: the user's overall subjective satisfaction rate.
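Aggregating these metrics over a set of trials is straightforward. A minimal sketch with hypothetical trial records (success flag, task time in seconds, satisfaction score); the actual study collected these per participant per interface:

```python
# Hypothetical trial records: (success 0/1, task time in seconds, satisfaction 0-10)
trials = [
    (1, 15.1, 7),
    (0, 22.4, 5),
    (1, 9.8, 9),
]

# Success rate is the share of completed tasks; the other two are simple means
success_rate = sum(s for s, _, _ in trials) / len(trials)
mean_task_time = sum(t for _, t, _ in trials) / len(trials)
mean_satisfaction = sum(q for _, _, q in trials) / len(trials)

print(round(success_rate, 2), round(mean_task_time, 2), round(mean_satisfaction, 2))
```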

This method assumes that there is already an existing interface. Since this was not the case for this study, a design based on previous research was compared to a neutral or contradicting one. As explained in section 3.3, the participants were randomly assigned one design or the other and asked to perform the same task. First, the success rate was measured by locating the click and determining whether the position of that click fell within the predetermined area. In case of a click on the right spot, participants were assigned a 1; in case of a wrongly placed click, participants were assigned a 0. Second, the task time was measured by timing how long the participants spent on the screen with the designed interface. To make sure the reading speed of the participant did not influence the task time, the task was shown before the screen with the interface. The timer was stopped when the participants clicked continue and were redirected to the next screen. Third, as mentioned, the experiment made use of static interfaces with which participants could not interact. Only one click, the last one, could be measured. As a consequence, it was not possible to measure the error rate. Last, after each task was fulfilled, participants were asked to rate their feeling of satisfaction, interpreted as clarity, about the interface. Satisfaction was measured on a zero to ten scale.
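The success-rate scoring described above amounts to a point-in-rectangle test. A minimal sketch with hypothetical coordinates (the survey tool performed this check in the actual experiment):

```python
def click_success(click, target):
    """Return 1 if the click (x, y) falls inside the target rectangle, else 0.

    target is (left, top, width, height) in screen pixels.
    """
    x, y = click
    left, top, width, height = target
    return 1 if left <= x <= left + width and top <= y <= top + height else 0

# Hypothetical button area on a 360x640 screen
button = (120, 500, 120, 44)
print(click_success((170, 520), button))  # 1: click lands inside the button
print(click_success((30, 100), button))   # 0: click lands outside
```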


3.5 Procedure

As discussed, an online usability test was conducted to answer the fourth sub-question. People who saw the link to the test on Instagram, Facebook, or LinkedIn were able to participate. The participants took the test on their smartphones; computers and tablets were excluded. When opening the test, the participants were given instructions and asked to follow their intuition and be honest. The survey consisted of two parts: the usability test and questions about the participant. During the usability test, the participants were shown a task and an interface on which the task could be performed, and they were asked to rate the interface on clarity. Questions about the participant included personal questions, questions about their smartphone use, whether they multitasked, and which browser they used. Afterward, participants were thanked for their participation.

4. Results

The data from the usability test was statistically analyzed, and the results are presented in this section. As explained in section 3, each participant was randomly shown version A or B of each of the seven interfaces. The participants were asked to perform a task using these interfaces. A task could be fulfilled by clicking on a particular spot in the interface. The time it took the participant to fulfill the task, as well as the location of the click, were measured. After each interface, participants were asked to express their satisfaction. These three measurements cause every hypothesis to be divided into three null hypotheses:

- H0 Task Time: there is no difference in time between version A and version B.

- H0 Success Rate: there is no difference in success rate between version A and version B.
- H0 Satisfaction: there is no difference in satisfaction between version A and version B.

The null hypotheses for task time and satisfaction were tested with an independent samples t-test. Since the independent samples t-test “compares the means between two unrelated groups on the same continuous, dependent variable” (Leard, 2018), this test was suitable for the data. All assumptions of the test were met: both time and satisfaction were measured on a continuous scale, there were two independent groups (version A and version B), and there was independence of observations. One outlier in task time for hypothesis 5 was removed since it deviated 12.96 standard deviations from the mean. Both task time and satisfaction were approximately normally distributed. Lumley et al. (2002) state that “there are a variety of situations in which we can assume normality regardless of the shape of our sample data”. This phenomenon follows from the central limit theorem, for which, according to Field (2013), a sample of “up to 100 or even 160 might be necessary”. Since this study contained 170 participants, normality was assumed. Finally, homogeneity of variance will be discussed per hypothesis.

Since the success rate was categorical data, either the Chi-Square test or the Fisher's exact test was used, depending on the case. Overall the Chi-Square test was used, but in case of an expected frequency of less than five, the Fisher's exact test was used, since this test suits that situation better. According to Field (2013), the likelihood ratio is preferred when a data sample is small. Since this study contained 170 participants, that test was not suitable. Both assumptions of the Chi-Square test were met: the two variables both consisted of categorical data, and each variable consisted of two independent groups. These same assumptions apply to the Fisher's exact test. The descriptive statistics of the success rate, as well as those of task time and satisfaction, can be found in appendix B.
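The Pearson chi-square statistic for a 2x2 table can be computed directly from observed and expected counts. A minimal Python sketch, using hypothetical success/failure counts (the thesis reports only the test statistics, not the raw contingency tables):

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table.

    table[i][j] = observed count for group i (version A/B), outcome j (fail/success).
    """
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts for 170 participants split over versions A and B
observed = [[30, 55],   # version A: 30 failures, 55 successes
            [40, 45]]   # version B: 40 failures, 45 successes
print(round(chi_square_2x2(observed), 3))  # → 2.429
```

When any expected count falls below five, as happens for several hypotheses below, this approximation becomes unreliable, which is why the study switches to Fisher's exact test in those cases.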


As discussed in section 3.2, there were two factors that could possibly have influenced the results: multitasking and browser. Three variables were calculated: average task time, average success rate, and average satisfaction. To test whether multitasking or browser influenced the results, independent t-tests were performed with the two factors as independent variables and the three newly calculated variables as dependent variables. These tests can be found in appendix C. The tests showed no significant difference for multitasking or browser, and therefore it was assumed that these factors did not influence the results.

4.1 Consistent Color Use in UI Elements

For hypothesis 1, “consistent color use in UI elements increases usability”, the three null hypotheses were tested through statistical tests. As seen in table 2b, Levene's test shows that homogeneity of variance could be assumed for both task time and satisfaction. Looking at equal variances assumed, both variables have a significance value higher than 0.05. Neither of the null hypotheses could be rejected, meaning there was no significant difference between version A and version B regarding task time or satisfaction. In table 3, the Chi-Square statistic for success rate was 0.111 with a significance value greater than 0.05, indicating that no significant difference was found.

Table 2a, Group Statistics of Task Time and Satisfaction for Hypothesis 1

| H1 | Variation | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Task Time | A | 86 | 15.0773 | 10.00386 | 1.07874 |
| Task Time | B | 84 | 15.0069 | 7.27474 | .79374 |
| Satisfaction | A | 86 | 7.03 | 2.379 | .256 |
| Satisfaction | B | 84 | 7.05 | 2.206 | .241 |

Table 2b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 1

| H1 | Variances | F | Sig. | t | df | Sig. (2-tailed) | Mean Diff. | Std. Error Diff. | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| Task Time | Equal assumed | 1.114 | .293 | .052 | 168 | .958 | .07042 | 1.34419 | -2.58326 | 2.72410 |
| Task Time | Not assumed | | | .053 | 155.327 | .958 | .07042 | 1.33930 | -2.57516 | 2.71600 |
| Satisfaction | Equal assumed | .775 | .380 | -.036 | 168 | .971 | -.013 | .352 | -.708 | .682 |
| Satisfaction | Not assumed | | | -.036 | 167.552 | .971 | -.013 | .352 | -.707 | .682 |

Table 3, Chi-Square Test of Success Rate for Hypothesis 1

| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | .111 (a) | 1 | .739 | .780 | .482 |
| Continuity Correction (b) | .002 | 1 | .965 | | |
| Likelihood Ratio | .111 | 1 | .739 | .780 | .482 |
| Fisher's Exact Test | | | | .780 | .482 |
| N of Valid Cases | 170 | | | | |

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 6.42.
b. Computed only for a 2x2 table.


4.2 Harmonious Color Scheme

Hypothesis 2, “the use of a harmonious color scheme increases usability”, was also tested using the three null hypotheses. In table 4b, one can see that Levene's test was significant for both variables. This meant equal variances could not be assumed. Under equal variances not assumed, task time had a significance value of 0.012. Since this value was lower than 0.05, the null hypothesis for task time was rejected, meaning that there was a significant difference in task time between version A and version B. As seen in table 4a, the average task time of participants with version B was higher than that of participants with version A. However, no significant differences regarding satisfaction or success rate were found. Since success rate had an expected frequency of less than 5, the Fisher's exact test was used instead of the Chi-Square test. As shown in tables 4b and 5, the significance values for both satisfaction (0.236) and success rate (0.947) were higher than 0.05, and therefore these null hypotheses could not be rejected.

Table 4a, Group Statistics of Task Time and Satisfaction for Hypothesis 2

| H2 | Variation | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Task Time | A | 81 | 7.5536 | 4.29861 | .47762 |
| Task Time | B | 89 | 9.5352 | 5.82475 | .61742 |
| Satisfaction | A | 81 | 8.44 | 1.673 | .186 |
| Satisfaction | B | 89 | 8.06 | 2.529 | .268 |

Table 4b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 2

| H2 | Variances | F | Sig. | t | df | Sig. (2-tailed) | Mean Diff. | Std. Error Diff. | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| Task Time | Equal assumed | 4.682 | .032 | -2.503 | 168 | .013 | -1.98159 | .79157 | -3.54429 | -.41889 |
| Task Time | Not assumed | | | -2.539 | 161.298 | .012 | -1.98159 | .78060 | -3.52310 | -.44008 |
| Satisfaction | Equal assumed | 5.988 | .015 | 1.168 | 168 | .244 | .388 | .332 | -.268 | 1.044 |
| Satisfaction | Not assumed | | | 1.190 | 153.868 | .236 | .388 | .326 | -.256 | 1.033 |

Table 5, Fisher's Exact Test of Success Rate for Hypothesis 2

| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | .004 (a) | 1 | .947 | 1.000 | .727 |
| Continuity Correction (b) | .000 | 1 | 1.000 | | |
| Likelihood Ratio | .004 | 1 | .947 | 1.000 | .727 |
| Fisher's Exact Test | | | | 1.000 | .727 |
| N of Valid Cases | 170 | | | | |

a. 2 cells (50.0%) have expected count less than 5. The minimum expected count is .95.
b. Computed only for a 2x2 table.
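As a sanity check, the Welch (equal variances not assumed) t statistic and degrees of freedom in Table 4b can be reproduced from the group statistics in Table 4a. A minimal sketch:

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and degrees of freedom from summary statistics."""
    va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

# Group statistics for H2 task time (Table 4a)
t, df = welch_t(7.5536, 4.29861, 81, 9.5352, 5.82475, 89)
print(t, df)  # close to Table 4b: t = -2.539, df = 161.298
```

The small residual discrepancy comes from recomputing with rounded summary statistics rather than the raw data.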


4.3 Highlight of the Current Location

For hypothesis 3, “highlighting the current location increases usability”, three null hypotheses regarding task time, success rate, and satisfaction were statistically tested. Table 6b shows that Levene's Test for Equality of Variances was not significant, meaning equal variances could be assumed. Since the significance values were higher than 0.05 for both variables, the null hypotheses for task time and satisfaction could not be rejected. In other words, no significant differences were found. Table 7 displays the same result for success rate: since the significance level (0.063) was higher than 0.05, this null hypothesis could also not be rejected.

Table 6a, Group Statistics of Task Time and Satisfaction for Hypothesis 3

| H3 | Variation | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Task Time | A | 84 | 20.2094 | 16.56947 | 1.80788 |
| Task Time | B | 86 | 22.0083 | 13.55112 | 1.46125 |
| Satisfaction | A | 84 | 6.55 | 2.792 | .305 |
| Satisfaction | B | 86 | 5.94 | 2.896 | .312 |

Table 6b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 3

| H3 | Variances | F | Sig. | t | df | Sig. (2-tailed) | Mean Diff. | Std. Error Diff. | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| Task Time | Equal assumed | .234 | .629 | -.776 | 168 | .439 | -1.79885 | 2.31913 | -6.37724 | 2.77954 |
| Task Time | Not assumed | | | -.774 | 160.135 | .440 | -1.79885 | 2.32458 | -6.38964 | 2.79194 |
| Satisfaction | Equal assumed | .156 | .693 | 1.388 | 168 | .167 | .606 | .436 | -.256 | 1.467 |
| Satisfaction | Not assumed | | | 1.389 | 167.972 | .167 | .606 | .436 | -.255 | 1.467 |

Table 7, Chi-Square Test of Success Rate for Hypothesis 3

| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | 3.448 (a) | 1 | .063 | .082 | .045 |
| Continuity Correction (b) | 2.887 | 1 | .089 | | |
| Likelihood Ratio | 3.461 | 1 | .063 | .082 | .045 |
| Fisher's Exact Test | | | | .082 | .045 |
| N of Valid Cases | 170 | | | | |

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 32.12.
b. Computed only for a 2x2 table.


4.4 Red for Negative and Green for Positive Feedback

Hypothesis 4, “using red for negative and green for positive feedback increases usability”, was tested in the same way as the previous hypotheses. Looking at table 8b, one can see that Levene's test was not significant for either variable. Therefore, equal variances could be assumed for task time and satisfaction. The null hypotheses for these two variables could be rejected because both of their significance values (0.006 for task time and 0.038 for satisfaction) were less than or equal to 0.05. This means there was a significant difference between version A and version B regarding task time and satisfaction. However, using the Fisher's exact test because of an expected frequency lower than 5, no significant difference was found for success rate (see table 9).

Table 8a, Group Statistics of Task Time and Satisfaction for Hypothesis 4

| H4 | Variation | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Task Time | A | 82 | 8.3707 | 5.01134 | .55341 |
| Task Time | B | 88 | 10.6094 | 5.48151 | .58433 |
| Satisfaction | A | 82 | 8.66 | 1.573 | .174 |
| Satisfaction | B | 88 | 8.14 | 1.683 | .179 |

Table 8b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 4

| H4 | Variances | F | Sig. | t | df | Sig. (2-tailed) | Mean Diff. | Std. Error Diff. | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| Task Time | Equal assumed | .420 | .518 | -2.773 | 168 | .006 | -2.23870 | .80736 | -3.83258 | -.64482 |
| Task Time | Not assumed | | | -2.782 | 167.942 | .006 | -2.23870 | .80480 | -3.82753 | -.64987 |
| Satisfaction | Equal assumed | 2.016 | .157 | 2.086 | 168 | .038 | .522 | .250 | .028 | 1.016 |
| Satisfaction | Not assumed | | | 2.091 | 167.998 | .038 | .522 | .250 | .029 | 1.015 |

Table 9, Fisher's Exact Test of Success Rate for Hypothesis 4

| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | .008 (a) | 1 | .930 | 1.000 | .625 |
| Continuity Correction (b) | .000 | 1 | 1.000 | | |
| Likelihood Ratio | .008 | 1 | .930 | 1.000 | .625 |
| Fisher's Exact Test | | | | 1.000 | .625 |
| N of Valid Cases | 170 | | | | |

a. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 2.89.
b. Computed only for a 2x2 table.


4.5 Arousing and Relaxing Colors

For the testing of hypothesis 5, “using arousing and relaxing colors in the right situation increases usability”, the three null hypotheses were tested. Table 10b shows that Levene's test was significant for task time but not for satisfaction. In other words, equal variances could not be assumed for task time, but they could for satisfaction. Both the significance values for task time (0.106) and satisfaction (0.644) were above 0.05, so neither of the null hypotheses could be rejected. The Fisher's exact test was used for success rate because there was an expected frequency lower than 5. This test showed a significance value of 0.282, indicating that there was no significant difference between version A and B regarding success rate (see table 11).

Table 10a, Group Statistics of Task Time and Satisfaction for Hypothesis 5

| H5 | Variation | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Task Time | A | 80 | 7.9308 | 4.84992 | .54224 |
| Task Time | B | 89 | 6.8869 | 3.21995 | .34131 |
| Satisfaction | A | 81 | 8.78 | 1.628 | .181 |
| Satisfaction | B | 89 | 8.66 | 1.602 | .170 |

Table 10b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 5

| H5 | Variances | F | Sig. | t | df | Sig. (2-tailed) | Mean Diff. | Std. Error Diff. | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| Task Time | Equal assumed | 4.499 | .035 | 1.664 | 167 | .098 | 1.04390 | .62753 | -.19501 | 2.28280 |
| Task Time | Not assumed | | | 1.629 | 134.981 | .106 | 1.04390 | .64072 | -.22324 | 2.31104 |
| Satisfaction | Equal assumed | .750 | .388 | .463 | 168 | .644 | .115 | .248 | -.375 | .604 |
| Satisfaction | Not assumed | | | .463 | 165.967 | .644 | .115 | .248 | -.375 | .605 |

Table 11, Fisher's Exact Test of Success Rate for Hypothesis 5

| | Value | df | Asymp. Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | 1.726 (a) | 1 | .189 | .282 | .172 |
| Continuity Correction (b) | .905 | 1 | .341 | | |
| Likelihood Ratio | 1.815 | 1 | .178 | .282 | .172 |
| Fisher's Exact Test | | | | .282 | .172 |
| N of Valid Cases | 170 | | | | |

a. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 3.81.
b. Computed only for a 2x2 table.


4.6 A 1:10 Luminance Ratio

For hypothesis 6, “a 1:10 luminance ratio between the content and the background increases usability”, the three null hypotheses were tested through statistical tests. As seen in table 12b, the Levene’s test was significant meaning that homogeneity of variance could not be assumed. The significance value of task time (0.128) was not equal to or lower than 0.05, and therefore the null hypothesis could not be rejected. The degree of satisfaction, however, had a significance value of 0.005, meaning that there was a significant difference in satisfaction between version A and version B. Looking at table 12a, one can see version A had a higher value than version B. Finally, as seen in table 13, the null hypothesis for success rate could not be rejected because the significance value was greater than 0.05. Table 12a, Group Statistics of Task Time and Satisfaction for Hypothesis 6

| H6 | Variation | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Task Time | A | 84 | 6.9969 | 3.45783 | .37728 |
| Task Time | B | 86 | 8.2906 | 6.95672 | .75016 |
| Satisfaction | A | 84 | 9.05 | 1.582 | .173 |
| Satisfaction | B | 86 | 8.31 | 1.804 | .194 |

Table 12b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 6

| H6 | | Levene's F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| Task Time | Equal variances assumed | 7.029 | .009 | -1.530 | 168 | .128 | -1.29368 | .84571 | -2.96327 | .37592 |
| Task Time | Equal variances not assumed | | | -1.541 | 125.233 | .126 | -1.29368 | .83969 | -2.95550 | .36815 |
| Satisfaction | Equal variances assumed | 4.519 | .035 | 2.817 | 168 | .005 | .734 | .260 | .219 | 1.248 |
| Satisfaction | Equal variances not assumed | | | 2.821 | 166.109 | .005 | .734 | .260 | .220 | 1.247 |

Table 13, Chi-Square Test of Success Rate for Hypothesis 6

| | Value | df | Asymptotic Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | .801ᵃ | 1 | .371 | .535 | .281 |
| Continuity Correctionᵇ | .340 | 1 | .560 | | |
| Likelihood Ratio | .811 | 1 | .368 | .535 | .281 |
| Fisher's Exact Test | | | | .535 | .281 |
| N of Valid Cases | 170 | | | | |

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 5.44.
b. Computed only for a 2x2 table.
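The decision procedure behind table 12b (run Levene's test first, then read the "equal variances assumed" or "not assumed" row of the t-test accordingly) can be sketched with SciPy. The samples below are randomly generated stand-ins, not the study's raw task-time data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic task-time samples for versions A and B, loosely shaped
# like the group statistics in table 12a (NOT the study's raw data).
a = rng.normal(7.0, 3.5, 84)
b = rng.normal(8.3, 7.0, 86)

# Levene's test for homogeneity of variance (SPSS centers on the mean).
lev_stat, lev_p = stats.levene(a, b, center="mean")

# If Levene's test is significant, fall back to Welch's t-test
# ("equal variances not assumed"); otherwise use Student's t-test.
equal_var = lev_p > 0.05
t_stat, t_p = stats.ttest_ind(a, b, equal_var=equal_var)
print(f"Levene p={lev_p:.3f}, equal_var={equal_var}, t-test p={t_p:.3f}")
```

With unequal group variances, as for task time in table 12a, this gate typically routes the comparison to Welch's variant, which adjusts the degrees of freedom downward (125.233 instead of 168 in table 12b).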


4.7 Three Colors on a 6:3:1 Ratio

For hypothesis 7, “using only three colors on a 6:3:1 ratio increases usability”, three null hypotheses regarding task time, success rate, and satisfaction were statistically tested. Table 14b shows that Levene's test was not significant, meaning that equal variances could be assumed. Both task time and satisfaction had significance values above 0.05, so these null hypotheses could not be rejected. As seen in table 15a, the Chi-Square statistic was 4.428 with a significance value of 0.035, indicating a significant difference in success rate between versions A and B. According to table 15b, participants with version A had a significantly higher success rate than participants with version B.

Table 14a, Group Statistics of Task Time and Satisfaction for Hypothesis 7

| H7 | Variation | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Task Time | A | 83 | 10.2651 | 10.55822 | 1.15892 |
| Task Time | B | 87 | 11.3303 | 13.16948 | 1.41192 |
| Satisfaction | A | 83 | 8.57 | 1.407 | .154 |
| Satisfaction | B | 87 | 8.10 | 1.766 | .189 |

Table 14b, Independent Samples Test of Task Time and Satisfaction for Hypothesis 7

| H7 | | Levene's F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| Task Time | Equal variances assumed | .052 | .820 | -.580 | 168 | .563 | -1.06528 | 1.83606 | -4.69001 | 2.55944 |
| Task Time | Equal variances not assumed | | | -.583 | 163.218 | .561 | -1.06528 | 1.82663 | -4.67217 | 2.54160 |
| Satisfaction | Equal variances assumed | 1.173 | .280 | 1.884 | 168 | .061 | .463 | .246 | -.022 | .948 |
| Satisfaction | Equal variances not assumed | | | 1.894 | 162.909 | .060 | .463 | .244 | -.020 | .945 |

Table 15a, Chi-Square Test of Success Rate for Hypothesis 7

| | Value | df | Asymptotic Sig. (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
|---|---|---|---|---|---|
| Pearson Chi-Square | 4.428ᵃ | 1 | .035 | .042 | .028 |
| Continuity Correctionᵇ | 3.611 | 1 | .057 | | |
| Likelihood Ratio | 4.532 | 1 | .033 | .042 | .028 |
| Fisher's Exact Test | | | | .042 | .028 |
| N of Valid Cases | 170 | | | | |

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 14.16.
b. Computed only for a 2x2 table.


Table 15b, Crosstabulation of Success Rate for Hypothesis 7

| H7 Success Rate | | A | B | Total |
|---|---|---|---|---|
| 0 | Count | 9 | 20 | 29 |
| | Expected Count | 14.2 | 14.8 | 29.0 |
| | % of Total | 5.3% | 11.8% | 17.1% |
| | Standardized Residual | -1.4 | 1.3 | |
| 1 | Count | 74 | 67 | 141 |
| | Expected Count | 68.8 | 72.2 | 141.0 |
| | % of Total | 43.5% | 39.4% | 82.9% |
| | Standardized Residual | .6 | -.6 | |
| Total | Count | 83 | 87 | 170 |
| | Expected Count | 83.0 | 87.0 | 170.0 |
| | % of Total | 48.8% | 51.2% | 100.0% |
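The chi-square figures in table 15a follow directly from the observed counts in table 15b and can be verified with SciPy (only the library call is assumed; the counts are taken from table 15b):

```python
import numpy as np
from scipy import stats

# Observed success-rate counts from table 15b
# (rows: versions A and B; columns: failed / succeeded).
observed = np.array([[ 9, 74],
                     [20, 67]])

# Pearson chi-square without Yates' continuity correction,
# matching the "Pearson Chi-Square" row of table 15a.
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(round(chi2, 3), round(p, 3))      # 4.428 0.035
print(round(float(expected.min()), 2))  # 14.16, the minimum expected count
```

Since the smallest expected count (14.16) is well above 5, the asymptotic chi-square approximation is appropriate here, unlike for hypothesis 5, where Fisher's exact test was needed.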

4.8 Summary of the Results

The results of the null hypothesis tests for the seven main hypotheses were presented in the previous sections. This section summarizes these results and offers possible explanations for the outcomes. An overview of the results of the usability test can be found in table 16. Each cross (x) in this table represents a significant difference between version A (enhanced design) and version B (neutral or contradicting design), with version A outperforming version B. From this table it can be concluded that the answer to the fourth research question, “can the advantage of color in usability be empirically proven?”, is yes.

Table 16, Overview of Analysis Results

| Category | Hypothesis | Task Time | Success Rate | Satisfaction |
|---|---|---|---|---|
| Consistency | H1 Consistent color use in UI elements increases usability | | | |
| Consistency | H2 The use of a harmonious color scheme increases usability | x | | |
| Feedback | H3 Highlighting the current location increases usability | | | |
| Feedback | H4 Using red for negative and green for positive feedback increases usability | x | x | |
| Representation | H5 Using arousing and relaxing colors in the right situation increases usability | | | |
| Visibility | H6 A 1:10 luminance ratio between content and background increases usability | | | x |
| Visibility | H7 Using only three colors on a 6:3:1 ratio increases usability | | x | |

In the consistency category, no significant results were found for the first hypothesis. This is remarkable, because it means that the theory of Sha et al. (2017), whose research was conducted among the elderly, cannot be applied to a target group of a different age range. For the second hypothesis, it was found that a task can be completed significantly faster when a harmonious color scheme is used. This suggests that such a color scheme not only gives users a feeling of peace and harmony, as found in previous studies, but also affects responsiveness. This phenomenon can be explained by the research of Einakian and Newman (2010), who found that the use of a non-harmonic color combination leads to lower visibility. When visibility is low, the memory load of the user is
