
Conveying capabilities in a chatbot with limited intelligence

SUBMITTED IN PARTIAL FULFILLMENT FOR THE DEGREE OF MASTER OF SCIENCE

Emiel de Graaf

10645373

MASTER INFORMATION STUDIES
HUMAN-CENTERED MULTIMEDIA
FACULTY OF SCIENCE
UNIVERSITY OF AMSTERDAM

24-07-2018

1st Supervisor: dr. F.M. (Frank) Nack
2nd Supervisor: dr. S. (Susanne) van Mulken


Conveying capabilities in a chatbot with limited intelligence

The design of an e-commerce chatbot with a user-centered approach

Emiel de Graaf

University of Amsterdam
Lead author
emieldegraaf@hotmail.nl

ABSTRACT

Today, business-to-consumer companies are implementing chatbots in large numbers for many different applications. However, research shows that there is a gap between user expectations and the practical use of chatbots. First of all, users think that the intelligence of chatbots is much higher than it actually is. Secondly, chatbots fail in conveying what their capabilities are. In collaboration with INFORMAAT, we designed a chatbot with limited intelligence for the e-commerce domain to explore ways of conveying capabilities that improve the user experience of a chatbot. For the design of the chatbot, we used different user-centered methods. At first, we conducted expert interviews to explore potential solutions to convey capabilities. Subsequently, we implemented these solutions in our design, and tested the user experience of the chatbot. We used two iterations to evaluate and improve the user experience of the first iteration. The user experience tests indicate that for an e-commerce chatbot with limited intelligence: 1) one should only ask for information when the user is able to see its relevance, 2) one should apply graphics since in particular situations they can convey information better than text alone, 3) one should apply buttons to make navigation within the chatbot clear and efficient, 4) one should apply error management to give feedback on the state of the chatbot and to provide a way out, 5) one should give users the possibility to specify demands directly via textual input, 6) one should show examples of questions that indicate that the user is free to ask other relevant questions as well, 7) one should initially consider if a chatbot is the right solution for the use case and 8) one should initially make the scope of the chatbot as small as possible to be able to process most user demands.

KEYWORDS

Conversational UI, Chatbots, User-centered design, User experience, User expectations, Human-computer interaction

1 INTRODUCTION

Today, many companies are investing in the development of chatbots. They can be compared with the rise of mobile applications in 2008: chatbots are a hype. While technology providers are still investing in underlying technologies such as natural language processing (NLP) and other artificial intelligence (AI) technologies, business-to-consumer (B2C) companies are already implementing chatbots in large numbers for many different applications. However, research shows that there is a gap between user expectations and the practical use of chatbots [19]. First of all, users have high expectations of the intelligence of chatbots, while in contrast NLP and other AI technologies are still underdeveloped [5]. This prevents users from having advanced conversations; chatbots are not yet

able to interpret complex questions and understand the dynamic context of messages. Secondly, chatbots fail in conveying what their capabilities are. Due to insufficient feedback about the capabilities of the chatbot, users often do not know which actions are possible in which situations [3]. According to Luger and Sellen, some thought should be given to how to convey system limitations and capabilities until chatbots are fully able to have human-like conversations [19]. For this reason, the aim of this research is to explore ways of conveying capabilities that improve the user experience of a chatbot.

This paper describes the design of a chatbot with a user-centered approach. The chatbot is designed in collaboration with INFORMAAT, a digital design firm from the Netherlands1. In this research, two iterations of designing, developing and UX testing are conducted. The research has an inductive nature, since the aim is to develop theory that enables practitioners and researchers in the field of human-computer interaction (HCI) to design chatbots that are more user-friendly. For this research, we chose to implement the chatbot for the e-commerce domain, since the domain is task-oriented and chatbots offer a mobile-friendly solution [11] in contrast to current e-commerce sites [1]. Because the average intelligence of chatbots is low, we chose to implement a chatbot with basic NLP. Furthermore, a low intelligence increases the importance of conveying capabilities. In summary, the research question is: How to convey capabilities of an e-commerce chatbot with limited intelligence for a better user experience?

First, Section 2 discusses relevant studies concerning conversational user interfaces (CUIs) and the management of user expectations. Second, Section 3 describes the design process of the chatbot and the different methods that were applied for the testing and evaluation of its UX. In the last sections, the implications of the findings are discussed and several recommendations are made to improve the UX of chatbots.

2 THEORETICAL BACKGROUND

2.1 CUI

Until now, practitioners in the field of HCI have mainly been focusing on the design of graphical user interfaces (GUIs). However, there is a noticeable trend that moves in a completely different direction, namely towards CUIs [11]. Instead of interacting through a visual interface, the user converses through natural language in a CUI. According to Dale, there are two types of CUIs: voice assistants and text-based chatbots [8]. The best-known voice assistants are


Apple’s Siri2, Amazon’s Alexa3, Microsoft’s Cortana4 and the Google Assistant5. As the name suggests, voice assistants rely on voice as the main form of interaction. In contrast, the main form of interaction with chatbots happens through a visual chat interface. According to Eeuwen, a chatbot can be defined as a "service, powered by rules and sometimes artificial intelligence, that you interact with via a chat interface" [10, p. 4]. Currently, there are thousands of chatbots that are developed by B2C companies such as data and service providers [11]. Technology providers are investing a lot of resources to develop frameworks, platforms and technologies such as NLP and other AI technologies that enable B2C companies to build chatbots. Examples of frameworks are Microsoft’s Bot Framework6 and Google’s Dialogflow7.

Although some see CUIs as a potential revolution in the field of HCI, chatbots are not something new. One of the earliest chatbots is the ELIZA project of Joseph Weizenbaum in 1966, an NLP computer program that uses simple pattern matching [29]. Since then, NLP has evolved from simple pattern matching to parsing messages and recognizing user intents to a higher degree [3]. Besides the improvement of NLP and other AI technologies, one of the reasons that chatbots have become so popular is the growth of mobile users and popular social platforms, in which chatbots can easily be integrated [8]. Berglund states that integrating chatbots with popular social platforms such as Facebook Messenger8 gives users a greater confidence level, since they are familiar with the environment [3]. Besides, Berglund mentions that most platforms offer accessibility tools that can help users, for example those with visual impairments.

2.2 Chatbot limitations

Although many companies see chatbots as ’the next big thing’, research shows that there are several problems that affect the UX of chatbots [11]. As described in Section 2.1, many chatbots use NLP and other AI technologies to enable more advanced conversations. However, research shows that these technologies have not reached their full potential yet. According to Cambria and White, the performance of even the most efficient word-based algorithms is poor when these algorithms are applied in a domain for which they are not trained [5]. Although NLP research is currently shifting towards a more semantic interpretation, Cambria and White argue that before chatbots are fully able to have human-like conversations, they should first contain physical and social knowledge and learn how to handle this type of knowledge. Dialogflow is a good example to illustrate the current state of the underlying technologies of chatbots in practice, since it is only able to take the context of messages into account to a certain extent. Contexts inside Dialogflow are variables that can be activated by a user intent, and allow developers to specify related actions that are only accessible when the variable is activated.

2 https://www.apple.com/ios/siri/
3 https://developer.amazon.com/alexa
4 https://www.microsoft.com/en-us/cortana
5 https://assistant.google.com/
6 https://dev.botframework.com/
7 https://dialogflow.com/
8 https://www.messenger.com/

These ’conversational flows’ have to be explicitly defined and are not based on any underlying semantics.
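As a concrete illustration of how such explicitly defined flows look in practice, the sketch below builds the kind of webhook fulfillment response with which a Dialogflow intent handler activates an output context; follow-up intents bound to that context only match while it is alive. The session path, context name and wording here are hypothetical, not taken from the prototype described later in this paper.

```python
# Hypothetical sketch of a Dialogflow v2 webhook fulfillment response.
# An output context gates which follow-up intents can match next, which
# is what makes the "conversational flow" explicit rather than semantic.

def fulfill(text, session, output_context=None, lifespan=5):
    """Build the JSON body a webhook returns to Dialogflow."""
    body = {"fulfillmentText": text}
    if output_context:
        body["outputContexts"] = [{
            # Contexts are named relative to the session.
            "name": f"{session}/contexts/{output_context}",
            # Number of conversational turns the context stays active.
            "lifespanCount": lifespan,
        }]
    return body

session = "projects/demo-agent/agent/sessions/abc123"
resp = fulfill("Which screen size would you like?", session,
               output_context="search-tv")
```

Once the `lifespanCount` turns have elapsed (or the context is reset), intents that require the context stop matching, which is exactly the rigid, non-semantic flow behavior described above.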

While chatbots are still underdeveloped, the expectations of users are much higher. Research shows that users expect that a conversational agent 1) has good technical performance, 2) is smart, 3) is seamless and 4) is personal [30]. According to Luger and Sellen, this results in a gap between user expectations and the practical use of chatbots [19]. Luger and Sellen explain this using Norman’s gulf of execution and evaluation [22]. First of all, there is a gulf since users have higher expectations of the intelligence of conversational agents. Luger and Sellen show that at all times, users ascribe human characteristics such as humor and contextual understanding to conversational agents. For example, when a conversational agent fails in performing a task, users assume that the conversational agent is unable to learn. However, more technical users are less susceptible to this [19]. Furthermore, the findings of Luger and Sellen correspond with the results of Weizenbaum, which were later described as the ELIZA effect: users tend to treat programs that respond to them appropriately as more intelligent than they actually are [29].

Secondly, there is a gulf between the expected capabilities and the actual capabilities of chatbots. According to Berglund, users ask for information that the chatbot does not possess and do not understand what the possibilities are, due to a lack of feedback on the state of the chatbot [3]. This can be illustrated by a user who tries to perform an action in a conversational flow that he has already left. According to Luger and Sellen, users compensate for this kind of behavior by employing an "economy of interactions, such as avoiding complex tasks, limiting the types of language used, and gradual abandonment of the CA for activities other than those they ’trusted’ it to perform" [19, p. 5294]. This corresponds with research from Hill et al., who state that users send shorter messages and use a more limited vocabulary when they are talking to a chatbot than when they are talking to a human [15]. This illustrates that there is an adaptive overhead when users are talking to a chatbot. However, as Hill et al. discuss, this overhead does not outweigh the novelty of the interaction, as users are still willing to send a large number of messages to chatbots.

A solution to reduce the gap between user expectations and the practical use of chatbots is adopting a hybrid approach, for example the integration of traditional search with chatbots [27]. This can lower user expectations since users are able to fall back on traditional search, which implies that the chatbot is rather limited.

2.3 Design principles

2.3.1 Anthropomorphism. The ELIZA effect, which is discussed in Section 2.2, is a powerful tool when creating virtual characters, since it can increase their credibility [23]. The positive effect of anthropomorphism has also been researched in relation to the application of CUIs. For example, the ability to apply small talk and humor can engage users and make the experience more pleasant [19]. Here, it is important for the conversational agent not to repeat its own answers, because repetition can make them seem unnatural [17]. Another way to make a chatbot more engaging is through embodiment. Research by Hegel et al. shows


that users attribute more anthropomorphic behaviors to computers when their appearance becomes more human-like [14]. Furthermore, users experience more fun while playing against a human-like computer. This means that the appearance of chatbots might have a positive influence on their UX. Finally, integrating personal information in the dialog can make a chatbot more engaging, as it opens up a more personalized experience [3].

However, the higher the naturalness of the agent, the higher the expectations of users regarding its intelligence and capabilities. According to Cassell, we use a variety of cues to communicate intelligence, such as providing feedback, interrupting, error handling, turn taking etc. [6] However, according to Luger and Sellen, users instead describe the interaction with conversational agents as clearly bounded [19]. Perhaps a more mechanical solution would be better, since it might lower user expectations. For example, Luger and Sellen discuss that one should be careful with small talk and humor, as they can also act as affordances. According to informal research by INFORMAAT, another solution to lower user expectations is to apply a consciously incompetent tone in a chatbot [12]. However, in task-oriented situations, the research suggests that users prefer a more serious tone.

2.3.2 Transparency. To reduce the gap between the user expectations of the capabilities of a chatbot and its actual capabilities, Luger and Sellen suggest that it is important that the chatbot is as transparent as possible [19]. According to Sörensen, one way to reduce the gap is through an explanation of what conversational patterns the bot can manage at the beginning of a conversation [26]. This onboarding process can help users to understand what the capabilities of a chatbot are. For example, without an onboarding process, users might not be aware of certain tasks or confident enough to perform them. According to Berglund, another solution to convey capabilities is error management [3]. Error management enables a chatbot to indicate when it receives a message that it cannot handle. Finally, Luger and Sellen mention that when a system is struggling, one might rely on the GUI to convey capabilities.
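Error management of this kind is often implemented as an escalating fallback handler: the first unrecognized message restates the chatbot's capabilities, and repeated failures offer an explicit way out. The sketch below is a hypothetical illustration of that pattern; the messages and the 'menu' escape command are made up, not part of any cited system.

```python
def handle_fallback(miss_count):
    """Escalating responses for messages the chatbot cannot interpret."""
    if miss_count == 1:
        # First miss: restate capabilities (transparency).
        return ("Sorry, I did not understand that. "
                "I can help you search for TVs or view your orders.")
    if miss_count == 2:
        # Second miss: steer the user towards predefined answers.
        return "I still did not get that. Try one of the buttons below."
    # Repeated failure: provide an explicit escape from the current flow.
    return "Let's start over. Type 'menu' to return to the main menu."
```

The counter would typically be reset whenever an intent matches successfully, so the escalation only applies to consecutive failures.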

2.3.3 User input. According to Zamora, there are two different input modalities for chatbots: voice and textual input [30]. However, earlier in this research we made the distinction between voice assistants and chatbots. Therefore, this research focuses on textual input only. For this research, a more interesting distinction is the difference between predefined answers and textual input, which is addressed by Berglund. According to him, predefined answers have the advantage that they 1) simplify user interaction, 2) provide a more fluid control of the conversational flow, 3) are easier to implement than NLP and 4) give feedback about the state of the conversational flow [3]. However, Berglund advises combining predefined answers with NLP to allow users to express themselves more freely. Finally, Berglund recommends using other GUI elements such as information card templates, since they are appreciated by users and make the chatbot more versatile in comparison to using plain text only.
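The combination Berglund advises, predefined answers alongside free textual input, maps naturally onto quick replies in chat platforms such as Facebook Messenger, where buttons accompany an ordinary text prompt that the user may also answer by typing. The sketch below shows the general shape of such a message; the field names follow the public Messenger Send API, while the recipient id, prompt and payload scheme are illustrative.

```python
def quick_reply_message(recipient_id, prompt, options):
    """A text prompt plus predefined answers; the user may also just type."""
    return {
        "recipient": {"id": recipient_id},
        "message": {
            "text": prompt,
            "quick_replies": [
                # Each option becomes a tappable button; the payload is an
                # illustrative machine-readable id derived from the title.
                {"content_type": "text", "title": opt,
                 "payload": opt.upper().replace(" ", "_")}
                for opt in options
            ],
        },
    }

msg = quick_reply_message("1234", "What matters most to you?",
                          ["Screen size", "Price", "Brand"])
```

Because quick replies disappear after use and do not block the text field, they give the state feedback Berglund describes without taking away the user's freedom to type.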

Figure 1: Design methods

3 INTERACTION DESIGN

This section explains the different user-centered design methods that were applied for the development of our chatbot. An overview of these methods is shown in Figure 1. At first, we conducted expert interviews to map different aspects that concern the gap between user expectations and the practical use of chatbots, and to explore potential solutions to convey capabilities. This resulted in a set of non-functional requirements for our chatbot. Secondly, we defined the use case of the chatbot with the help of a user scenario, which resulted in a set of functional requirements. Thirdly, the target audience was defined, and two personas were applied to give an example of a potential user and to make the personality of the chatbot explicit. Fourthly, a platform was selected that offered enough functionality to meet the non-functional requirements. Based on both sets of requirements, we implemented several solutions in our design and tested the UX of the chatbot and the solutions. We used two iterations to evaluate and improve the user experience of the first iteration.

3.1 Expert interviews & requirements

Before the start of this research, we conducted expert interviews to map different aspects that concern the gap between user expectations and the practical use of chatbots, and to explore potential solutions to convey capabilities. This section explains the methods that were used to conduct the interviews and analyze the data, and discusses the results that led to a set of non-functional requirements for our chatbot.

3.1.1 Interview guide. The interviews are in-depth interviews, which are semi-structured interviews that allow the interviewer to dig deeper into relevant and unexplored phenomena [18]. Therefore, in-depth interviews are well-suited for exploratory research. The interview guide is shown in Appendix A. The first part of the interview consists of several general questions such as ’Do you recognize that there is a gap between user expectations and the practical use of chatbots?’, ’What do you think that the causes are?’ and ’Do you know any solutions to reduce this gap?’. In the second part of the interview, more specific questions about how to


Table 1: Experts

No. Experience Domain

E01 Information architect Insurance company
E02 Project manager Insurance company
E03 UX Designer Digital agency
E04 CUI Designer Banking company

reduce the gap between user expectations and the practical use of chatbots are asked, to give the expert a direction for their answers. Two examples of such questions are ’How do you think that user expectations can be managed in the beginning of the conversation?’ and ’How do you think that GUI elements can help with the management of user expectations?’. Furthermore, these questions include different topics that are meant to support probing. In the end, summary questions are asked, such as ’What advice would you give to a company that wants to implement a chatbot to manage user expectations?’.

3.1.2 Experts. The interviews were conducted with 4 Dutch experts in the area of UX design and chatbot design. The experts were selected using snowball sampling based on the criterion that they had at least one year of work experience with the actual design and/or design strategy of a chatbot, which we considered sufficient since chatbots are rather new in the field of HCI. For this research, the names of the experts have been left out. Table 1 provides an overview of the experts.

3.1.3 Data collection. During the interviews, the voices of the experts were recorded. After the interviews, the audio was coded directly into English. For the coding, thematic analysis was applied, which is a flexible method that can be used to identify patterns of themes in the interview data [20]. Instead of counting the occurrences of words, we tried to explore a broad range of themes that are relevant for managing user expectations of a chatbot, with the ultimate goal to find general solutions for conveying capabilities. In the last interview, theoretical saturation was achieved.

3.1.4 Findings & requirements. The coding of the interviews is shown in Appendix B. As a result, we were able to derive a set of non-functional requirements, which was used for the design of the first version of the chatbot (see Table 2). This section refers to the non-functional requirements (1-17n) to explain how they were derived from the interviews, and to the experts (E01-04) to indicate what the contribution of each expert is to the results.

According to the interviews, chatbots have the potential to answer questions directly and help users further through a natural and personal conversation. They are available on popular and familiar platforms and have the potential to reduce costs through automation. Additionally, chatbots are available 24 hours per day, in contrast to humans. All experts stated that NLP and other AI technologies are still underdeveloped and inaccessible to B2C companies, which means that the intelligence of the average chatbot is rather low (02n). However, experts confirmed that user expectations are dramatically higher; users expect that chatbots are just as intelligent as humans (E01,02,04), and think that chatbots have knowledge about every domain (E01,03,04). All experts mentioned

that in the worst case, users expect that they are talking to a human, which eventually leads to frustration.

For these reasons, the experts stated that it is important to communicate the intelligence of a chatbot. Although applying a personality in a chatbot can increase trust through a personal experience (E04), it is important to remember that anthropomorphism can also increase the expectations of users (E01,04). For this reason, two experts suggested using a name (03n; E04) and an avatar (04n; E05) that convey that the chatbot is not human. Three experts suggested using a formal tone in task-oriented situations (05n; E01,03,04), and one expert suggested using a consciously incompetent tone otherwise (06n; E04).

In the interviews, several ways of conveying capabilities were explored. One of these ways is applying an introduction. Experts suggested that an introduction can communicate the incompetence of the chatbot (07n; E03,04), convey the possible actions (08n; E01,03,04) and show the added value of the chatbot (09n; E04). However, three experts indicated that the messages that the chatbot sends must be short, as users will immediately skip long descriptions (11n; E01,02,04). Furthermore, the messages must use terms consistently, to make the chatbot easier to understand (12n; E01). Another promising solution that was mentioned is error management. Errors are initially hidden and thus do not require any space. When the user gets stuck, an error can show the possibilities and provide a way out of the current conversational flow (17n).

Another way to convey capabilities is through the use of GUI elements. Three experts argued that in some cases, GUI elements such as images can convey more information than text alone (13n; E02,04,04). Predefined answers and other buttons can guide the user quickly and easily (14n). Besides, predefined answers can force users to read the information provided before, as they need to make a choice based on that information (15n; E04). However, buttons have the disadvantage that they can limit the user. For this reason, one of the experts suggested to use a combination of buttons and textual input (16n; E01).

Finally, all experts indicated that it is important to start with a scope for a chatbot that is as small as possible, to make sure that the chatbot is able to process the most common user intents for that particular domain and deliver the right information (01n).

3.2 Use case & requirements

To give our design a direction, we had to define the domain and the scope of our chatbot. We chose to implement a chatbot for Bol.com, a Dutch web shop, for several reasons. First of all, Bol.com has a professional relationship with INFORMAAT. We believed that this gave us better access to information and that the chatbot might eventually be of use for Bol.com. Secondly, the domain is task-oriented, which means that the chatbot does not have to respond with one answer, but has to support actual conversational flows. However, we did not want to make the scope of the chatbot too large. This is one of the non-functional requirements of our chatbot, since the expert interviews showed that the chatbot would otherwise be too complex (see Table 2). Thirdly, the e-commerce domain is an interesting case, as research shows that the UX of mobile e-commerce sites is bad [1] and chatbots appear to be a mobile-friendly solution [11].


Table 2: Non-functional requirements

No. Requirements (The chatbot...)

01n ...performs tasks in a domain that is not too large
02n ...has limited intelligence
03n ...uses a name to indicate that the chatbot is inhuman
04n ...uses an avatar to indicate that the chatbot is inhuman
05n ...applies a more formal tone in task-oriented situations
06n ...applies a consciously incompetent tone otherwise
07n ...applies an intro to introduce its personality
08n ...applies an intro to convey possible actions
09n ...applies an intro to show its added value
10n ...applies an intro to configure the chatbot
11n ...sends short messages to convey capabilities
12n ...uses consistent wording over elegant wording
13n ...uses visual elements to convey specific information
14n ...uses buttons to convey capabilities and fast forward
15n ...uses predefined answers to focus on a message
16n ...uses textual input to provide freedom to the user
17n ...uses error management for feedback and escapes

To decide which tasks the user could perform, we created a user scenario that maps the different tasks that users perform in the customer journey of Bol.com, and revealed several points where a chatbot can add value to the customer journey (see Appendix C). The customer journey was informally confirmed by two users of Bol.com. The customer journey shows that we assume that the chatbot can add value through: 1) assisting the search and payment of a product, 2) sending information through messages, for example about the status of an order, 3) viewing orders directly without logging in and 4) delivering a personal experience to make situations more pleasant, for example when the user has to return a product and needs information.

To keep the scope small, we had to refine the scope of the chatbot. We decided to make the chatbot offer only one product. We chose to make the chatbot assist users with finding a television, which is a product with many properties such as screen size, frame rate, color etc. Furthermore, we wanted to narrow the scope of the tasks as well. We wanted to choose tasks which are closely related and chronologically ordered in the customer journey, to support engagement in the UX tests and make the user flow as smooth as possible. We designed five main tasks: 1) searching for televisions, 2) saving televisions to view later, 3) viewing orders, 4) ordering a product and 5) viewing product information. Each of these tasks contains several subtasks. These tasks form the functional requirements for our chatbot, which are shown in Table 3.

3.3 Personas

Personas are a design technique that is used in the field of human-computer interaction. They have the potential to make assumptions explicit, which supports making design decisions [24]. For this research, two personas were designed: one persona to give an example of a potential user and one persona to define the personality of the chatbot. Before the design of the user persona, the target audience was defined according to demographics of large e-commerce sites.

Table 3: Functional requirements

No. Requirements (The user...) Task

01f ...is guided with finding a TV Main
02f ...is able to specify requirements for a TV Sub
03f ...is able to choose from multiple TVs Sub
04f ...is able to save interesting TVs Sub
05f ...is able to view product information Main
06f ...is able to view product specifications Sub
07f ...is able to view product reviews Sub
08f ...is able to view delivery information Sub
09f ...is able to view stored TVs Main
10f ...is able to manage the stored TVs Sub
11f ...is able to order a TV Main
12f ...is able to choose a payment method Sub
13f ...is able to get an overview of his/her orders Main
14f ...is able to view when his/her order arrives Sub
15f ...is able to view where his/her order arrives Sub
16f ...is able to view where his/her order currently is Sub
17f ...is able to specify notification preferences Sub

3.3.1 User group. According to a report of Verto Analytics [16], the main consumer group that shops online is aged between 18 and 75 (95 percent). There is not much difference between Millennials (ages 18-34), GenX (ages 35-54), and Boomers (ages 55-74) regarding the e-commerce population. GenX is the largest group, as it accounts for 34 percent of the total online shopping population. Furthermore, there are several factors that influence online shopping behavior. First of all, research shows that there is a gender inequality; women tend to be more sensitive to emotion, trust and convenience, while for men ease of use is more important [25]. Finally, Hashim et al. discuss that more technical users, who are mostly from the younger generation, prefer online shopping [13].

Although there are several factors that influence online shopping behavior, we decided that we wanted to design our chatbot for both men and women, independently of their technical skills. Therefore, the target audience was defined as people between 18 and 75 who shop online. Although the target audience is broad, we wanted to give a single example of a potential user that might want to use our chatbot, to make the goals and needs of such a potential user explicit. This persona is shown in Appendix D.1. We imagined that the typical user of our platform is someone who needs guidance with finding a television, because he/she does not know what properties are important. Besides, as described in the expert interviews (Section 3.1), chatbots can offer a personal experience, for which this persona is sensitive.

3.3.2 Chatbot. Another persona was designed to make the appearance and the tone of the chatbot explicit. The persona of the chatbot is shown in Appendix D.2. As shown in the non-functional requirements (see Table 2), we wanted to apply a more formal tone in task-oriented situations (05n). In other situations, we wanted to apply a consciously incompetent tone to make the dialog more natural, without creating too high expectations. Additionally, we wanted to give the chatbot a name (03n) and avatar (04n) that


indicate that the chatbot is not human. For this reason, we used the name ’Bol.com TV-bot’, which also reveals what the domain is. For the avatar, we used the mascot of Bol.com, to express the brand and lower user expectations.

3.4 Platform selection

Before the design of the chatbot, we had to choose a framework to build the chatbot and a platform to deploy it to. These had to be carefully selected, since each framework allows integrations with different platforms, and each platform offers different functionalities. First of all, based on the non-functional requirements (see Table 2), we wanted to select an accessible framework that supports basic NLP, since the average intelligence of chatbots is rather low (02n). In general, there are two types of frameworks available: 1) NLP platforms such as Microsoft’s Bot Framework, IBM Watson9 and Dialogflow, and 2) more limited drag-and-drop platforms such as ChatterOn10 and Chatfuel11. We chose to develop the chatbot in Dialogflow, which appeared to be less complex than the other NLP platforms, but offered more flexibility than the drag-and-drop frameworks. With Dialogflow, developers can define user intents and follow-up messages easily. After defining example questions that correspond with the intents, Dialogflow uses NLP and machine learning to automatically learn when to trigger which intent. As discussed in Section 2.2, Dialogflow is only able to take the context of messages into account to a certain extent, since it uses fixed conversational flows as an alternative to the understanding of context. Therefore, we can assume that the intelligence of Dialogflow is rather limited. Furthermore, Dialogflow offers an integration for most platforms, such as Facebook Messenger, Slack12, Twitter13, Skype14, Telegram15 and Kik16. Of these platforms, Facebook Messenger was chosen to deploy the chatbot to, since it offered the most functionalities, such as predefined answers, buttons, media templates, carousels and more. Besides, Facebook Messenger is also the most popular platform for chatbots today [4].

3.5 First iteration

3.5.1 Design. The design of the chatbot is based on the functional requirements (see Table 3) and the non-functional requirements (see Table 2). The functional requirements define the boundaries of the chatbot (domain, tasks, etc.). The non-functional requirements define principles that are used for the actual design of the chatbot (appearance, tone, solutions to convey capabilities, etc.). This section explains the different design decisions for the chatbot of the first iteration, referring to the non-functional requirements (1-17n) to explain which design decisions are based on which non-functional requirements. In the text below, the design decisions are categorized as 'solutions'. For example, one of the solutions is 'error management'.

9 https://www.ibm.com/watson/
10 https://www.chatteron.io
11 https://chatfuel.com/
12 https://slack.com/
13 https://www.twitter.com/
14 https://www.skype.com/
15 https://telegram.org/
16 https://www.kik.com/

Figure 2: Iteration 1: Introduction

As discussed in Section 3.4, the chatbot has limited intelligence, since it uses basic NLP to recognize user intents and uses conversational flows as an alternative to taking context into account. We designed our chatbot in such a way that it uses a single conversational flow for each main task (see Table 3). Most of these tasks are directly accessible through textual input. For example, users are able to navigate from viewing their saved televisions to searching for another television. However, in several conversational flows the subtasks (and even two other main tasks) depend on the context of that conversational flow; users can only view product information (main task) if they select a television, and users can only ask questions about the television (subtask) when they are viewing product information.
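The context-dependent structure described above can be sketched as a small state machine, where the active conversational flow gates which subtasks are available, mirroring how Dialogflow contexts gate follow-up intents. The flow and action names below are invented for illustration.

```python
# Sketch of context-dependent subtasks: some actions (e.g. asking a
# question about a television) are only available inside the
# 'product_info' flow. Flow and action names are illustrative.
class ChatState:
    def __init__(self):
        self.flow = "start"      # current conversational flow
        self.selected_tv = None  # context carried inside the flow

    def available_actions(self):
        if self.flow == "product_info":
            # Subtasks unlocked only after a television is selected.
            return ["ask_question", "save_tv", "order_tv", "search_tv"]
        # Main tasks are directly accessible from the start.
        return ["search_tv", "view_saved", "view_orders"]

    def select_tv(self, tv_name):
        self.selected_tv = tv_name
        self.flow = "product_info"

state = ChatState()
print(state.available_actions())  # main tasks only
state.select_tv("LG 43 inch")
print(state.available_actions())  # product subtasks now available
```

Selecting a television moves the state into the 'product_info' flow, after which subtasks such as asking questions become available.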

Introduction. The first iteration starts with an introduction, in which the chatbot explains that it is still unintelligent, to lower user expectations (07n). Furthermore, the introduction indicates that the chatbot can help the user with finding a television (09n), which is shown in Figure 2. Finally, the chatbot asks in the introduction if it should send messages to inform users when their order arrives (10n).

Animation. In the introduction, an animation explains the main tasks (see Table 3) that a user can perform (08n). This animation is shown in Figure 2. When a user starts performing a main task, a more detailed animation is shown to explain the subtasks that a user can perform within the conversational flow of that main task. For example, one of the main tasks is finding a television through several questions, and a subtask is saving a television to view later (see Figure 3).

Naming the possibilities. The expert interviews show that it is important to be transparent about the possible actions that users can perform and how to perform them. Because of the different possible actions of each conversational flow, it might be even more important to clearly indicate in which situation which actions are possible. For this reason, the chatbot explicitly states possible actions at different


Figure 3: Iteration 1: Searching for a television

locations: in the animations (as described above), but also in the text (11n). Examples of situations where the chatbot explicitly states possible actions through text are shown in Figure 4 (left) and in Figure 5.

Hierarchy of information. For the main task 'viewing product information', a hierarchy of information was designed. A television has several features, such as the different specifications, reviews, delivery information, etc. However, the dialog would become unclear if all the possible actions were listed in one message. For this reason, the chatbot initially shows only the first layer of possible actions that the user can perform (see Figure 4, left). When the user performs one of these actions, the chatbot suggests more relevant actions (see Figure 4, right).

Figure 4: Iteration 1: Hierarchy

GUI elements. Besides the animations, GUI elements such as carousels, images (13n), predefined answers and other buttons (14n) are applied to convey information and let the user perform actions efficiently. For example, predefined answers are applied in the introduction to guide the user quickly to the search for a television and to put focus on the instructions (15n). Furthermore, they make the selection procedure of a television technically feasible; the chatbot would not be able to process every user demand via textual input and thus provides a limited number of options instead (see Figure 3). At the end of the flow, a visual shows the current location and the time and place of arrival of an order.

Error management. The chatbot sends an error message if the user asks a question that the chatbot does not understand. The error does several things: 1) it indicates the boundaries of the conversational flow in which the user is at that moment, 2) it states the possible actions that the user is able to perform in the conversational flow and 3) it provides a way out of the conversational flow (17n). Although users can simply leave the conversational flow by navigating to another main task via textual input, it might be the case that users do not know that they can leave the conversational flow, do not know how to access another main task or do not know that a specific main task exists in the first place. In these cases, error management might be a good solution (see Figure 5).

3.5.2 User experience test. To measure the UX of the chatbot in the first iteration, a UX test was designed, which is shown in Appendix E. Since the UX tests were conducted in Dutch, this section provides examples of the tasks and questions that participants received.

The UX test consists of two main parts. The first part contains several tasks that the user needs to perform, both the main tasks and underlying subtasks, which are defined in the functional requirements (see Table 3). This section will refer to the functional requirements (01f-17f) to indicate which task corresponds with which requirement. For example, one of the main tasks is that the participant must search for a television (01f) with the scenario that he/she has a low budget and wants a small television. An underlying subtask is that the participant must save two interesting televisions (04f), which can only be performed inside the conversational flow of that main task. Another main task is that the participant must view their orders (13f) and a corresponding subtask is that the participant must view the status of their last order (14,15,16f). During these tasks, questions are asked about what the user thinks the capabilities of the chatbot are. Although the first part mainly addresses usability, the UX test also takes the emotions of users into account, which is an important aspect of the UX [28].

After the completion of the tasks, several additional questions are asked, which address the total UX of the chatbot. These questions are based on the model presented at the Nielsen Norman Group Conference in Amsterdam in 2008 [2], which subdivides UX into 1) utility, 2) usability, 3) desirability and 4) brand experience. Examples of additional questions are 'To what extent do you think that the chatbot is easy to use?' (usability) and 'Is there anything that you did not like about the chatbot?' (desirability). Both the tasks and the additional questions are semi-structured, which is well-suited for exploratory research [18]. This ensures that the tasks do not have to be performed in a specific order and thus the user flow is disrupted as little as possible.


Table 4: Participants first iteration

No. Gender Age Occupation
P01 Male 20 Student Product Design
P02 Female 50 Management supporter
P03 Female 24 UX designer
P04 Male 29 Interaction designer
P05 Female 45 Senior UX-recruiter

The chatbot is tested in the Facebook Messenger application on an Apple iPhone SE. During each UX test, the participant sits individually beside an observer. The observer uses probing techniques to get in-depth information about the UX of the different implemented solutions (see Section 3.5.1). Furthermore, the observer uses prompting techniques to lead users towards the right location after they successfully perform a task or after they get stuck. Here it is important not to use terms that indicate what the solution is, and to apply both techniques consistently across the different UX tests.

3.5.3 Participants. For the UX test of the first iteration, 5 participants were selected with non-probability sampling; participants were selected based on their availability, gender, age and profession, since we wanted to represent the broad target audience of Bol.com (see Section 3.3.1). According to Nielsen and Landauer [21], 5 participants are enough to discover 85% of the usability problems. The more iterations, the more problems can be detected and the better the design will be. Although we do not test for usability only, it is plausible that this principle does not apply to other aspects of UX. This limitation is discussed in Section 4.1. However, since this research only seeks to explore different solutions to convey capabilities, we make the assumption that 5 participants are enough. For this research, the names of the participants have been left out. An overview of the participants is shown in Table 4.

3.5.4 Data collection. During the UX tests, the voices of the participants and the screen of the mobile device were recorded. Participants were asked to think aloud: explain what they were thinking and feeling, and why. Charters mentions that there are two types of think-aloud methods, the retrospective method (thinking aloud after the test) and the concurrent method (thinking aloud during the test) [7]. According to Charters, each of these methods has its own disadvantages; the retrospective method depends on participants' incomplete memory, while the concurrent method might interrupt the user flow. We chose to apply the concurrent method because we wanted to be independent of the memory of participants and because we do not make use of quantitative measures such as completion time. After each UX test, the audio was transcribed and coded in combination with the videos. For the coding, thematic analysis was applied to identify patterns of themes [20]. We consider this a more flexible method to generalize the expressions of participants, which are ambiguous according to Charters. For the same reason, Charters states that it is important that the observer makes their own inferences.

17 https://itunes.apple.com/us/app/messenger/id454638411
18 https://www.apple.com/nl/iphone-se/

3.5.5 Findings. In this section, the results of the first UX tests are discussed. The coding of the transcripts of the first UX tests is shown in Appendix G.1. Since the UX tests were conducted in Dutch, the quotes in this section are translated from Dutch to English. This section will refer to the participants (P01-05) to indicate what the contribution of each participant is to the results. During the UX tests, four participants did not know how to perform at least one of the main tasks (P02,03,04,05). Below, the different solutions and the corresponding observations are described.

Introduction. Two participants made statements about the indication that the chatbot is unintelligent. While one of the participants liked the indication because "it lowers the expectations" (P03), the other participant reacted irritably, wondering what the utility of the chatbot is if it is unintelligent (P05). Contradictorily, the latter participant started typing a question that was too complex for the chatbot to understand, without following the introduction. Lastly, two participants did not like the question whether the chatbot should send messages to inform them when their order arrives, since they wanted to start searching for a television and did not want to answer questions (P04,05).

Animation. Several problems were observed that were related to the animations. Only three participants read at least one of the animations (P01,02,03), while four participants skipped at least one of the animations (P01,03,04,05). Two participants stated that they would preferably act directly without reading text and animations at all. In some cases, participants read the animations, but forgot the instructions later in the process (P03,04). Perhaps one of the reasons why the participants did not read the animations is that the animations were perceived as too small, too quick and uncontrollable. Three of the participants even thought that the animations were spam (P03,04,05).

Naming the possibilities. Just like the animations, the text was skipped or forgotten by participants (P02). Furthermore, problems were observed that were due to the way possible actions were stated. This was clearly visible in the hierarchy of information, in which actions were described with highly specific examples. Firstly, this gave participants the idea that terms should be phrased exactly as in the examples. For example, one participant wondered "what the specific code" was to perform an action (P01). Secondly, the specific examples gave participants the idea that the stated possibilities were the only possibilities (P01,02,03,05), while other information could be requested as well. As a result, participants did not know how to navigate to another main task, since they were not aware that this was also a possibility. For this reason, participants indicated that it should be more clear what the user can type (P01,04,05). However, available space is a problem, since four participants described in different situations that the received messages cost too much space and made the chatbot unclear (P01,03,04,05).

Hierarchy of information. Another problem that occurred in the hierarchy of information is that information is partially hidden. It was observed that all participants had no idea what information they could request besides the information that they were asked to find in the UX test. One participant formulated this as follows: "Sometimes I would not even know what I want to know, and in


this case a list that tells me what specifications are, would be a better solution" (P04).

GUI elements. For each participant, buttons offered a clear and efficient solution to perform subtasks. Participants quickly knew what the possible actions were and which actions led to the completion of the given assignment. In each UX test, situations occurred in which participants chose to scroll up and use buttons from earlier in the conversation, instead of performing a subtask directly via textual input. This is in line with the finding that participants would rather click than type (P04,05). For example, one of these participants had the idea that in the hierarchy of information he "would be more efficient with click-actions than with answering questions" (P04). Furthermore, all the participants of the first UX tests indicated that they liked the visual at the end that shows delivery information. Moreover, participants indicated that they would rather compare televisions in a graphical overview, instead of requesting information separately (P02,03,04). Lastly, four participants tried to find a solution in Facebook Messenger's GUI menu when they got stuck (P02,03,04,05).

Error management. Another solution that appears to be effective is error management. In all five UX tests, errors led to the successful completion of at least one task. In two of these UX tests, participants had an incorrect idea about which actions could be performed in a specific conversational flow; they thought, for example, that they could only perform a task when they were outside the conversational flow (P02,04). In these cases, error management provided a way out, even though they could have performed the task directly via textual input.

Overall user experience. In the additional questions, two participants explained that the usability of the chatbot was significantly lower compared to the usability of the current Bol.com website (P01,04). None of the participants gave an indication that usage was efficient. Two participants even mentioned that they would have quit the chatbot in a 'real' setting (P01,04). According to one participant, one of the reasons is that "the switch to a chatbot is a learning process that you do not want"; he said that for him "it would be faster to apply filters and sort products" (P01). Furthermore, several problems regarding the utility and desirability of the chatbot were described. First of all, one participant felt like being forced into a direction (P04). Secondly, the issue of trust emerged. Participants questioned the independence of chatbots, and two participants did not trust the chatbot for large expenses. One participant said the following: "Well, would I trust that, especially, for such a big expense? ... I would not do that quickly." (P05) Furthermore, one participant would have preferred it if the chatbot was more natural (P01). However, several positive aspects were mentioned concerning the utility and desirability as well. Three of the five participants acknowledged that the chatbot guided them in a personal way through the process of finding a television (P01,03,05). Three participants saw receiving messages as an advantage (P01,02,03). One participant mentioned the advantage that tasks are directly accessible (P04) and another participant mentioned that the chatbot is more mobile-friendly in comparison to a website (P03). Finally, one participant described that "it would be useful if the chatbot could

Figure 5: Iteration 1: Error management

really answer questions like 'does the television have the same features as the LG?'" (P04).

3.5.6 Evaluation and improvements. The results of the UX tests of the first iteration show that the usability of the chatbot is low; in several situations participants did not know how to navigate to a main task and got stuck. Furthermore, they described the chatbot as inefficient compared to the website of Bol.com. After the UX tests, several usability problems were found. First of all, participants tend to skip content such as text and animations quickly, since they would rather attempt to perform tasks directly than spend time on reading and interpreting content. However, even if users view the instructions in the text and/or in the animations, it is likely that they forget these specific instructions. Secondly, the results indicate that these specific instructions give users the feeling that 1) the stated possibilities are the only possibilities for that specific situation and 2) the question should be formulated just as precisely as in the examples. This indicates that explicitly naming the possibilities gives users a feeling of limitation. Thirdly, although users are able to explore the hierarchy of information, the possibilities are initially hidden from the user. This causes a gulf of evaluation; users are unable to interpret the possibilities, which might lead to frustration. Finally, a small problem was found in the introduction: users did not like the question about receiving order information, since they wanted to start searching for a television instead of answering questions.

According to the results of the first UX tests, two solutions seem to be highly efficient: GUI elements and error management. In several situations, graphics are a desired method to convey information. Furthermore, buttons can make navigation clear and efficient. When the chatbot does not understand the input of the user, error management might be a good solution. Errors can give users feedback about the current conversational flow and the possible actions, and provide a way out of the conversational flow. Furthermore, the solution initially does not require any space, which appears to be


an important aspect in the design of a chatbot; users want as much information as possible without receiving too many messages.

Based on the outcomes of the first iteration, a major change has been suggested. Instead of relying on textual input, we suggest relying on GUI elements to navigate. Buttons can provide a clear and efficient way to navigate from one task to another. Besides, users do not have to remember the different tasks and how to perform them. This might prevent users from getting stuck because they do not know how to navigate from one task to another. Buttons can also make navigation in the hierarchy of information more efficient. However, this does not mean that users should not have the possibility to type; it can still give users the option to specify demands more directly. Although it is still not possible to convey all the possible tasks in the hierarchy, buttons might make navigation more efficient and stimulate exploration. Lastly, it is suggested to move the question about receiving order information to a more relevant location, for example after the user orders a new television.

3.6 Final iteration

3.6.1 Design. Based on the evaluation of the chatbot of the first iteration (see Section 3.5.6), three main changes were made in the final iteration. These changes are described below. An overview of the final design of the chatbot is shown in Appendix F.

Introduction. Only a few changes were applied in the introduction. One of these changes is that the question about receiving order information is now asked after the user orders a television, instead of in the introduction.

GUI elements. More predefined answers and other buttons were applied than in the first iteration. In contrast to the chatbot of the first iteration, the chatbot in the final iteration relies on GUI elements as the main form of navigation.

Hierarchy of information. In the hierarchy of information, users now have the possibility to navigate through the use of predefined answers (see Figure 6, left). These predefined answers might make navigation more efficient and stimulate exploration. Besides, users are also able to leave the conversational flow by clicking on a predefined answer with the label 'help'. This activates the help function.

Help function. Since we wanted to give users the option to navigate through the use of buttons, we had to find a solution to make these options always available. For this reason, we designed a help function (see Figure 6, right), which can simply be activated by typing the command 'help'. This way, the help function initially does not require any space and users only have to remember one simple command. The help function is explained in the introduction.

Error management. The implementation of the help function also affected the error management. Instead of providing the different possible actions that the user can perform inside a conversational flow, we chose to let the errors simply refer to the help function. An example of an error is: 'Sorry, I do not understand what you mean! If you need help, you can type "help" at any moment. How much are you willing to spend?'

Figure 6: Iteration 2: Main changes

Naming the possibilities. In addition to offering buttons, we still wanted to provide users the option to specify demands more specifically and directly. For example, if the user wants to start searching again, the user can simply type something like "Can you find me another television?". Instead of specifically mentioning the possibilities, the final version of the chatbot provides examples of questions. This might give users the feeling that they can also ask other related questions that are not mentioned (see Figure 6, right).

3.6.2 User experience test. In the final iteration, the exact same UX test was used as in the first iteration (see Section 3.5.2).

3.6.3 Participants. The same sampling method was used in the final iteration as in the first iteration (see Section 3.5.3). Furthermore, we intended to use the same distribution and variety of participants. The participants of the final iteration are shown in Table 5.

3.6.4 Data collection. The exact same data collection methods were used in the final iteration as in the first iteration (see Section 3.5.4).

3.6.5 Findings. In this section, the results of the final UX tests are discussed. The coding of the transcripts of the UX tests is shown in Appendix G.2. Since the UX tests were conducted in Dutch, the quotes in this section are translated from Dutch to English. This section will refer to the participants (P06-10) to indicate what the

Table 5: Participants final iteration

No. Gender Age Occupation
P06 Male 21 Student Mec. Engineering
P07 Female 22 Student Pedagogy
P08 Male 40 Design Consultant
P09 Male 59 Principal consultant
P10 Female 35 Senior UX designer


contribution of each participant is to the results. During the UX tests, two participants did not know how to perform a main task, which is half as many as in the first iteration (P07,09). Below, the different solutions and the corresponding observations are described.

Introduction. In agreement with the first UX tests, opinions were divided about the indication that the chatbot is unintelligent. Two participants liked the indication (P07,09), while one of these participants also mentioned that "if you would normally receive something like this you would think 'oh okay, what is the point of this?'" (P07). Since the question whether the chatbot should send messages to inform participants when their order arrives was now asked after the participant orders a television, none of the participants were annoyed anymore. Three of the five participants chose to receive the messages (P06,07,08). None of the participants started typing without following the introduction and none of the participants asked a question that was too complex for the chatbot to understand.

GUI elements. In agreement with the first UX tests, buttons offered a clear and efficient solution to perform subtasks. All participants quickly knew how to perform tasks through the use of buttons, and preferred to scroll up to activate buttons from earlier in the conversation. Three participants mentioned that they liked the visual at the end that shows delivery information (P06,08,10).

Hierarchy of information. In comparison to the first iteration, predefined answers were also applied in the hierarchy of information, with the result that all participants found the right information and no longer mentioned the problem that information was hidden. Despite the more efficient navigation, two participants still suggested to apply a graphical overview to compare televisions (P08,10).

Naming the possibilities. As discussed in the final iteration, participants still have the possibility to type, to specify demands more specifically and directly, for which example questions are given. During the UX tests, participants asked several questions. Most of these questions were asked in the style of the examples (P06,07,08,10). Only one participant asked the exact question that was suggested (P10) and three participants used commands to navigate to a main task (P06,07,10). However, only two of the five participants tried to ask for additional information in the hierarchy of information that was not accessible via buttons (P06,07). This can be explained by the fact that participants would rather click than type; even after the implementation of buttons as the main form of navigation, two participants indicated that they would prefer it if the example questions were clickable (P06,09). One user mentioned that the terms that she used when typing were influenced by terms that were used on the buttons (P10). Lastly, only two participants asked a question that the chatbot did not understand. Although the combination of textual input and buttons seems efficient, two participants did not know how to perform a main task (P07,09). Namely, these two participants did not know that they could type to navigate. This means that the participants did not read the instructions or forgot the instructions, which is in agreement with the outcomes of the first UX tests.

Help function. Another main problem that was observed was related to the help function, since two of the participants initially did not know what the help function was for (P06,09). Furthermore, two participants thought that 'help' is not a good name to refer to a menu (P08,09). The same participants suggested to make the help function more accessible. However, when participants discovered the help function, either through an error (P07,09,10) or through clicking on a predefined answer with the label 'help' (P06), participants used the help function efficiently; three participants typed 'help' to navigate quickly between main tasks (P06,08,09).

Error management. Only one participant found the right information after an error (P09). However, this was mainly because participants did not get any errors during the UX tests. At the end of the UX test, the observer gave four of the five participants the order to perform an impossible task, after which an error was shown. All these participants activated the help function via the error to find a solution (P06,07,09,10).

Overall user experience. In contrast to the first iteration, four of the five participants indicated in the additional questions that they thought that the use of the chatbot was simple and efficient (P06,07,09,10). Furthermore, participants no longer mentioned the problem that learning costs too much time. However, in agreement with the first iteration, several problems regarding the utility and desirability of the chatbot were described. Just like in the first iteration, one participant felt like being forced into a direction (P06). One participant questioned the independence of chatbots and wanted to know what happened with his personal information (P09). Furthermore, two participants stated that they would prefer it if the chatbot was more natural (P07,10). In line with this observation, two participants wanted the chatbot to express the Bol.com brand better (P08,09). Finally, one participant questioned the use case of the chatbot, since "if you would transform a complete website into a chatbot, it would perhaps exceed the limits of necessity" (P10). She suggested to focus instead on providing the information that websites do not give, such as the differences between televisions. However, several positive aspects were mentioned as well. The most mentioned advantage (four of the five participants) was that the chatbot guided participants in a personal way through the process of finding a television (P06,07,08,09). Additionally, two participants stated that tasks were directly accessible through the chatbot (P06,10), and one participant mentioned that the chatbot is more mobile-friendly in comparison to a website (P09).

3.6.6 Evaluation. The results of the UX tests of the last iteration show that the usability of the chatbot is much higher than in the first iteration; in only two situations participants did not know how to navigate to a main task and got stuck. Furthermore, participants described the chatbot as simple and efficient. In the last version of the chatbot, three main changes were made (see Section 3.6.1) that have a positive effect on the usability of the chatbot. First of all, the question whether the chatbot should send messages to inform users when their order arrives is now asked after the user orders a television, which appears to be a more relevant location. This implies that it is important to ask for information only when users see the relevance, since users want to fulfill their demands as quickly and


with as little effort as possible. Secondly, in agreement with the results of the first iteration, GUI elements and error management tend to be good solutions on which users can rely. In several situations, graphics are a desired method to convey information. Furthermore, buttons can make navigation clear and efficient. Although in the last iteration the need for error management was reduced, the results of the final UX tests show that the errors are a good method to lead users to the help function, which can provide a way out for users. Furthermore, the help function is a good solution to give users the option to navigate to a main task at any moment; when users discover the help function, they use it multiple times to navigate quickly between tasks.

Although the usability of the chatbot is perceived as good, there are still minor usability problems that can explain why two users did not know how to perform a main task. First of all, these users did not know that they could type to navigate. This means that users do not read instructions or forget them, which is in agreement with the outcomes of the first UX tests. Another explanation might be that GUI elements tend to give users the expectation that everything must be solved with GUI elements. Secondly, several users discovered the help function late in the conversation. Although this can also be explained by users not reading or forgetting instructions, perhaps a better explanation is that the name 'help' is unclear to many users. This can easily be improved.

Besides the minor usability problems, the utility and desirability of the chatbot are still mediocre. Although the results show that a chatbot can offer a personal and mobile-friendly experience, the general approach of the chatbot should perhaps be reconsidered. As suggested, a potential direction for chatbots is towards a more fun, natural and personal conversation, with more distinctive information in comparison to the Bol.com website: the kind of information that a shop assistant provides.

4 DISCUSSION

In this research, we explored several solutions to convey capabilities inside an e-commerce chatbot for a better UX. We developed a chatbot with different user-centered methods. At first, we conducted expert interviews to explore potential solutions to convey capabilities. Subsequently, we implemented these solutions in our design, and tested the user experience of the chatbot. We used two iterations to evaluate and improve the user experience of the first iteration. An overview of the final design of the chatbot is shown in Appendix F. In this section, we will discuss the findings of the UX tests.

First of all, the findings do not show what the individual influence of the introduction is on the expectations of the user. However, in the final version of the chatbot, users did not ask any questions that were far too complicated for the chatbot to understand. Since three users noticed that the chatbot indicated that it is unintelligent, it is plausible that the introduction helped in lowering user expectations. A more important finding is that one should only ask for information when the user is able to see its relevance. In the first iteration, the question to receive order information was asked in the introduction. Because this was perceived as annoying, in the second iteration the question was only asked after the user ordered a television. As a result, users were no longer annoyed.

The UX tests also show that when a chatbot has limited intelligence, GUI elements can improve its user experience. In several situations, graphics convey information better than text alone. Additionally, we suggest relying on buttons as the main form of navigation instead of textual input. Buttons such as predefined answers are self-explanatory and perceived as more efficient than navigating with textual input. In contrast to textual input, buttons do not require additional instructions on 1) what a user can type, 2) when a user can type it and 3) whether it is the only thing the user can type. This prevents users from feeling overwhelmed by the amount of information and skipping the instructions. By using buttons, the chatbot is less dependent on the memory of users and perceived as less obscure.
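The button-based navigation described above can be sketched as follows. This is a minimal, illustrative example and not the implementation used in the thesis; all names (`HANDLERS`, `on_button_click`, the payload strings, and the reply texts) are assumptions. The key idea is that each button carries a stable machine-readable payload, so selecting it requires no free-text parsing or instructions.

```python
# Hypothetical sketch of button-driven navigation in a chatbot.
# Each quick-reply button sends back a payload; labels can change freely.

def find_tv(_payload):
    return "Let's find you a television. What screen size do you want?"

def track_order(_payload):
    return "Please enter your order number."

# Payload -> handler table: the whole "navigation" of the bot.
HANDLERS = {
    "FIND_TV": find_tv,
    "TRACK_ORDER": track_order,
}

def make_message(text, buttons):
    """A message with predefined answers: (label shown, payload sent back)."""
    return {"text": text, "buttons": buttons}

def on_button_click(payload):
    """Dispatch a button click; unknown payloads get a graceful fallback."""
    handler = HANDLERS.get(payload)
    if handler is None:
        return "Sorry, I did not understand that choice."
    return handler(payload)

menu = make_message(
    "What would you like to do?",
    [("Find a television", "FIND_TV"), ("Track my order", "TRACK_ORDER")],
)
```

Because the dispatch table is explicit, the bot never has to guess what the user meant, which is exactly why buttons need no extra typing instructions.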

Furthermore, we suggest to make use of a menu to give users the opportunity to navigate to other tasks at any time. This menu can be called with a simple command, which should be logical and easy to remember.

The results also show that error management is an effective solution to give feedback on the conversational flow and its possibilities, and to provide a way out to the user. For example, a good method is to let errors refer to the menu to prevent users from getting stuck. Error management initially does not cost any screen space, which is an important aspect when designing a chatbot for a mobile phone.
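The error-management pattern above can be sketched in a few lines. This is an illustrative sketch, not the thesis implementation; the command set and wording are assumptions. The point is that an unrecognized input never ends in silence: the fallback message states that the bot did not understand and points to the help menu as the way out.

```python
# Hypothetical sketch: fallback handling that refers users to the help menu.

KNOWN_COMMANDS = {"help", "find tv", "track order"}

def handle_input(text):
    normalized = text.strip().lower()
    if normalized in KNOWN_COMMANDS:
        return f"OK, starting '{normalized}'."
    # Error message gives feedback on the bot's state and offers a way out,
    # costing no screen space until it is actually needed.
    return ("Sorry, I don't understand that yet. "
            "Type 'help' to see everything I can do.")
```

A design note: because the fallback always names the `help` command, every dead end doubles as a signpost back to the main tasks.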

The fact that some users still prefer to type during the use of the chatbot indicates that the chatbot should still give users the option to specify demands directly via textual input. Besides, we suggest giving examples of questions instead of explicitly listing the possibilities, to show what kind of questions the chatbot is able to understand and to indicate that users can try other related questions as well. If users do not read these questions, that is not a problem, because they can still perform their tasks through the use of buttons.
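One way to realize this combination of example questions and free textual input is loose keyword matching, sketched below. This is an assumption-laden illustration (the example questions, intents and keyword sets are invented, and a real system would use proper NLU); it only shows how examples invite related phrasings rather than restricting users to the listed questions.

```python
# Hypothetical sketch: show example questions, but match loosely on keywords
# so that related questions the user invents also work.

EXAMPLE_QUESTIONS = [
    "What is the difference between LED and OLED?",
    "Which television is best for gaming?",
]

INTENT_KEYWORDS = {
    "compare_tech": {"difference", "oled", "led", "qled"},
    "recommend": {"best", "recommend", "gaming"},
}

def greet():
    """Show examples, implying other related questions are welcome too."""
    examples = "\n".join(f"- {q}" for q in EXAMPLE_QUESTIONS)
    return f"You can ask me things like:\n{examples}"

def match_intent(question):
    """Return the intent with the most keyword overlap, or None."""
    words = set(question.lower().replace("?", "").split())
    best, best_overlap = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = intent, overlap
    return best
```

A question that rephrases an example ("Which TV should I buy for gaming?") still matches, while an unmatched question returns `None` and can be routed to the error-management fallback.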

Although in the last iteration the usability of the chatbot was much higher than in the first iteration, opinions about the utility and desirability of our chatbot were divided. One of the problems was that participants questioned the use case. However, the expert interviews and the UX tests show several potential directions for the chatbot to head in, for example keeping contact through personal messages or answering questions that are not answered on a website. Another problem was that the chatbot was perceived as mechanical rather than natural, which is the result of a formal tone and the use of GUI elements. For this reason, participants of the UX tests suggested making the chatbot more natural. However, one should be careful with anthropomorphism, since it can increase user expectations of both the intelligence and the capabilities of the chatbot [19]. For a chatbot with low intelligence, we suggest applying the more mechanical approach.

Another way to reduce the gap between user expectations and the practical use of chatbots is to increase the intelligence of chatbots so that they are able to meet the high expectations of users. After all, by improving the underlying technology, we will be able to exploit the full potential of chatbots. Only recently, Google revealed Google Duplex¹⁹, a technology that according to Google

¹⁹ https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
