Academic year: 2021

The influence of the Nutri-Score in combination with interactive

decision aids on the healthiness of food choices

- A quantitative case study

by Whitney Raskeyn 11-01-2021

University of Groningen Faculty of Economics and Business

Nettelbosje 2 9747 AE Groningen

Master thesis

MSc Marketing Intelligence & Management

First Supervisor: prof. dr. ir. K. van Ittersum Second Supervisor: prof. dr. T.H.A. Bijmolt

ABSTRACT

In a (Western) world where overweight and obesity are everyday issues, it is necessary to find solutions that can help reduce the high overweight and obesity rates. Overweight and obesity are inextricably linked to an unhealthy diet. (Online) grocery stores play a key role when it comes to the (unhealthy) nutrition of consumers, because 65-75% of all food purchases in the Netherlands are made in supermarkets. Information overload and choice overload are two important problems that hinder consumers in making healthy choices in supermarkets. Online grocery stores in particular are able to mitigate the negative consequences of these problems by providing consumers with Nutri-Score interactive decision aids (IDA’s). Therefore, online grocery stores could have an important role in improving the healthiness of consumers’ purchases.

In our study we found that the Nutri-Score IDA’s increase the healthiness of consumers’ purchases. Although the effect is small, we recommend that the Ministry of Health, Welfare and Sports, in collaboration with (online) grocery stores, invest in raising awareness of the Nutri-Score and the Nutri-Score IDA’s among consumers. Nevertheless, more research is needed to investigate the generalizability of our results. Furthermore, research is needed regarding the visibility of the Nutri-Score IDA’s and their information button in order to reach as many consumers as possible and to improve consumers’ understanding of the Nutri-Score. However, grocery stores might have to exercise some restraint until further research has been done on the side effects of the Nutri-Score IDA’s, since we found that using these IDA’s has a negative effect on the number of items that consumers purchase.

PREFACE

In 2015, I started my academic journey at the University of Groningen, as a bachelor student in Business Administration. After 3.5 years, I finished my bachelor and a half year later I started as a master Marketing Intelligence & Management student. I have been interested in marketing since the first course I followed in marketing during my bachelor, but my interest took off when I came into contact with the marketing study association (MARUG) from the university. It was during one of the MARUG events that I found out about the role of big data and the Marketing Intelligence track the university offered. Because I was very curious about this branch of marketing, and I already was interested in the management track, I decided to follow both tracks. Almost 1.5 years later, I am very pleased with both tracks as I followed the courses with interest and pleasure. Further, I’m thankful for all the knowledge I have gained during my master and for all the people I have met during my study period in Groningen.

I want to thank my first supervisor prof. dr. ir. Koert van Ittersum, for the pleasant cooperation. Despite the fact that we had some setbacks regarding receiving the data, I had the feeling that Koert was working on it and that it was a priority to him, which reassured me. Next, I would like to thank David Olk, for his helpful and constructive feedback regarding the data part of my thesis. Moreover, I would like to thank my family, my mum, dad and sister, for always supporting me and believing in me. Further, I would like to thank my boyfriend Kevin, for all pleasant thesis conversations, the useful feedback I received and lastly for his unconditional support.

Writing this thesis, I learned a lot about the influence marketing can have on consumers’ health. This is a subject that really got my interest and that I really enjoyed working on. Therefore, I hope that you will enjoy reading my thesis.

Whitney Raskeyn,

Table of contents

1. Introduction
2. Theoretical background
2.1 The importance of a healthy diet
2.1.1 The social impact of overweight and obesity
2.1.2 The role of supermarkets on the road to healthier nutrition
2.2 Theoretical framework
2.2.1 Nutri-Score
2.2.2 Interactive decision aids
2.2.3 Healthiness of food choices
2.2.4 Visibility of the Nutri-Score IDA’s
2.3 The hypotheses
2.3.1 Conceptual model
3. Data set description and preparation
3.1 Data collection
3.1.1 Description of the A/B test
3.2 Data cleaning and preparation
3.2.1 Merging data sets
3.2.2 Removing variables and sessions
3.2.3 Renaming variables
3.2.4 Setting boundaries
3.2.5 Creating new variables
3.2.6 Final data set
3.3 Descriptive statistics
3.3.1 Distribution of sessions
3.3.2 Average Nutri-Score
3.3.3 Usage of IDA’s for the test groups
4. Research design
4.1 Plan of analysis
4.1.1 Multiple linear regression model
4.1.2 Testing the assumptions of OLS
4.1.3 Final model choice MLR model
4.1.4 Count model
4.1.5 Challenges count model with Poisson regression
4.1.6 Final model choice count model
5.1 Results multiple linear regression model
5.1.1 Coefficients MLR model
5.1.2 Interpretation coefficients MLR model
5.1.3 Model performance MLR model
5.2 Results count model
5.2.1 Coefficients count model
5.2.2 Interpretation coefficients count model
5.2.3 Model performance count model
6. Discussion
6.1 Results explanation
6.1.1 Filtering and sorting on the Nutri-Score
6.1.2 Interaction effects visibility
6.1.3 Consumers’ actions and functions in the online grocery store environment
7. Conclusion & Managerial implications
7.1 Conclusion
7.2 Managerial implications
7.3 Limitations and future research
Appendix 1
Appendix 2
Appendix 3

1. Introduction

Overweight and obesity have been a globally increasing problem since 1980 (Lehnert, Sonntag, Konnopka, Rieder-Heller & König, 2013). This is a consequence of a diet with relatively high levels of trans fatty acids, saturated fatty acids, salt and sugar that people consume on a daily basis (WHO, 2020). Currently this leads to major public health challenges in the Western world (Malik, Willett & Hu, 2013), including in the Netherlands. In 2019, the ‘Centraal Bureau voor de Statistiek’ (CBS) and the ‘Rijksinstituut voor Volksgezondheid en Milieu’ (RIVM) published a report about increasing overweight rates among Dutch people (Volksgezondheid en Zorg, 2020). The report states that 50.1% of adults are overweight. In total, 14.7% of Dutch adults have obesity. Compared with 30 years ago, the percentage of people with obesity in the Netherlands has doubled.

The increasing overweight and obesity rates lead to several diseases and serious health issues, such as type 2 diabetes mellitus and heart and vascular diseases (Martin-Rodriguez, Guillen-Grima, Martí & Brugos-Larumbe, 2015). These health issues in turn lead to increased healthcare utilization costs and several other societal costs. These societal costs mainly stem from the reduced productivity of people with overweight (Lehnert et al., 2013). According to different European studies, healthcare costs related to overweight and obesity already accounted for between 1.9% and 4.7% of total annual healthcare expenses in Europe a decade ago (Müller-Riemenschneider, Reinhold, Berghofer & Willich, 2008; von Lengerke & Krauth, 2011). Since overweight and obesity rates have increased in the past decade, it is assumed that this share of total annual healthcare expenses has only increased over the past years (Lehnert et al., 2013).

The grocery industry plays a key role when it comes to nutrition, since 65-75% of all food purchases in the Netherlands are made in supermarkets (Rabobank, 2019). Consumers have to deal with both information overload (Boer et al., 2017) and choice overload (Schwartz, 2004) in order to detect healthy food options in supermarkets. Information overload concerns the overwhelming amount of nutritional information from different parties that consumers have to deal with on a daily basis (Boer et al., 2017). Examples include nutritional information from food blogs, diet gurus and social media (Evers & Carol, 2007). Choice overload concerns a situation where consumers face too many choice options in a certain environment (Schwartz, 2004).

consumers and eventually to indifference when it comes to picking healthy nutrition (Boer et al., 2017). Supermarkets can respond to information overload by providing clear nutritional insights at a glance, for example with the help of the Nutri-Score. The Nutri-Score is a food choice label (see Figure 1) on the front of a package that shows the healthiness of a product (Julia & Hercberg, 2017). The logo resembles a traffic light scale, where dark green (score A) marks the healthiest option and red (score E) the least healthy option.

Consumers have to deal with choice overload in supermarkets, because they have limited knowledge or resources, like time (Hinson, Jameson & Whitney, 2003). Due to e-commerce, the range of choice options has only increased over the past years (Breugelmans, Köhler, Dellaert & de Ruyter, 2012). The problem with too many choice options is that it leads to the so-called ‘Paradox of choice’ (Schwartz, 2004). This refers to a situation where having more choice options is no longer satisfying and where more options lead to several negative consequences for consumers’ psychological state and emotional well-being. Online supermarkets in particular, are able to mitigate the negative consequences of choice overload, with the help of interactive decision aids (IDA’s), like filter and sort options (Dawes & Nenycz-Thiel, 2014). IDA’s can guide consumers actively in their decision-making process in the online environment (Aksoy, Bloom, Lurie & Cooil, 2006), which simplifies the choice process.

In this research we will focus on analyzing the healthiness of consumers’ purchases in an online grocery store environment by researching the influence of IDA’s that are related to the Nutri-Score. Although research has already been done on the effects of the Nutri-Score on consumers’ purchases in physical stores (Julia & Hercberg, 2017; Dréano-Trécant et al., 2020), little is known about the effects of the Nutri-Score in an online shopping environment. Since Nielsen (2017) predicts that 70% of all consumers will do their grocery shopping online in 2024, it is necessary to expand research regarding the effectiveness of the Nutri-Score to an online setting. Therefore, we will research whether the Nutri-Score, combined with IDA’s, can contribute to an improvement of the healthiness of consumers’ purchases in online grocery stores. This leads to the following research question:

RQ 1: What is the effect of the Nutri-Score filter and sort IDA’s on consumers’ food choices?

options have a higher chance of being chosen by consumers. Therefore, we will research the effect of making the Nutri-Score IDA’s visually more present. Not only do the Nutri-Score IDA’s allow consumers to find (healthy) products more easily, they also provide information about the interpretation of the Nutri-Score. Consumers can click on the information buttons of the IDA’s, which navigate them to a specially created webpage with information about the Nutri-Score. Since the Nutri-Score is a relatively new concept in the Netherlands, it is possible that some consumers do not understand how they should interpret the score. Although research in France pointed out that in total almost 90% of consumers understand how they should interpret the score, this still means that more than 10% of consumers have difficulties interpreting the Nutri-Score (Egnell et al., 2018). Navigating to this information webpage is essential for this group to be able to understand how the scores should be interpreted. In order to make sure that consumers pay attention to these IDA’s and their info buttons, it is necessary to change consumers’ regular shopping environment to obtain a change in consumers’ shopping behavior (Combris, Bazoche, Giraud-Héraud & Issanchou, 2009). This leads to the following research question:

RQ 2: What is the effect of increased visibility of the Nutri-Score filter and sort IDA’s on the relationship between using the Nutri-Score IDA’s and consumers’ food choices?

The aim of this study is to explore Nutri-Score IDA’s, to find a possible solution that can contribute to improving the healthiness of consumers’ purchases and reducing the high overweight and obesity rates in the Netherlands. Therefore, this study focuses on investigating consumers’ behavior regarding these IDA’s. Besides, the effect of visibility of these IDA’s and their information button will be researched, to obtain more knowledge regarding the possibilities that online grocery stores have to influence the healthiness of consumers’ purchases. The scientific relevance of this study is to contribute to the current literature by exploring online options that are able to guide consumers into a healthier lifestyle. For this study, data from a pilot study of a Dutch grocery retailer in collaboration with the Dutch Ministry of Health, Welfare and Sports is used.

2. Theoretical background

This section starts with an explanation of the importance of a healthy diet (paragraph 2.1). Thereafter, the theoretical framework is discussed (paragraph 2.2). Furthermore, the conceptual model is presented, followed by the hypotheses that arise from the conceptual model (paragraph 2.3).

2.1 The importance of a healthy diet

2.1.1 The social impact of overweight and obesity

In the Western world, the Body Mass Index (BMI) is one of the most commonly used measurements to determine the healthiness of people’s weight (Lehnert et al., 2013; Jebb, Johnstone, Warren, Goldberg & Bluck, 2009). The BMI score can be calculated with the following formula: BMI = kg/m², where kg is a person’s weight in kilograms and m² is that person’s height in meters squared (WHO, 2020). The World Health Organization (WHO) emphasized that ‘healthy’ people should have a BMI score of at least 19 kg/m² and below 25 kg/m². The WHO defined overweight as a BMI score of at least 25 kg/m² and below 30 kg/m², and obesity as a BMI score of 30 kg/m² or higher (WHO, 2020). When too many people in the Dutch population have a BMI score that can be classified as overweight or, especially, obesity, this entails many societal problems and costs.
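To make the formula and the category boundaries concrete, the following is a minimal Python sketch. It uses the cut-offs quoted in the text (19, 25 and 30 kg/m²). The function names, the label for scores below 19, and the choice to treat a score of exactly 30 as obese are assumptions made for illustration only.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def classify_bmi(score: float) -> str:
    """Map a BMI score to the categories quoted in the text.

    Assumption: a score of exactly 30 counts as obese, and scores
    below 19 get the illustrative label "below healthy range".
    """
    if score < 19:
        return "below healthy range"
    if score < 25:
        return "healthy"
    if score < 30:
        return "overweight"
    return "obese"


example = bmi(80.0, 1.80)  # 80 / 3.24
print(round(example, 2), classify_bmi(example))
```

For a person of 80 kg and 1.80 m the score falls just below the overweight boundary of 25 kg/m².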

2.1.2 The role of supermarkets on the road to healthier nutrition

Since the vast majority of food is purchased in supermarkets in the Netherlands (Rabobank, 2019), it is safe to say that supermarkets play an important role when it comes to the daily nutrition of the Dutch population. Due to nutritional information overload (Boer et al., 2017) and choice overload in supermarkets (Schwartz, 2004), it can be very challenging for consumers to make (healthy) choices in (online) grocery store environments. Therefore, food choices are often made in a passive state, based on simple cues such as taste or convenience (Blaylock, Smallwood, Kassel, Variyam & Aldrich, 1999).

In order to tackle the high overweight and obesity rates, consumers need support in their decision-making process to overcome the problems of both information overload and choice overload. (Online) grocery stores can address the information overload problem by offering products that carry a Nutri-Score, because this label shows at a glance which food items are healthy (Julia & Hercberg, 2017). Moreover, online grocery stores can help consumers overcome the problem of choice overload by providing interactive decision aids, like filter and sort decision aids. In this research the Nutri-Score will be combined with filter and sort IDA’s in order to assess the effectiveness of these nutritional aids.

2.2 Theoretical framework

2.2.1 Nutri-Score

When it comes to nutrition labels, front-of-pack and back-of-pack labels can be distinguished. Back-of-pack nutrition labeling is obligatory in most countries, while front-of-pack labeling is optional (Grunert, Fernandez-Celemin, Wills, Storksdieck Genannt Bonsmann & Nureeva, 2010). The problem with back-of-pack nutrition labeling is that, according to research, only a small group of consumers use these labels when selecting their food items, because these labels are quite overwhelming and extensive to read (Grunert, Wills & Fernández-Celemín, 2010). Furthermore, consumers have to deal with an overwhelming amount of nutritional information from different channels, like food blogs, diet gurus and social media. Each of these channels has its own interpretation and point of view regarding healthy nutrition (Evers & Carol, 2007). This leads to a lot of confusion for consumers and eventually even to indifference when it comes to picking (healthy) nutrition (Boer et al., 2017).

of packages of food items (Julia & Hercberg, 2017). Front-of-pack nutrition labels, like the Nutri-Score have received a lot of attention in the past decade from authorities and learned societies, in order to find a helpful solution for the overweight and obesity problem (OECD Publishing, 2008).

The Nutri-Score is a nutrition label founded and developed in France (Julia & Hercberg, 2017) (see Figure 1). The logo resembles a traffic light, where the scale starts on the left side with the healthiest option (dark green color and score A) and continues to the right side, with the least healthy option (red color and score E). The main idea is that each product in a supermarket has its own label which indicates the nutritional quality of the product, or in other words, its healthiness. The score associated with the product is circled and magnified on the traffic light label, as shown in Figure 1 (Julia & Hercberg, 2017). Research pointed out that, compared to other front-of-package nutrition labels, the Nutri-Score makes it easiest for consumers to correctly assess the healthiness of a product (Egnell et al., 2018).

Nowadays, several (European) countries have already implemented the Nutri-Score labels in their supermarkets, for example Belgium, Spain, Germany, Switzerland and Luxembourg. Currently, the Nutri-Score label can already be found on some items in Dutch supermarkets. However, the scores are not yet in line with the official dietary guidelines of the Netherlands Nutrition Centre (Netherlands Nutrition Centre, n.d.). The Dutch government aims to bring the Nutri-Scores of products in line with these guidelines in 2021.

Figure 1: Nutri-Score logo (Distrifood, 2019)

2.2.2 Interactive decision aids

solution for this choice problem. Such a system allows consumers to deal with high complexity environments. Therefore, online retailers introduced online decision aids: these were designed to assist consumers in their online search process (BNET-Editorial, 2007; eMarketer, 2009). A distinction can be made between two types of online decision aids: passive decision aids and interactive decision aids. Passive decision aids do not require any form of active participation from consumers, while interactive decision aids actively help consumers to indicate their preferences or needs in order to obtain personal suggestions (Häubl & Trifts, 2000; Murray & Häubl, 2008).

Several studies have shown that interactive decision aids can influence consumers and their decision-making process (Bechwati & Xia, 2003; Häubl & Trifts, 2000; Lee & Geistfeld, 1998). Häubl and Trifts (2000) suggest that interactive decision aids are designed to assist consumers in the initial screening of alternatives and at the same time facilitate comparisons among the alternatives that consumers are considering. This means that IDA’s have an effect on purchase decisions: the process will be more efficient and the purchase decision will be of a higher quality (Aksoy et al., 2006). Shi and Zhang (2014) distinguish four types of interactive decision aids that are likely to be found in online grocery stores nowadays: (1) decision aids for nutritional needs, (2) decision aids for brand preferences, (3) decision aids for economic needs and (4) personalized shopping lists. This research focuses on IDA’s for nutritional needs, and in particular on Nutri-Score interactive decision aids.

Currently, the Dutch grocery retailer that is studied has already implemented different interactive decision aids on its website, like filter and sort IDA’s. These specific types of IDA’s will be discussed in more detail below.

Filter aid. An important interactive decision aid that can be used to influence the consumer decision-making process is the filter aid. This aid allows consumers to filter on certain criteria (Dawes & Nenycz-Thiel, 2014). According to Groissberger and Riedl (2017, p.134), filtering enables consumers to set “thresholds for attribute values to reduce the number of products in the consideration set”. This means that the overwhelming amount of product choices in online grocery stores can be reduced based on the consumers’ preferences with the help of filter IDA’s. In the pilot study, the online grocery retailer added Nutri-Score A, B, C, D or E as filter options for some product categories. In the literature, not much has been written about the filter IDA. This is because in many online stores, filtering functions similarly to sorting. For example, in some stores sorting on price automatically involves filtering on price. Therefore, researchers mostly refer to sorting without making a clear distinction between filter and sort aids in their papers. For example, Cai & Xu (2008) treat filtering as a function similar to sorting.

Sort aid. Sorting can be defined as an aid that allows consumers to view or to sort on different alternatives in an e-commerce environment (Dawes & Nenycz-Thiel, 2014). Sorting based on product attributes is also referred to as quality sorting (Cai & Xu, 2008). There are two important mechanisms that should be considered regarding quality sorting: the principle of concreteness (Slovic, 1972) and the principle of loss aversion (Cai & Xu, 2008).

The principle of concreteness refers to the processability of the provided information. Consumers only use information that is explicitly stated to them in a stimulus environment, meaning that information is processed the way it is presented to them (Bettman, Payne & Staelin, 1986). In other words, quality attributes that provide information that is easy to process are most likely to receive the most attention (Creyer & Ross, 1997).

are more likely to be seen as references because these products appear earlier in the list (Houston & Sherman, 1995). Thus, when consumers view a product list sorted on Nutri-Score from A to E, they are most likely to see a product with a Nutri-Score of A as a reference, since these products appear earlier in the list.
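The difference between the two aids can be illustrated with a small Python sketch: the filter aid shrinks the consideration set to the chosen Nutri-Score letters, while the sort aid keeps every product but places score A at the top of the list. The `Product` class, the function names and the example catalog are hypothetical and do not reflect the retailer's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    nutri_score: str  # one of "A", "B", "C", "D", "E"


def filter_by_nutri_score(products, allowed):
    """Filter aid: keep only products whose score is in the allowed set."""
    allowed = set(allowed)
    return [p for p in products if p.nutri_score in allowed]


def sort_by_nutri_score(products):
    """Sort aid: order from A (healthiest) to E (least healthy).

    The letters A-E already sort alphabetically in the desired order,
    and sorted() is stable, so ties keep their original order.
    """
    return sorted(products, key=lambda p: p.nutri_score)


# Hypothetical catalog for illustration only.
catalog = [
    Product("cola", "E"),
    Product("oatmeal", "A"),
    Product("granola bar", "C"),
    Product("yoghurt", "B"),
]

print([p.name for p in filter_by_nutri_score(catalog, {"A", "B"})])
print([p.name for p in sort_by_nutri_score(catalog)])
```

In the sorted list the A-scored product appears first, which is the position the text associates with being used as a reference.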

2.2.3 Healthiness of food choices

The concept ‘healthiness of food choices’ has many different definitions, all tailored to a certain context. In order to determine how healthy a food choice is, the Nutri-Score serves as a measurement for the healthiness of food choices in this study. We investigate the healthiness of consumers’ food choices by calculating the average Nutri-Score of the items they purchased in a shopping session. It is important to keep in mind that products with a Nutri-Score of A, corresponding with the color dark green, can be seen as the healthiest product options. Products with a Nutri-Score of E, corresponding with the color red, are considered to be the least healthy options.
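As an illustration of this measure, the sketch below averages the Nutri-Scores of the items in one shopping session. The numeric coding (A = 1 through E = 5, so that a lower average means a healthier basket) is an assumption; the coding actually used for the study's average Nutri-Score variable is not stated in this part of the text.

```python
from statistics import mean

# Assumed numeric coding: lower values are healthier.
SCORE_VALUES = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}


def average_nutri_score(labels):
    """Average Nutri-Score of the items purchased in one session.

    Returns None for a session without scored items, since no
    average can be computed in that case.
    """
    if not labels:
        return None
    return mean(SCORE_VALUES[label] for label in labels)


# Hypothetical session with four purchased items.
print(average_nutri_score(["A", "A", "C", "E"]))
```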

2.2.4 Visibility of the Nutri-Score IDA’s

Consumers’ shopping habits tend to be hard to change, since these are influenced by cues in their shopping environment which are often processed without consumers being consciously aware of these cues (Wilson, Buckley, Buckley & Bogomolova, 2016). To steer consumers’ attention to the Nutri-Score IDA’s and to get them to click on the information button, instead of relying on heuristics while shopping, it is necessary to modify their regular shopping environment to change their behavior (Combris et al., 2009).

we translate this information to the Nutri-Score IDA’s in the studied online grocery store, increasing the visibility of these IDA’s with their information button can lead to a gain in navigating to the Nutri-Score information webpage. This can lead to more consumers understanding the concept of the Nutri-Score. Eventually, this can lead to improvements regarding the healthiness of consumers’ purchases when consumers use the Nutri-Score IDA’s and understand how the Nutri-Scores should be interpreted.

2.3 The hypotheses

2.3.1 Conceptual model

Figure 2 shows the conceptual model of this study. The model shows the different concepts and relationships that will be explored in this research.

Figure 2: Conceptual model

wherein consumers use Nutri-Score IDA’s contain healthier food choices, because these IDA’s enable them to search for healthy food choices. This leads to the following hypotheses:

H1a: The usage of the Nutri-Score filter IDA has a positive effect on the healthiness of food choices.

H1b: The usage of the Nutri-Score sort IDA has a positive effect on the healthiness of food choices.

Furthermore, the online grocery store increased the visibility of the Nutri-Score IDA’s in the second part of the pilot study. Research by Milosavljevic et al. (2012) suggests that by making an option visually more present, consumers are more likely to choose this option. In our case, increasing the visibility of the Nutri-Score IDA’s would mean that consumers’ attention will be steered to the Nutri-Score IDA’s and their information button. If consumers’ attention is steered to these IDA’s and their information button, this could potentially lead to more consumers who understand how the Nutri-Score should be interpreted, if they navigate to the information page. Therefore, we expect that increasing the visibility of the Nutri-Score IDA’s will have a positive effect on the relationship between the usage of these IDA’s and the healthiness of consumers’ purchases. Thus, the following hypotheses arise:

H2a: Increased visibility of the Nutri-Score filter IDA has a positive moderating effect on the relationship between the usage of the Nutri-Score filter IDA and the healthiness of food choices.

3. Data set description and preparation

This section starts with a description of how the online grocery store collected the data that will be used for this study (paragraph 3.1). Subsequently, an explanation of how the data is cleaned and prepared will follow (paragraph 3.2), and lastly some descriptive statistics and model-free insights will be presented (paragraph 3.3).

3.1 Data collection

3.1.1 Description of the A/B test

The Dutch Ministry of Health, Welfare and Sports started a project with an online grocery retailer in the Netherlands to obtain insights regarding the effectiveness of the Nutri-Score in online grocery stores. To this end, the studied online grocery store conducted a pilot study to research the effect of adding the Nutri-Score filter and sort IDA’s to the consumers’ shopping environment. The pilot study is set up as an A/B test with a between-subjects design. In a between-subjects design, each individual or subject is exposed to a single treatment (Keren, 1993). The study consisted of two parts: in part 1 the Nutri-Score IDA’s were introduced, and in part 2 these IDA’s were made more conspicuous. For this study, a large set of clickstream data is used. Bucklin & Sismeiro (2008, p.36) define clickstream data as “the electronic record of a user’s activity on the Internet” or in other words “the data trace the path a visitor takes while navigating the Web”. Since this path reflects consumers’ shopping behavior and consumers’ choices in the online shopping environment, analyzing clickstream data is relevant for this type of research. The data resulting from this A/B test will be used to answer both research questions and their corresponding hypotheses.

Description of part 1. The first part of the study took place between June 9, 2020 and

Description of part 2. The second part started immediately after the first part, and took place between July 10, 2020 and July 30, 2020, covering a time period of 3 weeks. As in part 1, the consumers in part 2 were randomly divided into two groups. However, two changes regarding the visibility of the Nutri-Score IDA’s were made for the test group in the second part of the pilot study, compared to the test group from part 1. The Nutri-Score filter IDA was moved up on the left side of the screen and users received a pop-up message on the right side of the screen, stating that there was a new function that enables them to sort products based on their Nutri-Score. In short, the visibility of the Nutri-Score IDA’s was increased for this test group. Consumers from the test group could only see the Nutri-Score IDA’s if they visited one of the product categories for which the grocery store had released the IDA’s.

3.2 Data cleaning and preparation

3.2.1 Merging data sets

Before analyzing the data set, it is necessary to combine all different data files from the online grocery store. The online grocery store delivered multiple data files that had to be aggregated and merged by a data scientist from the university. This allows us to start with only two different data sets, one from the first part of the pilot study and one from the second part. We merge both data sets into one large data set in RStudio, and we will continue to use RStudio for further analyses. This data set contains 2,638,620 sessions and 23 different variables. Because we first need to make some important adjustments to this data set and its variables, we will provide an overview of the variables and their meaning later on in Table 15 in Appendix 2.
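The merge itself was carried out in RStudio. Purely as an illustration of the idea, the Python sketch below stacks the session rows of both parts into one data set. The `Part` column that records the origin of each row, the function name and the example rows are assumptions and are not taken from the study's data.

```python
def stack_parts(part1_rows, part2_rows):
    """Combine the sessions of both pilot parts into one list of rows.

    Each row is a dict of variable name to value. A hypothetical
    "Part" key is added so the origin of every session stays known.
    """
    merged = []
    for part, rows in ((1, part1_rows), (2, part2_rows)):
        for row in rows:
            merged.append({**row, "Part": part})
    return merged


# Hypothetical example rows.
part1 = [{"Items_Purchased": 12}, {"Items_Purchased": 30}]
part2 = [{"Items_Purchased": 8}]
print(stack_parts(part1, part2))
```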

3.2.2 Removing variables and sessions

in NSavg, because these sessions cannot be used for further analyses. Further, for our research we are only interested in sessions where consumers actually purchased products. Therefore, we remove sessions where My_List < 1, because every consumer needs to visit this page at least once before they are able to make a purchase. This page contains an ‘order’ button that consumers must press before they can place their order and pay for their groceries.

3.2.3 Renaming variables

Subsequently, we change the names of some of the variables. Some variable names are translated from Dutch to English and other variables are renamed due to privacy concerns of the online grocery store. Because of these privacy issues, we will not elaborate on which specific variable names we changed. Furthermore, we assume that the original variable aantal_toegevoegd is a proxy for items purchased, given that items purchased is not available in this database. Therefore, we change the name of this variable into Items_Purchased. In this scenario, zero or even negative items purchased is not possible, since that would mean that consumers did not buy any products at all. Because we are only interested in shopping sessions where consumers did purchase products, it makes sense to remove values ≤ 0.

3.2.4 Setting boundaries

The summary of the data set shows that some variables have very high maximum values: Unique_Pageviews, Items_Purchased and Times_Viewed. It is important to handle outliers before we proceed with analyzing the data, because outliers can have a negative influence on the generalizability of the results. Therefore, we start by creating boxplots for these three variables. Boxplots allow us to determine which data points qualify as outliers by defining a range. A common rule is to classify a data point as an outlier if it lies more than 1.5 × the interquartile range (IQR) above the third quartile (Q3) or below the first quartile (Q1) (Laurent, 2013).
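The 1.5 × IQR rule described above can be illustrated with a minimal Python sketch. The item counts below are hypothetical and are not taken from the thesis data, and the quartile method used here (`method="inclusive"`) is an assumption that may differ from the one used by the boxplot function in RStudio.

```python
from statistics import quantiles

def iqr_fences(values):
    """Return the (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical item counts for ten sessions (illustration only).
items = [2, 3, 4, 5, 6, 7, 8, 9, 10, 60]
low, high = iqr_fences(items)
outliers = [x for x in items if x < low or x > high]
print(low, high, outliers)  # -2.5 15.5 [60]
```

In this toy example only the session with 60 items falls outside the fences.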


on average between 1 and 3 euros, it is not surprising that consumers would buy more than 22 products in one shopping session. Besides, it is well known that the online grocery store considered in this research offers free delivery for online orders above 50 euros. Therefore, we set our own boundaries based on common knowledge. Because we are mostly interested in the shopping behavior of regular consumers, we set the following boundaries: unique pageviews (1-300), items purchased (1-100) and times viewed (1-750).

3.2.5 Creating new variables

After cleaning the data set, the data needs to be prepared for further analyses. The next step involves creating a new dummy variable, presencens_dum, based on the existing dummy Group. Presencens_dum indicates whether the Nutri-Score IDA’s were present (1 = present) or absent (0 = absent). At the same time, this dummy indicates whether a shopping session took place in the test group (1) or the control group (0).

Moreover, we create a new dummy variable, filtersortns_dum, for consumers who filtered and sorted on the Nutri-Score in the same session. This dummy indicates whether the Nutri-Score filter and sort IDA’s were both used during a shopping session (1) or not (0).

Thereafter, we create a new dummy variable that states in which visibility condition of the Nutri-Score IDA’s a shopping session took place. We can distinguish which sessions from the test groups belong to which visibility condition, because we know the dates on which each condition took place. We start by creating a new column, diffdays, which assigns a numeric value to each date. The regular visibility condition started on the 9th of June, so this date is the starting point (day 1). Sessions from the regular visibility condition have values from 1 to 31, while sessions in the increased visibility condition have values from 32 to 52. This column allows us to create the dummy variable visibility_dum, which is coded 0 for sessions in the regular visibility condition and 1 for sessions in the increased visibility condition.
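The date-based coding can be sketched as follows. This is a Python illustration only; the thesis performs this step in RStudio. The names `diffdays` and `visibility_dum` follow the text, while the year 2020 for the start date and the function layout are assumptions made for this sketch.

```python
from datetime import date

# Start of the regular visibility condition (9 June; the year 2020 is assumed).
START = date(2020, 6, 9)

def diffdays(session_date):
    """Day number of a session, counting the start date as day 1."""
    return (session_date - START).days + 1

def visibility_dum(session_date):
    """0 = regular visibility (days 1-31), 1 = increased visibility (days 32-52)."""
    day = diffdays(session_date)
    if not 1 <= day <= 52:
        raise ValueError("date falls outside the pilot study period")
    return 0 if day <= 31 else 1

print(diffdays(date(2020, 7, 9)), visibility_dum(date(2020, 7, 9)))    # 31 0
print(diffdays(date(2020, 7, 10)), visibility_dum(date(2020, 7, 10)))  # 32 1
print(diffdays(date(2020, 7, 30)))                                     # 52
```

Under these assumptions, day 32 corresponds to July 10 and day 52 to July 30, which matches the dates given for the second part of the pilot study.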


3.2.6 Final data set

Taking into account all the above, our final data set contains 22 variables and 508,503 shopping sessions in total. All used variables and their meaning are further elaborated on in Table 15 in Appendix 2.

3.3 Descriptive statistics

3.3.1 Distribution of sessions

After adjusting our data set, we calculate some general descriptive statistics. Table 1 displays the distribution of the sessions for each visibility condition and each group. Visibility condition ‘R’ refers to the regular visibility condition, and visibility condition ‘I’ refers to the increased visibility condition. As mentioned before, only the two test groups were able to see the Nutri-Score IDA’s. Moreover, visibility condition ‘R’ corresponds to the first part of the pilot study, whereas visibility condition ‘I’ corresponds to the second part.

| | Control group | Test group | Total sessions |
|---|---|---|---|
| Visibility condition (R) | 134,506 | 136,260 | 270,766 |
| Visibility condition (I) | 118,970 | 118,767 | 237,737 |
| Total | 253,476 | 255,027 | 508,503 |

Table 1: Distribution of the sessions

3.3.2 Average Nutri-Score

Next, we calculate the average Nutri-Score. The average Nutri-Score ranges from 1 to 5, where the numeric value 1 corresponds to a Nutri-Score of A and the numeric value 5 to a Nutri-Score of E. Across all sessions, the average Nutri-Score is 2.333, which corresponds to a Nutri-Score between B and C. For the distribution of the average Nutri-Scores per visibility condition and per group, see Table 2.

| | Control group | Test group |
|---|---|---|
| Visibility condition (R) | 2.346 (B-C) | 2.346 (B-C) |
| Visibility condition (I) | 2.319 (B-C) | 2.316 (B-C) |

Table 2: Average Nutri-Score
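As a consistency check, the overall average of 2.333 can be approximately reproduced as a session-weighted mean of the four group averages in Table 2, using the session counts from Table 1. The sketch below is in Python. Because the group averages are rounded to three decimals, the result is only approximate.

```python
# Session counts (Table 1) and average Nutri-Scores (Table 2) per cell.
sessions = {
    ("R", "control"): 134_506, ("R", "test"): 136_260,
    ("I", "control"): 118_970, ("I", "test"): 118_767,
}
avg_ns = {
    ("R", "control"): 2.346, ("R", "test"): 2.346,
    ("I", "control"): 2.319, ("I", "test"): 2.316,
}

total_sessions = sum(sessions.values())
overall = sum(sessions[k] * avg_ns[k] for k in sessions) / total_sessions
print(total_sessions, round(overall, 3))  # 508503 2.333
```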


a Levene’s test for equal variances, followed by an independent samples t-test. Levene’s test is significant (p-value = 0.011), meaning that we can reject H0 and infer that the variances differ between the groups. The t-test shows that the means differ between the test group in the regular visibility condition and the test group in the increased visibility condition (p-value < 0.001).

3.3.3 Usage of IDA’s for the test groups

In the 255,027 sessions that represent both test groups, the Nutri-Score filter and sort IDA’s are used infrequently compared to the other two IDA’s in the online grocery store, namely the ‘sort on price’ and ‘sort on purchase frequency’ IDA’s. More details regarding the usage of the IDA’s can be found in Table 3. It is noteworthy that consumers used both the Nutri-Score filter and sort IDA’s in only 113 sessions.

| | Filter NS | Sort NS | Sort price | Sort purfreq | Total |
|---|---|---|---|---|---|
| Visibility condition (R) | 1,841 | 1,299 | 33,736 | 3,480 | 40,356 |
| Visibility condition (I) | 2,317 | 2,838 | 31,726 | 3,675 | 40,556 |
| Total | 4,158 | 4,137 | 65,462 | 7,155 | 80,912 |

Table 3: Usage of IDA’s for the test groups

The usage statistics of the Nutri-Score IDA’s (Filter NS, Sort NS) in Table 3 give reason to believe that the mean usage differs significantly between the test groups in the two visibility conditions. We perform two independent samples t-tests with unequal variances, which indeed show that the mean usage of both the Nutri-Score filter IDA and the Nutri-Score sort IDA differs significantly between the test groups in the different visibility conditions. More details can be found in Table 4.


4. Research design

This section discusses the type of analysis used to answer both research questions and to test their corresponding hypotheses. Subsequently, an extra research question is introduced to obtain new insights, and its corresponding analysis is discussed (paragraph 4.1).

4.1 Plan of analysis

4.1.1 Multiple linear regression model

To answer research questions 1 and 2, we estimate the effect of the usage of the Nutri-Score IDA’s on the average healthiness of consumers’ food choices (measured as the average Nutri-Score). The dependent variable (NS_avg) is continuous; therefore, linear regression fits this type of data. We are mainly interested in the effect of the Nutri-Score IDA’s on the average healthiness of consumers’ purchases, but it is unrealistic to think that only the Nutri-Score IDA’s affect the healthiness of consumers’ purchases. Therefore, we include variables from the data set that we suspect to affect the average healthiness, in order to minimize omitted variable bias.

Since we want to include multiple control variables, we decide to estimate a multiple linear regression (MLR) model. OLS regression is the most commonly used method for fitting linear statistical models with a continuous dependent variable (Hayes and Cai, 2007), thus we will estimate this model with the use of OLS. To measure the moderating effect of the visibility condition, a two-way interaction effect is included in the estimation.

The following linear regression equation will be estimated:

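$$
\begin{aligned}
\mathit{NS\_average}_i = {} & \alpha + \beta_1 \mathit{FNS}_i + \beta_2 \mathit{SNS}_i + \beta_3 \mathit{Vi\_dum}_i + \beta_4 \mathit{PNS\_dum}_i + \beta_5 \mathit{Di}_i + \beta_6 \mathit{SP}_i + \beta_7 \mathit{SPF}_i \\
& + \beta_8 \mathit{Se}_i + \beta_9 \mathit{Ma}_i + \beta_{10} \mathit{ML}_i + \beta_{11} \mathit{UP}_i + \beta_{12} \mathit{FNS}_i \cdot \mathit{Vi\_dum}_i + \beta_{13} \mathit{SNS}_i \cdot \mathit{Vi\_dum}_i + \varepsilon_i, \\
& i = 1, \dots, 528{,}466
\end{aligned}
$$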

Where, at session $i$:

$\mathit{NS\_average}_i$ = average Nutri-Score of purchased products in session $i$

$\alpha$ = intercept

$\mathit{FNS}_i$ = number of times filtered on Nutri-Score

$\mathit{SNS}_i$ = number of times sorted on Nutri-Score

$\mathit{Vi\_dum}_i$ = dummy variable indicating the visibility condition (1 = increased visibility, 0 = regular)

$\mathit{PNS\_dum}_i$ = dummy variable for presence of the Nutri-Score (1 = present (test group), 0 = absent (control group))

$\mathit{Di}_i$ = number of times visited a page containing discounted products

$\mathit{SP}_i$ = number of times sorted on price

$\mathit{SPF}_i$ = number of times sorted on purchase frequency

$\mathit{Se}_i$ = number of times visited a page containing search results

$\mathit{Ma}_i$ = number of times visited a page containing a magazine box or recipe found based on the box or recipe

$\mathit{ML}_i$ = number of times visited a page containing an overview of a personalized shopping list

$\mathit{UP}_i$ = number of different product pages visited

$\varepsilon_i$ = disturbance term

$\alpha, \beta_1, \dots, \beta_{13}$ = model parameters

It is important to keep in mind that OLS regression makes several assumptions that we have to take into account before our model and the results can be interpreted. These assumptions will be discussed in the next paragraph.

4.1.2 Testing the assumptions of OLS

Functional form. Before interpreting the results of the MLR model, we need to test the basic assumptions of OLS. The first assumption concerns the functional form of our model. According to Leeflang, Wieringa, Bijmolt and Pauwels (2015), $E(\varepsilon_t) = 0$ is one of the most important assumptions for a model that is estimated with OLS. If this assumption is not met, the parameter estimates of the model are biased. The functional form can be tested with a RESET test. The RESET test of our model (RESET = 0.711) is not significant (p-value = 0.399), meaning that we cannot reject the null hypothesis that $E(\varepsilon_t) = 0$. Thus, we assume that the current functional form of our model is appropriate and that there is a linear relationship between the explanatory variables and the dependent variable.

Autocorrelation. The second assumption involves autocorrelation. This assumption states that $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t'}) = 0$ for $t \neq t'$, meaning that if residuals follow a pattern over time, this


Homoscedasticity. The next assumption involves homoscedasticity. This assumption states that $\mathrm{Var}(\varepsilon_t) = \sigma^2$, meaning that the error term of the model should have the same variance at all times (Leeflang et al., 2015). We first check this visually with a scale-location plot. The scale-location plot (see Figure 3) shows a higher variance for the higher predicted values. The line slopes upwards, indicating that the assumption of homoscedasticity is violated, i.e. that there is heteroscedasticity. To formally test for heteroscedasticity, we conduct a Goldfeld-Quandt test (GQ = 1.018, p-value < 0.001) and a Breusch-Pagan test (BP = 14,888, p-value < 0.001). Both tests are highly significant, meaning that the variance of the error term is not constant. In other words, there is heteroscedasticity.

This issue can be mitigated by transforming the model. A log-transformation improves the precision of the estimates and at the same time reduces the influence of outliers (Başer, 2007). We are aware that most of the variables in the data set contain many zeros, which is problematic for log-transforming the model, since the logarithm of 0 is undefined. Therefore, we make sure that only values > 0 are log-transformed. After estimating the log-transformed version of the model, we have to conclude that there is still heteroscedasticity. The Goldfeld-Quandt test (GQ = 1.026, p-value < 0.001) and the Breusch-Pagan test (BP = 27,842, p-value < 0.001) show no signs of improvement, meaning that the log-transformation did not solve the heteroscedasticity issue. However, we continue with the log-transformed version of the model and test in the next paragraph whether other assumptions or the model performance improved compared to the original multiple linear regression model.

Figure 3: Scale-Location plot MLR model


are not normally distributed. The histogram clearly does not follow the bell shape that we usually see when the residuals are normally distributed. Furthermore, the Q-Q plot also indicates that the residuals are not normally distributed, since many points in this plot do not lie on the straight line.

Figure 4: Histogram of the residuals

Figure 5: Q-Q plot of the residuals

To formally check whether the normality assumption is violated, we conduct a Lilliefors (Kolmogorov-Smirnov) normality test (D = 0.102, p-value < 0.001) and an Adjusted Jarque-Bera test (AJB = 19,813, p-value < 0.001). Both tests are highly significant; therefore, we reject H0, which states that the residuals are normally distributed. This means that the normality assumption is violated. If the normality assumption is violated, we cannot rely on the standard confidence intervals of our model, meaning that we are unable to determine which of the estimates are truly significant. A remedy for this issue is bootstrapping (Leeflang et al., 2015). By applying bootstrapping to our model, we generate p-values that are based on the empirical distribution of the estimates rather than on the normality assumption. The coefficient estimates of our model do not change by applying bootstrapping; only the p-values change. After bootstrapping our model (see Table 9, p. 31), we conclude that all the variables that were significant before bootstrapping are still significant after bootstrapping. Further, the variables presencens_dum and Unique_Pageviews and both interaction effects remain insignificant. This means that applying bootstrapping does not lead to drastic changes.
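To illustrate the idea behind bootstrapping a regression, the following Python sketch resamples observations with replacement and re-estimates a simple regression slope each time. It is not the procedure used in this thesis. The data are synthetic, the model has a single predictor, and case resampling is only one common bootstrap variant, chosen here as an assumption. The quantities `bias` and `std_error` are meant in the same spirit as the "Bias" and "Std. Error" columns of a bootstrap output.

```python
import random
from statistics import mean, stdev

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_slopes(xs, ys, n_boot=1000, seed=42):
    """Re-estimate the slope on n_boot samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if max(bx) > min(bx):  # skip a degenerate resample with constant x
            slopes.append(ols_slope(bx, by))
    return slopes

# Synthetic data with skewed (non-normal) errors; NOT the thesis data.
rng = random.Random(0)
xs = [rng.uniform(0, 3) for _ in range(200)]
ys = [0.7 - 0.08 * x + 0.3 * (rng.expovariate(1.0) - 1.0) for x in xs]

estimate = ols_slope(xs, ys)
slopes = sorted(bootstrap_slopes(xs, ys))
bias = mean(slopes) - estimate             # bootstrap mean minus original estimate
std_error = stdev(slopes)                  # bootstrap standard error
ci_low, ci_high = slopes[24], slopes[974]  # approximate 95% percentile interval
print(estimate, bias, std_error, (ci_low, ci_high))
```

If the percentile interval excludes zero, the slope would be considered significant at roughly the 5% level without relying on normally distributed residuals.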

Multicollinearity. The last assumption concerns multicollinearity. According to


our model all have VIF scores < 5 (see Table 5), meaning that there is no sign of problematic multicollinearity.

| Variable | VIF score |
|---|---|
| FilterNS_log | 1.007 |
| SortNS_log | 1.007 |
| presencens_dum | 1.003 |
| visibility_dum | 1.003 |
| Discount_log | 1.176 |
| SortPrice_log | 1.038 |
| SortPurchaseFreq_log | 1.003 |
| Search_log | 1.787 |
| Magazine_log | 1.035 |
| My_List_log | 1.007 |
| Unique_Pageviews | 2.021 |

Table 5: VIF scores variables MLR model
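For intuition, the VIF of a variable equals 1 / (1 − R²), where R² comes from regressing that variable on all other explanatory variables. The Python sketch below applies this standard definition in both directions. The function names are chosen for this illustration only.

```python
def vif(r_squared):
    """VIF for a predictor whose auxiliary regression has the given R-squared."""
    return 1 / (1 - r_squared)

def implied_r_squared(vif_score):
    """R-squared of the auxiliary regression implied by a VIF score."""
    return 1 - 1 / vif_score

print(round(vif(0.8), 6))                   # 5.0
print(round(implied_r_squared(2.021), 3))   # 0.505 (Unique_Pageviews, Table 5)
print(round(implied_r_squared(1.787), 3))   # 0.44  (Search_log, Table 5)
```

A threshold of VIF = 5 thus corresponds to 80% of a variable’s variance being explained by the other explanatory variables. The highest score in Table 5 (2.021) implies roughly 50%.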

4.1.3 Final model choice MLR model


| | MLRmodel1 | MLRmodel1_logtransformed |
|---|---|---|
| AIC | 1,491,124 | 669,287 |
| BIC | 1,491,291 | 669,454 |

Table 6: Information criteria MLR models

4.1.4 Count model

It is interesting to extend our research on the influence of the Nutri-Score IDA’s to consumers’ purchasing behavior in the online grocery store environment more broadly. This allows us to better understand which consumer actions, and which functions offered by the grocery store in the online shopping environment, influence consumers’ purchases. We are able to trace consumers’ behavior by extracting information from their pathway. This leads to the following new research question:

RQ3: Which consumer actions and functions in the online grocery store environment have an effect on the number of items consumers purchase during a shopping session?

As mentioned before, we use the original variable aantal_toegevoegd, now called Items_Purchased, as a proxy for items purchased, because our data set lacks a variable that states how many products were purchased during a shopping session. Therefore, we estimate a model in which Items_Purchased is the dependent variable. We are mostly interested in the effect of the usage of the Nutri-Score filter and sort IDA’s on Items_Purchased, but we also include multiple other variables from the data set to estimate whether these variables have an effect on Items_Purchased.


Figure 6: Histogram Items_Purchased

The following count model equation with Poisson regression will be estimated:

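$$
\begin{aligned}
\mathit{Items\_P}_i = {} & \alpha + \beta_1 \mathit{PNS\_dum}_i + \beta_2 \mathit{Vi\_dum}_i + \beta_3 \mathit{UP}_i + \beta_4 \mathit{Di}_i + \beta_5 \mathit{FNS}_i + \beta_6 \mathit{SNS}_i + \beta_7 \mathit{FSNS\_dum}_i \\
& + \beta_8 \mathit{SP}_i + \beta_9 \mathit{SPF}_i + \beta_{10} \mathit{Se}_i + \beta_{11} \mathit{Ma}_i + \beta_{12} \mathit{ML}_i + \beta_{13} \mathit{PE}_i + \beta_{14} \mathit{TV}_i + \varepsilon_i, \\
& i = 1, \dots, 528{,}466
\end{aligned}
$$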

Where:

$\mathit{Items\_P}_i$ = the number of items purchased in session $i$

$\alpha$ = intercept

$\mathit{PNS\_dum}_i$ = dummy variable for presence of the Nutri-Score (1 = present (test group), 0 = absent (control group))

$\mathit{Vi\_dum}_i$ = dummy variable indicating the visibility condition (1 = increased visibility, 0 = regular)

$\mathit{UP}_i$ = number of different product pages visited

$\mathit{Di}_i$ = number of times visited a page containing discounted products

$\mathit{FNS}_i$ = number of times filtered on Nutri-Score

$\mathit{SNS}_i$ = number of times sorted on Nutri-Score

$\mathit{FSNS\_dum}_i$ = dummy variable for filtered and sorted on Nutri-Score in the same session (1 = yes, 0 = no)

$\mathit{SP}_i$ = number of times sorted on price

$\mathit{SPF}_i$ = number of times sorted on purchase frequency

$\mathit{Se}_i$ = number of times visited a page containing search results

$\mathit{Ma}_i$ = number of times visited a page containing a magazine box or recipe found based on the box or recipe

$\mathit{ML}_i$ = number of times visited a page containing an overview of a personalized shopping list

$\mathit{PE}_i$ = number of times visited a page containing an overview of products that were purchased earlier

$\mathit{TV}_i$ = number of times products were viewed

$\varepsilon_i$ = disturbance term


4.1.5 Challenges count model with Poisson regression

A count model with a Poisson distribution faces three potential challenges. The first challenge concerns a situation where the mean and the variance of the dependent variable are unequal (Long & Freese, 2006). Comparing the mean (7.442) and the variance (73.324) of Items_Purchased, we conclude that the variance is greater than the mean, which indicates overdispersion. To formally test for overdispersion, an overdispersion test is conducted (a = 0.818, p-value < 0.001) (Long & Freese, 2006). The test confirms that there is overdispersion, meaning that the heterogeneity across cases is larger than a Poisson model assumes. This means that we need to extend our model to a negative binomial regression model, which is an extension of the Poisson model.
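The informal mean-variance comparison above can be expressed as a dispersion index (variance divided by mean), which is close to 1 for Poisson-distributed counts. The Python sketch below computes the ratio implied by the reported summary statistics and shows the same function on a small hypothetical list of counts. The function name and the example counts are assumptions made for this illustration, and the sketch is not the formal overdispersion test.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by the mean; about 1 under a Poisson distribution."""
    return variance(counts) / mean(counts)

# Ratio implied by the summary statistics reported in the text.
reported_ratio = 73.324 / 7.442
print(round(reported_ratio, 2))  # 9.85

# Hypothetical session-level counts, only to show the function on raw data.
example_counts = [1, 2, 2, 3, 4, 30]
print(dispersion_index(example_counts) > 1)  # True
```

A ratio of almost 10 is far above 1, which is in line with the conclusion that a plain Poisson model is too restrictive here.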

The second challenge concerns a situation where zero events cannot be observed for the dependent variable (Long & Freese, 2006). In our case, zero events cannot be observed for the dependent variable Items_Purchased, because zeros would indicate that there was no purchase at all. Items_Purchased only contains values > 0. Therefore, a special version of the negative binomial regression model is estimated, called a truncated negative binomial model. This model takes into account that zeros for Items_Purchased are not possible. The third challenge concerns a situation where there are more zeros in the dependent variable than expected (Long & Freese, 2006). Because our dependent variable contains no zeros at all, we do not have to deal with this challenge, meaning that the final version of our count model is a truncated negative binomial model.


| Variable | VIF score |
|---|---|
| visibility_dum | 1.001 |
| presencens_dum | 1.003 |
| Unique_Pageviews | 3.334 |
| Discount | 1.193 |
| FilterNS | 1.067 |
| SortNS | 1.075 |
| filtersortns_dum | 1.128 |
| SortPrice | 1.064 |
| SortPurchaseFreq | 1.021 |
| Search | 2.087 |
| Magazine | 1.058 |
| My_List | 1.015 |
| Purchased_Earlier | 1.063 |
| Times_Viewed | 1.769 |

Table 7: VIF scores variables count model

4.1.6 Final model choice count model

To validate if a truncated negative binomial model is the best fit for our data, a linear regression model with OLS, a Poisson model and a negative binomial model will also be estimated. Thereafter, we are able to calculate which of the models has the best fit for our type of data. The AIC and BIC information criteria are used, to determine the fit of the four models (see Table 8). According to both information criteria, the truncated negative binomial regression model is the best performing model, meaning that we chose the right model for our data. We will interpret the coefficients of this model in section 5. To assess the model performance of our final count model, we will compare the AIC and BIC of the truncated negative binomial model with that of a null model in section 5.2.3.

Table 8: Information criteria count models


5. Results

This section starts with the results of the MLR model (paragraph 5.1), followed by the results of the count model (paragraph 5.2).

5.1 Results multiple linear regression model

5.1.1 Coefficients MLR model

The coefficients of the bootstrapped log-transformed MLR model are shown in Table 9. We interpret the coefficients of the significant variables in paragraph 5.1.2. The explanatory variables FilterNS_log and SortNS_log are significant. Furthermore, the moderator visibility_dum has a significant direct relationship with the dependent variable NS_avg_log. Finally, the control variables Discount_log, SortPrice_log, SortPurchaseFreq_log, Search_log, Magazine_log and My_list_log are significant.

| Variable | Estimate | Std. Error | Bias | p_bootstrap |
|---|---|---|---|---|
| (Intercept) | 0.727 | 0.004 | -0.000 | 0.000*** |
| FilterNS_log | -0.079 | 0.015 | 0.000 | 0.000*** |
| SortNS_log | -0.084 | 0.022 | -0.000 | 0.000*** |
| visibility_dum | -0.015 | 0.001 | 0.000 | 0.000*** |
| Control: presencens_dum | 0.001 | 0.001 | -0.000 | 0.327 |
| Control: Discount_log | 0.037 | 0.001 | -0.000 | 0.000*** |
| Control: SortPrice_log | 0.011 | 0.002 | 0.000 | 0.000*** |
| Control: SortPurchaseFreq_log | -0.029 | 0.005 | -0.000 | 0.000*** |
| Control: Search_log | 0.005 | 0.001 | -0.000 | 0.000*** |
| Control: Magazine_log | -0.010 | 0.001 | 0.000 | 0.000*** |
| Control: My_list_log | -0.023 | 0.002 | -0.000 | 0.000*** |
| Control: Unique_Pageviews | -0.000 | 0.001 | 0.000 | 0.399 |
| FilterNS_log:visibility_dum | -0.035 | 0.021 | 0.000 | 0.097 |
| SortNS_log:visibility_dum | -0.007 | 0.025 | 0.001 | 0.387 |


5.1.2 Interpretation coefficients MLR model

Because we log-transformed the dependent variable as well as the continuous explanatory variables in the MLR model, we can interpret the coefficients as percentage changes. For a one percent change in an independent variable, the dependent variable changes by approximately (coefficient) percent, or more precisely by $100 \cdot (1.01^{\hat{\beta}_i} - 1)$ percent, where $\hat{\beta}_i$ is the coefficient of variable $i$ (Yang, 2012). It is important to be aware that the dummy variables in the model are not log-transformed, so we need to interpret these differently (Yang, 2012). In this paragraph, we accept or reject the hypotheses and interpret the coefficients of the significant variables. Thereafter, we provide an overview of the hypotheses and their findings in Table 10.

Filter on Nutri-Score. The explanatory variable FilterNS_log, which measures how often a consumer filtered on the Nutri-Score, is significant (p < 0.001). Consistent with our expectation, the effect of filtering on the Nutri-Score is negative (-0.079). This is logical, since the healthiest Nutri-Score (A) is coded as 1 in our data set, while the unhealthiest Nutri-Score (E) is coded as 5. A decrease in the average Nutri-Score therefore means that filtering on the Nutri-Score leads to healthier food choices. For a more meaningful interpretation, we calculate the effect on the average Nutri-Score when the usage of the Nutri-Score filter IDA increases by 10%, using the following formula:

$100 \cdot (1.10^{-0.079} - 1) = -0.750\%$

For a 10% increase in the usage of the Nutri-Score filter IDA, the average Nutri-Score decreases by 0.750%. Since it is hard to interpret this percentage decrease, we explain it with an example. The average Nutri-Score of the studied shopping sessions is 2.337. A decrease of 0.750% would mean that the average Nutri-Score decreases by 0.018 to 2.319. In other words, a 10% increase in the usage of the Nutri-Score filter IDA leads to a decrease of 0.018 in the average Nutri-Score of consumers’ food choices. Even though this effect is very small, we are able to say that H1a holds: The usage of the Nutri-Score filter IDA has a positive effect on the healthiness of food choices.

Sort on Nutri-Score. The explanatory variable SortNS_log, which means that there is


average Nutri-Score would mean that sorting on the Nutri-Score leads to healthier food choices. For a more meaningful interpretation, we calculate the effect on the average Nutri-Score when the usage of the Nutri-Score sort IDA increases by 10%, using the following formula:

$100 \cdot (1.10^{-0.084} - 1) = -0.797\%$

This means that for a 10% increase in the usage of the Nutri-Score sort IDA, the average Nutri-Score decreases by 0.797%. The effect of using the Nutri-Score sort IDA (0.797%) is quite similar to the effect of using the Nutri-Score filter IDA (0.750%). This means that the effect on the average Nutri-Score is also very small for the usage of the Nutri-Score sort IDA. Even though the effect is very small, we are able to say that H1b holds: The usage of the Nutri-Score sort IDA has a positive effect on the healthiness of food choices.

Visibility condition (direct effect). Besides the effect of the visibility condition as a moderator, we also estimated the direct effect of the visibility condition on the healthiness of consumers’ food choices. The direct effect is significant (p-value < 0.001). The effect of increased visibility compared to regular visibility (-0.015) is negative, which seems logical. A decrease in the average Nutri-Score means that consumers in the condition where the visibility of the Nutri-Score IDA’s was increased make healthier food choices. For a more meaningful interpretation, we calculate the effect of the increased visibility condition versus the regular visibility condition on the average Nutri-Score, using the following calculation: $1 - \exp(-0.015) = 0.015$. This means that the increased visibility condition leads to a decrease of approximately 1.5% in the average Nutri-Score, compared to the regular visibility condition. In other words, increasing the visibility of the Nutri-Score IDA’s leads to a decrease in the average Nutri-Score of consumers’ food choices.
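The percentage interpretations in this paragraph can be reproduced with a few lines of Python. The sketch applies the two formulas from the text to the coefficients in Table 9. The function names are chosen for this illustration only.

```python
import math

def pct_change_loglog(beta, pct_increase):
    """% change in y when a log-transformed x rises by pct_increase percent."""
    return 100 * ((1 + pct_increase / 100) ** beta - 1)

def pct_change_dummy(beta):
    """% change in y when a 0/1 dummy switches from 0 to 1 (log-transformed y)."""
    return 100 * (math.exp(beta) - 1)

print(round(pct_change_loglog(-0.079, 10), 3))  # -0.75  (FilterNS_log)
print(round(pct_change_loglog(-0.084, 10), 3))  # -0.797 (SortNS_log)
print(round(pct_change_dummy(-0.015), 2))       # -1.49  (visibility_dum)
```

The first two values match the 10% scenarios for the filter and sort IDA’s. The last value expresses the direct effect of the visibility dummy as a percentage (about -1.5%).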

Visibility condition (moderator). The interaction effect between filtering on the


The interaction effect between sorting on the Nutri-Score and the visibility condition is also not significant (p-value = 0.387). Therefore, we cannot say that the increased visibility condition, compared to the regular visibility condition, has a significantly different effect on the relationship between the usage of the Nutri-Score sort IDA and the average Nutri-Score of consumers’ food choices. This means that we have to reject H2b: increased visibility of the Nutri-Score sort IDA has a positive moderating effect on the relationship between the usage of the Nutri-Score sort IDA and the healthiness of food choices.

| Hypothesis | Findings |
|---|---|
| H1a: The usage of the Nutri-Score filter IDA has a positive effect on the healthiness of food choices. | Accepted |
| H1b: The usage of the Nutri-Score sort IDA has a positive effect on the healthiness of food choices. | Accepted |
| H2a: Increased visibility of the Nutri-Score filter IDA has a positive moderating effect on the relationship between the usage of the Nutri-Score filter IDA and the healthiness of food choices. | Rejected |
| H2b: Increased visibility of the Nutri-Score sort IDA has a positive moderating effect on the relationship between the usage of the Nutri-Score sort IDA and the healthiness of food choices. | Rejected |

Table 10: Overview hypothesis findings

5.1.3 Model performance MLR model

An important measure to assess the goodness of fit of a model is the R-squared (Leeflang et al., 2015). According to Leeflang et al. (2015), the R-squared “measures the proportion of total variance in the criterion variable “explained” by the model” (p. 102). The R-squared of our model is 0.006269, which means that about 0.63% of the variance is explained by the model. This seems low, but Leeflang et al. (2015) emphasize that for some types of problems and data all models have low R-squared values. Besides, this model tries to explain consumers’ behavior, so it might be that other factors, which are not captured in our data set, also influence the average Nutri-Score of consumers’ food choices. This means that the low R-squared of our model does not necessarily mean that our model does not provide valuable information, especially since many coefficients in the model are significant.


states the model lacks explanatory power (Leeflang et al., 2015). This means that our model explains more than would be explained due to chance.

Subsequently, we can compare the AIC and BIC of our final model (log-transformed MLR model) with the AIC and BIC of a log-transformed null model and with a log-transformed model without the control variables to see if our estimated MLR model (MLRmodel1_log) outperforms these models (see Table 11). The model without control variables is referred to as the Simplemodel_log. According to both information criteria, our estimated MLR model has the lowest AIC and BIC, meaning that this model is the best performing model. Besides, the adjusted R-squared of the log-transformed MLR model (0.006244) is higher than the adjusted R-squared of the simple model (0.0001765), meaning that the estimated MLR model has the best explanatory power.

| | Nullmodel_log | Simplemodel_log | MLRmodel1_log |
|---|---|---|---|
| AIC | 672,459 | 672,374 | 669,287 |
| BIC | 672,482 | 672,441 | 669,454 |

Table 11: Comparing information criteria of regression models
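As a side note on how the two criteria relate: with the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, the gap between them depends only on the number of estimated parameters k and the number of observations n. The Python sketch below checks this for MLRmodel1_log. It assumes k = 15 (intercept, 13 slope coefficients and the error variance) and n = 508,503 (the size of the final data set). Both are assumptions, since the thesis does not state how the software counts parameters.

```python
import math

def bic_minus_aic(k, n):
    """BIC - AIC = k * (ln(n) - 2) for k parameters and n observations."""
    return k * (math.log(n) - 2)

# Assumed: 15 estimated parameters and 508,503 sessions.
gap = bic_minus_aic(15, 508_503)
print(round(gap))         # 167
print(669_454 - 669_287)  # 167, the gap reported for MLRmodel1_log
```

Under these assumptions, the computed gap agrees with the difference between the reported BIC and AIC of MLRmodel1_log.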

5.2 Results count model

In this paragraph, we interpret the coefficients of the significant variables of the count model. Thereafter, we provide an overview of the findings in Table 13.

5.2.1 Coefficients count model

The coefficients of our final truncated negative binomial count model are shown in Table 12. The explanatory variables visibility_dum, Unique_Pageviews, Discount, FilterNS, SortNS, SortPrice, SortPurchaseFreq, Search, Magazine, My_List, Purchased_Earlier and Times_Viewed are significant. We interpret the coefficients of the significant variables in section 5.2.2.


Table 12: Results of the count model

Signif. codes: <0.001 ‘***’, 0.01 ‘**’, 0.05 ‘*’

5.2.2 Interpretation coefficients count model

In this section we will interpret the coefficients of the explanatory variables from the count model, followed by a table that provides an overview of the effects of the explanatory variables on Items_Purchased (see Table 13). In section 6.3.1 we will further elaborate on variables that have a relatively large impact on the number of items purchased and we will try to find explanations for these findings. Since we already calculated the exponent of each of the estimates, we can calculate easily what the effect of a specific variable is on the number of items purchased. For example, an exponentiated estimate of 0.9 means a 10% decrease in the number of items purchased (0.9 * Items_Purchased).
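The conversion from an exponentiated estimate to a percentage effect can be sketched in Python as follows. The value 1.016 is back-calculated from the 1.6% increase reported for Discount and is used only as an illustration. The `units` argument, and the compounding it implies, assumes the usual multiplicative interpretation of a count model with a log link.

```python
def pct_effect(exp_estimate, units=1):
    """Percentage change in the expected count for a change of `units` units."""
    return 100 * (exp_estimate ** units - 1)

print(round(pct_effect(0.9), 1))       # -10.0, the example given in the text
print(round(pct_effect(1.016, 5), 1))  # 8.3, five extra discount-page visits
```

Because the effects are multiplicative, five additional units do not simply add up to 5 × 1.6% = 8.0% but compound to roughly 8.3%.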

Presence of the Nutri-Score IDA’s. The effect of the presence of the Nutri-Score IDA’s is not significant (p-value = 0.320). This means that we find no evidence that the presence of the Nutri-Score IDA’s has an effect on the number of items that consumers purchase during a shopping session.

Visibility condition. The effect of the visibility condition is significant (p-value < 0.001). We can interpret the coefficient of this dummy variable as follows: if visibility_dum equals 1 rather than 0 (keeping all other variables in the model constant), Items_Purchased decreases by 10.8%.

Unique pageviews. The effect of the number of different product pages visited is significant (p-value <0.001). This means that if Unique_Pageviews increases by one unit (keeping all other variables in the model constant), Items_Purchased decreases by 0.3%.

Discount. The effect of the number of pages visited containing discounted products is significant (p-value <0.001). This means that if Discount increases by one unit (keeping all other variables in the model constant), Items_Purchased increases by 1.6%.
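The percentages reported here refer to a one-unit change. Because the count model is multiplicative, effects over several units compound rather than add up. The sketch below illustrates this under the assumption that the per-unit factor equals one plus the reported percentage (for example roughly 1.016 for Discount, which is implied by the rounded 1.6% and not an exact estimate from our model); the function name is hypothetical.

```python
def percent_change_for_units(pct_per_unit: float, units: int) -> float:
    """Compound a per-unit percentage change over several units.

    pct_per_unit: reported effect of a one-unit increase, in percent
                  (e.g. 1.6 for Discount, -0.3 for Unique_Pageviews).
    units:        number of units by which the variable increases.
    """
    factor = 1.0 + pct_per_unit / 100.0  # approximate exponentiated estimate
    return (factor ** units - 1.0) * 100.0


# Ten additional pages with discounted products versus ten additional
# unique pageviews, all other variables held constant.
print(round(percent_change_for_units(1.6, 10), 1))
print(round(percent_change_for_units(-0.3, 10), 1))
```

For small per-unit effects the compounded change stays close to the simple sum (ten times the per-unit percentage), but the gap widens as the number of units grows.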
