A discrete choice analysis of food store choices among U.S. households


Academic year: 2021


A Discrete Choice Analysis of Food Store Choices among U.S. Households

Remco van Bruggen 10650628 June 26, 2018

University of Amsterdam

BSc in Econometrics


Statement of Originality

This document is written by Student Remco van Bruggen who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Abstract

Because of increasing health problems in the United States, the factors influencing what people eat are of central concern for health researchers and policymakers. Although a wide body of research has examined different factors influencing healthy food choices, little research has been done on the factors determining food store choice. In this paper, data from the nationally representative National Household Food Acquisition and Purchase Survey (FoodAPS) is analyzed using a random coefficient logit model to determine how individual and household characteristics of food shoppers interact with both the type of food store they choose and the distance from home to each store. Overall, households prefer food stores closer to home, and superstores are preferred over supermarkets, grocery stores and combination grocery stores. Preferences for type of food store and willingness to travel are found not to vary much across income groups or with participation in the Supplemental Nutrition Assistance Program (SNAP). Additionally, households that own a car, live in a rural area or include members suffering from obesity are more willing to travel farther to food stores. This research helps in understanding how people make food purchasing decisions, which is important for health researchers and policymakers. The next step is to explore how these preferences can be leveraged to promote healthier eating.


Contents

1 Introduction
2 Discrete-choice model theory
2.1 Models
2.1.1 Conditional Logit
2.1.2 Random Coefficients Logit
2.2 Model Estimation
3 Research method
4 Results and analysis
4.1 Heterogeneity by SNAP participation and income group
4.2 Household characteristics influencing distance travelled
4.3 Heterogeneity by store choice rationale and other household characteristics
4.4 Limitations


1 Introduction

According to the Centers for Disease Control and Prevention (CDC), the leading national public health institute of the United States, more than one-third of U.S. adults suffer from obesity. Educating people about the health risks of a poor diet does not seem to be working, as the percentage of obese adults is significantly higher than a decade ago. Finkelstein, Trogdon, Cohen and Dietz (2009), in a study of the medical costs of obesity, estimate that these costs amounted to $147 billion in 2008.

Because of this increasing problem, the factors influencing what people eat are of central concern for health researchers and policymakers. Different policies and programs focus on the retail food environment of individuals and households, considering for example the location of stores and the type of food they stock. The Healthy Food Financing Initiative (HFFI), for instance, aims to equip and develop food stores with healthful foods. In 2014, $125 million was allocated to this program with the goal of eliminating low-income areas with limited access to nutritious food, so-called food deserts, by incentivizing retailers to do business in these areas. Financing for programs like this was granted after multiple studies found that food store choice and limited access to healthy foods affect dietary intake. Krukowski and McSweeney (2012) state that nutritious food availability is associated with dietary intake. Furthermore, a significant determinant of the availability of nutritious food is the accessibility of supermarkets (Block & Kouba, 2006). Because supermarkets are less frequently located in rural areas and areas with a high proportion of racial minorities, Block and Kouba state that these populations might find the lack of availability of nutritious foods a big obstacle to having a healthy diet. Besides accessibility to nutritious foods, numerous studies find that other important variables affecting the consumption of nutritious foods are the pricing of healthful foods, the quality of foods and store type (Krukowski & McSweeney, 2012). Liese et al. (2007), for instance, find that the availability of healthful foods is substantially higher at supermarkets and grocery stores than at convenience stores.

Although a wide body of research examined different factors influencing healthy food choices, not a lot of research has been done on the factors determining food store choice. Hillier et al. (2015) used a conditional logit model to analyze what factors influence the store choice of households. The results of their study indicated the importance of distance from home to food stores, the influence of race and sex of food shoppers and food prices. However, this study has several limitations. Firstly, Hillier et al. (2015) used a relatively small dataset. Secondly, their sample only included a single city. This limits the generalization of the results.


Taylor and Villas-Boas (2016) used a larger dataset, namely the FoodAPS dataset, to estimate a mixed logit model and analyze how consumers, shopping for both food consumed at home and food consumed away from home, choose among a variety of food outlet alternatives. Their research focuses on the effect of distance to each store and on how preferences for food stores vary between SNAP-participating¹ and non-participating households. Their analysis aggregated stores into relevant outlet types and used expenditure shares as an analogue for shopping choices. The analysis was therefore reduced to a linear regression of the difference in log shares on individual and outlet attributes. A limitation of using expenditure shares is that free food events were not taken into account in their study.

This research builds on Hillier et al. (2015) by using the large, nationally representative FoodAPS dataset to make the results more generalizable. Also, instead of the conditional logit model, a random coefficient logit model² is used, as the former is often said to produce unrealistic substitution patterns because of the Independence of Irrelevant Alternatives (IIA) property. This research builds on Taylor and Villas-Boas (2016) by considering a broader set of variables that could influence store choice. Besides that, instead of using aggregated data and expenditure shares as an empirical analogue of choice probabilities, this analysis uses individual data and the relatively new asmixlogit command in Stata to estimate the random coefficient logit model. The goal of this research is to analyze how individual and household characteristics of food shoppers interact with both the type of food store they choose and the distance from home to each store.

Before the analysis is done, the theoretical framework of discrete-choice models is outlined in the next section. Section 3 then describes the design and content of the empirical research. The results of the empirical research are analyzed and discussed in section 4, after which the final section of this article provides a conclusion and recommendations for further research.

2 Discrete-choice model theory

In this section, the theoretical framework behind discrete-choice models is outlined. First, the history and general idea of these models are explained. Then, the building blocks of these types of models are described, after which some popular discrete-choice models are presented and discussed, together with their assumptions, advantages and drawbacks. Finally, the so-called mean utility method for estimating the parameters, first introduced by Berry (1994), is explained.

¹ The SNAP program provides purchasing assistance for low-income households in the United States.
² The random coefficient logit model is sometimes also called the mixed logit model.


The purpose of discrete choice models is to explain and predict the choices consumers make when choosing between two or more discrete alternatives. These models have been very popular over the past few decades (Berry & Haile, 2016). One way to specify the demand for differentiated products would be to define a system of demand equations, where the demand for each product depends on its own price, the prices of other products, and other variables influencing demand (Nevo, 2000). However, this results in a large number of parameters to be estimated. As Nevo (2000) stresses, this approach also does not account for heterogeneity in consumer preferences, even though this heterogeneity is implied by the fact that products are differentiated. Discrete choice models deal with both of these problems (McFadden, 1974).

Before the actual models are described, the foundation of discrete-choice models is outlined below. This foundation is necessary to understand the models and where they come from. The foundation consists of two parts. In the first part, the theory of choice is described and in the second part the random utility theory is explained.

For discrete-choice models some common assumptions exist. According to Ben-Akiva and Lerman (1985), a choice can be defined as the outcome of a sequence of steps: the definition of the choice problem, the generation of alternatives, the evaluation of the attributes of the alternatives, the choice, and finally the implementation. A choice can therefore be seen as a process. The choice in this process depends on the characteristics of all the alternatives and is eventually made by applying some decision rule. To define this process, four elements are now introduced following Ben-Akiva and Bierlaire (1999). The first element is the decision-maker. The decision-maker in discrete-choice models is often defined as an individual, but it can also be defined as a group of persons. A household, a family or an organization are examples of such a group. In that case, only the decisions of the group as a whole are considered. The second element is the alternative set, which concerns the choices available to the decision-maker. When modelling discrete choices, the set of all possible alternatives has to be defined. The set containing all alternatives in a certain context is called the universal choice set. Because not all alternatives may be available to every individual, the choice set is defined as the subset of the universal choice set available to a certain individual. The third element concerns the attributes of each alternative. Ben-Akiva and Lerman (1985) state that the attractiveness of an alternative is evaluated in terms of a vector of attribute values. These attributes do not necessarily have to be directly measurable; they can be any function of the available data. In discrete-choice models the attributes can be either observed or unobserved by the researcher. This is discussed further in the next section. The last element of the process is the decision rule, by which the decision-maker chooses among the alternatives based on their attributes. This is often modelled using utility theory. In micro-economic theory it is often assumed that a consumer has a utility function that allows them to rank all the alternatives in a consistent way and that they choose the option that maximizes utility (McFadden, 1974). But as Dagsvik (1995) states, this approach has been criticized a lot. It is argued that when people face a choice among alternatives, they tend to be uncertain about which alternative to choose. This suggests that the decision rule should include a probabilistic dimension. Discrete-choice models are based on the so-called random utility theory. This means the models are based on deterministic decision rules, where the utilities are described by random variables (Dagsvik, 1995).

In random utility theory, it is still assumed that the decision-maker maximizes his or her utility. However, it is also assumed that the researcher has incomplete information on the utility of the consumer (Ben-Akiva & Bierlaire, 1999). Manski (1977) mentions four possible sources of this incomplete information, among them unobserved product characteristics, unobserved taste variations and measurement errors. Utility can therefore be modeled as a random variable to account for this. The utility of an individual $i$ that chooses alternative $j$ in the choice set $C_i$ is defined as

$$ U_{ij} = V_{ij} + \varepsilon_{ij} \tag{1} $$

following the notation of Ben-Akiva and Bierlaire (1999). $V_{ij}$ is the part observed by the researcher, while $\varepsilon_{ij}$ is the part unobserved by the researcher. This random utility specification is the basis of discrete-choice models. Because it is assumed that individuals choose the alternative that maximizes their utility, the probability of an individual $i$ choosing alternative $j$ is defined as

$$ P(j \mid C_i) = P\left[ U_{ij} \geq U_{ik} \;\; \forall\, k \in C_i \right]. \tag{2} $$

How this translates into the actual discrete-choice models is described in the next section. Different discrete-choice models are mentioned and explained next. Assumptions, drawbacks and advantages of the models are also discussed.

2.1 Models

In this section some different discrete-choice models are mentioned. Important assumptions and their implications are discussed as well.


Following Nevo (2000), the utility of consumer i buying product j can be defined as

$$ u_{ij} = \alpha_i (y_i - p_j) + x_j \beta_i + \xi_j + \varepsilon_{ij}, \qquad i = 1, \dots, I, \quad j = 1, \dots, J, \tag{3} $$

where $y_i$ is the income of consumer $i$, $p_j$ is the price of product $j$, $x_j$ is a $K$-dimensional vector of observed product characteristics, $\xi_j$ is a scalar capturing the unobserved product characteristics, and $\varepsilon_{ij}$ is a disturbance term with mean zero. Lastly, $\alpha_i$ and $\beta_i$ are consumer $i$'s marginal utility of income and a vector of taste coefficients, respectively. The consumer also has the option not to buy any of the goods. This has to be accounted for, because otherwise an equal increase in the price of all products would not change the quantities purchased. According to Nevo (2000), the standard practice is to normalize the utility of the outside good to zero by setting the parameters equal to zero. The utility can then be defined as

$$ u_{i0} = \alpha_i y_i + \varepsilon_{i0}. \tag{4} $$

In the logit models explained below, $\varepsilon_{ij}$ is assumed to follow the Type I extreme-value distribution.

This distribution is used in logit models as it leads to a convenient formula for the probability that a consumer makes a certain choice (Rasmussen, 2007).
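The convenience of the Type I extreme-value assumption can be checked numerically. The following is a minimal sketch, not part of the original analysis: the systematic utilities in `v`, the number of draws and the seed are illustrative assumptions. It adds independent Gumbel errors to fixed utilities, records which alternative has the highest utility, and compares the resulting frequencies with the closed-form logit probabilities.

```python
import math
import random


def logit_probabilities(v):
    """Closed-form logit choice probabilities for systematic utilities v."""
    m = max(v)  # subtract the maximum for numerical stability
    expv = [math.exp(x - m) for x in v]
    total = sum(expv)
    return [e / total for e in expv]


def gumbel_draw(rng):
    """One draw from the standard Type I extreme-value (Gumbel) distribution."""
    u = rng.random()
    while u == 0.0:  # avoid log(0) in the inverse-CDF transform
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice_shares(v, n_draws=20000, seed=0):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = [x + gumbel_draw(rng) for x in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]


v = [1.0, 0.5, 0.0]  # hypothetical systematic utilities, for illustration only
closed_form = logit_probabilities(v)
simulated = simulate_choice_shares(v)
print([round(p, 3) for p in closed_form])
print([round(p, 3) for p in simulated])
```

Under these assumptions the two printed lists should agree up to simulation noise, which illustrates why no numerical integration is needed for the plain logit model.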

Besides the different possible specifications for the distribution of $\varepsilon_{ij}$, it is also possible to put restrictions on the parameters to be estimated in the utility function specified above. Using the utility function and the distribution of $\varepsilon_{ij}$, it is possible to define the market share of product $j$, which is explained in the models described in the following paragraphs.

2.1.1 Conditional Logit

One possible restriction is to assume that all consumers have identical taste parameters. This means that $\alpha_i = \alpha$ and $\beta_i = \beta$ for all consumers and that the disturbance term $\varepsilon_{ij}$ is uncorrelated across consumers.

These restrictions lead to the popular standard conditional logit model and result in the following utility function.

$$ u_{ij} = \alpha (y_i - p_j) + x_j \beta + \xi_j + \varepsilon_{ij}. \tag{5} $$


Because the coefficients are now the same for all consumers, it is possible to get an aggregate utility function by adding up all the incomes (Rasmussen, 2007):

$$ u_j = \alpha (y - p_j) + x_j \beta + \xi_j + \varepsilon_j, \qquad j = 1, \dots, J. \tag{6} $$

Now the utility function no longer depends on the consumer $i$. As mentioned before, discrete choice models are based on utility theory, where it is assumed that individuals choose the alternative that maximizes their utility. Choice probabilities, or market shares, are defined as in equation (2). The choice probability of product 1, in a market with $J$ alternatives and an outside good, is the probability that product 1 has a higher utility than all the other alternatives. Formally, this results in equation (7) below.

$$ s_1 = P[u_1 > u_2] \cdot P[u_1 > u_3] \cdots P[u_1 > u_J] \cdot P[u_1 > u_0]. \tag{7} $$

After substituting equation (6) into equation (7), and under the assumption that $\varepsilon_{ij}$ has a Type I extreme value distribution, the market share of each product can be written as follows (Nevo, 2000):

$$ s_j = \frac{\exp(x_j \beta - \alpha p_j + \xi_j)}{1 + \sum_{k=1}^{J} \exp(x_k \beta - \alpha p_k + \xi_k)}. \tag{8} $$

Because income is on both sides of each inequality in equation (7), it drops out and is not included in equation (8) above. Therefore, income does not play a role in the choice of consumers.

This conditional logit model is appealing and is used a lot in practice (Nevo, 2000). However, due to the strong restrictions the substitution patterns only depend on the market shares. When individuals are forced to switch to a different alternative, the model only takes the market shares of the other alternatives into account, and not how similar the characteristics of other alternatives are to those of the initially chosen alternative. The following price elasticities can be derived, as is done by Rasmussen (2007):

$$ \eta_{jk} = \begin{cases} -\alpha p_j (1 - s_j) & \text{if } j = k, \\ \alpha p_k s_k & \text{otherwise.} \end{cases} \tag{9} $$


There are two reasons why these elasticities are unrealistic and result in restricted substitution patterns (Nevo, 2000). First of all, when the market share is small, the own-price elasticities are close to $-\alpha p_j$, which means that demand is less responsive to price for lower-priced products. This implies that the seller of a product charges a higher markup on products with low marginal costs, but there is no good reason to assume this. Second, this model implies the Independence of Irrelevant Alternatives (IIA) property. This means that the model assumes that preferences of consumers are uncorrelated across products. A famous example is mentioned by McFadden (1974). Imagine people have the choice of travelling either by car or by bus and two-thirds choose to travel by car. Now imagine a second brand of bus travel is introduced, which is essentially the same as the first. Intuitively, it makes sense that two-thirds of people still choose to travel by car and the remaining people split between the two bus options. However, under the IIA property the ratio of car to bus choices stays at two to one for each bus, so only half of the people choose the car alternative when the new bus is introduced. This IIA property could therefore result in unrealistic substitution patterns when some alternatives are close substitutes for each other. In the next section, the random coefficients logit model is discussed. This model does not imply the IIA property.
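The car and bus example can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the mean utilities are hypothetical values chosen so that the car obtains two-thirds of the choices when a single bus is available, there is no outside good, and the names `car`, `bus` and `bus_2` are illustrative.

```python
import math


def logit_shares(utilities):
    """Conditional logit shares for a dict {alternative: mean utility}."""
    denom = sum(math.exp(u) for u in utilities.values())
    return {name: math.exp(u) / denom for name, u in utilities.items()}


# Mean utilities chosen so that exp(u_car) / exp(u_bus) = 2,
# i.e. the car gets two-thirds of the choices with one bus available.
before = logit_shares({"car": math.log(2.0), "bus": 0.0})

# A second bus, identical to the first, enters with the same mean utility.
after = logit_shares({"car": math.log(2.0), "bus": 0.0, "bus_2": 0.0})

print(before)
print(after)
```

In this sketch the ratio of the car share to the share of each bus is the same before and after the second bus is added, which is exactly the IIA restriction: the car share falls from two-thirds to one-half.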

2.1.2 Random Coefficients Logit

Another discrete-choice model is the random coefficients logit model, sometimes called the mixed logit model. This model has been widely used and discussed since Berry, Levinsohn and Pakes (1995) described methods for analyzing supply and demand based on it. In this model it is assumed that the parameters are determined by consumer characteristics and therefore differ across consumers. This can be modelled in the following way, again using the notation in Nevo (2000):

$$ \begin{pmatrix} \alpha_i \\ \beta_i \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \Pi D_i + \Sigma v_i. \tag{10} $$

The average values of $\alpha_i$ and $\beta_i$ are defined as $\alpha$ and $\beta$ respectively. $D_i$ is a vector of the observed characteristics of consumer $i$ and $v_i$ is a vector of the unobserved characteristics. $\Pi$ is a matrix of coefficients determining how the parameters depend on the observed characteristics and $\Sigma$ is a matrix determining how the parameters depend on the unobserved characteristics. The utility in this model, already defined in equation (3), can be rewritten after substituting equation (10) into equation (3). This leads to the following specification.


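$$ u_{ij} = \alpha_i (y_i - p_j) + x_j \beta_i + \xi_j + \varepsilon_{ij} = \alpha_i y_i + \delta_j + \mu_{ij} + \varepsilon_{ij}, \qquad i = 1, \dots, I, \quad j = 1, \dots, J, \tag{11} $$

where

$$ \delta_j = -\alpha p_j + x_j \beta + \xi_j, \tag{12} $$

$$ \mu_{ij} = (-p_j, x_j)(\Pi D_i + \Sigma v_i). \tag{13} $$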

The utility is now split into four parts. The first part, $\alpha_i y_i$, is the utility from income. However, as explained in section 2.1.1, this does not play a role in the choice consumers make. The second term, $\delta_j$, is called the mean utility by Berry (1994) and is the same for all consumers. The last two terms, $\mu_{ij} + \varepsilon_{ij}$, represent a disturbance term that captures the effects of the random coefficients. If it is assumed again that $\varepsilon_{ij}$ has a Type I extreme value distribution, the market share is given by

$$ s_{ij} = \frac{\exp(\delta_j + \mu_{ij})}{1 + \sum_{k=1}^{J} \exp(\delta_k + \mu_{ik})}. \tag{14} $$

The total aggregate market share is now defined as the integral of all the consumer-level choice probabilities defined in equation (14) (Nevo, 2000). Because of the integrals, the market shares are now harder to estimate compared to the conditional logit model (Rasmussen, 2007).

The advantage of the random coefficient logit model is that it allows for flexible substitution patterns (Nevo, 2000), because it does not impose the IIA property. Therefore, if the price of a particular product increases, consumers are more likely to switch to another product with similar characteristics than to the brand with the largest market share.

2.2 Model Estimation

In this section, the estimation of the parameters in the discrete-choice models above is discussed. Berry (1994) presents a method to transform the market share equation so that the parameters in the model can be estimated. The idea of the method used by Berry (1994) is to find the values of $\alpha$ and $\beta$ that match the shares predicted by the model to the observed shares. The following example of estimating the coefficients in the conditional logit model makes this clearer. The market shares of this model are defined as

$$ s_j = \frac{\exp(x_j \beta - \alpha p_j + \xi_j)}{1 + \sum_{k=1}^{J} \exp(x_k \beta - \alpha p_k + \xi_k)}. \tag{15} $$

As mentioned before, the part in the exponent is called the mean utility by Berry (1994). These predicted market shares have to be matched with the observed market shares to get a system of linear equations. When the mean utility of the outside good is normalized to zero, this system of linear equations can be solved for the mean utility to get the following linear equation.

$$ \ln(\hat{s}_j) - \ln(\hat{s}_0) = x_j \beta - \alpha p_j + \xi_j. \tag{16} $$

In this specification, $\hat{s}_j$ is defined as the observed market share of product $j$ and $\hat{s}_0$ is the market share of the outside good. The log difference in these market shares can now simply be regressed on the observed product characteristics and prices that make up the mean utility to estimate the model parameters. A possible problem that should be mentioned here is the problem of endogenous prices. Both observed and unobserved product characteristics determine the prices. The unobserved product characteristics are captured by the disturbance term. Because these unobserved characteristics could influence the price of the products, the prices and the disturbance term can be correlated, which results in the endogeneity problem (Nevo, 2000). In the regression of log differences in market shares on the mean utility levels, instrumental variables can be used to address this problem. For the more complicated discrete-choice models the system of equations is harder to solve. For the random coefficient logit model, this can be done numerically (Berry, 1994). The reason is the difficulty of calculating the integral in the predicted market share equation. As Berry (1994) suggests, a tradeoff has to be made between the strong assumptions of the simpler models and the more complex calculations in the models where those assumptions are relaxed.
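The inversion behind equation (16) can be illustrated with a small numerical check. The following is a minimal sketch, not the estimation used in this research: the mean utilities in `true_delta` are hypothetical, and the regression step on product characteristics is left out. It computes logit market shares from given mean utilities and then recovers those mean utilities from the log difference between each share and the outside-good share.

```python
import math


def logit_market_shares(delta):
    """Shares implied by mean utilities delta; the outside good has utility 0."""
    denom = 1.0 + sum(math.exp(d) for d in delta)
    inside = [math.exp(d) / denom for d in delta]
    outside = 1.0 / denom
    return inside, outside


def invert_shares(inside, outside):
    """Logit case of the share inversion: delta_j = ln(s_j) - ln(s_0)."""
    return [math.log(s) - math.log(outside) for s in inside]


true_delta = [0.8, -0.3, 0.1]  # hypothetical mean utilities, illustration only
shares, outside_share = logit_market_shares(true_delta)
recovered = invert_shares(shares, outside_share)
print([round(d, 6) for d in recovered])
```

In this sketch the recovered values coincide with `true_delta` up to floating-point error, so the left-hand side of equation (16) can serve as the dependent variable in a linear regression.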

In this section, the framework of discrete-choice models necessary for answering the research question has been discussed. Two popular discrete-choice models have been presented. The simpler conditional logit model is easier to estimate, but it has some limitations. These limitations are relaxed in the random coefficient logit model. However, this comes at the cost of a more difficult estimation procedure. In this empirical analysis of food store choices, the random coefficient logit model is applied to a large dataset from a survey in the United States. This is discussed in detail in the next section, where the method of the empirical research and the data used in this research are outlined.

3 Research method

The method of the empirical research is outlined in this section. Both the data used and the method of the research are discussed in detail.

For this research, data obtained by the USDA’s National Household Food Acquisition and Purchase Survey (FoodAPS) is used. It was designed to support research about American households’ food acquisitions, factors influencing food demand and how access to different types of food stores is related to food choices, food security, health and obesity. In total, 4826 American households completed the survey between April 2012 and January 2013. The primary respondent of each household provided information about the household and all the individuals in the household. Different household demographics and information about the households’ food purchases, intake and health were collected over the course of seven days. One of the unique features of the FoodAPS dataset is that it contains a geographical component. It contains distance measures for the food outlets visited by the households, as well as distance measures for food outlets that could have been visited during the week. In particular, variables for the closest distance from households to each food outlet are an important part of this research.

For the analysis of the factors that influence store choice, there are four types of food stores taken into consideration. Although the dataset includes 11 types of food stores in total, the four types analyzed in this research are chosen to be the primary food store by the vast majority of the households. The four stores considered are superstores (SS), supermarkets (SM), medium to large grocery stores (MLG) and combination grocery stores (CO). Superstores are defined as the very large ‘big box’ stores, engaged in the retail sale of a wide variety of grocery and other store merchandise. Supermarkets are defined as establishments engaged in the retail of an extensive variety of grocery and other store merchandise, typically with ten or more checkout lanes. Medium to large grocery stores carry a moderate to wide selection of all four staple food categories. Lastly, combination grocery stores primarily sell general merchandise, but also a variety of food products.

To analyze the factors influencing store choice, the random-coefficient logit model explained in section 2.1.2 is used in this research. As discussed in the theoretical framework above, this model relaxes the IIA property by allowing random coefficients on variables that vary across both cases and alternatives. In this empirical study, the only variable that varies across both cases and alternatives is the distance. In particular, the distance is defined as the distance to the closest food store of each type. The only difference between using the random-coefficient logit model and the conditional logit model in this application is that the coefficients of the distance variable are allowed to vary across individuals, which makes the model somewhat more flexible. Besides this distance measure, many household characteristics are included, for example family size, whether or not the household is located in a rural area, and car ownership. Data about income and participation in the Supplemental Nutrition Assistance Program (SNAP) is also included. The SNAP program provides purchasing assistance for low-income households in the United States. The data is divided into SNAP-participating households and non-participating households in three income groups: incomes below 100% of the Federal Poverty Level (FPL), incomes between 100% and 185% of the FPL, and incomes at or above 185% of the FPL. The Federal Poverty Level is the indicator the U.S. government uses to determine the eligibility of households for federal subsidies and aid. Non-participating households with incomes below 185% of the FPL are considered eligible for SNAP benefits but do not participate in the program, whereas households at or above 185% of the FPL are not eligible. The SNAP and the low-income non-participant groups were oversampled in the dataset to allow a good analysis specifically for these groups. Lastly, individual variables of the primary respondent are also included in the analysis. These include variables for the gender, ethnicity, race and age of the respondent. A health variable, the BMI of the primary respondent, is also included. In this research, the final dataset included 4299 of the 4826 households, after deleting households that had missing values for the variables of interest.

The estimation of the random-coefficient logit model in this empirical research is done using Stata. Instead of using aggregated data and expenditure shares as an empirical analogue for choice probabilities, as in Taylor and Villas-Boas (2016), individual data on store choice is used. Version 15 of Stata includes a new command, called asmixlogit, to estimate the model with individual data. For this command, the dataset had to be transformed from wide to long format. This means that each household has four rows in the dataset: one row for each of the food store types, accompanied by the distance to that type of food store. The primary food store, identified by the primary food shopper, was used as the choice variable. A dummy variable had to be created for this primary food store choice, indicating for each of the four food store types whether or not it was identified as the primary food store of the household. With the transformed dataset, the model could be estimated including both attributes of the four alternatives and all the different household characteristics and individual variables.
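The wide-to-long transformation described above can be sketched as follows. This is an illustration only: the field names `hh_id`, `primary_store` and `dist_SS` through `dist_CO`, as well as the two example households, are hypothetical and do not correspond to actual FoodAPS variable names or records.

```python
STORE_TYPES = ["SS", "SM", "MLG", "CO"]


def wide_to_long(households):
    """Turn one row per household into one row per household and store type."""
    long_rows = []
    for hh in households:
        for store in STORE_TYPES:
            long_rows.append({
                "hh_id": hh["hh_id"],
                "store_type": store,
                # distance to the closest store of this type
                "distance": hh["dist_" + store],
                # 1 if this store type is the household's primary food store
                "chosen": int(hh["primary_store"] == store),
            })
    return long_rows


# Hypothetical wide-format records; names and values are illustrative only.
wide = [
    {"hh_id": 1, "primary_store": "SS",
     "dist_SS": 2.1, "dist_SM": 0.9, "dist_MLG": 3.4, "dist_CO": 1.2},
    {"hh_id": 2, "primary_store": "SM",
     "dist_SS": 5.0, "dist_SM": 1.5, "dist_MLG": 2.2, "dist_CO": 0.7},
]
long_format = wide_to_long(wide)
for row in long_format:
    print(row)
```

Each household ends up with four rows, exactly one of which has `chosen` equal to 1, which is the layout an alternative-specific choice model needs.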

Following the Stata manual of the asmixlogit command, the utility of the random-coefficient logit model introduced in section 2.1.2 can also be defined as

$$ u_{ij} = x_{ij} \beta_i + w_{ij} \alpha + z_i \delta_j + \varepsilon_{ij}, \qquad j = 1, \dots, J. \tag{17} $$

$\beta_i$ are the random coefficients that vary over households, and $x_{ij}$ is a vector of variables that vary over both households and alternatives. The vector $w_{ij}$ also varies over both households and alternatives, but here the coefficients, $\alpha$, are fixed. $\delta_j$ is a vector of alternative-specific coefficients on $z_i$, a vector of variables that only varies across households and not across alternatives. In this research, $x_{ij}$ contains only the distance variable, as this is the only variable that is estimated with random coefficients. $w_{ij}$ contains four dummy variables, one for each food store type, indicating whether or not a store is of that type. Lastly, $z_i$ contains all the variables that only vary across households and not across alternatives. It has to be mentioned that although prices are defined in the models in section 2.1.2, they are not included in the empirical analysis here. The reason for this is that no data about prices is included in the FoodAPS dataset. The asmixlogit command approximates the choice probabilities by simulation and then estimates the parameters in the model by maximum simulated likelihood.
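The idea of approximating choice probabilities by simulation can be sketched as follows. This is a simplified illustration and not a description of how asmixlogit works internally: plain pseudo-random normal draws are an assumption, and the distances, store-type intercepts and coefficient values are hypothetical. For each draw of the distance coefficient the logit probabilities are computed, and the probabilities are then averaged over the draws.

```python
import math
import random


def simulated_choice_probabilities(distances, intercepts, mean_beta, sd_beta,
                                   n_draws=5000, seed=1):
    """Average logit probabilities over draws of a normal distance coefficient."""
    rng = random.Random(seed)
    totals = [0.0] * len(distances)
    for _ in range(n_draws):
        beta = rng.gauss(mean_beta, sd_beta)  # one draw of the random coefficient
        v = [a + beta * d for a, d in zip(intercepts, distances)]
        m = max(v)  # subtract the maximum for numerical stability
        expv = [math.exp(x - m) for x in v]
        denom = sum(expv)
        for j, e in enumerate(expv):
            totals[j] += e / denom
    return [t / n_draws for t in totals]


# Hypothetical household: distances to the closest SS, SM, MLG and CO store,
# and illustrative store-type intercepts with the supermarket (SM) as base.
distances = [2.0, 1.0, 3.0, 0.5]
intercepts = [0.4, 0.0, -1.5, -1.0]
probs = simulated_choice_probabilities(distances, intercepts,
                                       mean_beta=-0.4, sd_beta=0.3)
print([round(p, 3) for p in probs])
```

In a maximum simulated likelihood routine, probabilities of this kind would be computed for every household and the parameters chosen to maximize the sum of the log probabilities of the observed choices.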

In the first part of the analysis, the research of Taylor and Villas-Boas (2016) mentioned earlier is partly replicated. The distance to the closest food store of each type and dummy variables for the type of food store are used as the store attributes. In this part, the model is also estimated separately for the SNAP-participating households and for each of the non-participating household income groups. The results of these estimations are analyzed and compared to the research of Taylor and Villas-Boas. The second part of the analysis is focused on the effect of household and individual characteristics on food store choice. Interaction effects with the distance variable are also estimated in this part of the analysis to see how distance travelled varies with different household characteristics. The results and discussion of the model output are presented in the next section.

4 Results and analysis

The results of the estimation are presented as follows. First, the output of the replication of Taylor and Villas-Boas (2016) is analyzed in section 4.1. Table 1 reports the estimates for the entire sample of 4299 households, as well as for subsamples of households based on SNAP participation and income group. Second, the estimates for different household and individual characteristics, as well as some interaction effects with distance, are presented and analyzed in sections 4.2 and 4.3. To identify the model, one of the alternatives has to be defined as the base alternative, for which the coefficient is set to zero.³ In this research, supermarket is the base alternative. Therefore, the coefficients on the dummy variables of the alternatives, and on the interactions with those dummy variables, measure the change in likelihood relative to the choice of supermarket.
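The need for a base alternative can be checked numerically. The sketch below is illustrative only: the helper function and the shift of 5.0 are assumptions, and the constants are the overall-sample estimates from Table 1. Adding the same constant to every alternative-specific coefficient leaves the logit probabilities unchanged, so the level of the coefficients is not identified without fixing one of them.

```python
import math

def logit_probs(utilities):
    """Logit choice probabilities for a list of utilities."""
    m = max(utilities)                              # subtract max for stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

constants = [0.0, -3.072, -3.637, 0.159]   # SM (base), CO, MLG, SS from Table 1
shifted = [c + 5.0 for c in constants]     # arbitrary shift, base no longer zero

p_base = logit_probs(constants)
p_shift = logit_probs(shifted)

# Both coefficient vectors imply the same choice probabilities
same = all(abs(a - b) < 1e-12 for a, b in zip(p_base, p_shift))
```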

4.1 Heterogeneity by SNAP participation and income group

To replicate the results of Taylor and Villas-Boas (2016), the model in section 3 is specified with xij containing the distance variable and wij containing the four dummy variables for the different types of stores. The estimates of this model for the entire sample and the different subsamples are presented in Table 1, with the corresponding standard errors in parentheses. The first column contains the estimates for the entire sample. The second column contains estimates for the 1405 households participating in SNAP. Column 3 contains the estimates for the 304 non-SNAP households with an income lower than 100% of FPL, column 4 contains estimates for the 766 non-SNAP households with an income between 101% and 185% of FPL, and column 5 reports estimates for the 1823 non-SNAP households with an income higher than 185% of FPL.

The distance variable (distance; distance to the closest store of each type) is the only variable that is estimated with random coefficients. It is assumed that the coefficients of distance are normally distributed. Column 1 of Table 1 shows that the estimated mean of these normally distributed coefficients for the overall sample is -0.381, with a corresponding standard deviation of 0.282, indicating that most households are less likely to shop at a certain store when the distance to that store increases. To be precise, approximately 90% of households are less likely to shop at a certain store as distance increases, under the assumption that the coefficients are normally distributed.⁴
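As a worked version of this calculation, using only the column 1 estimates and the normality assumption stated above, the share of households with a negative distance coefficient is approximately:

```latex
\[
P(\beta_i < 0)
  = \Phi\!\left(\frac{0 - (-0.381)}{0.282}\right)
  \approx \Phi(1.35)
  \approx 0.91
\]
```

Here \Phi denotes the cumulative standard normal distribution function.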

Column 1 also contains estimates for the dummy variables of the three other store types, which are interpreted relative to the base alternative, supermarket. In the overall sample, households are more likely to shop at superstores (SS) than at supermarkets, as indicated by the significantly positive coefficient of 0.159. The negative coefficients on the dummy variables for grocery stores (MLG) and combination grocery stores (CO) indicate that households are less likely to shop at these stores than at supermarkets. In the overall sample, shopping at superstores is most preferred, followed by supermarkets, combination grocery stores and grocery stores respectively. This corresponds to the results obtained by Taylor and Villas-Boas (2016).

³ Without this normalization, more than one set of coefficients leads to the same choice probabilities.

⁴ The probability is calculated by standardizing the estimated mean with the estimated standard deviation and using the cumulative standard normal distribution table.

Columns 2-5 report the estimates for SNAP-participating households and for non-SNAP households across the three income groups. Using the estimated mean and standard deviation of the distance coefficient in these columns of Table 1, the approximate percentage of households that are less likely to shop at a certain store when distance increases is again calculated. The percentages are 90%, 95%, 92% and 91% respectively, showing no big differences across these groups. The mean estimates of the distance coefficient also do not vary much across the groups.
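These percentages can be reproduced from the Table 1 estimates with a few lines of code. The sketch below assumes normally distributed coefficients, as in the text; the dictionary layout and function name are illustrative choices.

```python
from statistics import NormalDist

# (mean, standard deviation) of the distance coefficient, copied from Table 1
table1 = {
    "Overall": (-0.381, 0.282),
    "SNAP": (-0.350, 0.272),
    "<100% FPL": (-0.471, 0.279),
    "101-185% FPL": (-0.344, 0.248),
    ">185% FPL": (-0.409, 0.308),
}

def share_negative(mean, sd):
    """Share of households with a negative coefficient, P(beta_i < 0)."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)

shares = {group: share_negative(m, s) for group, (m, s) in table1.items()}

for group, share in shares.items():
    print(f"{group:>13}: {100 * share:.0f}%")
```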

The estimates by outlet category show some interesting patterns. First of all, the superstore estimates for SNAP (column 2) and eligible non-SNAP (columns 3 and 4) households relative to supermarkets are very similar. The estimate for the lowest income group is not significant, but this is likely due to the smaller sample size. The coefficient for the non-eligible non-SNAP household group (0.0659) is lower than for the other three groups and not significantly different from zero. Therefore, there is no evidence that the higher income group is more likely to shop at superstores than at supermarkets. A possible reason is that superstores tend to have lower prices than supermarkets, which benefits the lower income groups more. Apart from the SNAP-participating households placing a higher disutility on combination grocery stores than the lowest income non-SNAP group, no significant differences exist across SNAP participation and income groups.
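The two statements about significance can be checked with simple z-statistics computed from Table 1. This is a sketch of one possible check, not necessarily the test used in this research: it assumes asymptotic normality of the estimates and, for the comparison between groups, that the two subsamples are independent.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Superstore coefficient, non-SNAP households above 185% of FPL (column 5)
z_ss_high = 0.0659 / 0.0515
p_ss_high = two_sided_p(z_ss_high)

# CO coefficient: SNAP households (column 2) versus non-SNAP households
# below 100% of FPL (column 3)
diff = -3.330 - (-2.501)
se_diff = sqrt(0.195**2 + 0.290**2)   # assumes independent subsamples
z_co = diff / se_diff
p_co = two_sided_p(z_co)

print(f"SS, >185% FPL: z = {z_ss_high:.2f}, p = {p_ss_high:.3f}")
print(f"CO, SNAP vs <100% FPL: z = {z_co:.2f}, p = {p_co:.3f}")
```

Under these assumptions the superstore coefficient of the highest income group is not significant at the 5% level, while the difference in the CO coefficients is.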

4.2 Household characteristics influencing distance travelled

Table 2 reports estimates from the same model as above, now including interaction terms of household variables with distance. The household characteristics interacted with distance include dummies for car ownership (Car), ethnicity (Hispanic; Hispanic or not), race (White; white or not), obesity (BMI; BMI above 25 or not), SNAP participation (SNAP) and whether households live in a rural area (Rural). Additionally, a few store choice rationales are interacted with distance. These include dummies for whether households indicated they shopped at their primary store because of closeness to that particular store (Closeness), good produce selection (Produce), quality of food (Quality), variety of food (Variety) and low prices or good value of products (Prices). Lastly, a dummy variable for whether households indicated they thought it was expensive to eat healthy is also interacted with distance. Only the interaction terms are included in Table 2.

Table 1: SNAP participation and income groups

                      Overall       SNAP          <100% FPL     101-185% FPL  >185% FPL
distance              -0.381∗∗∗     -0.350∗∗∗     -0.471∗∗∗     -0.344∗∗∗     -0.409∗∗∗
                      (0.0315)      (0.0566)      (0.126)       (0.0688)      (0.0486)
Normal sd(distance)   0.282∗∗∗      0.272∗∗∗      0.279∗        0.248∗∗∗      0.308∗∗∗
                      (0.0325)      (0.0609)      (0.110)       (0.0688)      (0.0515)
CO                    -3.072∗∗∗     -3.330∗∗∗     -2.501∗∗∗     -3.160∗∗∗     -3.000∗∗∗
                      (0.0978)      (0.195)       (0.290)       (0.245)       (0.143)
MLG                   -3.637∗∗∗     -3.653∗∗∗     -3.421∗∗∗     -3.209∗∗∗     -3.921∗∗∗
                      (0.150)       (0.263)       (0.523)       (0.296)       (0.262)
SS                    0.159∗∗∗      0.234∗∗∗      0.200         0.224∗∗       0.0659
                      (0.0334)      (0.0585)      (0.128)       (0.0791)      (0.0515)
SM                    (Base alternative)
N                     17192         5620          1216          3064          7292

Standard errors in parentheses

A few of the interactions are significantly different from zero. First of all, the estimate for distance interacted with car ownership (distcar) is significantly positive (0.152), indicating that households owning a car are willing to travel further for food stores. Households living in a rural area (distrural) are even more willing to travel further, as indicated by the significantly positive coefficient (0.296). This seems plausible because food stores are generally further away for those households, so that they are used to travelling further. There is also a weakly significant positive interaction effect (0.083) with the BMI dummy variable indicating obesity (distBMI), but there is no particular reason to assume obese people are willing to travel further. Households indicating they shop at their primary food store because of low prices or good value are more willing to travel further as well (distprices; 0.117). The reason for this is probably that superstores, which generally have lower prices than other food stores, are further away for most households. Lastly, households indicating they shop at their primary food store because it is close to home (distclose) are much less willing to travel further, as indicated by the significantly negative coefficient (-0.436), which is a very plausible result.
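To make the interpretation of the interaction terms concrete, the sketch below adds selected interaction estimates from Table 2 to the main distance coefficient (dist). The function and dictionary names are hypothetical, and the calculation assumes that all dummies not listed for a household equal zero; it concerns only the mean of the distance coefficient.

```python
# Mean distance coefficient and selected interaction terms from Table 2
base_dist = -0.647
interactions = {
    "car": 0.152,
    "rural": 0.296,
    "bmi": 0.0830,
    "prices": 0.117,
    "closeness": -0.436,
}

def implied_distance_coef(traits):
    """Mean distance coefficient for a household whose listed dummies equal one."""
    return base_dist + sum(interactions[t] for t in traits)

coef_none = implied_distance_coef([])                   # no listed trait applies
coef_car_rural = implied_distance_coef(["car", "rural"])
coef_closeness = implied_distance_coef(["closeness"])

print(f"{coef_none:.3f} {coef_car_rural:.3f} {coef_closeness:.3f}")
```

A rural household with a car thus has an implied mean coefficient much closer to zero than a household that shops at its primary store because of closeness, in line with the discussion above.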

Table 2: Distance interaction terms

distcar            0.152∗       (0.0658)
disthisp           -0.0603      (0.0701)
distwhite          0.0249       (0.0513)
distrural          0.296∗∗∗     (0.0426)
distBMI            0.0830∗      (0.0394)
distsnap           0.0469       (0.0390)
distprices         0.117∗∗      (0.0398)
distclose          -0.436∗∗∗    (0.0411)
distproduce        -0.112       (0.0590)
distquality        -0.0254      (0.0562)
distvariety        0.107∗       (0.0449)
disthealthycost    -0.0313      (0.0368)
dist               -0.647∗∗∗    (0.0911)

Standard errors in parentheses

4.3 Heterogeneity by store choice rationale and other household characteristics

Table 3 reports estimates from the same model specification as in section 4.1. Instead of estimating the model for subsamples based on SNAP participation and income level, the model is estimated on subsamples based on the reason households give for shopping at a particular store. Column 1 of Table 3 provides estimates for the overall sample. Columns 2-6 report the estimates for subsamples based on whether households indicated they shop at their primary food store because of low prices (Prices), produce selection (Produce), quality (Quality), variety (Variety) or closeness to the food store (Closeness). The mean estimate for the distance variable, which is again specified with random coefficients that are assumed to be normally distributed, varies considerably with the indicated reason for shopping at a food store. The mean estimate for households indicating they shop at a particular store because of low prices or good value (Prices; -0.205) is considerably higher than in the overall sample (-0.381), although still negative. This suggests that, on average, households shopping for prices are willing to travel further. Households shopping at a store because of variety also seem to be willing to travel further on average, as indicated by the mean coefficient of -0.289, which is higher than the coefficient of -0.381 in the overall sample. Households indicating they shop for a good produce selection or quality of food do not seem to behave differently from the overall sample with respect to distance travelled. The mean estimate for households indicating they shop at a particular store because of closeness to that store (-0.809) is much more negative than the mean estimate of the overall sample. This suggests that households valuing closeness to stores are less willing to travel than the overall sample, which makes sense intuitively.

Another notable result in Table 3 is that, for the subsample of households specifying they shop at their primary store because of its closeness to home, superstores (SS) are not more likely to be chosen than supermarkets. This is indicated by the insignificant estimate of -0.0804 in column 6. The logic behind this could be that superstores are, in general, further from home than supermarkets, which makes households valuing closeness less likely to shop at superstores. Moreover, for households indicating they shop at a particular store because of the good quality of foods, shopping at supermarkets is preferred to shopping at superstores.

Table 3: Reason to shop at store

                      Overall       Prices        Produce       Quality       Variety       Closeness
distance              -0.381∗∗∗     -0.205∗∗∗     -0.422∗∗∗     -0.389∗∗∗     -0.289∗∗∗     -0.809∗∗∗
                      (0.0315)      (0.0338)      (0.0915)      (0.0796)      (0.0610)      (0.0625)
Normal sd(distance)   0.282∗∗∗      0.162∗∗∗      0.286∗∗       0.284∗∗       0.193∗        0.434∗∗∗
                      (0.0325)      (0.0424)      (0.102)       (0.0981)      (0.0770)      (0.0507)
CO                    -3.072∗∗∗     -3.003∗∗∗     -3.024∗∗∗     -3.373∗∗∗     -3.189∗∗∗     -3.581∗∗∗
                      (0.0978)      (0.134)       (0.229)       (0.237)       (0.226)       (0.149)
MLG                   -3.637∗∗∗     -3.789∗∗∗     -3.597∗∗∗     -3.908∗∗∗     -4.535∗∗∗     -3.774∗∗∗
                      (0.150)       (0.217)       (0.350)       (0.352)       (0.481)       (0.217)
SS                    0.159∗∗∗      0.293∗∗∗      -0.154        -0.255∗∗∗     0.252∗∗∗      -0.0804
                      (0.0334)      (0.0421)      (0.0839)      (0.0756)      (0.0677)      (0.0505)
SM                    (Base alternative)
N                     17192         10220         2724          3352          4016          8532

Standard errors in parentheses

Table 4 provides insight into differences among some household characteristics. The model from section 4.1 is estimated again, now for six different subsamples. These subsamples are based on whether households live in a rural area (Rural), ethnicity (Hispanic), race (White; white or not), car ownership (Car), obesity (BMI) and whether households indicate they think it is expensive to eat healthy (Healthycost). Looking at the distance estimates, it stands out that households in a rural area seem to be willing to travel further than the overall sample, as indicated by the mean estimate of -0.176. Hispanic households, on the other hand, are on average less willing to travel far than the overall sample, given their much more negative mean estimate for distance of -0.562. However, the high standard deviation of the random coefficients on the distance variable suggests that there is a lot of heterogeneity in preferences in this subsample. Although the mean estimate is more negative than in the overall sample, almost 10% of the households' coefficients are still higher than zero.⁵ This indicates that those households are not less likely to choose a particular store when the distance from home to that store increases.

Table 4: Divided among household characteristics

                      Rural         Hispanic      White         Car           BMI           Healthycost
distance              -0.176∗∗∗     -0.562∗∗∗     -0.328∗∗∗     -0.362∗∗∗     -0.315∗∗∗     -0.331∗∗∗
                      (0.0238)      (0.111)       (0.0326)      (0.0320)      (0.0332)      (0.0406)
Normal sd(distance)   0.122∗∗       0.432∗∗∗      0.249∗∗∗      0.278∗∗∗      0.239∗∗∗      0.212∗∗∗
                      (0.0379)      (0.131)       (0.0352)      (0.0339)      (0.0355)      (0.0424)
CO                    -2.668∗∗∗     -2.782∗∗∗     -2.995∗∗∗     -3.164∗∗∗     -2.925∗∗∗     -2.951∗∗∗
                      (0.162)       (0.216)       (0.112)       (0.110)       (0.115)       (0.146)
MLG                   -3.034∗∗∗     -2.952∗∗∗     -3.610∗∗∗     -3.932∗∗∗     -3.755∗∗∗     -3.341∗∗∗
                      (0.240)       (0.257)       (0.177)       (0.186)       (0.194)       (0.203)
SS                    0.274∗∗∗      0.468∗∗∗      0.102∗        0.143∗∗∗      0.198∗∗∗      0.236∗∗∗
                      (0.0651)      (0.0764)      (0.0400)      (0.0363)      (0.0405)      (0.0507)
SM                    (Base alternative)
N                     4656          3364          11980         14544         11456         7304

Standard errors in parentheses

∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

⁵ As stated before, this is the case when the coefficients are assumed to be normally distributed. The percentage is calculated by standardizing the estimated mean with the estimated standard deviation and using the cumulative standard normal distribution table.

4.4 Limitations

In this section, limitations of this research are addressed. The first limitation is the omission of data about prices in different food stores. Although prices are considered to be an important factor in determining store choice, no price data was available in the FoodAPS dataset. As mentioned before, endogeneity of prices is a problem that should be addressed if prices are included in further research.

Another limitation of this study is that only the choice of primary food store, as identified by the primary food shopper of each household, was considered. However, most households shop at multiple stores, and different household members may make purchases. The FoodAPS dataset does include information about other stores being visited, but it would be hard to incorporate this into the discrete choice analysis.

Lastly, some remarks can be made about the IIA assumption, which is relaxed in this research by using random coefficient logit instead of conditional logit. As mentioned in McFadden (1974), the IIA assumption is implausible when close substitutes exist in the alternative set, as illustrated by the bus-car example in section 2.1.1. McFadden then states that models implying the IIA assumption should only be applied in situations where alternatives can be assumed to be distinct and weighed independently by the decision-maker. It could be argued that the store types considered in this research are not really close substitutes for each other, so that the IIA assumption may not be much of a problem. However, estimating the random coefficient model to relax the IIA assumption does not pose a drawback in this research. The main problem of estimating a model like this is that it is computationally more complex, resulting in long estimation times. In this research, however, estimation times of the random coefficient logit model were relatively short, due to the small number of specified alternatives.
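The difference between the two models with respect to IIA can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the estimation in this research: the distances are hypothetical, only three store types are used, and the normal distribution of the distance coefficient is approximated by evenly spaced quantiles instead of simulation draws. With a fixed coefficient, the ratio of the superstore and supermarket probabilities is the same with and without the CO alternative; with a random coefficient, the ratio changes.

```python
import math
from statistics import NormalDist

def logit_probs(utilities):
    """Logit choice probabilities for a list of utilities."""
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def mixed_probs(consts, dists, mean_b, sd_b, n=200):
    """Average logit probabilities over evenly spaced normal quantiles."""
    coef_dist = NormalDist(mean_b, sd_b)
    betas = [coef_dist.inv_cdf((k + 0.5) / n) for k in range(n)]
    avg = [0.0] * len(consts)
    for b in betas:
        p = logit_probs([c + b * d for c, d in zip(consts, dists)])
        avg = [a + pk / n for a, pk in zip(avg, p)]
    return avg

consts = [0.0, 0.159, -3.072]   # SM, SS, CO constants from Table 1, column 1
dists = [1.0, 4.0, 0.5]         # hypothetical distances to SM, SS, CO

# Fixed coefficient (conditional logit): SS/SM ratio with and without CO
p3 = logit_probs([c - 0.381 * d for c, d in zip(consts, dists)])
p2 = logit_probs([c - 0.381 * d for c, d in zip(consts[:2], dists[:2])])
ratio_fixed_3, ratio_fixed_2 = p3[1] / p3[0], p2[1] / p2[0]

# Random coefficient: the same ratio shifts when CO is removed
m3 = mixed_probs(consts, dists, -0.381, 0.282)
m2 = mixed_probs(consts[:2], dists[:2], -0.381, 0.282)
ratio_mixed_3, ratio_mixed_2 = m3[1] / m3[0], m2[1] / m2[0]
```

The shift in the mixed ratio is small here because CO has a low choice probability, which is consistent with the remark above that IIA may not be a large problem for these alternatives.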

5 Conclusion

In this research, a discrete choice analysis is performed on the food store choices of U.S. households. Although a wide body of research has examined different factors influencing healthy food choices, not much research has been done on the factors determining food store choice. Therefore, the goal of this research was to analyze how individual and household characteristics of food shoppers interact with both the type of food store they choose and the distance from home to each store. This research built on previous research by using the large, nationally representative FoodAPS dataset to make results more generalizable. Additionally, individual data is used instead of aggregated data with expenditure shares as an empirical analogue for store choice. Moreover, a larger number of variables that could influence store choice is considered, and the random coefficient logit model is estimated to relax the Independence of Irrelevant Alternatives (IIA) property.

To replicate part of the research done by Taylor and Villas-Boas (2016), heterogeneity by SNAP participation and income group was analyzed first. The results show that the distance households are willing to travel to food stores does not vary much across income groups and participation in the Supplemental Nutrition Assistance Program (SNAP). Additionally, preferences for type of food store do not vary much across these groups. Superstores are most preferred, followed by supermarkets, combination grocery stores and grocery stores respectively. Only for the high income group is there no evidence that superstores are preferred to supermarkets, possibly because the generally lower prices in superstores are more beneficial for lower income groups. In general, the results of this research were very similar to the results presented by Taylor and Villas-Boas (2016). The second part of the results section shows that households owning a car, households living in a rural area, households indicating they shop at a particular store because of low prices, and households including members with obesity are willing to travel further for food stores. Also, households indicating they value closeness of a food store to home are not more likely to shop at superstores than at supermarkets. Moreover, households shopping at a particular store because of the quality of food actually prefer supermarkets to superstores. Lastly, white households do not prefer superstores over supermarkets as much as non-white households do.

The main limitation in this research that could be addressed in further research is the absence of data about prices in food stores. Furthermore, only the primary food store was taken into account in this analysis, which poses the question of whether shopping at multiple stores could be considered in further research.

To conclude, it is important for policies like the Healthy Food Financing Initiative (HFFI) to understand how people shop for food and what factors determine the choices they make. This research provides insights into how preferences for food stores and distance from home to food stores vary among income groups and different individual and household characteristics. The next step is to explore how these preferences can be leveraged to promote healthier eating.


References

[1] Ben-Akiva, M. E., & Bierlaire, M. (1999): "Discrete choice methods and their applications to short-term travel decisions," Handbook of Transportation Science, 5-33.

[2] Ben-Akiva, M. E., & Lerman, S. R. (1985): Discrete Choice Analysis: Theory and Application to Travel Demand, Cambridge, MA: MIT Press.

[3] Berry, S. (1994): "Estimating discrete-choice models of product differentiation," The RAND Journal of Economics, 25 (2), 242-262.

[4] Berry, S., & Haile, P. (2016): "Identification in differentiated products markets," Annual Review of Economics, 8, 27-52.

[5] Berry, S., Levinsohn, J., & Pakes, A. (1995): "Automobile Prices in Market Equilibrium," Econometrica, 63 (4), 841-890.

[6] Block, D., & Kouba, J. (2012): "A comparison of the availability and affordability of a market basket in two communities in the Chicago area," Public Health Nutrition, 9 (7), 837-845.

[7] Dagsvik, J. K. (1995): "Probabilistic choice models for uncertain outcomes," Statistics Norway, 141.

[8] Finkelstein, E. A., Trogdon, J. G., Cohen, J. W., & Dietz, W. (2009): "Annual medical spending attributable to obesity: payer- and service-specific estimates," Health Affairs, 28 (5), 822-831.

[9] Hillier, A., Smith, T., Cannuscio, C., Karpyn, A., & Glanz, K. (2015): "A Discrete Choice Approach to Modeling Food Store Access," Environment and Planning B: Planning and Design, 42 (2), 263-278.

[10] Krukowski, R. A., & McSweeney, J. (2012): "Qualitative study of influences on food store choice," Appetite, 59 (2), 510-516.

[11] Manski, C. (1977): "The structure of random utility models," Theory and Decision, 8 (3), 229-254.

[12] McFadden, D. (1974): "Conditional logit analysis of qualitative choice behavior," Frontiers in Econometrics, 105-142.

[13] Nevo, A. (2000): "A practitioner's guide to estimation of random-coefficients logit models of demand," Journal of Economics & Management Strategy, 9 (4), 513-548.

[14] Taylor, R., & Villas-Boas, S. B. (2016): "Food Store Choices of Poor Households: A Discrete Choice Analysis of the National Household Food Acquisition and Purchase Survey (FoodAPS)," American Journal of Agricultural Economics, 98 (2), 513-532.
