
FACULTY OF ECONOMICS AND BUSINESS

Šarlota Smutná

Food demand elasticity: dealing with zero observations

MASTER THESIS


Acknowledgements go to the Environment Center of Charles University in Prague for providing the data; a final thank-you goes to Dr. Milan Ščasný for enabling the analysis of the data.

I hereby declare that I have written this thesis using only the literature and other sources listed in the bibliography. Furthermore, I declare that I have not used this thesis to acquire another academic degree. Lastly, I declare that the individual entries from the household budget survey have been analysed in accordance with the Agreement on the data usage signed between the Environment Center of Charles University in Prague and the Czech Statistical Office.

Abstract

Elasticity is the most important tool of demand analysis. In order to obtain correct results, it is necessary to estimate the coefficients of the employed models consistently. When budget survey data are used for an analysis, the problem of observing a significant number of zeros often arises and has to be solved. A large variety of methods designed to deal with censored data have been developed over time. The goal of the thesis is to investigate the performance of several estimators under different conditions, such as violation of the normality assumption or different levels of censoring. These approaches are: the tobit model, the two-part model, the double-hurdle model, and the sample selection model. The sample selection model is estimated by three different techniques: maximum likelihood, the Heckman two-step procedure, and Cosslett's semi-parametric approach. Their performance is examined in a Monte Carlo experiment on generated data and in a real-data analysis of food demand. The data come from the Household Budget Survey 2012 conducted by the Czech Statistical Office on 3000 Czech households. The results of both parts are coherent and show that the best estimators for food demand are the OLS (as part of the two-part model) and the Heckman two-step procedure.

Contents

Abstract ii

List of Tables vi

1 Introduction 1

2 Literature review 3

2.1 Motivation . . . 3

2.2 Evolution of main limited dependent variable models in food demand . . 8

2.2.1 Notation . . . 8

2.2.2 Tobit model . . . 9

2.2.3 Decision in two steps . . . 11

2.2.4 Two-part model / double-hurdle model . . . 13

2.2.5 Sample selection model . . . 15

2.2.6 Comparison of two-part and selection models . . . 17

2.2.7 Semi-parametric methods . . . 19

2.2.8 Cossle’s semi-parametric approach . . . 20

2.2.9 Goal of the empirical part . . . 22

3 Monte Carlo experiment 24

3.1 Methodological remarks . . . 24

3.2 Data generating process and true models . . . 25

3.3 Design of the experiment . . . 27

3.4 Results . . . 30

3.4.1 True model equivalent to the double-hurdle model . . . 34

3.4.2 True model equivalent to the sample selection model . . . 36

3.5 Conclusion . . . 38

4 Real data part 40

4.1 Methodological remarks . . . 40

4.2 Description of data . . . 41

4.3 Model . . . 44

4.4 Results . . . 47

4.5 Conclusion . . . 53

5 Discussion of results 55

6 Conclusion 57

Bibliography 59

List of Tables

3.1 Characteristics of explanatory variables excluding constant term . . . 27

3.2 Different conditions to test performance of estimators . . . 28

3.3 Generating of different values of correlation between X_1 and X_2 . . . 29

3.4 Intervals of estimation of ρ = 0 by FIML across all set-ups . . . 32

3.5 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case c and level of censoring µ = 0.75 . . . 33

3.6 Results of MSE_c (first row) and MSE_β (second row) for different cases of collinearity and for ρ = 0 . . . 34

3.7 Estimates of constant terms in case of ρ = 0, collinearity of X_1 and X_2 - case c, and level of censoring µ = 0.75 . . . 35

3.8 Results of MSE_c (first row) and MSE_β (second row) for different collinearity of X_1 and X_2 - cases a, b, c, taking into account all non-zero values of ρ . . . 37

4.1 Expenditure categories structured according to COICOP . . . 42

4.2 Selected food items according to their levels of censoring . . . 42

4.3 Description of variables . . . 43

4.4 Demand for vegetables, level of censoring 0.045 . . . 47

4.5 Demand for low-fat milk, level of censoring 0.089 . . . 48

4.6 Demand for pork, level of censoring 0.131 . . . 49

4.7 Demand for rice, level of censoring 0.295 . . . 50

4.8 Demand for beef, level of censoring 0.458 . . . 51

4.9 Demand for pears, level of censoring 0.764 . . . 52

4.10 Estimates of elasticities . . . 53

4.11 Elasticities estimated in Brosig (1998) . . . 53

A.1 Estimates of correlation ρ of u and v produced by FIML for case a of collinearity and for different levels of censoring . . . I

A.2 Estimates of correlation ρ of u and v produced by FIML for case b of collinearity and for different levels of censoring . . . II

A.3 Estimates of correlation ρ of u and v produced by FIML for case c of collinearity and for different levels of censoring . . . III

A.4 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case a and censoring level µ = 0.05 . . . IV

A.5 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case a and censoring level µ = 0.10 . . . V

A.6 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case a and censoring level µ = 0.15 . . . VI

A.7 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case a and censoring level µ = 0.30 . . . VII

A.8 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case a and censoring level µ = 0.50 . . . VIII

A.9 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case a and censoring level µ = 0.75 . . . IX

A.10 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case b and censoring level µ = 0.05 . . . X

A.11 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case b and censoring level µ = 0.10 . . . XI

A.12 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case b and censoring level µ = 0.15 . . . XII

A.13 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case b and censoring level µ = 0.30 . . . XIII

A.14 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case b and censoring level µ = 0.50 . . . XIV

A.15 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case b and censoring level µ = 0.75 . . . XV

A.16 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case c and censoring level µ = 0.05 . . . XVI

A.17 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case c and censoring level µ = 0.10 . . . XVII

A.18 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case c and censoring level µ = 0.15 . . . XVIII

A.19 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case c and censoring level µ = 0.30 . . . XIX

A.20 Results of MSE_c (first row) and MSE_β (second row) for collinearity of X_1 and X_2 - case c and censoring level µ = 0.50 . . . XX

A.21 Results of MSEc(first row) and MSEβ (second row) for collinearity of X1

(8)

Introduction

Food is one of the most basic needs in human life. Therefore, the demand for food has been a widely studied topic in applied econometrics since the first half of the 20th century, the very beginning of this science. The main point of interest of such studies is food in general, food divided into subcategories (fruit, dairy products, or meat products, for instance), or specifically targeted items (for example, different types of meat). The most useful tool to interpret the demand, i.e., the relationship of consumers towards goods given by an interaction of price and quantity, is an elasticity. In particular, we distinguish the income elasticity, the own-price elasticity, and cross-price elasticities.

In order to estimate these elasticities, authors usually use models (discussed later) where the quantities of, or expenditures on, goods are taken in logarithm. A problem can arise for single food items or for specific food categories when the observed quantity or expenditures are equal to zero, i.e., for items that are not consumed at a given point in time. Hence, authors should handle the data carefully: the frequent zero observations already cause the data to be censored, and moreover these observations cannot be used in a model because of the logarithms. To be more specific, the data is left-censored at zero. The treatment depends on the characteristics of the censoring mechanism (the household does or does not consume the food item) and on its relation to the outcome (the quantity of the good or the expenditures on it). Different approaches have been developed over time, and one part of this thesis aims to briefly summarize the main concepts. Typically, for food demand, the Heckman two-step procedure assuming normally distributed errors is proposed as a treatment of the given problem. The principal goal of this thesis is to examine the quality of the Heckman estimator when the assumption of normality is violated, with respect to different levels of censoring, different levels of selectivity, and different levels of collinearity between explanatory variables. Besides this, the thesis examines the behaviour of other methods such as the two-part model, the basic sample selection model estimated by Maximum Likelihood (MLE) instead of the Heckman two-step procedure, an application of the Ordinary Least Squares estimator, and Cosslett's semi-parametric approach for sample selection models. In order to examine the performance of the different techniques under different conditions, Monte Carlo simulations (MC) are used on generated data. As a result, this thesis provides information about the effect of the degree of censoring and selectivity on the estimates of elasticities, and a comparison between the different approaches. In contrast to previous work on this topic, the MC experiments are extended in terms of different model conditions, more approaches are considered at the same time, and the data generating process attempts to simulate food demand data.

Next, building on the results from the MC experiment, the real data of a household budget survey are used to estimate demand elasticities. The source of the data is the Czech Statistical Office, but the data itself is provided by the Environment Center of Charles University in Prague. The budget survey data consist of approximately 3000 Czech households and include expenditures on a large variety of food as well as socio-demographic information. Specifically, the demand elasticities of vegetables, low-fat milk, pork, beef, rice, and pears are estimated. The choice of these items makes it convenient to compare the real data results with the MC results at approximately the same levels of censoring.

The remainder of the thesis is structured as follows. The next chapter introduces the motivation for studying the different techniques used in the literature for dealing with censored data and then describes them. In Chapter 3, the Monte Carlo analysis is carried out; the data generating process follows the common features of variables used in the literature. Chapter 4 presents the real data and the estimation of elasticities from it. Chapter 5 discusses the results, and Chapter 6 concludes the thesis.


Literature review

2.1 Motivation

Elasticities are the most common and important outcomes of demand studies in general, and hence of food demand studies as well. They are a simple tool to express the relationship of a consumer towards a good with respect to the different parameters which determine the purchase decision. By definition, an elasticity is the percentage change of one variable caused by a one-percent change of another variable. Thus, it measures the sensitivity of the reaction to such a change. In the case of demand analysis, the income, own-price, and cross-price elasticities can be distinguished: the change of the quantity demanded, or of the expenditures on a good, is related to the change in income, in the price of the given good, and in the prices of other goods, respectively. The usefulness follows from the broad variety of the elasticity's applications. To name a few: evaluating tax impacts on both consumers' purchases and government revenues, giving important information to producers about their (potential) customers and about the quantity of their product demanded on the market, evaluating the effect of advertising, or examining the nutrient composition of consumers' diets.

The idea behind taxation (take the example of indirect taxes such as the value-added tax) is that restructuring the tax system influences prices in the first place. Using appropriate elasticities, it can be estimated what impact the demand will face; using the changes in the quantity demanded (purchased), it can be determined how the direct revenues from taxing the goods will change and how the revenues from taxing producers will change with the change in their earnings. The whole problem is more complicated, as the behaviour of producers must also be taken into account. However, consumer demand elasticities remain an important part of such an analysis.

Examining the diet, i.e., the nutritional composition of consumed food, can mainly serve for evaluating food policy issues and programs. Recently, demand has been widely studied in developing countries, where, moreover, the calories and nutrients received are calculated. Researchers observe the development of countries through food, asking whether inhabitants are able to purchase enough food to feed themselves sufficiently. From another point of view, there exist many food programs also in developed countries, such as the Food Stamp Program in the United States. Those are usually targeted at low-income or no-income people. The income elasticities make it possible to evaluate what diet is available to people with different incomes. Huang and Lin (2000) develop in their paper nutrient income elasticities, “the percentage change in nutrient availability with respect to changes in household food expenditure which are useful in evaluating the effect of the food stamp benefit on nutrient availability.” The diet is also examined in order to improve public health. For example, Gould (1996) studied fluid milk demand and how it varies with fat content. Milk was at that time the second most important source of fat in the American diet. Such a study “may assist policy-makers in targeting health information to specific subgroups to more effectively achieve the goal of reducing dietary fat intake” (Gould, 1996), as it provides information on the characteristics of inhabitants related to the consumption of a given type of milk and on how they react to changes in price (i.e., when the price of whole milk increases, the quantity of reduced-fat milk can increase, which would be, in principle, the goal of policy-makers).

Data used for demand analyses are basically of three types: time series, scanner data, and budget surveys. Firstly, time series are usually aggregated data at the macroeconomic level. They report the total consumption or expenditures over time based on averaging or aggregating over consumers. As this data provides only overall information, the dependence on socio-demographic characteristics cannot be examined; hence, it is not possible to obtain information about specific types of consumers. According to Huang and Lin (2000), most of the available demand elasticities come from this type of data. Secondly, scanner data is a relatively new type of data source; studies based on scanner data have been available since the 1980s. The data provides information on prices and quantities sold in supermarkets thanks to electronic systems recording all transactions. Such data can be very useful for determining the spatial allocation of food demand, the effect of advertising, or distinguishing the brand demand for a particular food item. The own- and cross-price elasticities can also be estimated. Scanner data does not report information on the socio-demographic characteristics of consumers, but it is sometimes linked to regional characteristics of where the store is located, or simply to the inhabitants in the neighbourhood of the store. Some of the first demand studies on scanner data are presented in Nayga (1992). Lastly, the budget surveys, conducted mainly for households, are rich in socio-demographic characteristics. This is the only type of data which is able to measure the effect of those variables. On the other hand, it includes just a part of the population, which does not have to be representative, and researchers have to be careful when making general conclusions. The prices of goods are not commonly reported; they can be imputed from the quantity of and expenditures on the good. When they are calculated upon the information from households, they consequently vary for every household, and this fact must be taken into account in the analysis. To give an idea about the frequency of using the different types of data, Andreyeva et al. (2010) reviewed all studies about American food demand conducted during 1938-2007 for their dietary research. They found that time series are the most frequently used data: 62 % of published papers work with this type of data, 21 % of papers employ budget survey data, and 17 % scanner data. Scanner data and budget surveys are very important sources of data that are nowadays analysed more often. As Heien and Wessels (1990) stated in their paper: “Data based on these household-expenditure surveys present a major estimation problem. For any given household, many of the goods have zero consumption, implying a censored dependent variable. Techniques which do not take this censored dependent variable into account will yield biased results.” The broad use of budget data, and the fact that it is the only type of data with a chance of observing zeros, are the reasons why this thesis focuses on budget survey data only.

The occurrence of zeros has a variety of explanations. First, it is a question whether the consumer participates in the market or not. Not participating means either that the consumer is not interested in the good, the good representing zero utility, or that the consumer even derives negative utility from such a good (meat for a vegetarian, for instance); hence, he will not buy the good even at a zero price. However, the consumer can participate in the market and still report zero consumption at the same time. The zero quantity of a good can be the corner solution of the consumer's utility maximization problem: for example, substitutes are easily available or cheaper, or the good itself is too expensive, so it is not purchased at the given time. Thus, costs play their role here. Another cause is the time length of the survey: the good can be purchased, but not frequently, so a zero can occur for a given period of time. In the literature, this is called the infrequency of purchase. In contrast, for data reported monthly, quarterly, or yearly, there is a low chance of such zeros for food items which are perishable. The last reason which has to be taken into account is misreporting.

Over time, the preferences or the prices can change, and so can consumption. Gould (1996) gives an example: “With a drop in a commodity's price, current consumers of a normal good have an incentive to increase their consumption. This situation represents an intensive response which has typically been analysed with regression-based methodologies. For persons who are not current consumers of the commodity, a price reduction may even induce them to enter the market and purchase the commodity, an extensive response.” The problem is that, from the data, we are usually not able to distinguish between the different types of zeros, even though those different types can be generated by different stochastic mechanisms. This should be reflected in the analysis; an example of an attempt to incorporate the idea of infrequency of purchase is given in the paper of Blundell and Meghir (1987). The situation is different, for example, in contingent valuation. Those studies try to investigate the willingness to pay (WTP) for non-market goods, usually dealing with environmental issues such as the WTP for a limitation of traffic in the consumer's neighbourhood. Data is usually obtained by interview methods, where the researchers try to reveal the origin of the zeros by additional questions. Only the appropriate zeros are later considered, and the WTP can be estimated, for example, by spike models such as in Kriström (1997) or Reiser and Shechter (1999).

As Urban et al. (2006) point out: “zero treatment has a long tradition in economic literature, particularly in the field of welfare measurement and demand system modelling.” The history of dealing with zero observations, i.e., with a censored dependent variable, is rich in general, but especially so in food demand. Many approaches, and modifications of selectivity models, have been developed over time.

Prior to the brief summary of the evolution of the main approaches, some of which we describe in detail, it is of particular interest to distinguish two types of models used to analyse food demand: the single equation framework and the more complex framework of demand systems. With a single equation, researchers investigate a single food commodity; its connection to other commodities can be examined by adding appropriate independent variables. The general and basic form of the single equation model can be written in matrix form as follows:

ln q = Xβ + ϵ, (2.1)

where q is the demanded quantity of the good of interest, X consists of k explanatory variables, among them income, the price of the investigated good, the prices of possibly related goods, and socio-demographic information, and ϵ are disturbances. The quantity, income, and prices are commonly taken in natural logarithms in order to construct the log-log model. The elasticities are then estimated directly by the corresponding beta coefficients.
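
To make the mechanics concrete, here is a minimal sketch of estimating a log-log single-equation model of the form (2.1) by OLS (Python; all data and variable names are invented for illustration). The coefficient on the logged price is read directly as the own-price elasticity. Note that this sketch ignores the zero-observation problem that the rest of the chapter deals with.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: quantity, price and income for n households,
# all strictly positive so that logarithms are defined.
rng = np.random.default_rng(0)
n = 500
income = rng.lognormal(10.0, 0.5, n)
price = rng.lognormal(1.0, 0.2, n)
quantity = np.exp(2.0 + 0.6 * np.log(income) - 0.9 * np.log(price)
                  + rng.normal(0, 0.3, n))

# Log-log model: ln q = b0 + b1 ln(income) + b2 ln(price) + eps
X = sm.add_constant(np.column_stack([np.log(income), np.log(price)]))
ols = sm.OLS(np.log(quantity), X).fit()
print(ols.params)  # b1 ~ income elasticity, b2 ~ own-price elasticity
```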

In contrast, the demand systems estimate a system of equations for food items or food subcategories which are related to each other, and they are generally more demanding. Stone (1954), with his Linear Expenditure System (LES), is considered to be the first author who derived an equation system built on consumer preferences. Since that time, many modifications and approaches have been developed, and among them the most important is the Almost Ideal Demand System (AIDS) model derived by Deaton and Muellbauer (1980). Nowadays it is a very common and frequently used model, estimated by different methods. To give an idea, the basic AIDS model is presented:

w_i = α_i + ∑_j γ_ij ln p_j + β_i ln(x/P),   (2.2)

where w_i is the share of good i in total expenditures (expenditures on the goods within the system investigated), p are the prices of the goods included in the system, x are the total expenditures, and P is a price index defined as

ln P = α_0 + ∑_k α_k ln p_k + (1/2) ∑_{j,k} γ_kj ln p_k ln p_j.

The α's, β's, and γ's are coefficients to be estimated. The authors propose estimation by Maximum Likelihood Estimation (MLE). The socio-demographic variables d_h can be included in the model as α_i = α_{0,i} + ∑_h α_{h,i} d_h. The coefficients have to satisfy additional constraints in order to obtain estimates in accordance with the PIGLOG (price-independent generalized logarithmic) preferences on which the system is built. More convenient and practical modifications of the AIDS model have been developed over time, such as LA/AIDS (a linear version of the model) or QUAIDS (a quadratic version of the model allowing for quadratic Engel curves). In these models, the elasticities are calculated through formulas composed of the coefficient estimates. The calculation of elasticities is discussed more extensively in the work of Green and Alston (1990, 1991).
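
As a small numerical illustration of equation (2.2), the following sketch (Python; the parameter values are invented and chosen to satisfy the adding-up restrictions ∑α_i = 1, ∑β_i = 0, ∑_i γ_ij = 0) computes the price index ln P and the budget shares for a three-good system:

```python
import numpy as np

# Invented AIDS parameters for a 3-good system (i = 0, 1, 2).
alpha0 = 0.1
alpha = np.array([0.4, 0.35, 0.25])
beta = np.array([0.05, -0.02, -0.03])
gamma = np.array([[ 0.10, -0.06, -0.04],
                  [-0.06,  0.09, -0.03],
                  [-0.04, -0.03,  0.07]])   # symmetric, rows/columns sum to 0

ln_p = np.log(np.array([1.2, 0.9, 1.1]))   # prices
x = 100.0                                   # total expenditures

# Price index: ln P = a0 + sum_k a_k ln p_k + 0.5 sum_jk g_kj ln p_k ln p_j
ln_P = alpha0 + alpha @ ln_p + 0.5 * ln_p @ gamma @ ln_p

# Budget shares from equation (2.2): w_i = a_i + sum_j g_ij ln p_j + b_i ln(x/P)
w = alpha + gamma @ ln_p + beta * (np.log(x) - ln_P)
print(w, w.sum())   # adding-up: shares sum to one under the restrictions
```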

What we wanted to show is that, for both frameworks, the zero observations cause non-negligible problems. In both cases, the data is censored. The dependent variable is limited, as a noticeable part of the observations can be equal to zero while the rest of the observations are continuous over strictly positive values. Moreover, in the AIDS model the w_i is also bounded by one. To be more precise, the AIDS model is estimated through summation over households for every food item i of the system.¹ A zero share would not be likely for the average over households, but the estimate of equation (2.2) is calculated for every household first; hence, the zero shares truly need to be dealt with. Thus, in the case of the AIDS demand system we face censored data as well. In the case of the single equation model we can face both censored and truncated data if the log-log model is used. We cannot employ the zero quantities, as ln(0) is not defined. Either we can delete them, in which case the whole observation is lost, i.e., the information on both sides, the dependent and the independent variables; the data is then truncated. Or we can assign to those observations some minimal positive value, which implies that the observation is either equal to that value or to a lower one; the data is then censored and the information in the independent variables is preserved.
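
In code, the two treatments differ only in whether the zero observations are kept. A minimal sketch (Python; the threshold L is an arbitrary choice of the analyst):

```python
import numpy as np

q = np.array([0.0, 2.5, 0.0, 1.2, 4.0])   # observed quantities, with zeros
L = 0.01                                   # arbitrary small positive threshold

# Truncation: drop zero observations entirely (their covariates are lost too).
ln_q_trunc = np.log(q[q > 0])

# Censoring: replace zeros by the threshold L, so that ln(q) is defined
# and the covariates of the censored households stay in the sample.
ln_q_cens = np.log(np.where(q > 0, q, L))
```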


For both frameworks, as well as for both truncation and censoring of the data, a variety of models has been developed. Concerning truncation versus censoring, the reasoning behind the models is basically the same; the model itself is usually only slightly modified. This thesis pays more attention to techniques that are applied to single equations. Typically, similar but extended techniques are used for demand systems, as for example in the paper of Heien and Wessels (1990).

2.2 Evolution of main limited dependent variable models in food demand

2.2.1 Notation

In this section, the general notation is introduced in order to keep it the same throughout the rest of the chapter and make it clear for readers. Following the notation of Cameron and Trivedi (2005), let us say we have a food item A represented by:

• y∗: the latent (not fully observed) dependent variable; the (logarithm of) quantity of A purchased,

• y: the observed variable, available for use; the (logarithm of) quantity of A purchased,

• L: the threshold of censoring.

Supposing that

y∗ = Xβ + ϵ,

where X is the (i × k) matrix of k variables and i observations and ϵ is the vector of disturbances with E[ϵ] = 0, we are interested in consistent estimates of β, the vector of k coefficients, from the censored or truncated data y, since

E[y∗|X] = Xβ,   ∂E[y∗|X]/∂x_k = β_k.

The β_k measures the effect of the variable x_k in the whole population. We could also be interested in ∂E[y|X]/∂x_k, the effect the independent variable has on the limited dependent one, for example, the effect the price of A has on current consumers of A. However, we are rather interested in the coefficients for the whole population. Moreover, keeping both variables in logarithmic form (y∗, x_k), β_k is the investigated elasticity.

We will use ϕ to denote the probability density function f of the standard normal distribution and Φ the cumulative distribution function F of the standard normal distribution.


2.2.2 Tobit model

The basic model dealing with censored data is the tobit model proposed by Tobin (1958). The interest of his work is to gain more information from limited dependent variables. Initially, the probit model was used for these variables, which explains only the switching between zeros and ones. Zeros and ones do not necessarily represent the values of the variable; usually they play the role of a dummy variable which can represent, for example, participating or not in the market. To use the full information, he developed his own model. The motivation for his work is the reporting of zeros in household surveys of automobiles and durable goods. He assumes that “the explanatory variables influence both, the probability of limit response and size of non-limit response.” It means that the same stochastic mechanism determines the so-called participation equation (whether the household consumes A or not) and the outcome equation (the quantity of A purchased). Hence, the same covariates should theoretically influence both the probit model and the modelling of the quantity only, for example, by the tobit model. Blundell and Meghir (1987) add that: “In the Tobit specification, all zero values taken by the dependent variable would correspond to a corner solution.” Following our notation and the derivation of the model in Tobin (1958) and Cameron and Trivedi (2005), and using the value of 0 as a threshold, the proposed model is then:

y∗ = Xβ + ϵ
y = y∗ if y∗ > 0, and y = 0 if y∗ ≤ 0.

Posing a distributional assumption on the error terms, ϵ ∼ N[0, σ²], the density f of the censored y is then derived as a mix of the continuous density of y∗ itself, which is the density of the normal distribution N[Xβ, σ²], and the probability of the zero observations:

Pr[y∗ ≤ 0] = Pr[Xβ + ϵ ≤ 0] = Φ(−Xβ/σ) = 1 − Φ(Xβ/σ),   (2.3)

so that the final density is:

f(y) = ( (1/√(2πσ²)) e^(−(y−Xβ)²/(2σ²)) )^d · ( 1 − Φ(Xβ/σ) )^(1−d),   (2.4)

with the binary indicator d defined as d = 1 when y∗ > 0 and d = 0 otherwise. The log-likelihood function is:

ln L_N = ∑_+ ( −(1/2) ln(2π) − (1/2) ln σ² − (1/(2σ²)) (y_i − x_i′β)² ) + ∑_0 ln( 1 − Φ(x_i′β/σ) ),   (2.5)

where ∑_+ is the summation over the positive observations of y∗ and ∑_0 is the summation over the zero observations. Equivalently, it can be rewritten as

ln L_N = ∑_i d_i · ( −(1/2) ln(2π) − (1/2) ln σ² − (1/(2σ²)) (y_i − x_i′β)² ) + (1 − d_i) · ln( 1 − Φ(x_i′β/σ) ).   (2.6)

This model was originally described for censored data only; the zero observations are included in the log-likelihood function. Amemiya (1973) provides a proof of the consistency and asymptotic normality of the proposed estimator and extends the theory behind the model. In the conclusion of the paper, he gives the form of the log-likelihood function for truncated data:

ln L_N = ∑_i ( −(1/2) ln(2π) − (1/2) ln σ² − (1/(2σ²)) (y_i − x_i′β)² − ln Φ(x_i′β/σ) ).   (2.7)
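
The censored log-likelihood (2.6) can be maximized numerically without specialized software. A minimal sketch (Python with numpy/scipy; the data are invented, and parametrizing by log σ to keep σ positive is our convenience choice, not part of the original derivation):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(theta, y, X):
    """Negative censored-tobit log-likelihood, eq. (2.6)."""
    beta, sigma = theta[:-1], np.exp(theta[-1])   # log-sigma keeps sigma > 0
    xb = X @ beta
    d = y > 0
    ll_pos = (-0.5 * np.log(2 * np.pi) - np.log(sigma)
              - (y - xb) ** 2 / (2 * sigma ** 2))
    ll_zero = np.log(np.clip(norm.cdf(-xb / sigma), 1e-300, 1.0))
    return -(ll_pos[d].sum() + ll_zero[~d].sum())

# Invented censored data for demonstration.
rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.maximum(X @ np.array([0.5, 1.0]) + rng.normal(size=n), 0.0)

res = minimize(tobit_negloglik, np.zeros(3), args=(y, X), method="BFGS")
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
```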

Greene (1981) derives the bias of the Ordinary Least Squares (OLS) estimation technique and further proposes how to correct the OLS results by a simple formula in order to obtain consistent results. Compared to the MLE, his procedure gives similar results according to the experiment in that paper, but it is still derived on the basis of the assumption of multivariate normally distributed X and y∗. Flood (1985) compares the corrected OLS and MLE estimates and finds that the MLE technique performs better. Moreover, he studies the robustness of the two estimators to violations of the normality assumption: he mixes two normal distributions and concludes that the corrected OLS is more robust than the MLE, but that under normality the MLE performs better. Powell (1986) suggests an alternative way of estimation, the trimmed least squares. It hinges on the assumption of a symmetric distribution of y∗: the censoring cuts one of the tails of the distribution, and the estimation process cuts the second tail. As in the previous case, an assumption on the distribution is imposed, although the method is already classified as semi-parametric.

The MLE is not robust to heteroskedasticity or to distributional assumptions in general. Arabmazar and Schmidt (1982) study the biasedness of the probit model and of the censored and truncated tobit models. Instead of the normal distribution, they use the Laplace, logistic, and t-distribution. The conclusions of the paper are: the results for each distribution are similar to each other; a lower percentage of censoring leads to almost no bias, with “virtually no bias” for 25 % censoring; the censored tobit is usually less biased than the truncated one; and when the variance of the errors' distribution is unknown, models not controlling for selectivity can be better than those correcting for selectivity under wrong distributional assumptions. The important fact this study has shown for food analysis is that although the data are available for every household and we cannot use zeros in a log-log model, it is better to give the zeros a threshold value larger than zero in order to use these observations (censored data) than to delete them (truncated data).

As we are interested mainly in consistent estimates of the β coefficients, the formulas and derivations of the standard errors and of the marginal effects for y are not included. The marginal effects of an independent variable on the truncated or censored population are derived in McDonald and Moffitt (1980).

Amemiya (1984) provides a good summary of the tobit model, the possibilities for its estimation, and its applications. In his terminology, the tobit model described above is the one called the standard tobit, or the tobit of type I. The paper lists studies where the tobit model was used; mainly, these are studies of the labour market (analysing hours worked) and studies of durable goods. Some other papers are also mentioned in McDonald and Moffitt (1980).

Amemiya (1974) goes further and develops an extension of the tobit model to a system of equations. The so-called Tobin-Amemiya approach was adapted and slightly modified, for example, in the paper of Dong et al. (2004) to estimate food demand in Mexico. Later on, Nelson and Olson (1978) and Amemiya (1979) developed a tobit for a simultaneous system of equations.

To sum up, the tobit model is usable for both truncated and censored data. It assumes that the processes of participation and outcome are determined by the same stochastic process. It means that the probit and tobit models would, in such a case, use the same covariates with the same coefficients. The usual estimator is the MLE. The problem is its normality assumption; if this is violated, the estimates of the coefficients will be biased. Regarding the aim of this thesis, the estimation of elasticities, the interest is in obtaining consistent coefficients.

2.2.3 Decision in two steps

The tobit model's assumption that the decision to purchase a good (participation) and the quantity purchased (outcome) are specified by the same stochastic process seems to be unrealistic. The same variable is thus supposed to influence both the participation and the outcome in the same way (sign and magnitude). This is very unlikely in the case of food demand. The same variables which determine the participation are not likely to influence the quantity when, for example, the food item is not consumed because of personal convictions (vegetarianism), religious convictions (Judaism, Hinduism), or dietary restrictions (diabetes).


The idea of a decision in two steps was developed; different variables may influence the participation and the outcome decision. The basic idea of the extension to two equations can be illustrated as:

y∗_1 = X_1β_1 + u
y∗_2 = X_2β_2 + v
y_2 = y∗_2 if y∗_1 > 0, and y_2 = 0 if y∗_1 ≤ 0.

The first equation is called the participation equation; the variables in X_1 determine the decision to purchase. The variable y∗_1 is usually related to a dummy variable, which in most cases is completely observed. The second equation is called the outcome equation, and its explanatory variables determine the outcome's magnitude. For the main goal of the analysis, y_2 is observed and is the quantity of purchased goods, possibly expressed in logarithm.

The distinguishing property of the estimation approaches for such models is the assumption about the correlation of the disturbances u and v. In the literature, there is a terminology problem in how authors set up and name their models. Assuming independence of the errors, they estimate the same or different models under different names such as the two-part model, the tobit model of type II with independence, or the double-hurdle model with independence. Furthermore, authors are interested in the problem of correlated errors in the two-equation model shown above; thus, they employ identically estimated models under different names such as the bivariate selection model, the probit selection model, the tobit model of type II, or the Heckman model. The variables included in X_1 and X_2 do not have to be two disjoint sets; on the contrary, when the two sets have at least one variable in common, the identification in the MLE is justified.

To illustrate the problem of the tobit model, the paper of Haines et al. (1988) can serve as a good example. The authors compare the tobit coefficient estimates with those obtained from one of the two-part models. The “probit” estimation designates the coefficients of the participation equation, and the “truncated” estimation represents the outcome equation. In cases where the tobit model is not appropriate, its coefficients are similar either to the probit ones or to the truncated ones, or they are different from both. We can clearly see in the paper that sometimes, for example, the estimates produced by the tobit model almost copy those of the probit model while the outcome coefficients are slightly or distinctly different. The information given by the tobit model is then at least imprecise.


2.2.4 Two-part model / double-hurdle model

The term two-part arises from the fact that we estimate the model in two phases: the participation and the outcome equation. The double-hurdle term comes from the fact that the variable to be observed has to overcome two hurdles: firstly, the household has to participate in the market, and secondly, a positive amount has to be purchased at the given time.

Cragg (1971) is considered the first author to propose the two-part model approach. He presents five different models together with the tobit model, examines the properties of their estimates by a sampling experiment, and examines their performance by using real data to estimate households' demand for durable goods. The double-hurdle model often used in the literature is attributed to Cragg's model given by equations (7) and (9) in his original paper. In this text, we follow the model notation of Jones (1989) because of its clarity:

participation equation: y∗_1 = X_1β_1 + u
outcome equation: y∗_2 = X_2β_2 + v
observation: y_2 = X_2β_2 + v if y∗_1 > 0 ∧ y∗_2 > 0, and y_2 = 0 otherwise.

The model allows for zero quantities also in the outcome equation, regardless of the result of the participation equation; hence, it permits zeros caused by infrequency of purchase. Assuming no correlation between u ∼ N[0, σ²_u] and v ∼ N[0, σ²_v], the likelihood function is:

L_N = ∏_0 ( 1 − Φ(X_1β_1/σ_u) Φ(X_2β_2/σ_v) ) ∏_+ Φ(X_1β_1/σ_u) Φ(X_2β_2/σ_v) · (1/σ_v) ϕ( (y_2 − X_2β_2)/σ_v ) Φ(X_2β_2/σ_v)^(−1).   (2.8)

Concerning the aim of this thesis, the estimation of elasticities, let the outcome equation have the same log-log form as in (2.1); the interest is then in obtaining unbiased estimates of the β_2 coefficients.
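
Under the independence and normality assumptions, the likelihood (2.8) can be coded directly; note that the Φ(X_2β_2/σ_v) terms for the positive observations cancel. A minimal sketch of the negative log-likelihood (Python; the variable names and the σ_u = 1 normalization in the first hurdle are our choices for the sketch):

```python
import numpy as np
from scipy.stats import norm

def double_hurdle_negloglik(theta, y, X1, X2):
    """Negative log-likelihood of the independent double-hurdle model (2.8).

    theta stacks beta1 (length k1), beta2 (length k2) and log(sigma_v);
    sigma_u is normalized to one.
    """
    k1 = X1.shape[1]
    b1, b2, sigma = theta[:k1], theta[k1:-1], np.exp(theta[-1])
    p1 = norm.cdf(X1 @ b1)                    # first hurdle: participation
    p2 = norm.cdf(X2 @ b2 / sigma)            # second hurdle: positive amount
    dens = norm.pdf((y - X2 @ b2) / sigma) / sigma
    pos = y > 0
    ll_zero = np.log(np.clip(1.0 - p1 * p2, 1e-300, 1.0))
    ll_pos = np.log(np.clip(p1 * dens, 1e-300, None))   # Phi(X2 b2 / sigma) cancels
    return -(ll_zero[~pos].sum() + ll_pos[pos].sum())
```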

This classical double-hurdle model relies heavily on the assumptions of normality and homoskedasticity, as is usual for the MLE, and on zero correlation between the error terms of the two equations. Hence, for example, Newman et al. (2001), in an analysis of the demand for prepared meals, and Yen and Huang (1996), in an analysis of the demand for finfish, introduce into this model an inverse hyperbolic sine transformation of the observed dependent variable in order to relax the distributional assumption, together with a simple tool to overcome heteroskedasticity, in this specific case through σ_i = exp(Z′γ). An example of a different popular transformation is the Box-Cox double-hurdle model used in Jones and Yen (2000) to analyse beef consumption, which also relaxes the distributional assumption on the errors.

As the two-part model, the literature designates the model where the errors are independent and, moreover, the outcome equation does not allow for zeros; zeros can arise only through the participation equation. Thus, there are no zeros produced by the consumption equation itself:

y∗_1 = X_1β_1 + u   (2.9)

y_2 | y∗_1 > 0 = X_2β_2 + v.   (2.10)

It can be conveniently rewritten for our purpose of food demand:

ln(y_2 | y∗_1 > 0) = X_2β_2 + v.   (2.11)

Following Jones and Yen (2000), the likelihood simplifies to:

L_N = ∏_0 Φ(−X_1β_1) ∏_+ (1/σ_v) ϕ( (ln(y_2) − X_2β_2)/σ_v ) Φ(X_1β_1),   (2.12)

where u ∼ N[0, 1] and v | y∗_1 > 0 ∼ N[0, σ²_v]. As y_2 is directly defined conditionally, the estimation simplifies from the MLE to two separate estimations: the probit estimates the participation equation (2.9), and the OLS regresses ln(y_2) on X_2 using only the positive observations, due to the conditioning.

Regarding elasticities, the aim is to determine E[ln(y_2)] without conditioning on y∗_1, which is in this case equal to

E[ln(y_2)] = P[y∗_1 > 0] · E[ln(y_2) | y∗_1 > 0] = Φ(X_1β_1) X_2β_2,   (2.13)

hence

∂E[ln(y_2)]/∂x_k = Φ(X_1β_1) β_{2,k} for x_k ∈ X_2, x_k ∉ X_1,   (2.14)

or

∂E[ln(y_2)]/∂x_k = ϕ(X_1β_1) β_{1,k} X_2β_2 + Φ(X_1β_1) β_{2,k} for x_k ∈ X_1 ∩ X_2.   (2.15)
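
The two-part estimator thus reduces to two standard routines. A minimal sketch (Python with statsmodels; the data are invented), including the unconditional effect (2.15) for a regressor appearing in both equations, evaluated at the sample means:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# Invented data: participation driven by X1, positive amounts by X2.
rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=n)                      # e.g. ln(income), used in both parts
X1 = sm.add_constant(z)
X2 = sm.add_constant(z)
d = (X1 @ np.array([0.3, 0.8]) + rng.normal(size=n) > 0).astype(int)
ln_y = X2 @ np.array([1.0, 0.6]) + rng.normal(0, 0.5, n)
y = np.where(d == 1, np.exp(ln_y), 0.0)

# Part 1: probit for the participation equation (2.9).
probit = sm.Probit(d, X1).fit(disp=0)
# Part 2: OLS of ln(y) on X2 over the positive observations only (2.11).
pos = y > 0
ols = sm.OLS(np.log(y[pos]), X2[pos]).fit()

# Unconditional effect (2.15) for a regressor in both X1 and X2,
# evaluated at the sample means of the regressors.
x1m, x2m = X1.mean(axis=0), X2.mean(axis=0)
idx = x1m @ probit.params
effect = (norm.pdf(idx) * probit.params[1] * (x2m @ ols.params)
          + norm.cdf(idx) * ols.params[1])
```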


2.2.5 Sample selection model

The selection models have their name due to the fact that the sample is not random; the selection (rule) is given by the participation equation. As already mentioned, the errors of the participation and outcome equations are very likely to be correlated, and such a model then needs to be treated differently. Puhani (2000) affirms that the selection model appears often, among other places, in texts estimating consumer demand. The basic idea of the bivariate sample selection model (following Cameron and Trivedi (2005)), the tobit model of type II, the probit selection model (following Amemiya (1984)), or the simple selection model (in a vast literature) is the already known structure:

participation equation: y∗_1 = X_1β_1 + u
outcome equation: y∗_2 = X_2β_2 + v
observation: y_2 = y∗_2 if y∗_1 > 0, and y_2 = 0 if y∗_1 ≤ 0,

where u and v are correlated this time. Assuming that the errors follow the bivariate normal distribution and are homoskedastic, we have:

(u, v)′ ∼ N[ (0, 0)′, ( σ²_u, σ_uv; σ_uv, σ²_v ) ],

where σ²_u is normalized (set equal to 1) due to the fact that only the sign of y∗_1 is relevant. The likelihood function can be derived from the general expression (Cameron and Trivedi, 2005):

L_N = ∏_0 P[y∗_1 ≤ 0] ∏_+ f(y_2 | y∗_1 > 0) P[y∗_1 > 0],   (2.16)

which leads to the desired expression (Amemiya, 1984; Leung and Yu, 1996):

L_N = ∏_0 Φ(−X_1β_1/σ_u) ∏_+ Φ( (X_1β_1/σ_u + ρ(y_2 − X_2β_2)/σ_v) / √(1 − ρ²) ) · ϕ( (y_2 − X_2β_2)/σ_v ) · (1/σ_v),   (2.17)

where ρ is the correlation coefficient of the errors, defined as ρ = σ_uv/(σ_u σ_v) = σ_uv/σ_v. The model can be estimated by the MLE. This estimation is sometimes called full-information maximum likelihood (FIML), and it is asymptotically efficient. On the other hand, it relies heavily on the distributional assumptions.

Yen and Huang (2002) provide one of the first examples demonstrating the use of the FIML estimator for a demand system. They present the censored Translog demand system estimated by FIML and by simulated MLE, together with an application to real data. Their paper provides an analysis of the demand for beef products in the United States using data from 1987-88. They also argue that the multi-step procedures are usually inefficient (see below), which is why incorporating FIML is of importance.

Heckman estimator

Heckman (1976) comes up with a simpler way of estimating. He derives the bias which arises when the sample selection model is estimated by simple regression, OLS; the bias is then used to correct the OLS estimation. The procedure is clearly summarized, and extended by a derivation of the asymptotic properties, in the later work of Heckman (1979). In the literature, this estimator also goes under different names such as the Heckman two-step procedure or LIML (limited-information maximum likelihood). The joint density of u and v implies that we can define v as:

v = (σ_uv/σ_u) u + ξ = σ_uv u + ξ,   (2.18)

which is the starting point for the derivation of the estimator, with ξ a random variable that is independent of u and whose mean is zero. The OLS estimation of the outcome equation would lead to the bias:

E[y_2 | X_2, y∗_1 > 0] = X_2β_2 + E[v | X_2, y∗_1 > 0],   (2.19)

because of its last term, which can be written as

E[v | X_2, y∗_1 > 0] = E[v | X_2, u > −X_1β_1] = E[(σ_uv/σ_u) u + ξ | u > −X_1β_1]
= (σ_uv/σ_u) E[u | u > −X_1β_1]
= (σ_uv/σ_u) ϕ(−X_1β_1/σ_u) / ( 1 − Φ(−X_1β_1/σ_u) )
= (σ_uv/σ_u) ϕ(−X_1β_1/σ_u) / Φ(X_1β_1/σ_u)
= (σ_uv/σ_u) λ(−X_1β_1/σ_u),   (2.20)

where λ(z) = ϕ(z)/(1 − Φ(z)) is the Mills ratio. Then (2.19) can be rewritten as

E[y_2 | X_2, y∗_1 > 0] = X_2β_2 + (σ_uv/σ_u) λ(−X_1β_1/σ_u).   (2.21)

A problem arises, as the Mills ratio is unknown because of the unknown β_1 and σ_u. Heckman proposes an estimation procedure that allows these unknown parameters to be quantified; its sequence is the following:

• estimate the probit model for the participation equation, thus obtaining β̂_1, with σ_u set to 1,

• compute the values of λ̂ for each observation,

• estimate the outcome equation by OLS on the non-zero observations, including λ̂ as one of the regressors.

Heckman (1979) argues that the estimates are consistent and proves that the usual OLS standard errors are not appropriate; thus, he further derives the variance-covariance matrix of this estimator. By the definition of the procedure, the errors v are heteroskedastic; therefore, the estimator is not efficient. Having the appropriate standard errors at hand, we can test the correlation of the errors, H_0: σ_uv/σ_u = 0, by a t-test and possibly switch the model to the two-part or double-hurdle one. To avoid the complicated formulas for the standard errors, they can also be estimated through the bootstrap method. A problem which can arise is multicollinearity, in the case when the matrices X_1 and X_2 have a lot of variables in common (see subsection 2.2.6).
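
The three steps above translate directly into code. A minimal sketch of the procedure (Python with statsmodels; names are ours, and the corrected standard errors, which would have to come from Heckman's formula or from a bootstrap, are omitted):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

def heckman_two_step(y, d, X1, X2):
    """Heckman (1979) two-step estimates; standard errors are not corrected here."""
    # Step 1: probit of the participation dummy d on X1, with sigma_u = 1.
    b1 = sm.Probit(d, X1).fit(disp=0).params
    # Step 2: Mills ratio lambda(-X1 b1) = phi(X1 b1) / Phi(X1 b1).
    xb1 = X1 @ b1
    mills = norm.pdf(xb1) / norm.cdf(xb1)
    # Step 3: OLS on the non-zero observations, with the Mills ratio added.
    pos = d == 1
    Xaug = np.column_stack([X2[pos], mills[pos]])
    ols = sm.OLS(y[pos], Xaug).fit()
    # The last coefficient estimates sigma_uv / sigma_u from eq. (2.21).
    return b1, ols.params[:-1], ols.params[-1]
```

The t-statistic on the last coefficient then corresponds to the test of H_0: σ_uv/σ_u = 0 mentioned above, but only once corrected or bootstrapped standard errors are used.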

Olsen (1980) points out that the distributional assumption on the errors is less strict. Supposing that u is standard normally distributed, that the expected value of v conditional on u is linear in u, and that the errors are correlated in the sense that they can be written as

v = δu + ξ,   (2.22)

the estimation procedure remains consistent. The exact joint distribution is not actually needed, nor is the normal distribution of v and ξ. The estimation procedure for distributions of u other than the normal is derived in the same paper. However, the MLE estimate is more efficient (Amemiya, 1984).

The Heckman two-step estimator is used, for example, in the analysis of the demand for prepared meals by Park and Capps Jr. (1997). Heien and Wessels (1990) generalize this estimator for use in demand systems and give an illustration on the AIDS model. Their approach was frequently used in food analysis, for instance for food-away-from-home demand by Byrne and Capps Jr. (1996).

Concerning the elasticities, we need unbiased estimates of the β_2's, which can be obtained either by the MLE or by the Heckman two-step procedure. Both methods rely on the distributional assumption on the errors and are derived only for the case of normality.

2.2.6 Comparison of two-part and selection models

The discussion on the performance of the two models was rather extensive in the past. Several Monte Carlo experiments have been carried out in order to determine which of the two models performs better in terms of the biasedness of the coefficient estimates and the efficiency of the standard errors. There were several strict proponents of the two-part model and others of the sample selection model. For example, Manning et al. (1987) find that the two-part model performs better even when the selection model is the true model. The problem, together with its discussion, is nicely summarized in the paper of Leung and Yu (1996), including a list of some MC studies. They argue that some of the previous experiments (for example, Manning et al., 1987) were not correctly carried out, as they often include only one uniformly distributed independent variable. They carry out their own MC simulations and examine the performance of the two-part model and of the sample selection model estimated by the two methods described above, with the true model switching. The paper brings a clear resolution to the problem: “When the sample selection model is the true model, it performs substantially better than the two-part model as long as there are no collinearity problems. When the two-part model is the true model, the sample selection model is inferior, but it is still reasonably close to the two-part model. The two-part model performs better when it is the true model.” The problem of collinearity between X_1 and X_2, or between X_2 and λ, is studied more deeply than in the previous papers. Significantly high collinearity is a problem for the Heckman estimator as well as for the MLE; the estimators then perform poorly. Moreover, the collinearity is higher with a higher degree of censoring. The authors recommend testing for collinearity first and then deciding about the method of estimation (and even about the model framework; the two-part model can be the better choice in case of serious collinearity, for instance).

Puhani (2000) provides a more recent summary by listing several Monte Carlo studies (Leung and Yu (1996) included) and their characteristics. The paper is aimed at the Heckman procedure, but it also gives a good overview discussing the two-part model in comparison with the sample selection model. He says that: “The relative performance of the estimators is studied in relation to the joint distributions of the error terms u and v, the correlations between the error terms, the degree of censoring, and the degree of collinearity between the regressors X1 and X2 or between X2 and the inverse Mills ratio.”² The résumé yields the following conclusions. The violation of the joint distribution of the errors does not lead to evident consequences. The higher the correlation between u and v, the better the performance of FIML over LIML. The most interesting point is the problem of collinearity, which has the most important impact on the decision between the estimators: “If collinearity problems are present, subsample OLS (or the Two-Part Model) may be the most robust and simple-to-calculate estimator.” (Puhani, 2000)

Concerning the distributional assumptions and the unclear results on the biasedness of the estimators, as these depend on the design of the violation, some authors recommend applying the semi-parametric or non-parametric methods to which we turn in the next section.

2.2.7 Semi-parametric methods

The semi-parametric methods introduced in the literature are mainly derived for the sample selection model. Following the Heckman reasoning, the bias of the OLS estimation of the outcome equation arises from E[v | X_2, y∗_1 > 0]. We rewrite the model similarly to Newey et al. (1990) as:

outcome: y_2 = d · (X_2β_2 + v)   (2.23)

participation: d = I(X_1β_1 + u > 0),   (2.24)

where I(·) is the indicator function; then:

y_2 = X_2β_2 + ϵ + E[v | u > −X_1β_1],   (2.25)

where E[ϵ|X] = 0. The last term in equation (2.25) is the term of interest. The equation is parametrically given, and the term E[v | u > −X_1β_1] is estimated semi-parametrically in order to avoid the distributional assumption on the errors. Equation (2.25) can then be estimated by OLS, as the OLS does not pose distributional assumptions; it needs to satisfy only E[ϵ|X_2] = 0. The distributional assumptions would allow us to apply the t-test or F-test statistics.

Chay and Powell (2001) present three semi-parametric methods and illustrate their application on real data, estimating the earnings of black men in the United States before and after the introduction of a non-discrimination law. The data are censored because of the taxable maximum of earnings, which is reported whenever this threshold is exceeded. They compare the OLS estimates on the censored data, the OLS estimates from only the uncensored observations, the MLE, and three other semi-parametric techniques. These techniques are: the censored least absolute deviations (CLAD), the symmetrically censored least squares (SCLS), and the identically censored least absolute deviations (ICLAD). These models do not completely solve the problem of assumptions, because a particular assumption is still imposed in each of them. “The SCLS estimator is based on the assumption that the error terms are symmetrically distributed around zero, which implies that their median (and mean) is zero.” The CLAD estimator requires a zero median, and the ICLAD requires identically distributed errors. On the other hand, those assumptions are less strict than in the parametric case. The MLE provides the worst results in their specific application.

Much of this literature aims to relax the distributional assumption in the Heckman two-step procedure, as this estimator is very popular. Generally, the distribution of u is estimated semi-parametrically, then β_1 is consistently estimated, and finally the error term is incorporated into the outcome equation by a specific term. A very common non-parametric tool is the kernel estimation of a distribution. The implementation of the kernel estimation of the distribution of −u by two different techniques is illustrated in Newey et al. (1990), for example. They reconsider data on the labour supply of married women and make a comparison to the original study. The results are similar to the parametric approach, which is not surprising, as specification tests imply weak evidence of selectivity in the data.

Inspired by the recent paper of Hussinger (2008), for the purpose of this thesis we will employ the semi-parametric approach of Cosslett (1991). Hussinger evaluates the impact of public R&D subsidies on private investments in R&D. She compares the results of the parametric Heckman estimator to the semi-parametric procedures of Cosslett, Newey, and Robinson. Unfortunately, in her work she is not able to prove the presence of a selection problem, and thus all results are similar to each other, including those of the OLS.

2.2.8 Cosslett’s semi-parametric approach

Cossle (1991) presents an alternative approach to the Heckman two-step procedure. e bias of the OLS estimate in the second stage arises from E[v|u > −X1β1]which can

be rewrien as an unknown function

y2 = X2β2+ ϵ + Ψ(X1β1). (2.26)

In the first-stage, we need to consistently estimate the ˆβ1through the unknown

distribu-tion F (−X1β1). Cossle (1991) discusses two possible estimators of unknown F : one

defined in Cossle (1983) and the other in Klein and Spady (1993). He lists also other op-tions of estimation of unknown F . is chapter aims on the estimator of Cossle (1983); the Klein and Spady (1993) method is basically an implementation of kernel estimation of density. In the second stage, we need to approximate Ψ by some functional form.

Cossle (1983) derives the “distribution-free maximum likelihood estimator of the bi-nary choice model.” e distribution of error terms is constructed as a step function. e procedure is explained step by step in Cossle (1983, pp.773) together with the definition of the log-likelihood function and process of its maximization. e nature of the step function does not allow to proceed the maximization task by usual techniques. Hence, Cossle (1983) uses the Manski (1975) estimator. e notation in the paper is lile bit different as the choice between two goods according to their indirect utility function is considered there.

Supposing that we have the consistent estimator of F, the following likelihood function can be maximized in order to determine consistent estimates of β̂_1 and the additional coefficients of F̂:

ln L_N = ∑_{i=1}^{N} ( d ln(1 − F(−X_1β_1)) + (1 − d) ln F(−X_1β_1) ),   (2.27)

where d is a binary indicator equal to one if and only if y∗_1 > 0. For purposes of identification, under the assumption that at least one variable in X_1 has a non-zero coefficient, we have:

X_1β_1 = x_{1,1} + ∑_{k=2}^{K} x_{k,1} β_{k,1}.   (2.28)

The constant term is missing, and for comparing the estimates of the semi-parametric procedure to those of the MLE and of the original Heckman two-step procedure, Cosslett proposes “standardized estimates”:

β̄_{0,1} = m_F / s_F,   β̄_{k,1} = β̂_{k,1} / s_F,   (2.29)

where m_F is the mean of F̂ and s_F is its standard deviation. Having these consistent first-stage estimates, the function Ψ in the second stage is then approximated by

y_{2,i} = X_{2,i}β_2 + ϵ_i + ∑_{j=1}^{J} λ_j I(i ∈ I(j)),   (2.30)

where the set of dummy variables is such that λ_j = E[v | u > u∗_j] and

I(j) = { i | u∗_{j−1} < −X_{1,i}β̂_1 < u∗_j and y_{1,i} = 1 }   (2.31)

is the subset of observations, with u∗_0 = −∞, u∗_{J+1} = ∞, I(J + 1) empty, I(1) possibly empty, and the remaining subsets non-empty. The step function F̂ determines the location of the steps u∗_j and their number J. The constant term is again missing; hence Cosslett proposes:

β̂_{0,2} = ∑_{j=1}^{M} w_j λ̂_j,   M = o(J),   (2.32)

where w_j are positive weights, not specified more closely.
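
Given a first-stage estimate β̂_1 and the step locations u∗_j of F̂, the second stage (2.30) is just OLS with interval dummies. A minimal sketch of that regression only (Python; the step locations are taken as given here, since the distribution-free first stage of Cosslett (1983) does not fit into a few lines):

```python
import numpy as np
import statsmodels.api as sm

def cosslett_second_stage(y2, X2, index1, steps):
    """OLS of eq. (2.30) on the participating observations only.

    index1 is X1 @ beta1_hat from the first stage; steps holds the sorted
    step locations u*_1 < ... < u*_J of the estimated distribution F-hat.
    The interval dummies replace the constant term, which is why X2 is
    passed without an intercept column.
    """
    # Assign each observation to the interval u*_{j-1} < -index1 < u*_j.
    groups = np.digitize(-index1, steps)
    dummies = (groups[:, None] == np.arange(len(steps) + 1)[None, :]).astype(float)
    # Drop dummies of empty intervals to keep the design matrix full rank.
    dummies = dummies[:, dummies.sum(axis=0) > 0]
    return sm.OLS(y2, np.column_stack([X2, dummies])).fit()
```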

Hussinger (2008) deals with the problem of the identification of the constant term as well. For the Cosslett estimator, she proposes to use as the intercept the estimated coefficient of a dummy variable which is equal to one for observations with a high probability of being treated (Hussinger examines treatment effects). For the other semi-parametric methods, she uses the constant term defined in Heckman (1990) and its variation from Andrews and Schafgans (1998). Both depend on arbitrarily chosen coefficients.

Cossle (1991) completes his approach by proving its consistency and by conducting the MC experiment to compare the MLE and semi-parametric results. Results show that when the bivariate normality determines the distribution of error terms, the MLE performs beer mainly in terms of standard errors when censoring level is small, 25 %. For censoring at 75 % level, both estimates are very similar. On the other hand, when the distribution of errors is a mix of two normal distributions the semi-parametric estimation performs clearly beer. To correct the problems of standard errors the bootstrap method can be used.

For an example of a continuous estimate of the distribution function F by a semi-parametric method which does not need special attention while maximizing, one can use the Gallant and Nychka (1987) estimator. They suppose the density is of a modified Hermite form:

$$f(u) = \bigl[P_K(u)\,\phi(u)\bigr]^2,$$

where $P_K(u)$ is a polynomial of order K which is arbitrarily chosen. The only guidance given is that K should increase with the sample size. Further, they prove the consistency of the estimator. They state that the most similar approach to theirs is the one in Cosslett (1983): "The fundamental difference between our approach and Cosslett's is that Cosslett maximizes the likelihood over the collection of all distribution functions whereas we maximize over a restricted class of distributions having smooth densities. […] The restricted estimator can accommodate completely general exogenous variables and not just independently and identically distributed exogenous variables." A useful and clear overview of the estimator is provided in Stewart (2004).
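As a rough illustration of this density family, the sketch below evaluates the squared form stated above and normalizes it numerically; the coefficient vector `tau` and the normalization step are illustrative assumptions of ours, not part of Gallant and Nychka's estimation procedure.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def hermite_form(u, tau):
    """Unnormalized density of the form f(u) = [P_K(u) * phi(u)]^2, where
    P_K is a polynomial whose coefficients tau run from the constant term
    upwards, so K = len(tau) - 1."""
    return (np.polyval(tau[::-1], u) * norm.pdf(u)) ** 2

def hermite_density(u, tau):
    """Numerically normalized version, so the density integrates to one."""
    const, _ = quad(lambda t: hermite_form(t, tau), -np.inf, np.inf)
    return hermite_form(u, tau) / const

# Example with a cubic P_K; in practice K grows with the sample size.
tau = np.array([1.0, 0.3, -0.2, 0.1])
print(hermite_density(np.linspace(-4.0, 4.0, 9), tau))
```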

2.2.9 Goal of the empirical part

Statistical methods to test the conditions under which the analysis is carried out have been developed over time. We can test the distribution of the data or the correlation of the errors, for instance. In contrast to the availability of tests, alternative estimators are not so common. We have seen that there exists a non-distributional transformation of the double-hurdle model. For the Heckman two-step procedure, mainly semi-parametric alternatives have been developed. Concerning the two-part model, the crucial assumptions are the conditioning and the independence of the errors; distributional assumptions are not needed, so there is no need for alternatives with different distributional assumptions. The FIML estimation of the sample selection model has alternatives in the Heckman two-step procedure and its semi-parametric variants. Lastly, the tobit model has not been extended in practice to different distributions yet. Moreover, the performance of the alternative approaches is not always satisfying. For example, the Box-Cox transformation does not allow for a strictly normal distribution of the errors. On top of that, we have seen that Puhani (2000) proposes to use the two-part estimator in the case where the true model is actually the sample selection model and the correlation of the errors u and v is non-zero. This suggestion is made because the bias produced by violation of the normality assumption and by collinearity problems is greater for the sample selection estimator than for the two-part estimator. In the end, the researcher is expected to test the data and then choose the best model under the conditions he faces. The principal aim of this thesis is an extension of the available Monte Carlo experiments, model performance comparisons, and suggestions given by previous papers. The extension consists of examining more combinations of conditions and of comparing more models at the same time. In addition, the thesis focuses on food demand analysis; thus, the design of the experiment should represent the estimation of elasticities of food items upon budget survey data. The results aim to provide more precise suggestions on model choice to researchers in this field of study. To complete the idea, a real data part is included as well, in order to evaluate the usefulness of the MC experiment's conclusions.


3 Monte Carlo experiment

3.1 Methodological remarks

This section provides a specification of the estimators we will use in both of the following parts: the Monte Carlo experiment and the real data analysis.

TOBIT Coefficients $\beta_2$ are estimated by the MLE upon the log-likelihood function for censored data (2.6), using the observed $y_2$ and $X_2$:

$$\ln L_N = \sum_i d_i \cdot \left( -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\sigma_v^2 - \frac{1}{2\sigma_v^2}\bigl(y_{2,i} - x_{2,i}'\beta_2\bigr)^2 \right) + (1 - d_i) \cdot \ln\left( \Phi\!\left( \frac{-x_{2,i}'\beta_2}{\sigma_v} \right) \right), \qquad (3.1)$$

where $\Phi\!\left(\frac{-x_{2,i}'\beta_2}{\sigma_v}\right)$ is used instead of $1 - \Phi\!\left(\frac{x_{2,i}'\beta_2}{\sigma_v}\right)$.
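As an illustration, a minimal sketch of how (3.1) might be coded for a numerical optimizer follows; the log-parameterization of $\sigma_v$ is our own device to keep the scale positive during optimization, not something prescribed by the thesis.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(theta, y2, X2):
    """Negative of the censored log-likelihood (3.1); theta stacks beta_2
    and log(sigma_v), so the scale stays positive during optimization."""
    beta2, sigma = theta[:-1], np.exp(theta[-1])
    xb = X2 @ beta2
    d = y2 > 0
    # Uncensored part: normal log-density; -0.5*ln(sigma_v^2) = -ln(sigma_v).
    ll_pos = -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((y2 - xb) / sigma) ** 2
    # Censored part: Phi(-x'beta_2 / sigma_v), exactly as written in (3.1).
    ll_zero = norm.logcdf(-xb / sigma)
    return -np.sum(np.where(d, ll_pos, ll_zero))

# Typical call, starting from OLS estimates stacked with a log residual SD:
# res = minimize(tobit_negloglik, theta0, args=(y2, X2), method="BFGS")
```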

HURDLE e double-hurdle estimator takes both variables y1 with X1, and y2 with

X2. It estimates β1 and β2 at the same time by the MLE upon the logarithm form of

likelihood function (2.8).
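The likelihood (2.8) is defined earlier in the thesis; as a sketch, the code below implements the standard independent double-hurdle likelihood of Cragg (1971), which the estimator is assumed to reduce to when the errors u and v are uncorrelated.

```python
import numpy as np
from scipy.stats import norm

def hurdle_negloglik(theta, y2, X1, X2):
    """Negative log-likelihood of the independent double-hurdle model:
    P(y2 = 0) = 1 - Phi(X1 b1) * Phi(X2 b2 / s), while a positive y2
    contributes Phi(X1 b1) times a normal density.
    theta stacks beta_1, beta_2, and log(sigma_v)."""
    k1 = X1.shape[1]
    beta1, beta2, sigma = theta[:k1], theta[k1:-1], np.exp(theta[-1])
    xb1, xb2 = X1 @ beta1, X2 @ beta2
    pos = y2 > 0
    ll_zero = np.log1p(-norm.cdf(xb1[~pos]) * norm.cdf(xb2[~pos] / sigma))
    ll_pos = (norm.logcdf(xb1[pos]) - np.log(sigma)
              + norm.logpdf((y2[pos] - xb2[pos]) / sigma))
    return -(ll_zero.sum() + ll_pos.sum())
```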

OLS Assuming zero correlation of the disturbances between the participation and the outcome equation, the two-part estimator is simply the probit for the participation equation and the OLS for the outcome equation. As we are particularly interested in consistent estimates of $\beta_2$, the two-part model is represented by the OLS on uncensored data, that is, on positive observations only.
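A minimal sketch of this two-part estimator using the statsmodels library; the arrays y2, X1, and X2 are placeholders for the generated or survey data.

```python
import statsmodels.api as sm

def two_part_fit(y2, X1, X2):
    """Two-part estimator: probit for participation, OLS for the outcome
    equation on the positive observations only."""
    d = (y2 > 0).astype(int)
    probit_res = sm.Probit(d, X1).fit(disp=0)   # participation equation
    pos = y2 > 0
    ols_res = sm.OLS(y2[pos], X2[pos]).fit()    # delivers the beta_2 estimates
    return probit_res, ols_res
```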

FIML is estimator of sample selection model also estimates both β1 and β2 at the

same time upon the logarithm of likelihood function (2.17).
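The likelihood (2.17) also appears earlier in the thesis; the sketch below codes the standard FIML log-likelihood of the bivariate-normal sample selection model, which (2.17) is assumed to equal. The transformations of $\sigma_v$ and ρ are our own devices to keep the parameters in their admissible ranges.

```python
import numpy as np
from scipy.stats import norm

def fiml_negloglik(theta, y2, X1, X2):
    """Negative FIML log-likelihood of the sample selection model under
    bivariate normal errors. theta stacks beta_1, beta_2, log(sigma_v),
    and arctanh(rho), so sigma_v > 0 and |rho| < 1 by construction."""
    k1 = X1.shape[1]
    beta1, beta2 = theta[:k1], theta[k1:-2]
    sigma, rho = np.exp(theta[-2]), np.tanh(theta[-1])
    xb1, xb2 = X1 @ beta1, X2 @ beta2
    pos = y2 > 0
    # Censored observations contribute P(no participation) = Phi(-X1 b1).
    ll_zero = norm.logcdf(-xb1[~pos])
    # Uncensored: normal density of y2 times the conditional selection probability.
    z = (y2[pos] - xb2[pos]) / sigma
    ll_pos = (norm.logpdf(z) - np.log(sigma)
              + norm.logcdf((xb1[pos] + rho * z) / np.sqrt(1.0 - rho ** 2)))
    return -(ll_zero.sum() + ll_pos.sum())
```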

LIML In the first stage, we estimate $\beta_1$ by the probit model and then calculate $\hat{\lambda}$. In the second stage, we estimate $\beta_2$ by OLS using only the uncensored (positive) observations, adding $\hat{\lambda}$ to the regressors $X_2$. The log-likelihood function of the probit estimator we use is:

$$\ln L_N = \sum_i d_i \cdot \ln\bigl(1 - \Phi(-X_1\beta_1)\bigr) + (1 - d_i) \cdot \ln\bigl(\Phi(-X_1\beta_1)\bigr). \qquad (3.2)$$
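A sketch of the two-step procedure just described; statsmodels' Probit is used for the first stage in place of hand-coding (3.2), since it maximizes the same likelihood, and $\hat{\lambda}$ is the usual inverse Mills ratio.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

def heckman_two_step(y2, X1, X2):
    """Heckman two-step (LIML): probit first stage, then OLS on the positive
    observations with the inverse Mills ratio added as an extra regressor."""
    d = (y2 > 0).astype(int)
    probit_res = sm.Probit(d, X1).fit(disp=0)
    xb1 = X1 @ probit_res.params
    imr = norm.pdf(xb1) / norm.cdf(xb1)          # lambda-hat
    pos = y2 > 0
    X2_aug = np.column_stack([X2[pos], imr[pos]])
    return sm.OLS(y2[pos], X2_aug).fit()
```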

COSSLETT (COSS) In the first stage, we estimate $\hat{F}(X_1\hat{\beta}_1)$ upon the log-likelihood function (2.27), where the distribution function F is specified as in Gallant and Nychka (1987).¹ We choose this approach because the maximization can proceed without special attention, as such an F is continuous. In the second stage, we estimate $\beta_2$ by OLS using the uncensored data, adding the set of dummy variables defined by (2.31) among the regressors. As the estimated $\hat{F}$ is a continuous function instead of a step function, we determine the location of the steps $u_j^*$ by splitting $X_1\hat{\beta}_1$ into intervals of equal length. The number of intervals, and hence of dummy variables, is equal to 6 for most of the estimations. This number was determined by preliminary Monte Carlo simulations: with it, the estimates were stable and the OLS did not suffer from the perfect multicollinearity problem. The constant term is excluded from the OLS estimation to avoid the dummy variable trap. We use the Hussinger (2008) approach in order to identify the constant term. According to her paper, the constant term is equal to the coefficient of the dummy variable which indicates a high probability of participating. In our case, this means using the first dummy variable, since:

$$P[y_1 = 1 \mid X_1] = 1 - P[u \le -X_1\beta_1] = 1 - F(-X_1\beta_1).$$
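A sketch of the interval construction just described; the helper below is our own and simply places the steps at equal distances over the range of the fitted index.

```python
import numpy as np

def cosslett_dummies(index, n_groups=6):
    """Split the fitted first-stage index X1 @ beta1_hat into n_groups
    intervals of equal length and return one dummy column per interval,
    mirroring the sets I(j) in (2.31)."""
    edges = np.linspace(index.min(), index.max(), n_groups + 1)
    # Interior edges only; clip keeps the sample maximum in the last group.
    groups = np.clip(np.digitize(index, edges[1:-1]), 0, n_groups - 1)
    return np.eye(n_groups)[groups]

# Second stage (sketch): regress y2 on X2 without a constant plus these
# dummies, using the uncensored observations; the coefficient of the dummy
# marking the highest participation probability acts as the intercept.
```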

In sum, we will use six different estimators to obtain consistent estimates of β2.

3.2 Data generating process and true models

Let us consider a situation we face while analysing budget survey data. The goal is to estimate the income (I), own-price ($p_i$), and cross-price ($p_j$) elasticities of a given food item i from the single-equation model:

$$\ln q_i = \beta_0 + \beta_1 \ln I + \beta_2 \ln p_i + \beta_3 \ln p_{j_1} + \beta_4 \ln p_{j_2} + \sum_{l=1}^{k} \beta_l x_l + \epsilon, \qquad (3.3)$$

where $x_l$ are other explanatory variables such as socio-demographic information. The use of logarithms simplifies the calculation of the elasticities, as the appropriate β's are exactly equal to the elasticities.
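To see why, take the income elasticity as an example; applying the chain rule to (3.3) gives

$$\varepsilon_I = \frac{\partial q_i}{\partial I} \cdot \frac{I}{q_i} = \frac{\partial \ln q_i}{\partial \ln I} = \beta_1,$$

and, analogously, $\beta_2$ is the own-price elasticity and $\beta_3$, $\beta_4$ are the cross-price elasticities.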


Budget survey data usually consists of the following information. There are socio-demographic characteristics and information upon which we can estimate the demand for food item i, such as the quantity purchased (q), the expenditures on the item (p·q), or the price (p). Not all three of these last-mentioned variables are always available, but at least two of them are necessary for the estimation. Usually, the price is missing and is determined by unit values (expenditures divided by quantity). Unit values vary over households and consequently need special attention, mainly in the interpretation.² For our purposes, we suppose there is information in the dataset which indicates whether the household participates in the market for food item i. The indication can be given through an explicit variable or implicitly by distinguishing q = 0 and q > 0. Further, we suppose there is all the necessary information to estimate equation (3.3), including the prices and quantities.

The problem arises from the potential presence of zeros in q in the data set. The data is left-censored at 0; the observed dependent variable ln q is left-censored at −∞:

$$\ln q = \begin{cases} X_2\beta_2 + \epsilon & \ln q^* > -\infty, \text{ i.e., } q > 0 \\ -\infty & \text{otherwise.} \end{cases} \qquad (3.4)$$

In that case, it is convenient to assign some minimal value L as the threshold instead of −∞ in order to compute

$$P[\ln q^* \le L] = P[\epsilon \le L - X_2\beta_2].$$

Moreover, food demand is commonly estimated as a two-step decision problem, as this seems more realistic.

Without loss of generality, we can suggest that the problem of food demand analysis can be represented by the already familiar scheme:

$$\begin{aligned} \text{participation equation:}\quad & y_1^* = X_1\beta_1 + u \\ \text{outcome equation:}\quad & y_2^* = X_2\beta_2 + v \\ \text{observation:}\quad & y_2 = \begin{cases} X_2\beta_2 + v & y_1^* > 0 \wedge y_2^* > 0 \\ 0 & \text{otherwise,} \end{cases} \end{aligned}$$

where $y_1$ is the dummy variable (for example, $y_1 = 1$ if q > 0 and $y_1 = 0$ otherwise) and $y_2$ with a threshold equal to 0 represents ln q with $L > -\infty$. Concerning elasticities, we are interested in consistent estimates of $\beta_2$.

For the MC experiment we simulate data according to the above two-step scheme.
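A minimal sketch of such a simulation follows; the regressor design, the coefficient values, and the error correlation ρ are placeholders, since the actual experimental settings are specified later in the chapter.

```python
import numpy as np

rng = np.random.default_rng(12345)

def simulate_two_step(n, beta1, beta2, rho=0.5):
    """Generate data from the two-step scheme above: y2 is observed only
    when both latent variables are positive, and is zero otherwise.
    Shared regressors and the value of rho are illustrative choices."""
    k = len(beta1)
    X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    X2 = X1.copy()
    # Bivariate normal errors (u, v) with correlation rho.
    u, v = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T
    y1_star = X1 @ beta1 + u
    y2_star = X2 @ beta2 + v
    y1 = (y1_star > 0).astype(int)
    y2 = np.where((y1_star > 0) & (y2_star > 0), y2_star, 0.0)
    return X1, X2, y1, y2

# Example draw; intercepts near zero yield a substantial share of zeros.
X1, X2, y1, y2 = simulate_two_step(3000, np.array([0.0, 1.0]), np.array([0.0, 1.0]))
```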
