Analyzing the use of multiple proxies in empirical studies

Dirk van den Merkhof

Master's Thesis Econometrics, Operations Research and Actuarial Studies Supervisor: prof. dr. T.J. Wansbeek

July 2018

1 Introduction

Many studies involve variables that cannot be measured precisely, such as health, ability, or wealth. Researchers try to approximate these latent variables by finding a closely related variable that can be observed precisely. For example, one could use an IQ-test score to approximate a person's ability. We call a variable that approximates the latent variable a proxy variable. Researchers mostly use only one proxy variable when dealing with a latent variable and then proceed as if this proxy were the true latent variable. However, an IQ-test score alone would not be fully representative of one's ability. A single proxy variable measures the latent variable with error. This measurement error biases the estimated effect of the latent variable towards zero, which is called attenuation bias. Multiple proxies may be better able to measure the true underlying variable, which results in a smaller measurement error. We will never be able to eliminate the attenuation bias completely when using multiple proxies, but we can try to minimize it.

There are numerous ways of combining multiple proxies to approximate the latent variable. For example, to estimate one's ability in mathematics, an average of multiple exam scores can be used. However, the score of a more difficult exam will be more informative about one's ability than the score of an easy exam. All students score high on the easy exam, but only the best students score high on the difficult exam. Using a weighted average of the exam scores, where a more difficult exam has a higher weight, would give a better estimate of one's ability in mathematics. Thus, we want to weigh the proxies and combine them in such a way that minimizes the attenuation bias.

Both methods entail two steps, where one step is estimating the optimal combination, and the other step is the regression estimation. For example, using the index method, we first estimate the optimal weights for multiple IQ-tests to construct an index for ability. Then, we regress wage on the estimated index for ability. Thus, the estimator of the effect of ability on wage is based on another estimate. This introduces extra variance, which should be taken into account. This can be done by calculating the asymptotic variance of our estimator, but Lubotsky & Wittenberg (2006) approximated the variance by bootstrapping. They argue that the extra variance is small and outweighed by the reduction of the attenuation bias. They add that the asymptotic variance is difficult to compute, and that researchers would be better off bootstrapping.

Lubotsky & Wittenberg (2006) used a sample size of 100,000 in their study of the effect of wealth on school enrollment. It is thus expected that the variance is very low. One wonders if the asymptotic variance is still relatively low when using a smaller sample size, and if the bootstrap is a good approximation in this case. This is the main question we try to answer in this paper. Even though the attenuation bias is decreased when using multiple proxies, the asymptotic variance may be large enough to make the contribution of the latent variable insignificant. Thus, we take on the challenge to compute the finite sample distribution of the effect of the parameters of interest.

The empirical study that we use to implement multiple proxies focuses on the effect of wealth on school enrollment in India. The latent variable in this model is wealth, which cannot be measured precisely. The National Family Health Survey in India contains no data on income, but it does contain data on household assets. These household assets can serve as proxies for wealth, since wealthier households can afford more assets. The survey also contains information on whether the children in each household are enrolled in school. We assess how wealth, proxied by household assets, affects school enrollment of children in India. In particular, to determine if wealth contributes significantly to school enrollment, we calculate the true asymptotic variance.

The paper is structured as follows. We discuss some literature in Section 2 and derive the asymptotic variance in detail using the index and the coefficient method in Section 3, while ignoring covariates. We then turn to the empirical study, where we explain the data and how the study is set up. The index and coefficient method are applied to the empirical study in Section 6, where we also compare the results of both methods. Section 7 analyzes how the method of principal components compares to the coefficient and the index method. Having compared the different variances, we add covariates to our model in Section 8. Lastly, we discuss the relevance of our findings in Section 9, and conclude in Section 10.

2 Literature review

There are two formal requirements for a proxy variable according to Wooldridge (2010). Let $x_i$ be a proxy variable for $\xi_i$ and let $y_i$ be the dependent variable. The first requirement is that the proxy variable should be redundant. Redundancy of $x_i$ corresponds to

$$E(y_i \mid \xi_i, x_i) = E(y_i \mid \xi_i). \tag{1}$$

This means that $x_i$ is irrelevant for explaining $y_i$ if $\xi_i$ has been controlled for. For

when it is included. However, it is unlikely that there is a perfect proxy variable available in an empirical study. We will have attenuation bias, and the best we can do is minimize it. Wooldridge (2010) also argues that not all variables are suitable proxy variables. Including such variables in a regression can make the estimator more inconsistent than excluding them. Intuitively, this means that a variable that has a low correlation with the latent variable is a poor proxy. For example, gender should not be used as a proxy for someone's ability in mathematics.

Black & Smith (2006) have analyzed the returns to college quality, where they approximated college quality using proxies. They have implemented the approach of Lubotsky & Wittenberg (2006), among three other econometric methods. They found that the estimate obtained by using the Lubotsky & Wittenberg (2006) approach was very similar to the estimate obtained when only using one proxy, the Scholastic Aptitude Test (SAT) score. They also introduced a GMM approach where the variance of the latent variable is normalized, which resulted in an estimate about 20% larger than when only using the SAT score. This means that this GMM estimate has a lower attenuation bias than the Lubotsky & Wittenberg estimate. However, the downside of this approach is that the correlation between the proxies themselves is required to be zero when controlling for the unobserved variable. This assumption can often not be made. Since all proxies are related to the same unobserved variable, they may also have another factor in common. The Lubotsky & Wittenberg approach is more attractive, since it does not have this restriction.

3 Theory

Consider the following model in demeaned variables,

$$y_i = \xi_i \beta + \varepsilon_i \tag{2}$$

$$x_i = \rho \xi_i + v_i, \tag{3}$$

where $y_i$ is the dependent variable, $\xi_i$ is the unobserved latent variable, $\beta$ is the parameter of interest, $x_i$ are the proxies, $\rho$ measures the relationship between the proxies and the latent variable, and $\varepsilon_i$ and $v_i$ are error terms, for observations $i = 1, 2, \ldots, n$. The vectors $x_i$, $\rho$ and $v_i$ contain $k$ elements, corresponding to the number of proxies. We also have that $\xi_i$ is uncorrelated with $v_i$, and both are uncorrelated with $\varepsilon_i \sim (0, \sigma^2)$. We denote $\Omega = E(v_i v_i')$, and we assume it is non-singular.
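To make the model concrete, the following minimal sketch simulates data from (2) and (3) and shows the attenuation that arises when only the first proxy is used. The parameter values, the normal distributions, and the function name `simulate_proxies` are assumptions chosen purely for illustration; they are not taken from the empirical study.

```python
import numpy as np

def simulate_proxies(n, beta, rho, omega, sigma_xi=1.0, sigma_eps=1.0, seed=0):
    """Draw (y, X) from y_i = xi_i * beta + eps_i and x_i = rho * xi_i + v_i."""
    rng = np.random.default_rng(seed)
    rho = np.asarray(rho, dtype=float)
    xi = rng.normal(0.0, sigma_xi, size=n)        # latent variable
    eps = rng.normal(0.0, sigma_eps, size=n)      # regression error
    v = rng.multivariate_normal(np.zeros(rho.size), omega, size=n)  # proxy errors
    y = xi * beta + eps
    X = np.outer(xi, rho) + v                     # n x k matrix of proxies
    return y, X

# Hypothetical values: three proxies, unit latent variance, diagonal Omega.
rho_true = np.array([1.0, 0.5, 0.8])
omega_true = np.diag([1.0, 0.5, 2.0])
y, X = simulate_proxies(n=200_000, beta=2.0, rho=rho_true, omega=omega_true)

# OLS of y on the first proxy only. In the limit the slope equals
# beta * sigma_xi^2 / (sigma_xi^2 + Omega_11) = 2 * 1 / (1 + 1) = 1,
# which is half of the true beta.
slope_single = (X[:, 0] @ y) / (X[:, 0] @ X[:, 0])
print(f"slope on first proxy: {slope_single:.3f} (true beta = 2.0)")
```

With these assumed values the single-proxy slope is roughly half of the true effect, which is the attenuation bias that the methods in this section try to reduce.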

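We normalize for one proxy to ensure that we can estimate $\rho$ consistently. Take $\rho_1 = 1$, such that

$$\rho = \begin{pmatrix} 1 \\ \rho_2 \end{pmatrix}, \tag{4}$$

and partition $x_i$ accordingly,

$$x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \end{pmatrix}. \tag{5}$$

Effectively, we measure our latent variable in terms of the first proxy. This gives us

$$E(x_{i1} y_i) = E(\xi_i^2 \beta) = \sigma_\xi^2 \beta, \tag{6}$$

and

$$E(x_{i2} y_i) = E(\rho_2 \xi_i^2 \beta) = \rho_2 \sigma_\xi^2 \beta. \tag{7}$$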

From (6) and (7), we create the moment condition which identifies $\rho$,

$$E[(x_{i2} - \rho_2 x_{i1})\, y_i] = 0. \tag{8}$$

Thus, by normalizing $\rho_1 = 1$, we get a consistent estimate $\hat\rho_2$ by using the method of moments.

3.1 The index method

This method uses $\hat\rho$, the estimator that measures the relationship between proxies and the latent variable, to create optimal and consistent weights $\hat w$. Using these weights, we create an index $\hat w' x_i$. This index is the approximation of the latent variable. Then, we perform OLS of $y_i$ on $\hat w' x_i$ to find the estimator of $\beta$, which we will call $\hat\beta_{index}$. That is,

$$\hat\beta_{index} \xrightarrow{p} \frac{E(w' x_i y_i)}{E[(w' x_i)^2]} = \frac{w'\rho\,\sigma_\xi^2}{(w'\rho)^2 \sigma_\xi^2 + w'\Omega w}\,\beta. \tag{9}$$

This shows the attenuation bias of $\hat\beta_{index}$, since $\hat\beta_{index} \overset{p}{\nrightarrow} \beta$.

We find the optimal $w$ by minimizing the attenuation bias of $\hat\beta_{index}$. Thus, we choose $w$ such that $w'\rho = 1$, and minimize $w'\Omega w$. According to Meijer et al. (2018), the minimum of $w'\Omega w$ under $w'\rho = 1$ is attained for

$$w = \frac{\Omega^{-1}\rho}{\rho'\Omega^{-1}\rho}. \tag{10}$$

Since we require that $w'\rho = 1$, knowing $w$ up to a scale is sufficient. We have that $\rho'\Omega^{-1}\rho$ is just a scalar, so we rewrite (10) for some constant $c$,

$$c\,w = \Omega^{-1}\rho. \tag{11}$$
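As a numerical sanity check of (10), the sketch below compares the noise term $w'\Omega w$ at the stated optimum with the same term for randomly drawn weights that also satisfy $w'\rho = 1$. The values of `rho` and `omega` and the helper name `optimal_weights` are hypothetical.

```python
import numpy as np

def optimal_weights(rho, omega):
    """Weights from (10): Omega^{-1} rho / (rho' Omega^{-1} rho)."""
    a = np.linalg.solve(omega, rho)
    return a / (rho @ a)

rho = np.array([1.0, 0.5, 0.8])              # hypothetical loadings, rho_1 = 1
omega = np.array([[1.0, 0.2, 0.0],
                  [0.2, 0.5, 0.1],
                  [0.0, 0.1, 2.0]])          # hypothetical positive definite Omega
w_opt = optimal_weights(rho, omega)

# Random candidate weights, rescaled so that each satisfies w'rho = 1.
rng = np.random.default_rng(1)
candidates = rng.normal(size=(1000, rho.size))
candidates /= (candidates @ rho)[:, None]

noise_opt = w_opt @ omega @ w_opt
noise_rand = np.einsum("ij,jk,ik->i", candidates, omega, candidates)
print("constraint w'rho:", w_opt @ rho)
print("optimum beats all candidates:", bool(noise_opt <= noise_rand.min()))
```

At the optimum the noise term equals $1/(\rho'\Omega^{-1}\rho)$, so no feasible candidate can fall below it.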

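Now, there holds

$$E(x_i x_i') = \sigma_\xi^2 \rho\rho' + \Omega, \tag{12}$$

so that

$$E(x_i x_i')\,\Omega^{-1}\rho = \rho\left(\sigma_\xi^2 \rho'\Omega^{-1}\rho + 1\right) = d\rho, \tag{13}$$

for some constant $d$. As a result, we find

$$\Omega^{-1}\rho = d\,[E(x_i x_i')]^{-1}\rho. \tag{14}$$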

We can substitute this result into (11), which gives

$$w = \tilde c\,[E(x_i x_i')]^{-1}\rho, \tag{15}$$

for some constant $\tilde c$. Finally, we know $w$ up to a scale. By pre-multiplying (15) with $E(x_i x_i')$, the corresponding moment condition is

$$E[x_i x_i' w - \tilde c \rho] = 0. \tag{16}$$

We can also identify the scaling constant $\tilde c$. We rewrite (15),

$$\frac{E(x_i x_i')\,w}{\tilde c} = \rho, \tag{17}$$

and pre-multiply by $w'$. Since we require that $w'\rho = 1$, we get

$$\tilde c = w' E(x_i x_i')\,w. \tag{18}$$

Substituting this into (16) gives the moment condition for $w$,

$$E\left[x_i x_i' w - (w' x_i)^2 \rho\right] = 0. \tag{19}$$
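The steps from (12) to (18) can be checked numerically at the population level. The sketch below builds $E(x_i x_i')$ from assumed values of $\rho$, $\Omega$ and $\sigma_\xi^2$, and confirms that weights computed from $E(x_i x_i')$ alone coincide with the weights in (10), which require the unobservable $\Omega$. All numbers are hypothetical.

```python
import numpy as np

rho = np.array([1.0, 0.5, 0.8])              # hypothetical loadings
omega = np.array([[1.0, 0.2, 0.0],
                  [0.2, 0.5, 0.1],
                  [0.0, 0.1, 2.0]])          # hypothetical Omega
sigma_xi2 = 1.5                              # hypothetical variance of xi

sxx = sigma_xi2 * np.outer(rho, rho) + omega     # E(x x'), equation (12)

# Weights from the second moments of x, known up to scale as in (15),
# then rescaled so that w'rho = 1.
w_unscaled = np.linalg.solve(sxx, rho)
w_from_sxx = w_unscaled / (w_unscaled @ rho)

# Weights from (10), which use Omega directly.
w_from_omega = np.linalg.solve(omega, rho)
w_from_omega = w_from_omega / (w_from_omega @ rho)

c_tilde = w_from_sxx @ sxx @ w_from_sxx          # scaling constant, equation (18)
print(np.allclose(w_from_sxx, w_from_omega))
print(np.allclose(sxx @ w_from_sxx, c_tilde * rho))   # moment condition (16)
```

This is why the index method is feasible in practice: the sample second moments of the proxies and $\hat\rho$ are sufficient to construct the weights.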

Now, we find $\hat\beta_{index}$ by regressing $y_i$ on our approximation $\hat w' x_i$. The final moment equation that identifies $\beta_{index}$ is

$$E[w' x_i (y_i - w' x_i \beta_{index})] = 0. \tag{20}$$

As a result, by using the method of moments, we can identify $\theta_{index} = (\beta_{index}, w', \rho_2')'$. Since we are interested in the variance of $\hat\beta_{index}$, we derive its asymptotic variance by the method of moments. Using (20), (8), and (19), we get

$$h_i(\theta_{index}) = \begin{pmatrix} w' x_i (y_i - w' x_i \beta_{index}) \\ (x_{i2} - \rho_2 x_{i1})\, y_i \\ x_i x_i' w - (w' x_i)^2 \rho \end{pmatrix}, \tag{21}$$

with $E[h_i(\theta_{index})] = 0$. It has $1 + (k - 1) + k = 2k$ equations, and $1 + k + (k - 1) = 2k$

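as elements. Furthermore, let $X$ have rows $x_i'$ and let $y$ have elements $y_i$, $\forall i$. Then,

$$\frac{\partial h_i(\theta_{index})}{\partial \theta_{index}'} = \begin{pmatrix} -(w' x_i)^2 & y_i x_i' - 2\beta_{index} w' x_i x_i' & 0' \\ 0 & O & -x_{i1} y_i I_{k-1} \\ 0 & x_i x_i' - 2\rho w' x_i x_i' & -(w' x_i)^2 H \end{pmatrix}, \tag{22}$$

and

$$\bar G(\theta_{index}) = \frac{1}{n}\sum_{i=1}^{n} \frac{\partial h_i(\theta_{index})}{\partial \theta_{index}'} = \frac{1}{n}\begin{pmatrix} -w'X'Xw & y'X - 2\beta_{index} w'X'X & 0' \\ 0 & O & -y'Xe_1 I_{k-1} \\ 0 & X'X - 2\rho w'X'X & -w'X'Xw\,H \end{pmatrix}. \tag{23}$$

We assume $\bar G(\theta_{index})$ is non-singular, so that we can calculate its inverse. Then, we estimate $\Psi$,

$$\hat\Psi = \frac{1}{n}\sum_{i=1}^{n} h_i(\hat\theta_{index})\, h_i'(\hat\theta_{index}), \tag{24}$$

and we get the asymptotic variance of $\hat\theta_{index}$,

$$\widehat{\operatorname{avar}}(\hat\theta_{index}) = \frac{1}{n}\left[\bar G(\hat\theta_{index})\right]^{-1} \hat\Psi \left[\bar G'(\hat\theta_{index})\right]^{-1}. \tag{25}$$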
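The sandwich form in (24) and (25) is generic: it only needs the average Jacobian and the moment functions evaluated at the estimate. The sketch below is one way this could be coded; the function name `sandwich_avar` and its argument layout are assumptions, and the toy check uses the sample mean rather than the proxy model.

```python
import numpy as np

def sandwich_avar(G_bar, moment_rows):
    """Estimated asymptotic variance (1/n) G^{-1} Psi (G')^{-1}.

    G_bar       : (m, m) average Jacobian of the moment functions, as in (23).
    moment_rows : (n, m) array whose i-th row is h_i(theta_hat)'.
    """
    n = moment_rows.shape[0]
    psi_hat = moment_rows.T @ moment_rows / n      # equation (24)
    G_inv = np.linalg.inv(G_bar)
    return G_inv @ psi_hat @ G_inv.T / n           # equation (25)

# Toy check with a single moment h_i(mu) = x_i - mu, so that G = -1 and the
# formula reduces to the familiar variance of a sample mean.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=500)
mu_hat = x.mean()
avar_mu = sandwich_avar(np.array([[-1.0]]), (x - mu_hat)[:, None])
print(np.sqrt(avar_mu[0, 0]))                      # standard error of the mean
```

For the index method, `G_bar` would be filled in from (23) and `moment_rows` from (21); the standard error of $\hat\beta_{index}$ is then the square root of the first diagonal element.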

3.2 The coefficient method

This method uses $\hat\rho$ as weights to create a weighted sum of the coefficients $\hat\beta_0$, obtained by regressing the dependent variable $y_i$ on the proxies $x_i$. The intuition of this method is as follows. If one proxy has a weak relationship with the latent variable, for example the 4th proxy, then $\hat\rho_4$ would be low. Consequently, we would like the $\hat\beta_{04}$ of that proxy to contribute relatively less to our estimator of interest, $\hat\beta_{coef}$. That is, even though the relationship between $x_{i4}$ and $y_i$ may be strong, we tune this relationship down since

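Regressing $y_i$ on $x_i$ gives

$$\hat\beta_0 \xrightarrow{p} [E(x_i x_i')]^{-1} E(x_i y_i) = \left(\sigma_\xi^2 \rho\rho' + \Omega\right)^{-1}\sigma_\xi^2 \rho\beta = \Omega^{-1}\rho\,\frac{\sigma_\xi^2}{\sigma_\xi^2 \rho'\Omega^{-1}\rho + 1}\,\beta. \tag{26}$$

If we substitute (10), the optimal $w$ from the index method, into (9), and take $w'\rho = 1$, we find

$$\hat\beta_{index} \xrightarrow{p} \frac{\sigma_\xi^2}{\sigma_\xi^2 + (\rho'\Omega^{-1}\rho)^{-2}\rho'\Omega^{-1}\Omega\Omega^{-1}\rho}\,\beta = \rho'\Omega^{-1}\rho\,\frac{\sigma_\xi^2}{\sigma_\xi^2 \rho'\Omega^{-1}\rho + 1}\,\beta. \tag{27}$$

Hence, if we pre-multiply (26) by $\rho'$, we find that

$$\hat\beta_{coef} = \hat\rho'\hat\beta_0 \xrightarrow{p} \operatorname{plim} \hat\beta_{index}. \tag{28}$$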
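The equality of the two limits in (26) to (28) can be verified with a few lines of linear algebra. The sketch below evaluates both probability limits for assumed population values; the numbers are hypothetical and only serve to illustrate the identity.

```python
import numpy as np

rho = np.array([1.0, 0.5, 0.8])              # hypothetical loadings
omega = np.array([[1.0, 0.2, 0.0],
                  [0.2, 0.5, 0.1],
                  [0.0, 0.1, 2.0]])          # hypothetical Omega
sigma_xi2, beta = 1.5, 2.0                   # hypothetical sigma_xi^2 and beta

# Limit of the OLS coefficients of y on x, first equality in (26).
sxx = sigma_xi2 * np.outer(rho, rho) + omega
beta0_lim = np.linalg.solve(sxx, sigma_xi2 * rho * beta)

# Limit of the coefficient method: pre-multiply by rho'.
coef_lim = rho @ beta0_lim

# Limit of the index method with optimal weights, right-hand side of (27).
q = rho @ np.linalg.solve(omega, rho)
index_lim = q * sigma_xi2 / (sigma_xi2 * q + 1.0) * beta

print(coef_lim, index_lim, beta)
```

Both limits coincide and, for a positive $\beta$, lie below the true value, which is the remaining attenuation bias.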

The optimal combination of regression coefficients yields the same result in the limit as the optimal combination of proxies. So, the moment conditions that identify $\beta_0$ and $\beta_{coef}$ are

$$E[x_i x_i' \beta_0 - x_i y_i] = 0 \tag{29}$$

$$E[\beta_{coef} - \rho'\beta_0] = 0. \tag{30}$$

It may look strange that (30) is non-stochastic, but this is fine. The asymptotic variance of $\hat\beta_{coef}$ could be derived by using the delta method, but using moment equation (30) does practically the same. As a result, by using the method of moments, we can identify $\theta_{coef} = (\beta_{coef}, \beta_0', \rho_2')'$.

We can now derive the asymptotic variance of $\hat\beta_{coef}$ by using the method of moments. Using (29), (30), and (8), we get

$$h_i(\theta_{coef}) = \begin{pmatrix} x_i x_i' \beta_0 - x_i y_i \\ \beta_{coef} - \rho'\beta_0 \\ (x_{i2} - \rho_2 x_{i1})\, y_i \end{pmatrix}, \tag{31}$$

with $E[h_i(\theta_{coef})] = 0$. It has $k + 1 + (k - 1) = 2k$ equations and $1 + (k - 1) + k = 2k$ parameters.

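Then,

$$\frac{\partial h_i(\theta_{coef})}{\partial \theta_{coef}'} = \begin{pmatrix} 0 & x_i x_i' & O \\ 1 & -\rho' & -\beta_0' H \\ 0 & O & -x_{i1} y_i I_{k-1} \end{pmatrix}, \tag{32}$$

and

$$\bar G(\theta_{coef}) = \frac{1}{n}\sum_{i=1}^{n} \frac{\partial h_i(\theta_{coef})}{\partial \theta_{coef}'} = \frac{1}{n}\begin{pmatrix} 0 & X'X & O \\ n & -n\rho' & -n\beta_0' H \\ 0 & O & -y'Xe_1 I_{k-1} \end{pmatrix}. \tag{33}$$

We assume $\bar G(\hat\theta_{coef})$ is non-singular, so that we can calculate its inverse. Then, we estimate $\Psi$,

$$\hat\Psi = \frac{1}{n}\sum_{i=1}^{n} h_i(\hat\theta_{coef})\, h_i'(\hat\theta_{coef}), \tag{34}$$

and we get the asymptotic variance of $\hat\theta_{coef}$,

$$\widehat{\operatorname{avar}}(\hat\theta_{coef}) = \frac{1}{n}\left[\bar G(\hat\theta_{coef})\right]^{-1} \hat\Psi \left[\bar G'(\hat\theta_{coef})\right]^{-1}. \tag{35}$$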

4 Empirical example

We apply the theory above to study the relationship between wealth and school enrollment in India, in continuation of Lubotsky & Wittenberg (2006). This study was initially done by Filmer & Pritchett (2001). They analyzed if children are more likely to be enrolled in school if their household is wealthier than others, by using survey data from the National Family Health Survey of India. They looked at Indian households covered by this survey in 1992 and 1993, with children aged between 6 and 14. The survey contained no direct information on the wealth of the household, so wealth is the unobserved variable. To estimate the effect of wealth on school enrollment, we require proxy variables for wealth. The National Family Health Survey does contain information on household assets, which are used as proxies for wealth. For example, a household that has a car could be wealthier than a household that does not have a car.

captured by the first principal component, which finds the linear combination of the 21 proxies that maximizes the explained variance. However, Lubotsky & Wittenberg (2006) argued that there is no reason to believe that the first principal component will maximize the predictive power of the assets, and found a way to minimize the attenuation bias.

We try to replicate the findings of Lubotsky & Wittenberg (2006) in this paper, and focus on the asymptotic variance. To see how large the asymptotic variance becomes in a relatively small sample, we use a total of 2,000 observations from the National Family Health Survey. Using the method of moments described in Section 3, we derive the true asymptotic variance of our estimate of interest, using both the coefficient and the index method. We will compare these to the ad hoc OLS variance and the bootstrap variance of each method. The ad hoc OLS variance is the variance of our estimator of interest, without taking into account that we are dealing with an unobserved variable. That is, this variance assumes that the unobserved variable is known perfectly. Even though the index and the coefficient method should lead to the same results asymptotically, we wonder if the results are similar in the finite sample. So we are not only analyzing how much larger the true asymptotic variance is over the ad hoc OLS variance, but we are also analyzing if one method may yield better results.

5 Data

The National Family Health Survey is part of the Demographic and Health Surveys (DHS) Program, which provides technical assistance to more than 300 surveys in over 90 countries. The National Family Health Survey conducted in 1992 and 1993 can be downloaded from the DHS website, and the questionnaire itself is also available there. Interviews

responses from 53,214 households with in total 109,973 children.

The dependent variable, school enrollment, is 1 if the child is enrolled in school, and 0 otherwise. We also have data on 21 household assets, the proxy variables. These variables have a value 1 if a household has a fridge, clock or watch, sewing machine, video recorder, radio, flush toilet, latrine toilet, livestock, separate kitchen, biomass as cooking fuel, and pump or open source water, as opposed to their household being connected to the water piping grid. If a particular asset is not available in the household, the value for that asset is 0. The only proxy variable that is not an indicator, rooms, is an integer variable that indicates the number of rooms in the household. The minimum number of rooms in a household is 1, and the maximum number of rooms in our sample is 34.

The dataset also contains information on general demographics of the household, which are used as covariates. We have an indicator for the gender of the head of the household, which is 1 if male, 0 if female, the age of the household head, an indicator for the gender of the child, the age of the child, the education level of the household head, and the log of the household size, which has values between 0.69 and 3.66. The descriptive statistics of our sample of 2,000 children can be found in the appendix.

6 Implementation

In this section, we discuss how to apply the theory of Section 3 to the empirical study. For both the coefficient and index method, we estimate the effect of wealth on school enrollment, $\hat\beta$, and its variance in several ways. We first calculate the variance without taking into account that this estimate is based on another estimate, $\hat\rho$. This variance is then compared to the asymptotic variance of $\hat\beta$, which takes the random nature of $\hat\rho$ into account. We expect the latter to be larger, since it includes the extra uncertainty of $\hat\rho$. The comparison between the two variances answers the question of how much larger the true variance becomes when using multiple proxies in a finite sample. We also analyze how accurately the bootstrap approximates the asymptotic variance. The bootstrap variance should take the variance of $\hat\rho$ into account, so we expect it to be similar to the asymptotic variance.

6.1 Implementing $\hat\rho$

The estimator that measures the relationship between the assets and wealth, $\hat\rho$, is obtained the same way for both methods. Solving the sample counterpart of (8) gives

$$\hat\rho_2 = \frac{\sum_{i=1}^{n} x_{i2}\, y_i}{\sum_{i=1}^{n} x_{i1}\, y_i}. \tag{36}$$
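A minimal sketch of (36) is given below, applied to simulated data because the survey data are not reproduced here. The function name `estimate_rho`, the simulated design and all parameter values are assumptions for illustration; the variables are generated with mean zero, in line with the demeaned model.

```python
import numpy as np

def estimate_rho(X, y):
    """Sample counterpart of (8): rho_hat_j = sum_i x_ij y_i / sum_i x_i1 y_i.

    The first element equals 1 by construction (normalization rho_1 = 1).
    """
    xy = X.T @ y
    return xy / xy[0]

# Simulated, mean-zero data with hypothetical parameter values.
rng = np.random.default_rng(42)
n = 100_000
rho_true = np.array([1.0, 0.5, 0.8])
xi = rng.normal(size=n)                                   # latent variable
X = np.outer(xi, rho_true) + rng.normal(size=(n, 3)) * np.sqrt([1.0, 0.5, 2.0])
y = 2.0 * xi + rng.normal(size=n)

rho_hat = estimate_rho(X, y)
print(np.round(rho_hat, 2))
```

Each element is simply a ratio of two sample covariances with the dependent variable, which is why no information on $\Omega$ is needed at this stage.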

Table 1: Estimates and their standard error.

Asset                                  $\hat\rho$   s.e.($\hat\rho$)   $\hat\beta_0$   s.e.($\hat\beta_0$)   First PC
Number of rooms                         1.00        -                   0.007*          0.004                  0.16
Clock                                   0.41***     0.050               0.108***        0.025                  0.25
Bicycle                                 0.19***     0.034               0.034*          0.020                  0.12
Radio                                   0.30***     0.039               0.024           0.021                  0.23
TV                                      0.29***     0.036               0.036           0.026                  0.32
Sewing machine                          0.26***     0.034               0.044**         0.022                  0.25
Motorcycle                              0.14***     0.018              -0.019           0.026                  0.26
Fridge                                  0.13***     0.017              -0.010           0.027                  0.29
Car                                     0.03***     0.005              -0.017           0.036                  0.16
Drinking water from pump               -0.17***     0.031               0.054           0.052                 -0.21
Drinking water from open source        -0.04*       0.020               0.004           0.062                 -0.04
Non-drinking water from pump           -0.16***     0.033              -0.051           0.051                 -0.17
Non-drinking water from open source    -0.05**      0.024              -0.008           0.057                 -0.06
Has access to flush toilet              0.25***     0.032               0.110***        0.026                  0.29
Has access to latrine                   0.12***     0.022               0.120***        0.026                  0.02
Electric lighting                       0.41***     0.052               0.127***        0.026                  0.26
Kitchen                                 0.31***     0.038               0.101***        0.021                  0.15
Biomass for cooking                    -0.21***     0.031               0.002           0.026                 -0.28
Fan                                     0.37***     0.046               0.019           0.027                  0.32
Video recorder                          0.05***     0.008              -0.019           0.032                  0.19
Livestock                              -0.15***     0.036              -0.007           0.022                 -0.16

*** p < 0.01, ** p < 0.05, * p < 0.1

of $\hat\rho$ in Table 1 show that most assets contribute significantly to the latent variable at the 1% significance level.

6.2 Implementing the coefficient method

Solving the sample counterpart of (29) gives

$$\hat\beta_0 = (X'X)^{-1}X'y. \tag{37}$$

Most assets do not contribute significantly to school enrollment, as can be seen in Table 1. Having a clock, a flush toilet, a latrine or electric lighting has the largest effect on school enrollment. All assets are jointly significant on a 1% significance level.

Solving the sample counterpart of (30) gives $\hat\beta_{coef} = \hat\rho'\hat\beta_0 = 0.205$. If a household has

school. To see how accurate the estimate is, we calculate its variance. The ad hoc OLS variance, not taking the uncertainty of $\hat\rho$ into account, is

$$\widehat{\operatorname{Var}}(\hat\beta_{coef}) = \hat\sigma^2 \hat\rho'(X'X)^{-1}\hat\rho. \tag{38}$$

When taking the square root of $\widehat{\operatorname{Var}}(\hat\beta_{coef})$, the standard error is 0.010, or 1.0%. This is quite low relative to the estimate of 21%, given that only 2,000 observations are used. However, the variance of $\hat\rho$ has not yet been taken into account: $\hat\rho$ has been treated as known.
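The point estimate and the ad hoc standard error in (38) could be computed along the following lines. This is a sketch on simulated data with hypothetical parameter values; the function name `coefficient_method` is invented here, and estimating $\hat\sigma^2$ as the mean squared OLS residual is an assumption, since the text does not define it.

```python
import numpy as np

def coefficient_method(X, y):
    """Return beta_coef = rho_hat' beta0_hat and the ad hoc s.e. from (38)."""
    xy = X.T @ y
    rho_hat = xy / xy[0]                          # equation (36)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta0_hat = xtx_inv @ xy                      # equation (37)
    beta_coef = rho_hat @ beta0_hat               # equation (28)
    resid = y - X @ beta0_hat
    sigma2_hat = resid @ resid / len(y)           # assumed definition of sigma_hat^2
    se_adhoc = np.sqrt(sigma2_hat * (rho_hat @ xtx_inv @ rho_hat))
    return beta_coef, se_adhoc

# Simulated, mean-zero data with hypothetical parameter values.
rng = np.random.default_rng(7)
n = 100_000
rho_true = np.array([1.0, 0.5, 0.8])
xi = rng.normal(size=n)
X = np.outer(xi, rho_true) + rng.normal(size=(n, 3)) * np.sqrt([1.0, 0.5, 2.0])
y = 2.0 * xi + rng.normal(size=n)

beta_coef, se_adhoc = coefficient_method(X, y)
print(f"beta_coef = {beta_coef:.3f}, ad hoc s.e. = {se_adhoc:.4f}")
```

In this simulated design the limit of $\hat\beta_{coef}$ from (27) is about 1.29 against a true $\beta$ of 2, so some attenuation remains even with the optimal combination.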

One calculates $\widehat{\operatorname{avar}}(\hat\theta_{coef})$ by implementing (33) and computing (35). The standard error of $\hat\beta_{coef}$ is then obtained by taking the square root of the first element of $\widehat{\operatorname{avar}}(\hat\theta_{coef})$, which is 0.025. This standard error incorporates the variance of $\hat\rho$, and we observe that the standard error is more than twice the OLS standard error. This increase in standard error is exactly what we expected when extra parameter uncertainty is included. Considering the relative magnitude of $\hat\beta_{coef}$, the standard error is still relatively small. We can say that with 95% confidence, the effect of wealth, or having an extra room, on school enrollment is between 16% and 25%.
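The interval quoted above follows from the reported estimate and standard error. The short calculation below assumes the usual normal critical value of 1.96, which the text does not state explicitly.

```python
# Reported point estimate and method-of-moments standard error.
beta_coef_hat = 0.205
se_mm = 0.025
z_crit = 1.96                      # assumed two-sided 95% normal critical value

lower = beta_coef_hat - z_crit * se_mm
upper = beta_coef_hat + z_crit * se_mm
print(f"95% interval: [{lower:.3f}, {upper:.3f}]")   # [0.156, 0.254]
```

Rounded to whole percentages this gives the interval of 16% to 25%.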

By bootstrapping, the uncertainty of both $\hat\rho$ and $\hat\beta_{coef}$ can be simulated. One can perform the bootstrap procedure as follows. From the original sample of 2,000 children, create a bootstrap sample by drawing 2,000 times with replacement. From this bootstrap sample, calculate $\hat\rho$, $\hat\beta_0$ and $\hat\beta_{coef}$ by (36), (37), and (28) respectively. We store these estimates and repeat the procedure. In total, we do this 2,000 times, resulting in 2,000 values for $\hat\beta_{coef}$, which represent a distribution of $\hat\beta_{coef}$. This is depicted in Figure 1.
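The bootstrap procedure described above can be sketched as follows. The data are simulated with hypothetical parameter values, the function names are invented for this example, and the number of replications is reduced to 500 to keep the run short; the study itself uses 2,000.

```python
import numpy as np

def beta_coef(X, y):
    """Coefficient-method estimate rho_hat' beta0_hat, using (36), (37), (28)."""
    xy = X.T @ y
    rho_hat = xy / xy[0]
    beta0_hat = np.linalg.solve(X.T @ X, xy)
    return rho_hat @ beta0_hat

def bootstrap(X, y, estimator, n_boot=2000, seed=0):
    """Nonparametric bootstrap: resample rows with replacement and re-estimate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # n draws with replacement
        draws[b] = estimator(X[idx], y[idx])      # rho_hat is re-estimated each time
    return draws

# Simulated sample of 2,000 observations with hypothetical parameters.
rng = np.random.default_rng(3)
n = 2000
rho_true = np.array([1.0, 0.5, 0.8])
xi = rng.normal(size=n)
X = np.outer(xi, rho_true) + rng.normal(size=(n, 3)) * np.sqrt([1.0, 0.5, 2.0])
y = 2.0 * xi + rng.normal(size=n)

draws = bootstrap(X, y, beta_coef, n_boot=500)
boot_mean, boot_se = draws.mean(), draws.std(ddof=1)
print(f"bootstrap mean = {boot_mean:.3f}, bootstrap s.e. = {boot_se:.3f}")
```

Because $\hat\rho$ is recomputed in every bootstrap sample, the spread of `draws` reflects the uncertainty of both estimation steps.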

It can be seen in Figure 1 that the distribution of the bootstrapped $\hat\beta_{coef}$ is a bit skewed to the right. The mean of the bootstrapped $\hat\beta_{coef}$ is 0.22, and its standard error is 0.028.

Figure 1: Histogram of bootstrapping $\hat\beta_{coef}$ (horizontal axis: bcoef; vertical axis: frequency).

Table 2: Standard errors using the coefficient method.

Estimation method     Standard error
OLS                   0.010
Method of moments     0.025

can conclude that when taking the uncertainty of $\hat\rho$ into account, wealth has a significant effect on school enrollment in India, not including covariates. The difference between the OLS standard error and the true asymptotic standard error is relatively large. It may be worthwhile to take the uncertainty of using multiple proxies into account. The bootstrap standard error is somewhat larger than the asymptotic standard error. This implies that bootstrapping does reflect the extra uncertainty, but it gives the uncertainty a slightly higher value.

6.3 Implementing the index method

By solving the sample counterpart of (15), and rescaling such that $\hat w'\hat\rho = 1$, we find optimal weights $\hat w$. Constructing the index $X\hat w$ and solving the sample counterpart of (20) gives $\hat\beta_{index} = 0.205$. This is exactly the same effect as found with the coefficient method. The variance, not taking the uncertainty of $\hat\rho$ into account, is

$$\widehat{\operatorname{Var}}(\hat\beta_{index}) = \frac{1}{n}\left(y - X\hat w\hat\beta_{index}\right)'\left(y - X\hat w\hat\beta_{index}\right)\left(\hat w'X'X\hat w\right)^{-1}. \tag{39}$$

The resulting standard error is 0.010, which is exactly the same as the standard error obtained by the coefficient method when the extra uncertainty is not taken into account. The variance of $\hat\rho$ is taken into account by calculating $\widehat{\operatorname{avar}}(\hat\theta_{index})$ using (25). The resulting standard error of $\hat\beta_{index}$ is 0.025, which is more than twice the OLS standard error.

The index method can also be bootstrapped to simulate the uncertainty of both $\hat\rho$ and $\hat\beta_{index}$. The procedure is very similar to the bootstrap procedure of the coefficient method. Draw from the original sample 2,000 observations with replacement. From this bootstrap sample, calculate a new $\hat\rho$, make $\hat w$, and regress $y$ from the bootstrap sample on $X\hat w$ from the bootstrap sample. We store the resulting estimate, and repeat this 2,000 times. This gives a distribution of $\hat\beta_{index}$, which is depicted in Figure 2.
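The estimation step that is repeated inside this bootstrap can be sketched as below. The function name `index_method` and the simulated data are assumptions for illustration. The final lines compare the result with the coefficient-method estimate computed on the same sample; the two agree up to floating-point error, because $X'y$ is proportional to $\hat\rho$ by construction of (36).

```python
import numpy as np

def index_method(X, y):
    """Index-method estimate of beta with the ad hoc standard error from (39)."""
    xy = X.T @ y
    rho_hat = xy / xy[0]                          # equation (36)
    w = np.linalg.solve(X.T @ X, rho_hat)         # sample counterpart of (15), up to scale
    w_hat = w / (w @ rho_hat)                     # rescale so that w_hat' rho_hat = 1
    index = X @ w_hat                             # approximation of the latent variable
    beta_index = (index @ y) / (index @ index)    # sample counterpart of (20)
    resid = y - index * beta_index
    se_adhoc = np.sqrt(resid @ resid / len(y) / (index @ index))   # equation (39)
    return beta_index, se_adhoc, w_hat, rho_hat

# Simulated sample with hypothetical parameters.
rng = np.random.default_rng(11)
n = 2000
rho_true = np.array([1.0, 0.5, 0.8])
xi = rng.normal(size=n)
X = np.outer(xi, rho_true) + rng.normal(size=(n, 3)) * np.sqrt([1.0, 0.5, 2.0])
y = 2.0 * xi + rng.normal(size=n)

beta_index, se_index, w_hat, rho_hat = index_method(X, y)

# Coefficient-method estimate on the same data, for comparison.
beta_coef_check = rho_hat @ np.linalg.solve(X.T @ X, X.T @ y)
print(beta_index, beta_coef_check)
```

Passing a wrapper such as `lambda X, y: index_method(X, y)[0]` to a resampling loop would give the bootstrap distribution described above.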

The distribution of $\hat\beta_{index}$ by bootstrapping is skewed to the right; it has a mean of 0.22, and

Table 3: Comparing standard errors derived by different methods.

Estimation method     Coefficient method s.e.     Index method s.e.
OLS                   0.010                       0.010
Method of moments     0.025                       0.025
Bootstrap             0.028                       0.028

results in Table 3. It is clear that in the finite sample, the coefficient method and the index method yield the same results.

7 Principal Components

As mentioned before, Filmer & Pritchett (2001) used the first principal component as a proxy for wealth. This is a similar approach to the index method, where an index is created using the first principal component as weights. This index captures the largest amount of information common to the proxies, and we assume that this information is wealth. The assets are scaled to have unit variance, and the first principal component is given in Table 1. Approximately 27% of the variance is explained by this component, and there is not one specific asset that contributes substantially to wealth. Having a television or a fan contributes the most, and using biomass for cooking contributes negatively to wealth. Hence, the first principal component finds similar relationships between the proxies and wealth compared to $\hat\rho$.

When regressing $y_i$ on the index created by using the first principal component, we find that the effect of wealth on school enrollment is $\hat\beta_{PC} = 0.14$. The corresponding OLS standard error is 0.008. When bootstrapping, we find a standard error of 0.007. Using the first principal component instead of the index method leads to a significantly lower estimate of the effect of wealth on school enrollment. This means that the attenuation bias of $\hat\beta_{PC}$ is larger than the attenuation bias of $\hat\beta_{index}$, which is what we expected.
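A principal-component index of this kind could be built as in the sketch below. It is an illustration on simulated data, not the code used in the study: the function name `first_pc_index`, the sign convention for the loadings and all parameter values are assumptions.

```python
import numpy as np

def first_pc_index(X):
    """Index from the first principal component of the standardized proxies."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # scale to unit variance
    corr = Xs.T @ Xs / (len(Xs) - 1)                    # sample correlation matrix
    eigval, eigvec = np.linalg.eigh(corr)               # eigenvalues in ascending order
    loadings = eigvec[:, -1]                            # vector for the largest eigenvalue
    if loadings.sum() < 0:                              # assumed sign convention
        loadings = -loadings
    share = eigval[-1] / eigval.sum()                   # share of variance explained
    return Xs @ loadings, loadings, share

# Simulated sample with hypothetical parameters.
rng = np.random.default_rng(5)
n = 2000
rho_true = np.array([1.0, 0.5, 0.8])
xi = rng.normal(size=n)
X = np.outer(xi, rho_true) + rng.normal(size=(n, 3)) * np.sqrt([1.0, 0.5, 2.0])
y = 2.0 * xi + rng.normal(size=n)

pc_index, loadings, share = first_pc_index(X)
y_c = y - y.mean()
beta_pc = (pc_index @ y_c) / (pc_index @ pc_index)      # OLS slope of y on the index
print(f"share explained = {share:.2f}, beta_pc = {beta_pc:.3f}")
```

Note that the scale of this index differs from the scale of the index $X\hat w$, which is normalized to the first proxy, so the magnitude of `beta_pc` in this toy example is not directly comparable with the other estimates without a common normalization.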

8 Asymptotic variance with covariates

We expand the model in Section 3 by adding $s$ covariates $z_i$,

$$y_i = \xi_i \beta + z_i'\gamma + \varepsilon_i \tag{40}$$

$$x_i = \rho \xi_i + v_i, \tag{41}$$

where we add that $\varepsilon_i$ is uncorrelated with $z_i$ and $v_i$. Taking $\tilde y_i = y_i - z_i'\gamma$ results in equations similar to (6) and (7). This gives us a different moment equation for $\rho$,

$$E[(x_{i2} - \rho_2 x_{i1})\, \tilde y_i] = 0. \tag{42}$$

Section 6 showed that the index method and the coefficient method lead to the same results in the finite sample, so we will only derive the asymptotic variance using the coefficient method here. Let $Z$ have rows $z_i'$, for $i = 1, \ldots, n$. We have two moment equations that identify $\beta_0$ and $\gamma$ using the coefficient method,

$$E(x_i \varepsilon_i) = E(x_i y_i - x_i z_i'\gamma - x_i x_i'\beta_0) = 0 \tag{43}$$

$$E(z_i \varepsilon_i) = E(z_i y_i - z_i z_i'\gamma - z_i x_i'\beta_0) = 0. \tag{44}$$

We have $k + s$ unknown parameters and $k + s$ equations. Obtaining the parameters of interest from (43) and (44) boils down to regressing $y$ on $X$ and $Z$ simultaneously,

$$\begin{pmatrix} \hat\beta_0 \\ \hat\gamma \end{pmatrix} = \left[\begin{pmatrix} X' \\ Z' \end{pmatrix}\begin{pmatrix} X & Z \end{pmatrix}\right]^{-1}\begin{pmatrix} X' \\ Z' \end{pmatrix} y. \tag{45}$$

Using $\hat\gamma$ to calculate $\tilde y_i$ allows us to calculate $\hat\rho$ by solving the sample counterpart of (42).

We can now estimate the effect of wealth on school enrollment when including covariates,

$$\hat\beta_{cov} = \hat\rho'\hat\beta_0 = 0.15. \tag{46}$$
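The estimation steps with covariates, (45), (42) and (46), can be sketched as follows. The simulated design, in which the covariates are independent of the latent variable, and the function name `coefficient_method_cov` are assumptions made to keep the example simple; they do not describe the survey data.

```python
import numpy as np

def coefficient_method_cov(X, Z, y):
    """Coefficient method with covariates: returns beta_cov, rho_hat, gamma_hat."""
    k = X.shape[1]
    W = np.hstack([X, Z])
    coef = np.linalg.solve(W.T @ W, W.T @ y)      # joint regression, equation (45)
    beta0_hat, gamma_hat = coef[:k], coef[k:]
    y_tilde = y - Z @ gamma_hat                   # remove the covariate effect
    xy = X.T @ y_tilde
    rho_hat = xy / xy[0]                          # sample counterpart of (42)
    return rho_hat @ beta0_hat, rho_hat, gamma_hat   # equation (46)

# Simulated, mean-zero data with hypothetical parameters.
rng = np.random.default_rng(21)
n = 100_000
rho_true = np.array([1.0, 0.5, 0.8])
gamma_true = np.array([0.5, -1.0])
xi = rng.normal(size=n)
Z = rng.normal(size=(n, 2))                       # covariates, independent of xi here
X = np.outer(xi, rho_true) + rng.normal(size=(n, 3)) * np.sqrt([1.0, 0.5, 2.0])
y = 2.0 * xi + Z @ gamma_true + rng.normal(size=n)

beta_cov, rho_hat, gamma_hat = coefficient_method_cov(X, Z, y)
print(f"beta_cov = {beta_cov:.3f}", np.round(gamma_hat, 2))
```

In this design the covariates do not overlap with the latent variable, so the estimate of $\beta$ has the same limit as without covariates; in the empirical application the covariates are related to wealth, which is one reason the estimate changes.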

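Having identified $\theta_{cov} = (\beta_{cov}, \beta_0', \rho_2', \gamma')'$, we derive the asymptotic variance. Using (43), (46), (42), and (44), we get

$$h_i(\theta_{cov}) = \begin{pmatrix} x_i x_i'\beta_0 - x_i y_i + x_i z_i'\gamma \\ \beta_{cov} - \rho'\beta_0 \\ (x_{i2} - \rho_2 x_{i1})(y_i - z_i'\gamma) \\ z_i x_i'\beta_0 - z_i y_i + z_i z_i'\gamma \end{pmatrix}, \tag{47}$$

with $E[h_i(\theta_{cov})] = 0$. It has $k + 1 + (k - 1) + s = 2k + s$ equations and $1 + (k - 1) + k + s = 2k + s$ parameters. Then,

$$\frac{\partial h_i(\theta_{cov})}{\partial \theta_{cov}'} = \begin{pmatrix} 0 & x_i x_i' & O & x_i z_i' \\ 1 & -\rho' & -\beta_0' H & 0' \\ 0 & O & -x_{i1}(y_i - z_i'\gamma) I_{k-1} & -x_{i2} z_i' + \rho_2 x_{i1} z_i' \\ 0 & z_i x_i' & O & z_i z_i' \end{pmatrix}, \tag{48}$$

and

$$\bar G(\theta_{cov}) = \frac{1}{n}\begin{pmatrix} 0 & X'X & O & X'Z \\ n & -n\rho' & -n\beta_0' H & 0' \\ 0 & O & -(y - Z\gamma)'Xe_1 I_{k-1} & -H'X'Z + \rho_2 e_1'X'Z \\ 0 & Z'X & O & Z'Z \end{pmatrix}. \tag{49}$$

We assume $\bar G(\hat\theta_{cov})$ is non-singular, so that we can calculate its inverse. Then, we estimate $\Psi$,

$$\hat\Psi = \frac{1}{n}\sum_{i=1}^{n} h_i(\hat\theta_{cov})\, h_i'(\hat\theta_{cov}), \tag{50}$$

and we get the asymptotic variance of $\hat\theta_{cov}$,

$$\widehat{\operatorname{avar}}(\hat\theta_{cov}) = \frac{1}{n}\left[\bar G(\hat\theta_{cov})\right]^{-1} \hat\Psi \left[\bar G'(\hat\theta_{cov})\right]^{-1}. \tag{51}$$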

wealth. The effect of these assets, first captured by wealth, is now captured by the covariates. As a result, we find a lower estimate for wealth on school enrollment and we get a less accurate estimate, which is reflected by the increased standard error.

Table 4: Standard errors using multiple proxies, covariates included.

Estimation method     Standard error
OLS                   0.013
Method of moments     0.029
Bootstrap             0.035

The standard error of $\hat\beta_{cov}$ using bootstrapping is 0.035; its histogram is shown in the appendix, and all standard errors using covariates are reported in Table 4. Using the asymptotic variance by the method of moments, we find that the effect of wealth on school enrollment is between 8.1% and 22%, with 95% confidence. Compared to the interval of 16% to 25% we found without using covariates, we see that covariates can have a substantial impact on the estimated effect of wealth.

9 Discussion

Having extended the research of Lubotsky & Wittenberg (2006) using a much smaller sample, we find very similar results. When including covariates, our estimate of 0.16 as the effect of wealth on school enrollment is not significantly different from their estimate of 0.19, using the true standard error and a 5% significance level. If we were to use the OLS standard error to compare the results, then our estimate would be significantly different from the one of Lubotsky & Wittenberg (2006). This is an example where going through the trouble of accounting for the variance of ˆρ can change the outcome of an empirical study.

standard error of ˆβ using the method of moments is only 0.025 when not using covariates. So, the attenuation bias is significantly decreased using the index or coefficient method instead of the first principal component, even on a 1% significance level. The answer to our main question is that the asymptotic variance is still relatively low when using a smaller sample size. The variance by bootstrapping is a good approximation of this asymptotic variance. The standard error obtained by bootstrapping is somewhat larger than the true standard error, so we do recommend that researchers try to implement the method of moments to obtain a more correct result. Either the index method or the coefficient method can be used, as both will yield the same results.

The study of Filmer & Pritchett (2001) may not be ideal for studying the effect of multiple proxies. The first principal component indicates that one component only explains 27% of the variance. So, there are many other factors than wealth that explain the assets in households. It is debatable if wealth is the largest factor in this case. There is not one asset that clearly contributes to the first principal component. We do see a clear negative effect of using biomass, having livestock and getting water from outside sources. These are assets that are common to rural areas, as opposed to urban areas. Hence, we may be measuring if the household is in an urban area instead of measuring wealth. Suppose only one proxy is used to approximate wealth, say, having a clock. This would result in a somewhat inaccurate estimate of wealth, with a large attenuation bias. If we were to use another proxy, for instance, having livestock, which is uncommon in urban areas, we are not only approximating wealth, but also urbanization. As a result, this may lead to a higher estimate, not due to a reduction in attenuation bias, but due to measuring a different effect. We wonder how the attenuation bias and the asymptotic variance will behave when using proxies that better explain the unobserved variable.

in our model can be expressed as transformations of the reduced-form parameters. Then, we should be able to estimate these parameters using the index or coefficient method, which take the binary response into account.

10 Conclusion

This paper has analyzed the effect of using multiple proxies in empirical studies. It focused in particular on calculating the asymptotic variance by using the method of moments, which takes into account that the approximation of the unobserved variable also has a variance. Using a small sample from a study about the effect of wealth on school enrollment in India, we introduced two methods to approximate the unobserved variable such that the attenuation bias is minimized. The coefficient method approximates the effect of the unobserved variable, whereas the index method approximates the unobserved variable itself. Using a small sample, the asymptotic variance is small enough for this implementation of multiple proxies to be worthwhile, compared to using the principal component method. The coefficient and the index method did not yield different results in the finite sample. The standard error more than doubled when taking the variance of the unobserved variable into account, compared to only using the OLS standard error. Bootstrapping tends to estimate an even higher standard error. We conclude that using the asymptotic variance gives a more correct, yet less precise, representation of the estimator when using multiple proxies in empirical studies.

11 References

Black, D. A., & Smith, J. A. (2006). Estimating the returns to college quality with multiple proxies for quality. Journal of Labor Economics, 24(3), 701-728.

Filmer, D., & Pritchett, L. (2001). Estimating Wealth Effects without Expenditure Data-or Tears: An Application to Educational Enrollments in States of India. Demography, 38(1), 115-132.

Lubotsky, D., & Wittenberg, M. (2006). Interpretation of regressions with multiple proxies. The Review of Economics and Statistics, 88(3), 549-562.

Meijer, E., T.J. Wansbeek and R.E. Wessels (2018), Multiple proxies, Working paper, University of Groningen.

Wang, L. (2002). A simple adjustment for measurement errors in some limited dependent variable models. Statistics & Probability Letters, 58(4), 427-433.

12 Appendix

Figure 3: Histogram of bootstrapping $\hat\beta_{cov}$.
