
Many and weak instruments and nonlinearity


Academic year: 2021


J. van Essen


Master’s Thesis


Jelle van Essen

July 17, 2015

Abstract

Previous research establishes that the 2SLS estimator for the linear model is biased when there are many or weak instruments, and that the LIML estimator is a better alternative. The present study uses simulation to assess whether these findings generalize to nonlinear models. The performance indicators considered are median bias, quantiles of the estimates, and rejection rates. A distinction is made between models that are nonlinear only in the parameters, only in the endogenous variables, or in both. The dominance of LIML over 2SLS is found not to extend unconditionally to nonlinear models: it is observed only in the simulated model that is nonlinear solely in its endogenous regressor. It is unclear whether the loss of LIML's superiority over 2SLS is due to nonlinearity in the parameters or to the numerical optimization algorithm that such nonlinearity requires. Moreover, an increase in the number of instruments or a decrease in instrument strength is found to harm the performance of both 2SLS and LIML.

1 Introduction

Econometric research often considers situations of unclear causality in which a model specification with multiple endogenous variables is appropriate. Estimation in these models is complicated, since standard estimators such as ordinary least squares (OLS) and its derivatives are inconsistent. The most common approach is then to use instrumental variable (IV) estimators.


estimator in the derivation of the asymptotic distribution of the limited information maximum likelihood (LIML) estimator. While the use of 2SLS is widespread, it has been noted that it yields biased estimates, particularly in situations where there are many or weak instruments (Anderson, Kunitomo, and Sawa, 1982; Bekker, 1994; Staiger and Stock, 1997); such situations are common in applied work.

An alternative estimator is the aforementioned LIML estimator (Anderson and Rubin, 1949). Assuming a homoskedastic error term distribution, LIML strongly outperforms 2SLS in terms of bias and rejection rates. Bekker (1994) finds that LIML's performance is further enhanced by using standard errors based on alternative asymptotic theory that yields a better approximation of LIML's finite-sample performance; these standard errors are therefore often referred to as Bekker standard errors.

The extensions of 2SLS and LIML to the nonlinear model are due to Amemiya (1974, 1975). Remarkably, he finds that LIML is asymptotically more efficient than 2SLS in the nonlinear model, unlike in the linear model, where 2SLS and LIML are equally efficient asymptotically. The nonlinear 2SLS or IV estimator has been applied moderately often in empirical work, and has recently regained popularity after several years of infrequent use. It is included in popular econometrics textbooks such as Cameron and Trivedi (2005) and Davidson and MacKinnon (2009). As in the linear case, the popularity of 2SLS outshines that of LIML: to the author's knowledge, Amemiya's nonlinear LIML estimator has not been applied in published work. Neither nonlinear LIML nor nonlinear 2SLS is included in Stata, a popular econometric software package.

While the bias of two-stage least squares in linear models with many or weak instruments has been discussed extensively, to the author's best knowledge there have been no inquiries into the behaviour of nonlinear 2SLS and LIML under these circumstances, nor have Bekker standard errors been derived for nonlinear LIML. The only simulation study known to the author that considers the nonlinear 2SLS estimator is by Brännäs and Melkersson, who apply the estimator in a binomial regression model and assess only the effects of sample size and error term variance on estimator performance.


be nonlinear in only the parameters, in only the variables, or in both. The simulation study in this thesis considers one model from each category, or three in total. The choice of models is inspired by previous applications of nonlinear 2SLS: they are the most frequently estimated model types.

While models that are nonlinear only in the variables can, technically speaking, be estimated using linear 2SLS, care should be taken if the model is nonlinear in the endogenous regressors. First, the first-stage regression should have the nonlinear transformation of the regressor as the left-hand side, rather than the regressor itself. Second, polynomials of the base instruments should be included as instruments (Kelejian, 1971; Amemiya, 1974). This latter point implies that the problem of many instruments is particularly relevant to models that are nonlinear in the endogenous variables: the number of instruments is increased by a factor equal to the degree of the polynomials.

The considered estimators are presented in section 2. Section 3 elaborates on applications of the estimators in empirical work and the types of models that they cover. Section 4 explains the general setup of the simulation study, while section 5 discusses which standard errors should be used for nonlinear LIML in the simulations, in lieu of Bekker standard errors. The results of the simulations are given in section 6 and section 7 concludes.

2 Estimators

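Consider the system

$$y_i = f(x_i, \beta) + \varepsilon_i, \qquad (1)$$
$$x_i = \Pi' z_i + u_i, \qquad (2)$$
$$\begin{pmatrix} \varepsilon_i \\ u_i \end{pmatrix} \sim N(0, \Sigma), \qquad (3)$$

$i = 1, 2, \ldots, N$, where $y_i$ is scalar, $x_i$ is a vector of length $L$, $\beta$ is a vector of length $l$, $f$ is a function that is potentially nonlinear in both $x_i$ and $\beta$, $\Pi$ is a $K \times L$ matrix, $z_i$ is a vector

of length $N$ with elements $f(x_i, \beta)$, and
$$\Sigma = \begin{pmatrix} \sigma_\varepsilon^2 & \sigma_{u\varepsilon}' \\ \sigma_{u\varepsilon} & \Sigma_{uu} \end{pmatrix}.$$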

2.1 Two-stage least squares

The nonlinear two-stage least squares (NL2SLS) estimator of β is given by the vector of estimates that minimizes

$$(y - f)' D (D'D)^{-1} D' (y - f), \qquad (4)$$

with D any matrix that satisfies:

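1. $\lim N^{-1} D'D$ exists and is nonsingular;
2. $\operatorname{plim} N^{-1} \frac{\partial f'}{\partial \beta} D$ is a constant matrix of rank $l$;
3. $\operatorname{plim} N^{-1} \frac{\partial^2 f'}{\partial \beta_i \, \partial \beta} D$ is a constant matrix for $i = 1, 2, \ldots, l$.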

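In particular, we consider the estimator for $D = Z$, which Amemiya (1974, 1975) refers to as the standard NL2SLS estimator. Its asymptotic covariance matrix is

$$V_S = \sigma_\varepsilon^2 \operatorname{plim} N H^{-1}, \qquad (5)$$

where $H = \frac{\partial f'}{\partial \beta} P_Z \frac{\partial f}{\partial \beta'}$ with $P_Z = Z(Z'Z)^{-1}Z'$.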
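The objective (4) with D = Z has no closed-form minimizer when f is nonlinear in β. Below is a minimal sketch of one way to compute it in Python with NumPy. It assumes a single endogenous regressor and the logistic specification f(x, β) = e^(β0 + β1 x) / (1 + e^(β0 + β1 x)) that is used later for the third model, and it uses a Gauss-Newton iteration with step halving. That optimizer is an illustrative choice, not necessarily the one used in this thesis. The function names `logistic_f`, `logistic_jac` and `nl2sls` and the data-generating values are hypothetical. The returned covariance σ̂²Ĥ⁻¹, with Ĥ evaluated at the estimate, is intended as a finite-sample counterpart of (5).

```python
import numpy as np

def logistic_f(x, beta):
    """Illustrative f(x, beta) = logistic(beta[0] + beta[1] * x)."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))

def logistic_jac(x, beta):
    """N x 2 matrix of derivatives df/dbeta' of logistic_f."""
    p = logistic_f(x, beta)
    w = p * (1.0 - p)
    return np.column_stack([w, w * x])

def nl2sls(y, x, Z, beta0, max_iter=200, tol=1e-10):
    """Minimize (y - f)' P_Z (y - f), i.e. objective (4) with D = Z, by Gauss-Newton."""
    ZtZ = Z.T @ Z

    def quad(A, B):
        # A' P_Z B, computed without forming the N x N projection matrix
        return (Z.T @ A).T @ np.linalg.solve(ZtZ, Z.T @ B)

    def objective(b):
        r = y - logistic_f(x, b)
        return quad(r, r)

    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = y - logistic_f(x, beta)
        J = logistic_jac(x, beta)
        step = np.linalg.solve(quad(J, J), quad(J, r))   # Gauss-Newton direction
        t = 1.0
        while objective(beta + t * step) > objective(beta) and t > 1e-8:
            t /= 2.0                                      # step halving
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    r = y - logistic_f(x, beta)
    J = logistic_jac(x, beta)
    sigma2 = r @ r / len(y)
    cov = sigma2 * np.linalg.inv(quad(J, J))              # sigma^2 H^{-1}, cf. (5)
    return beta, cov

# Illustrative data: one endogenous regressor, instruments plus their squares
rng = np.random.default_rng(0)
N = 2000
z1, z2 = rng.standard_normal(N), rng.standard_normal(N)
e, v = rng.standard_normal(N), rng.standard_normal(N)
x = z1 + 0.5 * z2 + 0.5 * e + v          # x depends on e, hence on the error 0.1 * e
y = logistic_f(x, [0.5, 1.0]) + 0.1 * e
Z = np.column_stack([np.ones(N), z1, z2, z1**2, z2**2])
beta_hat, cov = nl2sls(y, x, Z, beta0=[0.0, 0.0])
print(beta_hat, np.sqrt(np.diag(cov)))
```

Each Gauss-Newton step solves (J′P_Z J)δ = J′P_Z r, which is a linear IV regression of the current residual on ∂f/∂β′. The step halving only guards against overshooting far from the optimum.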

2.2 Limited information maximum likelihood

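The log-likelihood function for the model (1)–(3) is, up to an additive constant,

$$L_1 = -\frac{N}{2}\log|\Sigma| - \frac{1}{2}\operatorname{tr}\left(\Sigma^{-1} S\right), \qquad (6)$$

where
$$S = \begin{pmatrix} \varepsilon'\varepsilon & \varepsilon' U \\ U'\varepsilon & U'U \end{pmatrix}$$
and we write $\varepsilon$ for the vector with elements $\varepsilon_i$ and $U$ for the matrix with rows $u_i'$. Substitute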


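of $\Sigma$, and substituting this into (6) we obtain the concentrated log-likelihood function

$$L_2 = -\frac{N}{2}\log|S| = -\frac{N}{2}\left(\log \varepsilon'\varepsilon + \log|U' M_\varepsilon U|\right), \qquad (7)$$

where $M_\varepsilon = I - \varepsilon(\varepsilon'\varepsilon)^{-1}\varepsilon'$. Solving $\partial L_2 / \partial \Pi = 0$ yields the optimal value

$$\hat\Pi = (Z' M_\varepsilon Z)^{-1} Z' M_\varepsilon X \qquad (8)$$

of $\Pi$, and substituting this into (7) yields another concentrated log-likelihood function

$$L = -\frac{N}{2}\left(\log \varepsilon'\varepsilon + \log\left|X' M_\varepsilon X - X' M_\varepsilon Z (Z' M_\varepsilon Z)^{-1} Z' M_\varepsilon X\right|\right). \qquad (9)$$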

The nonlinear limited information maximum likelihood (NLLIML) estimator (Amemiya, 1975) is obtained by maximizing the expression (9) over β. However, the following iterative method proposed by Amemiya (1975) is computationally lighter than optimizing (9) directly. Note that (7) can be rewritten as

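$$L_2 = -\frac{N}{2}\log|S| = -\frac{N}{2}\left(\log \varepsilon' M_U \varepsilon + \log|U'U|\right), \qquad (10)$$

where $M_U = I - U(U'U)^{-1}U'$. Therefore, the NLLIML estimator of $\beta$ can alternatively be obtained by iteratively computing $\hat\Pi$ as in (8), substituting it for $\Pi$ in

$$(y - f)' M_U (y - f), \qquad (11)$$

and minimizing (11) over $\beta$. This method requires an initial value for either $\beta$ or $\Pi$; a natural choice is to use the OLS estimator $\tilde\Pi = (Z'Z)^{-1}Z'X$ as an initial value for $\Pi$. The asymptotic covariance matrix of NLLIML is given by

$$V_L = \operatorname{plim} N \left[\frac{1}{\tilde\sigma^2} G - \left(\frac{1}{\tilde\sigma^2} - \frac{1}{\sigma_\varepsilon^2}\right) H\right]^{-1}, \qquad (12)$$

where $\tilde\sigma^2 = \sigma_\varepsilon^2 - \sigma_{u\varepsilon}' \Sigma_{uu}^{-1} \sigma_{u\varepsilon}$, $G = \frac{\partial f'}{\partial \beta} M_U \frac{\partial f}{\partial \beta'}$, and $H$ as defined earlier; see Amemiya (1975)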


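since $\operatorname{plim} N^{-1} D'U = 0$ and therefore $I - U(U'U)^{-1}U' - D(D'D)^{-1}D'$ is asymptotically idempotent. Specifically, $G \ge H$. Moreover, note that $V_S^{-1} = \frac{1}{\sigma_\varepsilon^2}\operatorname{plim} N^{-1} H$. We thus obtain

$$V_L^{-1} - V_S^{-1} = \operatorname{plim} \frac{1}{N \tilde\sigma^2}(G - H) \ge 0,$$

or $V_L \le V_S$.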
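Amemiya's iterative route to NLLIML can be sketched in the same illustrative setting. The sketch below assumes one endogenous regressor and a logistic f. Its helper names (`nlliml`, `minimize_11`, `annihilated`) are hypothetical, and it uses Gauss-Newton with step halving as the inner optimizer. It starts from the OLS value Π̃, minimizes (11), recomputes Π̂ from (8) with the new residuals, and repeats until β stops changing. The first pass, which uses Π̃, is the modified NL2SLS estimator of section 2.3. Quadratic forms in M_U and M_ε are evaluated through A′M_V B = A′B − A′V(V′V)⁻¹V′B, which avoids building N × N matrices.

```python
import numpy as np

def logistic_f(x, beta):
    # Illustrative f(x, beta) = logistic(beta[0] + beta[1] * x)
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))

def logistic_jac(x, beta):
    # N x 2 matrix df/dbeta'
    p = logistic_f(x, beta)
    return np.column_stack([p * (1 - p), p * (1 - p) * x])

def annihilated(A, B, V):
    """A' M_V B with M_V = I - V (V'V)^{-1} V', without forming an N x N matrix."""
    return A.T @ B - (V.T @ A).T @ np.linalg.solve(V.T @ V, V.T @ B)

def minimize_11(y, x, U, beta, iters=100):
    """Gauss-Newton with step halving for (y - f)' M_U (y - f), expression (11)."""
    def objective(b):
        r = y - logistic_f(x, b)
        return annihilated(r, r, U)
    for _ in range(iters):
        r, J = y - logistic_f(x, beta), logistic_jac(x, beta)
        step = np.linalg.solve(annihilated(J, J, U), annihilated(J, r, U))
        t = 1.0
        while objective(beta + t * step) > objective(beta) and t > 1e-8:
            t /= 2.0
        beta = beta + t * step
        if np.max(np.abs(t * step)) < 1e-10:
            break
    return beta

def nlliml(y, X, Z, beta0, outer=50):
    """Alternate between Pi-hat from (8) and minimizing (11); X holds the endogenous regressor."""
    x = X[:, 0]                                            # single endogenous regressor assumed
    Pi = np.linalg.solve(Z.T @ Z, Z.T @ X)                 # OLS start value Pi-tilde
    beta = minimize_11(y, x, X - Z @ Pi, np.asarray(beta0, dtype=float))
    beta_modified = beta.copy()                            # first pass: modified NL2SLS, (13)
    for _ in range(outer):
        eps = (y - logistic_f(x, beta))[:, None]
        Pi = np.linalg.solve(annihilated(Z, Z, eps), annihilated(Z, X, eps))   # (8)
        beta_new = minimize_11(y, x, X - Z @ Pi, beta)
        converged = np.max(np.abs(beta_new - beta)) < 1e-8
        beta = beta_new
        if converged:
            break
    return beta, beta_modified

# Illustrative data, same design as a toy NL2SLS example
rng = np.random.default_rng(0)
N = 2000
z1, z2 = rng.standard_normal(N), rng.standard_normal(N)
e, v = rng.standard_normal(N), rng.standard_normal(N)
x = z1 + 0.5 * z2 + 0.5 * e + v
y = logistic_f(x, [0.5, 1.0]) + 0.1 * e
Z = np.column_stack([np.ones(N), z1, z2, z1**2, z2**2])
beta_liml, beta_mod = nlliml(y, x[:, None], Z, beta0=[0.0, 0.0])
print(beta_liml, beta_mod)
```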

2.3 Modified nonlinear two-stage least squares

Section 2.2 describes how the NLLIML estimator of $\beta$ can be obtained by means of an iterative method. The first step in this method is to substitute the OLS estimator $\tilde\Pi = (Z'Z)^{-1}Z'X$ of $\Pi$ into (11), which gives $U = X - Z\tilde\Pi = M_Z X$, and to minimize the resulting expression

$$(y - f)'\left[I - M_Z X (X' M_Z X)^{-1} X' M_Z\right](y - f) \qquad (13)$$

over $\beta$, where $M_Z = I - P_Z$. Amemiya (1975) refers to the estimator obtained by minimizing (13) over $\beta$ as the modified NL2SLS estimator. Note that it does not fit the form (4). In the linear case, an expression for the modified NL2SLS estimator is

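$$\left(X'\left[I - M_Z X (X' M_Z X)^{-1} X' M_Z\right] X\right)^{-1} X'\left[I - M_Z X (X' M_Z X)^{-1} X' M_Z\right] y,$$

but since $X'\left[I - M_Z X (X' M_Z X)^{-1} X' M_Z\right] = X' - X' M_Z = X' P_Z$, this reduces to $(X' P_Z X)^{-1} X' P_Z y$,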

which is the expression for the “standard”, linear 2SLS estimator. Thus, standard and modified NL2SLS are equivalent in the linear case.
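The linear-case equivalence rests on the identity X′[I − M_Z X(X′M_Z X)⁻¹X′M_Z] = X′P_Z. A quick numerical check with random matrices can confirm it. The data are illustrative only and not from the simulations.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 4
Z = np.column_stack([np.ones(N), rng.standard_normal((N, K))])   # instruments incl. constant
X = rng.standard_normal((N, 2))                                  # hypothetical regressors
y = rng.standard_normal(N)

P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
M_Z = np.eye(N) - P_Z
W = np.eye(N) - M_Z @ X @ np.linalg.solve(X.T @ M_Z @ X, X.T @ M_Z)   # bracket in (13)

beta_modified = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # linear modified 2SLS
beta_2sls = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ y)   # standard linear 2SLS
print(np.allclose(X.T @ W, X.T @ P_Z), np.allclose(beta_modified, beta_2sls))   # True True
```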

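The asymptotic covariance matrix of modified NL2SLS is

$$V_M = \operatorname{plim} N\, G^{-1}\left[\tilde\sigma^2 G + (\sigma_\varepsilon^2 - \tilde\sigma^2) H\right] G^{-1};$$

see Amemiya (1975) for a derivation. It holds that

$$\begin{aligned} V_M &= \operatorname{plim} N\, G^{-1}\left[\sigma_\varepsilon^2 G - (\sigma_\varepsilon^2 - \tilde\sigma^2)(G - H)\right] G^{-1} \\ &= \operatorname{plim} N\left[\sigma_\varepsilon^2 G^{-1} - (\sigma_\varepsilon^2 - \tilde\sigma^2)\, G^{-1}(G - H) G^{-1}\right] \\ &\le \operatorname{plim} N \sigma_\varepsilon^2 G^{-1} \le \operatorname{plim} N \sigma_\varepsilon^2 H^{-1} = V_S, \end{aligned}$$

where the inequalities follow from $\sigma_\varepsilon^2 \ge \tilde\sigma^2$ and $G \ge H$.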

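To show that $V_M \ge V_L$, note that

$$G^{-1}\left[\sigma_\varepsilon^2 G - (\sigma_\varepsilon^2 - \tilde\sigma^2)(G - H)\right] G^{-1} \ge \left[\frac{1}{\sigma_\varepsilon^2} G + \left(\frac{1}{\tilde\sigma^2} - \frac{1}{\sigma_\varepsilon^2}\right)(G - H)\right]^{-1}$$

is equivalent to

$$\left[\sigma_\varepsilon^2 I - (\sigma_\varepsilon^2 - \tilde\sigma^2)\Lambda\right]^{-1} \le \frac{1}{\sigma_\varepsilon^2} I + \left(\frac{1}{\tilde\sigma^2} - \frac{1}{\sigma_\varepsilon^2}\right)\Lambda,$$

where $\Lambda = G^{-1/2}(G - H)G^{-1/2}$; or, multiplying both sides by $\sigma_\varepsilon^2$,

$$\left[I - \frac{\sigma_\varepsilon^2 - \tilde\sigma^2}{\sigma_\varepsilon^2}\Lambda\right]^{-1} \le I + \frac{\sigma_\varepsilon^2 - \tilde\sigma^2}{\tilde\sigma^2}\Lambda.$$

Amemiya finds that this in turn is equivalent to

$$\left(1 - \frac{\sigma_\varepsilon^2 - \tilde\sigma^2}{\sigma_\varepsilon^2}\lambda\right)^{-1} \le 1 + \frac{\sigma_\varepsilon^2 - \tilde\sigma^2}{\tilde\sigma^2}\lambda \qquad (15)$$

holding for all eigenvalues $\lambda$ of $\Lambda$; both sides are functions of the same symmetric matrix $\Lambda$ and therefore share its eigenvectors. Since

$$1 + \frac{\sigma_\varepsilon^2 - \tilde\sigma^2}{\tilde\sigma^2}\lambda - \left(1 - \frac{\sigma_\varepsilon^2 - \tilde\sigma^2}{\sigma_\varepsilon^2}\lambda\right)^{-1} = \frac{(1 - \lambda)\tilde\sigma^2 + \lambda\sigma_\varepsilon^2}{\tilde\sigma^2} - \left(\frac{(1 - \lambda)\sigma_\varepsilon^2 + \lambda\tilde\sigma^2}{\sigma_\varepsilon^2}\right)^{-1} = (1 - \lambda) + \lambda\frac{\sigma_\varepsilon^2}{\tilde\sigma^2} - \left((1 - \lambda) + \lambda\frac{\tilde\sigma^2}{\sigma_\varepsilon^2}\right)^{-1},$$

the inequality (15) holds if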


For any scalars $a$ and $b$ with $ab > 0$, $\frac{a}{b} + \frac{b}{a} - 2 = \frac{(a - b)^2}{ab} \ge 0$.

Therefore, the inequality (16) holds if $0 \le \lambda \le 1$. Indeed, $\lambda$ is in this range: $\lambda \ge 0$ follows from $\Lambda \ge 0$, and using $H \ge 0$ we have that

$$\Lambda = G^{-1/2}(G - H)G^{-1/2} \le G^{-1/2} G G^{-1/2} = I,$$

which implies that no eigenvalue of $\Lambda$ exceeds 1, the sole eigenvalue of the identity matrix. In conclusion, $V_M \ge V_L$.
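The scalar argument can be checked numerically. Write r = σ_ε²/σ̃² ≥ 1. The left- and right-hand sides of (15) then become (1 − (1 − 1/r)λ)⁻¹ and 1 + (r − 1)λ. A rearrangement, which is given here for illustration and does not appear in the text, shows that their difference equals λ(1 − λ)(r + 1/r − 2)/(1 − (1 − 1/r)λ). By a/b + b/a − 2 ≥ 0 this is non-negative for 0 ≤ λ ≤ 1. The sketch below evaluates both forms on a grid; the grid values are arbitrary.

```python
def gap_15(lam, r):
    """Right-hand side minus left-hand side of (15), with r = sigma_eps^2 / sigma_tilde^2."""
    lhs = 1.0 / (1.0 - (1.0 - 1.0 / r) * lam)   # (1 - (s2 - st2)/s2 * lam)^{-1}
    rhs = 1.0 + (r - 1.0) * lam                 # 1 + (s2 - st2)/st2 * lam
    return rhs - lhs

def gap_closed_form(lam, r):
    """Same quantity written as lam (1 - lam) (r + 1/r - 2) / (1 - (1 - 1/r) lam)."""
    return lam * (1.0 - lam) * (r + 1.0 / r - 2.0) / (1.0 - (1.0 - 1.0 / r) * lam)

lams = [i / 100 for i in range(101)]          # eigenvalues in [0, 1]
ratios = [1.0, 1.5, 2.0, 5.0, 20.0]           # arbitrary values of r >= 1
worst = min(gap_15(l, r) for l in lams for r in ratios)
max_diff = max(abs(gap_15(l, r) - gap_closed_form(l, r)) for l in lams for r in ratios)
print(worst >= -1e-12, max_diff < 1e-9)       # True True
```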

Summarizing, the efficiency ordering of the considered estimators is $V_L \le V_M \le V_S$.

However, there are reasons to still consider standard and modified NL2SLS despite the asymptotic efficiency of NLLIML. First, the finite-sample performance of the former two estimators may still exceed that of NLLIML, although this would be unexpected. Second, they are computationally lighter than NLLIML, which may be advantageous in some situations.

3 Applications

The performance of NL2SLS and NLLIML is assessed by means of a simulation study. Nonlinear models form a broad category that encompasses many types of models, which complicates the choice of simulation setup. Inspiration for one or more models to simulate may be found in previous simulation studies of the nonlinear least squares (NLS) estimator. However, while there appear to be a fair number of such studies in chemistry,¹ they are unlikely to consider the models that are most interesting in econometrics, and the author was unable to find similar studies in econometrics journals. Instead, past empirical work that applies NL2SLS or NLLIML is reviewed to identify one or more model types that are particularly relevant in econometrics.

Amemiya's estimators have not attained strong popularity. While the standard NL2SLS estimator has been used in a moderate number of journal articles, the modified NL2SLS and NLLIML estimators have not been applied in published work to the author's best knowledge.


Table 1: Number of articles that apply NL2SLS per category.

Nonlinear in endogenous variables 25

Dynamic model 13

Generalized linear model (GLM) 7

Uncategorizable 17

Unclear 5

Total 65

Note that some articles are counted in multiple categories.


A total of 65 articles that apply standard NL2SLS between 1975 and 2014 were found.²

Investigation of the models to which NL2SLS is applied reveals three categories: models that are nonlinear in endogenous variables, dynamic models, and generalized linear models. The first category mostly pertains to models that are linear in the parameters and can technically be estimated consistently by means of linear 2SLS; however, Kelejian (1971) and Amemiya (1974) suggest that polynomials of the instruments should be included as instruments in general when the model is nonlinear in the endogenous variables.

Dynamic models that are linear in the parameters can often be rewritten as nonlinear models; for example, the model

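$$y_t = \beta x_t + u_t, \qquad u_t = \rho u_{t-1} + \varepsilon_t$$

implies

$$y_t = \beta x_t + u_t = \beta x_t + \rho u_{t-1} + \varepsilon_t = \beta x_t + \rho(y_{t-1} - \beta x_{t-1}) + \varepsilon_t = \rho y_{t-1} + \beta x_t - \rho\beta x_{t-1} + \varepsilon_t,$$

which is nonlinear in the parameters because of the product $\rho\beta$.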
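The rewriting of the AR(1)-error model can be verified on simulated data. This sketch uses arbitrary illustrative values β = 0.7, ρ = 0.5 and T = 200, with starting value u₀ = 0. It generates the model in its original form and checks that the nonlinear-in-parameters form reproduces y_t exactly from the second observation onward.

```python
import random

random.seed(3)
T, beta, rho = 200, 0.7, 0.5          # illustrative values, not the thesis's settings
x = [random.gauss(0, 1) for _ in range(T)]
eps = [random.gauss(0, 1) for _ in range(T)]
u = [0.0] * T
for t in range(1, T):
    u[t] = rho * u[t - 1] + eps[t]    # AR(1) error u_t = rho u_{t-1} + eps_t
y = [beta * x[t] + u[t] for t in range(T)]

# Nonlinear-in-parameters form: y_t = rho y_{t-1} + beta x_t - rho beta x_{t-1} + eps_t
max_err = max(abs(y[t] - (rho * y[t - 1] + beta * x[t] - rho * beta * x[t - 1] + eps[t]))
              for t in range(1, T))
print(max_err < 1e-12)                # True
```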

Generalized linear models (GLMs) are models in which the response variable $y_i$ depends on the regressors $x_i$ solely through a transformation of the linear index $x_i'\beta$, where $\beta$ is a vector of parameters; so,

$$y_i = f(x_i'\beta) + \varepsilon_i,$$

where $f$ is any linear or nonlinear function. The GLMs to which NL2SLS is applied are mostly probit and logit models: the former use the standard normal cumulative distribution function for $f$, while the latter use the logistic function.

² The applications of NL2SLS were found by searching Google Scholar for articles that cite Amemiya


Table 1 on page 11 presents the number of articles in each category. It shows that NL2SLS is most often applied to models that are nonlinear in the endogenous variables, followed by dynamic models and GLMs. A sizeable group of 17 articles uses models that cannot be assigned to a clear category; examples are Cobb-Douglas functions and generic multiplicative functions. Finally, in five articles it is unclear what type of model is used.

One article that applies NL2SLS, which is unpublished, does not state a date of creation (Brännäs and Melkersson). The remaining 64 articles are dated, and the years in which they were created or published follow an interesting pattern. A plot of the number of articles applying NL2SLS in each five-year period since 1974 is presented in Figure 1 on page 11. It shows that despite initial popularity, the use of NL2SLS declined after 1990. However, this decline halted with a small increase from 2000–2004 to 2005–2009, followed by a strong increase in the next period: 11 articles used NL2SLS between 2010 and 2014, as many as at the earlier peak between 1985 and 1989.

The applications of NL2SLS mostly belong to public, micro-, macro-, or agricultural economics. Each of these fields may consider problems in which heteroskedasticity is an issue, but microeconomics is particularly susceptible, as it often studies quantitative household decisions, for which an appropriate modelling choice is an error term whose variance is related to household income. It has been noted in the literature that LIML is inconsistent in the linear model with heteroskedasticity under many-instrument asymptotics (Chao and Swanson, 2004; Bekker and van der Ploeg, 2005; Hausman, Newey, Woutersen, Chao, and Swanson, 2012); assuming that this finding carries over to the nonlinear case, it is a potential justification for using NL2SLS rather than NLLIML in some of the considered applications. However, heteroskedasticity is likely only a minor issue in many other applications, and those studies could benefit from using NLLIML as a more efficient estimator than NL2SLS.

4 Simulation setup


category is considered in this simulation study; these models are discussed at greater length in the following subsections.

Each model has a key parameter β that is to be estimated. Performance indicators are: the median bias of the estimate of β; the length of the interval between the 5% and 95% quantiles of the estimates of β, which is referred to as the 90% range; and the rejection rate of the hypothesis that β = 0 given a 5% significance level. Section 5 discusses methods of obtaining the standard errors required in the computation of these rejection rates.
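The three performance indicators can be computed from the R estimates and their standard errors, as in the sketch below. It assumes a two-sided t-test of β = 0 against the standard normal critical value (about 1.96). It also uses the "inclusive" quantile method of Python's `statistics.quantiles`. The thesis does not specify its quantile interpolation rule, so that choice is an assumption, as is the function name `performance`.

```python
import statistics

def performance(estimates, std_errors, beta_true=0.0,
                crit=statistics.NormalDist().inv_cdf(0.975)):
    """Median bias, 90% range and 5%-level rejection rate of H0: beta = beta_true."""
    median_bias = statistics.median(estimates) - beta_true
    q = statistics.quantiles(estimates, n=20, method="inclusive")   # 5%, 10%, ..., 95% cut points
    range_90 = q[-1] - q[0]                                          # 95% minus 5% quantile
    rejections = sum(abs(b - beta_true) / s > crit for b, s in zip(estimates, std_errors))
    return median_bias, range_90, rejections / len(estimates)

estimates = [i / 100 for i in range(-50, 51)]     # toy numbers, not simulation output
median_bias, range_90, rejection_rate = performance(estimates, [0.1] * len(estimates))
print(median_bias, range_90, rejection_rate)
```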

Parameters are varied in order to determine their influence on estimator performance. Common to all models are the degree of endogeneity, represented by ω; the number of base instruments, represented by K; and instrument strength, represented by the theoretical first-stage F statistic $\tilde F$. Note that K is not the actual number of instruments in any of the models. Rather, it is the number of generated base instruments, to which additional, model-specific instruments are added; this is explained in the following subsections.

4.1 First model: dynamic

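The first considered model is dynamic, with the following setup:

$$y_t = \alpha_y + \beta x_t + u_t, \qquad x_t = \alpha_x + \sum_{k=1}^{K}\pi_k z_{tk} + \omega\varepsilon_t + v_t, \qquad u_t = \rho u_{t-1} + \varepsilon_t, \qquad \begin{pmatrix} \varepsilon_t \\ v_t \end{pmatrix} \sim N(0, I),$$

$t = 1, 2, \ldots, T$, where $\alpha_y = \alpha_x = \beta = 0$, $T = 100$ and $\pi_k = 0$ for $2 \le k \le K$. The instruments $z_{tk}$, $k = 1, 2, \ldots, K$, are drawn independently from the standard normal distribution.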


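regressors:

$$\begin{aligned} y_t &= \alpha_y + \beta x_t + u_t \\ &= \alpha_y + \beta x_t + \rho u_{t-1} + \varepsilon_t \\ &= \alpha_y + \beta x_t + \rho(y_{t-1} - \alpha_y - \beta x_{t-1}) + \varepsilon_t \\ &= (1 - \rho)\alpha_y + \rho y_{t-1} + \beta x_t - \rho\beta x_{t-1} + \varepsilon_t. \end{aligned}$$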

Since $y_{t-1}$ and $x_{t-1}$ are uncorrelated with $\varepsilon_t$, they are included as instruments. The number of instruments, including the constant, is therefore $K + 3$. The parameters to be varied are: $K \in \{2, 17\}$; $\tilde F \in \{5, 20\}$; $\omega \in \{0, 2\}$; and the autocorrelation parameter $\rho \in \{0.2, 0.5\}$. The coefficient $\pi_1$ is chosen according to

$$\pi_1 = \sqrt{\frac{K + 2}{T - 1}(1 + \omega^2)(\tilde F - 1)},$$

so that the theoretical F statistic for the regression of $x_t$ on the instruments is $\tilde F$.

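This follows from the $R^2$ of this regression,

$$R^2 = \frac{\pi_1^2}{\pi_1^2 + 1 + \omega^2} + \frac{K + 2}{T - 1}\cdot\frac{1 + \omega^2}{\pi_1^2 + 1 + \omega^2} \qquad \text{and} \qquad \tilde F = \frac{T - K - 3}{K + 2}\cdot\frac{R^2}{1 - R^2};$$

see Bekker and Wansbeek (2014).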
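The choice of π₁ can be checked by plugging it back into the R² and F̃ expressions above. In the sketch below, m denotes the number of non-constant instruments (m = K + 2 here) and n the sample size (n = T). The π₁ formulas of the later models have the same form with m = d·K or m = K and n = N. The helper names are hypothetical.

```python
import math

def pi1_for(F_tilde, m, n, omega):
    """pi_1 = sqrt(m / (n - 1) * (1 + omega^2) * (F_tilde - 1))."""
    return math.sqrt(m / (n - 1) * (1 + omega**2) * (F_tilde - 1))

def implied_F(pi1, m, n, omega):
    """Theoretical first-stage F implied by the R^2 expression above."""
    s = 1 + omega**2                                   # variance of omega * eps_t + v_t
    R2 = pi1**2 / (pi1**2 + s) + m / (n - 1) * s / (pi1**2 + s)
    return (n - m - 1) / m * R2 / (1 - R2)             # (T - K - 3) / (K + 2) when m = K + 2

T = 100
for K in (2, 17):
    for F_tilde in (5, 20):
        for omega in (0, 2):
            p = pi1_for(F_tilde, K + 2, T, omega)      # m = K + 2 non-constant instruments
            print(K, F_tilde, omega, round(p, 4), round(implied_F(p, K + 2, T, omega), 8))
```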

For each choice of parameter settings, R = 500 replications are simulated, yielding R estimates that are used to judge estimator performance. This number of replications is low, but it was necessary given the computation time of the simulation code.

4.2 Second model: nonlinear in endogenous regressor


$i = 1, 2, \ldots, N$, where $\alpha_y = \beta = 0$, $N = 200$ and $\pi_k = 0$ for $2 \le k \le K$. The constant $\alpha_x$ is set to $\alpha_x = 100$, which makes the probability of $x_i$ taking a negative value negligible: this is necessary since the natural logarithm requires a positive argument. The instruments $z_{ik}$, $k = 1, 2, \ldots, K$, are drawn independently from the standard normal distribution. As suggested by Kelejian (1971) and Amemiya (1974) for models that are nonlinear in endogenous regressors, polynomials $z_{ik}^2, z_{ik}^3, \ldots, z_{ik}^d$ of the base instruments are included as instruments for $\ln(x_i)$. The number of instruments, including the constant, is therefore $d \cdot K + 1$. The parameters to be varied are: $K \in \{1, 5\}$; $\tilde F \in \{5, 20\}$; $\omega \in \{0, 2\}$; and the degree $d \in \{4, 8\}$ of the polynomials that are included as instruments. The coefficient $\pi_1$ is chosen according to

$$\pi_1 = \sqrt{\frac{d \cdot K}{N - 1}(1 + \omega^2)(\tilde F - 1)},$$

so that the theoretical F statistic for the regression of $x_i$ on the instruments is $\tilde F$. For each choice of parameter settings, $R = 10000$ replications are simulated, yielding $R$ estimates that are used to judge estimator performance.
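Building the instrument set of the second model amounts to stacking a constant and the powers 1, …, d of every base instrument. Below is a minimal NumPy sketch with a hypothetical helper `polynomial_instruments`, using K = 5 and d = 8 as in one of the simulated settings. High powers of normal draws make Z′Z badly conditioned, which may matter numerically.

```python
import numpy as np

def polynomial_instruments(z_base, d):
    """Constant plus z, z^2, ..., z^d for every base instrument: N x (d*K + 1)."""
    N, K = z_base.shape
    powers = [z_base ** p for p in range(1, d + 1)]      # each N x K
    return np.column_stack([np.ones(N)] + powers)

rng = np.random.default_rng(2)
z = rng.standard_normal((200, 5))                        # N = 200, K = 5 base instruments
Z = polynomial_instruments(z, d=8)
print(Z.shape)                                           # (200, 41): 8 * 5 + 1 columns
```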

Since the model is nonlinear only in the variables, standard and modified 2SLS reduce to the same estimator, and the linear and nonlinear variants of 2SLS and LIML coincide as well. The linear variants are employed since they are less computationally intensive.

4.3 Third model: logistic

The third model is a logit-type generalized linear model and is nonlinear in both its variables and its parameters. The following setup is considered:

$$y_i = f(\alpha_y + \beta x_i) + \varepsilon_i, \qquad x_i = \alpha_x + \sum_{k=1}^{K}\pi_k z_{ik} + \omega\varepsilon_i + v_i, \qquad \begin{pmatrix} \varepsilon_i \\ v_i \end{pmatrix} \sim N(0, I),$$

$i = 1, 2, \ldots, N$, where $\alpha_y = \alpha_x = \beta = 0$, $N = 200$ and $\pi_k = 0$ for $2 \le k \le K$. The function $f$ is of the logistic form $f(z) = e^z / (1 + e^z)$,

as in Rayp and Van de Sijpe (2007). The instruments $z_{ik}$, $k = 1, 2, \ldots, K$, are drawn independently from the standard normal distribution. Since the model is again nonlinear in the endogenous regressor, polynomials up to degree 8 of the base instruments are included as instruments. The number of instruments, including the constant, is therefore $8K + 1$. The parameters to be varied are: $K \in \{1, 5\}$; $\tilde F \in \{5, 20\}$; and $\omega \in \{0, 2\}$. The coefficient $\pi_1$ is chosen according to

$$\pi_1 = \sqrt{\frac{d \cdot K}{N - 1}(1 + \omega^2)(\tilde F - 1)}, \qquad d = 8,$$

so that the theoretical F statistic for the regression of $x_i$ on the instruments is $\tilde F$. For each choice of parameter settings, $R = 1000$ replications are simulated, yielding $R$ estimates that are used to judge estimator performance.

Due to singularity issues, the column of the estimate of $M_Z X$ in expression (13) corresponding to the constant is removed. These problems are discussed in more detail in section 6.

5 Standard errors

Since rejection rates are a performance criterion in the simulations, standard errors must be computed for the NLLIML estimator. However, simply using standard errors based on the large-sample asymptotic covariance of NLLIML may be inappropriate. For the linear model, Bekker (1994) finds that many-instruments asymptotics provide a better approximation of the finite-sample distribution of the LIML estimator. Under many-instruments asymptotics, the ratio of the number of instruments to the number of observations converges to a constant α with 0 ≤ α < 1 as the number of observations grows; standard, large-sample asymptotics have α = 0. Bekker derives standard errors based on these many-instruments asymptotics for the linear model, which yield more accurate rejection rates than standard errors based on large-sample asymptotics. Such standard errors are often referred to as Bekker standard errors.

However, Bekker standard errors have not yet been derived for the nonlinear model, and doing so is not a trivial task. This is illustrated using Wansbeek's (2014) concise derivation of Bekker standard errors. Consider the model (1)–(3) with $f$ linear, $f = X\beta$, and additional notation as in section 2. Write $\operatorname{plim}_{N \to \infty} \frac{1}{N}$

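written as

$$\hat\beta_{LIML} = (X' C(\hat\lambda) X)^{-1} X' C(\hat\lambda) y,$$

where $C(\hat\lambda) = P_Z - \hat\lambda M_Z$, $P_Z = Z(Z'Z)^{-1}Z'$, $M_Z = I - P_Z$, and $\hat\lambda$ is the smallest solution to the generalized eigenequation $\det(S - \lambda S_\perp) = 0$ with $S = (y, X)' P_Z (y, X)$ and $S_\perp = (y, X)' M_Z (y, X)$. It holds that

$$\operatorname{plim}_{N \to \infty} \frac{1}{N} X' C(\hat\lambda) X = Q \qquad (17)$$

and

$$\operatorname{plim}_{N \to \infty} \frac{1}{N} X' C(\hat\lambda) \varepsilon\varepsilon' C(\hat\lambda) X = \sigma_\varepsilon^2 (Q + \lambda \Sigma_{u|\varepsilon}), \qquad (18)$$

with $\Sigma_{u|\varepsilon} = \Sigma_{uu} - \sigma_\varepsilon^{-2} \sigma_{u\varepsilon} \sigma_{u\varepsilon}'$ and $\lambda = \alpha / (1 - \alpha)$. Since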

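$$\hat\beta_{LIML} - \beta = (X' C(\hat\lambda) X)^{-1} X' C(\hat\lambda) \varepsilon, \qquad (19)$$

it is found that

$$\operatorname{Avar}(\hat\beta_{LIML}) = \operatorname{plim}_{N \to \infty} \left(\frac{1}{N} X' C(\hat\lambda) X\right)^{-1} \frac{1}{N} X' C(\hat\lambda) \varepsilon\varepsilon' C(\hat\lambda) X \left(\frac{1}{N} X' C(\hat\lambda) X\right)^{-1} = \sigma_\varepsilon^2 Q^{-1} (Q + \lambda \Sigma_{u|\varepsilon}) Q^{-1}. \qquad (20)$$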

Were $f$ nonlinear, the expression (19) would not hold. The expression (20) would be different and would likely involve $\partial f / \partial \beta'$ rather than $X$; indeed, $\partial f / \partial \beta' = X$ in the linear case. The derivation of the probability limits (17) and (18) conveniently exploits the interaction of $C(\hat\lambda)$ with the terms $Z\Pi$ and $U$ that together constitute $X$. This would not be possible if $X$ were replaced by $\partial f / \partial \beta'$, which complicates the derivation of Bekker standard errors in the nonlinear model.


The second alternative is to obtain standard errors by bootstrap sampling. Given the equation

$$y_i = f(x_i; \beta) + \varepsilon_i$$

for $i = 1, 2, \ldots, N$, bootstrap standard errors are obtained as follows:

1. Obtain the LIML estimate $\hat\beta$ of $\beta$ and the corresponding residuals $\hat\varepsilon_i$, $i = 1, 2, \ldots, N$;

2. Sample with replacement $N$ times from the $\hat\varepsilon_i$, $i = 1, 2, \ldots, N$, yielding $\tilde\varepsilon_i$, $i = 1, 2, \ldots, N$;

3. Construct new data $\tilde y_i = f(x_i; \hat\beta) + \tilde\varepsilon_i$, $i = 1, 2, \ldots, N$;

4. Obtain a LIML estimate $\tilde\beta^{(1)}$ of $\beta$ based on the data $x_i$ and $\tilde y_i$, $i = 1, 2, \ldots, N$;

5. Repeat steps 2–4 $B - 1$ more times, yielding estimates $\tilde\beta^{(2)}, \tilde\beta^{(3)}, \ldots, \tilde\beta^{(B)}$;

6. The bootstrap standard error of the $k$-th element of $\hat\beta$ is given by
$$\sqrt{\frac{1}{B - 1} \sum_{b=1}^{B} \left(\tilde\beta_k^{(b)} - \hat\beta_k\right)^2},$$
with subscripts indicating the element of the vector.

Ideally, a high number of bootstrap replications B is chosen. However, this is infeasible in the simulations, since this bootstrap procedure is run in each replication of the simulation: setting B high implies unreasonable computation times.
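Steps 1–6 translate into a short generic routine. In the sketch below, `f` and `estimator` are user-supplied callables with hypothetical names. In the thesis the estimator would be NLLIML or modified NL2SLS computed with the instruments, which the callable is assumed to capture. To keep the example checkable, the usage lines apply the routine to a toy location model y_i = β + ε_i with the sample mean as estimator. In that model the bootstrap standard error should be close to σ/√N.

```python
import numpy as np

def bootstrap_se(y, x, f, estimator, B=200, seed=0):
    """Residual-bootstrap standard errors following steps 1-6.

    f(x, beta) returns fitted values; estimator(y, x) returns beta-hat.
    """
    rng = np.random.default_rng(seed)
    beta_hat = np.atleast_1d(estimator(y, x))                       # step 1
    fitted = f(x, beta_hat)
    resid = y - fitted
    draws = np.empty((B, beta_hat.size))
    for b in range(B):                                              # steps 2-4, repeated (step 5)
        e_star = rng.choice(resid, size=resid.size, replace=True)   # step 2
        y_star = fitted + e_star                                    # step 3
        draws[b] = np.atleast_1d(estimator(y_star, x))              # step 4
    return np.sqrt(((draws - beta_hat) ** 2).sum(axis=0) / (B - 1))   # step 6

rng = np.random.default_rng(1)
y = 2.0 + rng.standard_normal(400)          # toy data: beta = 2, sigma = 1, N = 400
x = np.zeros(400)                           # the toy model has no regressors
se = bootstrap_se(y, x,
                  f=lambda x, b: np.full(x.shape, b[0]),
                  estimator=lambda y, x: y.mean(),
                  B=500)
print(se)                                   # should be close to 1 / sqrt(400) = 0.05
```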

The performance of these two methods of obtaining rejection rates in the linear model is assessed by means of simulation. The following setup is considered:

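$$y_i = \alpha_y + x_i\beta + \varepsilon_i, \qquad x_i = \alpha_x + \sum_{k=1}^{K} z_{ik}\pi_k + \omega\varepsilon_i + u_i, \qquad \begin{pmatrix} \varepsilon_i \\ u_i \end{pmatrix} \sim N(0, I),$$

$i = 1, 2, \ldots, N$, where $\alpha_y = \alpha_x = \beta = 0$, $N = 500$ and $\pi_k = 0$ for $2 \le k \le K$. Other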

The latter is determined by choosing a theoretical F statistic $\tilde F \in \{5, 10, 20\}$ for the regression of $x_i$ on the instruments, and then computing the corresponding value of $\pi_1$:

$$\pi_1 = \sqrt{\frac{K}{N - 1}(1 + \omega^2)(\tilde F - 1)}.$$

The instruments $z_{ik}$, $k = 1, 2, \ldots, K$, are drawn independently from the standard normal distribution. For each choice of parameter settings, $R = 10000$ replications are simulated. In each replication, 2SLS and LIML estimates and standard errors are computed. 2SLS standard errors are based only on the large-sample asymptotic covariance matrix, while large-sample asymptotic standard errors (LSSE), Bekker standard errors and bootstrap standard errors are computed for LIML.

The large-sample asymptotic covariance matrices of the 2SLS and LIML estimators are equal in the linear model, and can therefore be estimated in the same way. Write $X$ for the matrix with rows $(1, x_i)$ and $Z$ for the matrix with rows $(1, z_{i1}, z_{i2}, \ldots, z_{iK})$. Again write $P_Z = Z(Z'Z)^{-1}Z'$.

Then the estimated covariance of 2SLS and LIML based on large-sample asymptotics is given by

$$\hat V = \hat\sigma^2 (X' P_Z X)^{-1}, \qquad \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \hat\varepsilon_i^2,$$

where $\hat\varepsilon_i$ are the estimated residuals based on the 2SLS or LIML estimates, respectively.
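The linear 2SLS and LIML estimates and the large-sample covariance V̂ can be computed together, with LIML written in the form (X′C(λ̂)X)⁻¹X′C(λ̂)y given earlier in this section. This sketch departs from the text in one respect: it partials out the constant by demeaning y, x and the instruments instead of keeping a column of ones in X and Z. With the constant in both, S⊥ has a zero row and column and the generalized eigenproblem is singular. Partialling out the constant leaves the slope estimates unchanged. Bekker standard errors are omitted from the sketch, and the function names are hypothetical.

```python
import numpy as np

def tsls_liml(y, X, Z):
    """Linear 2SLS and LIML (k-class form with C = P_Z - lambda_hat * M_Z) plus V-hat.

    Assumes y, X and Z are demeaned, so that S_perp is nonsingular.
    """
    N = len(y)
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)               # P_Z
    M = np.eye(N) - P                                   # M_Z
    W = np.column_stack([y, X])                         # (y, X)
    S, S_perp = W.T @ P @ W, W.T @ M @ W
    # smallest root of det(S - lambda S_perp) = 0 = smallest eigenvalue of S_perp^{-1} S
    lam = np.min(np.linalg.eigvals(np.linalg.solve(S_perp, S)).real)
    results = {}
    for name, C in (("2SLS", P), ("LIML", P - lam * M)):
        beta = np.linalg.solve(X.T @ C @ X, X.T @ C @ y)
        resid = y - X @ beta
        sigma2 = resid @ resid / N
        V = sigma2 * np.linalg.inv(X.T @ P @ X)         # large-sample covariance V-hat
        results[name] = (beta, np.sqrt(np.diag(V)))
    return results, lam

def simulate(K, F_tilde, omega, N=500, seed=0):
    """Linear design of this section with alpha_y = alpha_x = beta = 0, then demeaned."""
    rng = np.random.default_rng(seed)
    pi1 = np.sqrt(K / (N - 1) * (1 + omega**2) * (F_tilde - 1))
    z = rng.standard_normal((N, K))
    eps, u = rng.standard_normal(N), rng.standard_normal(N)
    x = pi1 * z[:, 0] + omega * eps + u                 # pi_k = 0 for k >= 2
    y = eps                                             # beta = 0
    return y - y.mean(), (x - x.mean())[:, None], z - z.mean(axis=0)

results, lam = tsls_liml(*simulate(K=10, F_tilde=10, omega=2))
print(lam, results["2SLS"], results["LIML"])
```

With a single instrument the model is exactly identified, λ̂ is zero up to rounding, and the two estimators coincide. This is a useful sanity check of the implementation.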

Bekker standard errors are computed as follows. Write $y$ for the vector with elements $y_i$, and again define $S = (y, X)' P_Z (y, X)$ and $S_\perp = (y, X)' M_Z (y, X)$ with $M_Z = I - P_Z$. Then

where $\hat\varepsilon_i$ are the estimated residuals, $\hat\lambda$ is the smallest solution to the generalized eigenequation $\det(S - \lambda S_\perp) = 0$, $\hat\alpha_y$ and $\hat\beta$ are the LIML estimates of $\alpha_y$ and $\beta$, respectively, and the subscript 22 denotes the bottom-right $2 \times 2$ submatrix of the corresponding matrix. Under large-sample asymptotics, $\operatorname{plim}_{N \to \infty} \hat\lambda = 0$, which implies that $\hat V$ and $\hat V_{Bekker}$ are asymptotically equivalent. Under many-instruments asymptotics, however, the latter is asymptotically larger than the former.

The simulation output is given in Table 2 on page 18. LIML with standard errors based on large-sample asymptotics performs quite well when instrument strength is high, but rejection rates are generally too high when the instruments are weak. It does not attain rejection rates as extreme as those of 2SLS given ω = 2 and many instruments, but under many instruments and ω = 0 it is outperformed by 2SLS.

Bekker standard errors yield similar rejection rates to those of 2SLS given ω = 0, but strongly outperform 2SLS given ω = 2. Generally, Bekker standard errors imply slight or moderate underrejection of the null hypothesis.

Interestingly, the bootstrap standard errors yield rejection rates that are very similar to those based on Bekker standard errors. Thus, bootstrap standard errors are an interesting alternative to Bekker standard errors, while the large-sample standard errors do not perform adequately. Therefore, bootstrap standard errors are used in the nonlinear simulations. They are used for both the NLLIML and the modified NL2SLS estimators.

6 Results


example, finding the optimum may be troublesome when the objective function is quite flat. Poor performance of the optimization algorithm is an alternative, potential explanation for the mediocre outcomes of LIML.

Second, the performance of all estimators is harmed by low instrument strength, in terms of median bias and often also in terms of rejection rates. This effect is particularly strong and unambiguous when ω = 2, so that there is endogeneity. Moreover, given ω = 2, absolute median bias is higher when the number of instruments is higher, and a higher number of instruments raises rejection rates, which is an improvement where a test underrejects and a deterioration where it overrejects.

Third, while the 90% range of LIML is smaller than that of 2SLS in the logistic model, as expected given the smaller asymptotic variance of LIML, it is actually larger than that of 2SLS in the other models. While this reinforces the finding that large-sample asymptotics do not always approximate finite-sample estimator distributions well, there is no evident explanation for the difference between models.

Fourth, the modified 2SLS and LIML estimators perform very similarly in all models in terms of all performance indicators. The exemplary histograms of estimates in Figures 2–4 confirm this, as the histograms of the modified 2SLS and LIML estimates are virtually identical for each of the three models. Therefore, within these simulations, Amemiya's view of modified 2SLS as a computationally lightweight alternative to LIML is justified. However, it must be noted that modified 2SLS also performs as disappointingly as LIML in the models that are nonlinear in the parameters.

Fifth, there were problems with the computation of $(X' M_Z X)^{-1}$ while attempting to obtain the modified NL2SLS estimator in the logistic GLM: the matrix $X' M_Z X$ had an all-zero column and row corresponding to the constant, which renders the matrix singular. Naturally, this should have been foreseen, as a constant is included both in the regressor matrix $X$ and in the instrument matrix $Z$, so that $M_Z$ annihilates it. The problem is solved by removing the column of $M_Z X$ corresponding to the constant before substituting it into the objective function (13).
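The singularity is easy to reproduce. In the sketch below, which uses illustrative data and hypothetical dimensions, X and Z both contain a column of ones, so M_Z maps that column to zero up to rounding error. X′M_Z X then has rank 1 instead of 2. Dropping the constant's column from M_Z X, as described above, leaves a nonsingular matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100
z = rng.standard_normal((N, 3))
x = z @ np.array([1.0, 0.5, 0.0]) + rng.standard_normal(N)
Z = np.column_stack([np.ones(N), z])        # instruments include the constant
X = np.column_stack([np.ones(N), x])        # regressors include the constant too

M_Z = np.eye(N) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
MZX = M_Z @ X
print(np.abs(MZX[:, 0]).max() < 1e-10)                      # True: constant column vanishes
print(np.linalg.matrix_rank(X.T @ M_Z @ X))                 # 1: X'M_Z X is singular
MZX_reduced = MZX[:, 1:]                                    # drop the constant's column
print(np.linalg.matrix_rank(MZX_reduced.T @ MZX_reduced))   # 1: full rank for one column
```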

Strangely, however, the same problem does not occur in the simulations for the dynamic model, which also includes certain variables in both $X$ and $Z$. The computations of modified NL2SLS in the dynamic model are therefore done without deleting columns from $M_Z X$.


harmful to the results, but this is not convincing evidence. Conversely, while all-zero rows and columns have no projective power, there is no reassuring theoretical basis for removing them. Meanwhile, practical applications of modified NL2SLS will often have variables included both as regressors and as instruments. The problems encountered in these simulations therefore require further investigation in future research.

The simulation output for the dynamic model with ρ = 0.2 and ρ = 0.5 is given in Table 3 on page 25 and in Table 4 on page 26, respectively. As noted earlier, the performance of modified NL2SLS and NLLIML is disappointing, with rather high median biases that often exceed those of standard NL2SLS. Given endogeneity and weak or many instruments, rejection rates are excessively high for all estimators. Raising the level of endogeneity or lowering instrument strength harms the performance of all three estimators in terms of median bias and rejection rates, but shortens the 90% range, which indicates that estimator variance is lower. Given endogeneity, a higher number of instruments implies higher median bias, shorter 90% ranges and higher rejection rates. For ρ = 0.5, median biases are lower and 90% ranges are shorter than for ρ = 0.2.

Histograms of 500 estimates of β from as many replications of the dynamic model are given in Figure 2 on page 27. The settings are ρ = 0.2, ω = 2, 20 instruments, and $\tilde F = 5$. Note that these are not the same replications as those used for Table 3. The histograms of modified 2SLS and LIML, which are virtually identical, are somewhat heavier-tailed and more skewed to the left than the histogram of standard 2SLS. This indicates a slightly higher variance of LIML and modified 2SLS than of standard 2SLS, as can also be seen from the longer 90% range in Table 3. None of the histograms is centered around zero. Apart from the skewness, the histograms resemble a normal distribution.


[Figure 2 panels: (a) Standard 2SLS, (b) Modified 2SLS, (c) LIML]

Figure 2: Histograms of estimates of β from 500 replications of the dynamic model. Settings: ρ = 0.2, ω = 2, 20 instruments, eF = 5.


[Figure 3 panels: (a) 2SLS, (b) LIML]

Figure 3: Histograms of estimates of β from 500 replications of the NER model. Settings: d = 8, ω = 2, 41 instruments, eF = 5.


[Figure 4 panels: (a) Standard 2SLS, (b) Modified 2SLS, (c) LIML]

Figure 4: Histograms of estimates of β from 500 replications of the logistic model, excluding estimates larger than 1. Settings: d = 8, ω = 2, 41 instruments, eF = 5.


instruments. Increased instrument strength is beneficial to the performance of both estimators.

Histograms of 500 estimates of β from equally many replications of the NER model are given in Figure 3 on page 30. The settings are d = 8, ω = 2, 41 instruments, and eF = 5. Note that these replications are not the ones underlying Table 6. The histogram of LIML is almost centered around zero, but is clearly skewed to the left and non-normal. By contrast, the 2SLS histogram is centered around a point clearly above zero, but is only slightly skewed and resembles a normal distribution.

The simulation output for the logistic model is given in Table 7 on page 31. Unlike in the other two models, all three estimators perform very similarly. Introduction of endogeneity worsens median bias and rejection rates, but reduces the 90% range; the same holds for the introduction of more instruments, particularly under endogeneity. Weaker instruments are detrimental to all performance indicators, again especially under endogeneity.

Histograms of estimates of β from 500 replications of the logistic model are given in Figure 4 on page 32. The settings are d = 8, ω = 2, 41 instruments, and eF = 5. Note that these replications are not the ones underlying Table 7. Not all 500 estimates are included in these histograms: values larger than 1, which make up less than 5% of all estimates, have been excluded. All estimators yield outliers much larger than 1, and including those outliers would make the histograms uninformative. Strikingly, none of the estimators yields a negative estimate of β, and the histograms have a Gamma-distribution-like shape. The histograms of the modified 2SLS and LIML estimates are virtually identical, and are more concentrated around their nonzero modes than the histogram of the standard 2SLS estimates.
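The trimming step behind Figure 4 can be sketched as below. The cutoff of 1 comes from the text; the helper names and the equal-width binning are hypothetical illustrations, not the plotting code used for the thesis. Reporting the share of dropped estimates alongside the histogram makes it possible to verify the "less than 5%" statement for each estimator.

```python
def trim_for_histogram(estimates, upper=1.0):
    """Drop estimates above `upper`; return the kept values and the share dropped."""
    kept = [b for b in estimates if b <= upper]
    return kept, 1 - len(kept) / len(estimates)


def bin_counts(values, lo, hi, n_bins):
    """Counts per equal-width bin on [lo, hi]; values outside the range are ignored."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # min(...) places v == hi in the last bin instead of overflowing
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


# Hypothetical estimates with two large outliers.
kept, share = trim_for_histogram([0.05, 0.15, 0.2, 0.9, 3.0, 12.0])
print(share, bin_counts(kept, 0.0, 1.0, 5))
print("any negative estimates:", min(kept) < 0)
```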

7 Conclusion


extend to the nonlinear model. Crucially, distinction is made between models that are nonlinear only in their parameters, only in their variables, or in both simultaneously. One model of each type is considered.

It is found that LIML outperforms 2SLS only in the model that is nonlinear in its variables but not in its parameters; there is no clear preference for either estimator in the two models that are nonlinear in their parameters. It is unclear whether this failure of LIML to outperform 2SLS is due to the nonlinearity in the parameters or to weakness of the optimization algorithm required to estimate the parameters of these models. Second, it is established that an increase in the number of instruments or a decrease in instrument strength is detrimental to the performance of both estimators under endogeneity.

There are several shortcomings of, and potential improvements to, the present study. First, to keep computation time manageable, simulation settings were necessarily chosen below the desirable level at several points. Increasing the number of simulation replications or bootstrap replications may change the outcomes and will make them subject to less simulation variance; doing so, however, strongly calls for high-performance computing. Second, it should be determined whether the disappointing performance of LIML in the models that are nonlinear in the parameters is due to the optimization algorithm or to the nonlinearity in the parameters. A potential approach is to employ the optimization algorithm in the model that is nonlinear only in its variables and observe whether the performance of LIML worsens. Third, singularity issues arose on several occasions and may have reduced estimator performance.

An interesting direction for future research is to extend the theoretical work by Bekker (1994), which established for the linear model the inconsistency of 2SLS and the consistency of LIML under many-instruments asymptotics, to the nonlinear model. Doing so should yield crucial insights about the causes of the present study's outcomes: theory may confirm that both 2SLS and LIML perform badly in models that are nonlinear in the parameters, or a contrary finding would indicate that the problem is numerical. Moreover, theory would constitute a basis for advice on practical application of the estimators.
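As a concrete reference point for the linear case, the sketch below simulates a linear model with one endogenous regressor and many moderately weak instruments, and compares the median bias of 2SLS and LIML, both written as k-class estimators (k = 1 for 2SLS; k equal to the smallest eigenvalue of (W′M_Z W)⁻¹W′W with W = [y, x] for LIML, which holds here because the model has no included exogenous regressors). The data-generating process and all settings (n, K, the concentration parameter mu2, the error correlation rho, the seed) are hypothetical choices for illustration; they are not the design of the simulations in this thesis.

```python
import numpy as np


def resid_on(Z, W):
    """Residuals M_Z W from regressing each column of W on the instruments Z."""
    coef, *_ = np.linalg.lstsq(Z, W, rcond=None)
    return W - Z @ coef


def k_class(y, x, Z, k):
    """k-class estimate of beta in y = x * beta + u (one regressor, no constant).

    k = 1 gives 2SLS; k = liml_kappa(y, x, Z) gives LIML.
    """
    R = resid_on(Z, np.column_stack([y, x]))
    my, mx = R[:, 0], R[:, 1]
    return (x @ y - k * (x @ my)) / (x @ x - k * (x @ mx))


def liml_kappa(y, x, Z):
    """Smallest eigenvalue of (W' M_Z W)^{-1} W'W with W = [y, x]."""
    W = np.column_stack([y, x])
    R = resid_on(Z, W)
    A = R.T @ R  # equals W' M_Z W because M_Z is symmetric and idempotent
    B = W.T @ W
    return np.linalg.eigvals(np.linalg.solve(A, B)).real.min()


def simulate(n=400, K=40, mu2=60.0, rho=0.5, reps=200, seed=0):
    """Median bias of 2SLS and LIML over `reps` replications (true beta = 0)."""
    rng = np.random.default_rng(seed)
    beta = 0.0
    # Equal first-stage coefficients, scaled so that the concentration
    # parameter n * K * pi_j**2 is approximately mu2.
    pi = np.full(K, np.sqrt(mu2 / (n * K)))
    b_2sls, b_liml = [], []
    for _ in range(reps):
        Z = rng.standard_normal((n, K))
        v = rng.standard_normal(n)
        u = rho * v + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # corr(u, v) = rho
        x = Z @ pi + v
        y = beta * x + u
        b_2sls.append(k_class(y, x, Z, 1.0))
        b_liml.append(k_class(y, x, Z, liml_kappa(y, x, Z)))
    return np.median(b_2sls) - beta, np.median(b_liml) - beta


bias_2sls, bias_liml = simulate()
print(f"median bias 2SLS: {bias_2sls:.3f}, LIML: {bias_liml:.3f}")
```

Under these hypothetical settings 2SLS should show a clearly positive median bias (towards OLS, since rho > 0), roughly of order rho·K/(mu2 + K), while the LIML median stays close to zero. Replacing the closed-form eigenvalue with a numerical minimizer in the same code would be one way to isolate the role of the optimization algorithm.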


for the application of the considered estimators particularly with regard to these problems, since they are likely to occur often in practice.

References

J.S. Alper and R.I. Gelb. Standard errors and confidence intervals in nonlinear regression: comparison of Monte Carlo and parametric statistics. Journal of Physical Chemistry, 94(11):4747–4751, 1990.

T. Amemiya. The non-linear two-stage least squares estimator. Journal of Econometrics, 2:105–110, 1974.

T. Amemiya. The nonlinear limited-information maximum-likelihood estimator and the modified nonlinear two-stage least-squares estimator. Journal of Econometrics, 3:375–386, 1975.

T.W. Anderson. Origins of the limited information maximum likelihood and two-stage least squares estimators. Journal of Econometrics, 127:1–16, 2005.

T.W. Anderson and H. Rubin. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics, 20:46–63, 1949.

T.W. Anderson and H. Rubin. The asymptotic properties of estimates of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics, 21:570–582, 1950.

T.W. Anderson, N. Kunitomo, and T. Sawa. Evaluation of the distribution function of the limited information maximum likelihood estimator. Econometrica, 50(4):1009–1027, 1982.

P.A. Bekker. Alternative approximations to the distributions of instrumental variable estimators. Econometrica, 62(3):657–681, 1994.

P.A. Bekker and J. van der Ploeg. Instrumental variable estimation based on grouped data. Statistica Neerlandica, 59(3):239–267, 2005.


K. Brännäs and M. Melkersson. Endogeneity in a binomial count data model. Unpublished paper, 1998 or later.

A.C. Cameron and P.K. Trivedi. Microeconometrics: Methods and Applications. Cambridge University Press, 2005.

J.C. Chao and N.R. Swanson. Estimation and testing using jackknife IV in heteroskedastic regressions with many weak instruments. Working paper, March 2004.

R. Davidson and J.G. MacKinnon. Econometric Theory and Methods. Oxford University Press, 2009.

J.A. Hausman, W.K. Newey, T. Woutersen, J.C. Chao, and N.R. Swanson. Instrumental variable estimation with heteroskedasticity and many instruments. Quantitative Economics, 3:211–255, 2012.

H.H. Kelejian. Two-stage least squares and econometric systems linear in parameters but nonlinear in the endogenous variables. Journal of the American Statistical Association, 66(334):373–374, 1971.

G. Rayp and N. Van de Sijpe. Measuring and explaining government efficiency in developing countries. Journal of Development Studies, 43(2):360–381, 2007.

D. Staiger and J.H. Stock. Instrumental variables regression with weak instruments. Econometrica, 65(3):557–586, 1997.

H. Theil. Repeated least-squares applied to complete equation systems. Technical report, Centraal Planbureau, 1953a.

H. Theil. Estimation and simultaneous correlation in complete equation systems. Technical report, Centraal Planbureau, 1953b.
