On the limiting and empirical distribution of IV estimators when some of the instruments are invalid


Kiviet, J.F.; Niemczyk, J.

Publication date: 2006

Document version: Submitted manuscript

Citation for published version (APA):

Kiviet, J. F., & Niemczyk, J. (2006). On the limiting and empirical distribution of IV estimators when some of the instruments are invalid. (UvA-Econometrics Working Paper; No. 2006/02). Faculteit Economie en Bedrijfskunde.

http://aimsrv1.fee.uva.nl/koen/web.nsf/view/EB445FE1F7880780C12571A90036B746/$file/06 02.pdf



Jan F. Kiviet

and

Jerzy Niemczyk

Tinbergen Institute, University of Amsterdam

preliminary draft of 11 July 2006

JEL-classification: C13, C15, C30

Keywords: empirical density, inconsistent estimators, invalid instruments, limiting distribution, weak instruments

Abstract

In practice IV estimation will often be employed when in fact some of the instruments are invalid. This is because moment conditions can only be tested if already a sufficient number of valid but untestable instruments is available. Moreover, tests for the validity of additional instruments, i.e. so-called overidentification restriction tests, will have limited power when instruments are weak and samples are small. We examine the case where the model is treated as over or just identified, although some of the instruments may actually be invalid, whereas all variables are stationary. We derive an expression in terms of the various parameters of the data generating process for the inconsistency of the invalid IV estimator and obtain its limiting normal distribution. In specific simple models we scan this approximation to the finite sample distribution over the relevant parameter space and compare it with the actual empirical distribution obtained by simulation. This parameter space is transformed to measures for: (i) model fit; (ii) simultaneity; (iii) instrument invalidity; and (iv) instrument weakness. We present our findings over this multi-dimensional parameter space by using dynamic visualization techniques, which can best be enjoyed on screen. Our major findings are that: (a) for the accuracy of large sample asymptotic approximations instrument weakness is much more detrimental than instrument invalidity; (b) the realizations of IV estimators obtained from strong but possibly invalid instruments seem usually much closer to the true parameter values than those obtained from valid but weak instruments.

Department of Quantitative Economics, Faculty of Economics and Business, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands; phone +31.20.5254217; email J.F.Kiviet@UvA.NL and J.Niemczyk@UvA.NL


1 Introduction

When in a regression model some of the explanatory variables are contemporaneously correlated with the disturbance term, whereas this correlation is unknown, then one needs further variables to use as instruments in order to find consistent estimators by the method of moments. These instrumental variables should have known (usually zero) correlation with the disturbances. Then they provide so-called orthogonality conditions that make it possible to obtain consistent instrumental variable (IV) estimators. In practice, however, it is mostly hard to assess whether an instrumental variable is valid indeed, i.e. is uncorrelated with the disturbance term. Firstly, instrument validity or orthogonality tests are only viable under just identification or overidentification by truly valid instruments. That is, they are built on the prerequisite of having already available a number of undisputed valid instruments, at least as great as the number of coefficients ($k$) to be estimated, whereas the validity of the initial $k$ instruments is untestable. Moreover, orthogonality tests will have reasonable power only when the instruments employed and those under test are not too weak (are sufficiently correlated with the regressors) and the sample size is substantial. Therefore, it seems very likely that IV estimation will often be employed when in fact some of the instruments are invalid. In this case the IV estimator for the structural parameters is inconsistent, even when the structural equation itself is correctly specified for the parameters of interest.

In this paper we consider general and specific forms of linear structural equations and corresponding partial reduced form systems in stationary variables and examine the IV estimator when some of its exploited orthogonality conditions actually do not hold. We cover the general case where the number of moment conditions exploited ($l$), i.e. the valid and invalid conditions together, is at least as large as the number of unknown coefficients, i.e. we consider the (alleged) over or just identified case ($l \ge k$). We focus on the distribution of such an invalid IV estimator for a single structural equation that for the rest has been specified correctly in the sense that its implied error term is IID (independent and identically distributed) with unconditional expectation equal to zero. An alternative point of departure is chosen in Hale et al. (1980) where instruments are invalid due to omitted regressors. We derive an expression in terms of parameters and data moments for the inconsistency of the invalid IV estimator in an otherwise correctly specified model and also obtain its limiting distribution. These results yield a first-order asymptotic approximation to the actual distribution in finite sample of IV estimators. The asymptotic variance proves to be a rather complicated expression, although it can be substantially simplified when specialized for the just identified case ($l = k$). By simulation we verify over the relevant parameter space of simple classes of models whether these analytic findings are accurate regarding the actual estimator distribution in finite sample and how these depend on the various model parameters.

In the illustrations we focus first on a very simple specific type of model, entailing just one explanatory variable and one instrument. Here we show that all our findings, both the analytic asymptotic and the simulated finite sample results, are driven by just four primal econometric model characteristics, in addition to the sample size. These four characteristics are straightforward transformations of the underlying parameters of the data generating process. They are all related to particular correlation coefficients, viz.: (i) the model fit; (ii) the degree of simultaneity; (iii) the degree of invalidity of the instrument; and (iv) the degree of instrument weakness. Thus, even in the simple one-regressor one-instrument model, the distributional properties of the IV estimator are functions involving five arguments, which makes it difficult to depict their behavior over all relevant argument values. Instead of presenting extensive tables, we present some series of 2D and 3D graphs in print, and we use dynamic multi-dimensional visualization techniques to present our findings more elegantly and effectively on screen through animations. We also present some results for an (alleged) overidentified one-regressor model with two instruments. Here we find that both the actual finite sample distribution and its asymptotic approximation can be expressed using just one extra argument, whereas at first sight one might conjecture that both the invalidity and the strength of the extra instrument might matter separately.

The analysis of IV estimators employing invalid instruments has not received much attention in the literature yet. Although its limiting behavior has been examined before, see Maasoumi and Phillips (1982), Hahn and Hausman (2003) and Hall and Inoue (2003), none of these studies provides a simple explicit formula for the asymptotic variance in the general linear multivariate case, where $l \ge k \ge 1$. Such a formula is obtained here by extending an approach$^{1}$ that yielded similar results for inconsistent OLS estimators. The latter can be found in Kiviet and Niemczyk (2005), which completes some initial results obtained in Joseph and Kiviet (2005). As far as we know simulation evidence on the actual finite sample distribution of valid and invalid IV estimators covering almost the entire parameter space of some basic models as presented here has not been produced before. The analysis of the exact finite sample properties of consistent IV estimators has a long history, see Sawa (1969) and Phillips (1980). Recent contributions and further references can be found in, for instance, Phillips (2005) and Hillier (2005). Our findings illustrate those on the effects of instrument weakness on the finite sample density of consistent IV estimators by both Woglom (2001), who focusses on just identified IV estimators, and Forchini (2006), who gives further theoretical underpinnings in case of overidentification. They also supplement these findings, because we provide more extensive illustrations and consider invalid instruments as well. Whereas much of the recent literature on weak instruments focusses on developing appropriate tests and confidence sets when instruments are weak but valid, see for instance Hahn and Inoue (2002) and Andrews and Stock (2005), the present study analyzes and illustrates properties of the distribution of coefficient estimators.

From our simulations we establish that invalid but reasonably strong instruments yield IV estimators which have a distribution in small samples that is rather close to the analytic large-sample asymptotic approximations derived here. Hence, the distribution of these estimators is often close to normal, but has its probability mass centered around the pseudo-true-value instead of the true value. However, when instruments are very weak, we establish that the accuracy of standard large-sample asymptotics is very poor, as had already been established for the valid instrument case. More importantly, though, for both valid and invalid instruments we find also that when the instrument is weak the probability mass of the actual distribution of instrumental variable estimators is generally much closer to the true value of the coefficient than indicated by these much too flat asymptotic approximations. For valid but rather weak instruments it had already been established that the finite sample distribution of IV can be skew, and that it becomes bimodal for very strong simultaneity, whereas for extreme weakness (i.e. close to underidentification) the dispersion explodes, while the median moves away from the true parameter value towards the probability limit of OLS. We find that for invalid weak instruments skewness, bimodality and a median away from the pseudo-true-value may occur for much more moderate weakness and simultaneity. Note, however, that in practice one can easily avoid using weak instruments if one would lift the ban on invalid instruments, since weakness (unlike validity) can straightforwardly be assessed. Because the invalid IV estimator is reasonably well behaved for reasonably strong instruments, a tentative conclusion is that it seems more promising to attempt to produce accurate inference from IV estimators based on strong (as in OLS) but possibly invalid instruments, than on valid but weak instruments. In the latter case, not only is the standard asymptotic approximation poor, but also the actual behavior of the distribution of the estimator is rather erratic and has much larger estimation errors than invalid but strong instruments produce. So, even when its actual behavior could be adequately approximated by alternative weak-instrument asymptotic methods it still may have an actual distribution that is less attractive than that of an IV estimator based on strong but possibly invalid instruments.

The structure of this paper is as follows. In Section 2 we introduce the model to be estimated and the generating schemes for all explanatory and instrumental variables, with their underlying statistical assumptions. Focussing on the alleged overidentified case we consider the generalized IV or 2SLS estimator and derive its inconsistency and limiting distribution (proofs in appendices) for the case where all variables are weakly stationary, i.e. their first and second moments are constant through time. Next the results are specialized for the just identified case. Section 3 contains graphic illustrations of both the asymptotic and finite sample distributions in specific simple models. From these we examine the accuracy of the asymptotic approximations and the actual behavior over different values of all the various determining factors. Moreover, we compare the effectiveness of IV with respect to OLS, which uses always extremely strong but possibly invalid instruments. In separate subsections we consider models with $l = k = 1$ and with $l - 1 = k = 1$. Hence, we focus on the very simple model with just one endogenous explanatory variable and one or two (possibly weak and possibly invalid) instrumental variables. Finally, Section 4 concludes.

2 Model, assumptions and theorems

We consider data generating processes for variables for which $n$ observations have been collected in $y$, $X$ and $Z$, which all have $n$ rows. The matrices $X$ and $Z$ have $k$ and $l$ columns respectively, with $l \ge k$. $X$ contains the explanatory variables for vector $y$ in a linear structural model with structural disturbance vector $\varepsilon$. The $l$ variables collected in $Z$ will be used as instrumental variables for estimating the $k$ structural parameters of interest $\beta$. Not all these instruments are necessarily valid, some of them may be weak, whereas others may be extremely strong, especially when columns of $X$ correspond to (or are spanned by) columns of $Z$. The basic framework is characterized by the following parametrization and stationarity and regularity conditions.

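Framework A. We have: (i) the structural equation $y = X\beta + \varepsilon$; (ii) with disturbances $\varepsilon_i$ ($i = 1, \dots, n$) for which $E(\varepsilon_i) = 0$, $E(\varepsilon_i^2) = \sigma_\varepsilon^2$, $E(\varepsilon_i^3) = \kappa_3\sigma_\varepsilon^3$ and $E(\varepsilon_i^4) = \kappa_4\sigma_\varepsilon^4$; (iii) while $X = \bar X + \varepsilon\xi'$ and $Z = \bar Z + \varepsilon\zeta'$, such that $E(\bar X'\varepsilon) = 0$ and $E(\bar Z'\varepsilon) = 0$, with $\xi$ and $\zeta$ fixed parameter vectors of $k$ and $l$ elements respectively. Moreover, (iv) $\Sigma_{X'X} \equiv \operatorname{plim}_{n\to\infty} n^{-1}X'X$, $\Sigma_{Z'Z} \equiv \operatorname{plim}_{n\to\infty} n^{-1}Z'Z$ and $\Sigma_{Z'X} \equiv \operatorname{plim}_{n\to\infty} n^{-1}Z'X$ have all full column rank, and (v) so have $X'X$, $Z'Z$ and $Z'X$ with probability one. Finally, (vi) we have $E(n^{-1}Z'Z \mid \bar Z) - \Sigma_{Z'Z} = o_p(n^{-1/2})$ and $E(n^{-1}X'Z \mid \bar X, \bar Z) - \Sigma_{X'Z} = o_p(n^{-1/2})$.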

Note that A(iii) implies

$$E(X'\varepsilon) = n\sigma_\varepsilon^2\xi \quad\text{and}\quad E(Z'\varepsilon) = n\sigma_\varepsilon^2\zeta. \tag{1}$$

Hence, if $\xi_j = 0$ for some $j \in \{1, \dots, k\}$ then the $j$-th regressor in $X$ is predetermined and will establish a valid instrument; otherwise, when $\xi_j \neq 0$, the $j$-th regressor is endogenous. Likewise, if $\zeta_g = 0$ for some $g \in \{1, \dots, l\}$ then the $g$-th column of $Z$ establishes a valid instrument, and an invalid instrument otherwise. It can be shown that A(vi) boils down to the mild regularity assumptions $n^{-1}\bar Z'\bar Z - \operatorname{plim}_{n\to\infty} n^{-1}\bar Z'\bar Z = o_p(n^{-1/2})$ and $n^{-1}\bar X'\bar Z - \operatorname{plim}_{n\to\infty} n^{-1}\bar X'\bar Z = o_p(n^{-1/2})$.

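Since $l \ge k$ the generalized instrumental variable (GIV) or 2SLS estimator of $\beta$ exists and is given by

$$\hat\beta_{GIV} = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'y = (\hat X'\hat X)^{-1}\hat X'y, \tag{2}$$

where we introduced the notation

$$\hat X = Z\hat\Pi = Z(Z'Z)^{-1}Z'X, \tag{3}$$

where $\hat\Pi = (Z'Z)^{-1}Z'X$ contains the (reduced form) coefficient estimates of the first-stage regressions. In Framework A the probability limit of $\hat\beta_{GIV}$ exists. We define

$$\begin{aligned}
\beta^*_{GIV} &\equiv \operatorname{plim}\hat\beta_{GIV} \\
&= \beta + \operatorname{plim}\,[X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'\varepsilon \\
&= \beta + \sigma_\varepsilon^2[\Sigma_{X'Z}\Sigma_{Z'Z}^{-1}\Sigma_{Z'X}]^{-1}\Sigma_{X'Z}\Sigma_{Z'Z}^{-1}\zeta,
\end{aligned} \tag{4}$$

where $\beta^*_{GIV}$ is also known as the pseudo-true-value of $\hat\beta_{GIV}$. We shall denote the inconsistency of $\hat\beta_{GIV}$ as

$$\begin{aligned}
\breve\beta_{GIV} &\equiv \beta^*_{GIV} - \beta \\
&= \sigma_\varepsilon^2[\Sigma_{X'Z}\Sigma_{Z'Z}^{-1}\Sigma_{Z'X}]^{-1}\Sigma_{X'Z}\Sigma_{Z'Z}^{-1}\zeta \\
&= \sigma_\varepsilon^2\Sigma_{\hat X'\hat X}^{-1}\Pi'\zeta,
\end{aligned} \tag{5}$$

where we used $\Sigma_{\hat X'\hat X} \equiv \Sigma_{X'Z}\Sigma_{Z'Z}^{-1}\Sigma_{Z'X}$ and $\Pi \equiv \operatorname{plim}(Z'Z)^{-1}Z'X = \Sigma_{Z'Z}^{-1}\Sigma_{Z'X}$. Note that in Framework A the GIV estimator is consistent if and only if $\Pi'\zeta = 0$.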

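Below, we shall also look into the special case $l = k$ (just identification), where the above GIV results specialize to simple IV, i.e.

$$\hat\beta_{IV} = (Z'X)^{-1}Z'y, \qquad \beta^*_{IV} = \beta + \sigma_\varepsilon^2\Sigma_{Z'X}^{-1}\zeta, \qquad \breve\beta_{IV} = \sigma_\varepsilon^2\Sigma_{Z'X}^{-1}\zeta. \tag{6}$$

When in fact $Z = X$ (all regressors are used as instruments), i.e. $\zeta = \xi$, then IV specializes to OLS, i.e.

$$\hat\beta_{OLS} = (X'X)^{-1}X'y, \qquad \beta^*_{OLS} = \beta + \sigma_\varepsilon^2\Sigma_{X'X}^{-1}\xi, \qquad \breve\beta_{OLS} = \sigma_\varepsilon^2\Sigma_{X'X}^{-1}\xi. \tag{7}$$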

For the sake of simplicity, we will start with deriving special results for models with disturbances that have 3rd and 4th moments corresponding to those of the normal distribution. Therefore, we also state:

Framework B. This specializes Framework A to the case: $\kappa_3 = 0$ and $\kappa_4 = 3$.

For GIV and IV estimators we now obtain the following results (proved in appendices) on their convergence in distribution.

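Theorem 1. In Framework B we have $n^{1/2}(\hat\beta_{GIV} - \beta^*_{GIV}) \to N(0, V^N_{GIV})$, with

$$\begin{aligned}
V^N_{GIV} ={}& \sigma_\varepsilon^2 c_5(1 - c_3 + c_4)\,\Sigma_{\hat X'\hat X}^{-1} + \sigma_\varepsilon^2 c_4^2\,\Sigma_{\hat X'\hat X}^{-1}\Sigma_{X'X}\Sigma_{\hat X'\hat X}^{-1} \\
& - c_4\,[\Sigma_{\hat X'\hat X}^{-1}\Sigma_{X'X}\breve\beta_{GIV}\breve\beta_{GIV}' + \breve\beta_{GIV}\breve\beta_{GIV}'\Sigma_{X'X}\Sigma_{\hat X'\hat X}^{-1}] \\
& + \sigma_\varepsilon^4 c_4(1 - 2c_4)\,\Sigma_{\hat X'\hat X}^{-1}\xi\xi'\Sigma_{\hat X'\hat X}^{-1} \\
& + \sigma_\varepsilon^2 c_4(1 - 2c_5)\,[\Sigma_{\hat X'\hat X}^{-1}\xi\breve\beta_{GIV}' + \breve\beta_{GIV}\xi'\Sigma_{\hat X'\hat X}^{-1}] \\
& + [c_5(1 - 2c_5) - c_3 + \sigma_\varepsilon^{-2}(\breve\beta_{GIV}'\Sigma_{X'X}\breve\beta_{GIV})]\,\breve\beta_{GIV}\breve\beta_{GIV}',
\end{aligned}$$

where $c_1 \equiv \sigma_\varepsilon^2\zeta'\Sigma_{Z'Z}^{-1}\zeta$, $c_2 \equiv \zeta'\Pi\breve\beta_{GIV}$ with $\Pi = \Sigma_{Z'Z}^{-1}\Sigma_{Z'X}$, $c_3 \equiv \xi'\breve\beta_{GIV}$, $c_4 = c_1 - c_2$ and $c_5 \equiv 1 - c_3 - c_4$.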

The $N$ in the superindex of $V^N_{GIV}$ indicates that it refers to the case where the disturbances are "almost normal", because $\kappa_3 = 0$ and $\kappa_4 = 3$. We find that the limiting distribution of $\hat\beta_{GIV}$ is still genuinely normal when instruments are invalid, although no longer centered at $\beta$ but at the pseudo-true-value $\beta^*_{GIV}$. When all instruments are valid, i.e. $\zeta = 0$, then $\beta^*_{GIV} = \beta$, $\breve\beta_{GIV} = 0$ and $c_1 = c_2 = c_3 = 0$, giving $c_4 = 0$ and $c_5 = 1$, so that Theorem 1 specializes to the standard result $n^{1/2}(\hat\beta_{GIV} - \beta) \to N(0, \sigma_\varepsilon^2\Sigma_{\hat X'\hat X}^{-1})$. Note that when all instruments are valid the asymptotic variance of $\hat\beta_{GIV}$ is not determined by the simultaneity $\xi$, because $\Sigma_{\hat X'\hat X} = \Sigma_{X'Z}\Sigma_{Z'Z}^{-1}\Sigma_{Z'X} = \Sigma_{\bar X'\bar Z}\Sigma_{\bar Z'\bar Z}^{-1}\Sigma_{\bar Z'\bar X}$. However, when instruments are invalid, i.e. $\zeta \neq 0$, then $\Sigma_{Z'X} = \Sigma_{\bar Z'\bar X} + \sigma_\varepsilon^2\zeta\xi'$ and thus $\Sigma_{\hat X'\hat X}$ is determined by both $\xi$ and $\zeta$. Then, when fitting $X$ to $Z$, the $\varepsilon\xi'$ part of $X$ is no longer (asymptotically) orthogonal to $Z$, due to the presence of $\varepsilon\zeta'$. This does not only lead to the inconsistency, but also to the many extra terms in the asymptotic variance.

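For the special case $l = k$ we have $\zeta'\Pi\breve\beta_{GIV} = \sigma_\varepsilon^2\zeta'\Sigma_{Z'Z}^{-1}\Sigma_{Z'X}\Sigma_{Z'X}^{-1}\zeta = \sigma_\varepsilon^2\zeta'\Sigma_{Z'Z}^{-1}\zeta$, with $\Pi = \Sigma_{Z'Z}^{-1}\Sigma_{Z'X}$, so $c_1 = c_2$, $c_4 = 0$ and $c_5 = 1 - c_3$, giving:

Corollary 1. In Framework B for the special case $l = k$ we have $n^{1/2}(\hat\beta_{IV} - \beta^*_{IV}) \to N(0, V^N_{IV})$, with

$$V^N_{IV} = \sigma_\varepsilon^2(1 - c_3)^2\,\Sigma_{Z'X}^{-1}\Sigma_{Z'Z}\Sigma_{X'Z}^{-1} - [2c_3^2 - 2c_3 + 1 - \sigma_\varepsilon^{-2}(\breve\beta_{IV}'\Sigma_{X'X}\breve\beta_{IV})]\,\breve\beta_{IV}\breve\beta_{IV}',$$

where $c_3 \equiv \xi'\breve\beta_{IV} = \sigma_\varepsilon^2\xi'\Sigma_{Z'X}^{-1}\zeta$.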

When all instruments are valid, i.e. $\zeta = 0$, this result specializes to the standard result $n^{1/2}(\hat\beta_{IV} - \beta^*_{IV}) \to N(0, \sigma_\varepsilon^2\Sigma_{Z'X}^{-1}\Sigma_{Z'Z}\Sigma_{X'Z}^{-1})$. Since for general $\xi$ and $\zeta$ the scalar $\sigma_\varepsilon^2\xi'\Sigma_{Z'X}^{-1}\zeta$ can either be positive or negative no general conclusions can be drawn on the behavior of $V^N_{IV}$ in comparison to the reference case $\sigma_\varepsilon^2\Sigma_{Z'X}^{-1}\Sigma_{Z'Z}\Sigma_{X'Z}^{-1}$. Depending on the particular parametrization and data moment matrices the asymptotic variance of individual coefficient estimates may either increase or decrease, due to $\xi \neq 0$ or $\zeta \neq 0$. When $Z = X$, which gives $\zeta = \xi$ and $\hat\beta_{IV} = \hat\beta_{OLS}$, the resulting $V^N_{IV} = V^N_{OLS}$ is the same as the formula found for an inconsistent OLS estimator when the disturbances are (almost) normal, as derived in Kiviet and Niemczyk (2005).

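Next we look at the case where the disturbances may have general 3rd and 4th moment. Let $\iota$ be a $n \times 1$ vector of unit elements. Upon defining

$$\Sigma_{Z'\iota} \equiv \operatorname{plim} n^{-1}Z'\iota = \operatorname{plim} n^{-1}\bar Z'\iota \equiv \Sigma_{\bar Z'\iota},$$

$$\Sigma_{X'\iota} \equiv \operatorname{plim} n^{-1}X'\iota = \operatorname{plim} n^{-1}\bar X'\iota \equiv \Sigma_{\bar X'\iota},$$

we find (superindex $NN$ indicates nonnormal disturbances):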

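Theorem 2. In Framework A we have $n^{1/2}(\hat\beta_{GIV} - \beta^*_{GIV}) \to N(0, V^{NN}_{GIV})$, where $V^{NN}_{GIV}$ is equal to $V^N_{GIV}$, given in Theorem 1, plus two additional terms. When $\kappa_4 \neq 3$ the additional term is

$$(\kappa_4 - 3)\{\sigma_\varepsilon^4 c_4^2\,\Sigma_{\hat X'\hat X}^{-1}\xi\xi'\Sigma_{\hat X'\hat X}^{-1} + \sigma_\varepsilon^2 c_4 c_5\,[\breve\beta_{GIV}\xi'\Sigma_{\hat X'\hat X}^{-1} + \Sigma_{\hat X'\hat X}^{-1}\xi\breve\beta_{GIV}'] + c_5^2\,\breve\beta_{GIV}\breve\beta_{GIV}'\},$$

where $c_4 = \sigma_\varepsilon^2\zeta'\Sigma_{Z'Z}^{-1}\zeta - \zeta'\Pi\breve\beta_{GIV}$ and $c_5 = 1 - c_4 - \xi'\breve\beta_{GIV}$, with $\Pi = \Sigma_{Z'Z}^{-1}\Sigma_{Z'X}$. When $\kappa_3 \neq 0$ the additional term is

$$\begin{aligned}
\kappa_3\{\, & c_4[\sigma_\varepsilon^3 c_4\Sigma_{\hat X'\hat X}^{-1}\Sigma_{X'\iota}\xi'\Sigma_{\hat X'\hat X}^{-1} - \sigma_\varepsilon\breve\beta_{GIV}\breve\beta_{GIV}'\Sigma_{X'\iota}\xi'\Sigma_{\hat X'\hat X}^{-1} + \sigma_\varepsilon^3 c_5\Sigma_{\hat X'\hat X}^{-1}\Pi'\Sigma_{Z'\iota}\xi'\Sigma_{\hat X'\hat X}^{-1} \\
& \quad + (\sigma_\varepsilon^3\Sigma_{\hat X'\hat X}^{-1}\xi - \sigma_\varepsilon\breve\beta_{GIV})(\sigma_\varepsilon^2\zeta' - \breve\beta_{GIV}'\Sigma_{X'Z})\Sigma_{Z'Z}^{-1}\Sigma_{Z'\iota}\xi'\Sigma_{\hat X'\hat X}^{-1}] \\
& + c_5[\sigma_\varepsilon c_4\Sigma_{\hat X'\hat X}^{-1}\Sigma_{X'\iota}\breve\beta_{GIV}' - \sigma_\varepsilon^{-1}\breve\beta_{GIV}'\Sigma_{X'\iota}\breve\beta_{GIV}\breve\beta_{GIV}' + \sigma_\varepsilon c_5\Sigma_{\hat X'\hat X}^{-1}\Pi'\Sigma_{Z'\iota}\breve\beta_{GIV}' \\
& \quad + (\sigma_\varepsilon\Sigma_{\hat X'\hat X}^{-1}\xi - \sigma_\varepsilon^{-1}\breve\beta_{GIV})(\sigma_\varepsilon^2\zeta' - \breve\beta_{GIV}'\Sigma_{X'Z})\Sigma_{Z'Z}^{-1}\Sigma_{Z'\iota}\breve\beta_{GIV}'] \\
& + c_4[\sigma_\varepsilon^3 c_4\Sigma_{\hat X'\hat X}^{-1}\xi\Sigma_{X'\iota}'\Sigma_{\hat X'\hat X}^{-1} - \sigma_\varepsilon\Sigma_{\hat X'\hat X}^{-1}\xi\Sigma_{X'\iota}'\breve\beta_{GIV}\breve\beta_{GIV}' + \sigma_\varepsilon^3 c_5\Sigma_{\hat X'\hat X}^{-1}\xi\Sigma_{Z'\iota}'\Pi\Sigma_{\hat X'\hat X}^{-1} \\
& \quad + \Sigma_{\hat X'\hat X}^{-1}\xi\Sigma_{Z'\iota}'\Sigma_{Z'Z}^{-1}(\sigma_\varepsilon^2\zeta - \Sigma_{Z'X}\breve\beta_{GIV})(\sigma_\varepsilon^3\xi'\Sigma_{\hat X'\hat X}^{-1} - \sigma_\varepsilon\breve\beta_{GIV}')] \\
& + c_5[\sigma_\varepsilon c_4\breve\beta_{GIV}\Sigma_{X'\iota}'\Sigma_{\hat X'\hat X}^{-1} - \sigma_\varepsilon^{-1}\breve\beta_{GIV}'\Sigma_{X'\iota}\breve\beta_{GIV}\breve\beta_{GIV}' + \sigma_\varepsilon c_5\breve\beta_{GIV}\Sigma_{Z'\iota}'\Pi\Sigma_{\hat X'\hat X}^{-1} \\
& \quad + \breve\beta_{GIV}\Sigma_{Z'\iota}'\Sigma_{Z'Z}^{-1}(\sigma_\varepsilon^2\zeta - \Sigma_{Z'X}\breve\beta_{GIV})(\sigma_\varepsilon\xi'\Sigma_{\hat X'\hat X}^{-1} - \sigma_\varepsilon^{-1}\breve\beta_{GIV}')]\,\}.
\end{aligned}$$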

When all instruments are valid, i.e. $\zeta = 0$, then this result again collapses to the standard result, i.e. $V^{NN}_{GIV} = \sigma_\varepsilon^2\Sigma_{\hat X'\hat X}^{-1}$, which highlights that normality of the disturbances is not a requirement for the standard normal limiting distribution of $\hat\beta_{GIV}$.

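For the special case $l = k$ Theorem 2 yields:

Corollary 2. In Framework A for the special case $l = k$ we have $n^{1/2}(\hat\beta_{IV} - \beta^*_{IV}) \to N(0, V^{NN}_{IV})$, with

$$\begin{aligned}
V^{NN}_{IV} ={}& \sigma_\varepsilon^2 c_5^2\,\Sigma_{Z'X}^{-1}\Sigma_{Z'Z}\Sigma_{X'Z}^{-1} + \kappa_3\sigma_\varepsilon c_5^2\,[\Sigma_{Z'X}^{-1}\Sigma_{Z'\iota}\breve\beta_{IV}' + \breve\beta_{IV}\Sigma_{Z'\iota}'\Sigma_{X'Z}^{-1}] \\
& - [(5 - \kappa_4)c_5^2 + 2c_5(\kappa_3\sigma_\varepsilon^{-1}\breve\beta_{IV}'\Sigma_{X'\iota} - 1) + 1 - \sigma_\varepsilon^{-2}\breve\beta_{IV}'\Sigma_{X'X}\breve\beta_{IV}]\,\breve\beta_{IV}\breve\beta_{IV}',
\end{aligned}$$

where $c_5 \equiv 1 - \xi'\breve\beta_{IV} = 1 - \sigma_\varepsilon^2\xi'\Sigma_{Z'X}^{-1}\zeta$.

Of course, for $\kappa_3 = 0$ and $\kappa_4 = 3$ this result simplifies to that of Corollary 1. It also shows that an increase (decrease) in the kurtosis leads to a larger (smaller) asymptotic variance.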
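A compact way to see where the terms of Corollary 2 come from is to write the scaled estimation error in the form handled by the Lemma stated next. The lines below are a sketch for the case $l = k$ under Framework A, not a reproduction of the appendix proofs. The particular choice of $W$ and $\omega$ is this sketch's own bookkeeping, and $\breve\beta_{IV}$ denotes the inconsistency of (6).

```latex
\begin{align*}
n^{1/2}(\hat\beta_{IV}-\beta^*_{IV})
  &= (n^{-1}Z'X)^{-1}\,n^{-1/2}\,(Z'\varepsilon - Z'X\breve\beta_{IV}),\\
Z'\varepsilon - Z'X\breve\beta_{IV}
  &= W'\varepsilon + \omega(\varepsilon'\varepsilon - n\sigma_\varepsilon^2) + o_p(n^{1/2}),\\
W &= c_5\bar Z - \bar X\breve\beta_{IV}\zeta',\qquad
\omega = c_5\zeta,\qquad c_5 = 1-\xi'\breve\beta_{IV},\\
V^{NN}_{IV} &= \Sigma_{Z'X}^{-1}\bigl[\sigma_\varepsilon^2\Sigma_{W'W}
  + \kappa_3\sigma_\varepsilon^3(\Sigma_{W'\iota}\omega' + \omega\Sigma_{W'\iota}')
  + (\kappa_4-1)\sigma_\varepsilon^4\omega\omega'\bigr]\Sigma_{X'Z}^{-1}.
\end{align*}
```

The second line uses $\Sigma_{\bar Z'\bar X}\breve\beta_{IV} = \sigma_\varepsilon^2 c_5\zeta$ together with A(vi). Substituting $\Sigma_{Z'X}^{-1}\zeta = \sigma_\varepsilon^{-2}\breve\beta_{IV}$ then turns every term in $\zeta$ into a multiple of $\breve\beta_{IV}\breve\beta_{IV}'$ or of $\Sigma_{Z'X}^{-1}\Sigma_{Z'\iota}\breve\beta_{IV}'$, which is the structure displayed in Corollary 2.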

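In the proofs of the above theorems we employ a lemma that is a straightforward extension of the following simple CLT (central limit theorem), which says: Let $v_i$ be a $k \times 1$ random vector such that $E(v_i) = 0$, $E(v_iv_i') = V_i$ and $E(v_iv_h') = O$ for $i \neq h = 1, \dots, n$; then $n^{1/2}\bar v \to N(0, \lim_{n\to\infty}\bar V)$, where $\bar v = n^{-1}\sum_{i=1}^n v_i$ and $\bar V = n^{-1}\sum_{i=1}^n V_i$. We employ the following generalized version:

Lemma. Let $W = (w_1, \dots, w_n)'$ be a $n \times k$ random matrix and $\omega$ a $k \times 1$ nonrandom vector, whereas the $n \times 1$ vector $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)'$ has mutually uncorrelated elements for which $E(\varepsilon_i \mid w_i) = 0$, $E(\varepsilon_i^2 \mid w_i) = \sigma_\varepsilon^2$, $E(\varepsilon_i^3 \mid w_i) = \kappa_3\sigma_\varepsilon^3$ and $E(\varepsilon_i^4) = \kappa_4\sigma_\varepsilon^4$. Then the $k \times 1$ vector $v_i = w_i\varepsilon_i + \omega(\varepsilon_i^2 - \sigma_\varepsilon^2)$ has zero expectation, conditional variance $E(v_iv_i' \mid w_i) = V_i = \sigma_\varepsilon^2 w_iw_i' + \kappa_3\sigma_\varepsilon^3(w_i\omega' + \omega w_i') + (\kappa_4 - 1)\sigma_\varepsilon^4\omega\omega'$, whereas $E(v_iv_h') = O$ for $i \neq h$, so that for $n^{1/2}\bar v = n^{-1/2}\sum_{i=1}^n v_i = n^{-1/2}[W'\varepsilon + \omega(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)]$ the CLT implies

$$n^{1/2}\bar v \to N[0,\ \sigma_\varepsilon^2\Sigma_{W'W} + \kappa_3\sigma_\varepsilon^3(\Sigma_{W'\iota}\omega' + \omega\Sigma_{\iota'W}) + (\kappa_4 - 1)\sigma_\varepsilon^4\omega\omega'],$$

where $\Sigma_{W'W} \equiv \operatorname{plim} n^{-1}W'W$ and $\Sigma_{W'\iota} \equiv \operatorname{plim} n^{-1}W'\iota \equiv \Sigma_{\iota'W}'$, with $\iota$ a $n \times 1$ vector of unit elements.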
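As an informal plausibility check of the Lemma, its scalar case ($k = 1$) can be simulated with normal disturbances, for which $\kappa_3 = 0$ and $\kappa_4 = 3$. The sketch below is not part of the paper: the function names, the $N(0.5, 1)$ design for $w_i$ and the value of $\omega$ are illustrative assumptions.

```python
import random
import statistics


def lemma_variance(sigma2_eps, sww, sw1, omega, kappa3=0.0, kappa4=3.0):
    """Scalar (k = 1) limiting variance from the Lemma.

    sww stands for plim n^-1 W'W and sw1 for plim n^-1 W'iota.
    """
    sigma_eps = sigma2_eps ** 0.5
    return (sigma2_eps * sww
            + 2.0 * kappa3 * sigma_eps ** 3 * sw1 * omega
            + (kappa4 - 1.0) * sigma2_eps ** 2 * omega ** 2)


def simulated_variance(n, omega, sigma_eps=1.0, mean_w=0.5, sd_w=1.0, seed=0):
    """Second moment of v_i = w_i*eps_i + omega*(eps_i^2 - sigma^2) over n draws.

    Assumed design (hypothetical): w_i ~ N(mean_w, sd_w^2), independent of
    eps_i ~ N(0, sigma_eps^2), so kappa3 = 0 and kappa4 = 3.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = rng.gauss(mean_w, sd_w)
        eps = rng.gauss(0.0, sigma_eps)
        draws.append(w * eps + omega * (eps ** 2 - sigma_eps ** 2))
    # v_i has zero expectation, so the variance is taken around mu = 0.
    return statistics.pvariance(draws, mu=0.0)


if __name__ == "__main__":
    # E(w^2) = sd_w^2 + mean_w^2 = 1.25 and E(w) = 0.5 under the assumed design.
    print(lemma_variance(1.0, sww=1.25, sw1=0.5, omega=0.7))
    print(simulated_variance(200_000, omega=0.7, seed=1))
```

With normal disturbances the $\kappa_3$ term drops out, so this check only exercises the first and last terms of the variance; a skewed disturbance distribution would be needed to exercise the middle term.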

3 Illustrations

To illustrate the analytical asymptotic findings obtained in the foregoing section, we shall calculate the various formulas for particular models and show the corresponding normal densities over relevant parts of the parameter space. In addition, we will simulate these models and depict the empirical density of the estimators to check the relevance and accuracy of the first-order asymptotic approximations in finite sample. Also we compare IV and GIV estimators (using possibly invalid and possibly weak instruments) with OLS. The latter estimator always uses extremely strong instruments that at the same time are invalid in case of simultaneity.

The limiting distributions obtained in the foregoing section are all of the generic form $n^{1/2}(\hat\beta - \beta - \breve\beta) \to N(0, V)$ and they imply a first-order approximation to the distribution of $\hat\beta$ in finite sample that can be expressed as

$$\hat\beta \overset{a}{\sim} N(\beta + \breve\beta,\ n^{-1}V). \tag{8}$$

This entails a first-order asymptotic approximation to the mean error of $\hat\beta$ equal to $\breve\beta = \beta^* - \beta$ and to the mean squared error (AMSE) given by

$$AMSE(\hat\beta) \equiv n^{-1}V + \breve\beta\breve\beta'. \tag{9}$$

The actual values of $\breve\beta$ and of (the square root of) $AMSE(\hat\beta)$ can be computed for any $n$ and any given values of the model parameters and asymptotic data moments. To find out how accurate these first-order asymptotic approximations are, they can be compared with corresponding Monte Carlo estimates obtained from a series of realizations of $\hat\beta$ in simulated finite samples. However, these cannot be achieved in the standard way when $\hat\beta$ does not have finite first or second moments in finite sample, as is the case when $l - k \le 1$. Then, irrespective of the number of Monte Carlo replications employed, the sample moments from Monte Carlo experiments are not informative as they do not converge. Appropriate alternatives for the mean error and for the root mean squared error are then the median error and the median of the absolute error.

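For a scalar estimator $\hat\beta$ of $\beta$ the median error $ME(\hat\beta)$ and the median absolute error $MAE(\hat\beta)$ are defined as

$$\Pr\{(\hat\beta - \beta) \le ME(\hat\beta)\} = 0.5, \qquad \Pr\{|\hat\beta - \beta| \le MAE(\hat\beta)\} = 0.5. \tag{10}$$

From a series of $R$ independent Monte Carlo realizations $\hat\beta^{(r)}$ ($r = 1, \dots, R$) we estimate $ME(\hat\beta)$ by sorting the values $(\hat\beta^{(r)} - \beta)$ and taking the median value, and likewise for $MAE(\hat\beta)$ after sorting the values $|\hat\beta^{(r)} - \beta|$. Of course, $AMSE(\hat\beta)$ is not the natural asymptotic counterpart of the Monte Carlo estimate of $MAE(\hat\beta)$. We assess the (scalar) asymptotic version $AMAE(\hat\beta)$ of $MAE(\hat\beta)$ in the following way. Let the CDF of the normal approximation to the distribution of $\hat\beta - \beta$ be indicated by $\Phi_{\breve\beta,\sigma_{\hat\beta}}(x)$. Then, for $m \equiv AMAE(\hat\beta)$, we have

$$\begin{aligned}
0.5 &= \Pr\{|\hat\beta - \beta| \le m\} = 1 - \Pr\{|\hat\beta - \beta| > m\} \\
&= 1 - \Pr\{\hat\beta - \beta > m\} - \Pr\{\hat\beta - \beta < -m\} \\
&= \Pr\{\hat\beta - \beta < m\} - \Pr\{\hat\beta - \beta < -m\} \\
&\overset{a}{=} \Phi_{\breve\beta,\sigma_{\hat\beta}}(m) - \Phi_{\breve\beta,\sigma_{\hat\beta}}(-m),
\end{aligned}$$

so that we can solve$^{2}$ $m$ from

$$\Phi_{\breve\beta,\sigma_{\hat\beta}}(m) = 0.5 + \Phi_{\breve\beta,\sigma_{\hat\beta}}(-m). \tag{11}$$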

Below we will examine the empirical finite sample distribution of scalar $\hat\beta_{GIV}$ and compare it with $\hat\beta_{GIV} \overset{a}{\sim} N(\beta + \breve\beta_{GIV},\ n^{-1}V^N_{GIV})$. In addition, for various estimators $\hat\beta_{GIV}$ (including $\hat\beta_{IV}$ and $\hat\beta_{OLS}$), we examine $MAE(\hat\beta_{GIV})$ and compare it with $AMAE(\hat\beta_{GIV})$ over the entire parameter space of two simple classes of models. We examined these models under Framework B only, employing normally distributed disturbances.
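The asymptotic median absolute error defined by (11) has no closed form when the inconsistency is nonzero, but the fixed-point scheme described in footnote 2 is easy to sketch. The function below is an illustration rather than the authors' code. Its name and arguments are hypothetical, and taking the absolute value of the inconsistency is an added convenience that relies on $\Pr\{|\hat\beta - \beta| \le m\}$ being the same for an inconsistency of either sign.

```python
from statistics import NormalDist


def amae(inconsistency, sd, tol=1e-12, max_iter=100_000):
    """Solve Phi(m) - Phi(-m) = 0.5 for m, Phi being the CDF of N(inconsistency, sd^2).

    inconsistency: mean of the normal approximation to beta_hat - beta
    sd:            its standard deviation, i.e. sqrt(V / n)
    """
    # Only |inconsistency| matters, and a non-negative mean keeps
    # 0.5 + Phi(-m) strictly below one throughout the iteration.
    dist = NormalDist(mu=abs(inconsistency), sigma=sd)
    if inconsistency == 0.0:
        # Symmetric case: Phi(-m) = 1 - Phi(m), hence Phi(m) = 0.75.
        return dist.inv_cdf(0.75)
    m = 0.0  # starting value m_0 = 0
    for _ in range(max_iter):
        m_next = dist.inv_cdf(0.5 + dist.cdf(-m))
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    print(amae(0.0, 1.0))
    print(amae(0.5, 1.0))
```

Convergence slows down when the inconsistency is tiny relative to the standard deviation, because the iteration map then has slope close to one in absolute value; a bracketing root finder would be a sturdier choice in that region.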

3.1 A simple just identified model

We commence by considering the most basic example one can think of, viz. a model with one regressor and one possibly invalid and either strong or weak instrument, i.e. $k = l = 1$. The two variables $x$ and $z$, together with the dependent variable $y$, are supposed to be jointly IID with zero mean and finite second moments. Hence, the variables are strongly stationary and our Theorems 1 and 2 apply. This model has been addressed often before, recently in Woglom (2001) and Hillier (2005), and for $l \ge 1$ in Bound et al. (1995) and Hahn and Hausman (2003), although only in the latter paper invalid instruments are being considered.

$^{2}$ Since $m = \Phi^{-1}_{\breve\beta,\sigma_{\hat\beta}}[0.5 + \Phi_{\breve\beta,\sigma_{\hat\beta}}(-m)]$, we employed the iterative scheme $m_0 = 0$, $m_{i+1} = \Phi^{-1}_{\breve\beta,\sigma_{\hat\beta}}[0.5 + \Phi_{\breve\beta,\sigma_{\hat\beta}}(-m_i)]$ for $i = 0, 1, \dots$ until convergence. When $\breve\beta = 0$ no iteration is required since then $\Phi_{0,\sigma_{\hat\beta}}(-m) = 1 - \Phi_{0,\sigma_{\hat\beta}}(m)$, so that $m = \Phi^{-1}_{0,\sigma_{\hat\beta}}(0.75)$.

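We first evaluate the relevant expressions for the asymptotic distribution given in Corollary 1. In the model with $k = l = 1$ we can simplify the notation considerably, by writing $\sigma_x^2$ for $\Sigma_{X'X}$, $\sigma_{xz}$ or $\rho_{xz}\sigma_x\sigma_z$ for $\Sigma_{Z'X}$, etc. Using $\zeta = \sigma_{z\varepsilon}/\sigma_\varepsilon^2$ and $\xi = \sigma_{x\varepsilon}/\sigma_\varepsilon^2$ we obtain

$$\begin{aligned}
\breve\beta_{IV} &= \beta^*_{IV} - \beta = \sigma_\varepsilon^2\Sigma_{Z'X}^{-1}\zeta = \frac{\sigma_{z\varepsilon}}{\sigma_{xz}} = \frac{\rho_{z\varepsilon}}{\rho_{xz}}\frac{\sigma_\varepsilon}{\sigma_x}, \\
c_3 &= \sigma_\varepsilon^2\xi'\Sigma_{Z'X}^{-1}\zeta = \frac{1}{\sigma_\varepsilon^2}\frac{\sigma_{z\varepsilon}\sigma_{x\varepsilon}}{\sigma_{xz}} = \frac{\rho_{z\varepsilon}\rho_{x\varepsilon}}{\rho_{xz}}, \\
\breve\beta_{IV}'\Sigma_{X'X}\breve\beta_{IV} &= \sigma_\varepsilon^4\zeta'\Sigma_{X'Z}^{-1}\Sigma_{X'X}\Sigma_{Z'X}^{-1}\zeta = \frac{\sigma_{z\varepsilon}^2\sigma_x^2}{\sigma_{xz}^2} = \sigma_\varepsilon^2\frac{\rho_{z\varepsilon}^2}{\rho_{xz}^2},
\end{aligned} \tag{12}$$

giving

$$V^N_{IV} = \frac{\sigma_\varepsilon^2}{\sigma_x^2}\,\frac{(1 - \rho_{z\varepsilon}^2)(\rho_{xz} - \rho_{z\varepsilon}\rho_{x\varepsilon})^2 + \rho_{z\varepsilon}^4(1 - \rho_{x\varepsilon}^2)}{\rho_{xz}^4}, \tag{13}$$

in the case where the disturbances are (almost) normally distributed. The expression for the inconsistency $\breve\beta_{IV}$ shows that its sign is determined by the sign of $\rho_{z\varepsilon}/\rho_{xz}$, whereas its magnitude is inversely related to the strength of the instrument, cf. Bound et al. (1995). $V^N_{IV}$ is unaffected by the signs of $\rho_{z\varepsilon}$, $\rho_{x\varepsilon}$ and $\rho_{xz}$ as long as the sign of the product $\rho_{z\varepsilon}\rho_{x\varepsilon}\rho_{xz}$ remains the same, or when either $\rho_{x\varepsilon}$ or $\rho_{z\varepsilon}$ is zero. Self-evidently, $V^N_{IV}$ diverges for $\rho_{xz}$ approaching zero.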
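The scalar expressions (12) and (13) are simple enough to evaluate directly, and they can be cross-checked against the matrix form of Corollary 1 evaluated with scalar moments. The sketch below does both and also evaluates the AMSE of (9). All function and argument names are illustrative choices, not notation from the paper.

```python
import math


def iv_inconsistency(rho_ze, rho_xz, sigma_eps=1.0, sigma_x=1.0):
    """Inconsistency of simple IV for k = l = 1, first line of (12)."""
    return (rho_ze / rho_xz) * (sigma_eps / sigma_x)


def iv_asymptotic_variance(rho_xe, rho_ze, rho_xz, sigma_eps=1.0, sigma_x=1.0):
    """Asymptotic variance (13), valid for kappa3 = 0 and kappa4 = 3."""
    numerator = ((1.0 - rho_ze ** 2) * (rho_xz - rho_ze * rho_xe) ** 2
                 + rho_ze ** 4 * (1.0 - rho_xe ** 2))
    return (sigma_eps ** 2 / sigma_x ** 2) * numerator / rho_xz ** 4


def iv_variance_corollary1(rho_xe, rho_ze, rho_xz,
                           sigma_eps=1.0, sigma_x=1.0, sigma_z=1.0):
    """Corollary 1 with scalar moments, used only as a cross-check of (13)."""
    s_xz = rho_xz * sigma_x * sigma_z                    # Sigma_{Z'X}
    b = iv_inconsistency(rho_ze, rho_xz, sigma_eps, sigma_x)
    c3 = rho_ze * rho_xe / rho_xz
    leading = sigma_eps ** 2 * (1.0 - c3) ** 2 * sigma_z ** 2 / s_xz ** 2
    bracket = 2.0 * c3 ** 2 - 2.0 * c3 + 1.0 - (b ** 2 * sigma_x ** 2) / sigma_eps ** 2
    return leading - bracket * b ** 2


def amse(n, inconsistency, variance):
    """Scalar AMSE of (9): n^-1 V plus the squared inconsistency."""
    return variance / n + inconsistency ** 2


if __name__ == "__main__":
    sx = math.sqrt(10.0)  # signal-to-noise ratio of 10 with sigma_eps = 1
    print(iv_inconsistency(0.2, 0.5, sigma_x=sx))
    print(iv_asymptotic_variance(0.5, 0.2, 0.5, sigma_x=sx))
    print(iv_variance_corollary1(0.5, 0.2, 0.5, sigma_x=sx))
```

Setting `rho_ze=0` reproduces the textbook variance $\sigma_\varepsilon^2/(\sigma_x^2\rho_{xz}^2)$ of a valid instrument, which is a quick sanity check on any reimplementation.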

Without loss of generality we may focus in this model on the case $\beta = 1$. This is just a normalization and not a restriction, because we can imagine that we started off from a model $y_i = \beta^{\#}x_i^{\#} + \varepsilon_i$, with $\beta^{\#} \neq 0$, and rescaled the explanatory variable such that $x_i = \beta^{\#}x_i^{\#}$.

An important characteristic of the model is the signal-to-noise ratio ($SN$), which is equal to

$$SN = \frac{\beta^2\sigma_x^2}{\sigma_\varepsilon^2} = \frac{\sigma_x^2}{\sigma_\varepsilon^2}. \tag{14}$$

From (12) and (13) we find that $V^N_{IV}$ and $\breve\beta$ are proportional to (the square root of) the inverse of $SN$. In fact, after normalization to $\beta = 1$, the approximation to the distribution of the IV estimator in this simple model, $\hat\beta_{IV} \overset{a}{\sim} N(\beta + \breve\beta,\ n^{-1}V^N_{IV})$, is completely determined by $n$ and the four model characteristics $\rho_{xz}$, $\rho_{x\varepsilon}$, $\rho_{z\varepsilon}$ and $SN$.

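Next we focus on obtaining an appropriate data generating scheme for this model, which is to be used in the simulations. In the notation of Section 2 it should be given by

$$\left.\begin{aligned}
y_i &= \beta x_i + \varepsilon_i \\
x_i &= \bar x_i + \xi\varepsilon_i \\
z_i &= \bar z_i + \zeta\varepsilon_i
\end{aligned}\right\} \tag{15}$$

where $\xi$ and $\zeta$ are scalar. In order to obtain $(\varepsilon_i, x_i, z_i)' \sim IID(0, \Omega)$, with appropriate covariance matrix $\Omega$, we generate $v_i = (v_{i,1}, v_{i,2}, v_{i,3})' \sim IID(0, I_3)$ and then parameterize as follows:

$$\varepsilon_i = \sigma_\varepsilon v_{i,1}, \qquad \bar x_i = \alpha_1 v_{i,2}, \qquad \bar z_i = \alpha_2 v_{i,2} + \alpha_3 v_{i,3}.$$

This provides full generality. The coefficient $\alpha_1$ determines $\sigma_{\bar x}^2$, whereas $E(\bar x_i\varepsilon_i) = 0$, as it should. Also $E(\bar z_i\varepsilon_i) = 0$, and $\alpha_2$ and $\alpha_3$ enable any correlation between $\bar x_i$ and $\bar z_i$ and any value of $\sigma_{\bar z}^2$. The above implies

$$\begin{pmatrix} \varepsilon_i \\ x_i \\ z_i \end{pmatrix} = \Omega^{1/2}v_i = \begin{pmatrix} \sigma_\varepsilon & 0 & 0 \\ \xi\sigma_\varepsilon & \alpha_1 & 0 \\ \zeta\sigma_\varepsilon & \alpha_2 & \alpha_3 \end{pmatrix}\begin{pmatrix} v_{i,1} \\ v_{i,2} \\ v_{i,3} \end{pmatrix}. \tag{16}$$

Note that the zero elements do not entail restrictions on $\Omega$, because $\Omega^{1/2}$ is non-unique and a lower-triangular form with positive diagonal elements can be found for any positive definite $\Omega$.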

In this simple model with $k = l = 1$ we have

$$\hat\beta_{IV} = \frac{\sum z_iy_i}{\sum z_ix_i} = \beta + \frac{\sum z_i\varepsilon_i}{\sum z_ix_i}, \tag{17}$$

which clarifies that, irrespective of the sample size, the distribution of $\hat\beta_{IV}$ is invariant to the scale of $z_i$. We may also change the sign of all the $z_i$ without affecting $\hat\beta_{IV}$. Therefore, we may restrict ourselves in the illustrations to cases with $\rho_{xz} > 0$ (the case $\rho_{xz} = 0$ leads to underidentification and was already excluded in the assumptions). Since the distribution of $\hat\beta_{IV} - \beta$ becomes just its mirror-image when all $x_i$ are changed in sign, we shall also restrict ourselves to cases where $\rho_{z\varepsilon} \ge 0$, because of the following reasoning. The value of $\rho_{xz}$ is invariant to changing the signs of all $x_i$ and $z_i$ values. Hence, for any value of $\rho_{xz} > 0$ the distribution of $\hat\beta_{IV} - \beta$ for $\rho_{z\varepsilon} \le 0$ and arbitrary positive or negative value of $\rho_{x\varepsilon}$ is equivalent with the distribution of $-(\hat\beta_{IV} - \beta)$ for $-\rho_{z\varepsilon} \ge 0$ and $-\rho_{x\varepsilon}$.

It is also obvious that $\sigma_x$ and $\sigma_\varepsilon$ do not affect the distribution of $\hat\beta_{IV}$ separately, but only through their ratio. Hence, without loss of generality, we can impose some genuine equality restrictions on the 6 parameters of $\Omega$. For these we choose

$$\sigma_\varepsilon = 1, \tag{18}$$

$$\sigma_z^2 = \zeta^2 + \alpha_2^2 + \alpha_3^2 = 1. \tag{19}$$

By (18) we normalize all results with respect to $\sigma_\varepsilon$, and (14) simplifies to

$$SN = \sigma_x^2. \tag{20}$$

Because any GIV estimator is invariant to the scale of the instruments (only the space spanned by the instruments is relevant) we may impose (19), which will be used to obtain the value

$$\alpha_3 = (1 - \zeta^2 - \alpha_2^2)^{1/2}, \tag{21}$$

where, without loss of generality, we may stick to positive values for $\alpha_3$ as long as $v_{i,3}$ is symmetrically distributed. Likewise, we would obtain equivalent data realizations if both $\alpha_1$ and $\alpha_2$ would be changed in sign. Therefore, below we will restrict ourselves to just positive values for both $\alpha_1$ and $\alpha_3$.

The above yields the following data (co)variances and correlations: 2 x = 2 + 2₁ 2_y = 2+ 2 + 1 + 2₁ x"= x"= = p ₂ + 2 1 z" = z" = xz = + 1 2 xz = ( + 1 2)= p ₂ + 2 1 (22)

Note that these, after the normalizations $\beta = 1$, $\sigma_\varepsilon = 1$ and $\sigma_z = 1$, depend on only 4 free parameters of the data generating process (DGP), viz. $\gamma$, $\xi$, $\alpha_1$ and $\alpha_2$. As we already established, the expressions for the inconsistency in (12) and the asymptotic variance (13), evaluated under $\kappa_3 = 0$ and $\kappa_4 = 3$ (the 3rd and 4th moment of $v_{i,1}$), depend on just four characteristics too, viz. on $\rho_{x\varepsilon}$, $\rho_{z\varepsilon}$, $\rho_{xz}$ and $SN = \sigma_x^2$. The latter four can be used in this simple model as a base for the Monte Carlo design parameter space, since they determine the parameters of the DGP through the relationships

$$
\begin{aligned}
\gamma &= \rho_{z\varepsilon}, \\
\xi &= \rho_{x\varepsilon}\sigma_x, \\
\alpha_1 &= \sigma_x (1 - \rho_{x\varepsilon}^2)^{1/2}, \\
\alpha_2 &= (\rho_{xz} - \rho_{x\varepsilon}\rho_{z\varepsilon})/(1 - \rho_{x\varepsilon}^2)^{1/2},
\end{aligned}
\tag{23}
$$

from which $\alpha_3$ follows directly by evaluating (21). This reparametrization is useful, because the parameters $\rho_{x\varepsilon}$, $\rho_{z\varepsilon}$, $\rho_{xz}$ and $SN$ have a direct econometric interpretation: the first three express the degree of simultaneity, instrument (in)validity, and instrument strength, whereas $SN$ is directly related to the model fit, which can be expressed as $SN/(SN + 1)$. We prefer to avoid using the 'concentration parameter' as one of the relevant characteristics of this model in the present context, because this concept refers exclusively to the case where all instruments are valid.
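The mapping from the four design characteristics to the DGP coefficients can be sketched in a few lines of Python. This is only an illustration under the normalizations $\beta = 1$, $\sigma_\varepsilon = 1$ and $\sigma_z = 1$; the function name and the dictionary keys (`xi`, `gamma`, `alpha1`, ...) are hypothetical labels chosen here, not part of the paper.

```python
import math


def just_identified_dgp(rho_xe, rho_ze, rho_xz, sn=10.0):
    """Return DGP coefficients implied by relationships (20), (21) and (23).

    Assumes beta = 1, sigma_eps = 1 and sigma_z = 1.  All names are
    illustrative labels, not notation fixed by the paper.
    """
    if not (abs(rho_xe) < 1.0 and 0.0 <= rho_ze < 1.0 and 0.0 < rho_xz < 1.0):
        raise ValueError("correlation outside the admissible range")
    sigma_x = math.sqrt(sn)                      # (20): SN = sigma_x^2
    root = math.sqrt(1.0 - rho_xe ** 2)
    gamma = rho_ze                               # loading of z on v1
    xi = rho_xe * sigma_x                        # loading of x on v1
    alpha1 = sigma_x * root                      # loading of x on v2
    alpha2 = (rho_xz - rho_xe * rho_ze) / root   # loading of z on v2
    remainder = 1.0 - gamma ** 2 - alpha2 ** 2   # alpha3^2 implied by (19)
    if remainder < 0.0:
        # The three correlations cannot hold jointly with sigma_z = 1.
        raise ValueError("incompatible combination of correlations")
    return {"xi": xi, "gamma": gamma, "alpha1": alpha1,
            "alpha2": alpha2, "alpha3": math.sqrt(remainder)}
```

A call such as `just_identified_dgp(0.3, 0.2, 0.8)` returns coefficients that reproduce the requested correlations, whereas an incompatible combination (for instance a large $\rho_{x\varepsilon}$ and a very large $\rho_{xz}$ with $\rho_{z\varepsilon} = 0$) raises an error.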

From the above it follows that by varying the four parameters $|\rho_{x\varepsilon}| < 1$, $0 \leq \rho_{z\varepsilon} < 1$, $0 < \rho_{xz} < 1$ and $0 < \sigma_x^2/(\sigma_x^2 + 1) < 1$, we can examine the limiting and finite sample distributions of $\hat\beta_{IV}$ over the entire parameter space of this model. Note, however, that not all admissible values of these parameters will be compatible. For example, when $\rho_{x\varepsilon}$ is large and $\rho_{z\varepsilon}$ is small, this cannot be compatible with $\rho_{xz}$ being very large. Moreover, $\sigma_x$ has just an effect on the scale of $\breve\beta_{IV}$, $V^N_{IV}$ and $\hat\beta_{IV}$, so we may choose just one fixed value for $\sigma_x$, and from these findings the results for any value of $\sigma_x$ can be obtained simply by rescaling. In our calculations and simulations we will fix $\sigma_x^2/\sigma_\varepsilon^2 = 10$, yielding a population fit of the model of $10/11 = 0.909$.

Actual values of $\breve\beta_{IV}$ and of AMAE($\hat\beta_{IV}$) can be calculated now for any set of compatible values of $n$, $\rho_{x\varepsilon}$, $\rho_{z\varepsilon}$, $\rho_{xz}$ and $\sigma_x$. Next they can be compared with corresponding Monte Carlo estimates obtained from $\hat\beta_{IV}$ realizations in simulated finite samples, in order to find out how accurate the first-order asymptotic approximations are. Before we present these summarizing characteristics, we will first examine the full density functions themselves. Figures 1 through 4 contain 8 panels each. In all these panels four densities are presented, viz. for $n = 50$ (dark/black lines) and $n = 200$ (grey/red lines), both for the actual empirical distribution (solid lines) and for its asymptotic approximation (dashed lines). The latter has been taken as

$$\hat\beta_{IV} \overset{a}{\sim} N(\beta + \breve\beta_{IV},\; n^{-1}V^N_{IV}).$$

results we may expect to get quick insights into issues as the following. For which combinations of the five design parameter values is: (a) the actual density of $\hat\beta_{IV}$ close to normal (symmetric, unimodal, etc.); (b) the actual median of $\hat\beta_{IV}$ close to $\beta^*_{IV}$; (c) the actual tail behavior of $\hat\beta_{IV}$ reasonably well represented by that of the $N(\beta^*_{IV}, n^{-1}V^N_{IV})$ distribution. Hence, we focus on the correspondence in shape, location and spread of the asymptotic and the empirical distributions. Since in this just identified model the IV estimator does not have finite moments, we do know that even when the instruments are valid, the asymptotic approximation will not capture the fat tail characteristic of the finite sample distribution.
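A comparison of this kind can be sketched with the standard library alone. The code below is an illustration, not the authors' simulation program: it assumes normally distributed $v_i$, the normalizations $\beta = 1$, $\sigma_\varepsilon = 1$, $\sigma_z = 1$, and a model without intercept. The pseudo-true value uses the textbook probability limit $\beta + \sigma_{z\varepsilon}/\sigma_{zx}$, and the variance function encodes the expression that the paper reports as (27) and states to correspond to (13). All function names are hypothetical.

```python
import math
import random
import statistics


def iv_draws(n, reps, rho_xe, rho_ze, rho_xz, sn=10.0, beta=1.0, seed=0):
    """Simulate `reps` realizations of the simple IV estimator z'y / z'x."""
    rng = random.Random(seed)
    sigma_x = math.sqrt(sn)
    root = math.sqrt(1.0 - rho_xe ** 2)
    xi, gamma = rho_xe * sigma_x, rho_ze
    a1 = sigma_x * root
    a2 = (rho_xz - rho_xe * rho_ze) / root
    a3 = math.sqrt(1.0 - gamma ** 2 - a2 ** 2)   # requires compatible inputs
    draws = []
    for _ in range(reps):
        szy = szx = 0.0
        for _ in range(n):
            v1, v2, v3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            x = xi * v1 + a1 * v2                # regressor
            z = gamma * v1 + a2 * v2 + a3 * v3   # (possibly invalid) instrument
            y = beta * x + v1                    # disturbance is v1
            szy += z * y
            szx += z * x
        draws.append(szy / szx)
    return draws


def pseudo_true_value(rho_ze, rho_xz, sn=10.0, beta=1.0):
    """beta plus the inconsistency sigma_ze / sigma_zx (sigma_eps = sigma_z = 1)."""
    return beta + rho_ze / (rho_xz * math.sqrt(sn))


def asymptotic_variance(rho_xe, rho_ze, rho_xz, sn=10.0):
    """Variance expression of the form (27), with sigma_eps = 1."""
    num = ((1.0 - rho_ze ** 2) * (rho_xz - rho_ze * rho_xe) ** 2
           + rho_ze ** 4 * (1.0 - rho_xe ** 2))
    return num / (sn * rho_xz ** 4)
```

For a strong but invalid instrument, e.g. `statistics.median(iv_draws(200, 400, 0.3, 0.2, 0.8))`, the simulated median can be set against `pseudo_true_value(0.2, 0.8)`, and the spread of the draws against `asymptotic_variance(0.3, 0.2, 0.8) / 200`.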

In Figure 1 $\rho_{xz} = 0.8$, so the instrument is not ultra strong, but certainly not weak. In Figure 2 $\rho_{xz} = 0.3$, in Figure 3 $\rho_{xz} = 0.1$ and in Figure 4 $\rho_{xz} = 0.01$. Hence, in the latter figure the instrument is certainly weak and we may expect that standard large sample asymptotics does not provide a very accurate approximation. All four figures contain eight panels for particular combinations of $\rho_{x\varepsilon}$ and $\rho_{z\varepsilon}$ values. The panels in the left-hand columns have $\rho_{z\varepsilon} = 0$, i.e. the instrument is valid and the standard asymptotic result applies. In the right-hand columns $\rho_{z\varepsilon} = 0.2$, i.e. the instrument is invalid and the IV estimator is inconsistent. Nevertheless, the asymptotic approximation presented in Corollary 1 applies. The four rows of panels cover the cases $\rho_{x\varepsilon} = -0.3$, $\rho_{x\varepsilon} = 0$ (no simultaneity; hence, OLS would be more appropriate than IV), $\rho_{x\varepsilon} = 0.3$ and $\rho_{x\varepsilon} = 0.6$.

From Figure 1 we see in the left-hand column that the standard asymptotic approximation of IV when using a valid and strong instrument is very accurate when the simultaneity is not very serious, but deteriorates when $\rho_{x\varepsilon}$ increases, especially when $n$ is small. We note some skewness and one fat tail, but the asymptotic distribution is never extremely bad for the cases examined. In the right-hand column we see that the new result of Corollary 1 is almost of the same quality but slightly less accurate. Especially for the smaller sample size we note some skewness and at least one fat tail in the empirical distribution, which are not captured by the first-order normal asymptotic approximation. In Figure 2, where the instrument is weaker, we find that when the instrument is valid the distribution is more skew, and more so for serious simultaneity. In the right-hand column this occurs for different $\rho_{x\varepsilon}$ values. In most cases there is a substantial but not a dramatic difference between the actual distribution and its approximation. The discrepancies are more pronounced in Figure 3, and affect both the standard ($\rho_{z\varepsilon} = 0$) and the new ($\rho_{z\varepsilon} \neq 0$) asymptotic approximations. From Figure 4 it is clear that the asymptotic approximations are useless (at the sample sizes examined) when the instrument is really weak. When the instrument is valid the actual distributions show some median bias, but they are much less dispersed than suggested by $n^{-1}V^N_{IV}$. The magnitude of the bias in relation to the OLS bias and weakness of the instrument has been analyzed by many authors, see Sawa (1969) and (further references in) Hillier (2005). When the instrument is invalid and very weak, the finite sample distribution of the inconsistent IV estimator is not centered at the pseudo-true value. Surprisingly, it is actually much closer to the true value (also when the instrument is not so weak), whereas the distribution becomes bimodal when the instrument is very weak. Maddala and Jeong (1992), Woglom (2001), Hillier (2005) and Forchini (2005) show that bimodality of the consistent IV estimator occurs for much more severe simultaneity than examined here, viz. for $\rho_{x\varepsilon} = 0.99$, whereas Phillips (2005) shows that it is omnipresent in the simple Keynesian model where simultaneity is always severe. Our findings suggest that using instruments that are both weak and invalid leads to bimodality, irrespective of the degree of simultaneity.

From Figures 1 through 4 we conclude that, irrespective of whether the instruments are valid or not, one should avoid using standard large sample asymptotics when instruments are really weak. If one replaces the weak instrument with a strong one that is invalid (which is always possible by reverting to OLS), one may be able to produce inference on $\beta$ by an inconsistent estimator, such as depicted in the right-hand column of Figure 1, which has a distribution that is much more concentrated around the true value than that from the consistent estimator depicted in the left-hand column of Figure 4. The general validity of the findings from Figures 1 through 4 will be illustrated now by scanning the median absolute error over almost the full parameter space of this simple model.

Figure 5 provides an overview of the (in)accuracy of the asymptotic distribution of IV as an approximation to the actual distribution in finite sample for $n = 20$ and for $n = 100$. These figures (based on 10,000 replications) cover all compatible positive values of $\rho_{x\varepsilon}$ and $\rho_{xz}$, for $\rho_{z\varepsilon} = 0$, $0.1$, $0.3$ and $0.6$. This accuracy is expressed as $\log[\mathrm{MAE}(\hat\beta_{IV})/\mathrm{AMAE}(\hat\beta_{IV})]$. Hence, positive values (yellow, amber) indicate larger absolute errors in finite sample than indicated by the asymptotic approximation, and negative values (blue) indicate that standard asymptotics is too pessimistic about the absolute errors of $\hat\beta_{IV}$ in finite sample. Note that this log-ratio is invariant regarding the value of $SN = \sigma_x^2/\sigma_\varepsilon^2$. We find that the degree of simultaneity $\rho_{x\varepsilon}$ has little effect, and neither has the (in)validity of the instrument $\rho_{z\varepsilon}$. Just instrument weakness (roughly, when $|\rho_{xz}| < n^{-1/2}$) seriously deteriorates the accuracy of the large-$n$ asymptotic approximation.

Figure 6 examines $\log[\mathrm{MAE}(\hat\beta_{OLS})/\mathrm{MAE}(\hat\beta_{IV})]$, which is also invariant with respect to $SN$. It shows that in finite sample the absolute estimation errors committed by OLS are larger than those of IV only when both $\rho_{x\varepsilon}$ and $\rho_{xz}$ are large. The area where IV beats OLS gets smaller for larger $\rho_{z\varepsilon}$. We also note that OLS may beat IV by a much larger margin (when the instrument is weak and the simultaneity not so serious) than IV will ever beat OLS (which happens when the instrument is strong, the simultaneity serious, and the instrument not strongly invalid).
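The OLS versus IV comparison can be mimicked for a single design point with the following sketch. It assumes normal $v_i$, $\beta = 1$, $\sigma_\varepsilon = \sigma_z = 1$ and no intercept, and it takes the median absolute error to be the median of $|\hat\beta - \beta|$ over replications; the function name and its arguments are hypothetical and the replication count is kept small for speed.

```python
import math
import random
import statistics


def log_mae_ratio(n, reps, rho_xe, rho_ze, rho_xz, sn=10.0, seed=0):
    """Return log[MAE(OLS) / MAE(IV)] estimated from `reps` simulated samples."""
    rng = random.Random(seed)
    sigma_x = math.sqrt(sn)
    root = math.sqrt(1.0 - rho_xe ** 2)
    xi, gamma = rho_xe * sigma_x, rho_ze
    a1 = sigma_x * root
    a2 = (rho_xz - rho_xe * rho_ze) / root
    a3 = math.sqrt(1.0 - gamma ** 2 - a2 ** 2)   # requires compatible inputs
    err_ols, err_iv = [], []
    for _ in range(reps):
        sxe = sxx = sze = szx = 0.0
        for _ in range(n):
            v1, v2, v3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            x = xi * v1 + a1 * v2
            z = gamma * v1 + a2 * v2 + a3 * v3
            sxe += x * v1                        # x'eps, with eps = v1
            sxx += x * x
            sze += z * v1                        # z'eps
            szx += z * x
        err_ols.append(abs(sxe / sxx))           # |beta_OLS - beta|
        err_iv.append(abs(sze / szx))            # |beta_IV - beta|
    return math.log(statistics.median(err_ols) / statistics.median(err_iv))
```

A positive value means that IV has the smaller median absolute error. With a strong valid instrument and serious simultaneity, e.g. `log_mae_ratio(100, 300, 0.6, 0.0, 0.8)`, one would expect a positive value; with a very weak instrument, e.g. `log_mae_ratio(100, 300, 0.3, 0.0, 0.01)`, a negative one.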

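3.2 A simple overidentified model

The model of the above subsection can be extended such that we have two instruments $z_{i,1}$ and $z_{i,2}$, i.e. $l = 2$ and $\gamma = (\gamma_1, \gamma_2)'$. First, we examine by which minimal set of data moments the limiting distribution is determined in this model. We assume again that all variables in the regression have been scaled such that $\beta = 1$ and $\sigma_\varepsilon^2 = 1$, whereas the instruments $Z$ have been transformed such that $\Sigma_{Z'Z} = I$ (while still spanning the original subspace). Such an orthonormal base for this subspace is nonunique, and without loss of generality we may choose one in which only $z_{i,1}$ is possibly correlated with $\varepsilon_i$, so that $\gamma_2 = 0$. This implies that

where, of course, $\rho_{z_1\varepsilon} = \gamma_1$. Now the various entries in the formula of Theorem 1 specialize to

$$
\begin{aligned}
\Sigma_{X'X} &= \sigma_x^2 > 0, \\
\Sigma_{\hat X'\hat X} &= \sigma_{\hat x}^2 = \rho_{x\hat x}^2\sigma_x^2 > 0, \\
\breve\beta_{GIV} &= \sigma_\varepsilon^2\,\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}\Sigma^{-1}_{Z'Z}\gamma
 = \frac{\sigma_{\hat x\varepsilon}}{\sigma_{\hat x}^2}
 = \frac{\rho_{xz_1}\rho_{z_1\varepsilon}\sigma_x}{\rho_{x\hat x}^2\sigma_x^2}
 = \frac{\rho_{xz_1}\rho_{z_1\varepsilon}}{\rho_{x\hat x}^2\sigma_x}, \\
c_1 &= \sigma_\varepsilon^2\,\gamma'\Sigma^{-1}_{Z'Z}\gamma = \rho_{z_1\varepsilon}^2, \\
c_2 &= \sigma_\varepsilon^2\,\gamma'\Sigma^{-1}_{Z'Z}\Sigma_{Z'X}\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}\Sigma^{-1}_{Z'Z}\gamma
 = \rho_{\hat x\varepsilon}^2 = \frac{\rho_{xz_1}^2\rho_{z_1\varepsilon}^2}{\rho_{x\hat x}^2}, \\
c_3 &= \xi'\breve\beta_{GIV}
 = \frac{\rho_{x\varepsilon}\rho_{\hat x\varepsilon}\sigma_x}{\sigma_{\hat x}}
 = \frac{\rho_{x\varepsilon}\rho_{\hat x\varepsilon}}{\rho_{x\hat x}}
 = \frac{\rho_{x\varepsilon}\rho_{xz_1}\rho_{z_1\varepsilon}}{\rho_{x\hat x}^2},
\end{aligned}
\tag{25}
$$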

from which $c_4$ and $c_5$ readily follow. From the above we conclude that the limiting distribution of Theorem 1 is fully determined by (and varies with) the 5 data moments $\sigma_x$, $\rho_{x\hat x}$, $\rho_{x\varepsilon}$, $\rho_{z_1\varepsilon}$ and $\rho_{xz_1}$. However, in the special case $\rho_{z_1\varepsilon} = 0$ the minimal set of parameters is just one-dimensional, because $\rho_{x\hat x}\sigma_x$ suffices. For the general case we find

V_GIVN = 2 " 2 x 2x^x (1 4_z₁_") + _z₁_" ₂1 x^x + 2_z₁_" 2_xz₁ ₄2 x^x + 2 4_z₁_" 4_xz₁ ₆3 x^x (26) + 2 " 4 x 4x^x 4 z1" 1 2 xz1 2 x^x 1 2 2_z₁_" 1 2 xz1 2 x^x ; where 1 = 3 z1" 2 x" xz1(1 + 2 z1" 2 4 z1") 2 xz1( z1" 5 3 z1"+ 2 5 z1"); 2 = 2 x"+ 4 z1" x" xz1 4 2 z1"[1 + 3 z1" x" xz1 2 x"+ 2 xz1(1 2 z1")]; 3 = 2 ( x" z1" xz1)(3 x" z1" xz1):

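Note that this variance is invariant to sign changes of the correlations as long as the sign of $\rho_{z_1\varepsilon}\rho_{x\varepsilon}\rho_{xz_1}$ is not affected, or when either $\rho_{x\varepsilon}$ or $\rho_{z_1\varepsilon}$ is zero. The sign of the inconsistency $\breve\beta_{GIV}$ is determined by the sign of $\rho_{z_1\varepsilon}\rho_{xz_1}$. For given values of $\rho_{x\hat x}$ and $\rho_{z_1\varepsilon}$ the magnitude of $\breve\beta_{GIV}$ is a multiple of $\rho_{xz_1}$, so it will be large when the invalid instrument is relatively strong. For the special case $\rho_{xz_1} = \rho_{x\hat x}$, i.e. the second instrument is orthogonal to $x$, the variance formula specializes to

$$\frac{\sigma_\varepsilon^2}{\sigma_x^2}\,\frac{(1 - \rho_{z_1\varepsilon}^2)(\rho_{xz_1} - \rho_{z_1\varepsilon}\rho_{x\varepsilon})^2 + \rho_{z_1\varepsilon}^4(1 - \rho_{x\varepsilon}^2)}{\rho_{xz_1}^4}, \tag{27}$$

which, not surprisingly, corresponds to (13).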
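The scalar quantities of this subsection are easy to evaluate numerically. The sketch below assumes $\sigma_\varepsilon = 1$ and $\Sigma_{Z'Z} = I$, uses the closed forms for the inconsistency and for $c_1$, $c_2$, $c_3$ given above together with the appendix definitions $c_4 = c_1 - c_2$ and $c_5 = 1 - c_3 - c_4$, and adds the special-case variance (27). Function and argument names are hypothetical.

```python
import math


def giv_limit_constants(sigma_x, rho_xxhat, rho_xe, rho_z1e, rho_xz1):
    """Inconsistency and constants c1..c5 for k = 1, l = 2 (sigma_eps = 1)."""
    inconsistency = rho_xz1 * rho_z1e / (rho_xxhat ** 2 * sigma_x)
    c1 = rho_z1e ** 2
    c2 = rho_xz1 ** 2 * rho_z1e ** 2 / rho_xxhat ** 2
    c3 = rho_xe * rho_xz1 * rho_z1e / rho_xxhat ** 2
    c4 = c1 - c2                      # definition used in the appendix
    c5 = 1.0 - c3 - c4                # definition used in the appendix
    return {"inconsistency": inconsistency,
            "c1": c1, "c2": c2, "c3": c3, "c4": c4, "c5": c5}


def special_case_variance(sigma_x, rho_xe, rho_z1e, rho_xz1, sigma_e=1.0):
    """Asymptotic variance (27), valid when rho_xz1 equals rho_xxhat."""
    num = ((1.0 - rho_z1e ** 2) * (rho_xz1 - rho_z1e * rho_xe) ** 2
           + rho_z1e ** 4 * (1.0 - rho_xe ** 2))
    return sigma_e ** 2 * num / (sigma_x ** 2 * rho_xz1 ** 4)
```

Setting `rho_z1e = 0` in `special_case_variance` reduces the expression to $\sigma_\varepsilon^2/(\sigma_x^2\rho_{xz_1}^2)$, the familiar variance for a valid instrument, and `c4` vanishes whenever `rho_xz1 == rho_xxhat`.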

Next we examine whether, apart from $n$, the same number of parameters is required to obtain in all generality the finite sample distribution of GIV by generating the appropriate data processes. For that purpose the schemes (15) and (16) can be extended as follows. Let now $v_i \sim IID(0, I_4)$. Again we take $\varepsilon_i = v_{i,1}$ and, again restricting ourselves to positive $\alpha_1$ for symmetrically distributed $v_i$, we have

$$x_i = \xi v_{i,1} + \alpha_1 v_{i,2}, \tag{28}$$

with $\sigma_x^2 = SN$ (this is all similar to the earlier example with $l = 1$). Now, however, we have to compose the $l = 2$ instruments as

$$
\begin{aligned}
z_{i,1} &= \gamma_1 v_{i,1} + \alpha_2 v_{i,2} + \alpha_3 v_{i,3} + \alpha_4 v_{i,4}, \\
z_{i,2} &= \gamma_2 v_{i,1} + \alpha_5 v_{i,2} + \alpha_6 v_{i,3} + \alpha_7 v_{i,4}.
\end{aligned}
$$

These entail full generality, because they allow for both instruments any correlation with the disturbance $\varepsilon_i$, any correlation with the regressor $x_i$ and any mutual correlation. Since it is only the space spanned by these two instruments that matters for $\hat\beta_{GIV}$, we may replace $z_{i,2}$ by a linear combination of $z_{i,1}$ and $z_{i,2}$ such that it no longer depends on $v_{i,1}$. This corresponds to taking $\gamma_2 = 0$ and re-interpreting $\alpha_5$, $\alpha_6$ and $\alpha_7$. Hence, the general case of two possibly invalid instruments can be represented fully by that of one valid and one possibly invalid instrument, as we already argued above from the asymptotic perspective. We can perform a similar operation again, now with respect to $z_{i,1}$, such that we may impose $\alpha_4 = 0$. Next, rescaling the instruments such that they have unit variance leads to the generating schemes

$$
\begin{aligned}
z_{i,1} &= \gamma_1 v_{i,1} + \alpha_2 v_{i,2} + (1 - \gamma_1^2 - \alpha_2^2)^{1/2} v_{i,3}, \\
z_{i,2} &= \alpha_5 v_{i,2} + \alpha_6 v_{i,3} + (1 - \alpha_5^2 - \alpha_6^2)^{1/2} v_{i,4}.
\end{aligned}
\tag{29}
$$

Due to the symmetry of $v_i$, generality is maintained when we restrict ourselves to cases where particular coefficients are nonnegative. This extends to $\alpha_2$ and $\alpha_5$, because the space spanned by the instruments does not change by multiplying all elements by $-1$, yielding

$$0 \leq \alpha_2 \leq 1, \qquad 0 \leq \alpha_5 \leq 1. \tag{30}$$

We also maintain full generality by imposing that the two instruments have zero covariance, which implies $\alpha_2\alpha_5 + \alpha_6(1 - \gamma_1^2 - \alpha_2^2)^{1/2} = 0$, from which we find

$$\alpha_6 = -\alpha_2\alpha_5(1 - \gamma_1^2 - \alpha_2^2)^{-1/2}. \tag{31}$$

So, for given values of $\sigma_x$, $\rho_{x\varepsilon}$ and $\rho_{z_1\varepsilon} = \gamma_1$, we would be able to generate data according to (28) and (29) if we also knew $\alpha_2$ and $\alpha_5$.
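The zero-covariance restriction behind (31) can be written out in one step. The display below is a sketch that only uses the generating schemes (29) and the fact that the elements of $v_i$ are mutually uncorrelated with unit variance; the coefficient labels follow the notation adopted in this rendering of the text.

```latex
\begin{align*}
\operatorname{Cov}(z_{i,1}, z_{i,2})
  &= \alpha_2\alpha_5 \operatorname{Var}(v_{i,2})
   + (1-\gamma_1^2-\alpha_2^2)^{1/2}\,\alpha_6 \operatorname{Var}(v_{i,3}) \\
  &= \alpha_2\alpha_5 + \alpha_6 (1-\gamma_1^2-\alpha_2^2)^{1/2} = 0
  \quad\Longrightarrow\quad
  \alpha_6 = -\,\alpha_2\alpha_5\,(1-\gamma_1^2-\alpha_2^2)^{-1/2}.
\end{align*}
```

Only $v_{i,2}$ and $v_{i,3}$ appear in both instruments, which is why $v_{i,1}$ and $v_{i,4}$ contribute nothing to the covariance.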

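The asymptotic overall strength of the two instruments can be controlled by the population $R^2$ of the regression of $x$ on $Z = (z_1, z_2)$, which is

$$R^2_{xZ} = \frac{\Sigma_{x'Z}\Sigma^{-1}_{Z'Z}\Sigma_{Z'x}}{\sigma_x^2}. \tag{32}$$

Note that

$$\rho_{x\hat x}^2 = \frac{(\Sigma_{x'Z}\Sigma^{-1}_{Z'Z}\Sigma_{Z'x})^2}{\sigma_x^2\sigma_{\hat x}^2} = \frac{\sigma_{\hat x}^2}{\sigma_x^2} = R^2_{xZ}, \tag{33}$$

and, since we imposed $\Sigma_{Z'Z} = I$, we have

$$
\begin{aligned}
\rho_{x\hat x}^2 &= \frac{\Sigma_{x'Z}\Sigma_{Z'x}}{\sigma_x^2} = \rho_{xz_1}^2 + (1 - \rho_{x\varepsilon}^2)\alpha_5^2, \\
\rho_{xz_1} &= \rho_{x\varepsilon}\rho_{z_1\varepsilon} + (1 - \rho_{x\varepsilon}^2)^{1/2}\alpha_2.
\end{aligned}
$$

From these we can express the (nonnegative) values of $\alpha_5$ and $\alpha_2$ as

$$\alpha_5 = \left(\frac{\rho_{x\hat x}^2 - \rho_{xz_1}^2}{1 - \rho_{x\varepsilon}^2}\right)^{1/2} \tag{34}$$

and

$$\alpha_2 = \frac{\rho_{xz_1} - \rho_{x\varepsilon}\rho_{z_1\varepsilon}}{(1 - \rho_{x\varepsilon}^2)^{1/2}}, \tag{35}$$

from which $\alpha_6$ follows directly by evaluating (31).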
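The chain (28), (35), (34), (31) can be sketched as a small Python helper. It assumes $\beta = 1$, $\sigma_\varepsilon = 1$ and unit-variance, mutually uncorrelated instruments; the function name and the dictionary keys are hypothetical labels, and `alpha3` and `alpha7` denote the two square-root coefficients appearing in (29).

```python
import math


def overidentified_dgp(rho_xe, rho_z1e, rho_xz1, rho_xxhat, sn=10.0):
    """Coefficients of the two-instrument DGP implied by the design moments."""
    sigma_x = math.sqrt(sn)
    root = math.sqrt(1.0 - rho_xe ** 2)
    xi = rho_xe * sigma_x
    alpha1 = sigma_x * root
    gamma1 = rho_z1e
    alpha2 = (rho_xz1 - rho_xe * rho_z1e) / root                  # (35)
    share = (rho_xxhat ** 2 - rho_xz1 ** 2) / (1.0 - rho_xe ** 2)
    rem3 = 1.0 - gamma1 ** 2 - alpha2 ** 2
    if share < 0.0 or rem3 <= 0.0:
        raise ValueError("incompatible combination of design moments")
    alpha5 = math.sqrt(share)                                     # (34)
    alpha3 = math.sqrt(rem3)
    alpha6 = -alpha2 * alpha5 / alpha3                            # (31)
    rem7 = 1.0 - alpha5 ** 2 - alpha6 ** 2
    if rem7 < 0.0:
        raise ValueError("incompatible combination of design moments")
    return {"xi": xi, "alpha1": alpha1, "gamma1": gamma1,
            "alpha2": alpha2, "alpha3": alpha3, "alpha5": alpha5,
            "alpha6": alpha6, "alpha7": math.sqrt(rem7)}
```

With the returned coefficients, $z_{i,1} = \gamma_1 v_{i,1} + \alpha_2 v_{i,2} + \alpha_3 v_{i,3}$ and $z_{i,2} = \alpha_5 v_{i,2} + \alpha_6 v_{i,3} + \alpha_7 v_{i,4}$ have unit variance and zero covariance by construction.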

Hence, we can scan the finite sample distribution of GIV for this class of model for any $n$ over its entire parameter space by simulating data for all compatible values of $\sigma_x$, $\rho_{x\hat x}$, $\rho_{x\varepsilon}$, $\rho_{xz_1}$ and $\rho_{z_1\varepsilon}$. Here again, these are found to be those data moments that characterize the asymptotic distribution. They determine $\xi$ and $\alpha_1$ via (28) and $\alpha_2$, $\alpha_5$ and $\alpha_6$ via (35), (34) and (31), respectively. We may restrict ourselves to cases where $\rho_{x\hat x} > 0$ (since the coefficients of the simulation design are just determined by $\rho_{x\hat x}^2$). Note that the coefficients of the data generation process, notably $\alpha_2$, are unaffected (thus yielding the same distribution of $\hat\beta_{GIV}$) if both $\rho_{z_1\varepsilon}$ and $\rho_{xz_1}$ are changed in sign. Therefore, we will only examine cases with $\rho_{xz_1} \geq 0$. However, we shall also examine only nonnegative values of $\rho_{z_1\varepsilon}$, because changing the signs of both $\rho_{z_1\varepsilon}$ and $\rho_{x\varepsilon}$ yields the mirror image of the distribution of $\hat\beta_{GIV} - \beta$ when the distribution of $\varepsilon$ is symmetric. In line with the just identified model, the distribution of $\hat\beta_{GIV} - \beta$ for $\rho_{z_1\varepsilon} \leq 0$ is equivalent to the distribution of $-(\hat\beta_{GIV} - \beta)$ for $-\rho_{z_1\varepsilon} \geq 0$ and $-\rho_{x\varepsilon}$, because: if we change the signs of $\rho_{z_1\varepsilon}$, $\rho_{x\varepsilon}$ and all $\varepsilon_i$, then the variables $x_i$, $z_{i,1}$, $z_{i,2}$ and thus $\hat x_i$ remain the same, whereas $\hat\beta_{GIV} - \beta = \sum \hat x_i\varepsilon_i / \sum \hat x_i^2$ changes sign.

In the special case that no instrument is invalid we have $\gamma_1 = \rho_{z_1\varepsilon} = 0$ in (29), and thus full generality is maintained by making $z_{i,2}$ independent of $v_{i,3}$, giving $\alpha_6 = 0$, and zero covariance of the two instruments implies now $\alpha_2\alpha_5 = 0$. Hence, we may choose $\alpha_5 = 0$, resulting in the simplified generating schemes

$$
\begin{aligned}
z_{i,1} &= \alpha_2 v_{i,2} + (1 - \alpha_2^2)^{1/2} v_{i,3}, \\
z_{i,2} &= v_{i,4}.
\end{aligned}
\tag{36}
$$

These imply $\rho_{xz_1} = \rho_{x\hat x}$ and $\alpha_2 = \rho_{x\hat x}(1 - \rho_{x\varepsilon}^2)^{-1/2}$ instead of (35). Hence, when $\gamma = 0$ the finite sample distribution is determined by just 3 parameters (viz. $\sigma_x$, $\rho_{x\varepsilon}$ and $\rho_{x\hat x}$) instead of 5 (apart from $n$), whereas the limiting distribution just depends on $\sigma_{\hat x}^2 = \rho_{x\hat x}^2\sigma_x^2$.

In all calculations we fixed again $\sigma_x^2/\sigma_\varepsilon^2 = 10$ (which here too has only a multiplicative effect, i.e. just affects the scale of the densities), and as before we chose values $\rho_{x\varepsilon} = \{-0.3, 0.0, 0.3, 0.6\}$, $\rho_{z_1\varepsilon} = \gamma_1 = \{0.0, 0.2\}$ and $\rho_{x\hat x} = \{0.8, 0.3, 0.1, 0.01\}$, whereas $\rho_{xz_1} = \{\rho_{x\hat x}, \rho_{x\hat x}/2, \rho_{x\hat x}/8\}$. The latter values are associated with decreasing relative strength of $z_1$ (and, complementary, a valid instrument $z_2$ that is either uncorrelated with $x$, contributes 50% to the joint strength of the instruments, or is relatively strong). Figures 7 and 8 contain some illustrative densities for $\rho_{xz_1} < \rho_{x\hat x}$, again for $n = 50$ and $n = 200$. Since $l - k = 1$, GIV does have a finite first moment now. To save space we have put more cases into one figure. Moreover, we have omitted the cases where $\rho_{xz_1} = \rho_{x\hat x}$ (and $\rho_{xz_2} = 0$). Although we already established that these yield similar asymptotic results as the $k = l = 1$ case, from the simulations we found that in this situation the finite sample densities do differ slightly from the "no finite moments" case, the more so for a weaker instrument $z_1$, especially when $\rho_{xz_1} = \rho_{x\hat x} = 0.01$. When both instruments are valid and $\rho_{x\hat x} = 0.01$, the $l = 2$ case produces estimators which are slightly more efficient than the corresponding $l = 1$ estimators. This seems at odds with the findings in Donald and Newey (2001), which suggest that efficiency benefits when weak instruments are discarded. Note, however, that their analysis assumes that the number of instruments grows at a smaller rate than the sample size, whereas in our experiments the number of instruments is fixed. When $\rho_{xz_1} = 0.01$, $\rho_{xz_2} = 0$ and $\rho_{z_1\varepsilon} = 0.2$ we find that the bimodality of the GIV estimator is less pronounced than for the IV estimator.

Figure 7 presents densities for $\rho_{x\hat x} = 0.8$ and $0.3$, and Figure 8 for the weaker instruments. In Figure 7 the asymptotic approximations are mostly reasonably accurate, but not in Figure 8, especially when $\rho_{x\hat x} = 0.01$. In the latter case we note again that the asymptotic approximations are much too pessimistic. The actual (median) bias of the GIV estimator is much less dramatic than the inconsistency $\breve\beta_{GIV}$ suggests, and even though both instruments are very weak and one of them is also invalid, the actual density of $\hat\beta_{GIV}$ has most of its probability mass remarkably close to the true value 1. Although Forchini (2005) suspects bimodality in the overidentified model when the instruments are valid but weak, we do not find it at $\rho_{x\hat x} = 0.01$.

Finally, we look again at median absolute error results. Figure 9 gives a more global impression of the accuracy of the asymptotic approximation in this model. We present results for $n = 20$ only and establish that the overall instrument strength $\rho_{x\hat x}$ is the major determining factor, although the measures for instrument invalidity and simultaneity have an effect too. Figure 10 makes comparisons with OLS for $n = 100$. We note that especially in the presence of invalid instruments there is much scope for OLS to produce more efficient inference than GIV. In any case, our simulation results do not generally support the conclusion by Hahn and Hausman (2003) that 2SLS is the preferred estimator when $n \geq 100$ and $\rho_{x\hat x}^2 \geq 0.1$. They arrive at this conclusion by comparing second-order asymptotic approximations to MSE.

4 Conclusions

In this paper we obtained an explicit formula for the asymptotic variance of the generalized instrumental variable estimator when some of the employed instruments are invalid. We showed that the limiting distribution of such an inconsistent estimator is normal, and is centered at the pseudo-true value (true coefficient plus inconsistency), whereas its asymptotic variance includes a number of terms and factors additional to the standard result. It can only be expressed when one is willing to make assumptions on the first four moments of the disturbances. To obtain our results we assumed covariance stationarity of all variables, i.e. the dependent, the explanatory and the instrumental variables. In the simple illustrative models that we used, the data observations are in fact IID, as is often assumed in cross-section applications. Note, however, that our theorems also hold for time-series applications, where independence of the sample observations is unrealistic. They are also directly applicable in case non-stationary series are involved, provided the model is formulated in error correction form and the long-run multipliers (the coefficients of the cointegrating vector) have been imposed, so that the model and the instruments can all be represented by transformations of the original data that are integrated of order zero.

We examined the accuracy of our analytic large sample results in small samples by simulating a simple just identified and a simple overidentified model and establishing the actual behavior of instrumental variable estimators. Through a reparametrization of the structural and reduced form coefficients into parameters that directly express the degree of simultaneity, the degree of (in)validity of the instrument(s), the strength of the instrument(s) and the signal-to-noise ratio of the model, and by condensing the numerical results into graphic displays, it proved possible to produce a rather complete taxonomy of the behavior of the examined instrumental variables estimators over their full parameter space.

There is a quickly expanding literature on the shortcomings of standard large sample asymptotic approximations to the distribution of IV estimators when the sample size is small or moderate and some of the instruments are weak but valid, and on how alternative and better approximations could be obtained. The present study shows that it is possible to obtain an explicit large sample asymptotic approximation to the distribution of IV estimators when some of the instruments are invalid. Not surprisingly, however, that approximation is found to be vulnerable too when instruments are weak. One option now would be to replace it by an alternative approximation that can cope with weakness of instruments. However, our illustrations suggest that it is more worthwhile to abandon the employment of weak instruments altogether and just stick to strong instruments, even if they are invalid. For that situation we at least seem to have obtained here a reasonably accurate approximation to its finite sample distribution, whereas at the same time this finite sample distribution is such that it may yield much more accurate inference than that obtained on the basis of weak instruments.

References

Andrews, D.W.K., Stock, J.H., 2005. Inference with weak instruments. Invited paper for the 2005 World Congress of the Econometric Society, London. Cowles Foundation Discussion Paper No. 1530.

Bound, J., Jaeger, D.A., Baker, R.M., 1995. Problems with instrumental variable estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association 90, 443-450.

Donald, S.G., Newey, W.K., 2001. Choosing the number of instruments. Econometrica 69, 1161-1191.

Forchini, G., 2006. On the bimodality of the exact distribution of the TSLS estimator. Forthcoming in Econometric Theory.

Hahn, J., Hausman, J.A., 2003. IV estimation with valid and invalid instruments: application to the returns of education. Mimeo. To appear in Les Annales d'Economie et de Statistique.

Hahn, J., Inoue, A., 2002. A Monte Carlo comparison of various asymptotic approximations to the distribution of instrumental variables estimators. Econometric Reviews 21, 309-336.

Hale, C., Mariano, R.S., Ramage, J.G., 1980. Finite sample analysis of misspecification in simultaneous equation models. Journal of the American Statistical Association 75, 418-427.

Hall, A.R., Inoue, A., 2003. The large sample behaviour of the generalized method of moments estimator in misspecified models. Journal of Econometrics 114, 361-394.

Hillier, G., 2005. Yet more on the exact properties of IV estimators. To appear in Econometric Theory.

Joseph, A.S., Kiviet, J.F., 2005. Viewing the relative efficiency of IV estimators in models with lagged and instantaneous feedbacks. Journal of Computational Statistics and Data Analysis 49, 417-444.

Kiviet, J.F., Niemczyk, J., 2005. The asymptotic and finite sample distributions of OLS and simple IV in simultaneous equations. UvA-Econometrics discussion paper 2005/01.

Maasoumi, E., Phillips, P.C.B., 1982. On the behavior of inconsistent instrumental variable estimators. Journal of Econometrics 19, 183-201.

Maddala, G.S., Jeong, J., 1992. On the exact small sample distribution of the instrumental variable estimator. Econometrica 60, 181-183.

Phillips, P.C.B., 1980. The exact distribution of instrumental variable estimators in an equation containing n + 1 endogenous variables. Econometrica 48, 861-878.

Phillips, P.C.B., 2005. A remark on bimodality and weak instrumentation in structural equation estimation. Cowles Foundation Discussion Paper No. 1540.

Rothenberg, T.J., 1972. The asymptotic distribution of the least squares estimator in the errors in variables model. Unpublished mimeo.

Sawa, T., 1969. The exact sampling distribution of ordinary least squares and two-stage least squares estimators. Journal of the American Statistical Association 64, 923-937.

Woglom, G., 2001. More results on the exact small sample properties of the instrumental variable estimator. Econometrica 69, 1381-1389.

A Proof of Theorem 1

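Because the estimator $\hat\beta_{GIV}$ tends for an increasing sample size not to $\beta$, but to $\beta^*_{GIV}$, in order to establish its limiting distribution we should not focus on $\sqrt{n}(\hat\beta_{GIV} - \beta)$, but choose a center of the distribution that tends to $\beta^*_{GIV}$ too, see Rothenberg (1972). For the sake of simplicity we shall center at $\beta^*_{GIV}$ itself. Note that

$$\sqrt{n}(\hat\beta_{GIV} - \beta^*_{GIV}) = \sqrt{n}\,[(\hat X'\hat X)^{-1}\hat X'\varepsilon - \sigma_\varepsilon^2\,\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}\Sigma^{-1}_{Z'Z}\gamma]. \tag{37}$$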

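To obtain the limiting distribution we shall rewrite the right-hand side of (37) such that we can invoke the Lemma given at the end of Section 2. Below we first show that (37) can be rewritten as

$$\sqrt{n}(\hat\beta_{GIV} - \beta^*_{GIV}) = (\tfrac{1}{n}\hat X'\hat X)^{-1}\tfrac{1}{\sqrt{n}}\,[W'\varepsilon + \omega(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)] + o_p(1), \tag{38}$$

for an appropriate $n \times k$ matrix $W$, with $E(W'\varepsilon) = 0$, and a nonrandom $k \times 1$ vector $\omega$. Next, invoking also a theorem often attributed to Cramér, the lemma yields

$$\sqrt{n}(\hat\beta_{GIV} - \beta^*_{GIV}) \xrightarrow{d} N\Big(0,\; \sigma_\varepsilon^2\,\Sigma^{-1}_{\hat X'\hat X}\big[\operatorname*{plim}_{n\to\infty}\tfrac{1}{n}W'W + 2\sigma_\varepsilon^2\,\omega\omega'\big]\Sigma^{-1}_{\hat X'\hat X}\Big), \tag{39}$$

upon assuming $\kappa_3 = 0$ and $\kappa_4 = 3$.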

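We first set out to rewrite (37) in the form (38). Using $\hat\Pi \equiv (Z'Z)^{-1}Z'X$ we get

$$
\begin{aligned}
\sqrt{n}(\hat\beta_{GIV} - \beta^*_{GIV})
 &= \sqrt{n}\,(\hat X'\hat X)^{-1}\{\hat\Pi'[Z'\varepsilon - E(Z'\varepsilon)] + \hat\Pi'E(Z'\varepsilon) - \sigma_\varepsilon^2\,\hat X'\hat X\,\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}\Sigma^{-1}_{Z'Z}\gamma\} \\
 &= (\tfrac{1}{n}\hat X'\hat X)^{-1}\tfrac{1}{\sqrt{n}}\{\hat\Pi'[Z'\varepsilon - E(Z'\varepsilon)] + n\sigma_\varepsilon^2[\hat\Pi' - \tfrac{1}{n}\hat X'\hat X\,\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}\Sigma^{-1}_{Z'Z}]\gamma\}.
\end{aligned}
\tag{40}
$$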

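For the second expression between square brackets in the final line of (40) we find

$$
\begin{aligned}
&\hat\Pi' - \tfrac{1}{n}\hat X'\hat X\,\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}\Sigma^{-1}_{Z'Z} \\
&\quad= \hat\Pi'(\Sigma_{Z'Z} - \tfrac{1}{n}Z'Z)\Sigma^{-1}_{Z'Z} + (\tfrac{1}{n}X'Z - \tfrac{1}{n}\hat X'\hat X\,\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z})\Sigma^{-1}_{Z'Z} \\
&\quad= \hat\Pi'(\Sigma_{Z'Z} - \tfrac{1}{n}Z'Z)\Sigma^{-1}_{Z'Z} + [(\tfrac{1}{n}X'Z - \Sigma_{X'Z}) + (\Sigma_{\hat X'\hat X} - \tfrac{1}{n}\hat X'\hat X)\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}]\Sigma^{-1}_{Z'Z},
\end{aligned}
\tag{41}
$$

and this contains a factor which can be rewritten as

$$
\begin{aligned}
\Sigma_{\hat X'\hat X} - \tfrac{1}{n}\hat X'\hat X
 &= \Sigma_{X'Z}\Sigma^{-1}_{Z'Z}\Sigma_{Z'X} - \hat\Pi'\tfrac{1}{n}Z'X \\
 &= (\Sigma_{X'Z}\Sigma^{-1}_{Z'Z} - \hat\Pi')\Sigma_{Z'X} - \hat\Pi'(\tfrac{1}{n}Z'X - \Sigma_{Z'X}) \\
 &= \{(\Sigma_{X'Z} - \tfrac{1}{n}X'Z)\Sigma^{-1}_{Z'Z} - \tfrac{1}{n}X'Z[(\tfrac{1}{n}Z'Z)^{-1} - \Sigma^{-1}_{Z'Z}]\}\Sigma_{Z'X} - \hat\Pi'(\tfrac{1}{n}Z'X - \Sigma_{Z'X}) \\
 &= \{(\Sigma_{X'Z} - \tfrac{1}{n}X'Z)\Sigma^{-1}_{Z'Z} - \hat\Pi'[(\Sigma_{Z'Z} - \tfrac{1}{n}Z'Z)\Sigma^{-1}_{Z'Z}]\}\Sigma_{Z'X} - \hat\Pi'(\tfrac{1}{n}Z'X - \Sigma_{Z'X}).
\end{aligned}
\tag{42}
$$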

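Now substituting the decompositions obtained in (41) and (42) into the expression within curly brackets in the final line of (40) we obtain

$$
\begin{aligned}
&\hat\Pi'[Z'\varepsilon - E(Z'\varepsilon)] + n\sigma_\varepsilon^2[\hat\Pi' - \tfrac{1}{n}\hat X'\hat X\,\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}\Sigma^{-1}_{Z'Z}]\gamma \\
&\quad= \hat\Pi'[Z'\varepsilon - E(Z'\varepsilon)] + n\sigma_\varepsilon^2\hat\Pi'(\Sigma_{Z'Z} - \tfrac{1}{n}Z'Z)\Sigma^{-1}_{Z'Z}\gamma \\
&\qquad + n\sigma_\varepsilon^2(\tfrac{1}{n}X'Z - \Sigma_{X'Z})\Sigma^{-1}_{Z'Z}\gamma + n\sigma_\varepsilon^2(\Sigma_{\hat X'\hat X} - \tfrac{1}{n}\hat X'\hat X)\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}\Sigma^{-1}_{Z'Z}\gamma \\
&\quad= \hat\Pi'[Z'\varepsilon - E(Z'\varepsilon)] - n\sigma_\varepsilon^2\hat\Pi'(\tfrac{1}{n}Z'Z - \Sigma_{Z'Z})\Sigma^{-1}_{Z'Z}\gamma + n\sigma_\varepsilon^2(\tfrac{1}{n}X'Z - \Sigma_{X'Z})\Sigma^{-1}_{Z'Z}\gamma \\
&\qquad + n\{[(\Sigma_{X'Z} - \tfrac{1}{n}X'Z) - \hat\Pi'(\Sigma_{Z'Z} - \tfrac{1}{n}Z'Z)]\Sigma^{-1}_{Z'Z}\Sigma_{Z'X} - \hat\Pi'(\tfrac{1}{n}Z'X - \Sigma_{Z'X})\}\breve\beta_{GIV},
\end{aligned}
\tag{43}
$$

where we substituted $\Sigma^{-1}_{Z'Z}\Sigma_{Z'X} = \operatorname{plim}\hat\Pi \equiv \Pi$ and $\breve\beta_{GIV} \equiv \sigma_\varepsilon^2\,\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}\Sigma^{-1}_{Z'Z}\gamma$. Now exploiting item (vi) of Framework A, we can employ

$$
\begin{aligned}
Z'\varepsilon - E(Z'\varepsilon) &= \bar Z'\varepsilon + (\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\gamma, \\
\tfrac{1}{n}Z'Z - \Sigma_{Z'Z} &= \tfrac{1}{n}Z'Z - \tfrac{1}{n}E(Z'Z \mid \bar Z) + \tfrac{1}{n}E(Z'Z \mid \bar Z) - \Sigma_{Z'Z} \\
 &= \tfrac{1}{n}\bar Z'\varepsilon\gamma' + \tfrac{1}{n}\gamma\varepsilon'\bar Z + \tfrac{1}{n}(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\gamma\gamma' + o_p(n^{-1/2}), \\
\tfrac{1}{n}X'Z - \Sigma_{X'Z} &= \tfrac{1}{n}X'Z - \tfrac{1}{n}E(X'Z \mid \bar X, \bar Z) + \tfrac{1}{n}E(X'Z \mid \bar X, \bar Z) - \Sigma_{X'Z} \\
 &= \tfrac{1}{n}\bar X'\varepsilon\gamma' + \tfrac{1}{n}\xi\varepsilon'\bar Z + \tfrac{1}{n}(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\xi\gamma' + o_p(n^{-1/2}),
\end{aligned}
$$

so that the final expression given for (43) can be written as

$$
\begin{aligned}
&\hat\Pi'\bar Z'\varepsilon + (\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\hat\Pi'\gamma
 - \sigma_\varepsilon^2\hat\Pi'\bar Z'\varepsilon\gamma'\Sigma^{-1}_{Z'Z}\gamma
 - \sigma_\varepsilon^2\hat\Pi'\gamma\varepsilon'\bar Z\Sigma^{-1}_{Z'Z}\gamma
 - \sigma_\varepsilon^2\hat\Pi'(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\gamma\gamma'\Sigma^{-1}_{Z'Z}\gamma \\
&\quad + \sigma_\varepsilon^2\bar X'\varepsilon\gamma'\Sigma^{-1}_{Z'Z}\gamma
 + \sigma_\varepsilon^2\xi\varepsilon'\bar Z\Sigma^{-1}_{Z'Z}\gamma
 + \sigma_\varepsilon^2(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\xi\gamma'\Sigma^{-1}_{Z'Z}\gamma
 + \hat\Pi'\bar Z'\varepsilon\gamma'\Pi\breve\beta_{GIV}
 + \hat\Pi'\gamma\varepsilon'\bar Z\Pi\breve\beta_{GIV} \\
&\quad + \hat\Pi'(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\gamma\gamma'\Pi\breve\beta_{GIV}
 - \bar X'\varepsilon\gamma'\Pi\breve\beta_{GIV}
 - \xi\varepsilon'\bar Z\Pi\breve\beta_{GIV}
 - (\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\xi\gamma'\Pi\breve\beta_{GIV}
 - \hat\Pi'\gamma\varepsilon'\bar X\breve\beta_{GIV} \\
&\quad - \hat\Pi'\bar Z'\varepsilon\xi'\breve\beta_{GIV}
 - \hat\Pi'(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\gamma\xi'\breve\beta_{GIV} + o_p(n^{1/2}).
\end{aligned}
$$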

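This can be simplified further by using

$$
c_1 \equiv \sigma_\varepsilon^2\,\gamma'\Sigma^{-1}_{Z'Z}\gamma, \qquad
c_2 \equiv \gamma'\Pi\breve\beta_{GIV} = \sigma_\varepsilon^2\,\gamma'\Sigma^{-1}_{Z'Z}\Sigma_{Z'X}\Sigma^{-1}_{\hat X'\hat X}\Sigma_{X'Z}\Sigma^{-1}_{Z'Z}\gamma, \qquad
c_3 \equiv \xi'\breve\beta_{GIV},
$$

giving

$$
\begin{aligned}
&\hat\Pi'\bar Z'\varepsilon + (\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\hat\Pi'\gamma
 - c_1\hat\Pi'\bar Z'\varepsilon - \sigma_\varepsilon^2\hat\Pi'\gamma\gamma'\Sigma^{-1}_{Z'Z}\bar Z'\varepsilon - c_1(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\hat\Pi'\gamma \\
&\quad + c_1\bar X'\varepsilon + \sigma_\varepsilon^2\xi\gamma'\Sigma^{-1}_{Z'Z}\bar Z'\varepsilon + c_1(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\xi
 + c_2\hat\Pi'\bar Z'\varepsilon + \hat\Pi'\gamma\breve\beta_{GIV}'\Pi'\bar Z'\varepsilon \\
&\quad + c_2(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\hat\Pi'\gamma - c_2\bar X'\varepsilon - \xi\breve\beta_{GIV}'\Pi'\bar Z'\varepsilon - c_2(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\xi \\
&\quad - \hat\Pi'\gamma\breve\beta_{GIV}'\bar X'\varepsilon - c_3\hat\Pi'\bar Z'\varepsilon - c_3(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)\hat\Pi'\gamma + o_p(n^{1/2}) \\
&= [c_4 I_k - \hat\Pi'\gamma\breve\beta_{GIV}']\bar X'\varepsilon
 + [c_5\hat\Pi' + (\xi - \hat\Pi'\gamma)(\sigma_\varepsilon^2\gamma' - \breve\beta_{GIV}'\Sigma_{X'Z})\Sigma^{-1}_{Z'Z}]\bar Z'\varepsilon \\
&\quad + (c_4\xi + c_5\hat\Pi'\gamma)(\varepsilon'\varepsilon - n\sigma_\varepsilon^2) + o_p(n^{1/2}),
\end{aligned}
\tag{44}
$$

where $c_4 \equiv c_1 - c_2$ and $c_5 \equiv 1 - c_3 - c_4$.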

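Note that (44) is equal to the factor in curly brackets in the final line of (40), and we want to derive its limiting distribution after scaling by the factor $1/\sqrt{n}$, so we may neglect the remainder term. Cramér's theorem implies that in deriving this limiting distribution we may replace $\hat\Pi$ by its probability limit $\Pi$. Hence, defining the $k \times k$ matrix $\Psi_1$, the $k \times l$ matrix $\Psi_2$ and the $k \times 1$ vector $\omega$, such that

$$
\begin{aligned}
\Psi_1 &\equiv c_4 I_k - \Pi'\gamma\breve\beta_{GIV}', \\
\Psi_2 &\equiv c_5\Pi' + (\xi - \Pi'\gamma)(\sigma_\varepsilon^2\gamma' - \breve\beta_{GIV}'\Sigma_{X'Z})\Sigma^{-1}_{Z'Z}, \\
\omega &\equiv c_4\xi + c_5\Pi'\gamma,
\end{aligned}
\tag{45}
$$

we can invoke the Lemma now with

$$W' = \Psi_1\bar X' + \Psi_2\bar Z'. \tag{46}$$

For the case $\kappa_3 = 0$ and $\kappa_4 = 3$ we then obtain the limiting distribution

$$\tfrac{1}{\sqrt{n}}\,[\Psi_1\bar X'\varepsilon + \Psi_2\bar Z'\varepsilon + \omega(\varepsilon'\varepsilon - n\sigma_\varepsilon^2)] \xrightarrow{d} N(0, \sigma_\varepsilon^2 V_0),$$

where

$$V_0 = \Psi_1\Sigma_{\bar X'\bar X}\Psi_1' + \Psi_2\Sigma_{\bar Z'\bar Z}\Psi_2' + 2\sigma_\varepsilon^2\,\omega\omega' + \Psi_1\Sigma_{\bar X'\bar Z}\Psi_2' + \Psi_2\Sigma_{\bar Z'\bar X}\Psi_1'. \tag{47}$$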
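As a numerical sanity check, the definitions in (45) through (47) can be evaluated for the scalar case $k = l = 1$ and compared with the just identified variance expression (27). The sketch below assumes $\sigma_z = 1$ and reads the garbled source formulas as $\Psi_1 = c_4 - \Pi\gamma\breve\beta$, $\Psi_2 = c_5\Pi + (\xi - \Pi\gamma)(\sigma_\varepsilon^2\gamma - \breve\beta\,\Sigma_{xz})/\Sigma_{zz}$ and $\omega = c_4\xi + c_5\Pi\gamma$; these readings, and all function names, are assumptions of this illustration.

```python
import math


def variance_via_v0(sigma_x, rho_xe, rho_ze, rho_xz, sigma_e=1.0):
    """sigma_e^2 * V0 / Sigma_xhat^2 for k = l = 1, built from (45)-(47)."""
    s2 = sigma_e ** 2
    gamma = rho_ze / sigma_e             # E(z*eps) = sigma_e^2 * gamma
    xi = rho_xe * sigma_x / sigma_e      # E(x*eps) = sigma_e^2 * xi
    s_zz, s_xx = 1.0, sigma_x ** 2
    s_xz = rho_xz * sigma_x
    pi = s_xz / s_zz                     # plim of the first-stage coefficient
    s_xhat = s_xz ** 2 / s_zz            # plim of n^-1 * Xhat'Xhat
    b = s2 * s_xz * gamma / (s_xhat * s_zz)       # inconsistency
    c1 = s2 * gamma ** 2 / s_zz
    c2 = gamma * pi * b
    c3 = xi * b
    c4 = c1 - c2
    c5 = 1.0 - c3 - c4
    psi1 = c4 - pi * gamma * b
    psi2 = c5 * pi + (xi - pi * gamma) * (s2 * gamma - b * s_xz) / s_zz
    omega = c4 * xi + c5 * pi * gamma
    bar_xx = s_xx - s2 * xi ** 2         # moments of the parts unrelated to eps
    bar_zz = s_zz - s2 * gamma ** 2
    bar_xz = s_xz - s2 * xi * gamma
    v0 = (psi1 ** 2 * bar_xx + psi2 ** 2 * bar_zz
          + 2.0 * s2 * omega ** 2 + 2.0 * psi1 * psi2 * bar_xz)
    return s2 * v0 / s_xhat ** 2


def variance_27(sigma_x, rho_xe, rho_ze, rho_xz, sigma_e=1.0):
    """Closed form (27) for the just identified case."""
    num = ((1.0 - rho_ze ** 2) * (rho_xz - rho_ze * rho_xe) ** 2
           + rho_ze ** 4 * (1.0 - rho_xe ** 2))
    return sigma_e ** 2 * num / (sigma_x ** 2 * rho_xz ** 4)
```

Under these assumptions the two functions agree to machine precision for any compatible set of correlations, which supports the reading of (45) through (47) adopted above.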

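In evaluating $V_0$ we can make use of

$$
\begin{aligned}
\Sigma_{\bar Z'\bar Z} &\equiv \operatorname{plim} n^{-1}\bar Z'\bar Z = \Sigma_{Z'Z} - \sigma_\varepsilon^2\gamma\gamma', \\
\Sigma_{\bar X'\bar X} &\equiv \operatorname{plim} n^{-1}\bar X'\bar X = \Sigma_{X'X} - \sigma_\varepsilon^2\xi\xi', \\
\Sigma_{\bar Z'\bar X} &\equiv \operatorname{plim} n^{-1}\bar Z'\bar X = \Sigma_{Z'X} - \sigma_\varepsilon^2\gamma\xi', \\
\Sigma_{\bar X'\bar Z} &\equiv \operatorname{plim} n^{-1}\bar X'\bar Z = \Sigma_{\bar Z'\bar X}',
\end{aligned}
$$

and find that $V_0$ can be expressed as

$$
\begin{aligned}
&[c_4 I_k - \Pi'\gamma\breve\beta_{GIV}'](\Sigma_{X'X} - \sigma_\varepsilon^2\xi\xi')[c_4 I_k - \breve\beta_{GIV}\gamma'\Pi] \\
&+ [c_5\Pi' + (\xi - \Pi'\gamma)(\sigma_\varepsilon^2\gamma' - \breve\beta_{GIV}'\Sigma_{X'Z})\Sigma^{-1}_{Z'Z}](\Sigma_{Z'Z} - \sigma_\varepsilon^2\gamma\gamma')[c_5\Pi + \Sigma^{-1}_{Z'Z}(\sigma_\varepsilon^2\gamma - \Sigma_{Z'X}\breve\beta_{GIV})(\xi' - \gamma'\Pi)] \\
&+ 2\sigma_\varepsilon^2[c_4\xi + c_5\Pi'\gamma][c_4\xi' + c_5\gamma'\Pi] \\
&+ [c_4 I_k - \Pi'\gamma\breve\beta_{GIV}'](\Sigma_{X'Z} - \sigma_\varepsilon^2\xi\gamma')[c_5\Pi + \Sigma^{-1}_{Z'Z}(\sigma_\varepsilon^2\gamma - \Sigma_{Z'X}\breve\beta_{GIV})(\xi' - \gamma'\Pi)] \\
&+ [c_5\Pi' + (\xi - \Pi'\gamma)(\sigma_\varepsilon^2\gamma' - \breve\beta_{GIV}'\Sigma_{X'Z})\Sigma^{-1}_{Z'Z}](\Sigma_{Z'X} - \sigma_\varepsilon^2\gamma\xi')[c_4 I_k - \breve\beta_{GIV}\gamma'\Pi].
\end{aligned}
$$

Next we examine these 5 terms of $V_0$ one by one. The first one is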

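$$
\begin{aligned}
&c_4^2(\Sigma_{X'X} - \sigma_\varepsilon^2\xi\xi') - c_4(\Sigma_{X'X} - \sigma_\varepsilon^2\xi\xi')\breve\beta_{GIV}\gamma'\Pi \\
&\quad - c_4\Pi'\gamma\breve\beta_{GIV}'(\Sigma_{X'X} - \sigma_\varepsilon^2\xi\xi') + \Pi'\gamma\breve\beta_{GIV}'(\Sigma_{X'X} - \sigma_\varepsilon^2\xi\xi')\breve\beta_{GIV}\gamma'\Pi \\
&= c_4^2\Sigma_{X'X} - \sigma_\varepsilon^2 c_4^2\xi\xi' - c_4\Sigma_{X'X}\breve\beta_{GIV}\gamma'\Pi + \sigma_\varepsilon^2 c_3 c_4\xi\gamma'\Pi
\end{aligned}
$$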