• No results found

The westernboot strap : comparison & assessment.

N/A
N/A
Protected

Academic year: 2021

Share "The westernboot strap : comparison & assessment."

Copied!
55
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst

(1)

Master Thesis

Econometrics

The Westernboot Strap:

Comparison & Assessment

Student:

Xiao Long Lao Student number: 5742536

Supervisor: Dr. Simon Broda Second marker: Dr. Noud van Giersbergen

(2)
(3)

Preface

Simultaneous equations model estimation with weak instruments is a current hot item in the field of econometrics. I’ve learned more about econometrics than I ever could’ve imagined writing this thesis. Therefore I would like to thank my friend and teacher Simon Broda for giving me the opportunity to work on this great project and especially for his patience and help. I would also like to thank my parents Shi Chiao and Lily, my girlfriend Merel and my friends for their unconditional support while writing my final dissertation.

Xiao Long Lao

(4)

Contents

Preface ii

1 Introduction 1

2 Theoretical Background 3

2.1 Saddlepoint Approximation . . . 3

2.1.1 Derivation Laplace Approximation . . . 4

2.1.2 Derivation Saddlepoint Approximation . . . 6

2.2 Bootstrap Theory . . . 7

2.2.1 Bootstrapping . . . 8

2.2.2 The Unrestricted & Restricted Residual Bootstrap . . . 12

2.2.3 The Wild Bootstrap . . . 13

2.2.4 The Wild Restricted Efficient Residual Bootstrap . . . 14

3 The Westernboot strap 17 3.1 The Inversion Formulae . . . 17

3.2 PDF Approximation . . . 23

3.3 CDF Approximation . . . 29

4 Model & Simulation Setup 31 4.1 Data Generating Process . . . 32

4.2 Simulation Algorithm . . . 33

4.3 The Code . . . 34

5 Results 35 5.1 Rejection Frequencies . . . 36

5.2 Confidence Intervals: Coverage and Length . . . 41

6 Conclusion 43

A Figures 45

(5)
(6)

Introduction

Density evaluation of a ratio of random variables plays a vital role in modern day econometrics. An example of such a ratio is the estimation error of the Two Stage Least Squares (2SLS) estimator. Using former methods to evaluate the density of ratios impose various restrictions on them, for example the denominator cannot be-come negative. Broda & Kan (2013) derive inversion formulae that expresses the density of a ratio in terms of the joint characteristic function of the numerator and denominator. The authors show that these formulas remain valid even if the denom-inator is defined on the entire real line under certain conditions. In addition, the authors provide a saddlepoint approximation for the probability density function as well as the cumulative density function, based on the cumulant generating function. The authors dub the method the Westernboot strap, when applied to bootstrapping the 2SLS estimator.

This paper breaks down the Westernboot strap mathematically and provides an accessible introduction to Broda & Kan’s method for a post-graduate student in any quantitative field. All theorems and lemmas of Broda & Kan (2013) are proved and derived, including the mathematical techniques on which the Westernboot strap is based on. The saddlepoint approximation is based on the Laplace approximation which links the cumulant generating function with the density of a random variable. Former methods to approximate the density of a ratio of random variables involve solving double integrals, the approach of Broda & Kan provides a smart way to compute the density without having to solve any integrals. This is based on a so called saddlepoint approximation. When no derivation is given in this thesis, the solution is provided by the authors themselves and the reader is referred to their paper.

In order to test, assess and compare the performance of the Westernboot strap with other techniques, this thesis replicates Davidson & Mackinnon’s (2010) Wild

(7)

Restricted Efficient bootstrap. Both techniques are applied to the simultaneous equations model with one endogenous regressor and one instrument. The focus of this thesis is estimation of the simultaneous equations model with weak instru-ments, severe endogeneity, and heteroskedastic disturbances. The presence of weak instruments in a simultaneous equations setting has been studied intensively and Poskitt & Skeels (2012) provide an extensive survey to break down the ever growing literature on the topic. The size and power of various statistics under the Wild Restricted Efficient bootstrap scheme and the Westernboot strap are investigated. Davidson & Mackinnon (2010) show that the Wild Restricted Efficient bootstrap performs best with the Anderson-Rubin statistic whenever heteroskedasticity and endogeneity are present, most probably because the test is asymptotically valid un-der homoskedasticity and weak instrument asymptotics. Furthermore, confidence intervals are simulated for the Anderson-Rubin statistic (Anderson & Rubin, 1949) and the Westernboot strap. The coverage, number of intervals converging to in-finity, and average size of the confidence intervals are computed and compared. It is shown that the Westernboot strap of Broda & Kan provides more reliable and accurate confidence intervals than the Anderson-Rubin statistic under the Wild Re-stricted Efficient bootstrap scheme of Davidson & Mackinnon.

Chapter 2 provides a literature review of the mathematical techniques which are essential in order to understand the derivation of the Westernboot strap. This includes the derivation of the Laplace approximation and the saddlepoint approxi-mation of Daniels (1954). This chapter also includes a concise summary of bootstrap theory. In addition the Wild Restricted Efficient bootstrap scheme is explained, in-cluding the Wild and Restricted Efficient bootstrap methods on which it is based. Chapter 3 is dedicated to the paper of Broda & Kan (2013). It shows derivation of the inversion formulae in their paper and the derivation of the saddlepoint approx-imation the authors propose. Chapter 4 shows the simultaneous equations model that is used for the simulation experiment. In addition it gives the data generat-ing process. Chapter 5 presents the results of the simulation experiment, which are the rejection frequencies under the null and alternative hypothesis. The chap-ter ends with properties of the confidence inchap-tervals of the Weschap-ternboot strap and the Anderson-Rubin statistic under the Wild Restricted Efficient bootstrap scheme. Chapter 6 draws conclusions from the results and proposes possible further research topics.

(8)

Theoretical Background

The paper of Broda & Kan (2013) uses a wide range of mathematical techniques as foundation for the Westernboot strap, the most important one being the saddle-point approximation. Davidson & Mackinnon (2010) use multiple bootstrap resam-pling schemes in a large simulation experiment to verify which bootstrap scheme with which test statistic performs best whenever endogeneity and heteroskedasticity are present. This chapter reviews and derives the saddlepoint approximation from scratch. Thereafter, the three main bootstrapping techniques used in Davidson & Mackinnon (2010) are presented, beginning with a concise basic introduction. This includes the basic bootstrap procedure, asymptotic refinement, hypothesis testing, Wu’s Wild bootstrap (1987), Davidson & Mackinnon’s Restricted Efficient residual bootstrap (2008) and their Wild Restricted Efficient residual bootstrap technique (2010).

2.1

Saddlepoint Approximation

Since the introduction of the saddlepoint approximation by Daniels in 1954, various extensions of the technique have been developed, one of them being the approx-imation of Broda & Kan (2013). The authors use the method to approximate a double integral of a bivariate probability density, see Chapter 3 for an elaborate explanation. The reader should have a basic understanding of the saddlepoint ap-proximation. This is explained next.

The seminal paper of Daniels (1954) introduced the first saddlepoint approxi-mation, which is a formula for approximating a density function based on its corre-sponding cumulant generating function (CGF). The power of this technique lies in the fact that it removes the complication of computing the inversion integral, which can be computer intensive. In order to appreciate the full beauty of the saddlepoint

(9)

approximation, I start by defining the moment generating function (MGF).

For the values s where the integral converges, the moment generating function of a random variable X with density f(x) is

M (s) = EesX = Z +∞

−∞

esxf (x) dx,

Now assume that M(s) converges whenever s ∈ (a, b). The cumulant generating function is defined as

K(s) = ln M (s), s ∈ (a, b).

One of the important characteristics of the MGF and the CGF is the one-to-one link they both have with their probability density function. When the integral of a probability density function or cumulative density function is hard to compute, MGFs or CGFs can prove to be useful tools in order to approximate the associated integral, this is what Daniels (1954) does in his paper. Daniels’ approximation is solely based on the use of CGFs. The saddlepoint approximation is based on the Laplace approximation, which is explained next.

2.1.1 Derivation Laplace Approximation

Assume that g(x) is continuous and differentiable over (a, b) with a minimum at ˆx. The Laplace approximation, as in Butler (2007), equals

Z b a e−g(x)dx ≈ √ 2πe−g(ˆx) pg00x) . (2.1)

The basic idea behind the Laplace approximation is that the integral in (2.1) is pri-marily defined by its curvature at its extremum ˆx. Hence the sharper the curvature the more accurate the approximation. The contributions to the integral are greatest when x is close to ˆx since the integral is taken over the exponential of g(·), this amplifies the curvature.

Derivation Laplace approximation

To derive (2.1), first multiply g(x) in the exponential function with a large number n, so that the extremum of the function is even more emphasized. Assume that the new integrand e−ng(x) is sufficiently concentrated on (a, b) such that the integral is not significantly affected when the range of integration is changed to (−∞, ∞). Following Butler (2007), applying a Taylor expansion of g(x) around ˆx gives

g(x) = g(ˆx) + g0(ˆx)(x − ˆx) +1 2g 00 (ˆx)(x − ˆx)2+ ∞ X i=3 gix)(x − ˆx)i i! , (2.2)

(10)

where the second term on the right hand side equals zero because of the minimum reached at ˆx. Substituting (2.2) in the integral on the left hand side of (2.1) and changing the range of integration such that a = −∞ and b = ∞ gives

Z +∞ −∞ e−ng(x)dx = e−ng(ˆx) Z +∞ −∞ e−12ng 00x)(x−ˆx)2 e−nP∞i=3 gi(ˆx)(x−ˆx)i i! dx. (2.3)

Now let z =png00x)(x − ˆx) so that

1

png00x)dz = dx. (2.4)

This change of variables results in Z +∞ −∞ e−ng(x)dx = e −ng(ˆx) png00x) Z +∞ −∞ e−12z 2 e−P∞i=3n − 1 2(i−2) zi i!ˆkidz, (2.5) where ˆ ki= gi(ˆx) g00(ˆx)i/2, i = 3, 4, ...

Note that the right hand side of (2.5) shows an expectation of the normal distri-bution. Now recall that the exponential function can be re-expressed as an infinite Maclaurin series; ex = P∞ n=0 x n n!. Let y = exp(− P∞ i=3n −1 2(i−2) z i i!kˆi), this can be rewritten as y = n−1/2ˆk3z 3 3! + n −1ˆk4z4 4! + n −3/2kˆ5z5 5! + n −2ˆk6z6 6! + ... (2.6) The right hand side of (2.5) can be rewritten as

s 2π ng00x)e −ng(ˆx) Z +∞ −∞ φ(z)  1 − y + 1 2y 2+ O(y3)  dz, (2.7) where φ(·) is the standard normal distribution. This integral will be approximated up to the order O(n−2). Looking at the y’s of (2.7), note that the term i = 3, 5, ... contain odd powers of z, multiplying this with the standard normal density of z gives zero (odd moment). In addition, note that the order of the fourth term on the right hand side of (2.6) is of order O(n−2).

Next consider the square of y. Squaring the first term on the righthand side of (2.6) results in an even power of z. Squaring the other terms lead to terms which have are of order O(n−2) or smaller. This first squared term is

n−1/2 ˆ k3z3 3! !2 = n−1 ˆk 3z3 3! !2 .

(11)

Now integrating the three terms, following Butler (2007), leads to the following result s 2π ng00x)e −ng(ˆx)  1 +1 n  −ˆk4z 4 4! E[Z 4] +1 2 ˆk3z3 3! !2 E[Z6]    ,

where Z ∼ N (0, 1). Retaining only the first term leads to the Laplace approximation Z b a e−ng(x)dx = √ 2πe−ng(ˆx) png00x) 1 + O(n −1) ≈ √ 2πe−ng(ˆx) png00x) . (2.8)

2.1.2 Derivation Saddlepoint Approximation

Armed with the knowledge of the derivation of the Laplace approximation, I continue to derive the saddlepoint approximation. First let Xi ∼ IID(·) with Cumulant

Generating Function K(s). Now let ¯X = 1nPn

i=1Xi and let f (¯x) be its probability

density function. The MGF of the random variable n ¯X equals expnK(s), hence the CGF of n ¯X equals nK(s). Recall that s is dependent on its random variable X, s = s(x). Similarly as in Butler (2007), the moment generating function corresponding to the density f (¯x) can now be defined as

enK(s) = Z +∞ −∞ f (¯x) esn¯xd¯x = Z +∞ −∞ esn¯x+ln f (¯x)d¯x = Z +∞ −∞ e−g(s,¯x)d¯x, where g(s, ¯x) = −sn¯x − ln f (¯x). Using the Laplace approximation as in (2.8), the following holds enK(s)≈ s 2π g00(s, ¯x s) esn¯x+ln f (¯xs). (2.9) ⇐⇒ nK(s) ≈ 1 2ln 2π g00(s, ¯xs) + sn¯xs+ ln f (¯xs) ⇐⇒ ln f (¯xs) ≈ n (K(s) − s¯xs) − 1 2ln 2π g00(s, ¯x s) =⇒ ∂ ln f (¯xs) ∂ ¯xs ≈ n K0(s) − ¯xs  ∂s ∂ ¯xs − ns, (2.10) where K0(s) is the first derivative of K(s) with respect to s and ¯xs denotes the

extremum of the function g(·). Taking the logarithm of the first equations gives the second one, rearranging the terms results in the third equation. The last implication comes after taking the derivative with respect to ¯xs. Recall from Subsection 2.1.1

that the Laplace approximation is centred around the extremum ¯xs, implying that

g0(s, ¯xs) = ∂g(s, ¯xs) ∂ ¯xs = − ∂ ln f (¯xs) ∂ ¯xs + ns  = 0 (2.11)

(12)

for fixed s. This implies that equation (2.10) can be simplified as n K0(s) − ¯xs

 ∂s ∂ ¯xs

= 0 (2.12) whenever (2.11) holds. The derivative of s can be obtained from (2.11) by taking the derivative with respect to ¯xs which gives

∂s ∂ ¯xs = −1 n ∂2f (¯x s) (∂ ¯xs)2 6= 0 (2.13) Since ¯xs defines an extremum, the second derivative of this point is non-zero. This

implies that in order to let (2.12) hold, it must be that K0(s) = ¯xs.

This is known as the saddlepoint. Now that the saddlepoint is derived, the last term to be determined is g00(s, ¯xs). Taking the derivative of (2.11) with respect to ¯xs and

taking s as fixed gives

g00(s, ¯xs) = ∂2g(s, ¯xs) (∂ ¯xs)2 = −∂ 2f (¯x s) (∂ ¯xs)2 = n ∂s ∂ ¯xs (2.14) = n ∂ ¯xs ∂s −1 = n K00(s)−1

where the last equality holds because of (2.14) implied by (2.13). Plugging this in (2.9), rearranging and some basic algebra gives the saddlepoint approximation

f (¯xs) ≈

r n 2πK00(s)e

nK(s)−sn¯xs.

with an error of the order O(n−1). It is easily shown that whenever n → ∞ the error gets smaller implying a better approximation. Whenever the integral of a density function is tedious or impossible to compute and the cumulant generating function is retrievable, the saddlepoint approximation proves to be a good alternative.

2.2

Bootstrap Theory

Since Efron (1979) proposed the bootstrap method in his seminal paper ”Bootstrap Methods: Another Look at the Jackknife”, many applications of the technique have been developed. The bootstrap is a useful resampling technique which can consis-tently estimate sampling distributions and provide consistent empirical estimates whenever the estimate is hard to obtain analytically, for example when the analyti-cal formula of the variance of an estimator is hard to obtain or compute. In addition, bootstrapped statistics can provide asymptotic refinement, explained below.

(13)

Efron adopted the following notation: let F be a unspecified probability distri-bution and

Xi ∼ FIID.

Where X = (X1, X2, ..., Xn) are the random variables and x = (x1, x2, ..., xn) its

observed realization. The problem Efron states is as follows: ”Given a specified random variable R(X,F), possibly depending on both X and the unknown distribution F , estimate the sampling distribution of R on the basis of the observed data x”.

The bootstrap technique opened many doors in the world of econometrics. Ex-tensive research has been done in the field, which has lead to a comprehensive availability of bootstrap theory. This subsection discusses the essentials of boot-strap techniques and procedures needed in order to understand the Wild Restricted Efficient bootstrap of Davidson & Mackinnon (2010). For additional information the reader is referred to Cameron & Trivedi (2005).

Davidson & Mackinnon (2010) propose a bootstrap procedure for a linear re-gression model estimated by instrumental variables where instruments are possibly weak and the data suffer from heteroskedasticity. The Wild Restricted Efficient residual (further abbreviated as WRE) bootstrap is based on an earlier bootstrap method proposed by Davidson & Mackinnon (2008) called the Restricted Efficient residual (further abbreviated as RE) bootstrap. The WRE bootstrap is basically the RE-bootstrap extended by the Wild-bootstrap, and therefore retains heteroskedas-ticity of the original sample in the bootstrapped samples. The model Davidson & Mackinnon (2010) estimate is presented next.

This subsection starts with an example of an empirical bootstrap procedure, this is followed by a short explanation of what advantage bootstrapping pivotal statis-tics can provide, called asymptotic refinement. Furthermore a short summary of hypothesis testing based on the T-statistic which is applied in Davidson & Mack-innon (2010) is presented. Consequently, in order to understand the origin of the WRE-bootstrap, the predecessors of Davidson & Mackinnon’s bootstrap scheme are explained. These include the Unrestricted Residual (UR) bootstrap, the Restricted Residual (RR) bootstrap, the Restricted Efficient (RE) residual bootstrap and the Wild bootstrap procedure of Wu (1986). Finally the the WRE-bootstrap scheme is presented.

2.2.1 Bootstrapping

This paragraph starts with notational information, which follows Cameron & Trivedi (2005) closely. Let the data be defined as w = (w1, ..., wn), where wi = (yi, xi) ∼

(14)

IID and ui the error term. Let ˆβ (which can be a vector, but let it be a scalar

for simplicity) be the associated√n consistent smooth estimator and let β0 be the

estimator value under the null-hypotesis. The bootstrap is usually applied for, but not limited to, the following statistics: the estimator ˆβ, the standard error sβˆ, the

t-statistic t = ( ˆβ − β0)/sβˆ, critical values and confidence intervals. Next follows an

example where the bootstrap proves to be a useful tool.

Suppose one wishes to estimate the variance of ˆβ from an unknown DGP y = g(w, β), but that its analytical formula is intractable. The bootstrap is, usually, able to provide a consistent estimate of V [ ˆβ] given that wi ∼ IID. The bootstrap

procedure is as follows:

1. From the original sample, randomly draw n times wiwith replacement. Repeat

this B times so that B samples are obtained.

2. For every sample compute its corresponding estimator: ˆβBi, where i = 1, 2, ...B.

3. Now we can compute the bootstrapped variance ˆVB[ ˆβ], which is defined as

ˆ VB[ ˆβ] = 1 B − 1 B X i=1 ( ˆβbi −β)¯ˆ 2. (2.15)

This procedure pretends that the original sample of size n is the population. The above bootstrap procedure is known as an empirical bootstrap, in which in the first step the wi of the original sample are resampled. If the DGP is known, one could

also resample by randomly drawing from the known y = g(w, ˆβ). This is known as the parametric bootstrap. Instead of these two procedures, one could resample the fitted residuals after estimation of the model. This procedure is called the residual bootstrap.

In the simulation experiment of this thesis, the bootstrap is used to provide information about the size and power, as well as confidence intervals of certain statistics. If the statistic is pivotal, the bootstrapped statistic can provide asymptotic refinement, explained in the next paragraph. This can lead to a test statistic with a smaller error margin than the test statistic derived solely on the original data sample. What follows is a more accurate testing of hypothesis: bootstrapping can provide an improvement in testing.

(15)

Asymptotic Refinement

Let Xi∼ IID(0, σ2) for i = 1, 2, 3, ..., n and Z be the mean

Z = 1 n n X i=1 Xi.

The Central Limit Theorem states that Z converges to a normal distribution when n → ∞. This means that

P  n Z σ ≤ z  = Φ(z) + R1, (2.16)

where Φ(·) is the standard normal CDF and R1 is the remainder of order O(n1/2)

and converges to zero when n → ∞. The Edgeworth expansion uses cumulants to approximate the probability function. Adding the first additional term to (2.16) gives P  n Z σ ≤ z  = Φ(z) +g1(z)φ(z)√ n + R2, (2.17)

where g1(z) = −(z2− 1)φ(z)k3/6, φ(z) is the standard normal density function, k3

is the third cumulant of Zn, and R2 is of order O(n−1). It is easy to see that (2.17)

is asymptotically more accurate than (2.16), as it has an asymptotically smaller remainder term. The problem with implementing the terms of the Edgeworth ex-pansion lies in the fact that they rely on the cumulants of Z, which can be hard to obtain. Bootstrapping provides asymptotic refinement by capturing the Edgeworth expansion term in the statistic automatically whenever the bootstrapped statistic is pivotal. Noted should be that R1 is of order O(n1/2) and R2 of the order O(n−1),

which implies that asymptotically R2 < R1. Even though this statement always

holds asymptotically, in finite samples the contrary is possible. For a more elaborate derivation and discussion the reader is referred to Horrowitz (2001) and Cameron & Trivedi (2005).

Hypothesis Testing: percentile t-method

Under the null hypothesis H0 : β = β0, the t-statistic is defined as ˆt = ( ˆβ − β0)/sβˆ.

Bootstrapping the t-statistic generates B bootstrapped statistics tb1, tb2, ..., tbB, which can be used to make inference about β and its accuracy via hypothesis testing. Now let the bootstrapped statistics be ordered from smallest to largest, where tb1 is the smallest and tbB the largest. The unrestricted bootstrapped statistic tbi is defined as

tbi = βˆ

b i − ˆβ

sβˆb

(16)

Provided that sβˆb is a consistent estimator of the standard error of β and that

the error terms are independently and identically distributed, one can see that the statistic does not depend on any unknown variables because the limiting distribution of the t-statistic does not depend on any unknown variables. Hence the statistic is pivotal. Bootstrapping the statistic can provide asymptotic refinement whenever n → ∞, however smaller sample sizes could potentially lead to a higher error margin. At nominal level α, the critical value of the bootstrap is the lower α-quantile of the ordered bootstrapped statistics. Suppose the number of bootstrapped t-statistics equals 499 and the hypothesis are H0 : β = β0 versus Ha : β < β0. The critical

value given α = 0.05 is (B + 1)α = 25. A test statistic lower than the value of tb25 rejects the null-hypothesis. The critical value of the test statistic for the upper tail is derived similarly, except that the critical value is calculated as (B + 1)(1 − α).

For a two sided test a distinction can be made between a symmetrical and non-symmetrical test. A non-symmetrical test takes the absolute values of all the statistics (again ordering them in size); the critical value is then defined as the upper α quantile of these statistics. The non-symmetrical test takes the lower α/2 quantile and the upper α/2 quantile as the critical values.

The distribution of the 2SLS estimator in Davidson & Mackinnon’s model is asymmetric and therefore they test the null hypothesis using the equal-tail bootstrap p-value which is defined as

ˆ pbet = 2min 1 B B X i=1 I(tbi < ˆt), 1 B B X i=1 I(tbi > ˆt) ! . (2.18)

Let the nominal level α = 0.05, then the null hypothesis is rejected whenever ˆpbet < α. Hence, equation (2.18) shows if a significant amount of the bootstrapped statistics are larger or smaller than the statistic from the original sample then the null hy-pothesis is rejected.

Confidence intervals are easily obtained using the same quantiles from above. The percentile t 100(1 − α)% confidence interval is

CI1−ab = ( ˆβ − tbα/2sθˆ, ˆβ + tb1−α/2 sθˆ)

where tbα is the α-th-quantile of the ordered bootstrapped statistics, ˆθ is the original estimate of the initial sample and sθˆthe original standard error of the initial sample.

The Model

(17)

fol-lowing model in the notation of Broda & Kan (2013),

y1 = y2β + Xγ + u (2.19) y2 = z1π + Xδ + v (2.20)

where y1 and y2 are the endogenous vectors of dimension n × 1. X is the n × k matrix with exogenous variables, Z is the n × l matrix with exogenous instruments which has the following properties,

plim1 NZ 0 X = plim1 N N X i=1 zix 0 i= E[zix 0 i] = ΣZX (2.21) plim1 NZ 0u = plim1 N N X i=1 ziui= E[ziui] = ΣZu= 0. (2.22)

Equation (2.19) is referred to as the structural equation and (2.20) as the reduced form equation. Equation (2.21) and (2.22) show that the instrument Z is correlated with X, but not with the error term u respectively. The covariance matrix of u and v is defined as

X

σ2

≡ σ

2

1,i ρi σ1,i σ2,i

ρi σ1,iσ2,i σ2,i2

!

where ρi is the correlation between u and v and can be dependent on row i of Z.

The model is estimated using the standard 2SLS estimator, the reader is referred to Cameron & Trivedi (2005) or Marno Verbeek (2008) for more information about instrumental variables estimation. Davidson & Mackinnon (2013) simulate the re-jection frequencies of multiple bootstrap techniques under the condition β = β0.

2.2.2 The Unrestricted & Restricted Residual Bootstrap

Davidson & Mackinnon (2010) first estimate the full model using 2SLS on (2.19) and (2.20). This gives the estimates ˆβ, ˆγ, ˆδ and ˆπ. The bootstrapped residuals are drawn from ˆ ubi ˆ vib ! ∼ EDF uˆi n/(n − l))1/2vˆi ! .

The bootstrapped yb1 and yb2 are obtained by using the estimates and bootstrapped residuals

y1,ib = y2,ib β + Xˆ iˆγ + ˆubi

(18)

where Xi is the i-th row of the matrix X and the same holds for Zi. Davidson &

Mackinnon set ˆγ and ˆδ to zero, because ˆβ is invariant with regard to their values. Next ˆβib is computed using the bootstrapped variables from above. Repeat this B-times to obtain the B bootstrapped ˆβib’s.

The RR-bootstrap is performed almost the same way, except that the null hy-pothesis is imposed on the model: H0 : β = β0, where β0 = 0 in Davidson &

Mackinnon (2010). Estimation of the model gives ˜γ and ˆπ. The restricted boot-strapped residuals are drawn from

˜ ubi ˆ vib ! ∼ EDF n/(n − k)) 1/2u˜ i n/(n − l))1/2vˆi ! .

The bootstrapped yb1 and yb2 are obtained in the following way y1,ib = Xiγ + ˜˜ ubi

y2,ib = Ziˆπ + ˆvib.

Davidson & Mackinnon obtain yb1 from y1,ib = ˜ubi, because ˜γ has no effect on ˆβ. The bootstrapped ˆβib are now obtained using the same algorithm as with the UR-bootstrap.

2.2.3 The Wild Bootstrap

The residual bootstrap is valid whenever the error terms are IID. One can eas-ily see that whenever heteroskedasticity is present and the resampled residuals are randomly drawn, with each residual given a probability weight of n−1, that het-eroskedasticity is not taken into account. This results in an inefficient bootstrapped estimator and an inconsistent variance if estimated.

In 1986, Wu proposed the Wild bootstrap which is a bootstrap residual resampling method that takes into account any heteroskedasticity. In his paper, Wu applies the method to the linear regression model y = Xβ +r, where r is the error term suffering from heteroskedasticity. In other words: V [ri|X] = V [ri|xi] = σ2i for i = 1, 2, ..., n.

The bootstrap procedure is as follows:

1. Obtain the residuals of the OLS regression, transform each ri by dividing it

with√1 − wi and multiplying it with ti, where wi = x

0

i(X

0X)−1x

i and where

ti is an element of vector t ∼ IID(0, I). This transformed error term is called

rib.

2. Obtain yib for i = 1, 2, ..., n by ybi = x0iβ + rˆ bi, where ˆβ is the estimate of the original sample. This results in vector yb, where yb = (y1b, yb2, ..., ynb)0.

(19)

3. The OLS estimator is ˆβb = (X0X)−1X0yb. 4. Repeat this B times.

It is easily shown that E[ ˆβb] = ˆβ and that V [ ˆβb] = (X0X)−1PN

i=1 r2 i 1−wixix 0 i(X0X)−1.

Wu (1986) shows that V [ ˆβb] has the “bias-robustness against error variance het-eroskedasticity” property, the reader is referred to his paper for the proof.

2.2.4 The Wild Restricted Efficient Residual Bootstrap

Restricted Efficient residual Bootstrap

The RE-bootstrap imposes the null-hypothesis H0 : β = β0 on the model, leading

to estimation more efficient than when no restriction is imposed on the model. In addition, the error in rejection probability decreases, Davidson &Mackinnon (2010) claim “... imposing a (true) restriction makes estimation more efficient, and using more efficient estimates in the bootstrap DGP should reduce the error in rejection probability (ERP) associated with the bootstrap test.”. Imposing H0 : β = β0 on the

model changes the structural equation to

y1= Xγ + u.

Whenever the instruments are weak, estimation of the reduced form affects the bootstrap asymptotically. In order to efficiently estimate π in the reduced form equation, Davidson & Mackinnon use a technique proposed by Kleibergen (2002), which in turn is used to construct the K-statistic. To obtain the efficient estimator estimate

y2 = Zπ + δMZy1+ r, (2.23)

where MZ= IN− Z(Z0Z)−1Z0 and r is the the residuals of the regression. The new

reduced form (2.23) has been augmented by the residuals of the structural equation. This is asymptotically equivalent to the Three Stage Least Squares (3SLS) estimator. The RE-bootstrap procedure is as follows

1. Run the regular 2SLS regression on the restricted model and obtain the esti-mate of the original data sample: ˜π and residual ˜u.

2. Randomly draw the bootstrapped residuals in pairs from the following empir-ical distribution function

˜ ubi ˜ vib ! ∼ EDF (n/(n − k)) 1/2u˜ i n/(n − l))1/2v˜i ! (2.24)

(20)

where ˜vi ≡ y2,i − Ziπ. Davidson & Mackinnon (2010) state that rescaling˜

is not essential and that this will only have effect whenever the number of instruments l is relatively large compared to the number of observations n. Compute the bootstrapped y1,ib and yb2,i,

y1,ib = yb2,iβ0+ ˜ubi (2.25)

y2,ib = Ziπ + ˜˜ vib

3. Use 2SLS to compute the bootstrapped βbi. 4. Repeat this B-times.

Note that when H0 : β = β0 = 0, equation (2.25) simplifies to y1,ib = ˜ub1,i. Davidson

& Mackinnon studentize each bootstrapped βb

i and compare them to the t-statistic

of the original sample.

Wild Restricted Efficient Residual Bootstrap

In order to retain heteroskedasticity of the original sample Davidson & Mackinnon extend the RE-bootstrap with Wu’s Wild-bootstrap. The bootstrap DGP is almost the same as the RE bootstrap. The difference is that the WRE bootstrapped resid-uals are not randomly drawn with replacement. Every residual pair is multiplied by a random variable, ˜ ub1,i ˜ ub2,i ! ∼ EDF (n/(n − k)) 1/2u˜ i · wi n/(n − l))1/2v˜i · wi ! , (2.26) where the random variable wi follows the Rademacher distribution

P[wi = 1] = 1/2,

P[wi= −1] = 1/2.

Note that ˜ui and ˜viare multiplied by the same wi and that the residuals of the

origi-nal sample do not change order. This ensures the preservation of correlation between the error terms. The condition for this to be admissible is that the error terms have to be symmetrically distributed. Whenever the error terms are distributed asym-metrically, the researcher is better off drawing wi from the distribution proposed by

Mammen (1993), P[wi = − √ 5 2 ] = √ 5 + 1 2√5 P[wi = √ 5 2 ] = √ 5 − 1 2√5 .

(21)

The bootstrap DGP of the WRE-bootstrap differs little from the RE-bootstrap, but it has significant effect on the rejection frequencies as will be shown in the results of the simulation experiment.

(22)

The Westernboot strap

This chapter extensively derives the techniques and theorems used to build the Westernboot strap. Whenever the proof or derivation is already presented by Broda & Kan (2013), the reader will be referred to their paper.

First, inversion formula will be discussed and derived linking the characteristic function of a random variable with its distribution. This includes existing formula, which will be presented without proof, and the inversion formula of the ratio of the random variable R = X/Y . The latter will be explained and derived thoroughly following the paper of Broda & Kan (2013) closely. Secondly, the saddlepoint approx-imation of the PDF of the ratio is derived. This powerful theory makes computation of the density function more feasible by not having to compute the double integral defined in the inversion formulae and instead solve a system of equations. This can lead to a significant decrease in the time to compute a density. Finally the CDF approximation is derived, using a different approach than integrating the PDF. This is less straight forward and uses a result from Kolassa (2003).

3.1

The Inversion Formulae

Inversion formulae can be useful tools when the characteristic function of a random variable is available and the distribution function does not have an analytical repre-sentation. In their paper, Broda & Kan (2013) show that existing formulae remain valid under certain conditions even if Y is defined on the entire real line.

Gurland (1948) and Gil-Pelaez (1951) show that the inversion formula for the cu-mulative distribution function FX(x) of the random variable X, given its associated

characteristic function φX(t), is FX(x) = 1 2− 1 π Z ∞ 0 Im(e−itxφX(t) t )dt (3.1) 17

(23)

of all points where F is continuous. Wendel (1961) shows that this integral may fail to converge absolutely. It does converge whenever the following holds

E[log(1 + |X|)] < ∞. (3.2) Shephard (1991) derived the inversion formulae for the bivariate case,

FX,Y(x, y) = 1 2FX(x) + FY (y)(y) − 1 4 − 1 2π2 Z ∞ 0 Z ∞ 0

Ree−isx−ityϕX,Y(s, t) − e−isx+ityϕX,Y(s, −t)



st dsdt. (3.3) Formula (3.3) is valid whenever the mean of (X, Y ) < ∞ and if the absolute integral of ϕX,Y(s, t) exists. Now let

FR(r) = P[R < r] = P[

X

Y < r] = P[X − rY < 0] = P[W < 0],

where Y is almost surely positive and W = X − rY . The characteristic function of W is

ϕW(t) = E[etW]

= E[et(X−rY )] = E[etX−rtY] = ϕX,Y(t, −rt).

Let ϕi(·) be the first derivative of the characteristic function with respect to its i-th

argument. In addition, let E[log(1 + |X − rY |)] < ∞, let ϕX,Y be differentiable

(implying that the mean of (X,Y) is finite) and assume that ϕ2(t, −rt) is absolutely

integrable. Geary (1994) showed that the pdf of R can be expressed as fR(r) = 1 π Z ∞ 0 Im[ϕ2(t, −rt)]dt . (3.4) Broda & Kan (2013) show that (3.4) remains valid whenever X and Y form a definite pair, defined as follows

Definition 1. Definite Pair: We call two real-valued random variables a definite pair if ∃β such that P [X − βY < 0] = δ for δ ∈ {0, 1}.

In other words, whenever X and Y form a definite pair there exists a β such that the linear combination X − βY is almost always positive or negative. Broda & Kan (2013) use the definition in the following lemma,

(24)

Lemma 1. If X and Y form a definite pair, 0 is not an atom of Y , and for δ ∈ {0, 1}, β is such that P [X − βY < 0] = δ,

P[R < r] = 2δH(r − β) + (1 − 2δ) n

P[Y < 0] + sgn(r − β)P[W < 0] o

, (3.5) where H(·) is the Heaviside function, which is zero whenever the argument is nega-tive and is equal to one for a posinega-tive argument.

Proof of Lemma 1: The idea of the proof is to compute all probabilities P[R < r] under the condition P[X − βY < 0] = δ for δ ∈ {0, 1}. It is easily shown that whenever β = ∞ the condition always holds, because then P[Y = 0] = 0. First the probabilities are shown for P[X − βY < 0] = 0 and then for P[X − βY < 0] = 1. The proof for P[X − βY < 0] = 0 derived and presented by Broda & Kan, the reason for reproducing it here is because the proof of P[X − βY < 0] = 1 is based on theirs. To finish the proof all conditions will be combined. In order to prove the first part we need to find the condition for when R < β holds. Following Broda & Kan, in order to link R to β, R is rewritten as

R = X Y =

X − βY

Y + β = R1+ β. R < β implies R1< 0 which gives

R1 =

X − βY

Y < 0 ⇐⇒ Y < 0,

where the last implication holds from the condition P [X − βY < 0] = 0. If r < β, then R < r if and only if

R1+ β < r

⇐⇒ X − βY

Y < r − β ⇐⇒ X − βY

r − β < Y < 0. Hence Broda & Kan come to the following

P[R < r] = P[Y < 0] − P 

Y < X − βY r − β



= P[Y < 0] − P[Y (r − β) > X − βY ] = P[Y < 0] − P[X − rY < 0].

If r > β, then R < r if and only if

R1+ β < r

⇐⇒ X − βY

Y < r − β ⇐⇒ X − βY

(25)

This gives P[R < r] = P[Y < 0] + P  Y > X − βY r − β 

= P[Y < 0] + P[Y (r − β) > X − βY ] = P[Y < 0] + P[X − rY < 0].

This concludes the first part of the proof provided by Broda & Kan. In order to compute the conditions whenever δ = 1, simply define X∗ = −X and Y∗ = −Y . It is easily shown that P[X∗− βY∗< 0] = 1 ⇐⇒ P[X − βY < 0] = 0. Hence the two conditions from above can be used to find the conditions for when δ = 1. For r < β this gives

P[R < r] = P[Y < 0] − P[X − rY < 0] = P[Y∗> 0] − P[X∗− rY∗ > 0]. Similarly for r > β this is

P[R < r] = P[Y < 0] + P[X − rY < 0] = P[Y∗> 0] + P[X∗− rY∗ > 0].

One can simply pick one of the four probability conditions based on the value of δ and r. Broda & Kan rather provide one formula based on the values δ and r, combining the four probability conditions gives

P[R < r] = 2δH(r − β) + (1 − 2δ) n

P[Y < 0] + sgn(r − β)P[W < 0] o which is exactly (3.5). 

Putting these results together results into the following theorem of Broda & Kan (2013),

Theorem 1. If X and Y form a definite pair (i.e. ∃β : P[X − βY < 0] = δ for δ ∈ {0, 1}), 0 is an atom of neither Y nor W ≡ X − rY , E[log(1 + |Y |)] < ∞ and E[log(1 + |W |)] < ∞ then FR(r) = H(r − β) − (1 − 2δ) π Z ∞ 0 Im h ϕX,Y(0, t) + sgn(r − β)ϕX,Y(t, −rt) idt t . (3.6) If in addition, Y has a finite mean and ϕ2(t, −rt) is absolutely integrable, then FR(r)

is differentiable at r and fR(r) = sgn(r − β) π(2δ − 1) Z ∞ 0 Im [ϕ2(t, −rt)] dt = 1 π Z ∞ 0 Im[ϕ2(t, −rt)dt f or r 6= β. (3.7)

(26)

Proof of Theorem 1: (3.6) simply results from using (3.1) and substituting this in (3.5). Using (3.4) gives P[Y < 0] = 1 2 − 1 π Z ∞ 0 ImϕX,Y(0, t) dt t , P[W < 0] = 1 2 − 1 π Z ∞ 0 ImϕW(t) dt t = 1 2 − 1 π Z ∞ 0 ImϕX,Y(t, −rt) dt t . (3.8) Substituting these two results in (3.5) and some basic algebra gives (3.6). Differen-tiating (3.6) with respect to r gives (3.7). 

The theorem shows that whenever X and Y form a definite pair, Geary’s formula is valid even if Y < 0. Broda & Kan (2013) derive general inversion formula for when X and Y do not form a definite pair the formulas of Theorem 1 do not hold. The authors begin with the following lemma,

Lemma 2. If 0 is an atom of neither Y nor W ≡ X − rY , then

FR(r) = P[W < 0] + P[Y < 0] − 2P[W < 0, Y < 0]. (3.9)

Proof of Lemma 2: Broda & Kan (2013) provide the proof of the lemma in their paper. The idea is to split P[X/Y < r] into two probabilities conditionally on Y . The reader is referred to the paper for the full proof.

Broda & Kan combine lemma 2 with Shephard’s inversion formula (3.3) which results into the next theorem,

Theorem 2. If (X,Y) has a finite mean, ϕX,Y is absolutely integrable and 0 is not

an atom of W ≡ X − rY , then for |r| < ∞, FR(r) = 1 2+ 1 π2 Z ∞ 0 Z ∞ 0

Re[ϕX,Y(s, t − rs) − ϕX,Y(s, −t − rs)]

st dsdt (3.10) and fR(r) = 1 π2 Z ∞ 0 Z ∞ −∞ Re[ϕ2(s, −t − rs)]ds dt t (3.11) whenever this integral converges absolutely.

Proof of Theorem 2: Shephard’s formula (3.3) applies because of the first two as-sumptions: (X, Y ) has a finite mean and ϕX,Y is absolutely integrable. In order to

(27)

complete the proof it is to be combined with (3.9), for the CDF: P[W < 0, Y < 0] = FW,Y(0, 0) = 1 2[FW(0) + FY(0)] − 1 4 − 1 2π2 Z ∞ 0 Z ∞ 0

Re[e−is·0−it·0ϕW,Y(s, t) − e−is·0+it·0ϕW,Y(s, −t)

st dsdt = 1 2[FW(0) + FY(0)] − 1 4 − 1 2π2 Z ∞ 0 Z ∞ 0

Re[ϕW,Y(s, t) − ϕW,Y(s, −t)

st dsdt. Where ϕW,Y can be rewritten as

ϕW,Y(s, t) = E[eisW +itY] = E[eis(X−rY )+itY]

= E[eisX−isrY +itY]

= E[eisX+i(t−rs)Y] = ϕX,Y(s, t − rs).

Filling in Shephard’s formula (3.3) gives the result for the CDF: FR(r) = P[W < 0] + P[Y < 0] − FW(0) − FY(0) + 1 2 + 1 π2 Z ∞ 0 Z ∞ 0

Re[ϕX,Y(s, t − rs) − ϕX,Y(s, −t − rs)

st dsdt = 1 2+ 1 π2 Z ∞ 0 Z ∞ 0

Re[ϕX,Y(s, t − rs) − ϕX,Y(s, −t − rs)

st dsdt,

where the second equality sign follows because the first four elements of the right hand side cancel each other out. For the pdf, the derivative of (3.10) is taken with respect to r, fR(r) = ∂ ∂rFR(r) = ∂ ∂r  1 2 + 1 π2 Z ∞ 0 Z ∞ 0

Re[ϕX,Y(s, t − rs) − ϕX,Y(s, −t − rs)

st dsdt  = 1 π2 Z ∞ 0 Z ∞ 0 ∂ ∂r

 Re[ϕX,Y(s, t − rs) − ϕX,Y(s, −t − rs)

st  dsdt = 1 π2 Z ∞ 0 Z ∞ 0 Re[ϕ2(s, −t − rs) − ϕ2(s, t − rs)]ds dt t = 1 π2 Z ∞ 0 Z ∞ 0 Re[ϕ2(s, −t − rs) + ϕ2(−s, −t + rs)]ds dt t = 1 π2 Z ∞ 0 Z ∞ 0 Re[ϕ2(s, −t − rs)]ds dt t + Z ∞ 0 Z ∞ 0 Re[ϕ2(−s, −t + rs)]ds dt t  = 1 π2 Z ∞ 0 Z ∞ 0 Re[ϕ2(s, −t − rs)]ds dt t − Z ∞ 0 Z −∞ 0 Re[ϕ2(¯s, −t − r¯s)]d¯s dt t  = 1 π2 Z ∞ 0 Z ∞ 0 Re[ϕ2(s, −t − rs)]ds dt t + Z ∞ 0 Z 0 −∞ Re[ϕ2(¯s, −t − r¯s)]d¯s dt t  = 1 π2 Z ∞ 0 Z ∞ −∞ Re[ϕ2(s, −t − rs)]ds dt t ,

(28)

where ϕi(·) is the first derivative with respect to the i-th argument of ϕ. The third

equality holds because the derivative and integral signs can be changed due to the fact that ϕX,Y(·) is assumed to be absolutely integrable. The fifth equality sign follows

from the fact that ϕX,Y(s, t) = ¯ϕX,Y(−s, −t), this implies that −Re[ϕ2(s, t − rs)] =

Re[ϕ2(−s, −t + rs)]. The seventh equality sign comes forth after the change of

variables ¯s = −s. 

3.2

PDF Approximation

To derive the saddlepoint approximation for the pdf, Broda & Kan first compute a general expression for the pdf. The authors start with (3.9) and derive the pdf via differentiation. Differentiating (3.9) with respect to r gives

fRn(r) = ∂ ∂rFR(r) = ∂ ∂rP[ ¯W < 0] + ∂ ∂rP[ ¯Y < 0] − 2 ∂ ∂rP[ ¯W < 0, ¯Y < 0] = ∂ ∂rP[ ¯W < 0] − 2 ∂ ∂rP[ ¯W < 0, ¯Y < 0]. (3.12) Hence an expression has to be found for P [ ¯W < 0] and P [ ¯W < 0, ¯Y < 0]. The cgf of two random variables X and Y is defined as K(s, t) ≡ logE[exp(sX + tY )]. The authors start by computing the joint cgf. With ¯W = ¯X − r ¯Y , the cgf of ¯W and ¯Y is

logE h es ¯W +t ¯Y i = logE h es( ¯X−r ¯Y )+t ¯Y i = logEhes ¯X+(t−rs) ¯Yi = K(s, t − rs).

The standard Laplace inversion argument to compute the joint density of ¯W and ¯Y based on the joint characteristic function ϕW , ¯¯ Y(s, t) is

fW , ¯¯ Y( ¯w, ¯y) = 1 2π 2Z +∞ −∞ Z +∞ −∞ e−is ¯w−it¯yϕW , ¯¯ Y(s, t)dsdt. (3.13)

Multiplying the arguments of the moment generating function with i ∈ z equals its characteristic function, hence (3.13) can be rewritten as

fW , ¯¯ Y( ¯w, ¯y) =  1 2π 2Z +∞ −∞ Z +∞ −∞

e−is ¯w−it¯yMW , ¯¯ Y(is, it)dsdt. (3.14)

Now let ¯s = is and ¯t = it, this implies ds

i = d¯s and dt

(29)

Changing variables and the integration limits gives fW , ¯¯ Y( ¯w, ¯y) =  1 2π 2Z +i∞ −i∞ Z +i∞ −i∞ e−¯s ¯w−¯t¯yMW , ¯¯ Y(s, t)ds i dt i =  1 2πi 2Z +i∞ −i∞ Z +i∞ −i∞ e−¯s ¯w−¯t¯y+KW , ¯¯ Y(¯s,¯t)sd¯t, (3.16)

recalling that the cgf is simply the logarithm of the moment generating function. Before continuing further derivation of the pdf in terms of the cgf, note that

MW , ¯¯ Y(s, t) = E[es ¯W +t ¯Y] = E[esn1 P iWi+tn1PiYi] = ΠiE[es 1 nWi+t 1 nYi] = ΠiMW,Y [s/n, t/n] = MW,Y[s/n, t/n]n.

Since the cgf is the logarithm of the mgf this implies that KW , ¯¯ Y(¯s, ¯t) = nKW,Y  ¯s n, ¯ t n  = nK ¯s n, ¯ t n  . Now (3.16) can be rewritten as

fW , ¯¯ Y( ¯w, ¯y) =  1 2πi 2Z +i∞ −i∞ Z +i∞ −i∞ e−¯s ¯w−¯t¯y+nK(¯s/n,¯t/n)d¯sd¯t =  n 2πi 2Z +i∞ −i∞ Z +i∞ −i∞ enK(˜s,˜t)−n˜s ¯w−n˜t¯yd˜sd˜t,

where the last equality sign follows from changing variables ˜s = ¯s/n and ˜t = ¯t/n. Equation (3.16) shows the density function as in Broda & Kan except that the limits of integration differ. In order to ensure that integration is possible, c1 and c2 are

added to the limits and condition them such that (c1, c2− rc1) ∈ τ in order to ensure

convergence of the integral for the cdf. Note that c1 and c2 are chosen such that

the contour is moved without passing any singularities. The joint density function is given as fW , ¯Y( ¯w, ¯y) =  n 2πi 2Z c2+i∞ c2−i∞ Z c1+i∞ c1−i∞ en(K(s,t−rs)−s ¯w−t¯ydsdt. (3.17)

(30)

Now choosing c1< 0, c2 < 0 and integrating between −∞ and zero with respect to ¯ w and ¯y, FW , ¯Y(0, 0) =  n 2πi 2Z c2+i∞ c2−i∞ Z c1+i∞ c1−i∞ Z 0 −∞ Z 0 −∞ en(K(s,t−rs)−s ¯w−t¯yd ¯wd¯ydsdt = n 2πi 2Z c2+i∞ c2−i∞ Z c1+i∞ c1−i∞ Z 0 −∞ − 1 nse n(K(s,t−rs)−s ¯w−t¯y 0 −∞ d¯ydsdt = n (2πi)2 Z c2+i∞ c2−i∞ Z c1+i∞ c1−i∞ Z 0 −∞ −en(K(s,t−rs)−t¯yd¯yds s dt =  1 2πi 2Z c2+i∞ c2−i∞ Z c1+i∞ c1−i∞ enK(s,t−rs)ds s dt t . (3.18) Differentiating this with respect to r gives

∂ ∂rF n ¯ W , ¯Y(0, 0) = ∂ ∂r "  1 2πi 2Z c2+i∞ c2−i∞ Z c1+i∞ c1−i∞ enK(s,t−rs)ds s dt t # =  1 2πi 2Z c2+i∞ c2−i∞ Z c1+i∞ c1−i∞ ∂ ∂r h enK(s,t−rs)ids s dt t = − 1 2πi · n 2πi Z c2+i∞ c2−i∞ Z +i∞ −i∞ K2(s, t − rs) · enK(s,t−rs)ds dt t (3.19) ( = I2),

where K2(s, t − rs) is the derivative with respect to its second argument. Note that

the minus of the last equation can be cancelled with 1/i2. c1 is chosen as zero after

the last equality sign, because differentiation removes the 1/s. Following the same steps as above, the density of ¯W is

fWn¯ = n 2πi Z c3+i∞ c3−i∞ en(K(s,−rs)−s ¯w)ds, (3.20) where c3 is such that (c3, −rc3) ∈ τ . By choosing c3 < 0 and integrating between

−∞ and zero with respect to ¯w the cdf is obtained as follows FWn¯(0) = n 2πi Z c3+i∞ c3−i∞ Z 0 −∞ en(K(s,−rs)−s ¯w)d ¯wds = n 2πi Z c3+i∞ c3−i∞ − 1 sne n(K(s,−rs)−s ¯w) 0 −∞ ds = − 1 2πi Z c3+i∞ c3−i∞ enK(s,−rs)ds s . (3.21) Differentiating this with respect to r gives the last piece of the pdf,

∂ ∂rF n ¯ W(0) = ∂ ∂r  − 1 2πi Z c3+i∞ c3−i∞ enK(s,−rs)ds s  = n 2πi Z +i∞ −i∞ K2(s, −rs) · enK(s,−rs)ds (3.22) ( = I1).

(31)

Substituting this in (3.12) gives the pdf expressed in two parts: fRn= I1+ 2I2.

The saddlepoint approximation will be applied to both I1 and I2. First I2 is

derived, starting by applying a Laplace approximation (as in Chapter 2) to the inner integral,

Z +i∞

−i∞

K2(s, t − rs) · enK(s,t−rs)ds. (3.23)

First let Ki(·, ·) denote the derivative of the cgf with respect to it’s ith argument

and let ˜s be the saddlepoint that solves

K1(˜s, t − r˜s) − rK2(˜s, t − r˜s) = 0 (3.24)

for every value of t, ˜s(t) is referred to as the inner saddlepoint. Applying the Laplace approximation, as in Chapter 2, to the integral of (3.23) gives

 2π n 1/2 enK(˜s,t−r˜s) K2(˜s, t − r˜s) pcrK00(˜s, t − r˜s)cr (1 + O(n−1)) = 2π n 1/2 enh(t)g0(t)(1 + O(n−1)), (3.25)

where K00(·, ·) denotes the Hessian of the cgf and h(t) ≡ K(˜s, t − r˜s), cr ≡ (1, −r)0, g0(t) = K2(˜s, t − r˜s) pcrK00(˜s, t − r˜s)cr . Substituting (3.25) in I2 gives I2= n 2π 1/2 1 2πi Z c2+i∞ c2−i∞ enh(t)g0(t)(1 + O(n−1)) dt t . (3.26) To approximate the last integral of I2, the Broda & Kan propose a lemma which is

based on a theorem of Bleistein (1966).

Lemma 3. If g0(t) and h(t) are real functions of t, analytic in a strip containing

c 6= 0 and the imaginary axis, and h(t) has a unique saddlepoint on ˆtr 6= 0 on the

real axis in the interior of this strip, then 1 2πi Z c+i∞ c−i∞ g0(t)enh(t) dt t = e nh(0)g 0(0) 1c>0− Φ( ˆw √ n) +e nh(ˆtr) √ 2πn  g0(ˆtr) ˆ u − g0(0) ˆ w + O(n −1)  , (3.27) where Φ(·) is the standard normal cdf, ˆw ≡ sgn(ˆtr)

q

−2(h(ˆtr) − h(0)), ˆu = ˆtr

q h00t

r),

and for each r, the saddlepoint ˆtr solves h0(ˆtr) = 0.

Proof: The proof of this lemma is beyond the scope of this thesis, the reader is referred to Broda (2012) for a simple derivation.

(32)

Hence the first two derivatives of h(·) will have to be derived in order to make use of this lemma. The first derivative of h(·) is

∂ ∂th(t) = ∂ ∂tK(˜s, t − r˜s) = ˜s0(t)K1(˜s, t − r˜s) + K2(˜s, t − r˜s) − r˜s0(t)K2(˜s, t − r˜s) = ˜s0(t)(K1(˜s, t − r˜s) − rK2(˜s, t − r˜s)) + K2(˜s, t − r˜s) = K2(˜s, t − r˜s)

where the last equality holds because ˜s is the inner saddlepoint that solves (3.24). Noted should be that ˜s is dependent on t and that ˜s0(t) is the derivative of ˜s with respect to t. For the second derivative

 ∂ ∂t 2 h(t) = ∂ ∂tK2(˜s, t − r˜s) = ˜s0(t)K12(˜s, t − r˜s) + K22(˜s, t − r˜s) − r˜s0(t)K22(˜s, t − r˜s) = ˜s0(t)(K12(˜s, t − r˜s) − rK22(˜s, t − r˜s)) + K22(˜s, t − r˜s). (3.28)

Only one unknown term is to be determined in (3.28) which is ˜s0(t), this term can be found by differentiating (3.24), ∂ ∂t(K1(˜s, t − r˜s) − rK2(˜s, t − r˜s)) = 0 ⇐⇒ ˜s0(t)K11(·) + K21(·) − r˜s0(t)K21(·) − r˜s0K21(·) − rK22(·) + r2s˜0(t)K22(·) = 0 ⇐⇒ ˜s0(t)(K11(·) − 2rK21(·) + r2K22) = −K21(·) + rK22(·) ⇐⇒ ˜s0(t) = −K21(·) − rK22(·) c0rK00(·)cr . (3.29)

Plugging the result into (3.28) and rearranging terms produces the second derivative of h(t) as in Broda & Kan,

h00(t) = |K

00s, t − r˜s)|

c0

rK00(˜s, t − r˜s)cr

. (3.30) The saddlepoint for the outer integral ˆtr, according to Lemma 2, is found by solving

h0(ˆtr) = 0. This implies for ˜s that it is dependent on ˆtr. The system of conditions

for the saddlepoints is ˆ

tr: K2 ˜s(ˆtr), ˆtr− r˜s(ˆtr) = 0,

˜

s(ˆtr) : K1 ˜s(ˆtr), ˆtr− r˜s(ˆtr) − rK2 ˜s(ˆtr), ˆtr− r˜s(ˆtr) = 0.

The last equation boils down to K1 s(ˆ˜tr), ˆtr− r˜s(ˆtr) = 0. Simplifying the system

(33)

& Kan. The outer saddlepoints ˆs and ˆt solve ˆ

tr: K2 s, ˆˆ tr− rˆs = 0,

˜

s(ˆtr) : K1 s, ˆˆ tr− rˆs = 0.

Which Broda & Kan write as

K0 ˆs, ˆt ≡ K1(ˆs, ˆt) K2(ˆs, ˆt)

0

= 0, (3.31) where ˆt = ˆt + rˆs. Broda & Kan define ˜s0 ≡ ˜s(0) and

˜

w0= sgn(˜s0)p−2K(˜s0, −r˜s0),

. Now let ˜g0 = g0(0), then

˜ g0 = K2(˜s0, −r˜s0) pc0 rK00(˜s0, −r˜s0)cr . Note that g0(ˆtr) = 0, because

g0(ˆtr) =

K2(ˆs, ˆt)

q

c0rK00(ˆs, ˆt)cr

= 0.

Now applying Lemma 3 to I2 gives,

I2 = √ n√1 2π  enh(0)g0(0)1c2>0− Φ( ˆw √ n) + √1 n 1 √ 2πe nh(ˆtr) −g0(0) ˆ w + O(n −1)  . (3.32) Where we can rewrite the exponential terms including (2π)−1/2 on the right hand side as 1 √ 2πe nh(0) = 1 2πe nK(˜s0,−r˜s0) = √1 2πe −n2·√−2K(˜s0,−r˜s0) 2 = √1 2πe −1 2( √ n ˜w2 0) = φ(√n ˜w0), (3.33) 1 √ 2π· 1 √ 2πe nh(ˆtr) = 1 2π · 1 √ 2πe nK(ˆs,ˆt) = √1 2π · e nK(˜s0,−r˜s0)·1 2πe nK(ˆs,ˆt)−nK(˜s0,−r˜s0) = φ(√n ˜w0) · 1 √ 2πe −n 2 √ −2[K(ˆs,ˆt)−K(˜s0,−r˜s0)] 2 = φ(√n ˜w0) · 1 √ 2πe 1 2( √ n ˆw)2 = φ(√n ˜w0) · φ( √ n ˆw). (3.34)

(34)

Substituting this into (3.32) results in the saddlepoint approximation of I2, I2 = √ nφ(√n ˜w0)˜g0  1c2>0− Φ( √ n ˆw) − φ( √ n ˆw) √ n ˆw + O  n−1  . (3.35) I1 has to be derived in order to complete the saddlepoint approximation for the pdf.

Note that I1 is equal to the inner integral of I2with t = 0. Using the same approach

as above, it is easily shown that I1 =

nφ(√n ˜w0)˜g0 1 + O(n−1). (3.36)

Combining the two results of above leads to the following theorem.

Theorem 3. Suppose that X and Y have a joint density with respect to Lebesque measure on R2, and that their joint cgf K(s, t) ≡ logE[exp(sX + tY )] converges on the open set τ 3 (0, 0), with gradient K0(s, t) and Hessian K00(s, t). Let ¯X and ¯Y denote the mean of n independent copies of X and Y, respectively. For r ∈ R, a compact subset of the rand of K1(s, t)/K2(s, t), define the outer and inner

saddlepoints (ˆs, ˆt) and ˜s0 as the solutions to K0(ˆs, ˆt) = 02 and

c0rK0(˜s0, −r˜s0) = 0, (3.37)

respectively, where cr ≡ (1, −r)0. Then, provided that ˆt 6= −rˆs, the density of the

ratio R ≡ ¯X/ ¯Y is fRn(r) = ˆfn1(r)(1 + O(n − 1)), where ˆ fn1(r) =√nφ(√n ˜w0)˜g0  1 − 2  Φ(√n ˜w +φ( √ n ˜w) √ n ˜w  , (3.38) ˜ g0 ≡ K2(˜s0, −r˜s0) pc0 rK00(˜s0, −r˜s0)cr , ˜ w0 ≡ sgn(˜s0)p−2K(˜s0, −r˜s), ˆ w ≡ sgn(ˆt + rˆs) q −2[K(ˆs, ˆt) − K(˜s0, −r˜s0)].

For higher order approximations the reader is referred to the appendix of Broda & Kan.

3.3

CDF Approximation

Broda & Kan provide a derivation of the CDF approximation based on a result of Kolassa (2003), therefore this section will only show the result.

(35)

Theorem 4. Under the conditions of Theorem 3, FRn(r) = ˆFn(1)(r) + O(n−1), where ˆ Fn(1)≡ H∗(˜s0) − P1  1 − 2H∗(ˆtr) + H∗(ˇt0) − P2  1 − 2H∗(ˆs) + 2 H∗(ˆtr)H∗(ˆs) − P3, P1 ≡ en[˜κ (0) 0 +˜s20κ˜ (2) 0 /2] h I(0, n˜κ(2)0 , ˜s0) + n˜κ(3)0 I(3, n˜κ(2)0 , ˜s0)/6 i , P2 ≡ en[˜κ (0) 0 +ˇt20κ˜ (2) 0 /2] h I(0, n˜κ(2)0 , ˇt0) + n˜κ(3)0 I(3, n˜κ (2) 0 , ˇt0)/6 i , P3 ≡ en[ˆκ (0,0)t0ˆ K00ˆt/2]×  I(0, n ˆK, ˆtr) + n 6 3 X j=0 3 j  ˆ κ(3−j,j)I([3 − j, j], n ˆK, ˆt)   H∗(s) ≡ 1s≥0, ˆK ≡ {ˆκ(i,j)}, ˇt0 solves K2(0, ˇt0 = 0), (ˆs, ˆt) is as in (3.31)], ˆtr ≡ ˆt + rˆs, ˆt ≡ (ˆs, ˆtr), ˆt = (ˆs, ˆt), ˆK00 ≡ K00(ˆs, ˆt), ˇκ(j)0 ≡ K2j(0, ˇt0), ˜ κ(j)0 ≡ j X k=0 j k  (−r)kK1j−k2k(˜s0, −r˜s0), ˆ κ(i,j)≡ k X k=0 i + j k  (−r)kK1i+j−k2k(ˆs, ˆt),

For the expressions K1i2j ≡ ∂i+jK(s, t)/∂si∂tj the reader is referred to Appendix B

of Broda & Kan (2013).

Proof: Broda & Kan (2013) provide an elaboraste proof of the theorem in their paper. In order to compute the CDF, the authors provide an algorithm for I(·). The reader is referred to their paper for more information about the theorem.

(36)

Model & Simulation Setup

This chapter presents the model and the setup of the simulation experiment fo-cused on rejection frequencies (size and power properties) of the Westernboot strap and the WRE-bootstrap. The latter resampling method will be combined with the t-statistic and the heteroskedasticity robust th-statistic. The AR-statistic

(Ander-son and Rubin, 1949), which is equivalent to the K-statistic (Kleibergen, 2002) in the just-identified case, is not resampled in this simulation experiment. Davidson & Mackinnon (2010) show that the AR-test is asymptotically valid with weak in-struments and heteroskedasticity present when it is bootstrapped using the WRE scheme. Anderson & Rubin (1950) show that the AR-statistic is asymptotically distributed as a χ2(υ) distribution with υ degrees of freedom under

homoskedas-ticity. In this experiment the critical values of the χ2(υ) distribution are taken for AR-statistic, where the degrees of freedom υ equal the number of instruments. In addition, confidence intervals of the Westernboot strap are created, computed, and compared to those of the AR-statistic.

The bootstrap procedures are applied to the just identified simultaneous equa-tions model with one endogenous regressor and one instrument. This differs from Davidson & Mackinnon’s model who use a total of 12 instruments for their first simulation experiments, 11 of which are irrelevant. This, combined with a different sample size and scheme for heteroskedasticity is the reason for the difference in size and power for the WRE-bootstrap results presented in the next chapter. Following Broda & Kan (2013), the model is defined as

y1 = y2β + Xγ + u, (4.1) y2 = z1π + Xδ + v, (4.2)

where y1 and y2 are vectors of n × 1, as well as the error terms u and v, and the n × 1 vector z1 is the instrument. The matrix X is a n × k matrix of exogenous

(37)

regressors. The disturbances are distributed as ut vt ! ∼ 0, " σ2u,t σuv σuv σv,t2 #! . (4.3) Using 2SLS, the estimator of β is defined as

ˆ β = z 0y 1 z0y2 , (4.4)

where z = MXz1 and MX = In− X(X0X)−1X0. The associated estimation error

ˆ B = ˆβ − β is ˆ B = z 0u πz0z + z0v. (4.5)

In the simulation experiment the Westernboot strap approximates the ratio of (4.5). For the Westernboot strap ˆβ is estimated using Three-Stage-Least-Squares (3SLS) which was introduced by Zellner & Theil (1962). 3SLS improves 2SLS by treating the model as a system of equations and taking into account interdependency in the system rather than estimating the equations one by one. Efficiency is gained by using the covariance information of the disturbances between equations. For more background regarding the efficient estimate see section (2.2). Davidson & Mackinnon use a different, but asymptotically equivalent, technique to get an efficient estimate of π, (section 2, page 130, Davidson & Mackinnon, 2010). The reason for using 3SLS in the Westernboot strap is because its power based Davidson & Mackinnon’s estimate of π causes it to go to zero under the alternative whenever the strength of the instrument increases. The reason for unusual behaviour is unknown. Note that the results of the WRE-bootstrap is the same as Davidson & Mackinnon’s way of estimating π efficiently in order to truly compare the results.

In order to approximate the CDF of (4.5) with the Westerboot strap, the fol-lowing inputs of the model are needed: ˆβ, β0, ˆπ3SLS, z, ˆu and ˆv. Broda & Kan

provide the formula of the joint characteristic function of z0u and πz0z + z0v in their paper. Combining their code to compute the density with the former mentioned inputs results into the approximation of the CDF of (4.5), which is used to compute the rejection frequencies.

Next, the data generating process is described, followed by the computation algorithm of the simulation experiment.

4.1 Data Generating Process

Whenever the disturbances are homoskedastic, u and v are randomly generated from a standard normal distribution. The correlation between the two disturbances is defined as ρ; this is also the measure of endogeneity imposed on the model. When the disturbances are heteroskedastic, u and v are randomly generated by a GARCH process (Bollerslev, 1986). The disturbances follow a GARCH(1,1) process defined as

\[
\sigma^2_t = \alpha_0 + \alpha_1 \varepsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}, \tag{4.6}
\]

where $\alpha_1 + \beta_1 < 1$ and $\alpha_0 > 0$. In the experiment the chosen values are $\alpha_0 = 0.0010$, $\alpha_1 = 0.94$ and $\beta_1 = 0.0590$. This implies an unconditional variance of $\sigma^2 = \alpha_0/(1 - \alpha_1 - \beta_1) = 1$. First, both $u$ and $v_1$ are independently generated by (4.6). Second, in order to impose endogeneity ρ on the model, $v$ is made dependent by setting
\[
v = \rho u + \sqrt{1-\rho^2}\, v_1,
\]

which gives the final heteroskedastic error terms $u$ and $v$. Note that this differs from the way Davidson & Mackinnon impose heteroskedasticity on the model. The authors use the following DGP, written in our notation:
\begin{align*}
y_1 &= n^{1/2}|z_1|\,u,\\
y_2 &= a z_1 + v, \qquad v = \rho\, n^{1/2}|z_1|\,u + \sqrt{1-\rho^2}\, v_1.
\end{align*}

Heteroskedasticity can be imposed in different ways; for this thesis the former method, based on the GARCH(1,1) process, is chosen. For the GARCH(1,1) process the values of $\alpha_0$, $\alpha_1$ and $\beta_1$ are chosen such that $\sigma^2_u$ and $\sigma^2_v$ are equal to one.
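A minimal Matlab sketch of this DGP is given below (illustrative, not the thesis code); the burn-in length and the initialisation of the conditional variance are arbitrary choices.

a0 = 0.0010; a1 = 0.94; b1 = 0.0590;     % alpha_0, alpha_1, beta_1 in (4.6)
rho = 0.95;  n = 100;  burn = 200;       % endogeneity, sample size, burn-in length
T  = n + burn;
e  = zeros(T,2);                         % two independent GARCH(1,1) series
s2 = (a0/(1 - a1 - b1))*ones(1,2);       % start at the unconditional variance (= 1)
for t = 1:T
    e(t,:) = sqrt(s2).*randn(1,2);       % epsilon_t = sigma_t times a standard normal innovation
    s2     = a0 + a1*e(t,:).^2 + b1*s2;  % variance recursion (4.6)
end
u  = e(burn+1:end,1);                    % structural disturbance
v1 = e(burn+1:end,2);                    % independent auxiliary disturbance
v  = rho*u + sqrt(1 - rho^2)*v1;         % impose endogeneity rho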

The instrument $z_1$ is a standard normally distributed random variable, normalized to have Euclidean length one, as in Davidson & Mackinnon (2010). Normalizing the instrument has the effect that the asymptotics do not depend on the sample size and ensures that, under homoskedasticity, the concentration parameter equals π², so that π measures the strength of the instrument. The exogenous regressor matrix X consists of a single column of ones, the constant. For this simulation experiment the sample size is set at n = 100; simulations have also been done for n = 25, n = 50, n = 100, n = 200 and n = 400 in order to verify whether the sample size has a significant effect on the shape of the size and power curves of the statistics. These simulations show that all statistics display a similar shape as the sample size varies.
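Continuing the sketch above, the instrument and one simulated sample of model (4.1)-(4.2) could be assembled as follows; the parameter values below are arbitrary placeholders rather than those of the experiment.

z1  = randn(n,1);  z1 = z1/norm(z1);     % instrument with Euclidean length one
X   = ones(n,1);                         % the constant is the only exogenous regressor
piv = 2; beta = 0; gam = 0; delta = 0;   % illustrative parameter values (beta = 0 under the null)
y2  = z1*piv + X*delta + v;              % first-stage equation (4.2)
y1  = y2*beta + X*gam + u;               % structural equation (4.1)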

4.2 Simulation Algorithm

First, the disturbances are randomly generated following the GARCH(1,1) process described in section 4.1. The instrument is randomly generated from a standard normal distribution and normalized. Second, π is estimated for the WRE-bootstrap of Davidson & Mackinnon and for the Westernboot strap. This gives $\hat{\pi}_{3SLS}$ and $\hat{\pi}_{D\&K}$, where the latter is the efficient estimator of Davidson & Mackinnon (2010). Next, β is estimated, which gives $\hat{\beta}$.

After generating the disturbances, dependent variables, regressors and instrument, and computing the estimators, the t-, $t_h$- and AR-statistics are computed under the null based on the generated sample. Thereafter, the t- and $t_h$-statistics are bootstrapped using the WRE-bootstrap. Rejection frequencies under the null for the t- and $t_h$-statistics are computed using the equal-tail bootstrap p-value (see subsection 2.2.1). The rejection frequencies of the AR-statistic are computed using the critical values of the χ²(υ) distribution, as mentioned above. The rejection frequencies of the Westernboot strap are computed using the saddlepoint approximation of the CDF. More information concerning the number of bootstraps, the exact level of endogeneity and the strength of the instrument is presented in the next chapter.
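As a reminder, the equal-tail bootstrap p-value for an observed statistic and its B bootstrap counterparts can be computed as in the minimal Matlab sketch below; tau_hat and tau_star are illustrative names for the observed statistic and the B-vector of bootstrap statistics, and subsection 2.2.1 gives the exact definition used in this thesis.

prop_low     = mean(tau_star <= tau_hat);   % fraction of bootstrap statistics below the observed one
prop_high    = mean(tau_star >  tau_hat);   % fraction above
p_equal_tail = 2*min(prop_low, prop_high);  % equal-tail bootstrap p-value
reject       = (p_equal_tail < 0.05);       % compare with the nominal level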

4.3 The Code

The model and bootstrap methods are simulated in Matlab. The saddlepoint approximation for the Westernboot strap is programmed by Simon Broda, University of Amsterdam. The code is available from the author for evaluation of the results and expressions.


Results

This chapter presents the results of the simulation experiment. First, the rejection frequencies of the Westernboot strap and the WRE-bootstrap are compared under the null for varying strength of the instrument. This is repeated for models suffering from mediocre and severe endogeneity and for weak and strong instruments. Furthermore, power is computed for a varying level of deviation from the null with fixed π and ρ, which leads to interesting results. The test statistic of the Westernboot strap is denoted as SPA. Statistics based on the WRE-bootstrap scheme are the t-statistic, the $t_h$-statistic and the AR-statistic. Also, the rejection frequencies are presented under weak and strong instrument asymptotics. The difference between the two asymptotic approximations is the sequence of models that one assumes as n → ∞. Strong instrument asymptotics means that π is of order O(1), in other words, π is constant. In this case the studentized $\hat{\beta}$ is asymptotically standard normal, so the t-statistic is compared to the critical values of the standard normal distribution. Under weak instrument asymptotics, π is of order $O(n^{-1/2})$, so the strength of the instrument becomes smaller as the sample size increases. The limiting distribution in this case is a ratio of normals; substituting a large π into this limiting distribution recovers the case of strong instrument asymptotics. The weak instrument asymptotics is computed by taking the t-statistic and treating it as a ratio of standard normal distributions with nuisance parameters replaced by estimates. For this simulation experiment the weak instrument asymptotics rejection frequencies are computed using the Cauchy distribution. The chapter is concluded with the properties of the confidence intervals based on the AR-statistic and the SPA: coverage, the percentage of intervals extending to infinity, and the average length of the intervals.
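The Cauchy computation presumably rests on the following standard fact about ratios of correlated standard normals; the exact parametrisation used in the simulation code is an assumption here:
\[
\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\!\left(0,\; \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right)
\quad\Longrightarrow\quad
\frac{X}{Y} \sim \mathrm{Cauchy}\!\left(\rho,\ \sqrt{1-\rho^2}\right),
\]
so that weak-instrument critical values can be read off a Cauchy distribution with location ρ and scale √(1 − ρ²), with ρ replaced by an estimate of the correlation between numerator and denominator.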


5.1 Rejection Frequencies

Figure (5.1) shows the rejection frequencies of various statistics based on the WRE-bootstrap. The Westernboot strap rejection frequencies are denoted by SPA. The null H0: β0 = 0 is imposed when computing the rejection frequencies. This means that the efficient estimate of π is computed from the model with the null imposed, and the null is also imposed on the statistics. The nominal level is set at α = 0.05, endogeneity at ρ = 0.95, the disturbances are heteroskedastic, the number of bootstraps is B = 399 and the sample size is n = 100. The Westernboot strap approximates the bootstrap distribution as if an infinite number of bootstrap replications were used. The rejection frequencies are computed for π ∈ [0, 10] in steps of 0.25; for every step 2500 simulations are executed. The Weak Asymptotics curve shows the rejection frequencies under weak instrument asymptotics. This is achieved by taking the t-statistic and treating it as a ratio of normals, with nuisance parameters replaced by the estimates. The Strong Asymptotics curve shows the rejection frequencies under strong instrument asymptotics. This is computed by taking the t-statistic and treating it as a standard normal random variable, using the critical values of the standard normal distribution to compute the rejection frequencies. The reader is referred to Staiger & Stock (1997) for more information concerning strong and weak instrument asymptotics. Note that the t-statistic actually performs better than the $t_h$-statistic for π < 0.5. The SPA seems to be marginally oversized for π < 4; after this value the $t_h$-statistic, t-statistic, AR-statistic and the SPA seem to converge. The AR-statistic clearly performs best for 0 ≤ π ≤ 10. The AR-statistic is invariant to π under homoskedasticity, but it also shows hardly any variation under the heteroskedasticity imposed in this experiment.

Figure (5.2) shows the power of the statistics when testing under the alternative H0: β0 = 4. The Strong Asymptotics curve shows the greatest power; unfortunately its rejection frequencies are significantly oversized in Figure (5.1), and therefore this power is somewhat unreliable. The $t_h$-statistic is second best, but by the same argument it is oversized for small π. The slightly undersized t-statistic and the slightly oversized SPA show powers that are very close to each other. The AR-statistic forms a rough lower bound for all statistics except the Weak Asymptotics curve. When π > 4, the statistics seem to converge. The size of the t-statistic as well as the SPA is close to the nominal level; therefore there is reason to believe that the power in this simulation does not deviate significantly from the true power.

Figure (5.3) shows the rejection frequencies under the null when the disturbances are homoskedastic. The t-statistic seems to under-reject marginally for π ∈ (1; 4).
