
Spatial Econometric Approaches to Tax Competition


Academic year: 2021



Hendrik Vrijburg†

MSc Thesis, University of Groningen, October 2007

Abstract

Recently, Kapoor, Kelejian and Prucha, in their 2007 Journal of Econometrics article “Panel Data Models with Spatially Correlated Error Components”, proposed a two-step Generalized Method of Moments (GMM) estimator for a random effects panel data model with spatially autocorrelated disturbances. This thesis discusses three extensions of the model studied by Kapoor et al. (2007). The first two extensions introduce a spatially lagged dependent variable and fixed effects, respectively. The third extension introduces a time lag of the dependent variable. The extensions are highly relevant for the empirical tax competition literature, where most authors have failed to control for spatially autocorrelated disturbances and time dependency. Using Monte–Carlo techniques, we compare the performance of the estimators used in the literature so far with that of GMM estimators obtained by applying the procedure suggested by Kapoor et al. (2007).

Keywords: Spatial econometrics; time dynamics; strategic interaction; tax competition

JEL codes: C33, C51, H71, H77.

1 Introduction

The commodity tax competition literature predicts that governments will set their taxes strategically to increase the welfare of their citizens (or their own tax revenues).1 Following a voluminous theoretical literature, various articles have recently studied the phenomenon of commodity tax competition empirically (see Table 1 for a short overview). Without exception, these studies find that the tax setting in one jurisdiction is influenced by the tax setting in surrounding jurisdictions.

However, as can be inferred from the last column of Table 1, the empirical studies on consumption tax competition find that the intensity of the strategic interaction among governments (referred to as the endogenous–interaction effect) differs widely. Part of this difference can be explained by the different tax categories under study. However, even within the same tax category the estimates diverge. For example, Jacobs, Ligthart and Vrijburg (2007) find that, for consumption taxes, the responsiveness of a home state government to changes in the consumption tax rates of neighboring US states lies in the range 0.38–0.93. Egger, Pfaffermayr and Winner (2005a, 2005b) find similarly wide ranges of 0.16–0.82 and 0.54–0.98, respectively.

Table 1: Previous Research

Reference | Jurisdictions | Tax instruments | Method(a) | Endogenous–interaction effect
Rork (2003) | US | Sales Tax, Income Tax, Corporate Tax, Cigarette Tax and Gasoline Tax | IV | −0.24 to 0.64
Evers, de Mooij and Vollebergh (2004) | EU | Gasoline Tax | IV | 0.17–0.48
Luna (2004) | Tennessee counties | Sales Tax | OLS | 0.04–0.16
Egger, Pfaffermayr and Winner (2005a) | OECD | Average Effective Tax Rate | SFGLS | 0.16–0.82
Egger, Pfaffermayr and Winner (2005b) | US | Gasoline Tax, Cigarette Tax, Beer Tax and Wine Tax | SFGLS(b) | 0.54–0.98
Devereux, Lockwood and Redoano (2007) | US | Gasoline Tax, Cigarette Tax | IV and LSDV(c) | 0.13–0.27
Jacobs, Ligthart and Vrijburg (2007) | US | Average Effective Tax Rate | IV and AB(c) | 0.38–0.93

(a) All articles, unless mentioned otherwise, use a FE model, that is, they control for fixed effects either by applying a within–transformation or by including appropriate unit– and time–fixed effects.
(b) Reports the results of both a FE and a RE estimator.
(c) A time lag of the dependent variable is included to control for time dependency.

A potential explanation for these divergent estimates is the variety of estimation methods applied in the literature. This variety follows from a number of econometric problems that the applied researcher faces in estimating tax reaction functions. These problems are summarized by Brueckner (2003): (i) the tax rates of all jurisdictions are determined simultaneously; (ii) the error term might well suffer from spatial error dependence; and (iii) fixed effects (FE) may be present. Furthermore, the presence of time dynamics (a time lag of the dependent variable) introduces additional estimation problems. In the remainder of this thesis, a model that includes a time lag of the dependent variable will be referred to as dynamic, and a model that does not will be referred to as static. Note that the tax competition game is spatial in nature, which implies that spatial econometric techniques are needed to estimate the parameters of interest. Chapter 2 introduces the spatial econometric literature and discusses the methods needed to estimate these parameters consistently.

With reference to the econometric problems identified by Brueckner, it is important to mention that most articles listed in Table 1 ignore the problem of spatial error dependence. Only Egger, Pfaffermayr and Winner (2005a, 2005b) correct for spatial error dependence. Their estimator relies on a method proposed by Kapoor, Kelejian and Prucha (2007) (henceforth KKP), which exploits the spatial structure of the variance–covariance matrix to obtain a more efficient estimator. In their 2007 paper, KKP propose a three-step estimation procedure to obtain a Spatial Generalized Least Squares (SGLS) estimator. In the first step, an estimate of the disturbance vector is obtained. In the second step, KKP use their knowledge of the structure of the variance–covariance matrix to estimate the parameters of the spatial error dependence process. In the third step, KKP use the estimates from the second step to remove the spatial error dependence from the model.2 We will use this method introduced by KKP to correct for spatial error dependence.

Finally, one important new direction in empirical research is provided by Jacobs et al. (2007) and Devereux et al. (2007), who study a dynamic model to correct for serial dependence. The models studied by Jacobs et al. and Devereux et al. have not been studied formally in the econometric literature. Applying such a model is also empirically interesting, as allowing for time dynamics might have a profound influence on the estimation results. This is illustrated by Jacobs et al. (2007): in their paper, the estimated coefficients of the dynamic model are on average smaller than those of the static model, the averages being 0.45 and 0.72, respectively.

This thesis focuses on GMM estimation in both a static and a dynamic spatial model, each characterized by spatial error dependence. More specifically, the contributions of this thesis to the literature are threefold. (i) The thesis investigates, using Monte–Carlo techniques, estimation in a dynamic panel data model with a spatially weighted dependent variable and spatially correlated disturbances. Although used in empirical research, this model has not yet been formally analyzed in the literature.3 We will (ia) compare the estimators used by Jacobs et al. and Devereux et al. and (ib) use a dynamic extension of the method proposed by KKP to suggest a superior estimator.

(ii) The thesis studies, using Monte–Carlo techniques, estimation in static panel data models characterized by both a spatially weighted dependent variable and spatially correlated disturbances. We will compare the performance of the IV estimator used in the tax competition literature with that of the SGLS estimators used by Egger et al. (2005a, 2005b). The scientific relevance of this second contribution follows from the observation that, although the SGLS estimator is used in empirical research, its performance relative to the IV estimator has not yet been investigated formally.

(iii) The thesis studies the bias introduced by neglecting the time lag of the dependent variable. This last contribution might provide interesting insights into the reliability of the estimates in Table 1. This is because most “early papers” do not consider the time lag of the dependent variable, while it proves to be highly significant in the models studied by Jacobs et al. (2007) and Devereux et al. (2007). Could the results of the “early” papers change considerably when a time lag is considered?

2 Spatial Econometrics

This chapter draws on Anselin (2006) to introduce the problems and methods of the spatial econometrics literature. Chapter 2.1 introduces the endogenous interactions model that underlies the basic spatial econometric model discussed in Chapter 2.2. Chapter 2.3 discusses estimation in spatial models, Chapter 2.4 discusses instrument selection, and Chapter 2.5 discusses the method suggested by KKP to deal with spatial error dependence.

In the remainder of this thesis, the following notation is used. $r_{it}$ denotes a scalar observation on variable $r$ for unit $i$ in time period $t$. Vectors are indicated by boldface characters. Accordingly, $r(t)$ is an $N \times 1$ vector of observations on variable $r$ for all $N$ cross-sectional units in time period $t = 1, \dots, T$, and $r$ is an $NT \times 1$ vector of observations on variable $r$, where the observations are sorted first by time and second by unit. Finally, $R_N$ denotes an $N \times N$ matrix. Here, $T$ denotes the number of time periods and $N$ the number of cross-sectional units.

2.1 Identification in the endogenous interactions model

We start with a short discussion of the identification problems faced by applied researchers in the tax competition literature and introduce the general social interactions model. Manski (1993) shows that the parameters in models of social/spatial interaction are only identified when some strict assumptions hold. Manski defines three types of interaction: (i) contextual effects (related to exogenous characteristics of the group); (ii) endogenous effects (interaction between the units in the group); and (iii) correlated effects (characteristics that the units have in common). Consider the following general cross-sectional model for time period $\tau$:

$$y(\tau) = \alpha + \delta\, E[y(\tau)\mid z(\tau)] + x(\tau)'\beta + E[x(\tau)\mid z(\tau)]'\gamma + u(\tau), \tag{1}$$

where $y(\tau)$ is the dependent variable, $z(\tau)$ are the exogenous group characteristics, $x(\tau)$ are the observed characteristics of the units, $\alpha$ is a unit-specific effect, and $u(\tau)$ are unobserved characteristics of the individuals. Most importantly, the unobserved characteristics $u(\tau)$ are assumed to be correlated across the individuals in the group, that is, $E[u(\tau)\mid x(\tau), z(\tau)] = z(\tau)'\eta$. Together, these assumptions imply that the expected value of $y(\tau)$ given the observed variables $x(\tau)$ and $z(\tau)$ is

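$$E[y(\tau)\mid x(\tau), z(\tau)] = \alpha + \delta\, E[y(\tau)\mid z(\tau)] + x(\tau)'\beta + E[x(\tau)\mid z(\tau)]'\gamma + z(\tau)'\eta.$$

The reduced form of this model is given by

$$E[y(\tau)\mid z(\tau)] = \frac{\alpha}{1-\delta} + E[x(\tau)\mid z(\tau)]'\frac{\gamma+\beta}{1-\delta} + z(\tau)'\frac{\eta}{1-\delta},$$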

and it can be seen that the different social effects cannot, without further restrictions, be identified separately.
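The step from the conditional mean to the reduced form can be made explicit. The short derivation below is a sketch under the assumptions stated above: it starts from Equation (1) and $E[u(\tau)\mid x(\tau), z(\tau)] = z(\tau)'\eta$, takes expectations conditional on $z(\tau)$ only (law of iterated expectations), and assumes $\delta \neq 1$.

```latex
\begin{aligned}
E[y(\tau)\mid z(\tau)] &= \alpha + \delta\,E[y(\tau)\mid z(\tau)]
  + E[x(\tau)\mid z(\tau)]'\beta + E[x(\tau)\mid z(\tau)]'\gamma + z(\tau)'\eta \\
(1-\delta)\,E[y(\tau)\mid z(\tau)] &= \alpha + E[x(\tau)\mid z(\tau)]'(\gamma+\beta) + z(\tau)'\eta \\
E[y(\tau)\mid z(\tau)] &= \frac{\alpha}{1-\delta}
  + E[x(\tau)\mid z(\tau)]'\frac{\gamma+\beta}{1-\delta}
  + z(\tau)'\frac{\eta}{1-\delta}
\end{aligned}
```

Only the composites $\alpha/(1-\delta)$, $(\gamma+\beta)/(1-\delta)$ and $\eta/(1-\delta)$ appear in the last line, which is the source of the identification problem.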

In solving the identification problems, a first step is to consider some of the practical restrictions imposed by the tax competition literature. First of all, the articles listed in Table 1 only consider estimation of $\delta$, $\beta$ and $\alpha$. They do not consider the contextual effect, that is, the effect of the group mean of the observed individual characteristics, and therefore implicitly assume $\gamma = 0$. Furthermore, it is important to recognize that separate identification of the endogenous interaction effect ($\delta$) and the correlated effect ($\eta$) is impossible in this setting, since both $E[y(\tau)\mid z(\tau)]$ and $z(\tau)'\eta$ are constant across the cross-sections. The spatial econometrics literature introduces additional assumptions that remove this impossibility and thereby allow $\delta$ and $\eta$ to be identified separately.

2.2 An Introduction to the Spatial Econometrics Literature

Spatial econometrics is a subset of econometric methods concerned with spatial aspects present in cross-sectional and space-time (panel) observations. Variables related to location, distance and arrangement (topology) are treated explicitly in model specification, estimation, diagnostic checking and prediction (Anselin (2006)).

When analyzing the tax competition game, the spatial character underlying the theory (neighboring jurisdictions competing), and the influence of unobserved variables that are potentially correlated amongst jurisdictions, imply that spatial econometric techniques are required to obtain accurate estimates of the parameters of interest. This Chapter will therefore introduce some of the important concepts and methods used in the spatial econometrics literature.

We will refer to a model that includes both a spatial lag and a spatial error component as a spatial model. Both components are introduced briefly below. Under the assumptions $\gamma = 0$ and $\delta\, E[y(\tau)\mid z(\tau)] = g(y(\tau), \delta)$, Equation (1) simplifies to

$$y(\tau) = g(y(\tau), \delta) + x(\tau)\beta + u(\tau). \tag{2}$$

A spatial lag specification is characterized by the term $g(y(\tau), \delta)$, which represents observations on the dependent variable in locations other than $i$, where $\delta$ is a so-called spatial autoregressive coefficient. In this thesis, we follow the empirical tax competition literature and assume that the observations on the dependent variable in neighboring locations can be represented by a weighted average. That is, $g(y(\tau), \delta) = \delta W_N y(\tau)$, where $W_N$ is a so-called spatial weights matrix representing the neighborhood set of each location. Loosely speaking, for $i \neq j$, $W_{ij}$ is non-zero for those locations $j$ considered to be neighbors of location $i$. The non-zero elements $W_{ij}$ assign a specific weight to location $j$ that approximates its influence on the dependent variable of location $i$. For example, in setting its taxes, the government of region $i$ takes into account not only the tax rate set by its next-door neighbor but also those of other regions in the vicinity. Furthermore, $W_{ii}$ is assumed to be zero, that is, location $i$ is not a neighbor of itself.

A large variety of definitions of $W_{ij}$ are employed in the literature, for example adjacency, distance, border-length, population and GDP matrices.
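As a small illustration of how $W_N$ and the spatial lag $W_N y(\tau)$ are typically constructed, the sketch below builds a row-standardized contiguity matrix for a hypothetical set of jurisdictions arranged on a circle, each bordering the two adjacent ones. The layout, the function name `ring_contiguity` and the tax rates are illustrative assumptions; row-standardization is one common choice that turns $W_N y(\tau)$ into a simple average of the neighbors' rates.

```python
import numpy as np


def ring_contiguity(n: int) -> np.ndarray:
    """Row-standardized contiguity matrix for n >= 3 jurisdictions on a circle.

    Hypothetical layout: jurisdiction i borders i-1 and i+1 (modulo n).
    """
    if n < 3:
        raise ValueError("need at least three jurisdictions")
    w = np.zeros((n, n))
    for i in range(n):
        w[i, (i - 1) % n] = 1.0  # neighbor on one side
        w[i, (i + 1) % n] = 1.0  # neighbor on the other side
    # W_ii stays zero: a location is not a neighbor of itself.
    return w / w.sum(axis=1, keepdims=True)  # each row now sums to one


W = ring_contiguity(5)
tax = np.array([5.0, 6.0, 4.0, 7.0, 3.0])  # illustrative tax rates y(tau)
spatial_lag = W @ tax                      # W_N y(tau): average rate of the neighbors
print(spatial_lag)
```

Other definitions (distance, border length, population or GDP weights) only change how the non-zero entries are filled before row-standardization.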

The assumption that the conditional mean can be replaced by a weighted average of the dependent variable in neighboring locations ensures that $\delta$ is identified separately from $\eta$. Because not all jurisdictions in the sample are treated identically by $W_N$, the weighted average $W_N y(\tau)$ varies across the cross-sections, while $z(\tau)$ remains constant.

A spatial error specification is characterized by the assumption that the error component of location $i$ is correlated with the error component of neighbor $j$, that is, $E[u_{it}u_{jt}] = \sigma_{ij}$, where $\sigma_{ij} \neq 0$ represents the covariance between jurisdiction $i$ and jurisdiction $j$. Anselin (2006) explains that: “unlike spatial lag models, spatial error specifications are typically not motivated by a theoretical economic model, but instead are formulated to deal with data problems.” This quote highlights that examining the consequences of spatial error correlation might be relevant even when economic theory does not consider this issue. This correlation results in the notion of spatial covariance, where the off-diagonal elements of the variance–covariance matrix are non-zero and follow a spatial structure. The spatial econometric literature employs various assumptions about the nature of this spatial structure. In this thesis, we follow KKP and Egger et al. (2005a, 2005b) by assuming that the error structure is characterized by a spatial autoregressive (SAR) process linking $u(\tau)$ to a weighted average of the error components of neighbors, $M_N u(\tau)$:

$$u(\tau) = \rho M_N u(\tau) + \varepsilon(\tau), \tag{3}$$

where $\rho$ is a second spatial autoregressive coefficient, $M_N$ is a second spatial weights matrix, and $\varepsilon(\tau)$ is an i.i.d. error term. Note that we allow for the possibility that $W_N \neq M_N$.
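The SAR process in Equation (3) can be simulated by solving $(I_N - \rho M_N)u(\tau) = \varepsilon(\tau)$. The sketch below does this for a hypothetical ring-shaped $M_N$ with $\rho = 0.4$ and standard normal innovations (all illustrative choices), checks that the draw satisfies Equation (3), and computes the implied covariance $(I_N - \rho M_N)^{-1}(I_N - \rho M_N')^{-1}$ under $\mathrm{Var}(\varepsilon(\tau)) = I_N$, whose off-diagonal elements are non-zero.

```python
import numpy as np

rng = np.random.default_rng(0)
N, rho = 6, 0.4

# Hypothetical weights matrix M_N: each unit has two neighbors with weight 1/2.
M = np.zeros((N, N))
for i in range(N):
    M[i, (i - 1) % N] = 0.5
    M[i, (i + 1) % N] = 0.5

A = np.eye(N) - rho * M
eps = rng.standard_normal(N)      # i.i.d. innovations epsilon(tau)
u = np.linalg.solve(A, eps)       # u(tau) = (I_N - rho M_N)^{-1} epsilon(tau)

# Equation (3) holds by construction: u = rho M_N u + epsilon
assert np.allclose(u, rho * M @ u + eps)

# Implied covariance of u(tau) when Var(epsilon) = I_N
A_inv = np.linalg.inv(A)
cov_u = A_inv @ A_inv.T
print(np.round(cov_u[0, :3], 3))  # covariances with first- and second-order neighbors
```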

Combining Equations (2) and (3) and using the definition $g(y(\tau), \delta) = \delta W_N y(\tau)$, we obtain

$$y(\tau) = \delta W_N y(\tau) + \rho M_N y(\tau) - \rho\delta M_N W_N y(\tau) + x(\tau)\beta - \rho M_N x(\tau)\beta + \varepsilon(\tau).$$

When $W_N = M_N$, the terms in $y(\tau)$ reduce to $(\delta + \rho)W_N y(\tau) - \rho\delta W_N^2 y(\tau)$, in which $\delta$ and $\rho$ enter symmetrically, and identification of $\delta$ and $\rho$ separately might become difficult. Again quoting Anselin (2006, p. 21): [When $W_N = M_N$] “there will be difficulty disentangling the role of $\delta$ and $\rho$. [From the variance-covariance matrix of $y(t)$] it is clear that $\rho$ and $\delta$ are completely interchangeable, suggesting that the same spatial covariance structure for $y(t)$ can be obtained by a range of combinations of lag and error dependence.” Note that, in a strict sense, identification should be no problem. However, in an empirical application, the high correlation between the different variables might result in the identification problems mentioned.
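Because the ordering of $M_N$ and $W_N$ in the cross term matters when the two matrices do not commute, it can be useful to verify the combined equation numerically. The sketch below generates $y(\tau)$ from Equations (2) and (3) with $g(y(\tau), \delta) = \delta W_N y(\tau)$, using two hypothetical, randomly drawn row-standardized weights matrices and illustrative parameter values, and checks the expanded equation with the cross term $\rho\delta M_N W_N y(\tau)$ against a version with the order swapped.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
delta, rho = 0.3, 0.5
beta = np.array([1.0, -0.5])


def random_weights(n: int) -> np.ndarray:
    """Hypothetical row-standardized weights matrix with a zero diagonal."""
    w = rng.uniform(size=(n, n))
    np.fill_diagonal(w, 0.0)
    return w / w.sum(axis=1, keepdims=True)


W, M = random_weights(N), random_weights(N)
I = np.eye(N)
x = rng.standard_normal((N, 2))
eps = rng.standard_normal(N)

u = np.linalg.solve(I - rho * M, eps)              # Equation (3)
y = np.linalg.solve(I - delta * W, x @ beta + u)   # Equation (2) with g = delta W_N y

rhs = (delta * W @ y + rho * M @ y - rho * delta * M @ W @ y
       + x @ beta - rho * M @ x @ beta + eps)
rhs_swapped = (delta * W @ y + rho * M @ y - rho * delta * W @ M @ y
               + x @ beta - rho * M @ x @ beta + eps)

print(np.allclose(y, rhs), np.allclose(y, rhs_swapped))
```

The identity follows from premultiplying Equation (2) by $(I_N - \rho M_N)$, which is why $M_N$ stands to the left of $W_N$.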

2.3 Estimation in the spatial econometrics model

After having introduced the particulars of the spatial econometrics literature, we now consider estimation of the model introduced by Equations (2) and (3). Spatial models are estimated by either maximum likelihood (ML) or the generalized method of moments (GMM). This thesis focuses on GMM estimators. The choice of GMM follows directly from the methods used in the literature so far, as listed in Table 1, which are all special cases of the GMM estimator.

The preference for GMM over ML in the literature stems from the weaker assumptions underlying the GMM estimators and their lower computational burden.4 However, to quote Elhorst (2005): “the objection to GMM from a spatial point of view is that this estimator is less accurate than ML (see Das, Kelejian, and Prucha (2003))”. The accuracy of the ML estimator is mainly due to a Jacobian term that the ML estimator takes into account and the GMM estimator does not. Incorporating an ML estimator in this experiment is beyond the scope of this thesis; it does, however, seem an important direction for further research.5

In estimating Equation (2), applied researchers are confronted with the first of the problems mentioned by Brueckner (2003): the tax rates $y(\tau)$ are determined simultaneously. This implies that appropriate instruments are needed for the spatially lagged tax rates $W_N y(\tau)$. The next chapter therefore addresses instrument selection in the spatial econometrics literature.

2.4 Instrument Selection

For the cross-sectional model introduced above, the spatial econometric literature suggests essentially two approaches for instrumenting $W_N y(\tau)$. Anselin (2006) mentions the approaches of Kelejian and Robinson (1993) and Lee (2003). Using Equations (2) and (3) and assuming $g(y(\tau), \delta) = \delta W_N y(\tau)$ and $\rho = 0$, the cross-sectional model to be considered is given by

$$y(\tau) = \delta W_N y(\tau) + x(\tau)\beta + \varepsilon(\tau). \tag{4}$$

Footnote 4: An ML estimator relies on a full parametrization of the likelihood function for $y$, and the derivation of this likelihood function is tedious. GMM relies entirely on the existence of valid moment conditions, which are, as we will see in this thesis, readily available.

Footnote 5: For a discussion of ML in a static model, see Elhorst (2005a). Allers and Elhorst (2005) and Elhorst

Both approaches to the selection of the optimal instrument set rely on the expected value of the spatially lagged dependent variable in Equation (4), that is

$$E[W_N y(\tau)] = W_N (I_N - \delta W_N)^{-1} x(\tau)\beta, \tag{5}$$

where $I_N$ is an $N \times N$ identity matrix. Kelejian and Robinson's (1993) approach to defining an appropriate instrument set for the model in Equation (4) is based on an expansion of the inverse term in Equation (5). Assuming $|\delta| < 1$ and a row-standardized $W_N$, which guarantee that the power series converges, Equation (5) can be rewritten as

$$E[W_N y(\tau)] = \left(W_N + \delta W_N^2 + \delta^2 W_N^3 + \dots\right) x(\tau)\beta,$$

where all terms on the right-hand side are potential instruments. From this expansion, we define the set $L \equiv \{W_N x(\tau), W_N^2 x(\tau), \dots\}$ of potential instruments. The instrument matrix can then be written as $H_1(\tau) = [l(\tau), x(\tau)]$, where $l(\tau)$ consists of the non-multicollinear elements of $L$. With spatial error dependence ($\rho \neq 0$), the expanded instrument set becomes $L \equiv \{W_N x(\tau), W_N^2 x(\tau), \dots, M_N x(\tau), M_N W_N x(\tau), M_N W_N^2 x(\tau), \dots\}$.6
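The instrument set $H_1(\tau)$ can be assembled directly from powers of $W_N$. The sketch below uses a hypothetical function name `kr_instruments`, a random row-standardized $W_N$ and a regressor matrix with an intercept (all illustrative). It stacks $x(\tau)$, $W_N x(\tau)$ and $W_N^2 x(\tau)$ and keeps only linearly independent columns; with a row-standardized $W_N$ the spatial lag of the intercept equals the intercept itself, so those columns are dropped. It also checks numerically that a truncated version of the series reproduces Equation (5) when $|\delta| < 1$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10


def kr_instruments(W, x, max_power=2, M=None):
    """Stack [x, W x, ..., W^p x] (plus M-premultiplied blocks if M is given)
    and keep only the columns that are not multicollinear with earlier ones."""
    blocks, wx = [x], x
    for _ in range(max_power):
        wx = W @ wx
        blocks.append(wx)
    if M is not None:  # spatial error case: M x, M W x, M W^2 x, ...
        blocks += [M @ b for b in blocks]
    H = np.column_stack(blocks)
    keep = []
    for j in range(H.shape[1]):
        if np.linalg.matrix_rank(H[:, keep + [j]]) == len(keep) + 1:
            keep.append(j)
    return H[:, keep]


W = rng.uniform(size=(N, N))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)            # row-standardized W_N
x = np.column_stack([np.ones(N), rng.standard_normal(N)])
H1 = kr_instruments(W, x)
print(H1.shape)   # the spatially lagged intercept columns have been dropped

# Truncated expansion versus the exact expression in Equation (5)
delta, beta = 0.4, np.array([1.0, 2.0])
exact = W @ np.linalg.solve(np.eye(N) - delta * W, x @ beta)
series = sum(delta**p * np.linalg.matrix_power(W, p + 1) @ (x @ beta)
             for p in range(60))
print(np.allclose(exact, series))
```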

The second approach to defining an appropriate instrument set, suggested by Lee (2003), uses the matrix

$$H_2(\tau) = \left[W_N (I_N - \hat{\delta} W_N)^{-1} x(\tau)\hat{\beta},\; x(\tau)\right],$$

where $\hat{\delta}$ and $\hat{\beta}$ are obtained from a first-round regression using the instruments defined under the first approach. Note that both instrument matrices are orthogonal to $\varepsilon(\tau)$ because of the assumption of strict exogeneity of $x(\tau)$.
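Lee's (2003) instrument replaces the infinite series by its estimated sum. The sketch below rests on illustrative assumptions: a hypothetical helper `lee_instrument`, simulated data from Equation (4) with a ring-shaped $W_N$, and a plain 2SLS first round based on $[x(\tau), W_N x_1(\tau), W_N^2 x_1(\tau)]$, where $x_1(\tau)$ is the non-constant regressor. It obtains $\hat\delta$ and $\hat\beta$, forms $H_2(\tau)$, and re-estimates with it.

```python
import numpy as np

rng = np.random.default_rng(3)
N, delta, beta = 300, 0.4, np.array([1.0, 2.0])

# Hypothetical ring-shaped, row-standardized W_N
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5

x = np.column_stack([np.ones(N), rng.standard_normal(N)])
y = np.linalg.solve(np.eye(N) - delta * W, x @ beta + rng.standard_normal(N))


def two_sls(y, Z, H):
    """2SLS: regress y on the projection of Z onto the column space of H."""
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]


def lee_instrument(W, x, delta_hat, beta_hat):
    """H2 = [W (I - delta_hat W)^{-1} x beta_hat, x]"""
    n = W.shape[0]
    fitted_lag = W @ np.linalg.solve(np.eye(n) - delta_hat * W, x @ beta_hat)
    return np.column_stack([fitted_lag, x])


Z = np.column_stack([W @ y, x])                       # regressors [W y, x]
H1 = np.column_stack([x, W @ x[:, 1], W @ W @ x[:, 1]])
theta_first = two_sls(y, Z, H1)                       # [delta_hat, beta_hat]
H2 = lee_instrument(W, x, theta_first[0], theta_first[1:])
theta_lee = two_sls(y, Z, H2)
print(theta_first, theta_lee)
```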

In performing the simulation exercise, a third matrix, $H_3(\tau) = [H_2(\tau), s(\tau)]$, outperformed both matrices named above. This improved performance results from the additional moment conditions that this matrix adds compared with both $H_1$ and $H_2$. All three matrices are appropriate for solving the endogeneity problem identified by Brueckner (2003).

Footnote 6: This instrument set is also used by Kelejian and Prucha (1998) and therefore by all IV estimators listed

2.5 The KKP Method to Deal with Spatial Error Dependence

The second problem identified by Brueckner (2003) is potential spatial dependence in the disturbance term. As mentioned above, Kapoor, Kelejian and Prucha (2007) suggest a three-step estimation procedure that removes the spatial error dependence from the model (the KKP method). This chapter briefly discusses this procedure. The procedure relies on obtaining a consistent estimate of the spatial autocorrelation coefficient $\rho$. This estimate of $\rho$ can be used to transform the model, which removes the spatial error dependence. Ignoring spatial error dependence in panel data may lead to a considerable loss in estimation quality (cf. Baltagi, Egger and Pfaffermayr (2006)).7 The KKP method underlies the estimation procedures discussed in Chapters 3 and 4. To start with, the model studied by KKP is given by

$$y = x\beta + u, \tag{6}$$

$$u = \left[I_T \otimes (I_N - \rho M_N)^{-1}\right]\varepsilon, \tag{7}$$

where $x$ is a $TN \times q$ matrix of observations, $\beta$ is a $q \times 1$ vector of coefficients, and $\otimes$ denotes the Kronecker product. $u$ follows the SAR process introduced in Equation (3). Furthermore, KKP assume that $\varepsilon = (\iota_T \otimes I_N)\mu + v$, where $\iota_T$ is a $T \times 1$ vector of ones, $\mu$ is an $N \times 1$ vector representing the (unobservable) unit-specific effects, and $v$ is an i.i.d. error term. It is assumed that $\mu$ is distributed independently of the regressors, that is, we assume RE. $\mu$ and $v$ are independently distributed with zero mean and variances $\sigma_\mu^2$ and $\sigma_v^2$, respectively.

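The variance–covariance structure of $\varepsilon$ is crucial for estimating $\rho$. It is given by

$$E[\varepsilon_{it}\varepsilon_{js}] = \begin{cases} \sigma_\mu^2 + \sigma_v^2 & \text{if } i = j,\ t = s,\\ \sigma_\mu^2 & \text{if } i = j,\ t \neq s,\\ 0 & \text{otherwise}. \end{cases} \tag{8}$$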
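Equation (8) can also be written in matrix form. With observations sorted first by time and then by unit (zero-based row index $tN + i$), $\varepsilon = (\iota_T \otimes I_N)\mu + v$ implies $E[\varepsilon\varepsilon'] = \sigma_\mu^2 (J_T \otimes I_N) + \sigma_v^2 I_{NT}$, where $\iota_T$ is a $T \times 1$ vector of ones and $J_T$ a $T \times T$ matrix of ones. The sketch below fills the matrix entry by entry from Equation (8) and compares it with this Kronecker form and with the equivalent representation $\sigma_v^2 Q_N + \sigma_1^2 P_N$ used in Step 3 below; the dimensions and variances are illustrative, and the function name `eps_cov_from_eq8` is hypothetical.

```python
import numpy as np


def eps_cov_from_eq8(N, T, s2_mu, s2_v):
    """Fill E[eps_it eps_js] entry by entry, rows ordered by time then unit."""
    omega = np.zeros((N * T, N * T))
    for t in range(T):
        for s in range(T):
            for i in range(N):
                # only i == j entries are non-zero
                omega[t * N + i, s * N + i] = s2_mu + (s2_v if t == s else 0.0)
    return omega


N, T, s2_mu, s2_v = 3, 4, 0.5, 1.0
J_T = np.ones((T, T))
omega_eq8 = eps_cov_from_eq8(N, T, s2_mu, s2_v)
omega_kron = s2_mu * np.kron(J_T, np.eye(N)) + s2_v * np.eye(N * T)

Q = np.kron(np.eye(T) - J_T / T, np.eye(N))   # within transformation Q_N
P = np.kron(J_T / T, np.eye(N))               # between transformation P_N
s2_1 = s2_v + T * s2_mu                       # sigma_1^2
omega_qp = s2_v * Q + s2_1 * P

print(np.allclose(omega_eq8, omega_kron), np.allclose(omega_kron, omega_qp))
```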

The estimation procedure leading to the SGLS estimator consists of three steps: (i) obtain an estimate of the coefficient vector; (ii) obtain an estimate of ρ; (iii) use the estimate of ρ to obtain an SGLS estimator.

STEP 1: A Consistent Estimator of β

OLS applied to Equation (6) provides a consistent estimator of $\beta$. This estimator allows us to calculate

$$\hat{u} = y - x\hat{\beta}_{OLS}, \tag{9}$$

where $\hat{\beta}_{OLS}$ is the OLS estimator of $\beta$.

Footnote 7: Estimation quality is measured by the Root Mean Squared Error; this will be discussed extensively in

STEP 2: A GMM Estimator of ρ

Using the structure of $u$ given in Equation (7), KKP define six moment conditions that can be used to obtain GMM estimators of the parameters $\rho$, $\sigma_\mu^2$ and $\sigma_v^2$. In the following, the moment conditions are introduced briefly.8

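From Equation (3) it follows that

$$\varepsilon = u - \rho\bar{u}, \tag{10}$$

$$\bar{\varepsilon} = \bar{u} - \rho\bar{\bar{u}}, \tag{11}$$

where $\bar{u} = (I_T \otimes M_N)u$, $\bar{\bar{u}} = (I_T \otimes M_N)\bar{u}$ and $\bar{\varepsilon} = (I_T \otimes M_N)\varepsilon$. Based on these equations, the following six moment conditions are defined: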

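$$E\begin{bmatrix} \frac{1}{N(T-1)}\varepsilon' Q_N \varepsilon \\ \frac{1}{N(T-1)}\bar{\varepsilon}' Q_N \bar{\varepsilon} \\ \frac{1}{N(T-1)}\bar{\varepsilon}' Q_N \varepsilon \\ \frac{1}{N}\varepsilon' P_N \varepsilon \\ \frac{1}{N}\bar{\varepsilon}' P_N \bar{\varepsilon} \\ \frac{1}{N}\bar{\varepsilon}' P_N \varepsilon \end{bmatrix} = \begin{bmatrix} \sigma_v^2 \\ \sigma_v^2\,\frac{1}{N}\mathrm{tr}(M_N'M_N) \\ 0 \\ \sigma_1^2 \\ \sigma_1^2\,\frac{1}{N}\mathrm{tr}(M_N'M_N) \\ 0 \end{bmatrix}, \tag{12}$$

where $Q_N \equiv \left(I_T - \frac{J_T}{T}\right) \otimes I_N$, $P_N \equiv \frac{J_T}{T} \otimes I_N$, $\sigma_1^2 \equiv \sigma_v^2 + T\sigma_\mu^2$, $J_T$ is a $T \times T$ matrix of unit elements and $\mathrm{tr}(\cdot)$ denotes the trace.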

Note that $Q_N$ and $P_N$ are both transformation matrices. The econometric literature refers to $Q_N$ as the within-transformation and to $P_N$ as the between-transformation. The first three conditions rely solely on the structure of $v_{it}$: $Q_N$ removes the unit-specific effect, so that only $v_{it}$ remains. In the last three conditions, the RE nature of the model is exploited: $P_N$ replaces each observation by its time average, which retains the unit-specific effect $\mu_i$ together with the time average of $v_{it}$; this is why these conditions involve $\sigma_1^2 = \sigma_v^2 + T\sigma_\mu^2$.
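The roles of $Q_N$ and $P_N$ described above can be checked on simulated error components. The sketch below (illustrative dimensions and standard normal draws) verifies that both matrices are idempotent and mutually orthogonal, that $Q_N$ removes the unit-specific effects entirely, and that $P_N$ maps each observation to $\mu_i$ plus the time average of $v_{it}$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 4, 5
J_T = np.ones((T, T))
Q = np.kron(np.eye(T) - J_T / T, np.eye(N))   # within transformation
P = np.kron(J_T / T, np.eye(N))               # between transformation

mu = rng.standard_normal(N)                    # unit-specific effects
v = rng.standard_normal(N * T)                 # idiosyncratic errors, time-major order
eps = np.kron(np.ones((T, 1)), np.eye(N)) @ mu + v   # (iota_T kron I_N) mu + v

assert np.allclose(Q @ Q, Q) and np.allclose(P @ P, P)   # idempotent
assert np.allclose(Q @ P, np.zeros_like(Q))              # mutually orthogonal
assert np.allclose(Q @ eps, Q @ v)                       # mu is wiped out

v_bar = v.reshape(T, N).mean(axis=0)                     # time average per unit
assert np.allclose(P @ eps, np.tile(mu + v_bar, T))      # mu_i + mean over t of v_it
```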

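By substituting Equations (10)–(11) into the moment conditions of Equation (12), KKP derive a non-linear system of six equations involving the parameters $\rho$, $\sigma_v^2$ and $\sigma_1^2$. The sample analogue of this system is given by

$$\begin{bmatrix}
\frac{2}{N(T-1)}\hat{u}'Q_N\hat{\bar{u}} & -\frac{1}{N(T-1)}\hat{\bar{u}}'Q_N\hat{\bar{u}} & 1 & 0\\
\frac{2}{N(T-1)}\hat{\bar{\bar{u}}}'Q_N\hat{\bar{u}} & -\frac{1}{N(T-1)}\hat{\bar{\bar{u}}}'Q_N\hat{\bar{\bar{u}}} & \frac{1}{N}\mathrm{tr}(M_N'M_N) & 0\\
\frac{1}{N(T-1)}\left(\hat{u}'Q_N\hat{\bar{\bar{u}}} + \hat{\bar{u}}'Q_N\hat{\bar{u}}\right) & -\frac{1}{N(T-1)}\hat{\bar{u}}'Q_N\hat{\bar{\bar{u}}} & 0 & 0\\
\frac{2}{N}\hat{u}'P_N\hat{\bar{u}} & -\frac{1}{N}\hat{\bar{u}}'P_N\hat{\bar{u}} & 0 & 1\\
\frac{2}{N}\hat{\bar{\bar{u}}}'P_N\hat{\bar{u}} & -\frac{1}{N}\hat{\bar{\bar{u}}}'P_N\hat{\bar{\bar{u}}} & 0 & \frac{1}{N}\mathrm{tr}(M_N'M_N)\\
\frac{1}{N}\left(\hat{u}'P_N\hat{\bar{\bar{u}}} + \hat{\bar{u}}'P_N\hat{\bar{u}}\right) & -\frac{1}{N}\hat{\bar{u}}'P_N\hat{\bar{\bar{u}}} & 0 & 0
\end{bmatrix}
\begin{bmatrix}\rho\\ \rho^2\\ \sigma_v^2\\ \sigma_1^2\end{bmatrix}
=
\begin{bmatrix}
\frac{1}{N(T-1)}\hat{u}'Q_N\hat{u}\\
\frac{1}{N(T-1)}\hat{\bar{u}}'Q_N\hat{\bar{u}}\\
\frac{1}{N(T-1)}\hat{u}'Q_N\hat{\bar{u}}\\
\frac{1}{N}\hat{u}'P_N\hat{u}\\
\frac{1}{N}\hat{\bar{u}}'P_N\hat{\bar{u}}\\
\frac{1}{N}\hat{u}'P_N\hat{\bar{u}}
\end{bmatrix}. \tag{13}$$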

KKP use this system to define a consistent two-step GMM estimator of $\gamma \equiv [\rho, \sigma_v^2, \sigma_1^2]'$. In the first round, $\rho$ and $\sigma_v^2$ are estimated from the first three equations of this system, that is, from three equations in two unknowns. Subsequently, the fourth line of Equation (12) is used to compute $\hat{\sigma}_1^2$. In the second round, all six equations of the system are used to estimate $\gamma$ with a non-linear GMM estimator. This estimator uses the inverse of the variance–covariance matrix of the moment conditions in Equation (12) as a weighting matrix. This weighting matrix is a function of $\hat{\sigma}_v^2$ and $\hat{\sigma}_1^2$.
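The first round of the KKP estimator can be sketched in a few lines. The code below is an illustrative implementation under stated assumptions, not KKP's own code: it simulates data from Equations (6)–(7) with a hypothetical ring-shaped $M_N$ and illustrative parameter values, computes OLS residuals as in Step 1, builds the first three rows of the sample system (the within-based conditions), and minimizes their sum of squares over $\rho$ and $\sigma_v^2$ (treating $\rho^2$ as a function of $\rho$) with `scipy.optimize.least_squares`. $\hat\sigma_1^2$ then follows from the fourth moment condition. The weighted second round is omitted, and the function name `kkp_initial` is hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)
N, T = 100, 5
rho, s2_mu, s2_v = 0.5, 1.0, 1.0
beta = np.array([1.0, 2.0])

# Hypothetical ring-shaped, row-standardized M_N
M = np.zeros((N, N))
for i in range(N):
    M[i, (i - 1) % N] = M[i, (i + 1) % N] = 0.5

# Data generating process of Equations (6)-(7); rows ordered by time, then unit
mu = rng.normal(scale=np.sqrt(s2_mu), size=N)
v = rng.normal(scale=np.sqrt(s2_v), size=N * T)
eps = np.tile(mu, T) + v
u = np.kron(np.eye(T), np.linalg.inv(np.eye(N) - rho * M)) @ eps
x = np.column_stack([np.ones(N * T), rng.standard_normal(N * T)])
y = x @ beta + u

# Step 1: OLS residuals, Equation (9)
u_hat = y - x @ np.linalg.lstsq(x, y, rcond=None)[0]


def kkp_initial(u, M, N, T):
    """First-round GMM estimates of rho and sigma_v^2, then sigma_1^2."""
    MT = np.kron(np.eye(T), M)
    ub, ubb = MT @ u, MT @ (MT @ u)          # u-bar and u-double-bar
    J = np.ones((T, T))
    Q = np.kron(np.eye(T) - J / T, np.eye(N))
    P = np.kron(J / T, np.eye(N))
    c = 1.0 / (N * (T - 1))
    tr = np.trace(M.T @ M) / N
    G = np.array([
        [2 * c * (u @ Q @ ub), -c * (ub @ Q @ ub), 1.0],
        [2 * c * (ubb @ Q @ ub), -c * (ubb @ Q @ ubb), tr],
        [c * (u @ Q @ ubb + ub @ Q @ ub), -c * (ub @ Q @ ubb), 0.0],
    ])
    g = np.array([c * (u @ Q @ u), c * (ub @ Q @ ub), c * (u @ Q @ ub)])

    def residuals(p):
        r, s2 = p
        return G @ np.array([r, r**2, s2]) - g

    fit = least_squares(residuals, x0=[0.0, 1.0],
                        bounds=([-0.99, 1e-8], [0.99, np.inf]))
    rho_hat, s2_v_hat = fit.x
    e = u - rho_hat * ub                     # epsilon-hat, Equation (10)
    s2_1_hat = (e @ P @ e) / N               # fourth moment condition
    return rho_hat, s2_v_hat, s2_1_hat


rho_hat, s2_v_hat, s2_1_hat = kkp_initial(u_hat, M, N, T)
print(rho_hat, s2_v_hat, s2_1_hat)
```

With these illustrative values the true $\sigma_1^2$ is $\sigma_v^2 + T\sigma_\mu^2 = 6$, which gives a rough benchmark for the printed estimates.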

STEP 3: An SGLS Estimator of β

The parameter $\hat{\rho}$ (the first element of $\hat{\gamma}$) is used to transform the variables in Equation (6). The model corrected for spatial error correlation becomes

$$\breve{y} = \breve{x}\beta + \varepsilon, \tag{14}$$

where $\breve{r} = [I_T \otimes (I_N - \hat{\rho}M_N)]r$ for $r \in \{y, x\}$. Notice that the spatial error dependence is removed, as $\breve{u} = \varepsilon$ when $\hat{\rho} = \rho$. The model in Equation (14) is an RE model. An optimal (GLS) estimator of $\beta$ can be obtained after transforming Equation (14) with a scalar multiple of the inverse square root of the variance–covariance matrix of $\varepsilon$.9 Recall from Equation (8) that the variance–covariance matrix of $\varepsilon$ is given by

$$\Omega_\varepsilon = \sigma_v^2 Q_N + \sigma_1^2 P_N.$$

Using $\sigma_v$ as the scalar multiple, the following transformation matrix is obtained:

$$\sigma_v \Omega_\varepsilon^{-1/2} = I_{NT} - \eta P_N,$$

where $\eta = 1 - \sigma_v/\sigma_1$. The SGLS estimator of $\beta$ ($\beta_{SGLS}$) is the OLS estimator of the transformed model

$$\tilde{y} = \tilde{x}\beta + \tilde{\varepsilon},$$

where $\tilde{r} = [I_{NT} - \eta P_N]r$ for $r \in \{\breve{y}, \breve{x}, \varepsilon\}$. The above procedure can be iterated: $\beta_{SGLS}$ is used to obtain a new estimate of $u$, and Steps 2 and 3 are repeated until convergence.
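The transformation in Step 3 can be checked numerically. Because $Q_N$ and $P_N$ are orthogonal idempotent matrices with $Q_N + P_N = I_{NT}$, $\Omega_\varepsilon^{-1/2} = \sigma_v^{-1}Q_N + \sigma_1^{-1}P_N$, so that $\sigma_v\Omega_\varepsilon^{-1/2} = I_{NT} - (1 - \sigma_v/\sigma_1)P_N$. The sketch below (illustrative dimensions and variances) confirms that this matrix turns $\Omega_\varepsilon$ into $\sigma_v^2 I_{NT}$, and wraps Step 3 in a hypothetical helper `sgls_beta` that applies $I_T \otimes (I_N - \hat\rho M_N)$ followed by $I_{NT} - \eta P_N$ and runs OLS. The noise-free usage example only checks the algebra, not the statistical properties.

```python
import numpy as np

N, T = 3, 4
s2_v, s2_mu = 1.0, 0.5
s2_1 = s2_v + T * s2_mu
J = np.ones((T, T))
Q = np.kron(np.eye(T) - J / T, np.eye(N))
P = np.kron(J / T, np.eye(N))
omega = s2_v * Q + s2_1 * P

eta = 1.0 - np.sqrt(s2_v / s2_1)       # eta = 1 - sigma_v / sigma_1
T_mat = np.eye(N * T) - eta * P        # sigma_v * Omega^{-1/2}
assert np.allclose(T_mat @ omega @ T_mat.T, s2_v * np.eye(N * T))


def sgls_beta(y, x, M, rho_hat, s2_v_hat, s2_1_hat, N, T):
    """Hypothetical helper: spatial transformation, RE transformation, then OLS."""
    S = np.kron(np.eye(T), np.eye(N) - rho_hat * M)                    # breve step
    P = np.kron(np.ones((T, T)) / T, np.eye(N))
    R = np.eye(N * T) - (1.0 - np.sqrt(s2_v_hat / s2_1_hat)) * P     # tilde step
    y_t, x_t = R @ (S @ y), R @ (S @ x)
    return np.linalg.lstsq(x_t, y_t, rcond=None)[0]


# Usage with noise-free data: the transformed regression recovers beta exactly
rng = np.random.default_rng(6)
M = np.roll(np.eye(N), 1, axis=1)      # hypothetical weights matrix
x = np.column_stack([np.ones(N * T), rng.standard_normal(N * T)])
beta = np.array([1.0, 2.0])
beta_hat = sgls_beta(x @ beta, x, M, 0.3, s2_v, s2_1, N, T)
print(beta_hat)
```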

3 Static Spatial Panel Data Models

The previous chapter introduced the spatial econometric literature to identify the problems encountered by applied researchers when estimating a spatial model. Besides introducing these problems, the chapter explained how they can be solved. In the following, Chapter 3 focuses on estimation in a static spatial panel data model and Chapter 4 on estimation in a dynamic spatial panel data model.

The static spatial panel data model is the most common model studied in the tax competition literature. Although spatial error dependence might be important in estimating tax reaction functions, most authors (all except Egger et al. (2005a, 2005b)) do not correct for it. This chapter presents an estimation method that corrects for spatial error dependence in the static spatial model. Chapter 3.1 discusses an RE model and Chapter 3.2 discusses an FE model.

3.1 A Static Spatial RE Model with Spatially Correlated Disturbances

In this chapter, Equation (6) is extended with a spatial lag of the dependent variable. This extended model is similar to the RE specification used by Egger et al. (2005a). The model is given by

$$y = \delta(I_T \otimes W_N)y + x\beta + u, \tag{15}$$

$$u = \left[I_T \otimes (I_N - \rho M_N)^{-1}\right]\varepsilon. \tag{16}$$

In the remainder of this thesis, this model will be designated by RE. Defining $z \equiv [(I_T \otimes W_N)y, x]$ and $\theta \equiv [\delta, \beta']'$, Equation (15) can be rewritten as

$$y = z\theta + u. \tag{17}$$

As mentioned earlier, the endogeneity problem identified by Brueckner (2003) implies that $E[z'u] \neq 0$, which renders the OLS estimator inconsistent. To correct for spatial error dependence, the estimation procedure follows the three steps identified by KKP. In the remainder of this thesis, we will refer to the estimator defined below as Spatial RE (SRE).

STEP 1: A Consistent Estimator of θ

Following the discussion in Chapters 2.3 and 2.4, we need an appropriate instrument set to obtain a consistent estimator of $\theta$. We use the $NT \times R$ instrument set $H_{1,SRE} \equiv [H_1(1)', \dots, H_1(T)']'$. The number of instruments $R$ should be at least as large as the number of parameters in $\theta$, and the instruments must satisfy the following condition to ensure consistency of the estimator:

$$E[H_{1,SRE}'u] = 0. \tag{18}$$

Using the instrument set $H_{1,SRE}$, the two-step Panel GMM (PGMM) estimator is defined as the value of $\theta$ that minimizes the objective function

$$O_N(\theta) = \left(H_{1,SRE}'u\right)' A_R \left(H_{1,SRE}'u\right), \qquad u = y - z\theta,$$

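where $A_R$ is an $R \times R$ weighting matrix. The two-step nature of the estimator follows from this weighting matrix. In the first step we use $A_R = [H_{1,SRE}'H_{1,SRE}]^{-1}$; in the second step we use $A_R = \hat{S}^{-1}$ with

$$\hat{S} = \sum_{i=1}^{N} H_{1,SRE,i}'\,\hat{u}_i\hat{u}_i'\,H_{1,SRE,i}, \tag{19}$$

where $\hat{u}$ is the residual vector from the first-step estimator of $\theta$, and $\hat{u}_i$ and $H_{1,SRE,i}$ collect the $T$ rows of $\hat{u}$ and $H_{1,SRE}$ that belong to unit $i$. Minimizing $O_N(\theta)$ with respect to $\theta$ gives the PGMM estimator

$$\hat{\theta}_{1,SRE} = \left[z'H_{1,SRE}A_R H_{1,SRE}'z\right]^{-1} z'H_{1,SRE}A_R H_{1,SRE}'y. \tag{20}$$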
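A compact sketch of the two-step PGMM estimator in Equation (20) is given below. It is illustrative only: the data come from Equation (15) with $\rho = 0$ and no random effects, $W_N$ is a hypothetical ring-shaped matrix, the instruments are $[x, (I_T \otimes W_N)x_1, (I_T \otimes W_N)^2 x_1]$ with $x_1$ the non-constant regressor, and the second-step matrix $\hat S$ is computed as a sum over units of $H_i'\hat u_i \hat u_i' H_i$, where $H_i$ and $\hat u_i$ collect the $T$ rows of unit $i$ (an assumption about how $\hat S$ is evaluated). The function names are hypothetical. As a built-in check, with exactly as many instruments as parameters both steps coincide, since the weighting matrix then drops out of Equation (20).

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, delta = 60, 5, 0.4
beta = np.array([1.0, 2.0])

W = np.zeros((N, N))                       # hypothetical ring-shaped W_N
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5
WT = np.kron(np.eye(T), W)                 # I_T kron W_N, rows ordered by time, then unit

x = np.column_stack([np.ones(N * T), rng.standard_normal(N * T)])
y = np.linalg.solve(np.eye(N * T) - delta * WT, x @ beta + rng.standard_normal(N * T))
z = np.column_stack([WT @ y, x])           # z = [(I_T kron W_N) y, x]


def gmm_step(y, z, H, A):
    """Equation (20): [z'H A H'z]^{-1} z'H A H'y."""
    zH = z.T @ H
    return np.linalg.solve(zH @ A @ zH.T, zH @ A @ (H.T @ y))


def pgmm_two_step(y, z, H, N, T):
    theta1 = gmm_step(y, z, H, np.linalg.inv(H.T @ H))   # A_R = (H'H)^{-1}
    u_hat = y - z @ theta1
    S = np.zeros((H.shape[1], H.shape[1]))
    for i in range(N):
        rows = np.arange(T) * N + i                     # the T rows of unit i
        g_i = H[rows].T @ u_hat[rows]
        S += np.outer(g_i, g_i)
    theta2 = gmm_step(y, z, H, np.linalg.inv(S))       # A_R = S^{-1}
    return theta1, theta2


H_over = np.column_stack([x, WT @ x[:, 1], WT @ WT @ x[:, 1]])   # R = 4 > 3
H_exact = np.column_stack([x, WT @ x[:, 1]])                     # R = 3
theta1, theta2 = pgmm_two_step(y, z, H_over, N, T)
e1, e2 = pgmm_two_step(y, z, H_exact, N, T)
print(theta2, np.allclose(e1, e2))
```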

Plugging $\hat{\theta}_{1,SRE}$ into Equation (17) and rewriting gives

$$\hat{u} = y - z\hat{\theta}_{1,SRE}, \tag{21}$$

which is used in the following step to obtain an estimate of $\rho$.

STEP 2: A GMM Estimator of ρ

In the second step, consistent estimates of $\rho$, $\sigma_v^2$ and $\sigma_\mu^2$ are obtained using $\hat{u}$ from Equation (21) and the moment conditions of Equation (13). The procedure used to obtain these estimates is similar to the one discussed in Chapter 2.5.

STEP 3: An SGLS Estimator of θ

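In the third step, the estimate of $\rho$ obtained in the second step is used to transform Equation (17) into

$$\breve{y} = \breve{z}\theta + \varepsilon. \tag{22}$$

Applying the random effects transformation $I_{NT} - \eta P_N$ from Chapter 2.5 then leads to the transformed model

$$\tilde{y} = \tilde{z}\theta + \tilde{\varepsilon}, \tag{23}$$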

where $\tilde{r} = [I_{NT} - \eta P_N]r$ for $r \in \{\breve{y}, \breve{z}, \varepsilon\}$. The SGLS estimator is the PGMM estimator of $\theta$ in Equation (23) and is defined as

$$\hat{\theta}_{2,SRE} = \left[\tilde{z}'H_{2,SRE}A_R H_{2,SRE}'\tilde{z}\right]^{-1}\tilde{z}'H_{2,SRE}A_R H_{2,SRE}'\tilde{y}, \tag{24}$$

where $H_{2,SRE} = [\tilde{H}_3(1)', \dots, \tilde{H}_3(T)']'$, making use of the availability of consistent estimates of $\beta$ and $\delta$ from $\hat{\theta}_{1,SRE}$ and of $\rho$ from $\hat{\gamma}$. The tilde in the definition of $H_{2,SRE}$ indicates that both the spatial and the random effects transformations are applied to the instrument matrix. $A_R$ is defined as for Equation (20), with $H_{2,SRE}$ replacing $H_{1,SRE}$.

The estimate obtained by this consistent estimator can be used in an iterative procedure in which $\hat{\theta}_{1,SRE}$ in Equation (21) is replaced with $\hat{\theta}_{2,SRE}$. The newly obtained estimate of $u$ can be used to obtain new estimates of $\rho$, $\sigma_v^2$ and $\sigma_\mu^2$. These in turn lead to a new estimate $\hat{\theta}_{2,SRE}$, and so on until $\hat{\theta}_{2,SRE}$ has converged. Notice that this is the RE estimator used by Egger et al. (2005a).

3.2 A Static FE Model with Spatially Correlated Disturbances

This chapter introduces FE rather than RE, which is the second extension of the basic KKP model. There are two arguments for considering FE instead of RE: (i) when the unit-specific effects represent omitted variables, it seems reasonable to assume that these are correlated with some of the other regressors in the model (which calls for FE); and (ii) a macroeconomic panel typically includes all units of interest, so that the cross-section cannot be regarded as a random sample from a much larger population, and the unit-specific effect is therefore not a randomly distributed variable.10

Recall from Chapter 2.5 that $\varepsilon(t) = I_N\mu + v(t)$. In an FE model, the elements of µ are assumed to be unobserved unit-specific effects that are potentially correlated with the regressors in z. In the RE model, by contrast, the elements of µ were assumed to be unobserved unit-specific effects that are distributed independently of the regressors.

Consider the following model:

$$y = z\theta + u, \qquad (25)$$

$$u = \left(I_T \otimes [I_N - \rho M_N]^{-1}\right)\varepsilon. \qquad (26)$$

In the remainder of this thesis, this model will be designated by FE. The potential correlation between z and µ makes estimators that ignore it, such as the RE estimator, inconsistent. This is the FE problem mentioned by Brueckner (2003). There are three common solutions to this problem:

1. Include N indicator variables, one for each of the N fixed effects.
2. Apply the within-transformation, that is, subtract from each observation the average over time of its cross-sectional unit.
3. First-difference the model, that is, subtract the one-period lagged observation from each observation.

All three remove the unit-specific effect from the error component and thereby solve the inconsistency. A drawback of the first solution is that many degrees of freedom are lost when N is large. We use the second option, for two reasons. First, with serially uncorrelated v(t) the within-estimator is more efficient than the first-differences estimator. Second, the N fixed effects are not of direct interest, since we are primarily interested in the γ coefficient. The within-transformation matrix

$$Q_N \equiv \left(I_T - \frac{J_T}{T}\right) \otimes I_N,$$

where $J_T$ is the T × T matrix of ones, is used to define

$$\ddot{y} = \ddot{z}\theta + \ddot{u}, \qquad (27)$$

where $\ddot{r} = Q_N r$ for $r \in \{y, z, u\}$. Again we follow the three steps identified by KKP to correct for spatial error dependency.
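The within-transformation in Equation (27) does not require forming $Q_N$ explicitly. The Python sketch below demeans each unit over time. It then checks that this equals multiplication by $Q_N$ and that it removes µ from $\varepsilon(t) = \mu + v(t)$. We assume that NT-vectors are stacked period by period, $[r(1)', \ldots, r(T)']'$, as in the definition of $Q_N$. The helper name `within` is hypothetical.

```python
import numpy as np


def within(r, n, t):
    """Apply Q_N = (I_T - J_T/T) kron I_N to an NT-vector stacked by period."""
    blocks = r.reshape(t, n)                       # row s holds r(s) for all N units
    return (blocks - blocks.mean(axis=0)).ravel()  # subtract each unit's time mean


n, t = 4, 3
rng = np.random.default_rng(0)
mu = rng.normal(size=n)                 # unit-specific effects
v = rng.normal(size=(t, n))             # idiosyncratic errors, row s = v(s)
eps = (mu + v).ravel()                  # eps(s) = mu + v(s), stacked by period

# Explicit Q_N for comparison
q_n = np.kron(np.eye(t) - np.ones((t, t)) / t, np.eye(n))

print(np.allclose(within(eps, n, t), q_n @ eps))                # True: same as Q_N eps
print(np.allclose(within(eps, n, t), within(v.ravel(), n, t)))  # True: mu is removed
```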

STEP 1: A Consistent Estimator for θ

The two-step PGMM estimator using the instruments $H_{1,SFE} = Q_N[H_1(1)', \ldots, H_1(T)']'$ is a consistent estimator of θ. The $Q_N$ matrix in this expression implies that the within-transformation is also applied to the instruments. For every variable, the within-transformation uses the average over time of each cross-sectional unit. The instrument set must therefore satisfy the strong exogeneity assumption

$$E[H_{1,SFE}(s)'\varepsilon(t)] = 0 \quad \text{for } t, s = 1, \ldots, T,$$

to ensure consistency of the estimator. Analogous to Equation (20), this estimator is given by

$$\hat{\theta}_{1,SFE} = \left[\ddot{z}' H_{1,SFE} A_R H_{1,SFE}' \ddot{z}\right]^{-1} \ddot{z}' H_{1,SFE} A_R H_{1,SFE}' \ddot{y},$$

where $A_R$ is defined as in Equation (20), with $H_{1,SFE}$ replacing $H_{1,SRE}$.

Plugging $\hat{\theta}_{1,SFE}$ into Equation (25) and rewriting gives

$$\hat{u} = y - z\hat{\theta}_{1,SFE}.$$

[…] matrix suggested by Kelejian and Robinson (1993). We will refer to this estimator as IV in Chapter 5, which reports the simulation exercise.

STEP 2: A GMM Estimator for ρ

In contrast to the RE model, the distribution of the µ vector in the FE model is unknown. Moreover, the within-transformation removes all variation that can be attributed to µ. The last three moment conditions of Equation (13) concern the structure of µ, so they cannot be used in the FE model. Estimation of ρ therefore relies on the first three conditions only.¹¹ These three conditions rely on the result that $Q_N\varepsilon = Q_N v$, since $Q_N$ annihilates the unit-specific effect. They provide three equations in two unknowns, ρ and $\sigma_v^2$. Estimates of ρ and $\sigma_v^2$ follow from non-linear GMM estimation of this system.
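The Python sketch below illustrates this second step. We assume that the first three conditions of Equation (13) take the KKP (2007) form. In that form, $\varepsilon = \hat{u} - \rho(I_T \otimes M_N)\hat{u}$ and $\bar{\varepsilon} = (I_T \otimes M_N)\varepsilon$, and the three conditions are $\varepsilon'Q_N\varepsilon/N(T-1) = \sigma_v^2$, $\bar{\varepsilon}'Q_N\bar{\varepsilon}/N(T-1) = \sigma_v^2\,\mathrm{tr}(M_N'M_N)/N$ and $\bar{\varepsilon}'Q_N\varepsilon/N(T-1) = 0$, each holding in expectation. The sketch does not call a non-linear solver. Instead, it profiles out $\sigma_v^2$ and searches ρ over a grid. The ring-shaped $M_N$ and the simulated disturbances (which stand in for $\hat{u}$) are hypothetical, and so are all function names.

```python
import numpy as np


def ring_weights(n, k=1):
    """Hypothetical row-normalised weights: units i-k and i+k (mod n) get 0.5 each."""
    m = np.zeros((n, n))
    idx = np.arange(n)
    m[idx, (idx - k) % n] = 0.5
    m[idx, (idx + k) % n] = 0.5
    return m


def within(r, n, t):
    """Q_N r for an NT-vector stacked by period."""
    b = r.reshape(t, n)
    return (b - b.mean(axis=0)).ravel()


def spatial_lag(r, m, n, t):
    """(I_T kron M_N) r for an NT-vector stacked by period."""
    return (r.reshape(t, n) @ m.T).ravel()


def gmm_rho_fe(u_hat, m, n, t, grid=np.linspace(-0.95, 0.95, 381)):
    """Grid-search GMM for (rho, sigma_v^2) from the first three KKP-type moments."""
    u1 = spatial_lag(u_hat, m, n, t)              # (I_T kron M) u
    u2 = spatial_lag(u1, m, n, t)                 # (I_T kron M)^2 u
    c = np.trace(m.T @ m) / n
    scale = n * (t - 1)
    best = (np.inf, None, None)
    for rho in grid:
        e = within(u_hat - rho * u1, n, t)        # Q_N eps(rho)
        e_bar = within(u1 - rho * u2, n, t)       # Q_N eps_bar(rho)
        a1, a2, a3 = e @ e / scale, e_bar @ e_bar / scale, e_bar @ e / scale
        sigma2 = (a1 + c * a2) / (1.0 + c ** 2)   # least-squares sigma_v^2 for this rho
        obj = (a1 - sigma2) ** 2 + (a2 - c * sigma2) ** 2 + a3 ** 2
        if obj < best[0]:
            best = (obj, rho, sigma2)
    return best[1], best[2]


# Simulated disturbances u(t) = (I - rho M)^{-1}(mu + v(t)) with rho = 0.3, sigma_v^2 = 1
rng = np.random.default_rng(0)
n, t, rho_true = 200, 10, 0.3
m = ring_weights(n)
mu = rng.normal(size=n)
v = rng.normal(size=(t, n))
u = np.linalg.solve(np.eye(n) - rho_true * m, (mu + v).T).T.ravel()
rho_hat, sigma2_hat = gmm_rho_fe(u, m, n, t)   # should lie close to 0.3 and 1.0
print(rho_hat, sigma2_hat)
```

Note that µ never enters the moments, because every quadratic form passes through `within`. This is the sense in which only the first three conditions remain usable in the FE model.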

STEP 3: A SGLS Estimator for θ

In the third step we use the estimate of ρ to apply a spatial transformation to the variables in Equation (27), which yields

$$\breve{y} = \breve{z}\theta + \ddot{\varepsilon},$$

where $\breve{r} = (I_T \otimes [I_N - \hat{\rho} M_N])\,\ddot{r}$ for $r \in \{y, z\}$ and $\ddot{\varepsilon} = Q_N\varepsilon$. Because this transformation removes the spatial error dependency, θ can be consistently and efficiently estimated with a PGMM estimator. This estimator uses the instrument set $H_{2,SFE} = Q_N[\breve{H}_3(1)', \ldots, \breve{H}_3(T)']'$, which makes use of the estimates of β and δ from $\hat{\theta}_{1,SFE}$ and of ρ from $\hat{\gamma}$:

$$\hat{\theta}_{2,SFE} = \left[\breve{z}' H_{2,SFE} A_R H_{2,SFE}' \breve{z}\right]^{-1} \breve{z}' H_{2,SFE} A_R H_{2,SFE}' \breve{y},$$

where $A_R$ is defined as in Equation (20), with $H_{2,SFE}$ replacing $H_{1,SRE}$.

Analogous to the models discussed in Chapters 2.5 and 3.1, $\hat{\theta}_{2,SFE}$ can be used in an iterative process until it converges. $\hat{\theta}_{2,SFE}$ is the SGLS estimator used by Egger et al. (2005a, 2005b). We will refer to it as Spatial FE (SFE) in Chapter 5, which reports the simulation exercise.
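The spatial transformation in the third step can be checked numerically. The Python sketch below uses a hypothetical ring-shaped $M_N$ and plugs in the true ρ instead of $\hat{\rho}$. It applies $I_T \otimes [I_N - \rho M_N]$ to the within-transformed disturbances and confirms that only $Q_N\varepsilon = Q_N v$ remains. In other words, the spatial error dependence has been filtered out.

```python
import numpy as np

rng = np.random.default_rng(4)
n, t, rho = 30, 5, 0.3
idx = np.arange(n)
m = np.zeros((n, n))
m[idx, (idx - 1) % n] = 0.5      # hypothetical ring weights M_N
m[idx, (idx + 1) % n] = 0.5
b = np.eye(n) - rho * m          # I_N - rho M_N

mu = rng.normal(size=n)
v = rng.normal(size=(t, n))                     # row s = v(s)
u = np.linalg.solve(b, (mu + v).T).T            # row s = (I - rho M)^{-1}(mu + v(s))


def within(a):
    """Q_N applied to a (T, N) array: subtract each unit's time mean."""
    return a - a.mean(axis=0)


u_within = within(u)                 # within-transformed disturbances
u_breve = u_within @ b.T             # (I_T kron [I_N - rho M_N]) applied period by period
print(np.allclose(u_breve, within(v)))   # True: only Q_N v remains
```

This works because $Q_N$ and $I_T \otimes [I_N - \rho M_N]$ commute, so the order of the within and spatial transformations does not matter.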

4 Dynamic Spatial Panel Data Models

This chapter extends the static spatial FE model of Chapter 3.2 with a one-period time lag of the dependent variable,

$$y(t) = \lambda y(t-1) + \delta W_N y(t) + x(t)\beta + u(t),$$

where λ is the autocorrelation coefficient. Defining $z_d(t) \equiv [y(t-1), W_N y(t), x(t)]$ and $\theta_d \equiv [\lambda, \delta, \beta]'$, the dynamic spatial model can be written as

$$y(t) = z_d(t)\theta_d + u(t), \qquad (28)$$

$$u(t) = [I_N - \rho M_N]^{-1}\varepsilon(t). \qquad (29)$$

In the remainder of this thesis, this model will be designated by DYN. Two endogenous variables, y(t − 1) and $W_N y(t)$, enter the right-hand side of the model. This complicates estimation considerably: the two endogenous variables are highly correlated, which makes it difficult to find instruments that separately identify their coefficients.¹²

It is important to note that the standard static panel data estimators are inconsistent for fixed T if the regressors include a time lag of the dependent variable.¹³ For example, when we account for FE by applying the within-transformation, the transformed model is

$$\ddot{y}(t) = \ddot{z}_d(t)\theta_d + \ddot{u}(t). \qquad (30)$$

If we now treat the time lag of the dependent variable as exogenous, letting it serve as its own instrument, the estimator of this model is

$$\hat{\theta}_{LSDV} = \left[\ddot{z}_d' H_{LSDV} B_R H_{LSDV}' \ddot{z}_d\right]^{-1} \ddot{z}_d' H_{LSDV} B_R H_{LSDV}' \ddot{y},$$

where $H_{LSDV} = [\ddot{y}(t-1), H_{1,SFE}]$ and $B_R = [H_{LSDV}' H_{LSDV}]^{-1}$. Following Kiviet (1995) and Cameron and Trivedi (2005), we label this the least squares dummy variable (LSDV) estimator.

The error term of Equation (30) is $\ddot{u}(t) = [I_N - \rho M_N]^{-1}\ddot{v}(t)$, with $\ddot{v}(t) = v(t) - \bar{v}$, where $\bar{v}$ is the average of v(t) over time. From Equation (28), it follows that y(t) is correlated with v(t). Therefore, y(t − 1) is correlated with v(t − 1) and hence with $\bar{v}$.

This correlation implies that the LSDV estimator is inconsistent for finite T.¹⁴ Devereux et al. (2007) suggest using the second lag of y(t) to instrument y(t − 1). However, y(t − 2) is also correlated with the within-transformed error $\ddot{v}(t)$ (through $\bar{v}$). The estimator used by Devereux et al. (labeled LSDV in the simulation exercise) is therefore also inconsistent.

¹² Peter Egger informed me that theoretically the estimator exists and is identified. Empirically, the issue

To our knowledge, the only article that formally investigates the estimation of $\theta_d$ is Elhorst (2007). However, Elhorst does not consider spatial error dependence: he assumes u(t) = µ + v(t). Furthermore, he uses an ML estimator of $\theta_d$ based on the unconditional likelihood function of the model. Elhorst finds that a combination of the LSDV estimator and the ML estimator gives a superior estimate of $\theta_d$. The LSDV estimator proves to be a remarkably accurate (although inconsistent) estimator of δ but not of λ. The ML estimator, conversely, proves to be an accurate estimator of λ but not of δ. The combination is fairly accurate for both. Besides the LSDV and ML estimators, Elhorst's simulation exercise includes a GMM estimator that is equivalent to the estimator used by Jacobs et al. (2007). He finds that both the ML estimator and the combined LSDV/ML estimator outperform the GMM estimator.
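The fixed-T inconsistency of the LSDV estimator discussed above, and in particular its poor estimate of λ, is easy to reproduce. The Python sketch below is a deliberately simplified, non-spatial illustration. It sets δ = ρ = 0 and drops x; these are assumptions of the example, not of the model in Equation (28). The sketch simulates $y_{it} = \lambda y_{i,t-1} + \mu_i + v_{it}$ with λ = 0.5 and T = 5. It then applies the within estimator with y(t − 1) treated as exogenous.

```python
import numpy as np

rng = np.random.default_rng(2)
n, t, lam, burn = 2000, 5, 0.5, 50
mu = rng.normal(size=n)                       # fixed effects
y = np.zeros((burn + t + 1, n))
for s in range(1, burn + t + 1):
    y[s] = lam * y[s - 1] + mu + rng.normal(size=n)
y = y[-(t + 1):]                              # keep y(0), ..., y(T)
y_now, y_lag = y[1:], y[:-1]                  # y(t) and y(t-1) for t = 1, ..., T

# Within transformation over t = 1, ..., T, then OLS of y(t) on y(t-1)
y_now_w = y_now - y_now.mean(axis=0)
y_lag_w = y_lag - y_lag.mean(axis=0)
lam_lsdv = (y_lag_w * y_now_w).sum() / (y_lag_w ** 2).sum()
print(round(lam_lsdv, 3))                     # noticeably below the true 0.5
```

Increasing N does not remove this downward bias, whereas increasing `t` shrinks it. This is consistent with the estimator being inconsistent for fixed T.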

The remainder of this chapter discusses two consistent GMM estimators for the model in Equation (28). The first uses the GMM estimator suggested by Arellano and Bond (1991) and is equivalent to the estimator used by Jacobs et al. (2007). In the simulation exercise we will name this estimator AB. It relies on first-differencing the data to eliminate the unit-specific effects from u(t).¹⁵ First-differencing the model in Equation (28) yields

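$$y(t) - y(t-1) = [z_d(t) - z_d(t-1)]\theta_d + [I_N - \rho M_N]^{-1}[\varepsilon(t) - \varepsilon(t-1)], \qquad (31)$$

$$\Delta y(t) = \Delta z_d(t)\theta_d + [I_N - \rho M_N]^{-1}\Delta v(t), \qquad (32)$$

where $\Delta r(t) = r(t) - r(t-1)$ for $r \in \{y, z_d, v\}$. Equation (32) follows because $u(t) = [I_N - \rho M_N]^{-1}[\mu + v(t)]$, so that µ drops out of $\varepsilon(t) - \varepsilon(t-1)$.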

The estimator defined by Arellano and Bond has the same form as the two-step PGMM estimators used in Chapters 3.1 and 3.2:

$$\hat{\theta}_{AB} = \left[\Delta z_d' H_{AB} A_{F,AB} H_{AB}' \Delta z_d\right]^{-1} \Delta z_d' H_{AB} A_{F,AB} H_{AB}' \Delta y,$$

where the innovation of Arellano and Bond relative to earlier PGMM estimators lies in the weighting matrix $A_{F,AB}$ and the instrument matrix $H_{AB}$.

$A_{F,AB}$ is defined as an F × F weighting matrix, where F is the total number of moment conditions in the dynamic model. In the first step we use $A_{F,AB} = [H_{AB}' G H_{AB}]^{-1}$, where $G = K \otimes I_N$ and

$$K_{ij} = \begin{cases} 2 & \text{if } i = j, \\ -1 & \text{if } i = j + 1 \text{ or } j = i + 1, \\ 0 & \text{otherwise.} \end{cases}$$

¹⁵ Using the first-differencing operation to remove the unit-specific effect instead of the within-

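In the second step we use $A_{F,AB} = \hat{S}^{-1}$, where

$$\hat{S} = \sum_{i=1}^{N} H_{AB,i}' \Delta\hat{u}_i \Delta\hat{u}_i' H_{AB,i},$$

with $H_{AB,i}$ and $\Delta\hat{u}_i$ the rows of $H_{AB}$ and of the first-step residuals in first differences that belong to unit i.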
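The first-step matrix $G = K \otimes I_N$ is not arbitrary. If v(t) is serially uncorrelated with variance $\sigma_v^2$, the first differences Δv(3), ..., Δv(T) of one unit have covariance matrix $\sigma_v^2 K$. The Python sketch below builds K and G and verifies that $K = DD'$, where D is the first-difference operator. The helper names are hypothetical.

```python
import numpy as np


def k_matrix(m):
    """m x m matrix with 2 on the diagonal and -1 on the first off-diagonals."""
    return 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)


def difference_operator(t):
    """(T-2) x (T-1) matrix D with D @ [v(2), ..., v(T)] = [dv(3), ..., dv(T)]."""
    d = np.zeros((t - 2, t - 1))
    for i in range(t - 2):
        d[i, i], d[i, i + 1] = -1.0, 1.0
    return d


t, n = 5, 3
k = k_matrix(t - 2)
d = difference_operator(t)
g = np.kron(k, np.eye(n))          # G = K kron I_N, one row/column per (period, unit)
print(np.allclose(d @ d.T, k))     # True: Var(dv) = sigma_v^2 * K
```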

For the instrument matrix $H_{AB}$, Arellano and Bond suggest using the levels of the dependent variable, y(1), ..., y(t − 2), as instruments for the lagged dependent variable in first differences, y(t − 1) − y(t − 2). These instruments are appropriate because, through serial dependence, they are correlated with y(t − 1) − y(t − 2). At the same time, they are uncorrelated with the error term in first differences, v(t) − v(t − 1). To see this, note that y(t − 2) is correlated with v(t − 2), ..., v(0), but not with v(t) and v(t − 1). This leads to the moment conditions

$$E[y(t-s)'\Delta v(t)] = 0, \qquad (33)$$

for t = 3, ..., T and s = 2, ..., t − 1. These moment conditions rely on the so-called weak exogeneity assumption that the error term v(t) is uncorrelated with all observations prior to t. Equation (33) defines up to (T − 2)(T − 1)/2 instruments for the time lag of the dependent variable.

Besides the instruments for the lagged dependent variable, we can use $H_1(t)$, which contains the instruments for the remaining variables. Taking both together gives $H_{AB}(t) = [y(1), \ldots, y(t-2), H_1(t)]$. The number of instruments in $H_{AB}(t)$ therefore grows by one with each additional time period. $H_{AB}$ is block-diagonal with T − 2 blocks, given by $H_{AB}(t)$ for t = 3, ..., T. In total it contains

$$F = \frac{(T-2)(T-1)}{2} + (T-2)R$$

instruments, where R refers to the number of instruments defined for the static model to instrument the variables $W_N y(t)$ and x(t). For T = 5, the matrix is

$$H_{AB} = \begin{bmatrix} y(1) & H_1(3) & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & y(1) & y(2) & H_1(4) & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & y(1) & y(2) & y(3) & H_1(5) \end{bmatrix}.$$
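The Python sketch below shows how the block-diagonal $H_{AB}$ can be assembled for a single unit and checks the column count F = (T − 2)(T − 1)/2 + (T − 2)R for T = 5 and R = 1. The storage layout is hypothetical: the unit's y(1), ..., y(T) sit in a vector, and $H_1(t)$ sits in row t − 1 of an array.

```python
import numpy as np


def ab_instruments(y_i, h1_i):
    """Arellano-Bond instrument matrix for one unit.

    y_i  : (T,) array holding y(1), ..., y(T)
    h1_i : (T, R) array whose row t-1 holds H_1(t)
    Row t-3 (t = 3, ..., T) holds [y(1), ..., y(t-2), H_1(t)] in its own column block.
    """
    t_len = len(y_i)
    blocks = [np.concatenate([y_i[: t - 2], h1_i[t - 1]]) for t in range(3, t_len + 1)]
    h = np.zeros((t_len - 2, sum(len(b) for b in blocks)))
    col = 0
    for row, b in enumerate(blocks):
        h[row, col:col + len(b)] = b
        col += len(b)
    return h


t_len, r = 5, 1
y_i = np.arange(1.0, t_len + 1)                         # y(t) = t, for readability
h1_i = np.arange(10.0, 10.0 + t_len).reshape(t_len, r)  # H_1(t) = 9 + t
h = ab_instruments(y_i, h1_i)
print(h.shape)  # (3, 9): (T-2)(T-1)/2 + (T-2)R = 6 + 3 columns
```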

Next, a second consistent estimator (DYN) is defined. This estimator has not been used before. It follows the three steps identified in Chapter 2.5 to obtain an efficient estimate of $\theta_d$ in the presence of spatial error dependency, and it uses an improved PGMM estimator suggested by Blundell and Bond (1998). Each of the steps is discussed below.

STEP 1: A Consistent Estimator for θ

In the first step, we use the improved PGMM estimator suggested by Blundell and Bond (1998) to obtain an estimate of u(t). Blundell and Bond suggest using an additional set of moment conditions. These conditions rest on an appealing insight. Arellano and Bond (1991) use the levels of the variables as instruments for the first-differenced variables, but one can equally use the first differences of the variables as instruments for the variables in levels.

Blundell and Bond propose to use both the observations in levels and the observations in first differences. More specifically, they propose stacking the models in levels and in first differences into one overall regression model:

$$y_{BL} = z_{d,BL}\theta_d + u_{BL},$$

where $r_{BL} = [\Delta r', r']'$ for $r \in \{y, z_d, u\}$. This model therefore consists of almost 2NT observations. The increased number of observations, and the corresponding number of instruments, improves estimation quality. The corresponding PGMM estimator is

$$\hat{\theta}_{1,DYN} = \left[z_{d,BL}' H_{1,DYN} A_{F,DYN} H_{1,DYN}' z_{d,BL}\right]^{-1} z_{d,BL}' H_{1,DYN} A_{F,DYN} H_{1,DYN}' y_{BL},$$

where $H_{1,DYN}$ is a block-diagonal matrix with two blocks. The upper-left block, $H_{fdif}$, contains the instruments for the model in first differences. The lower-right block, $H_{level}$, contains the instruments for the model in levels. Given an appropriate choice of $H_{1,DYN}$, this yields a consistent estimator of $\theta_d$. The block-diagonal structure of $H_{1,DYN}$ ensures that the instruments intended for the levels part of the equation do not interact with the first-differenced variables, and vice versa. $A_{F,DYN}$ is an F × F weighting matrix defined analogously to $A_R$ in Equation (20), with $H_{1,DYN}$ replacing $H_{1,SRE}$; here F denotes the number of dynamic moment conditions. The structure of $H_{1,DYN}$ ensures that $A_{F,DYN}$ is also block-diagonal.¹⁶

The choice of instruments in $H_{fdif}$ is based on the moment conditions

$$E[y(t-s)'\Delta v(t)] = 0, \quad E[(W_N y(t-s))'\Delta v(t)] = 0, \quad E[H_1(t)'\Delta v(t)] = 0,$$

and the instruments in $H_{level}$ are based on

$$E[v(t-s)'\Delta y(t)] = 0, \quad E[v(t-s)'W_N\Delta y(t)] = 0, \quad E[v(t-s)'\Delta H_1(t)] = 0,$$

for t = 3, ..., T and s = 2, ..., t − 1. These moment conditions are translated into instruments slightly differently from the original Arellano and Bond instruments. We assume that the moment conditions do not depend on t and, for each lag s, combine the instruments used by Arellano and Bond into one vector. As a result, the matrix is no longer block-diagonal. For T = 5, examples of these matrices are

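$$H_{fdif} = \begin{bmatrix} y(1) & W_N y(1) & 0 & 0 & 0 & 0 & H_1(1) \\ y(2) & W_N y(2) & y(1) & W_N y(1) & 0 & 0 & H_1(2) \\ y(3) & W_N y(3) & y(2) & W_N y(2) & y(1) & W_N y(1) & H_1(3) \end{bmatrix}$$

and

$$H_{level} = \begin{bmatrix} \Delta y(1) & W_N \Delta y(1) & 0 & 0 & 0 & 0 & \Delta H_1(1) \\ \Delta y(2) & W_N \Delta y(2) & \Delta y(1) & W_N \Delta y(1) & 0 & 0 & \Delta H_1(2) \\ \Delta y(3) & W_N \Delta y(3) & \Delta y(2) & W_N \Delta y(2) & \Delta y(1) & W_N \Delta y(1) & \Delta H_1(3) \end{bmatrix}.$$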

The intuition behind this change is twofold. First, the change results in instruments that are more strongly correlated with the time lag of the dependent variable and therefore yield a better estimate of λ. Although not reported, the simulations performed confirm this. Second, there is a trade-off in the number of instruments. Arellano and Bond obtain (T − 2)(T − 1)/2 instruments for the time lag of the dependent variable, whereas the new approach yields only T − 2 instruments for each of the two endogenous variables. This smaller instrument set is the price paid for the stronger correlation. As T grows, however, it may well become an advantage. The number of instruments following from Equation (33) grows quadratically in T and includes numerous almost identical variables, which leads to serious multicollinearity problems.
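To make the instrument count concrete, the Python sketch below builds the collapsed lag instruments for y for one unit, that is, the y-columns of $H_{fdif}$ above. It then compares the number of instruments for the lagged dependent variable under the two schemes: (T − 2)(T − 1)/2 for Arellano and Bond, and T − 2 for the collapsed version. The function names are hypothetical.

```python
import numpy as np


def collapsed_lag_instruments(y_i):
    """Row t-3 (t = 3, ..., T), column s-2 (s = 2, ..., t-1) holds y(t-s)."""
    t_len = len(y_i)
    h = np.zeros((t_len - 2, t_len - 2))
    for t in range(3, t_len + 1):
        for s in range(2, t):
            h[t - 3, s - 2] = y_i[t - s - 1]   # y(t-s) sits at 0-based index t-s-1
    return h


def instrument_counts(t_len):
    """(Arellano-Bond count, collapsed count) for the lagged dependent variable."""
    return (t_len - 2) * (t_len - 1) // 2, t_len - 2


print(collapsed_lag_instruments(np.arange(1.0, 6.0)))  # rows [y(1)], [y(2), y(1)], [y(3), y(2), y(1)]
for t_len in (5, 10, 20):
    print(t_len, instrument_counts(t_len))   # AB count grows quadratically, collapsed linearly
```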

Using $H_{fdif}$ and $H_{level}$, we create $H_{1,DYN}$. Using $\hat{\theta}_{1,DYN}$, we obtain an estimate of u:

$$\hat{u} = y - z_d\hat{\theta}_{1,DYN}.$$

STEP 2: A GMM Estimator for ρ

Note that the model in Equation (28) is an FE model. This implies that, analogous to Chapter 3.2, a consistent estimate of ρ can be obtained from $\hat{u}$ and the first three lines of Equation (13).

STEP 3: A SGLS Estimator for θ

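Employing $\hat{\rho}$, the SGLS estimator of $\theta_d$ is defined as

$$\hat{\theta}_{2,DYN} = \left[\breve{z}_{d,BL}' H_{2,DYN} A_{F,DYN} H_{2,DYN}' \breve{z}_{d,BL}\right]^{-1} \breve{z}_{d,BL}' H_{2,DYN} A_{F,DYN} H_{2,DYN}' \breve{y}_{BL},$$

where $\breve{r} = (I_T \otimes [I_N - \hat{\rho} M_N])\,r$ for $r \in \{y_{BL}, z_{d,BL}\}$. With respect to the choice of instruments, $H_{2,DYN}$ is defined analogously to $H_{1,DYN}$, with $H_3$ replacing $H_1$ in the definitions of $H_{fdif}$ and $H_{level}$, respectively. Again, $\hat{\theta}_{2,DYN}$ can be used in an iterative procedure until $\hat{\theta}_{2,DYN}$ converges.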

5 Simulation

This chapter presents the results of the simulation exercise. After the description of the Monte Carlo experiment, the small sample properties of the different estimators are reported and the research questions will be answered.

5.1 Setup

The setup of the Monte Carlo experiment follows Baltagi et al. (2006) and Elhorst (2007). We report the small-sample properties of the estimators on two differently generated data sets. The first data set is generated according to the FE model given by Equations (25)–(26). The second is generated according to the DYN model given by Equations (28)–(29).¹⁷

In generating the data, the following procedure is applied: (i) generate the covariate matrix; (ii) generate the error component; and (iii) create the dependent variable. For the discussion that follows, it is important to distinguish between the error component, u(t) in Equations (15), (25) and (28), and the random component v(t) in ε(t) = µ + v(t). Both the random component v(t) and the unit-specific component µ are part of the error component u(t).

The first step in the DGP is the generation of the covariate matrix x. In both models (FE and DYN) we include only one exogenous variable in this covariate matrix, that is, x is an NT × 1 vector. Following Baltagi et al. (2007), the exogenous variable x is generated by $x_{it} = \varsigma_i + z_{it}$, with $\varsigma_i \sim$ i.i.d. U[−7.5, 7.5] and $z_{it} \sim$ i.i.d. U[−5, 5]. Here $\varsigma_i$ corresponds to the unit-specific component of x and $z_{it}$ to the random component of x. Both are drawn from a uniform distribution (U).

The second step in the DGP is the generation of the error component u. For the FE and DYN models, the unit-specific effect should be correlated with x. Several considerations lead to the final specification $\mu = s\phi\varsigma/7.5$, whose elements are discussed briefly below. In a DGP following the FE model, we must model a correlation between x and µ to ensure that the Spatial RE (SRE) estimator is inconsistent. Therefore, we include ς (the unit-specific part of x) in the expression used to generate µ. The parameter φ controls the relative importance of the unit-specific and random components: an increase in φ increases the proportion of the variance due to the unit-specific component. The division by 7.5 ensures that ς/7.5 is uniformly distributed on [−1, 1], so that the unit-specific component and the random component have comparable absolute sizes. Finally, the unit-specific component is multiplied by s, which regulates the total size of the error component.¹⁸
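The first two DGP steps can be sketched in Python as follows. The code labels two assumptions. First, it reads the garbled bounds of $z_{it}$ as U[−5, 5]. Second, it stores x as a T × N array, with rows as periods. The function name is hypothetical. The sketch leaves out v(t), whose distribution is not given in this section.

```python
import numpy as np


def covariate_and_effect(n, t, s=5.0, phi=0.5, rng=None):
    """Return x (T x N) and mu (N,) following x_it = varsigma_i + z_it, mu = s*phi*varsigma/7.5."""
    rng = np.random.default_rng() if rng is None else rng
    varsigma = rng.uniform(-7.5, 7.5, size=n)   # unit-specific part of x
    z = rng.uniform(-5.0, 5.0, size=(t, n))     # random part of x (bounds assumed: U[-5, 5])
    x = varsigma + z                            # broadcasts varsigma over periods
    mu = s * phi * varsigma / 7.5               # varsigma/7.5 ~ U[-1, 1]
    return x, mu


x, mu = covariate_and_effect(n=48, t=5, rng=np.random.default_rng(5))
# mu is bounded by s*phi = 2.5 and strongly correlated with the unit means of x
print(np.abs(mu).max() <= 2.5, np.corrcoef(x.mean(axis=0), mu)[0, 1] > 0.5)
```

This correlation between µ and the unit means of x is exactly what makes the SRE estimator inconsistent under the FE DGP.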

The third step in the DGP is the generation of the dependent variable y(t) and of its spatial and time lags, $W_N y(t)$ and y(t − 1), respectively. For the FE model, y(t) is generated according to the reduced form of Equation (25), that is,

$$y(t) = [I_N - \delta W_N]^{-1}\left[x(t)\beta + (I_N - \rho M_N)^{-1}(\mu + v(t))\right], \qquad (34)$$

for t = 1, ..., T. The spatial lag of the dependent variable is obtained by multiplying y(t) by the weighting matrix $W_N$.
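The Python sketch below implements Equation (34). The US state weighting matrices are not reproduced here, so we substitute hypothetical ring-shaped matrices: first-order neighbours for $W_N$ and second-order neighbours for $M_N$. We also draw v(t) from a standard normal purely as a placeholder. Both are assumptions of the example. The closing check confirms that the generated y(t) satisfies the structural form $y(t) = \delta W_N y(t) + x(t)\beta + u(t)$.

```python
import numpy as np


def ring_weights(n, k):
    """Hypothetical row-normalised weights: neighbours i-k and i+k (mod n), 0.5 each."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx - k) % n] = 0.5
    w[idx, (idx + k) % n] = 0.5
    return w


def generate_fe(x, mu, v, w, m, beta=1.0, delta=0.5, rho=0.3):
    """Eq. (34): y(t) = [I - delta W]^{-1}[x(t) beta + (I - rho M)^{-1}(mu + v(t))].

    x, v : (T, N) arrays with one row per period; mu : (N,).
    """
    n = w.shape[0]
    u = np.linalg.solve(np.eye(n) - rho * m, (mu + v).T).T          # u(t), row per period
    y = np.linalg.solve(np.eye(n) - delta * w, (x * beta + u).T).T
    return y, u


rng = np.random.default_rng(6)
n, t = 48, 5
w, m = ring_weights(n, 1), ring_weights(n, 2)        # W_N != M_N
varsigma = rng.uniform(-7.5, 7.5, size=n)
x = varsigma + rng.uniform(-5.0, 5.0, size=(t, n))
mu = 5.0 * 0.5 * varsigma / 7.5
v = rng.normal(size=(t, n))                          # placeholder distribution for v(t)
y, u = generate_fe(x, mu, v, w, m)
wy = y @ w.T                                         # spatial lag W_N y(t), row per period
print(np.allclose(y, 0.5 * wy + x + u))              # True: structural form holds
```

Solving the two linear systems avoids forming the inverses in Equation (34) explicitly.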

For the DYN model the DGP is somewhat more complicated. Following Elhorst (2007), the process is given by

$$y(t) = [I_N - \delta W_N]^{-1}\left[\lambda y(t-1) + x(t)\beta + (I_N - \rho M_N)^{-1}(\mu + v(t))\right], \qquad (35)$$

for t = 2, ..., 100 and y(1) = µ. Afterwards, the first 100 − T observations are dropped, leaving the final T observations. The implicit assumption is that these final T observations are not significantly influenced by the initial values of y.¹⁹
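A Python sketch of Equation (35) with the 100-period burn-in follows. As in the static case, we use hypothetical ring-shaped $W_N$ and $M_N$ and a standard-normal placeholder for v(t). These are assumptions of the example. The final check verifies the structural form $y(t) = \delta W_N y(t) + \lambda y(t-1) + x(t)\beta + u(t)$ for the last period.

```python
import numpy as np


def ring_weights(n, k):
    """Hypothetical row-normalised weights: neighbours i-k and i+k (mod n), 0.5 each."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx - k) % n] = 0.5
    w[idx, (idx + k) % n] = 0.5
    return w


def generate_dyn(x, mu, v, w, m, t_keep, lam=0.3, beta=1.0, delta=0.5, rho=0.3):
    """Eq. (35) run over all rows of x and v (here 100 periods) with y(1) = mu.

    Returns the last t_keep periods of y and the corresponding y(t-1).
    """
    n = w.shape[0]
    a = np.eye(n) - delta * w
    b = np.eye(n) - rho * m
    y = np.zeros_like(x)
    y[0] = mu                                             # y(1) = mu
    for s in range(1, x.shape[0]):                        # t = 2, ..., 100
        u = np.linalg.solve(b, mu + v[s])                 # (I - rho M)^{-1}(mu + v(t))
        y[s] = np.linalg.solve(a, lam * y[s - 1] + beta * x[s] + u)
    return y[-t_keep:], y[-t_keep - 1:-1]


rng = np.random.default_rng(7)
n, periods, t_keep = 48, 100, 5
w, m = ring_weights(n, 1), ring_weights(n, 2)
varsigma = rng.uniform(-7.5, 7.5, size=n)
x = varsigma + rng.uniform(-5.0, 5.0, size=(periods, n))
mu = 5.0 * 0.5 * varsigma / 7.5
v = rng.normal(size=(periods, n))                         # placeholder distribution for v(t)
y, y_lag = generate_dyn(x, mu, v, w, m, t_keep)

u_last = np.linalg.solve(np.eye(n) - 0.3 * m, mu + v[-1])
print(np.allclose(y[-1], 0.5 * w @ y[-1] + 0.3 * y_lag[-1] + x[-1] + u_last))  # True
```

With λ = 0.3 and δ = 0.5, the effective autoregressive matrix $\lambda[I_N - \delta W_N]^{-1}$ has spectral radius 0.6. The influence of y(1) = µ therefore dies out well within the 95 discarded periods.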

We use two different weighting matrices, $W_N \neq M_N$. Both are 48 × 48 matrices (N = 48), as they are constructed from US state-level data.²⁰

In each of the tables in Section 5.2 below we use the following benchmark parameter values, each with a short justification:

(i) N = 48, which follows from the choice of weighting matrices;
(ii) T = 5, the time span considered by both Baltagi et al. and Elhorst (2007);
(iii) s = 5, to ensure R² ≈ 0.65;
(iv) φ = 0.5, which divides the error component equally between the unit-specific component and the random component;
(v) ρ = 0.3, substantially different from zero;
(vi) λ = 0.3, substantially different from zero but not too large, since stationarity requires |λ| + |δ| < 1 and some room must be left for δ;
(vii) δ = 0.5, substantially different from zero and from ρ (to avoid identification problems), and not too large (to satisfy the stationarity condition and leave some room for λ);
(viii) β = 1, standard in the literature.

Following Elhorst (2007), all results are based on 1000 replications.

5.2 Simulation Results

We now report the results of the simulation exercise. We compare the performance of the different estimators on both models to draw conclusions regarding the three research questions formulated in the introduction of this thesis.

¹⁸ […] However, as this thesis is not intended to explore the exact reason for the inconsistency of the SRE estimator, we decided to keep things simple and adopt the specification for µ mentioned in the main text.

¹⁹ The robustness of the results with respect to changes in the initial values has been checked.

We measure the performance of the estimators with the root mean square error (RMSE). Following KKP, we define the RMSE as

$$RMSE = \left[\text{bias}^2 + \left(\frac{IQ}{1.35}\right)^2\right]^{1/2},$$

where the bias is the difference between the median of the estimates and the true value of the parameter. IQ is the interquartile range, defined as $c_1 - c_2$, where $c_1$ is the 0.75 quantile and $c_2$ is the 0.25 quantile. For a normal distribution, IQ/1.35 is close to the standard deviation of the estimate.²¹ The RMSE combines the bias of the estimator with a measure of the dispersion of the estimates to judge the overall quality of the estimator. For example, an estimator may have a low bias, with the median of the estimates close to the true value. It may nevertheless have a high RMSE because the estimates are widely dispersed. Besides the RMSE, we also report the bias.
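The RMSE measure above is straightforward to compute from a vector of Monte Carlo estimates. The Python sketch below shows one way; the function name `kkp_rmse` and the simulated draws are hypothetical. It uses NumPy's default linear interpolation for the quantiles, which is an assumption, since the thesis does not state which quantile definition was used.

```python
import numpy as np


def kkp_rmse(estimates, true_value):
    """Return (bias, RMSE) with bias = median - true value and RMSE = sqrt(bias^2 + (IQ/1.35)^2)."""
    est = np.asarray(estimates, dtype=float)
    bias = np.median(est) - true_value
    iq = np.quantile(est, 0.75) - np.quantile(est, 0.25)   # interquartile range c1 - c2
    return bias, np.sqrt(bias ** 2 + (iq / 1.35) ** 2)


# For normally distributed estimates the RMSE should be close to sqrt(bias^2 + sd^2)
rng = np.random.default_rng(8)
draws = 0.5 + 0.05 + 0.1 * rng.normal(size=100_000)   # true value 0.5, bias 0.05, sd 0.1
bias, rmse = kkp_rmse(draws, 0.5)
print(round(bias, 3), round(rmse, 3))
```

Because the median and the interquartile range are used, a few extreme replications (for example near-singular first-step estimates) do not dominate the measure, as they would with a mean-based RMSE.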

In each of the tables below we report, for each estimator and each parameter, both the bias and the RMSE (in parentheses). In each table, the left panel reports the performance for different numbers of time periods (T), and the right panel the performance for different values of the spatial autoregressive coefficient (ρ). Each of the research questions is answered separately below.

Question 1: The performance of the dynamic estimators

This subsection answers the question: how do the performances of the AB, LSDV and DYN estimators compare when the DGP follows the DYN model? Table 2 reports the findings. Recall that only the AB and DYN estimators are theoretically consistent, whereas the LSDV estimator is inconsistent. We first discuss the left panel, with variation in the time dimension. Overall, the DYN estimator outperforms the AB estimator. DYN strictly outperforms AB with respect to the parameters λ and δ, always with a lower RMSE. With respect to β, the RMSEs of the two estimators are very similar; AB reports a marginally lower RMSE only for T = 10. In accordance with the findings of Elhorst (2007), the inconsistent LSDV estimator yields a reliable estimate of δ (RMSE ≤ 0.070 for T ≥ 10). The LSDV estimate of β also has the smallest RMSE of the three estimators for T ≥ 10. Overall, for T ≤ 10 the DYN estimator seems to be the superior estimator. For T ≥ 15, the inconsistent LSDV estimator provides good estimates of β and δ, but not of λ.

[…] values of ρ, as it reports the lowest RMSE. Interestingly, the RMSE tends to increase with ρ, which implies lower efficiency for larger values of ρ. This is remarkable for the DYN estimator, which is designed to eliminate the spatial error dependency problem. To explain it, we need to understand the effect of an increase in ρ.

An increase in ρ increases the elements of $(I_N - \rho M_N)^{-1}$ in Equations (34) and (35), for ρ ≥ 0 and a non-negative $M_N$. Because the sizes of x(t), µ and v(t) remain unchanged, the relative size of the error component increases, implying relatively more noise in the model. This makes first-round estimation marginally more difficult, so the first-round estimates of λ, δ and β are less accurate. Consequently, the optimal instrument is weaker, which in turn affects second-round estimation.

Table 2: Performance of dynamic estimators (entries: bias, with RMSE in parentheses)

Left panel: time dimension (T)

Estimator  Param  T = 5            T = 10           T = 15           T = 20
AB         λ      -0.044 (0.083)   -0.044 (0.057)   -0.051 (0.057)   -0.054 (0.058)
           δ       0.129 (0.165)    0.142 (0.153)    0.164 (0.170)    0.178 (0.182)
           β      -0.029 (0.088)   -0.034 (0.056)   -0.040 (0.053)   -0.043 (0.053)
LSDV       λ       0.015 (0.121)   -0.119 (0.157)   -0.083 (0.108)   -0.063 (0.083)
           δ       0.008 (0.111)   -0.004 (0.070)   -0.002 (0.052)   -0.002 (0.047)
           β       0.051 (0.088)   -0.005 (0.043)   -0.005 (0.034)   -0.004 (0.029)
DYN        λ       0.029 (0.059)    0.030 (0.045)    0.029 (0.041)    0.030 (0.042)
           δ      -0.025 (0.068)   -0.040 (0.059)   -0.044 (0.056)   -0.045 (0.057)
           β       0.048 (0.084)    0.042 (0.061)    0.039 (0.053)    0.040 (0.052)
           ρ      -0.008 (0.127)    0.009 (0.079)    0.017 (0.062)    0.015 (0.055)

Right panel: spatial autoregressive parameter (ρ)

Estimator  Param  ρ = -0.4         ρ = -0.2         ρ = 0            ρ = 0.2          ρ = 0.4
AB         λ      -0.024 (0.066)   -0.027 (0.069)   -0.031 (0.072)   -0.038 (0.078)   -0.053 (0.089)
           δ       0.042 (0.114)    0.059 (0.117)    0.080 (0.130)    0.110 (0.150)    0.153 (0.189)
           β      -0.011 (0.089)   -0.013 (0.085)   -0.017 (0.083)   -0.024 (0.086)   -0.035 (0.093)
LSDV       λ       0.018 (0.108)    0.020 (0.106)    0.017 (0.109)    0.016 (0.117)    0.009 (0.131)
           δ      -0.008 (0.090)   -0.005 (0.092)   -0.002 (0.098)    0.004 (0.105)    0.012 (0.120)
           β       0.059 (0.096)    0.056 (0.092)    0.052 (0.089)    0.052 (0.087)    0.051 (0.089)
DYN        λ       0.017 (0.054)    0.022 (0.056)    0.025 (0.058)    0.028 (0.060)    0.030 (0.060)
           δ      -0.026 (0.061)   -0.028 (0.062)   -0.029 (0.065)   -0.027 (0.066)   -0.023 (0.069)
           β       0.042 (0.083)    0.045 (0.083)    0.047 (0.084)    0.049 (0.085)    0.049 (0.085)
           ρ       0.065 (0.169)    0.045 (0.161)    0.021 (0.147)    0.003 (0.135)   -0.022 (0.119)

Question 2: The performance of the static estimators

This subsection answers the question: how do the performances of the IV, SFE and SRE estimators compare when the DGP follows the FE model? Table 3 reports the results. Again, we first discuss the left panel, with variation over the time dimension (T). Recall that the SRE estimator is inconsistent when the DGP follows the FE model. This is reflected in its estimation quality: for β, the SRE estimator reports a relatively high RMSE. Its RMSE is more than three times that of the SFE and IV estimators for T = 5 and more than five times for T ≥ 10.

[…] more accurate in terms of a lower bias. The right panel reports the effect of changing the parameter ρ and shows that these results hold for all values of ρ.

Table 3: The performance of the static estimators (entries: bias, with RMSE in parentheses)

Left panel: time dimension (T)

Estimator  Param  T = 5            T = 10           T = 15           T = 20
IV         δ       0.016 (0.094)    0.006 (0.057)    0.004 (0.049)    0.002 (0.043)
           β      -0.005 (0.059)   -0.002 (0.036)   -0.002 (0.030)   -0.001 (0.027)
SFE        δ       0.011 (0.094)    0.005 (0.059)    0.004 (0.049)    0.002 (0.042)
           β      -0.001 (0.058)    0.000 (0.034)   -0.001 (0.030)    0.000 (0.026)
           ρ      -0.023 (0.142)   -0.004 (0.094)   -0.002 (0.073)   -0.004 (0.066)
SRE        δ       0.015 (0.049)    0.013 (0.038)    0.013 (0.034)    0.010 (0.033)
           β       0.211 (0.214)    0.200 (0.202)    0.190 (0.192)    0.181 (0.183)
           ρ      -0.029 (0.118)   -0.025 (0.078)   -0.027 (0.066)   -0.026 (0.060)

Right panel: spatial autoregressive coefficient (ρ)

Estimator  Param  ρ = -0.4         ρ = -0.2         ρ = 0            ρ = 0.2          ρ = 0.4
IV         δ       0.006 (0.080)    0.007 (0.081)    0.010 (0.083)    0.013 (0.089)    0.018 (0.100)
           β       0.000 (0.065)   -0.001 (0.061)   -0.003 (0.059)   -0.005 (0.058)   -0.005 (0.060)
SFE        δ       0.009 (0.072)    0.009 (0.076)    0.009 (0.083)    0.011 (0.090)    0.014 (0.100)
           β      -0.002 (0.062)   -0.001 (0.062)   -0.002 (0.061)   -0.002 (0.059)   -0.001 (0.058)
           ρ      -0.013 (0.146)   -0.017 (0.152)   -0.020 (0.149)   -0.022 (0.142)   -0.027 (0.138)
SRE        δ      -0.037 (0.055)   -0.025 (0.048)   -0.011 (0.045)    0.006 (0.047)    0.024 (0.053)
           β       0.220 (0.224)    0.219 (0.222)    0.217 (0.220)    0.214 (0.217)    0.209 (0.212)
           ρ       0.052 (0.136)    0.029 (0.128)    0.005 (0.123)   -0.018 (0.119)   -0.041 (0.115)

Question 3: The consequences of model misspecification

This subsection answers the question: what are the consequences of model misspecification? More precisely, how does the static IV estimator perform when the data-generating process follows the dynamic model? And how do the dynamic estimators suggested by Jacobs et al. and Devereux et al. perform when the data-generating process follows the fixed effects model?

Tables 4 and 5 report the results when the DGP follows the DYN and FE model, respectively. We first interpret the left panel of Table 4, with variation over T. It compares the optimal estimator in the DYN model (DYN) with the static estimator (IV) commonly used in the commodity tax competition literature. Overall, we find that the IV estimator performs remarkably well, despite omitting the time lag of the dependent variable. The IV estimator even outperforms the DYN estimator in estimating β for T ≥ 15. From the right panel, we infer that changing ρ has no important effect on the performance of either estimator. For β, the RMSE remains approximately constant for both estimators; for δ, it shows a mild increase for both.

Table 4: The consequences of model misspecification: DGP follows the DYN model (entries: bias, with RMSE in parentheses)

Left panel: time dimension (T)

Estimator  Param  T = 5            T = 10           T = 15           T = 20
DYN        λ       0.029 (0.059)    0.030 (0.045)    0.029 (0.041)    0.030 (0.042)
           δ      -0.025 (0.068)   -0.040 (0.059)   -0.044 (0.056)   -0.045 (0.057)
           β       0.048 (0.084)    0.042 (0.061)    0.039 (0.053)    0.040 (0.052)
           ρ      -0.008 (0.127)    0.009 (0.079)    0.017 (0.062)    0.015 (0.055)
IV         δ      -0.037 (0.117)   -0.029 (0.079)   -0.019 (0.068)   -0.015 (0.059)
           β      -0.086 (0.104)   -0.047 (0.061)   -0.033 (0.047)   -0.025 (0.038)

Right panel: spatial autoregressive parameter (ρ)

Estimator  Param  ρ = -0.4         ρ = -0.2         ρ = 0            ρ = 0.2          ρ = 0.4
DYN        λ       0.017 (0.054)    0.022 (0.056)    0.025 (0.058)    0.028 (0.060)    0.030 (0.060)
           δ      -0.026 (0.061)   -0.028 (0.062)   -0.029 (0.065)   -0.027 (0.066)   -0.023 (0.069)
           β       0.042 (0.083)    0.045 (0.083)    0.047 (0.084)    0.049 (0.085)    0.049 (0.085)
           ρ       0.065 (0.169)    0.045 (0.161)    0.021 (0.147)    0.003 (0.135)   -0.022 (0.119)
IV         δ      -0.055 (0.111)   -0.053 (0.109)   -0.049 (0.109)   -0.041 (0.112)   -0.035 (0.124)
           β      -0.082 (0.103)   -0.083 (0.104)   -0.084 (0.103)   -0.086 (0.104)   -0.086 (0.104)

[…] estimator are larger than the bias and RMSE of the AB estimator). With respect to the parameter β, the AB estimator outperforms the LSDV estimator. In all but one case (T = 20), the AB estimator has a lower RMSE than the LSDV estimator.

However, the loss in estimation quality as a consequence of model misspecification is sizable. For β, the RMSE of both dynamic estimators is between 40% and 90% larger than the RMSE of the SFE estimator. With respect to the parameter δ, the performance of the LSDV estimator is relatively good: its RMSE exceeds that of the SFE estimator by at most 70%. However, for T ≥ 10 the RMSE of the AB estimator is more than double that of the SFE estimator.

Table 5: The consequences of model misspecification: DGP follows the FE model (entries: bias, with RMSE in parentheses)

Left panel: time dimension (T)

Estimator  Param  T = 5            T = 10           T = 15           T = 20
SFE        δ       0.011 (0.094)    0.005 (0.059)    0.004 (0.049)    0.002 (0.042)
           β      -0.001 (0.058)    0.000 (0.034)   -0.001 (0.030)    0.000 (0.026)
           ρ      -0.023 (0.142)   -0.004 (0.094)   -0.002 (0.073)   -0.004 (0.066)
AB         λ      -0.006 (0.057)   -0.005 (0.033)   -0.006 (0.026)   -0.006 (0.020)
           δ       0.110 (0.149)    0.129 (0.140)    0.159 (0.164)    0.177 (0.181)
           β      -0.017 (0.083)   -0.023 (0.051)   -0.030 (0.045)   -0.034 (0.045)
LSDV       λ      -0.019 (0.578)    0.208 (0.450)    0.186 (0.476)    0.184 (0.483)
           δ       0.023 (0.156)    0.021 (0.091)    0.014 (0.079)    0.009 (0.069)
           β       0.058 (0.109)    0.027 (0.062)    0.017 (0.051)    0.011 (0.041)

Right panel: spatial autoregressive parameter (ρ)

Estimator  Param  ρ = -0.4         ρ = -0.2         ρ = 0            ρ = 0.2          ρ = 0.4
SFE        δ       0.009 (0.072)    0.009 (0.076)    0.009 (0.083)    0.011 (0.090)    0.014 (0.100)
           β      -0.002 (0.062)   -0.001 (0.062)   -0.002 (0.061)   -0.002 (0.059)   -0.001 (0.058)
           ρ      -0.013 (0.146)   -0.017 (0.152)   -0.020 (0.149)   -0.022 (0.142)   -0.027 (0.138)
AB         λ      -0.006 (0.052)   -0.006 (0.051)   -0.007 (0.053)   -0.006 (0.054)   -0.007 (0.057)
           δ       0.035 (0.106)    0.047 (0.109)    0.066 (0.115)    0.091 (0.135)    0.137 (0.171)
           β      -0.006 (0.086)   -0.008 (0.082)   -0.012 (0.083)   -0.015 (0.083)   -0.022 (0.086)
LSDV       λ       0.001 (0.593)    0.004 (0.561)    0.005 (0.560)   -0.014 (0.555)   -0.005 (0.560)
           δ       0.003 (0.146)    0.006 (0.142)    0.011 (0.146)    0.019 (0.152)    0.036 (0.166)
           β       0.068 (0.125)    0.065 (0.119)    0.063 (0.117)    0.059 (0.109)    0.056 (0.109)

6 Conclusion

In this thesis we have used Monte Carlo techniques to investigate estimation in both static and dynamic spatial panel data models. A dynamic model includes a time lag of the dependent variable; a static model does not. Both the static and the dynamic models are characterized by a spatially lagged dependent variable and spatially autocorrelated disturbances. In the introduction we formulated several research questions; each is discussed briefly below.

(i) How do the performances of the estimators used by Jacobs et al. (2007) and Devereux et al. (2007) in the dynamic model compare to the performance of the efficient estimator suggested in this thesis? We find that the efficient estimator outperforms the estimators of Jacobs et al. and Devereux et al. in estimating the serial autocorrelation coefficient. In short panels (T ≤ 10), the efficient estimator suggested in this thesis also appears to be the best alternative overall. For T ≥ 15, however, the inconsistent estimator used by Devereux et al. performs remarkably well.
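That an inconsistent estimator can perform well once T ≥ 15 fits a familiar property of the within (LSDV) estimator of a dynamic panel with fixed effects: its bias in the time-lag coefficient (the Nickell bias) is of order 1/T and therefore shrinks as the panel lengthens. The Python sketch below illustrates this in a deliberately simplified, non-spatial AR(1) panel. It assumes, without confirmation from this section, that the inconsistent estimator meant here is an LSDV-type estimator; the DGP, λ = 0.5, N = 200 and the function names `simulate_panel`, `lsdv_lambda` and `mean_bias` are hypothetical choices, not the thesis's Monte Carlo design.

```python
import random


def simulate_panel(n_units, n_periods, lam, burn_in=50, rng=random):
    """Simulate y_it = lam * y_i,t-1 + mu_i + eps_it; keep n_periods + 1 observations per unit."""
    panel = []
    for _ in range(n_units):
        mu = rng.gauss(0.0, 1.0)  # unit-specific fixed effect
        y, path = 0.0, []
        for t in range(burn_in + n_periods + 1):
            y = lam * y + mu + rng.gauss(0.0, 1.0)
            if t >= burn_in:  # discard the burn-in so the kept series is near stationarity
                path.append(y)
        panel.append(path)
    return panel


def lsdv_lambda(panel):
    """Within (LSDV) estimate of lam: regress unit-demeaned y_it on unit-demeaned y_i,t-1."""
    num = den = 0.0
    for path in panel:
        y, y_lag = path[1:], path[:-1]
        y_bar = sum(y) / len(y)
        lag_bar = sum(y_lag) / len(y_lag)
        for a, b in zip(y, y_lag):
            num += (a - y_bar) * (b - lag_bar)
            den += (b - lag_bar) ** 2
    return num / den


def mean_bias(n_periods, lam=0.5, n_units=200, reps=20, seed=1):
    """Average LSDV bias in lam over Monte Carlo replications."""
    rng = random.Random(seed)
    estimates = [lsdv_lambda(simulate_panel(n_units, n_periods, lam, rng=rng)) for _ in range(reps)]
    return sum(estimates) / reps - lam


for T in (5, 10, 15, 20):
    print(T, round(mean_bias(T), 3))
```

The bias is negative and its magnitude falls markedly between T = 5 and T = 20, so in long panels the inconsistency can become small relative to the sampling variability of more efficient but more complex estimators.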


mance of the dynamic estimators suggested by Jacobs et al. and Devereux et al. when the data-generating process follows the static fixed-effects model? We find that the performance of the IV estimator in the dynamic model is remarkably good, although it does not provide an estimate of the autocorrelation coefficient. However, applying a dynamic estimator to a static model sizably reduces estimation performance: the RMSE rises by more than 40%.

Note that, although the specifications discussed in this thesis are intended for the tax competition literature, they apply more widely to any field that studies social interactions between the units of observation. Manski (1993) and Soetevent (2004), among others, provide reviews of the social-interactions literature.


References

Allers, M., and J. P. Elhorst (2005): “Tax Mimicking and Yardstick Competition Among Local Governments in the Netherlands,” International Tax and Public Finance, 12, 493–513.

Anselin, L. (2006): Spatial Econometrics. Palgrave Macmillan, Basingstoke, 1st edn.

Arellano, M., and S. Bond (1991): “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations,” Review of Economic Studies, 58, 277–297.

Blundell, R., and S. Bond (1998): “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models,” Journal of Econometrics, 87, 115–143.

Brueckner, J. (2003): “Strategic Interaction Among Governments: An Overview of Empirical Studies,” International Regional Science Review, 26, 175–188.

Cameron, C. A., and P. K. Trivedi (2005): Microeconometrics: Methods and Applications. Cambridge University Press, Cambridge, 2nd edn.

Das, D., H. H. Kelejian, and I. R. Prucha (2003): “Finite Sample Properties of Estimators of Spatial Autoregressive Models with Autoregressive Disturbances,” Papers in Regional Science, 82, 1–26.

Devereux, M., B. Lockwood, and M. Redoano (2007): “Do countries compete over corporate tax rates?,” Journal of Public Economics, 91, 451–479.

Egger, P., M. Pfaffermayr, and H. Winner (2005a): “Commodity taxation in a ‘linear’ world: A spatial panel data approach,” Regional Science and Urban Economics, 35, 527–541.

Egger, P., M. Pfaffermayr, and H. Winner (2005b): “An unbalanced spatial panel data approach to US state tax competition,” Economics Letters, 88, 329–335.

Elhorst, J. P. (2005): “Unconditional Maximum Likelihood Estimation of Linear and Log-Linear Dynamic Models for Spatial Panels,” Geographical Analysis, 37, 62–83.

Elhorst, J. P. (2007): “Maximum Likelihood Estimation of Dynamic Panels with Endogenous Interaction Effects,” mimeo, University of Groningen, Groningen.


Jacobs, J., J. Ligthart, and H. Vrijburg (2007): “Consumption Tax Competition Among Governments: Evidence from the United States,” IIPF Conference Paper 2007, Warwick.

Kanbur, R., and M. Keen (1993): “Jeux Sans Frontières: Tax Competition and Tax Coordination when Countries Differ in Size,” American Economic Review, 83, 877–892.

Kapoor, M., H. Kelejian, and I. R. Prucha (2007): “Panel data models with spatially correlated error components,” Journal of Econometrics, 140, 97–130.

Kelejian, H., and D. Robinson (1993): “A suggested method of estimation for spatial interdependent models with autocorrelated errors, and an application to a county expenditure model,” Papers in Regional Science, 72, 297–312.

Kelejian, H. H., and I. R. Prucha (1999): “A generalized moments estimator for the autoregressive parameter in a spatial model,” International Economic Review, 40, 509–533.

Lee, L. (2003): “Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances,” Econometric Reviews, 22, 307–335.

Luna, L. (2004): “Local Sales Tax Competition and the Effect on Country Governments’,” Journal of the American Taxation Association, 26, 43–61.

Manski, C. F. (1993): “Identification of endogenous social effects: The reflection problem,” Review of Economic Studies, 60, 531–542.

Nielsen, S. B. (2001): “A simple model of commodity taxation and cross-border shopping,” Scandinavian Journal of Economics, 103, 599–623.

Rork, J. C. (2003): “Coveting thy neighbor’s taxation,” National Tax Journal, 56, 775–787.

Soetevent, A. (2004): “Social Interactions and Economic Outcomes,” PhD thesis.
