
Crime and Police: an investigation using the Split Panel Jackknife

Niek Houterman

Student number: 10846093
Date of final version: July 21, 2016
Master’s programme: Econometrics
Specialisation: Free Track
Supervisor: dr. M.J.G. Bun

Second reader: dr. J.C.M. van Ophem


Contents

1 Introduction

2 Crime and police
2.1 Instrumental Variables
2.2 Generalized Method of Moments
2.3 Natural experiments
2.4 Vector Autoregression
2.5 Comparison of methods

3 Problems arising in Marvell & Moody (1996)
3.1 Nickell’s (1981) bias
3.2 Incidental trends

4 Addressing Nickell’s (1981) bias and the incidental trends problem
4.1 Bias corrected Fixed Effects
4.2 Bootstrap
4.3 Split Panel Jackknife
4.4 Higher order Jackknife

5 Empirical Results
5.1 Brief description of data
5.2 Model setup
5.3 Regression results
5.4 Implied elasticities

6 Monte Carlo simulation
6.1 Stationarity
6.2 Monte Carlo setup
6.3 Model A1
6.4 Model A2
6.5 Model B
6.6 Simulation results

7 Conclusions


Abstract

This paper adds to the empirical results of Marvell & Moody (1996) by correcting for the Nickell (1981) bias using the Jackknife method recently proposed by Dhaene & Jochmans (2015). The Split and Third Panel Jackknives are applied to the dataset of Marvell & Moody (1996). To gain a better analytical understanding of their performance, a Monte Carlo study is performed. The results presented here lend additional confidence to the conclusions drawn by Marvell & Moody (1996): the impact of crime on the number of police is small, whereas the impact of police on the crime rate is negative and substantial. Moreover, simulation results suggest that the Jackknife decreases the bias, yet suffers from variance inflation. Standard errors of the Split Panel Jackknife are shown here to overestimate the true variance, resulting in size distortions of corresponding t-statistics. The Third Panel Jackknife, however, delivers accurate inference.

Acknowledgements

I would like to express my gratitude to my supervisor, dr. Maurice Bun, for his support, guidance and above all for the many interesting discussions we had. Because I wrote this thesis during full-time employment, I am sure this process would have been frustrating and overwhelming without his counsel. In addition, I would like to thank my parents, friends and family for their continued interest.


Chapter 1

Introduction

The relationship between the number of deployed police officers and the crime rate in a certain area is one that draws a lot of attention in the econometrics literature. After all, it is a difficult question to answer. Criminals are expected, according to the utilitarian perspective of criminal behavior developed by Becker (1968), to avoid areas in which the probability of getting caught is high. One could therefore reason that the number of police officers patrolling the streets in any area has a negative impact on the local crime rate. However, anticipating this response, policy makers are likely to deploy police in areas where crime rates are highest, suggesting a positive relationship. This makes it very hard to empirically test the effectiveness of police, since the decision to police is not exogenous to the amount of crime in that area. Numerous studies have sought to disentangle this relationship using several econometric techniques. In a seminal paper, Levitt (1997) uses mayoral election cycles as instruments for police deployment and finds that an increase in policing reduces crime in an area. Other studies use natural experiments where some factor of policing power is exogenously changed. A prime example is a study performed by Draca, Machin & Witt (2011), who use the temporary increase in police deployment in London after the 2005 terror attacks. Their study shows that crime is reduced in boroughs where extra patrols are held. Marvell & Moody (1996) use the data of Levitt (1997) to investigate the relationship between crime and police with a Vector Autoregression (VAR) model. Setting up the problem in a time series environment allows the authors to conduct Granger causality tests, which show Granger causality in both directions, with the impact of police on crime being the most substantial.

In particular, the VAR method used by Marvell & Moody (1996) is interesting because it makes the intertemporal relationship between the crime rate and policing levels explicit. After all, it takes time to train new police officers or change deployment strategies, making analyses focussing on longer time horizons more useful for real policy decisions. Even though this makes Vector Autoregression an attractive method to answer the question of the relationship between crime and police, there are a number of problems that cast doubt on the conclusions presented in Marvell & Moody’s (1996) paper. One such problem is Nickell’s (1981) bias when fitting Fixed Effects (FE) models to dynamic panel data. When T is small, the FE estimates are substantially biased downwards. This puts Marvell & Moody’s (1996) conclusions into question. The credibility of their results is put into further doubt when considering a more recent paper by Phillips & Sul (2007), who pointed out that inclusion of incidental (individual specific) trends will approximately double the bias of the FE estimator. Fortunately, there are several bias correction methods that can address this issue.

This paper will investigate the usefulness of the Jackknife bias correction technique presented by Dhaene & Jochmans (2015) in addressing these problems. First, the results of Marvell & Moody (1996) will be reevaluated empirically by applying a Split Panel Jackknife (SPJ) and Third Panel Jackknife (TPJ) bias correction. This is of academic interest because it will add to the conclusions of Marvell & Moody (1996), giving policymakers more confidence in their decisions on how to use police resources. Second, the effectiveness of the Jackknife bias correction method will be analysed using Monte Carlo simulations. The Jackknife has a particular appeal as a bias correction method because Dhaene & Jochmans (2015) show that when the SPJ or TPJ is applied to an AR(1) model, the variance is not inflated. The regressions run by Marvell & Moody (1996) differ from Dhaene & Jochmans’ (2015) analysis in two ways. First, Marvell & Moody (1996) run a bivariate VAR(2) model instead of an AR(1). Second, Marvell & Moody’s (1996) model includes incidental trends, which Phillips & Sul (2007) have shown to double Nickell’s (1981) bias. This paper investigates what effect these two features have on the performance of the Jackknife.

The rest of this paper is organised as follows. Chapter 2 reviews the literature surrounding the empirical work into the relationship between crime and police. Chapter 3 delves deeper into the problems of Marvell & Moody’s (1996) analysis. Chapter 4 presents the Jackknife proposed by Dhaene & Jochmans (2015) that offers a potential solution to these problems. Chapter 5 presents the empirical analysis. Chapter 6 provides analytical (Monte Carlo) results. Chapter 7 concludes.


Chapter 2

Crime and police

In order to understand the context of Marvell & Moody’s (1996) paper, this section will consider the literature surrounding the relationship between crime rates and police presence. For ease of understanding, this section will first present the basic question of how crime and police are related and how researchers have used different techniques in answering this question.

Criminals’ and policy makers’ respective strategies are to prevent being caught and to prevent crime happening altogether. These strategies are closely intertwined. Criminals might avoid areas with high numbers of police, while police will be deployed in areas where crime is highest. This makes decisive econometric analysis using simple techniques difficult. After all, merely comparing police presence with crime rates would suggest that the number of police in an area has a positive effect on the crime rate there. Estimating the effect of police numbers on crime rates naively would mean estimating the following equation:

$$c_{it} = \beta p_{it} + x_{it}'\delta + \varepsilon_{it} \qquad (2.1)$$

where $c_{it}$ is the number of crimes committed in area $i$ at time $t$, $p_{it}$ is the number of police officers in area $i$ at time $t$, $x_{it}$ is a vector of control variables and $\varepsilon_{it}$ is the error term. Applying OLS to this specification leads to inconsistent estimates because $p_{it}$ will be correlated with the error term $\varepsilon_{it}$. Inconsistency can also arise from measurement error or omitted variables. Levitt (1998) shows that studies relying on reported crime rates to assess the effectiveness of police might understate the true effectiveness of police if reporting behavior depends on the amount of police. As an example, Levitt (1998) argues that people might be more inclined to report crimes if they believe the probability of the crime being solved is higher - which is likely to be true if policymakers decide to deploy substantially more police in an area. Levitt (1998) finds that this reporting bias is present in a number of datasets, but concludes that the overall effect is minimal.


2.1 Instrumental Variables

Instrumental Variables (IV) are an obvious tool to deal with endogeneity. The goal of IV is to obtain some form of exogenous variation in the endogenous variable and include that in the adjusted regression. In a well-known study, Levitt (1997) uses the effect mayoral elections have on police deployment in this way. He shows that the number of police officers is higher in election years than in other years. Incumbent politicians are likely to increase public spending on social programs such as education and welfare programs, as well as focus on police deployment. A likely reason for this could be that politicians try to boost their record in the run up to an election. Levitt argues that such an increase in police numbers is exogenous to the amount of crime actually taking place in the respective city. Levitt employs a Two-Stage-Least-Squares (TSLS) method in the following way:

$$\Delta \ln c_{it} = \beta_1 \Delta \ln p_{it} + \beta_2 \Delta \ln p_{i,t-1} + x_{it}'\delta + \gamma_t + \eta_i + \varepsilon_{it} \qquad (2.2)$$

where $\Delta \ln$ indicates the average annual percent change in city $i$ for crime or police over the years $t$ and $t+1$, and $x_{it}$ denotes a vector of control variables including demographics, state and local spending controls, and region and city dummies. Time trend effects are captured in $\gamma_t$ and $\eta_i$ expresses city specific effects. The fact that city specific effects remain after differencing might suggest that, prior to differencing, the model includes city specific trends, like in Marvell & Moody (1996).

In theory, if the instrument is relevant and exogenous, the estimates for the relationship between police and crime should be consistent. The instruments used here are novel and creative. Levitt’s instruments are correlated with the endogenous variable, the number of sworn officers, but only mildly so: in election years, the number of officers increases by 2%. Instruments as weak as these are likely to produce inconsistent estimates because the rank (Jacobian) condition is nearly violated. Although the results presented in Levitt (1997) might seem reasonable, we cannot definitively conclude that they represent the true relationship between crime and police. In fact, although the results hint towards the conclusion that the naive specification of simply regressing crime on police levels is incorrect, direct proof of the negative relationship between crime and police is lacking due to insignificance of the coefficient in this study.

Bun (2015) shows that the reliance on weak instruments can dramatically affect the precision of results. He does so by extending the analysis of Cornwell & Trumbull (1994), who use a mix of different offense types and per capita tax revenue as instruments. The authors define ’offense mix’ as the ratio of crimes involving ”face-to-face” contact (such as robbery, assault and rape) to those that do not. Cornwell & Trumbull (1994) find a lower impact of the criminal justice system on the crime rate compared to previous studies. Bun (2015) shows that these instruments are also very weak and extends Cornwell & Trumbull’s (1994) analysis by applying the Generalized Method of Moments (GMM). Including lags of internal instruments makes estimates of the impact of deterrence effects like police on crime dramatically more precise.


The next section will address GMM.

2.2 Generalized Method of Moments

Internal instruments can offer a good alternative to the often weak external instruments used. GMM uses internal instruments that are further lags of the dependent variable. Anderson and Hsiao (1981) first proposed this idea by using $p_{i,t-2}$ as an instrument for $\Delta p_{i,t-1}$ in the first-differenced model. This is possible because $p_{i,t-2}$ is both relevant, being correlated with $(p_{i,t-1} - p_{i,t-2})$, and valid, not being correlated with $(\varepsilon_{it} - \varepsilon_{i,t-1})$. This model can easily be extended beyond AR(1) by including more lags of the dependent variable. Using lags as instruments this way is very convenient because they are readily available and, unlike typical external instruments, do not have to be sought beyond the dataset at hand.
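To make the mechanics concrete, below is a minimal Python sketch of the Anderson-Hsiao idea, assuming balanced (N, T) arrays and, for simplicity, no control variables; the function name and data layout are illustrative and not taken from the original paper.

```python
import numpy as np

def anderson_hsiao(c, p):
    """IV sketch: estimate beta in the first-differenced model
    (c_it - c_i,t-1) = beta * (p_i,t-1 - p_i,t-2) + (eps_it - eps_i,t-1),
    using the level p_i,t-2 as instrument for the differenced regressor."""
    dc = c[:, 2:] - c[:, 1:-1]    # Delta c_it for t = 2..T-1
    dp = p[:, 1:-1] - p[:, :-2]   # Delta p_i,t-1
    z = p[:, :-2]                 # instrument: the level p_i,t-2
    # Just-identified IV estimator: beta = (sum z*dc) / (sum z*dp)
    return (z * dc).sum() / (z * dp).sum()
```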

But using only one lag as instrument is unnecessarily restrictive. Arellano & Bond (1991) propose to use multiple lags of the dependent variable as instruments. Consider the following specification:

$$c_{it} = \beta p_{i,t-1} + x_{it}'\delta + \eta_i + \varepsilon_{it} \qquad (2.3)$$

Removing individual specific fixed effects by taking first differences results in:

$$(c_{it} - c_{i,t-1}) = \beta (p_{i,t-1} - p_{i,t-2}) + (x_{it} - x_{i,t-1})'\delta + (\varepsilon_{it} - \varepsilon_{i,t-1}) \qquad (2.4)$$

OLS on this differenced equation is inconsistent when $p_{i,t-1}$ is weakly exogenous, because the differenced regressor is then correlated with $\varepsilon_{i,t-1}$. The Arellano-Bond estimator is expressed as

$$\hat{\beta}_{AB} = \left[ \left( \sum_{i=1}^{N} \tilde{X}_i' Z_i \right) W_N \left( \sum_{i=1}^{N} Z_i' \tilde{X}_i \right) \right]^{-1} \left( \sum_{i=1}^{N} \tilde{X}_i' Z_i \right) W_N \left( \sum_{i=1}^{N} Z_i' \tilde{c}_i \right) \qquad (2.5)$$

where $\tilde{X}_i$ is a matrix with $t$-th row $(\Delta p_{i,t-1}, \Delta x_{it}')$, $\tilde{c}_i$ is a vector with $t$-th row $\Delta c_{it}$ and $Z_i$ is a matrix of instruments.
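The estimator in (2.5) can likewise be sketched in a few lines. The snippet below implements a one-step difference-GMM estimator for the simplified model without controls, using all available lagged levels of $p$ as instruments and the standard tridiagonal one-step weighting matrix; all names are ours, and the code is an illustration of the formula rather than a production implementation.

```python
import numpy as np

def arellano_bond_onestep(c, p):
    """One-step difference GMM for Delta c_it = beta * Delta p_i,t-1 + Delta eps_it,
    with instruments p_i0, ..., p_i,t-2 for each differenced period t."""
    N, T = c.shape
    Td = T - 2                                   # differenced periods t = 2..T-1
    ncols = (T - 2) * (T - 1) // 2               # total number of instrument columns
    # One-step weighting kernel for first-differenced errors
    # (2 on the diagonal, -1 on the first off-diagonals)
    H = 2 * np.eye(Td) - np.eye(Td, k=1) - np.eye(Td, k=-1)
    SXZ = np.zeros((1, ncols)); SZX = np.zeros((ncols, 1))
    SZHZ = np.zeros((ncols, ncols)); SZy = np.zeros((ncols, 1))
    for i in range(N):
        dy = (c[i, 2:] - c[i, 1:-1]).reshape(-1, 1)    # Delta c_it
        dx = (p[i, 1:-1] - p[i, :-2]).reshape(-1, 1)   # Delta p_i,t-1
        Z = np.zeros((Td, ncols)); col = 0
        for row, t in enumerate(range(2, T)):
            Z[row, col:col + t - 1] = p[i, :t - 1]     # instruments for period t
            col += t - 1
        SXZ += dx.T @ Z; SZX += Z.T @ dx
        SZHZ += Z.T @ H @ Z; SZy += Z.T @ dy
    W = np.linalg.pinv(SZHZ)                     # W_N in eq. (2.5)
    return ((SXZ @ W @ SZy) / (SXZ @ W @ SZX)).item()
```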

GMM is very useful, particularly because estimates are more precise than IV counterparts. Results in Bun (2015) show that IV standard errors are almost ten times larger than LSDV standard errors. That being said, although GMM is more precise than IV, its standard errors are still larger compared to the original LSDV estimates they aim to correct. See for instance the Monte Carlo results in Arellano and Bond (1991) and Kiviet (1995). This raises the question of whether there are even better techniques that offer a bias correction without the associated variance inflation. In recent years, research has focussed on resampling methods such as the Bootstrap and Jackknife, which aim to do exactly that.

2.3 Natural experiments

Ideally, researchers conduct experiments that vary the level of one variable while keeping all other factors constant over a randomly selected treatment group. Unfortunately, such experiments can usually not be conducted in social science due to ethical and moral objections. This would, for instance, mean deploying more police officers in specified areas without regard to the actual crime levels, which would unnecessarily put people at risk. Although such experiments with a clear scientific goal are very rare, natural experiments are more common. They offer a method to deal with simultaneity by appealing to some variation in the endogenous variable that is due to some unrelated event. A small number of studies exploit such exogenous changes in police deployment. Draca, Machin & Witt (2011) use the increase in police deployment in the six weeks following the July 7th London terror attacks. They use a difference-in-difference approach on the model presented in the beginning of this chapter, assessing the shift in crime rates before and after the deployment increase as follows,

$$(c_{it} - c_{i,t-k}) = \beta (p_{it} - p_{i,t-k}) + (x_{it} - x_{i,t-k})'\delta + (\gamma_t - \gamma_{t-k}) + (\varepsilon_{it} - \varepsilon_{i,t-k}) \qquad (2.6)$$

where $c_{it}$ is the crime rate at time period $t$ for borough $i$, $p_{it}$ is the level of police deployed, $x_{it}$ is a vector of controls, $\gamma_t$ is a time fixed effect and $k$ is the degree of differencing. The authors also include borough specific fixed effects via the term $\eta_i$, which is dropped after differencing and thus not shown in the equation above.

The authors use two jumps in the level of police presence: the increase after the attacks and the sudden drop after the operation ended. There are a number of benefits to this method. Firstly, they do not rely on the merits of an instrument which may or may not be valid and could be only weakly relevant. In fact, the increase in police deployment after the London terror attacks had the prime goal of police visibility and crime deterrence. Police activity rose by 30% in the treatment period. This is a much more dramatic change than the average 2% increase in Levitt (1997), which lends considerably more confidence to the conclusions drawn by Draca, Machin & Witt (2011). They find significant evidence that an increase in police reduces the kind of crime that is likely to be prevented by police visibility. Moreover, crime rates quickly rose back to their initial levels once police activity came back down again. It is the timing of the change in the crime rate that is the key indicator of a causal relationship. The fact that crime dropped immediately once police activity was increased, and rose again as soon as deployment was scaled back, suggests that criminals adapt their behavior according to the amount of visible police on the streets, confirming a negative relationship between crime and police presence.

In another study, Di Tella & Schargrodsky (2004) bypass the problem of simultaneity similarly, by using the increased police deployment following a terror attack on the Jewish quarter in Buenos Aires. Subsequent to the attack, all Jewish institutions received police protection. This is an increase in police visibility and deployment similar to that in the paper by Draca, Machin & Witt (2011). Di Tella & Schargrodsky (2004) also arrive at a similar conclusion: there is a large negative effect of police deployment on the crime rate. Specifically, Di Tella & Schargrodsky (2004) measure the effect of police presence on car theft. This effect is limited to the specific areas where police presence is increased. They estimate the following model:


$$CarTheft_{it} = \alpha_0 SBP_{it} + \alpha_1 OBP_{it} + \alpha_2 TBP_{it} + M_t + F_i + \varepsilon_{it} \qquad (2.7)$$

where $CarTheft_{it}$ is the number of car thefts on block $i$ in month $t$, and $SBP_{it}$ (Same Block Police) is a dummy variable that equals 1 for the months after the terror attack if there is a protected institution in the block, and 0 otherwise. $OBP_{it}$ (One Block Police) is a similar dummy variable that equals one in the months after the terror attacks if the block where the car theft took place is one block away from any protected institution. $TBP_{it}$ (Two Block Police) is again similar, but now two blocks away. $M_t$ is a month effect, $F_i$ is a block fixed effect and $\varepsilon_{it}$ is the error term. The authors find that $SBP$ is significant and negative, whereas $OBP$ and $TBP$ are insignificant. These results show that car theft declines by a dramatic 75% in blocks with a protected institution.

These studies put increased confidence in the effect (temporary) police deployment has on the crime rate. However, as mentioned, natural experiments are not always available, and it is worthwhile to investigate whether crime reacts similarly if police deployment is increased over a longer period of time at a city level. After all, criminals might respond only to the short-term visibility of police; longer-term programmes to expand the police force in a city might not have a similar effect. Due to the exogeneity of police deployment, these studies also do not make explicit the effect crime has on the amount of police deployed. After all, one of the central problems with investigating crime and police is that the relationship runs both ways. It is therefore of central importance to gain understanding of the reverse causal relationship as well. What is the effect of the crime rate on police? The next section focusses on Vector Autoregression, a technique that allows investigation of the relationship between crime and police from such an angle: over a longer period of time, while making explicit the relationship from crime to police as well.

2.4 Vector Autoregression

There exist numerous studies that use VAR analysis in uncovering the relationship between police and crime. Corman, Joyce & Lovitch (1987) use data on New York City to estimate the relationship between police and property-related felony crimes as well as a number of other variables. Their approach is characterized by the use of several time series that span from 1970 to 1984. In their system of equations, they seek to answer two questions. Firstly, they investigate which of the variables cause one another and which are exogenous. They use Granger-causality tests to answer these questions. As a second focus, they investigate the magnitude of the response of each variable to changes in the others. This is done by using Impulse Response Functions. They fit a VAR(4) model with five variables: the proportion of young males, the unemployment rate, police per population, the arrest rate and the crime rate. Results suggest that arrests deter crime. Moreover, they find that the crime rate is sensitive to changes in police deployment, but that police deployment is less sensitive to the crime rate.


In a comprehensive study, Marvell & Moody (1996) cite numerous other papers and summarize the techniques used. The authors convincingly argue that the VAR model is the best procedure for addressing the problems of misspecification and endogeneity. The core of their claim is that the full impact of increased crime rates takes longer than a year to materialize, due to the hiring and training of new officers. This contrasts with the approach of Corman, Joyce & Lovitch (1987), who only considered lags up to 6 months. Marvell & Moody fit a bivariate VAR(2) model that will be used later in this paper:

$$c_{it} = \beta_{11} c_{i,t-1} + \beta_{12} c_{i,t-2} + \beta_{13} p_{i,t-1} + \beta_{14} p_{i,t-2} + \delta'x_{it} + u^c_{it} \qquad (2.8)$$

$$p_{it} = \beta_{21} c_{i,t-1} + \beta_{22} c_{i,t-2} + \beta_{23} p_{i,t-1} + \beta_{24} p_{i,t-2} + \delta'x_{it} + u^p_{it} \qquad (2.9)$$

$$u^c_{it} = \eta^c_i + \lambda^c_t + \gamma^c_i t + \varepsilon^c_{it} \qquad (2.10)$$

$$u^p_{it} = \eta^p_i + \lambda^p_t + \gamma^p_i t + \varepsilon^p_{it} \qquad (2.11)$$

where $x_{it}$ is a vector of economic and demographic control variables, $\eta_i$ is the city-specific effect, $\lambda_t$ is a time effect, $\gamma_i t$ are city specific (incidental) trends and $\varepsilon_{it}$ is the error term. Results

show that Granger causality runs in both directions. There is a small impact of crime on police, consistent with the sluggishness of policy makers and the time it takes to hire new officers, as hypothesized before. The impact of police on crime is negative and substantial.

2.5 Comparison of methods

In theory, the instrumental variables technique solves the problem of endogeneity completely. If the instrument is relevant and exogenous, the estimates for the relationship are consistent. However, in practice the vast majority of instruments are weak, rendering their application useless. Internal instrument techniques like GMM might improve precision compared to regular IV, as shown by Bun (2015). Natural experiments are similar to instrumental variables in the sense that they offer a way to add exogeneity to the estimation. If this variation is strong, like the increase in police deployment subsequent to the London terror attacks investigated by Draca, Machin & Witt (2011), it offers a much more robust analysis than studies relying on instrumental variables that show only modest variation. A downside of this technique, however, is that the variation is often of short duration and sudden effect. While it is interesting to investigate the instantaneous relationship between crime and police, it leaves unanswered the question of how these variables relate to each other over time.

All in all, the two main downsides of these methods are not shared by the VAR. Firstly, these methods only model a single-equation relationship between crime and police, while it is reasonable to suspect that the feedback between both variables runs via multiple time lags. A VAR makes these dynamics explicit. Second, these methods only investigate short-run or instantaneous effects. VAR makes it possible to investigate the relationship over a longer timespan simply by including more or longer time lags. It is therefore the preferred method for investigating the longer term effects of crime and police. This being said, the VAR used in Marvell & Moody (1996) also has its shortcomings when considering the bias of the FE estimator. The following chapter delves deeper into this.


Chapter 3

Problems arising in Marvell & Moody (1996)

3.1 Nickell’s (1981) bias

The results of Marvell & Moody (1996) are here put into question mainly because the number of time periods is not large enough. To briefly illustrate the problem before the literature of bias correction techniques is presented, consider the basic dynamic AR(1) model:

$$c_{it} = \beta c_{i,t-1} + \eta_i + \varepsilon_{it} \qquad (3.1)$$

Here, $c_{it}$ could be interpreted as the crime rate in a certain area $i$ at time $t$; exogenous regressors such as demographic factors could be added, but are omitted for ease of exposition. The individual specific effect, $\eta_i$, could be interpreted as a characteristic that is specific to city $i$ and that could influence the crime-police relationship. The error term $\varepsilon_{it}$ is assumed to be independently distributed. In principle, this model can be extended beyond AR(1). The FE estimator eliminates $\eta_i$ from the model. It follows from subtraction of the time-averaged model $\bar{c}_i = \beta \bar{c}_{i,-} + \eta_i + \bar{\varepsilon}_i$ from the original model:

$$c_{it} - \bar{c}_i = \beta (c_{i,t-1} - \bar{c}_{i,-}) + (\varepsilon_{it} - \bar{\varepsilon}_i), \qquad (3.2)$$

where $\bar{c}_i = \frac{1}{T}\sum_{t=1}^{T} c_{it}$, $\bar{c}_{i,-} = \frac{1}{T}\sum_{t=1}^{T} c_{i,t-1}$ and $\bar{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it}$. The fixed effect $\eta_i$ is eliminated because it is time invariant. Applying OLS to this transformed model yields the FE estimator, $\hat{\beta}_{FE}$:

$$\hat{\beta}_{FE} = \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (c_{i,t-1} - \bar{c}_{i,-})^2 \right]^{-1} \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (c_{i,t-1} - \bar{c}_{i,-})(c_{it} - \bar{c}_i) \right], \qquad (3.3)$$

$$\hat{\beta}_{FE} = \beta_0 + \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (c_{i,t-1} - \bar{c}_{i,-})^2 \right]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} (c_{i,t-1} - \bar{c}_{i,-})(\varepsilon_{it} - \bar{\varepsilon}_i). \qquad (3.4)$$


Now, for consistency it is required that $\operatorname{plim}_{N\to\infty} \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} (c_{i,t-1} - \bar{c}_{i,-})(\varepsilon_{it} - \bar{\varepsilon}_i) = 0$. This does not hold in practice, since $c_{it}$ is correlated with $\varepsilon_{it}$, so that $c_{i,t-1}$ is correlated with $\varepsilon_{i,t-1}$ and hence with $\bar{\varepsilon}_i$. This implies that the regressor $(c_{i,t-1} - \bar{c}_{i,-})$ is correlated with the error $(\varepsilon_{it} - \bar{\varepsilon}_i)$. This bias can, however, decrease dramatically in size if the term $\bar{\varepsilon}_i$ becomes very small, for example when $T \to \infty$. In practice, however, panels are usually quite short. Nickell (1981) proved this result by showing that the bias occurs in panel data with large $N$ and small $T$; only when $T$ also tends to infinity does the bias of the FE estimator vanish. Nickell’s (1981) main result is presented below. Please note that it assumes fixed $T$. After rewriting, Nickell (1981) shows under covariance stationarity:

$$\operatorname{plim}_{N\to\infty} (\hat{\beta}_{FE} - \beta) = \frac{-(1+\beta)}{T-1} \left\{ 1 - \frac{1}{T}\frac{1-\beta^T}{1-\beta} \right\} \times \left\{ 1 - \frac{2\beta}{(1-\beta)(T-1)} \left[ 1 - \frac{1}{T}\frac{1-\beta^T}{1-\beta} \right] \right\}^{-1}. \qquad (3.5)$$

The complete proof and details of all steps can be found in Nickell (1981). For reasonably large values of $T$ we then have

$$\operatorname{plim}_{N\to\infty} (\hat{\beta}_{FE} - \beta) \simeq \frac{-(1+\beta)}{T-1}. \qquad (3.6)$$

As Nickell (1981) notes, even for $T = 10$ and $\beta = 0.5$, the bias is -0.167, a very large bias indeed.
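As a quick numerical sanity check, both the exact expression (3.5) and the approximation (3.6) can be evaluated directly; the helper below is our own sketch.

```python
def nickell_bias(beta, T):
    """Exact Nickell (1981) inconsistency of the FE estimator, eq. (3.5)."""
    a = 1 - (1 / T) * (1 - beta**T) / (1 - beta)
    return -(1 + beta) / (T - 1) * a / (1 - 2 * beta * a / ((1 - beta) * (T - 1)))

print(nickell_bias(0.5, 10))    # exact value: about -0.16
print(-(1 + 0.5) / (10 - 1))    # approximation (3.6): -0.167, the value quoted above
```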

3.2 Incidental trends

Besides taking into account the dynamic panel data structure of the model in Marvell & Moody (1996), it is also necessary to consider the fact that the authors included city-specific trends. These variables, referred to as incidental trends from here onwards, have an impact on the size of the FE estimator’s bias. Including incidental trends changes the model analyzed by Nickell (1981) to

$$c_{it} = \beta c_{i,t-1} + \eta_i + \gamma_i t + \varepsilon_{it} \qquad (3.7)$$

where $\gamma_i t$ are the city-specific (incidental) trends. Phillips & Sul (2007) showed that the bias increases approximately twofold after inclusion of incidental trends. Following derivations in Phillips & Sul (2007), inclusion of incidental trends yields:

$$\operatorname{plim}_{N\to\infty} (\hat{\beta}_{FE} - \beta) \simeq 2\,\frac{-(1+\beta)}{T} \qquad (3.8)$$

which is approximately twice as large as the bias originating from the simple model excluding incidental trends. If we wish to uncover the true relationship between crime and police, it is vital to account for this effect. This paper investigates the ability of the SPJ and TPJ methods, which will be explained below, to correct this bias by comparing Monte Carlo results of regressions that include and exclude incidental trends.


Chapter 4

Addressing Nickell’s (1981) bias and the incidental trends problem

This chapter focusses on the different methods presented in the literature that aim to correct Nickell’s (1981) bias. The Jackknife, recently suggested by Dhaene & Jochmans (2015) and discussed in this chapter, will form the basis of the extension of the work done by Marvell & Moody (1996) later in this paper. Other bias correction methods are briefly discussed for the sake of comparison.

4.1 Bias corrected Fixed Effects

Kiviet (1995) derived an analytical approximation of the small sample bias of the FE estimator based on the model expressed in equation 3.1. Kiviet shows in Monte Carlo simulations that the corresponding estimator has a smaller bias while maintaining high efficiency. In fact, its asymptotic variance is remarkably similar to that of the original FE estimator. Although the results of this approach would suggest a ’free lunch’, meaning a bias reduction without much variance inflation, they come under strong theoretical restrictions. Moreover, to estimate the bias, Kiviet (1995) requires knowledge of the unknown true parameter $\beta_0$. Because it is obviously not readily available, he proposes to use consistent method of moments estimates. This might be hard to do in practice since the small sample performance will rely heavily on the preliminary estimator that is used.

Bun & Carree (2005) derive a bias-corrected estimator that does not require the usage of an initial consistent estimator. An important advantage of Bun & Carree’s (2005) estimator is that it does not depend on the ratio of the variance of the individual-specific effects and the variance of the general error term. This is a downside that many GMM estimators have, as Bun & Kiviet (2006) show. Monte Carlo results show that Bun & Carree’s (2005) estimator performs very well, even in small samples. For an ARX model, Bun & Carree (2005) express the bias of the FE estimators as follows,


$$\beta_{bias} = \frac{-\sigma^2_{\varepsilon}\, h(\beta, T)}{(1 - \rho^2_{xc_{-1}})\,\sigma^2_{c_{-1}}} \qquad (4.1)$$

$$\delta_{bias} = -\zeta \beta_{bias} \qquad (4.2)$$

where $h(\beta, T)$ is a positive function, $\rho_{xc_{-1}} = \sigma_{xc_{-1}}/(\sigma_x \sigma_{c_{-1}})$ and $\zeta = \sigma_{xc_{-1}}/\sigma^2_x$. The bias-corrected estimators are found by an iterative process in which the FE estimate of $\sigma^2_{\varepsilon}$ is used to find estimates of $\beta$ and $\delta$, which are in turn used to find the next-step estimate of $\sigma^2_{\varepsilon}$. This process is continued until convergence. They propose two other methods, in which an expression for the inconsistency of $\hat{\sigma}^2_{\varepsilon}$ and an infeasible estimate of $\sigma^2_{\varepsilon}$ respectively are used.

Bun & Carree (2005) perform Monte Carlo simulations to test the performance of the proposed bias correction method. In particular, the performance of their estimator is compared to that of the FE and GMM estimators, as well as a first-order version of Kiviet’s (1995) estimator. Under homoscedasticity, the bias corrected estimator is virtually unbiased, with a very small bias under time series heteroscedasticity.

In a subsequent paper, Bun & Carree (2006) address this issue of heteroscedasticity. An expression of the bias is provided under heteroscedasticity and a corresponding bias corrected estimator is produced. Monte Carlo simulations are run both under cross-section heteroscedasticity and time series heteroscedasticity. Results show that the bias of the corrected estimator is very small and becomes virtually non-existent when T moves beyond 4.

4.2 Bootstrap

In general, Bootstrap procedures offer asymptotic refinement making them useful in uncovering the theoretical behavior of estimators. The Bootstrap is advantageous in practice because it is easier to implement than asymptotic theory. In essence, it replaces theoretical power with computer power. In the present case it offers a good alternative to the highly technical bias correctors that are reliant on analytical approximations of the FE estimator’s bias as discussed previously. Bootstrap methods allow for direct measurement of the estimator’s variance whereas other methods would have to deal with sample analogues that could be severely biased when the number of observations is low. Moreover, these approximations are often only valid under a particular theoretical regime with a set number of restrictions. If these restrictions do not hold however, the correction method cannot be applied. Resampling techniques such as Bootstrap may then offer a good alternative.

Everaert & Pozzi (2007) present a bias correction method based on an iterative Bootstrap procedure. Their aim is to produce a bias correction that has a lower asymptotic variance than the GMM procedure while being robust to violations of the assumptions that have to hold for the other bias correction methods to be valid. Monte Carlo replications are used to show that the estimator that results from this procedure is comparable to the ’bias-corrected FE estimators’ discussed previously. Its ability to correct the bias is comparable to Kiviet’s (1995) and Bun and Carree’s (2005) estimators. However, the Bootstrap estimator’s asymptotic variance is smaller in samples with small T. This is a very important result, especially because this is when the FE bias is most severe. Compared to Arellano and Bond’s (1991) GMM method, the bootstrap procedure produces better results both in bias correction and variance. This means that if bias reduction without significant variance inflation is the primary consideration, the Bootstrap procedure is preferred over the other methods.

4.3 Split Panel Jackknife

From the previous discussion, we know that the FE estimator is biased and inconsistent. Dhaene & Jochmans (2015) make the following derivations based on a FE AR(1) model like in equation 3.1. Generally, as $T \to \infty$, the bias can be expressed as

$$\hat{\beta}_{FE} - \beta_0 = \frac{B_1}{T} + o(T^{-1}) \qquad (4.3)$$

where $B_1$ is a constant.

The Jackknife is a non-parametric method to remove the bias by obtaining an estimate of $B_1/T$, with which the original FE estimator can then be corrected. Here, the Split Panel Jackknife estimator is used, as in Dhaene & Jochmans (2015). This Jackknife variation is particularly appealing for dynamic panel data structures as it keeps the time dependence structure intact. Dhaene and Jochmans propose to split the panel into $S$ sub-panels that are proper subsets of the original panel, each consisting of consecutive observations and of length at least $T_{min}$, the smallest number of time periods for which the estimator of $\beta$ is defined. An estimate is calculated for each sub-panel; for sub-panel $s$, of length $T_s$, we have the fixed effects estimator $\hat{\beta}_s$. Combining all estimates produces the overall sub-panel average:

$$\hat{\beta}_{Jackknife} = \sum_{s=1}^{S} \frac{T_s}{T} \hat{\beta}_s \qquad (4.4)$$

So if we have two half-panels, we have the split panel jackknife average, say $\bar{\beta}_{1/2}$,

$$\bar{\beta}_{1/2} = \tfrac{1}{2}\left(\hat{\beta}_{S_1} + \hat{\beta}_{S_2}\right) \qquad (4.5)$$

We can now produce an estimate of the bias by

$$\hat{\beta}_{bias} = \hat{\beta}_{Jackknife} - \hat{\beta}_{FE} \qquad (4.6)$$

Then the bias-corrected estimator becomes

$$\hat{\beta}_{corrected} = \hat{\beta}_{FE} - (\hat{\beta}_{Jackknife} - \hat{\beta}_{FE}) = 2\hat{\beta}_{FE} - \hat{\beta}_{Jackknife} \qquad (4.7)$$

Although this correction removes the first-order bias term $B_1/T$, we are still left with higher-order bias terms; the reason is that the bias is itself estimated with a bias of order $o(T^{-1})$. To minimize the impact of these higher-order terms, the sub-panels ought to be defined such that the sample is split in two (potentially overlapping) halves. In the supplementary appendix to Dhaene and Jochmans (2015), a proof is provided which makes clear that using half-panels minimizes the magnitude of the remaining bias terms. Please see that paper for details.
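The half-panel correction in (4.5)-(4.7) is mechanical enough to sketch in a few lines. Below is a minimal Python illustration for the AR(1) case, assuming a balanced (N, T) panel with even T; the function names and the simulated example (with dimensions N = 59 and T = 24, matching the dataset used later) are ours, not code from Dhaene & Jochmans (2015).

```python
import numpy as np

def fe_ar1(y):
    """Within-group (FE) estimate of beta in y_it = beta*y_i,t-1 + eta_i + eps_it."""
    x, z = y[:, :-1], y[:, 1:]                # lagged regressor and regressand
    xd = x - x.mean(axis=1, keepdims=True)    # within transformation
    zd = z - z.mean(axis=1, keepdims=True)
    return (xd * zd).sum() / (xd * xd).sum()

def spj_ar1(y):
    """Half-panel jackknife, eq. (4.7): 2*beta_FE minus the half-panel average."""
    T = y.shape[1]
    b_half = 0.5 * (fe_ar1(y[:, :T // 2]) + fe_ar1(y[:, T // 2:]))
    return 2 * fe_ar1(y) - b_half

# Small demonstration on simulated data:
rng = np.random.default_rng(0)
N, T, beta = 59, 24, 0.5
eta = rng.normal(size=N)
y = np.zeros((N, T + 50))                     # extra periods serve as burn-in
for t in range(1, T + 50):
    y[:, t] = beta * y[:, t - 1] + eta + rng.normal(size=N)
y = y[:, 50:]
print(fe_ar1(y), spj_ar1(y))                  # the SPJ estimate should sit closer to 0.5
```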

4.4 Higher order Jackknife

The SPJ removes the dominant bias term from the FE estimator. Dhaene & Jochmans (2015) propose higher-order Jackknife procedures to remove the bias terms that remain. However, removing each term affects the size of the terms of higher order. It is therefore of interest to investigate how higher order Jackknives, such as the TPJ, compare to the SPJ.

Higher-order bias terms are removed by combining weighted averages of sub-panel estimators corresponding to different sub-panel lengths. As Dhaene & Jochmans (2015) show, the Third Panel Jackknife removes the first and second order bias if $a_{1/2}$ and $a_{1/3}$ satisfy

$$\left( \frac{1 + a_{1/2} + a_{1/3}}{T} - \frac{a_{1/2}}{T/2} - \frac{a_{1/3}}{T/3} \right) B_1 = 0, \qquad (4.8)$$

$$\left( \frac{1 + a_{1/2} + a_{1/3}}{T^2} - \frac{a_{1/2}}{(T/2)^2} - \frac{a_{1/3}}{(T/3)^2} \right) B_2 = 0. \qquad (4.9)$$

Solving this system of equations gives $a_{1/2} = 3$ and $a_{1/3} = -1$. The Third Panel Jackknife estimator is then

$$\hat{\beta}_{1/3} = 3\hat{\beta}_{FE} - 3\bar{\beta}_{1/2} + \bar{\beta}_{1/3} \qquad (4.10)$$

where $\bar{\beta}_{1/3}$ denotes the average of the three third-panel estimates.
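Assuming the reconstruction of (4.8)-(4.9) above, the stated weights follow from a simple linear system. Multiplying (4.8) by $T$ and (4.9) by $T^2$ gives

$$1 - a_{1/2} - 2a_{1/3} = 0, \qquad 1 - 3a_{1/2} - 8a_{1/3} = 0.$$

Subtracting the first equation from the second yields $-2a_{1/2} - 6a_{1/3} = 0$, i.e. $a_{1/2} = -3a_{1/3}$; substituting this into the first equation gives $1 + a_{1/3} = 0$, so $a_{1/3} = -1$ and $a_{1/2} = 3$, as stated.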

Dhaene & Jochmans (2015) assert that (higher-order) Jackknives do not inflate the asymptotic variance for AR(1). That is,

$$\sqrt{NT}\,\left(\tilde{\beta}_{1/G} - \beta_0\right) \xrightarrow{d} N(0, \Sigma^{-1}) \qquad (4.11)$$

where $G$ is the order of the Jackknife and $\Sigma^{-1}$ is equal to the variance of the SPJ. To assess whether this result holds in the VAR(2) context of Marvell & Moody (1996), TPJ estimates are included in the chapters that follow.


Chapter 5

Empirical Results

The results of running a bivariate VAR(2) model on Levitt’s (1997) data will be presented here, similar to the approach by Marvell & Moody (1996). The coefficient estimates, standard errors and t-statistics of the FE, SPJ and TPJ will be compared in order to assess their performance and reliability.

5.1 Brief description of data

The same dataset as in Levitt (1997) and Marvell & Moody (1996) is used here. Please refer to these two sources for a more detailed description. The data are organised in a panel of 59 U.S. cities with observations spanning 24 years (1969-1992). The panel consists of data on crime, the police force and a number of control factors such as the demographic composition of a city and the amount of spending on certain government programmes. These factors are used here as control variables. The police variable is defined as the logarithm of the number of sworn police officers employed in each city. Levitt (1997) took crime data from the Uniform Crime Reports, which include annual information on the number of crimes in the categories of murder, nonnegligent manslaughter, forcible rape, assault, robbery, burglary, larceny and motor vehicle theft. The crime variable is the logarithm of the sum of violent and property crimes divided by the population of the city.

5.2 Model setup

Please find below the results of running three different estimation techniques: FE (denoted by $\hat{\beta}_{FE}$), the Split Panel Jackknife ($\hat{\beta}_{1/2}$) and the Third Panel Jackknife ($\hat{\beta}_{1/3}$). These are performed on two different models: A and B. Model A is equal to the VAR(2) presented in equations 2.8-2.11. Model B is equivalent to Model A but excludes incidental trends, so referring to equations 2.8-2.11 the following is different in Model B:

$$u^c_{it} = \eta^c_i + \lambda^c_t + \varepsilon^c_{it} \qquad (5.1)$$

$$u^p_{it} = \eta^p_i + \lambda^p_t + \varepsilon^p_{it} \qquad (5.2)$$

5.3 Regression results

Table 5.1 and 5.2 describe the coefficient estimates as well as the standard errors. The standard errors for the SPJ and TPJ are the averages of the standard errors of the respective split or third panel estimates. That is, $\hat{\sigma}_{1/2} = \frac{1}{2}(\bar{\sigma}_{S_1} + \bar{\sigma}_{S_2})$ and $\hat{\sigma}_{1/3} = \frac{1}{3}(\bar{\sigma}_{T_1} + \bar{\sigma}_{T_2} + \bar{\sigma}_{T_3})$. This is in line with Dhaene & Jochmans (2015).

Table 5.1 describes the results from running Model A. We can see that the Jackknife does in fact bring the value of the estimated first-order autoregressive coefficient upwards. This is to be expected, as the FE estimate is biased downward due to Nickell (1981). This result is particularly apparent for the first autoregressive lags. The coefficient of $c_{i,t-1}$ in the regression with $c_{it}$ as dependent variable rises by almost 25%, from 0.80 in the FE estimate to 1.01 in the Split Panel Jackknife and 0.99 in the Third Panel Jackknife. For the coefficient of $p_{i,t-1}$ where $p_{it}$ is the dependent variable, there is an increase of almost 50%, from the 0.50 FE estimate to the 0.75 and 0.74 estimates of the Split Panel and Third Panel Jackknives respectively. Table 5.2 summarizes the results of running Model B. Here we note a similar, yet less pronounced effect when we compare the FE results to those of the Jackknives. The first autoregressive lag of $c_{it}$ rises from 0.94 (FE) to 1.01 (SPJ) and 0.99 (TPJ). The first autoregressive lag of $p_{it}$ has a value of 0.63 for the FE estimate whereas the Jackknife results show 0.74 (SPJ) and 0.70 (TPJ). Marvell & Moody (1996) concluded that the impact of crime on the number of police is small, whereas the impact of police on the crime rate is substantial. This conclusion is not significantly altered after correcting for the Nickell (1981) bias by means of the SPJ and TPJ. Police has a substantial negative effect on crime and crime has no substantial impact on police in the FE model. The values of the first autoregressive lags of both $c_{it}$ and $p_{it}$ are materially higher in Model B than in Model A. The likely cause of this is that Model B does not take into account incidental trends. Part of the city specific time variation that is apparent in the data generating process is captured by the coefficients of the lagged variables of crime and police in Model B. Comparing the standard errors of both models, a clear loss in precision can be seen as we go from FE to SPJ and from SPJ to TPJ. This contrasts with Dhaene & Jochmans (2015), who found no increase in variance in the AR(1) case.


Table 5.1: Model A coefficient estimates. The first four coefficient columns refer to the equation with dependent variable $c_{it}$ (crime), the last four to the equation with dependent variable $p_{it}$ (police).

| Stat | $c_{it-1}$ | $c_{it-2}$ | $p_{it-1}$ | $p_{it-2}$ | $c_{it-1}$ | $c_{it-2}$ | $p_{it-1}$ | $p_{it-2}$ |
|---|---|---|---|---|---|---|---|---|
| $\hat{\beta}_{FE}$ | 0.80 | -0.23 | -0.17 | 0.05 | -0.04 | 0.09 | 0.50 | 0.00 |
| $\hat{\beta}_{1/2}$ | 1.01 | -0.23 | -0.17 | 0.02 | -0.04 | 0.12 | 0.75 | 0.07 |
| $\hat{\beta}_{1/3}$ | 0.99 | -0.28 | -0.12 | 0.05 | -0.00 | 0.17 | 0.74 | 0.03 |
| $\hat{\sigma}_{FE}$ | 0.029 | 0.029 | 0.036 | 0.036 | 0.024 | 0.023 | 0.030 | 0.029 |
| $\hat{\sigma}_{1/2}$ | 0.044 | 0.042 | 0.056 | 0.056 | 0.035 | 0.034 | 0.044 | 0.044 |
| $\hat{\sigma}_{1/3}$ | 0.060 | 0.056 | 0.074 | 0.076 | 0.047 | 0.044 | 0.058 | 0.061 |

Table 5.2: Model B coefficient estimates. The first four coefficient columns refer to the equation with dependent variable $c_{it}$, the last four to the equation with dependent variable $p_{it}$.

| Stat | $c_{it-1}$ | $c_{it-2}$ | $p_{it-1}$ | $p_{it-2}$ | $c_{it-1}$ | $c_{it-2}$ | $p_{it-1}$ | $p_{it-2}$ |
|---|---|---|---|---|---|---|---|---|
| $\hat{\beta}_{FE}$ | 0.94 | -0.17 | -0.15 | 0.11 | -0.02 | 0.07 | 0.63 | 0.08 |
| $\hat{\beta}_{1/2}$ | 1.06 | -0.17 | -0.17 | 0.12 | -0.02 | 0.06 | 0.74 | 0.11 |
| $\hat{\beta}_{1/3}$ | 1.05 | -0.20 | -0.23 | 0.11 | -0.04 | 0.06 | 0.70 | 0.10 |
| $\hat{\sigma}_{FE}$ | 0.029 | 0.029 | 0.036 | 0.035 | 0.023 | 0.023 | 0.029 | 0.029 |
| $\hat{\sigma}_{1/2}$ | 0.042 | 0.041 | 0.052 | 0.051 | 0.034 | 0.033 | 0.043 | 0.042 |
| $\hat{\sigma}_{1/3}$ | 0.053 | 0.051 | 0.067 | 0.067 | 0.044 | 0.042 | 0.055 | 0.055 |


5.4 Implied elasticities

Because the variables in Table 5.1 and Table 5.2 are expressed in logarithms, the coefficients should be interpreted as elasticities. From the figures presented in these tables, we can calculate implied long term elasticities. For the estimates of Model A, we have the values for implied long run elasticities presented in Table 5.3. For the FE results, the long term elasticity of crime is calculated to be $0.10$, that is $(-0.04 + 0.09)/(1 - 0.50 - 0.00) = 0.10$. So for every 10% increase in crime, governments add about 1% more police on average. The long term elasticity of police in the FE model is $-0.28$, that is $(-0.17 + 0.05)/(1 - 0.80 + 0.23) = -0.28$. So for every 10% increase in police, crime drops by about 2.8% on average. The same calculation is made for the SPJ and TPJ models (a numerical check follows Table 5.4). We can see that both the effect of crime on police (0.44) and the effect of police on crime (-0.68) are stronger based on the SPJ results. The TPJ elasticities show an elasticity for police (-0.24) similar to the FE elasticity, but an even higher elasticity for crime (0.74). When we look at Table 5.4 we see a similar pattern in the direction of the elasticities, that is, positive elasticities for crime and negative for police, yet the elasticity for crime is close to zero. The elasticity for police is more pronounced, especially in the TPJ case. Considering these results, even though the coefficient estimates in Table 5.1 and Table 5.2 might not differ dramatically at first sight, the long term elasticities are quite different. This suggests that the Nickell bias has a large impact on potential policy recommendations that are based on VAR analyses like the one performed in Marvell & Moody (1996).

Table 5.3: Implied elasticities, Model A

| | FE | SPJ | TPJ |
|---|---|---|---|
| Long term elasticity for crime | 0.10 | 0.44 | 0.74 |
| Long term elasticity for police | -0.28 | -0.68 | -0.24 |

Table 5.4: Implied elasticities, Model B

| | FE | SPJ | TPJ |
|---|---|---|---|
| Long term elasticity for crime | 0.06 | 0.05 | 0.02 |
| Long term elasticity for police | -0.17 | -0.45 | -0.80 |
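As a transparent check on the arithmetic above, the FE column of Table 5.3 can be reproduced directly; the variable names below follow the $\beta_{jk}$ labeling of equations 2.8-2.9 and are ours.

```python
# FE coefficient estimates from Table 5.1
b11, b12, b13, b14 = 0.80, -0.23, -0.17, 0.05   # crime equation
b21, b22, b23, b24 = -0.04, 0.09, 0.50, 0.00    # police equation

# Long-run elasticity of police with respect to crime: 0.10
print((b21 + b22) / (1 - b23 - b24))
# Long-run elasticity of crime with respect to police: about -0.28
print((b13 + b14) / (1 - b11 - b12))
```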


Chapter 6

Monte Carlo simulation

This chapter compares the performance of the FE, SPJ and TPJ in a theoretical context. This investigation provides insight into the behavior of the Jackknife in an environment that is similar to the model in Marvell & Moody (1996). This chapter addresses several topics: the performance of the FE, SPJ and TPJ in terms of the size of the bias, simulated standard deviations, estimated standard errors and an analysis of t-statistics.

6.1 Stationarity

The literature surrounding the bias correction of FE estimators in dynamic panel data assumes the data to be stationary. Dhaene & Jochmans (2015) note that the Jackknife corrections may be more sensitive to violations of the stationarity conditions than analytical correction methods. This may be due to the fact that the Jackknife splits the panel in two or more subpanels. When the dynamics of the panel differ from subpanel to subpanel, each corresponding subpanel estimate may be very different. This would then lead to a very poor estimate of the bias. In order to prevent such a situation, a check for nonstationarity is performed. Since we have a bivariate VAR(2) model, we can write the FE results of Model A as follows,

" cit pit # = " 0.80 −0.17 −0.04 0.50 # " cit−1 pit−1 # + " −0.23 0.05 0.09 0.00 # " cit−2 pit−2 # + " ucit upit # (6.1)

Which can be written as:

Yt= Φ1Yt−1+ Φ2Yt−2+ Ut (6.2)

Where Yt is a 2 × 1 vector with yit as its first element and xit second. Yt−1 and Yt−2 are the

first and second lags of these respectively. Rewriting using the lag-operator and ignoring the error term for now,

Φ(L)Yt= (In− Φ1L − Φ2L2)Yt (6.3)

The characteristic polynomial can then be defined as

Π(z) = (In− Φ1z − Φ2z2) (6.4)


To ensure this model is stationary, the roots of $|\Pi(z)| = 0$ have to lie outside the unit circle. Solving produces two real roots and a complex pair:

$$r_1 = -27.2034, \quad r_2 = 2.2594, \quad r_3 = 1.6154 + 1.0013i, \quad r_4 = 1.6154 - 1.0013i \qquad (6.5)$$

These all lie outside the unit circle, so the bivariate VAR(2) that results from running the Marvell & Moody (1996) data on Model A is stationary. To check the stationarity of Model B (FE estimates), consider the following:

$$\Phi^*_1 = \begin{bmatrix} 0.94 & -0.15 \\ -0.04 & 0.50 \end{bmatrix}, \quad \Phi^*_2 = \begin{bmatrix} -0.17 & 0.11 \\ 0.09 & 0.00 \end{bmatrix} \qquad (6.6)$$

with the resulting roots:

$$r^*_1 = -12.7462, \quad r^*_2 = 2.2703 + 0.6294i, \quad r^*_3 = 2.2703 - 0.6294i, \quad r^*_4 = 1.4278 \qquad (6.7)$$

These also all lie outside the unit circle, so the bivariate VAR(2) that results from running the Marvell & Moody (1996) data on Model B is stationary. The Jackknife is therefore likely to give consistent estimates in the Monte Carlo study.
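The root check can be reproduced numerically through the companion form of the VAR(2); this is a small sketch with our own variable names. The process is stationary if and only if all eigenvalues of the companion matrix lie inside the unit circle, which is equivalent to all roots of $|\Pi(z)| = 0$ lying outside it.

```python
import numpy as np

Phi1 = np.array([[0.80, -0.17], [-0.04, 0.50]])   # Model A FE estimates, eq. (6.1)
Phi2 = np.array([[-0.23, 0.05], [0.09, 0.00]])
companion = np.block([[Phi1, Phi2],
                      [np.eye(2), np.zeros((2, 2))]])
lam = np.linalg.eigvals(companion)
print(np.abs(lam))    # all moduli < 1  <=>  stationary VAR(2)
print(1 / lam)        # roots of the characteristic polynomial, z = 1/lambda
```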

6.2 Monte Carlo setup

The Monte Carlo setup is chosen to correspond closely with the empirical context in Marvell & Moody (1996) in order to make cross comparison easier. Parameter values, the dimensions of $T$ and $N$ and the construction of the error terms are inspired by their respective values presented in the empirical part of this paper. FE estimates of Model A and Model B are chosen as input for the Monte Carlo procedure. The original data are used to provide input for the crime and police variables. Other data values are generated so as to resemble their observed values as much as possible. The estimated values for each variable in the error term form the basis for each corresponding variable in the data-generating process (dgp) of the Monte Carlo simulation. A total of 1000 Monte Carlo replications are performed for each model. This procedure is performed three times:

• A1: The dgp is based on coefficient estimates of Model A

• A2: The dgp is based on coefficient estimates of Model A, but excludes the terms $\hat{\gamma}^c_i t$ and $\hat{\gamma}^p_i t$

• B: The dgp is based on coefficient estimates of Model B

Both Model A2 and Model B will provide insight into how omission of incidental trends affects the performance of the FE, SPJ and TPJ. Both approaches, however, omit these incidental trends in a different way. The downside of Model A2 is that it entails omitting the relevant terms of the incidental trends from the first two observations, which one could argue departs from the empirical estimation performed earlier. Model B does not suffer from this criticism, but one could argue that the true dgp of crime and police as investigated by Marvell & Moody (1996) does include incidental trends. Omitting them from empirical measurement would therefore likely influence the other coefficients, which is exactly what we see when we compare the estimated coefficients of Model A to those of Model B. For the sake of completeness, both Model A2 and Model B are included in the results that follow. The three sections that follow describe how the Monte Carlo simulation is performed for each model; the final section summarizes the results.

6.3 Model A1

Here, $c_{it}$ and $p_{it}$ denote empirical observations from the Levitt (1997) dataset. Artificial data are generated as follows:

1. Two starting values for $c^*_{it}$ and $p^*_{it}$ are chosen. To mimic the empirical sample as closely as possible, they are chosen to be the observed values from the empirical Levitt (1997) dataset:

$$c^*_{i1} = c_{i1} \qquad (6.8)$$

$$c^*_{i2} = c_{i2} \qquad (6.9)$$

The same procedure is followed for the first two values of $p^*_{it}$. For a few $i$, the first two observations are missing. In these cases, the earliest two values for which there are observations are used as starting values.

2. Subsequent values are created using the fitted values from the empirical regression. This results in the following data generating process:

$$c^*_{it} = \hat{\beta}_{11} c^*_{i,t-1} + \hat{\beta}_{12} c^*_{i,t-2} + \hat{\beta}_{13} p^*_{i,t-1} + \hat{\beta}_{14} p^*_{i,t-2} + \hat{\delta}_1' x_{it} + u^{*c}_{it} \qquad (6.10)$$

$$p^*_{it} = \hat{\beta}_{21} c^*_{i,t-1} + \hat{\beta}_{22} c^*_{i,t-2} + \hat{\beta}_{23} p^*_{i,t-1} + \hat{\beta}_{24} p^*_{i,t-2} + \hat{\delta}_2' x_{it} + u^{*p}_{it} \qquad (6.11)$$

where

$$u^{*c}_{it} = \hat{\eta}^c_i + \hat{\lambda}^c_t + \hat{\gamma}^c_i t + \varepsilon^{*c}_{it} \qquad (6.12)$$

$$u^{*p}_{it} = \hat{\eta}^p_i + \hat{\lambda}^p_t + \hat{\gamma}^p_i t + \varepsilon^{*p}_{it} \qquad (6.13)$$

The residuals are generated according to:

$$\varepsilon^{*q}_{it} \sim N(0, \hat{\sigma}^2_{\varepsilon,q}), \qquad q \in \{c, p\} \qquad (6.14)$$


6.4 Model A2

For the Monte Carlo simulations that exclude incidental trends, the data is generated according to the following steps:

1. Two starting values for $c^*_{it}$ and $p^*_{it}$ are chosen. Here they are chosen to be the observed values from the empirical Levitt (1997) dataset minus the fitted values (Model A) of the incidental trends:

$$c^*_{i1} = c_{i1} - \hat{\gamma}^c_i \cdot 1 \qquad (6.15)$$

$$c^*_{i2} = c_{i2} - \hat{\gamma}^c_i \cdot 2 \qquad (6.16)$$

The same procedure is followed for the first two values of $p^*_{it}$. This choice is made because, if the dgp in Model A is the true data generating process, the two initial values for $c_{it}$ and $p_{it}$ still include the incidental trends effects. Not subtracting these creates a situation where the two starting observations are substantially different from the predicted values further in the sample. For a few $i$, the first two observations are missing. In these cases, the earliest two values form the basis of the procedure.

2. Subsequent values are created using the fitted values from the empirical regression. This results in the following data generating process:

$$c^*_{it} = \hat{\beta}_{11} c^*_{i,t-1} + \hat{\beta}_{12} c^*_{i,t-2} + \hat{\beta}_{13} p^*_{i,t-1} + \hat{\beta}_{14} p^*_{i,t-2} + \hat{\delta}_1' x_{it} + u^{*c}_{it} \qquad (6.17)$$

$$p^*_{it} = \hat{\beta}_{21} c^*_{i,t-1} + \hat{\beta}_{22} c^*_{i,t-2} + \hat{\beta}_{23} p^*_{i,t-1} + \hat{\beta}_{24} p^*_{i,t-2} + \hat{\delta}_2' x_{it} + u^{*p}_{it} \qquad (6.18)$$

where

$$u^{*c}_{it} = \hat{\eta}^c_i + \hat{\lambda}^c_t + \varepsilon^{*c}_{it} \qquad (6.19)$$

$$u^{*p}_{it} = \hat{\eta}^p_i + \hat{\lambda}^p_t + \varepsilon^{*p}_{it} \qquad (6.20)$$

The residuals are generated according to:

$$\varepsilon^{*q}_{it} \sim N(0, \hat{\sigma}^2_{\varepsilon,q}), \qquad q \in \{c, p\} \qquad (6.21)$$

6.5 Model B

The data generating process for Model B is equal to that of A1, except that it uses the fitted values of Model B from the empirical regression (Table 5.2). In Model A2, incidental trends are included in the empirical regression but excluded in the Monte Carlo data generating process. In Model B, they are excluded in both the empirical regression and the Monte Carlo data generating process.


6.6 Simulation results

Table 6.1 gives the simulated coefficient estimates, that is, the average of all simulated coefficient estimates. From these results we can note a number of things. When we compare the models, we note that Model A1 has a much more severe bias for all coefficients compared to Model A2. Model B cannot be directly compared due to the difference in the values of $\beta_0$ for the regression coefficients, but it is clear that the bias as a proportion of coefficient size is smaller than in Model A1. These differences are to be expected and are due to the inclusion of incidental trends, as described in Phillips & Sul (2007). Moreover, the SPJ and TPJ are not robust to the increase in bias due to incidental trends, as their biases in Model A1 are larger than those in A2. This is of interest when considering empirical applications of the SPJ and TPJ beyond the AR(1) model analyzed by Dhaene & Jochmans (2015).

When we compare the results of the FE, SPJ and TPJ, we note the following. In the vast majority of cases, the coefficients become less biased as we go from FE to SPJ to TPJ. This suggests that the Jackknife reduces the Nickell (1981) bias also in the context of this paper. It remains to be seen, however, what price has to be paid in order to achieve such a correction. The remainder of this chapter will shed light on this question.

Table 6.1: VAR(2) Monte Carlo coefficient estimates. The first four coefficient columns refer to the equation with dependent variable $c_{it}$, the last four to the equation with dependent variable $p_{it}$.

| Model | Stat | $c_{it-1}$ | $c_{it-2}$ | $p_{it-1}$ | $p_{it-2}$ | $c_{it-1}$ | $c_{it-2}$ | $p_{it-1}$ | $p_{it-2}$ |
|---|---|---|---|---|---|---|---|---|---|
| A | $\beta_0$ | 0.800 | -0.230 | -0.170 | 0.050 | -0.040 | 0.090 | 0.500 | 0.000 |
| A1 | $\hat{\beta}^*_{FE}$ | 0.707 | -0.265 | -0.165 | 0.027 | -0.040 | 0.110 | 0.395 | -0.055 |
| A1 | $\hat{\beta}^*_{1/2}$ | 0.861 | -0.224 | -0.164 | 0.058 | -0.038 | 0.107 | 0.557 | 0.028 |
| A1 | $\hat{\beta}^*_{1/3}$ | 0.793 | -0.206 | -0.146 | 0.070 | -0.026 | 0.089 | 0.496 | 0.025 |
| A2 | $\hat{\beta}^*_{FE}$ | 0.782 | -0.225 | -0.168 | 0.048 | -0.038 | 0.093 | 0.479 | 0.005 |
| A2 | $\hat{\beta}^*_{1/2}$ | 0.824 | -0.203 | -0.170 | 0.053 | -0.039 | 0.082 | 0.523 | 0.029 |
| A2 | $\hat{\beta}^*_{1/3}$ | 0.814 | -0.195 | -0.168 | 0.046 | -0.038 | 0.073 | 0.516 | 0.033 |
| B | $\beta_0$ | 0.940 | -0.170 | -0.150 | 0.110 | -0.020 | 0.070 | 0.630 | 0.080 |
| B | $\hat{\beta}^*_{FE}$ | 0.907 | -0.161 | -0.134 | 0.105 | -0.007 | 0.066 | 0.618 | 0.081 |
| B | $\hat{\beta}^*_{1/2}$ | 0.968 | -0.145 | -0.135 | 0.111 | -0.006 | 0.056 | 0.679 | 0.096 |
| B | $\hat{\beta}^*_{1/3}$ | 0.960 | -0.125 | -0.126 | 0.122 | -0.002 | 0.054 | 0.677 | 0.111 |


Table 6.2: VAR(2) Monte Carlo simulated standard deviations. The first four columns refer to the equation with dependent variable $c_{it}$, the last four to the equation with dependent variable $p_{it}$.

| Model | Stat | $c_{it-1}$ | $c_{it-2}$ | $p_{it-1}$ | $p_{it-2}$ | $c_{it-1}$ | $c_{it-2}$ | $p_{it-1}$ | $p_{it-2}$ |
|---|---|---|---|---|---|---|---|---|---|
| A1 | $\hat{\sigma}^*_{FE}$ | 0.029 | 0.026 | 0.035 | 0.035 | 0.024 | 0.022 | 0.030 | 0.028 |
| A1 | $\hat{\sigma}^*_{1/2}$ | 0.038 | 0.033 | 0.043 | 0.047 | 0.029 | 0.028 | 0.039 | 0.039 |
| A1 | $\hat{\sigma}^*_{1/3}$ | 0.066 | 0.054 | 0.070 | 0.076 | 0.048 | 0.045 | 0.067 | 0.065 |
| A2 | $\hat{\sigma}^*_{FE}$ | 0.020 | 0.017 | 0.027 | 0.022 | 0.017 | 0.014 | 0.022 | 0.018 |
| A2 | $\hat{\sigma}^*_{1/2}$ | 0.023 | 0.022 | 0.030 | 0.029 | 0.019 | 0.019 | 0.026 | 0.023 |
| A2 | $\hat{\sigma}^*_{1/3}$ | 0.038 | 0.038 | 0.042 | 0.047 | 0.028 | 0.031 | 0.037 | 0.038 |
| B | $\hat{\sigma}^*_{FE}$ | 0.024 | 0.022 | 0.017 | 0.016 | 0.019 | 0.018 | 0.014 | 0.013 |
| B | $\hat{\sigma}^*_{1/2}$ | 0.028 | 0.027 | 0.027 | 0.026 | 0.022 | 0.022 | 0.023 | 0.021 |
| B | $\hat{\sigma}^*_{1/3}$ | 0.044 | 0.044 | 0.050 | 0.052 | 0.032 | 0.036 | 0.044 | 0.040 |

Table 6.2 gives the simulated standard deviations of the coefficients of $\hat{\beta}^*_{FE}$, $\hat{\beta}^*_{1/2}$ and $\hat{\beta}^*_{1/3}$. The variance in Model A1 is the highest when we compare across Models A2 and B. Incidental trends therefore not only increase the bias, but also inflate the variance.

When looking at the performance of the FE, SPJ and TPJ across models, we note that the standard deviations of the coefficients increase as we go from the FE to the SPJ to the TPJ. This indicates that using the Jackknife as a bias correction method is not a loss-free exercise. This sheds different light on the results of Table 6.1: although the Jackknife coefficient estimates correct for the bias apparent in the FE estimator, they should themselves be interpreted with care. This provides additional insight into the overall usefulness of the Jackknife bias correction method in more empirically relevant contexts. Here, the performance of the Jackknife is clearly different from the AR(1) case analysed by Dhaene & Jochmans (2015), who found no variance inflation. In addition, there is a dramatic increase in the standard deviations of the TPJ compared to those of the SPJ, even though their coefficient estimates are not materially different, as we have seen in Table 6.1. This is in line with Dhaene & Jochmans (2015), who suggested that the SPJ would be the preferred option for bias correction.



Table 6.3: VAR(2) standard errors

                      dependent variable is c_{it}             dependent variable is p_{it}
Model  Stat       c_{it-1} c_{it-2} p_{it-1} p_{it-2}     c_{it-1} c_{it-2} p_{it-1} p_{it-2}
A1     σ̂_FE         0.028    0.027    0.036    0.035        0.023    0.022    0.029    0.028
A1     σ̂_1/2        0.043    0.041    0.055    0.054        0.035    0.033    0.044    0.044
A1     σ̂_1/3        0.059    0.055    0.074    0.075        0.048    0.045    0.060    0.061
A2     σ̂_FE         0.020    0.017    0.027    0.022        0.017    0.014    0.022    0.018
A2     σ̂_1/2        0.032    0.030    0.041    0.038        0.026    0.024    0.033    0.031
A2     σ̂_1/3        0.042    0.041    0.053    0.051        0.034    0.033    0.043    0.041
B      σ̂_FE         0.024    0.021    0.018    0.016        0.020    0.017    0.014    0.013
B      σ̂_1/2        0.037    0.035    0.036    0.034        0.030    0.029    0.030    0.027
B      σ̂_1/3        0.048    0.047    0.052    0.047        0.039    0.038    0.042    0.038

Table 6.3 provides the average of all 1000 estimated standard errors for each relevant coefficient. When we compare their values across Models A1, A2 and B, we detect a pattern similar to that in Table 6.2. The standard errors of the FE model are smallest, the SPJ standard errors are larger and those of the TPJ larger still. So when applying the SPJ and TPJ methods empirically, we are likely to observe an increase in variance through the estimated standard errors. This is what we have seen earlier in this paper as well; see Table 5.1 and Table 5.2.

The standard deviations in Table 6.2 can be compared to those presented in Table 6.3 to see if there is any apparent over- or underestimation of the true variance, where the Monte Carlo standard deviation from Table 6.2 serves as a measure of the true standard deviation. Table 6.4 reports the comparison as a percentage difference (a sketch of the computation follows the table), where a positive number indicates overestimation. We note that for the FE model the estimated standard errors are largely in line with the standard deviations, which is to be expected. However, we note a striking difference in the performance of the estimated standard errors for the SPJ and TPJ. The estimated standard errors for the SPJ dramatically overstate the variance, by around 30% in most cases. For the TPJ there does not seem to be a systematic over- or underestimation. When we compare the results across the models, we note that Models A2 and B show a higher incidence of overestimation of the SPJ and TPJ standard errors than Model A1. The main difference between these models is the exclusion of incidental trends in Models A2 and B. As we have seen in Table 6.2 and Table 6.3, Model A1 has a higher variance than Models A2 and B for FE, SPJ and TPJ. Excluding incidental trends, however, seems to make the difference between the estimated standard errors and the standard deviations more pronounced. Monte Carlo results can at best be suggestive, so it is left for future theoretical research to investigate why this difference occurs.


Table 6.4: Overestimation of standard errors, %

                      dependent variable is c_{it}             dependent variable is p_{it}
Model  Stat       c_{it-1} c_{it-2} p_{it-1} p_{it-2}     c_{it-1} c_{it-2} p_{it-1} p_{it-2}
A1     σ̂_FE            3%      -4%      -3%       0%          -4%       0%      -3%       0%
A1     σ̂_1/2          13%      24%      28%      15%          21%      18%      13%      13%
A1     σ̂_1/3         -11%       2%       6%      -1%           0%       0%     -10%      -6%
A2     σ̂_FE           -2%      -2%      -3%       0%           0%       1%       0%       0%
A2     σ̂_1/2          39%      36%      37%      31%          37%      26%      27%      35%
A2     σ̂_1/3          11%       8%      26%       9%          21%       6%      16%       8%
B      σ̂_FE            0%      -5%       6%       0%           5%      -6%       0%       8%
B      σ̂_1/2          32%      30%      33%      31%          36%      32%      30%      29%
B      σ̂_1/3           9%       7%       4%     -10%          22%       6%      -5%      -5%

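The percentages in Table 6.4 amount to a simple ratio of the quantities behind Tables 6.2 and 6.3. A minimal sketch, where se_hat and mc_sd are placeholder names for the stored simulation output:

```python
import numpy as np

def overestimation_pct(se_hat, mc_sd):
    """Percentage by which the average estimated standard error (the Table 6.3
    quantity) exceeds the Monte Carlo standard deviation (the Table 6.2
    quantity); a positive value indicates overestimation, as in Table 6.4.

    se_hat : (R, K) estimated standard errors across replications
    mc_sd  : (K,)   simulated standard deviations of the point estimates"""
    return 100.0 * (se_hat.mean(axis=0) / mc_sd - 1.0)
```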

Table 6.5 shows the rejection frequencies of the two-sided t-test of H0: β = β0 at the 5% level; a sketch of this computation is given after the table. These results allow us to identify any size distortion that takes place, and the previous tables allow us to infer whether such distortion is due to biased coefficients or to over- or underestimation of the standard errors. From Table 6.5 we infer that size distortion is particularly apparent for the first autoregressive lags. This is to be expected, since we know that their coefficients are heavily biased. Comparing the SPJ and TPJ results, we see that their rejection frequencies are broadly comparable, whereas the FE estimator shows much larger distortions in Model A1. Across the three models, Model A1, which includes incidental trends, shows the largest size distortion. We therefore ought to be careful in interpreting hypothesis test results, especially when the Nickell (1981) bias and incidental trends are both present.



Table 6.5: Rejection frequencies for the two-sided test of H0: β = β0

                      dependent variable is c_{it}             dependent variable is p_{it}
Model  Stat       c_{it-1} c_{it-2} p_{it-1} p_{it-2}     c_{it-1} c_{it-2} p_{it-1} p_{it-2}
A1     FE           0.903    0.265    0.051    0.099        0.070    0.157    0.950    0.502
A1     1/2          0.286    0.018    0.012    0.022        0.028    0.053    0.223    0.062
A1     1/3          0.076    0.057    0.058    0.064        0.067    0.052    0.077    0.082
A2     FE           0.150    0.057    0.045    0.039        0.052    0.054    0.151    0.060
A2     1/2          0.050    0.077    0.005    0.013        0.012    0.020    0.055    0.099
A2     1/3          0.046    0.119    0.013    0.032        0.022    0.056    0.044    0.104
B      FE           0.274    0.073    0.148    0.065        0.094    0.067    0.135    0.059
B      1/2          0.052    0.060    0.016    0.011        0.016    0.032    0.336    0.029
B      1/3          0.061    0.138    0.068    0.083        0.035    0.051    0.225    0.139
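
The rejection frequencies above can be computed from the stored replication output as follows; beta_hat and se_hat are placeholder names, and the normal critical value of 1.96 for the 5% two-sided test is an assumption about the testing convention used:

```python
import numpy as np

def rejection_frequency(beta_hat, se_hat, beta0, crit=1.96):
    """Fraction of replications in which the two-sided t-test of
    H0: beta = beta0 rejects at the 5% level.

    beta_hat, se_hat : (R, K) point estimates and their standard errors
    beta0            : (K,) or scalar true values under the null
    crit             : critical value; 1.96 for a 5% two-sided normal test"""
    t_stats = (beta_hat - beta0) / se_hat
    return (np.abs(t_stats) > crit).mean(axis=0)
```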


Conclusions

The goal of this paper was to improve on the empirical results of Marvell & Moody (1996) by correcting for the Nickell (1981) bias using the Jackknife method recently proposed by Dhaene & Jochmans (2015). To gain a better understanding of the intricacies of this correction, a Monte Carlo simulation study has been conducted. It was set up to correspond closely to the empirical context in order to allow for comparison.

The following statements can be made regarding the empirical application of the Split- and Third Panel Jackknives. Marvell & Moody (1996) concluded that the impact of crime on the number of police is small, whereas the impact of police on the crime rate is negative and substantial. This conclusion is not significantly altered after correcting for the Nickell (1981) bias by means of the Split- and Third Panel Jackknife. Police has a substantial negative effect on crime, and crime has no substantial impact on police. The only material difference between the bias-adjusted results and the original Fixed Effects results is that the first autoregressive coefficients are larger; crime and police are more persistent when correcting for the Nickell (1981) bias. All in all, since the empirical part of this paper cannot cast significant doubt on the results of Marvell & Moody (1996), we can put extra confidence in their overall conclusion.

With regard to the analytical investigation of the performance of the Jackknife, this paper has shown that the inclusion of incidental trends increases Nickell's (1981) bias even for the Split- and Third Panel Jackknife estimators. This suggests that these methods should be used with care when incidental trends are involved and provides an area of attention for future research on bias correction methods. The overall performance of the Split Panel Jackknife in terms of bias reduction is strong; in all cases reported here, the bias is smaller than that of the Fixed Effects coefficients. An interesting observation is that this bias is mostly positive rather than negative, suggesting 'overshooting' of the true value. The Third Panel Jackknife outperforms the Split Panel Jackknife in terms of bias correction. However, the Split Panel Jackknife and the Third Panel Jackknife both suffer from variance inflation. This result differs from Dhaene & Jochmans (2015), who found no variance inflation in the AR(1) case. The variance inflation of the Third Panel Jackknife is much higher than that of the Split Panel Jackknife. This shows that the Jackknife procedure proposed by Dhaene & Jochmans (2015) does suffer from the familiar bias-variance tradeoff in the VAR(2) model analyzed here, and that this tradeoff becomes more severe as the order of the Jackknife increases. Isolating the effect of incidental trends leads to the conclusion that incidental trends contribute to a higher variance. When the estimated standard errors are compared to the simulated standard deviations of the Jackknife estimates, it is clear that the estimated standard errors of the Split Panel Jackknife consistently overestimate the true variance. It is left for future research to analyze the origin of this result. When considering rejection frequencies of significance tests, the results here show that size distortion is apparent in the tests on the first autoregressive lags, which is explained by the large bias in their coefficients. However, the size distortion of the Split Panel Jackknife is not materially different from that of the Third Panel Jackknife, suggesting that the overestimation of standard errors is not a decisive factor in hypothesis testing.


Literature

Anderson, T. W., & Hsiao, C. (1981). Estimation of dynamic models with error components. Journal of the American Statistical Association, 76(375), 598-606.

Arellano, M., & Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies, 58(2), 277-297.

Becker, G. S. (1968). Crime and Punishment: An Economic Approach. Journal of Political Economy, 76(2), 169-217.

Bun, M. J. (2015). Identifying the impact of deterrence on crime: internal versus external instruments. Applied Economics Letters, 22(3), 204-208.

Bun, M. J., & Carree, M. A. (2005). Bias-corrected estimation in dynamic panel data models. Journal of Business & Economic Statistics, 23(2), 200-210.

Bun, M. J., & Carree, M. A. (2006). Bias-corrected estimation in dynamic panel data models with heteroscedasticity. Economics Letters, 92(2), 220-227.

Bun, M. J., & Kiviet, J. F. (2006). The effects of dynamic feedbacks on LS and MM estimator accuracy in panel data models. Journal of Econometrics, 132(2), 409-444.

Corman, H., Joyce, T., & Lovitch, N. (1987). Crime, deterrence and the business cycle in New York City: A VAR approach. The Review of Economics and Statistics, 69(4), 695-700.

Cornwell, C., & Trumbull, W. N. (1994). Estimating the economic model of crime with panel data. The Review of Economics and Statistics, 76(2), 360-366.



Dhaene, G., & Jochmans, K. (2015). Split-panel jackknife estimation of fixed-effect models. The Review of Economic Studies, 82(3), 991-1030.

Di Tella, R., & Schargrodsky, E. (2004). Do police reduce crime? Estimates using the allocation of police forces after a terrorist attack. The American Economic Review, 94(1), 115-133.

Draca, M., Machin, S., & Witt, R. (2011). Panic on the streets of London: Police, crime, and the July 2005 terror attacks. The American Economic Review, 101(5), 2157-2181.

Everaert, G., & Pozzi, L. (2007). Bootstrap-based bias correction for dynamic panels. Journal of Economic Dynamics and Control, 31(4), 1160-1184.

Kiviet, J. F. (1995). On bias, inconsistency, and efficiency of various estimators in dynamic panel data models. Journal of Econometrics, 68(1), 53-78.

Levitt, S. D. (1997). Using electoral cycles in police hiring to estimate the effect of police on crime. The American Economic Review, 87(3), 270-290.

Levitt, S. D. (1998). Why do increased arrest rates appear to reduce crime: deterrence, incapacitation, or measurement error?. Economic Inquiry, 36(3), 353-372.

Marvell, T. B., & Moody, C. E. (1996). Specification Problems, Police Levels, And Crime Rates. Criminology, 34(4), 609-646.

Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica, 49(6), 1417-1426.

Phillips, P. C., & Sul, D. (2007). Bias in dynamic panel estimation with fixed effects, incidental trends and cross section dependence. Journal of Econometrics, 137(1), 162-188.
