The effect of light trucks on traffic safety measured by different models


Niels Groot July 21, 2015

Summary

The strong growth of the light truck segment is one of the most significant changes in the United States auto industry over the last decades. This paper investigates the effect of light trucks on traffic safety using three different models, namely the Tobit, Heckman and Cosslett models, and tries to choose the best one. The results suggest that the Heckman and Cosslett models are the best at measuring the effect. According to these models, occupants of light trucks face smaller risks than occupants of cars in two-vehicle crashes and higher risks in single-vehicle crashes. In addition, the marginal effects of the second vehicle being a light truck in two-vehicle crashes, and of the vehicle being a light truck in single-vehicle crashes, are positive, so light trucks increase fatalities.

1 Introduction

In the last decades drivers have been running an ’arms race’ on American roads by replacing cars with sport utility vehicles (SUVs), pickup trucks and passenger vans. After that, they replaced these vehicles with even larger SUVs and even heavier trucks, including the tank-like Hummer (White, 2004). The market share of light trucks increased from 17% in 1981 to about 50% in 2006 (Li, 2012). An important reason for this increase might be that families see light trucks as offering better self-protection in a crash.

In recent years, several studies have investigated the effect of light trucks on traffic safety. Examples are Gayer (2004) using IV estimation, White (2004), Anderson (2008) using a state-level panel data set together with an accident-level micro-data set, and Li (2012) using a Tobit model. These studies have documented that in multiple-vehicle crashes light trucks provide superior protection to their occupants while posing greater dangers to the occupants of the colliding cars. The authors attribute this result largely to the design mismatch between light trucks and passenger cars. Light trucks, particularly SUVs and pickups, are generally taller and have higher front ends. In single-vehicle crashes, on the other hand, the results mostly show that light trucks are less safe for their occupants than passenger cars. This is likely because light trucks have a higher center of gravity and therefore a larger tendency to roll over in accidents.

This paper also estimates the effect of light trucks on traffic safety, using the same data as Li (2012). For clarity, the results Li obtained with his Tobit model are given in Table 1. In this table v1 stands for the first vehicle in the crash and v2 for the second vehicle. The Tobit model has a possible disadvantage, however: it supposes that the process underlying the traffic injuries or fatalities is the same for all observations. Therefore, after replicating the results of Li (2012) as closely as possible, I will add a Heckman (1979) model in which the process is split into two parts. The first part determines whether or not there is an injury and the second part determines the severity of the injuries. By comparing the two models I will try to choose the one that performs best. Because the Heckman model yields some strange results, I will then introduce a semiparametric model, namely the Cosslett (1991) model, and compare its estimates with those of the Heckman model.

The remainder of this paper is organized as follows. Section 2 describes the data. In section 3, I replicate the Tobit results of Li (2012) as closely as possible, calculate the equivalent fatalities and marginal effects, and do some robustness checks. Section 4 covers the Heckman model and its results. I compare both models in section 5 and introduce the Cosslett (1991) model and its results in section 6. In section 7 I compare the Heckman and Cosslett models, and section 8 concludes.


Table 1: Tobit results from Li (2012)

                      v1 = passenger car  v1 = light truck    single vehicle
                       para      SE        para      SE        para      SE
                       (1)       (2)       (3)       (4)       (5)       (6)
v2 = light truck       .031*     .006      .028*     .006
Single = light truck                                           .060*     .008
Small city            -.064*     .008     -.056*     .009     -.098*     .012
Medium city           -.067*     .009     -.073*     .011     -.119*     .015
Large city            -.081*     .007     -.067*     .008     -.101*     .010
Seat belt             -.012*     .007     -.031*     .008     -.126*     .011
Rain                  -.016*     .008     -.007      .010      .007      .011
Snow                  -.049*     .021     -.048      .024     -.234*     .023
Dark                   .019*     .006      .028*     .007     -.046*     .008
Weekday               -.014*     .006     -.013      .007     -.020*     .008
Interstate highway    -.048*     .010     -.002      .010      .079*     .012
Divided highway        .057*     .006      .036*     .006      .098*     .011
Alcohol (v1)           .062*     .014      .068*     .017      .183*     .012
Drugs (v1)             .027*     .010     -.006      .010      .057*     .012
Age <21 (v1)           .017      .011     -.005      .014      .075*     .015
Age >60 (v1)           .052*     .008      .063*     .011      .076*     .016
Male driver (v1)      -.037*     .006     -.033*     .006     -.027      .009
Young male (v1)       -.040*     .015     -.009      .019     -.037      .019
Occupants (v1)         .037*     .004      .023*     .003      .052*     .004
Speeding (v1)          .260*     .026      .206*     .042      .307*     .018
Alcohol (v2)           .074*     .013      .091*     .014
Drugs (v2)            -.025*     .009      .010      .009
Age <21 (v2)          -.004      .010      .008      .011
Age >60 (v2)           .032*     .008      .029*     .009
Male driver (v2)       .018*     .006      .004      .006
Young male (v2)        .012      .013      .004      .014
Occupants (v2)         .007*     .003      .010*     .004
Speeding (v2)          .187*     .031      .093*     .026
Intercept             -.706*     .033     -.601*     .042     -.780*     .029
σ                      .304*     .014      .260*     .018      .452      .015
Observation            76586               59733               48813

Estimates taken from the Li (2012) paper.

Note: Similar explanatory variables are used in White(2004). Year dummies (8) are included in all regressions. The omitted category for vehicle type is passenger car. The base group for city size is rural area. Alcohol is 1 if the driver is found under influence of alcohol. Young male is 1 if the driver is male and younger than 21. Speeding is 1 if the travel speed is 10 miles per hour above the speed limit. Occupants is the number of occupants in the vehicle.


2 Data

The dataset I use in this paper comes from the General Estimates System (GES) database, which is maintained by the National Highway Traffic Safety Administration (NHTSA). This is the same database as used by Li (2012). The information in this dataset is obtained from car accident reports of about 400 police agencies across the United States of America. These accident reports contain detailed information about the vehicle types involved in the accidents, the number of persons and their injuries, and the circumstances in which the accident took place (e.g. the weather during the accident, the type of city in which it took place or whether the driver of a vehicle had been drinking). I use this data to examine the safety effects of different types of vehicles on their own occupants and on the occupants of colliding vehicles, while controlling for this set of conditions. I only use the accident reports from 1998 up to and including 2006, which is the last year of data available to me.

Also, I only use single-vehicle and two-vehicle accident reports involving passenger cars and/or light trucks. Light trucks in this paper are defined as sport utility vehicles (SUVs), pickup trucks and passenger vans. There are 76586 passenger cars involved in two-vehicle crashes, of which 0.55 percent had fatalities and 8.03 percent had occupants suffering incapacitating injuries. For light trucks there are 59733 two-vehicle accident reports, of which 0.29 percent had fatalities and 5.60 percent had occupants suffering incapacitating injuries. In total I have 48813 single-vehicle accidents, with 27911 of these involving passenger cars and 20902 involving light trucks. Of the 27911 involving cars, 2.26 percent had fatalities and 15.26 percent had incapacitating injuries. Of the 20902 involving light trucks, 2.68 percent had fatalities and 16.10 percent had incapacitating injuries.

The descriptive statistics of the variables, mostly dummies, are given in Table 2 below. The first three variables concern the place where the accident happened. The omitted category here is accidents that took place in rural areas. The variable seat belt is 1 if the driver of that vehicle wore a seat belt. The variables rain, snow and dark are 1 if it was raining, snowing or dark when the crash happened, and the dummy variable weekday indicates whether the accident took place on a weekday, that is, from Monday up to and including Friday. The variables interstate highway and divided highway describe the road on which the accident happened, where the omitted category is a two-way street. The next variables concern the drivers of the vehicles. The variables alcohol and drugs are 1 if the driver of that vehicle was drunk or under the influence of drugs during the accident. There are also dummies for the driver of the vehicle being male, under 21 or over 60 years old. The dummy speeding is 1 if the travel speed of that vehicle was more than 10 miles per hour above the speed limit on that road. Finally, the variable occupants is the number of occupants in that vehicle.

For some of the variables, I already have an expectation about the sign in the regression models. For example, wearing a seat belt or an accident taking place on a weekday is expected to decrease the severity of an accident, whereas an accident taking place in the dark, speeding and the driver being male are expected to increase the severity of a crash (White, 2004).


Table 2: Descriptive statistics of all the variables used

                      Two-vehicle                             Single vehicle
                      v1 = Passenger car  v1 = Light truck
                      Mean      Std.dev.  Mean      Std.dev.  Mean      Std.dev.
Small city            .1922     .3940     .1971     .3978     .1659     .3720
Medium city           .1022     .3029     .1015     .3020     .0781     .2684
Large city            .4110     .4920     .3992     .4897     .3191     .4661
Seat belt             .8030     .3978     .8236     .3811     .7248     .4466
Rain                  .1067     .3088     .1025     .3034     .1216     .3269
Snow                  .0193     .1377     .0215     .1453     .0520     .2219
Dark                  .2525     .4344     .2273     .4191     .4980     .5000
Weekday               .7660     .4234     .7814     .4133     .6783     .4671
Interstate highway    .0587     .2351     .0651     .2467     .1380     .3449
Divided highway       .2864     .4521     .2967     .4568     .2739     .4460
Alcohol (v1)          .0244     .1543     .0241     .1533     .1317     .3382
Drugs (v1)            .1012     .3016     .0924     .2896     .1503     .3574
Age <21 (v1)          .1599     .3665     .1003     .3004     .1902     .3925
Age >60 (v1)          .1287     .3349     .0892     .2850     .0836     .2768
Male driver (v1)      .4397     .4964     .6115     .4874     .5973     .4904
Young male (v1)       .0675     .2509     .0639     .2445     .1147     .3187
Occupants (v1)        1.415     .7816     1.489     .9476     1.417     .8699
Speeding (v1)         .0050     .0704     .0035     .0589     .0391     .1940
Alcohol (v2)          .0331     .1789     .0354     .1847
Drugs (v2)            .1078     .3101     .0976     .2968
Age <21 (v2)          .1754     .3803     .1891     .3916
Age >60 (v2)          .1081     .3105     .1119     .3153
Male driver (v2)      .5741     .4945     .5454     .4979
Young male (v2)       .1014     .3018     .1032     .3042
Occupants (v2)        1.434     .8607     1.437     .8433
Speeding (v2)         .0045     .0672     .0060     .0773

Note: The base group for city size is rural area. Alcohol is 1 if the driver is found under influence of alcohol. The same holds for drugs. Young male is 1 if the driver is male and younger than 21. Speeding is 1 if the travel speed is 10 miles per hour above the speed limit. Occupants is the number of occupants in the vehicle.


3 Estimation based on the Tobit model

3.1 Tobit estimation

First I will try to replicate the estimations of Li (2012) exactly or as closely as possible. In this paper the crash severity for a vehicle in a crash is defined as the rate of equivalent fatality per occupant. This measure ranges from 0 (no fatality or incapacitating injury in the vehicle) to 1 (all the occupants are killed). In this section I use, as Li did, a censored normal regression model, or Tobit model, with the latent variable $y_i^*$ as dependent variable, where 20 incapacitating injuries are converted into one fatality (see footnote 1). This latent variable $y_i^*$ is assumed to be linear in the regressors with an additive error that is normally distributed and homoskedastic. Thus

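$$y^* = x'\beta + \varepsilon \qquad (1)$$

where the error term

$$\varepsilon \sim N(0, \sigma^2)$$

has variance $\sigma^2$ that is constant across observations. The observed $y$ is defined as

$$y = \begin{cases} 0 & \text{if } y^* \le 0 \\ y^* & \text{if } 0 < y^* < 1 \\ 1 & \text{if } y^* \ge 1 \end{cases}$$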

This is a two-sided Tobit model with a lower limit of 0 and an upper limit of 1. I use two-vehicle and single-vehicle accidents. In a two-vehicle accident, one of the two vehicles is selected (see footnote 2) to be the first vehicle and the dependent variable is the crash outcome of this vehicle. I then distinguish between the first vehicle being a passenger car and being a light truck, so that in total I have three types of accidents, the same as in Li (2012): two-vehicle crashes where the first vehicle is a passenger car, two-vehicle crashes where the first vehicle is a light truck, and single-vehicle crashes.
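The censoring rule above implies three kinds of likelihood contributions: a probability mass at 0, a probability mass at 1, and a normal density in between. The following is only a minimal sketch of the log-likelihood of such a two-sided Tobit model, not the estimation routine actually used for this paper. The function names `tobit_loglik_obs` and `tobit_loglik` are hypothetical, and the linear index x'β is assumed to be computed beforehand.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (pdf and cdf)


def tobit_loglik_obs(y, xb, sigma):
    """Log-likelihood of one observation, censored at 0 and at 1.

    y     : observed equivalent fatality rate, between 0 and 1
    xb    : linear index x'beta of this observation (assumed given)
    sigma : standard deviation of the normal error term
    """
    if y <= 0.0:
        # Pr(y* <= 0) = Phi(-x'beta / sigma)
        return log(STD_NORMAL.cdf(-xb / sigma))
    if y >= 1.0:
        # Pr(y* >= 1) = Phi((x'beta - 1) / sigma)
        return log(STD_NORMAL.cdf((xb - 1.0) / sigma))
    # Uncensored observation: normal density of y* evaluated at y
    return log(STD_NORMAL.pdf((y - xb) / sigma) / sigma)


def tobit_loglik(ys, xbs, sigma):
    """Sum of the log-likelihood contributions over all observations."""
    return sum(tobit_loglik_obs(y, xb, sigma) for y, xb in zip(ys, xbs))
```

Maximizing `tobit_loglik` over β and σ with a numerical optimizer would give Tobit estimates; in practice a statistical package does this step.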

Footnote 1: The National Safety Council (NSC) estimated that the average comprehensive cost per death is about 20 times that per incapacitating injury in motor vehicle crashes.

Footnote 2: This is because the General Estimates System tends to report the vehicle in which the most serious harm occurs as v1.


Table 3: Regression results of Tobit models

                      v1 = passenger car  v1 = light truck    single vehicle
                       para      SE        para      SE        para      SE
                       (1)       (2)       (3)       (4)       (5)       (6)
v2 = light truck       .0386*    .0040     .0341*    .0043
Single = light truck                                           .0250*    .0057
Small city            -.0749*    .0059    -.0639*    .0061    -.0720*    .0085
Medium city           -.0571*    .0072    -.0663*    .0079    -.0786*    .0116
Large city            -.0500*    .0046    -.0465*    .0047    -.0378*    .0064
Seat belt             -.0443*    .0049    -.0586*    .0051    -.1541*    .0064
Rain                  -.0164*    .0065    -.0163*    .0069     .0012     .0085
Snow                  -.0254     .0153    -.0452*    .0163    -.1715*    .0163
Dark                   .0381*    .0045     .0280*    .0048     .0121*    .0059
Weekday               -.0171*    .0045    -.0185*    .0048    -.0196*    .0059
Interstate highway    -.0181*    .0082     .0330*    .0076     .1059*    .0087
Divided highway        .0645*    .0043     .0422*    .0045     .1272*    .0071
Alcohol (v1)           .0630*    .0109     .0763*    .0108     .1246*    .0079
Drugs (v1)             .0539*    .0072     .0069     .0076     .0871*    .0076
Age <21 (v1)          -.0090     .0070    -.0209     .0112     .0324*    .0112
Age >60 (v1)           .0593*    .0056     .0522*    .0066     .0439*    .0104
Male driver (v1)      -.0265*    .0043    -.0342*    .0044     .0077     .0066
Young male (v1)       -.0327*    .0111     .0063     .0144    -.0198     .0142
Occupants (v1)         .0293*    .0023     .0203*    .0019     .0364*    .0029
Speeding (v1)          .2351*    .0189     .1401*    .0237     .2576*    .0116
Alcohol (v2)           .0922*    .0094     .0819*    .0089
Drugs (v2)            -.0233*    .0072     .0176*    .0072
Age <21 (v2)           .0016     .0083    -.0011     .0081
Age >60 (v2)           .0158*    .0063     .0179*    .0064
Male driver (v2)       .0269*    .0044     .0129*    .0046
Young male (v2)        .0062     .0105    -.0052     .0106
Occupants (v2)         .0040     .0022     .0006     .0023
Speeding (v2)          .1463*    .0210     .0935*    .0193
Intercept             -.4307*    .0112    -.3380*    .0115    -.4147*    .0129
σ                      .2902*    .0030     .2349*    .0033     .4073*    .0037
Observation            76586               59733               48813

Note: Similar explanatory variables are used in White(2004). Year dummies (8) are included in all regressions. The omitted category for vehicle type is passenger car. The base group for city size is rural area. Alcohol is 1 if the driver is found under influence of alcohol. Young male is 1 if the driver is male and younger than 21. Speeding is 1 if the travel speed is 10 miles per hour above the speed limit. Occupants is the number of occupants in the vehicle.


The estimation results for these Tobit regressions are given in Table 3. In this table, v1 represents the first vehicle and v2 the second vehicle in a two-vehicle collision. The first two columns (1) and (2) report the results for two-vehicle crashes where the first vehicle was a passenger car. In the regression of columns (3) and (4) the first vehicle is a light truck, and the last two columns (5) and (6) report the results for single-vehicle crashes.

Most of the coefficients have the expected sign and are significant at a five percent level. For example, the results suggest that using a seat belt lowers the equivalent fatality rate in all three groups, whereas the driver of the first vehicle and/or the driver of the second vehicle being drunk increases the equivalent fatality rate in all three groups. However, some estimates have an unexpected sign. For example, according to the Tobit model a male driver in the first vehicle decreases the equivalent fatality rate in both two-vehicle regressions. For single-vehicle crashes the estimated effect of a male driver is positive, but not significant.

The variables of most interest for this paper are the dummy for the second vehicle being a light truck in two-vehicle crashes and the dummy for the vehicle being a light truck in single-vehicle crashes. In the Tobit estimation, all three of these coefficients in Table 3 are positive and significant at a five percent level. This suggests that if a light truck is involved in the crash, the equivalent fatality rate increases, no matter what the first vehicle is in the case of a two-vehicle crash.
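The significance of these three coefficients can be checked from the reported estimates and standard errors in Table 3. The sketch below assumes that the stars are based on a two-sided test using the normal approximation with critical value 1.96; the test behind the stars is not stated explicitly in the tables, so this is an assumption, and the helper names are hypothetical.

```python
def z_statistic(coef, se):
    """Ratio of an estimate to its standard error."""
    return coef / se


def significant_5pct(coef, se, critical=1.96):
    """Two-sided test at the five percent level (normal approximation, assumed)."""
    return abs(z_statistic(coef, se)) > critical


# Light truck coefficients and standard errors copied from Table 3
light_truck = {
    "v1 = passenger car": (0.0386, 0.0040),
    "v1 = light truck": (0.0341, 0.0043),
    "single vehicle": (0.0250, 0.0057),
}

for group, (coef, se) in light_truck.items():
    print(group, round(z_statistic(coef, se), 2), significant_5pct(coef, se))
```

The same check applied to the male driver coefficient in single-vehicle crashes (.0077 with standard error .0066) gives a ratio well below 1.96, in line with the missing star in Table 3.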

The results I obtained from the Tobit model are not exactly the same as those of Li (2012). When comparing the results, I see that his coefficients are sometimes higher and sometimes lower than mine, but almost all of the coefficients have the same sign. In a few cases the sign of the coefficient is different, for example the dummy for the first driver being under 21 in a two-vehicle crash where the first vehicle is a passenger car. But in all the cases where the sign is different, the coefficients of both Li (2012) and this thesis are not significant at a five percent level. When I compare the estimates $\hat{\sigma}$, it strikes me that all of mine are somewhat lower than those of Li, suggesting that the error term $\varepsilon$ in my estimation has a lower standard deviation.

3.2 Equivalent fatalities

I now want to evaluate the relative safety of the two types of vehicles, i.e. cars and light trucks, in single-vehicle accidents, as well as the effect of the two types on crash outcomes in two-vehicle crashes. I do this in the same way as Li (2012), by calculating the expected equivalent fatality rate for every $y_i$ based on the previous Tobit regressions. The expected value of $y$ is equal to:

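$$E[y] = 0 \cdot \Pr(y^* \le 0) + E[y^* \mid 0 < y^* < 1] \cdot \Pr(0 < y^* < 1) + 1 \cdot \Pr(y^* \ge 1)$$

where

$$\begin{aligned}
E[y^* \mid 0 < y^* < 1] &= E[x'\beta + \varepsilon \mid 0 < x'\beta + \varepsilon < 1] \\
&= x'\beta + E[\varepsilon \mid 0 < x'\beta + \varepsilon < 1] \\
&= x'\beta + \sigma \, E\left[\frac{\varepsilon}{\sigma} \,\middle|\, \frac{1 - x'\beta}{\sigma} > \frac{\varepsilon}{\sigma} > \frac{-x'\beta}{\sigma}\right] \\
&= x'\beta + \sigma \, \frac{\phi\left(\frac{-x'\beta}{\sigma}\right) - \phi\left(\frac{1 - x'\beta}{\sigma}\right)}{\Phi\left(\frac{1 - x'\beta}{\sigma}\right) - \Phi\left(\frac{-x'\beta}{\sigma}\right)}
\end{aligned}$$

where $\phi$ is the standard normal pdf and $\Phi$ is the standard normal cdf. Also, I have

$$\Pr(0 < y^* < 1) = \Pr(0 < x'\beta + \varepsilon < 1) = \Pr(-x'\beta < \varepsilon < 1 - x'\beta) = \Phi\left(\frac{1 - x'\beta}{\sigma}\right) - \Phi\left(\frac{-x'\beta}{\sigma}\right)$$

and

$$\Pr(y^* \ge 1) = \Pr(\varepsilon \ge 1 - x'\beta) = 1 - \Phi\left(\frac{1 - x'\beta}{\sigma}\right) = \Phi\left(\frac{x'\beta - 1}{\sigma}\right)$$

So that the expected value for $y$ becomes

$$\begin{aligned}
E[y] &= \left(x'\beta + \sigma \, \frac{\phi\left(\frac{-x'\beta}{\sigma}\right) - \phi\left(\frac{1 - x'\beta}{\sigma}\right)}{\Phi\left(\frac{1 - x'\beta}{\sigma}\right) - \Phi\left(\frac{-x'\beta}{\sigma}\right)}\right) \cdot \left(\Phi\left(\frac{1 - x'\beta}{\sigma}\right) - \Phi\left(\frac{-x'\beta}{\sigma}\right)\right) + 1 \cdot \Phi\left(\frac{x'\beta - 1}{\sigma}\right) \\
&= x'\beta \cdot \left(\Phi\left(\frac{1 - x'\beta}{\sigma}\right) - \Phi\left(\frac{-x'\beta}{\sigma}\right)\right) + \sigma \cdot \left(\phi\left(\frac{-x'\beta}{\sigma}\right) - \phi\left(\frac{1 - x'\beta}{\sigma}\right)\right) + 1 \cdot \Phi\left(\frac{x'\beta - 1}{\sigma}\right)
\end{aligned}$$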
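The final expression for E[y] can be evaluated directly for a given linear index and σ. The following sketch does this with the Python standard library; `expected_y` is a hypothetical name, and the index x'β is assumed to be computed beforehand from the estimated coefficients.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: N.pdf is phi, N.cdf is Phi


def expected_y(xb, sigma):
    """Expected equivalent fatality rate E[y] in the Tobit model with limits 0 and 1.

    xb    : linear index x'beta (assumed given)
    sigma : standard deviation of the error term
    """
    upper = (1.0 - xb) / sigma  # standardized upper limit
    lower = -xb / sigma         # standardized lower limit
    return (xb * (N.cdf(upper) - N.cdf(lower))
            + sigma * (N.pdf(lower) - N.pdf(upper))
            + N.cdf((xb - 1.0) / sigma))
```

A simple sanity check is the symmetric case x'β = 0.5: the two density terms cancel and the expression reduces to 0.5 for any σ.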

The sample means of these fatality rates are reported in Table 4. In this table, Tct is the number of equivalent fatalities per occupant of a passenger car when the second vehicle was a light truck, and so on. The sample means and the standard errors are obtained from 1000 bootstrap replications of size 1000.
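The bootstrap procedure can be sketched as follows: draw a sample of size 1000 with replacement, compute its mean, repeat this 1000 times, and take the mean and standard deviation of the resulting means. This is only an illustration under the assumption that plain resampling of the predicted fatality rates is used; the function name, the seed argument and the input list are hypothetical.

```python
import random
from statistics import mean, stdev


def bootstrap_mean_se(values, replications=1000, size=1000, seed=0):
    """Bootstrap mean and standard error of the mean.

    values       : list of predicted equivalent fatality rates (assumed input)
    replications : number of bootstrap samples
    size         : number of draws per bootstrap sample
    seed         : seed for reproducibility (an assumption, not from the paper)
    """
    rng = random.Random(seed)
    boot_means = []
    for _ in range(replications):
        sample = rng.choices(values, k=size)  # draw with replacement
        boot_means.append(sum(sample) / size)
    return mean(boot_means), stdev(boot_means)
```

The second returned value plays the role of the standard errors shown in parentheses in Table 4.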

In two-vehicle crashes, Table 4 shows a positive difference. If the second vehicle was a car, light truck occupants face only 53.2 percent of the risk faced by passenger car occupants, and if the second vehicle was a light truck they face only 54.7 percent of that risk. However, this is not the case in single-vehicle crashes. There I see a negative difference, meaning that occupants of light trucks face a risk that is 10.2 percent higher than that of occupants of passenger cars.
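These percentages follow from simple ratios of the equivalent fatality rates in Table 4. The sketch below recomputes them from the rounded table values, so the last digit can differ slightly from the figures in the text, which are presumably based on unrounded rates. The variable and function names are hypothetical.

```python
# Equivalent fatalities per occupant, rounded values copied from Table 4
t_cc, t_ct, t_c = 0.0092, 0.0128, 0.0353  # first (or single) vehicle is a car
t_tc, t_tt, t_t = 0.0049, 0.0070, 0.0389  # first (or single) vehicle is a light truck


def relative_risk(truck_rate, car_rate):
    """Risk of light truck occupants relative to passenger car occupants."""
    return truck_rate / car_rate


print("second vehicle car:        ", round(100 * relative_risk(t_tc, t_cc), 1), "percent")
print("second vehicle light truck:", round(100 * relative_risk(t_tt, t_ct), 1), "percent")
print("single vehicle, extra risk:", round(100 * (relative_risk(t_t, t_c) - 1), 1), "percent")
```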

Table 4: Equivalent fatalities per occupant in the Tobit model

                Two-vehicle crash                             Single-vehicle crash
First vehicle   Second vehicle
                Car                   Light truck
Car             Tcc = .0092           Tct = .0128             Tc = .0353
                (.0028)               (.0041)                 (.0060)
Light truck     Ttc = .0049           Ttt = .0070             Tt = .0389
                (.0020)               (.0030)                 (.0073)
Difference      Tcc - Ttc = .0042     Tct - Ttt = .0058       Tc - Tt = -.0036
                (.0035)               (.0053)                 (.0066)

3.3 Marginal effects

To find the effect of light trucks on the traffic safety for the three groups and to be able to compare this effect with the effect estimated by the Heckman model later on, I will calculate in this subsection the marginal effects for the Tobit model. In section 3.2 we found that the final expression for the expected value for y was

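$$E[y] = x'\beta \cdot \left(\Phi\left(\frac{1 - x'\beta}{\sigma}\right) - \Phi\left(\frac{-x'\beta}{\sigma}\right)\right) + \sigma \cdot \left(\phi\left(\frac{-x'\beta}{\sigma}\right) - \phi\left(\frac{1 - x'\beta}{\sigma}\right)\right) + 1 \cdot \Phi\left(\frac{x'\beta - 1}{\sigma}\right)$$

The marginal effects are calculated as

$$\begin{aligned}
\frac{\partial E[y]}{\partial x} &= \beta \cdot \left(\Phi\left(\frac{1 - x'\beta}{\sigma}\right) - \Phi\left(\frac{-x'\beta}{\sigma}\right)\right) \\
&\quad + x'\beta \cdot \left(\phi\left(\frac{1 - x'\beta}{\sigma}\right) \cdot \frac{-\beta}{\sigma} - \phi\left(\frac{-x'\beta}{\sigma}\right) \cdot \frac{-\beta}{\sigma}\right) \\
&\quad + \sigma \cdot \left(\phi\left(\frac{-x'\beta}{\sigma}\right) \cdot \frac{x'\beta}{\sigma} \cdot \frac{-\beta}{\sigma} - \phi\left(\frac{1 - x'\beta}{\sigma}\right) \cdot \frac{x'\beta - 1}{\sigma} \cdot \frac{-\beta}{\sigma}\right) \\
&\quad + \phi\left(\frac{x'\beta - 1}{\sigma}\right) \cdot \frac{\beta}{\sigma} \\
&= \beta \cdot \left(\Phi\left(\frac{1 - x'\beta}{\sigma}\right) - \Phi\left(\frac{-x'\beta}{\sigma}\right)\right)
\end{aligned}$$

where I have used $\frac{\partial (x'\beta)}{\partial x} = \beta$, $\frac{\partial \Phi(z)}{\partial z} = \phi(z)$ and $\frac{\partial \phi(z)}{\partial z} = -z\phi(z)$. These marginal effects are given in Table 5. The standard errors are computed using the delta method. We have

$$\operatorname{Var}(\text{marginal effect of } x) = D_x' \, V \, D_x$$

and the standard error is the square root of this variance. Here $V$ is the variance-covariance matrix from the estimation and $D_x$ is the column vector whose $j$th entry is the partial derivative of the marginal effect of $x$ with respect to the coefficient of the $j$th independent variable.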
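The delta method step can be illustrated with a small numerical sketch. It assumes a generic function of the parameter vector and approximates the derivative vector by central differences, which is a substitute for the analytical derivatives a statistical package would normally use. All names are hypothetical.

```python
from math import sqrt


def numerical_gradient(func, params, step=1e-6):
    """Central-difference approximation of the gradient of func at params."""
    gradient = []
    for j in range(len(params)):
        up = list(params)
        down = list(params)
        up[j] += step
        down[j] -= step
        gradient.append((func(up) - func(down)) / (2.0 * step))
    return gradient


def delta_method_se(func, params, cov):
    """Standard error of func(params): square root of D' V D.

    func   : maps the parameter vector to the quantity of interest,
             for example a marginal effect
    params : list of estimated parameters
    cov    : variance-covariance matrix of the estimates (list of lists)
    """
    d = numerical_gradient(func, params)
    k = len(d)
    variance = sum(d[i] * cov[i][j] * d[j] for i in range(k) for j in range(k))
    return sqrt(variance)
```

For a linear function such as 2θ1 + 3θ2 with variances 1 and 4 and zero covariance, the sketch returns the square root of 4·1 + 9·4 = 40, as the formula requires.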

When comparing the marginal effects in Table 5 with the coefficients in Table 3, one may notice that they have the same sign. This is because $\Phi\left(\frac{1 - x'\beta}{\sigma}\right) - \Phi\left(\frac{-x'\beta}{\sigma}\right)$ is always positive, so the sign of the marginal effects depends solely on the sign of $\beta$.

The marginal effects may be interpreted as follows: they indicate how a one unit change in an independent variable $x_k$ affects the expected value of the observed $y$, given that the other variables stay the same. For dichotomous variables that can only take the values 0 or 1, this is the effect on the expected value of $y$ if the variable changes from 0 to 1. For example, using the Tobit model, I calculated that in a two-vehicle crash where the first vehicle was a car, the second vehicle being a light truck instead of a car increases the expected equivalent fatality rate by 0.0030.
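As a check on this result, the analytical marginal effect can be compared with a finite-difference derivative of E[y]. The sketch below repeats the closed form of E[y] from section 3.2 so that it is self-contained; the function names and the example values of the index, σ and β are illustrative assumptions, not estimates taken from the sample.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: N.pdf is phi, N.cdf is Phi


def expected_y(xb, sigma):
    """E[y] in the Tobit model with limits 0 and 1 (closed form of section 3.2)."""
    upper = (1.0 - xb) / sigma
    lower = -xb / sigma
    return (xb * (N.cdf(upper) - N.cdf(lower))
            + sigma * (N.pdf(lower) - N.pdf(upper))
            + N.cdf((xb - 1.0) / sigma))


def marginal_effect(beta_k, xb, sigma):
    """Analytical marginal effect: beta_k times Pr(0 < y* < 1)."""
    return beta_k * (N.cdf((1.0 - xb) / sigma) - N.cdf(-xb / sigma))


def numerical_marginal_effect(beta_k, xb, sigma, step=1e-5):
    """Central difference of E[y] when x_k changes, moving the index by beta_k * step."""
    shift = beta_k * step
    return (expected_y(xb + shift, sigma) - expected_y(xb - shift, sigma)) / (2.0 * step)


# Illustrative values only: an index of -0.43, sigma of 0.29 and beta_k of 0.0386
print(marginal_effect(0.0386, -0.43, 0.29))
print(numerical_marginal_effect(0.0386, -0.43, 0.29))
```

Note that for a dummy variable one could also compute the discrete change in E[y] between 0 and 1; the derivative formula is used here because it is the one derived in the text.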


Table 5: Marginal effects of Tobit models

                      v1 = passenger car  v1 = light truck    single vehicle
                       dy/dx     SE        dy/dx     SE        dy/dx     SE
                       (1)       (2)       (3)       (4)       (5)       (6)
v2 = light truck       .0030*    .0003     .0018*    .0002
Single = light truck                                           .0039*    .0009
Small city            -.0058*    .0005    -.0034*    .0003    -.0112*    .0013
Medium city           -.0044*    .0006    -.0036*    .0004    -.0122*    .0018
Large city            -.0039*    .0003    -.0025*    .0003    -.0059*    .0010
Seat belt             -.0035*    .0004    -.0032*    .0003    -.0239*    .0010
Rain                  -.0013*    .0005    -.0009*    .0004     .0002     .0013
Snow                  -.0020     .0012    -.0024*    .0009    -.0266*    .0025
Dark                   .0030*    .0004     .0015*    .0003     .0019*    .0009
Weekday               -.0013*    .0004    -.0010*    .0003    -.0030*    .0009
Interstate highway    -.0014*    .0006     .0018*    .0004     .0164*    .0014
Divided highway        .0050*    .0003     .0023*    .0003     .0198*    .0011
Alcohol (v1)           .0049*    .0008     .0041*    .0006     .0194*    .0012
Drugs (v1)             .0042*    .0006     .0003     .0004     .0135*    .0012
Age <21 (v1)          -.0007     .0005    -.0011     .0006     .0050*    .0017
Age >60 (v1)           .0046*    .0004     .0028*    .0004     .0068*    .0016
Male driver (v1)      -.0021*    .0003    -.0018*    .0002     .0012*    .0010
Young male (v1)       -.0025*    .0009     .0003     .0008    -.0031     .0022
Occupants (v1)         .0023*    .0001     .0011*    .0001     .0057*    .0005
Speeding (v1)          .0183*    .0015     .0076*    .0013     .0400*    .0018
Alcohol (v2)           .0072*    .0007     .0044*    .0005
Drugs (v2)            -.0018*    .0006     .0010*    .0004
Age <21 (v2)           .0001     .0006    -.0001     .0004
Age >60 (v2)           .0012*    .0005     .0010*    .0003
Male driver (v2)       .0021*    .0003     .0007*    .0002
Young male (v2)        .0005     .0008    -.0003     .0006
Occupants (v2)         .0003     .0002     .0000     .0001
Speeding (v2)          .0113*    .0016     .0050*    .0010

Note: Year dummies (8) are excluded from this table.

* = Significant at a five percent level.


3.4 Robustness checks

In the estimation of the effect of light trucks on traffic safety, I convert 20 incapacitating injuries into one fatality. To check whether the results of the Tobit model depend on this conversion scale, I have estimated the same model with different conversion scales, namely converting 10 or 30 incapacitating injuries into one fatality. The equivalent fatalities per occupant for a conversion scale of 20, 10 or 30 are shown in Table 6.

When I compare the results for the different conversion scales, two results stand out. First, the crash severity is always highest for a conversion scale of 10 and always lowest for a conversion scale of 30. This is to be expected, because converting 10 incapacitating injuries into one fatality creates a higher value of $y^*$ in equation (1) and therefore higher values of $y$, while a conversion scale of 30 does exactly the opposite.
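The role of the conversion scale can be made concrete with a small sketch of the dependent variable. The exact construction of the equivalent fatality rate is not spelled out in the text, so the formula below (fatalities plus incapacitating injuries divided by the scale, per occupant) is an assumption, and the function name is hypothetical.

```python
def equivalent_fatality_rate(fatalities, incapacitating, occupants, scale=20):
    """Equivalent fatalities per occupant of one vehicle.

    fatalities     : number of occupants killed
    incapacitating : number of occupants with incapacitating injuries
    occupants      : total number of occupants in the vehicle
    scale          : number of incapacitating injuries counted as one fatality
    """
    equivalent_fatalities = fatalities + incapacitating / scale
    return equivalent_fatalities / occupants


# One incapacitating injury among two occupants under the three scales
for scale in (10, 20, 30):
    print(scale, equivalent_fatality_rate(0, 1, 2, scale=scale))
```

Under this assumed formula a smaller scale always gives a rate that is at least as high, which matches the ordering of the three columns in Table 6.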

The second result is the most important, namely that for all three conversion scales the first two differences are positive and the last one is negative. This means that, no matter what the conversion scale is, occupants of light trucks face a smaller risk than those of passenger cars in a two-vehicle crash and a higher risk in a single-vehicle crash.

Table 6: Robustness checks for expected equivalent fatality rate

Conversion scale    20        10        30
Tcc                 .0092     .0112     .0085
Ttc                 .0050     .0065     .0045
Tct                 .0128     .0156     .0120
Ttt                 .0070     .0091     .0064
Tc                  .0353     .0394     .0340
Tt                  .0389     .0433     .0375
Tcc - Ttc           .0041     .0048     .0040
Tct - Ttt           .0058     .0065     .0055
Tc - Tt            -.0036    -.0040    -.0035

In section 3.3, I found the marginal effect of the light truck variable (and of the other variables) on the expected equivalent fatality rate for a conversion scale of 20. In Table 7 these marginal effects are shown again, with the standard errors obtained using the delta method in parentheses, but now together with the marginal effects calculated for a conversion scale of 10 or 30.

In all three groups the marginal effect in the Tobit model is highest when using a conversion scale of 10 and lowest when using a conversion scale of 30. This again is logical, because these scales have the highest and lowest crash severity respectively. Further, all the marginal effects are positive and highly significant. This suggests that, in two-vehicle crashes, the second vehicle being a light truck increases the equivalent fatality rate for the first vehicle and that, in single-vehicle crashes, the vehicle being a light truck instead of a passenger car increases the fatality rate, no matter what the conversion scale is.

Table 7: Robustness checks for marginal effects light truck

            v1 = passenger car  v1 = light truck    single vehicle
Scale 20    .0030               .0018               .0039
            (.0003)             (.0002)             (.0009)
Scale 10    .0035               .0024               .0043
            (.0004)             (.0003)             (.0010)
Scale 30    .0028               .0017               .0038
            (.0003)             (.0002)             (.0009)

4 Estimation based on the Heckman-selectivity model

4.1 The model

In the previous section I replicated the Tobit estimations of Li (2012) and calculated the marginal effects of this model. I did this with a lower limit of zero and an upper limit of one. A possible disadvantage of the Tobit model, however, is that it assumes that the process is the same for all observations. This means that the process that determines whether or not there are injured or dead people (so whether the dependent variable $y$ is zero or greater than zero) is assumed to be the same as the process that determines the size of these injuries given that there is one (so given that the dependent variable $y$ is greater than zero).

In this section I will do the same estimations as done in the Tobit section, but now I assume that there might be a different process for the two parts. The model that I will use for this has no generally accepted name. It is sometimes called the Type 2 Tobit model, but mostly it is called the Heckman model, named after Heckman (1979), who used it to illustrate estimation in a situation of sample selection.


In the Heckman model, we have a so-called participation equation

$$y_1 = \begin{cases} 1 & \text{if } y_1^* > 0 \\ 0 & \text{if } y_1^* \le 0 \end{cases}$$

and a resultant outcome equation

$$y_2 = \begin{cases} y_2^* & \text{if } y_1 = 1 \\ 0 & \text{if } y_1 = 0 \end{cases}$$

where I specify a linear model with additive errors for the latent variables, so

$$y_1^* = x_1'\beta_1 + \varepsilon_1$$

$$y_2^* = x_2'\beta_2 + \varepsilon_2$$

The model reduces to the Tobit model with a lower bound of zero when $y_1^* = y_2^*$. In our case the participation equation is the process that determines whether or not there are injuries or deaths. Given the result of the participation equation, the outcome, which is the severity of the crash, is then either zero or determined by a second process.

Further, to be able to estimate the model by Maximum Likelihood, an assumption has to be made about the errors. I assume that the errors are jointly normally distributed and homoskedastic, with

$$\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right)$$

where $\sigma_{12} = \rho \sigma_2$ with $\rho$ the correlation between $\varepsilon_1$ and $\varepsilon_2$, and where the normalization $\sigma_1^2 = 1$ is used since only the sign of $y_1^*$ is observed.
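Under this bivariate normal assumption, a standard result (not derived in the text above) is that the conditional mean of the outcome given participation equals x2'β2 plus ρσ2 times the inverse Mills ratio evaluated at x1'β1. The sketch below illustrates this selection correction; the function names are hypothetical and the two linear indices are assumed to be given.

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: N.pdf is phi, N.cdf is Phi


def inverse_mills(z):
    """Inverse Mills ratio phi(z) / Phi(z)."""
    return N.pdf(z) / N.cdf(z)


def heckman_conditional_mean(x1b1, x2b2, rho, sigma2):
    """E[y2* | y1* > 0] = x2'beta2 + rho * sigma2 * lambda(x1'beta1).

    x1b1   : linear index of the participation equation (assumed given)
    x2b2   : linear index of the outcome equation (assumed given)
    rho    : correlation between the two error terms
    sigma2 : standard deviation of the outcome error
    """
    return x2b2 + rho * sigma2 * inverse_mills(x1b1)
```

With ρ = 0 the correction term vanishes and the two equations can be treated separately; a negative ρ, as estimated later in Table 8, pulls the conditional mean below x2'β2.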

4.2 Heckman estimation

Before I can run the Heckman estimation, a problem has to be addressed. When the same covariates appear in the selection equation and the outcome equation, the model is formally identified by the normality assumption alone, but identification may then be weak (Cameron et al., 2004). Generally, an exclusion restriction is required to generate credible estimates: there must be at least one variable that appears with a non-zero coefficient in the participation equation but does not appear in the outcome equation. So for the model to work well, $x_1 \ne x_2$ has to hold.

I want to be able to compare the estimates of the Heckman model with those of the Tobit model. Because of this, I include all the variables that I used in the Tobit model in the participation equation, so that $x_1$ contains all the variables. In $x_2$, I only include variables that are likely to influence the severity of the injuries given that there is some injury, so given that $y_1^* > 0$. Based on a report from the World Health Organisation (2010), I include the following variables in $x_2$:

• v2 = light truck
• Single = light truck
• Seat belt
• Alcohol (v1)
• Drugs (v1)
• Occupants (v1)
• Speeding (v1)
• Alcohol (v2)
• Drugs (v2)
• Occupants (v2)
• Speeding (v2)

Because my main interest is the effect on crash severity of the second vehicle being a light truck in two-vehicle crashes, or of the vehicle being a light truck in single-vehicle crashes, I also include these variables in $x_2$.
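The exclusion restriction implied by this choice can be verified mechanically. The sketch below lists the variables of the two-vehicle regressions as named in the tables (year dummies left out) and checks that the outcome equation uses a strict subset of the participation equation; the list names are hypothetical.

```python
# Variables in the participation equation x1 (two-vehicle case, year dummies omitted)
participation_vars = [
    "v2 = light truck", "Small city", "Medium city", "Large city", "Seat belt",
    "Rain", "Snow", "Dark", "Weekday", "Interstate highway", "Divided highway",
    "Alcohol (v1)", "Drugs (v1)", "Age <21 (v1)", "Age >60 (v1)",
    "Male driver (v1)", "Young male (v1)", "Occupants (v1)", "Speeding (v1)",
    "Alcohol (v2)", "Drugs (v2)", "Age <21 (v2)", "Age >60 (v2)",
    "Male driver (v2)", "Young male (v2)", "Occupants (v2)", "Speeding (v2)",
]

# Variables in the outcome equation x2 (two-vehicle case)
outcome_vars = [
    "v2 = light truck", "Seat belt", "Alcohol (v1)", "Drugs (v1)",
    "Occupants (v1)", "Speeding (v1)", "Alcohol (v2)", "Drugs (v2)",
    "Occupants (v2)", "Speeding (v2)",
]

# Variables that enter only the participation equation
excluded = [v for v in participation_vars if v not in outcome_vars]

print(len(excluded), "variables appear only in the participation equation:")
print(excluded)
```

As long as `excluded` is not empty and at least one of these variables has a non-zero coefficient in the participation equation, the condition that x1 differs from x2 is met.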

The results of the Heckman estimation are given in Table 8. Again the first two columns (1) and (2) report the results for two-vehicle crashes where the first vehicle was a passenger car. In the regression of columns (3) and (4) the first vehicle is a light truck, and the last two columns (5) and (6) report the results for single-vehicle crashes.


Table 8: Regression results of Heckman models

                      v1 = passenger car  v1 = light truck    single vehicle
                       para      SE        para      SE        para      SE
                       (1)       (2)       (3)       (4)       (5)       (6)
Outcome Equation
v2 = light truck       .0250*    .0048     .0041     .0054
Single = light truck                                           .0167*    .0059
Seat belt             -.0370*    .0059    -.0420*    .0064    -.0900*    .0065
Alcohol (v1)          -.0226     .0117     .0444*    .0126    -.0394*    .0076
Drugs (v1)             .1311*    .0090     .0615*    .0099     .1349*    .0075
Occupants (v1)        -.0227*    .0028    -.0131*    .0025    -.0279*    .0029
Speeding (v1)          .0607*    .0185     .0199     .0251     .0092     .0108
Alcohol (v2)           .0242*    .0101     .0139     .0101
Drugs (v2)            -.0446*    .0090     .0047     .0093
Occupants (v2)         .0082*    .0027     .0031     .0028
Speeding (v2)          .0258     .0211    -.0051     .0203
Intercept              .1764*    .0175     .1376*    .0205     .2584*    .0169
Participation Equation
v2 = light truck       .1096*    .0138     .1500*    .0184
Single = light truck                                           .0571*    .0146
Small city            -.2711*    .0203    -.2780*    .0257    -.1958*    .0213
Medium city           -.2077*    .0249    -.2890*    .0334    -.2310*    .0293
Large city            -.1838*    .0157    -.2074*    .0200    -.1113*    .0164
Seat belt             -.1287*    .0172    -.2189*    .0221    -.3398*    .0165
Rain                  -.0683*    .0225    -.0665*    .0295     .0250     .0215
Snow                  -.1297*    .0537    -.2160*    .0699    -.4615*    .0403
Dark                   .1366*    .0157     .1247*    .0206    -.0077     .0151
Weekday               -.0651*    .0157    -.0938*    .0203    -.0606*    .0151
Interstate highway    -.1060*    .0290     .1192*    .0328     .3413*    .0223
Divided highway        .2372*    .0149     .1972*    .0191     .3215*    .0180
Alcohol (v1)           .2767*    .0384     .2794*    .0479     .4485*    .0204
Drugs (v1)             .0590*    .0258    -.0327     .0332     .0775*    .0201
Age <21 (v1)          -.0373     .0241    -.1066*    .0478     .1031*    .0283
Age >60 (v1)           .1740*    .0197     .2139*    .0282     .0666*    .0266
Male driver (v1)      -.1098*    .0150    -.1694*    .0185    -.0264     .0167
Young male (v1)       -.0978*    .0385     .0443     .0615    -.0201     .0360
Occupants (v1)         .1336*    .0079     .1050*    .0079     .1546*    .0076
Speeding (v1)          .7658*    .0698     .6069*    .1046     .7560*    .0311
Alcohol (v2)           .3041*    .0330     .3602*    .0386
Drugs (v2)            -.0458     .0252     .0667*    .0312
Age <21 (v2)           .0065     .0287     .0098     .0343
Age >60 (v2)           .0695*    .0218     .0783*    .0273
Male driver (v2)       .0978*    .0154     .0496*    .0198
Young male (v2)        .0136     .0363    -.0217     .0450
Occupants (v2)         .0040     .0078    -.0019     .0100
Speeding (v2)          .5316*    .0755     .4443*    .0838
Intercept             -1.4601*   .0354    -1.4213*   .0443    -.4147*    .0129
ρ                     -.1713*    .0405    -.0916     .0610    -.0889*    .0365
σ2                     .1923*    .0020     .1512*    .0020     .2685*    .0021
Observation            76586               59733               48813

Note: Year dummies (8) are included in all regressions. The omitted category for vehicle type is passenger car. The base group for city size is rural area. Alcohol is 1 if the driver is found under influence of alcohol. Young male is 1 if the driver is male and younger than 21. Speeding is 1 if the travel speed is 10 miles per hour above the speed limit. Occupants is the number of occupants in the vehicle.

* = Significant at a five percent level.

Most of the coefficients again have the expected sign and are significant at a five percent level, just as in the Tobit model, while others have the opposite sign from what I would expect. An example is again the coefficient in the participation equation for the first driver being male, which is again negative, although not significant in the single-vehicle case.

Almost all the variables that appear in both the participation equation and the outcome equation have the same sign in both equations. An exception is the driver being drunk in a single-vehicle crash. This variable has a significant positive effect in the participation equation and a significant negative effect in the outcome equation. This suggests that alcohol in single-vehicle crashes has a positive effect on the process that determines whether or not there is an injury, but a negative effect on the severity of the injuries, given that there is one.

The coefficients for the second vehicle being a light truck in two-vehicle crashes, or the vehicle being a light truck in the single-vehicle case, are positive and significant at a five percent level in both the participation and the outcome equation. The only exception is the coefficient of the outcome equation in the two-vehicle case where the first vehicle was a light truck, which is not significant at a five percent level. These results suggest that light trucks have a positive effect on both the process that determines whether or not there is an injury and the severity of the injuries.

Furthermore, all the estimates of $\rho$ are negative, although the second one is not significant at a five percent level. Looking at a likelihood-ratio test of the participation and outcome equations being independent, which means a null hypothesis of $\rho = 0$, I see that the null hypothesis is rejected at a five percent level in the two-vehicle case where the first vehicle is a passenger car ($\chi^2(1) = 13.90$ with a p-value of 0.0002) and in the single-vehicle case ($\chi^2(1) = 5.26$ with a p-value of 0.0218). In the two-vehicle case where the first vehicle is a light truck, the null hypothesis is not rejected at a five percent level ($\chi^2(1) = 1.90$ with a p-value of 0.1683). The Heckman results therefore suggest a negative correlation between the two processes in at least two of the three cases. This is strange, as we would expect that the higher the expected value for $y_1$ is, the higher the value of $y_2$ will be.
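The p-values quoted for the likelihood-ratio test can be checked from the reported statistics alone. The sketch below is only an illustration: it relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal, and the helper name `chi2_sf_1df` is introduced here, not taken from the paper. Small differences from the reported p-values are to be expected because the statistics in the text are rounded.

```python
import math

def chi2_sf_1df(stat: float) -> float:
    """Upper-tail probability of a chi-square(1) variable.

    If Z is standard normal then Z**2 is chi-square(1), so
    P(Z**2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))

# LR statistics for H0: rho = 0, as reported in the text.
lr_stats = {
    "v1 = passenger car": 13.90,
    "v1 = light truck": 1.90,
    "single vehicle": 5.26,
}

p_values = {name: chi2_sf_1df(stat) for name, stat in lr_stats.items()}
for name, p in p_values.items():
    verdict = "reject" if p < 0.05 else "do not reject"
    print(f"{name}: p = {p:.4f} -> {verdict} rho = 0 at the 5% level")
```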

4.3 Equivalent fatalities

Just as I did in the Tobit part, and as Li (2012) did, I will now evaluate the relative safety of the two types of vehicles in single-vehicle accidents as well as the effect of the two types on crash outcomes in two-vehicle crashes. I will again do this by calculating the expected equivalent fatality rate for every $y_{2i}$ based on the Heckman regressions. In this case, where I assumed that $y_2$ equals zero when $y_1^* \le 0$, the expected value for $y_2$ is equal to:

$$
\begin{aligned}
E[y_2 \mid x] &= E_{y_1^*}\big[E[y_2 \mid x, y_1^*]\big] \\
&= \Pr[y_1^* \le 0 \mid x] \cdot 0 + \Pr[y_1^* > 0 \mid x] \cdot E[y_2^* \mid x, y_1^* > 0] \qquad (2)
\end{aligned}
$$

where we have

$$
\Pr[y_1^* > 0 \mid x] = \Pr[x_1'\beta_1 + \varepsilon_1 > 0 \mid x] = \Pr[\varepsilon_1 > -x_1'\beta_1 \mid x] = 1 - \Phi(-x_1'\beta_1) = \Phi(x_1'\beta_1)
$$

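Here $\Phi$ is again the standard normal cdf. Note that there is no standard deviation involved in this expression, because I set it to one for $\varepsilon_1$ earlier. The only thing that remains is to find an expression for $E[y_2^* \mid x, y_1^* > 0]$. We have

$$
\begin{aligned}
E[y_2^* \mid x, y_1^* > 0] &= E[x_2'\beta_2 + \varepsilon_2 \mid x_1'\beta_1 + \varepsilon_1 > 0] \\
&= x_2'\beta_2 + E[\varepsilon_2 \mid x_1'\beta_1 + \varepsilon_1 > 0]
\end{aligned}
$$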

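To obtain $E[\varepsilon_2 \mid x_1'\beta_1 + \varepsilon_1 > 0]$ when $\varepsilon_1$ and $\varepsilon_2$ are correlated, Heckman (1979) noted that $\varepsilon_2$ can be written as

$$
\varepsilon_2 = \sigma_{12}\,\varepsilon_1 + \nu
$$

where $\sigma_{12} = \rho\,\sigma_2$ and $\nu$ is a random variable independent of $\varepsilon_1$. We then have

$$
\begin{aligned}
E[y_2^* \mid x, y_1^* > 0] &= x_2'\beta_2 + E[\sigma_{12}\varepsilon_1 + \nu \mid x_1'\beta_1 + \varepsilon_1 > 0] \\
&= x_2'\beta_2 + \sigma_{12}\, E[\varepsilon_1 \mid \varepsilon_1 > -x_1'\beta_1] + 0 \\
&= x_2'\beta_2 + \sigma_{12}\, \frac{\phi(-x_1'\beta_1)}{1 - \Phi(-x_1'\beta_1)} \\
&= x_2'\beta_2 + \sigma_{12}\, \frac{\phi(x_1'\beta_1)}{\Phi(x_1'\beta_1)}
\end{aligned}
$$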

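So that we have for the expected value of $y_2$ in equation (2)

$$
\begin{aligned}
E[y_2 \mid x] &= \Pr[y_1^* > 0 \mid x] \cdot E[y_2^* \mid x, y_1^* > 0] \\
&= \Phi(x_1'\beta_1) \left( x_2'\beta_2 + \sigma_{12}\, \frac{\phi(x_1'\beta_1)}{\Phi(x_1'\beta_1)} \right) \\
&= \Phi(x_1'\beta_1)\, x_2'\beta_2 + \sigma_{12}\, \phi(x_1'\beta_1) \qquad (3)
\end{aligned}
$$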
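As an illustration of equation (3), the censored mean can be evaluated for a single observation once the two linear indices are known. The sketch below is a minimal version under stated assumptions: the function and argument names are hypothetical, and the example inputs are made-up values of a plausible order of magnitude, not estimates from this paper.

```python
import math

def norm_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def heckman_censored_mean(index1: float, index2: float,
                          rho: float, sigma2: float) -> float:
    """E[y2 | x] = Phi(x1'b1) * x2'b2 + sigma12 * phi(x1'b1), as in equation (3).

    index1 is the participation index x1'b1, index2 the outcome index x2'b2,
    and sigma12 = rho * sigma2.
    """
    sigma12 = rho * sigma2
    return norm_cdf(index1) * index2 + sigma12 * norm_pdf(index1)

# Hypothetical inputs, for illustration only.
example = heckman_censored_mean(index1=-1.0, index2=0.15, rho=-0.17, sigma2=0.19)
print(example)
```

With a negative $\rho$ the second term lowers the expected value, which is the mechanism behind the remark on the sign of $\rho$ above.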

These sample means are reported in Table 9 below. The sample means and the standard errors are again obtained by doing 1000 bootstrap replications of size 1000, just as I did in the Tobit model.

When comparing the numbers across rows, I see for a two-vehicle crash again a positive difference. If the second vehicle was a car, light truck occupants face only 61.4 percent of the risk of those of passenger cars. In the Tobit case this was only 53.2 percent. If the second vehicle was a light truck they face 55.1 percent of the risk against the 54.7 percent in the Tobit case, which is quite similar. In single-vehicle crashes there is again a negative difference, meaning that in this case occupants of light trucks face a higher risk, namely 110.2 percent, than those of passenger cars, which is the same as calculated by the Tobit model.
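The relative risks quoted above are ratios of the Table 9 entries. The short sketch below recomputes them from the rounded table values; the dictionary keys are labels chosen here for illustration. Because the table entries are rounded to four decimals, the single-vehicle ratio comes out at roughly 110.3 percent rather than the 110.2 percent reported in the text, which was presumably computed from unrounded estimates.

```python
# Heckman equivalent fatalities per occupant, taken from Table 9.
heckman = {
    "cc": 0.0070,  # first vehicle car, second vehicle car
    "ct": 0.0107,  # first vehicle car, second vehicle light truck
    "tc": 0.0043,  # first vehicle light truck, second vehicle car
    "tt": 0.0059,  # first vehicle light truck, second vehicle light truck
    "c": 0.0272,   # single-vehicle crash, car
    "t": 0.0300,   # single-vehicle crash, light truck
}

# Risk of light-truck occupants as a percentage of the risk of car occupants.
relative_risk_pct = {
    "second vehicle is a car": 100.0 * heckman["tc"] / heckman["cc"],
    "second vehicle is a light truck": 100.0 * heckman["tt"] / heckman["ct"],
    "single vehicle": 100.0 * heckman["t"] / heckman["c"],
}

for case, pct in relative_risk_pct.items():
    print(f"{case}: {pct:.1f} percent")
```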


Table 9: Equivalent fatalities per occupant in the Heckman model

| First vehicle | Two-vehicle crash, second vehicle: car | Two-vehicle crash, second vehicle: light truck | Single-vehicle crash |
|---|---|---|---|
| Car | Hcc = .0070 (.0026) | Hct = .0107 (.0037) | Hc = .0272 (.0064) |
| Light truck | Htc = .0043 (.0019) | Htt = .0059 (.0030) | Ht = .0300 (.0073) |
| Difference | Hcc - Htc = .0027 (.0032) | Hct - Htt = .0048 (.0049) | Hc - Ht = -.0028 (.0078) |

Note: Standard errors are in parentheses.

4.4 Marginal effects

To find the marginal effects in the Heckman model, it is convenient to define $x$ as the vector formed by the union of $x_1$ and $x_2$, and to rewrite $x_1'\beta_1$ as $x'\gamma_1$ and $x_2'\beta_2$ as $x'\gamma_2$. The censored mean derived in the last section then becomes

$$
E[y_2 \mid x] = \Phi(x'\gamma_1)\, x'\gamma_2 + \sigma_{12}\, \phi(x'\gamma_1)
$$

Because usually $x_1 \neq x_2$, $\gamma_1$ and/or $\gamma_2$ will have some zero entries. In my case, all variables are used in the participation equation but not all variables are used in the outcome equation, so only $\gamma_2$ will have some zero entries.

Now that I have defined the censored mean in terms of $x$, I can find the marginal effects in the Heckman model by taking the derivative of the previous equation with respect to $x$. This becomes

$$
\frac{\partial E[y_2 \mid x]}{\partial x} = \gamma_1\, \phi(x'\gamma_1)\, x'\gamma_2 + \Phi(x'\gamma_1)\, \gamma_2 - \sigma_{12}\, x'\gamma_1\, \phi(x'\gamma_1)\, \gamma_1
$$

where I have again used $\partial(x'\gamma)/\partial x = \gamma$, $\partial\Phi(z)/\partial z = \phi(z)$ and $\partial\phi(z)/\partial z = -z\phi(z)$. These marginal effects are given in Table 10. The standard errors are again computed using the Delta method.
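The analytic derivative can be sanity-checked against a numerical derivative of the censored mean. The sketch below does this for one observation; all parameter values and names are hypothetical and serve only to show that the three terms of the formula add up to the gradient of $\Phi(x'\gamma_1)\,x'\gamma_2 + \sigma_{12}\,\phi(x'\gamma_1)$.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def censored_mean(x, gamma1, gamma2, sigma12):
    """E[y2 | x] = Phi(x'g1) * x'g2 + sigma12 * phi(x'g1)."""
    a, b = dot(x, gamma1), dot(x, gamma2)
    return norm_cdf(a) * b + sigma12 * norm_pdf(a)

def marginal_effects(x, gamma1, gamma2, sigma12):
    """Analytic gradient of the censored mean with respect to x."""
    a, b = dot(x, gamma1), dot(x, gamma2)
    return [
        g1 * norm_pdf(a) * b + norm_cdf(a) * g2 - sigma12 * a * norm_pdf(a) * g1
        for g1, g2 in zip(gamma1, gamma2)
    ]

# Hypothetical values: constant, one variable in both equations,
# and one variable that only enters the participation equation.
x = [1.0, 1.0, 0.0]
gamma1 = [-1.4, 0.11, -0.13]
gamma2 = [0.18, 0.025, 0.0]   # zero entry: not in the outcome equation
sigma12 = -0.033

analytic = marginal_effects(x, gamma1, gamma2, sigma12)

# Central finite differences as an independent check.
h = 1e-6
numeric = []
for k in range(len(x)):
    x_up, x_down = list(x), list(x)
    x_up[k] += h
    x_down[k] -= h
    diff = censored_mean(x_up, gamma1, gamma2, sigma12) - censored_mean(x_down, gamma1, gamma2, sigma12)
    numeric.append(diff / (2.0 * h))

print(analytic)
print(numeric)
```

Note that a variable with a zero entry in $\gamma_2$ still has a nonzero marginal effect, because it shifts the participation probability.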

The marginal effects obtained by the Heckman (1979) model are comparable with those found by the Tobit model. In fact, all of them have the same sign except two, neither of which is significant: the effect of the second driver being under 21 when the first vehicle was a light truck, and the effect of the driver being male in the single-vehicle case. The marginal effects again suggest that the second vehicle being a light truck in two-vehicle crashes, or the vehicle being a light truck in single-vehicle crashes, increases the equivalent fatalities.

Table 10: Marginal effects of Heckman models

| Variable | v1 = passenger car: dy/dx (1) | SE (2) | v1 = light truck: dy/dx (3) | SE (4) | single vehicle: dy/dx (5) | SE (6) |
|---|---|---|---|---|---|---|
| v2 = light truck | .0038* | .0004 | .0015* | .0003 | | |
| Single = light truck | | | | | .0050* | .0011 |
| Small city | -.0041* | .0004 | -.0023* | .0003 | -.0070* | .0008 |
| Medium city | -.0032* | .0004 | -.0024* | .0003 | -.0082* | .0010 |
| Large city | -.0028* | .0003 | -.0017* | .0002 | -.0040* | .0006 |
| Seat belt | -.0051* | .0005 | -.0043* | .0004 | -.0279* | .0013 |
| Rain | -.0010* | .0003 | -.0006* | .0003 | .0009* | .0008 |
| Snow | -.0020* | .0008 | -.0018* | .0006 | -.0164* | .0015 |
| Dark | .0021* | .0003 | .0010* | .0002 | -.0003 | .0005 |
| Weekday | -.0010* | .0002 | -.0008* | .0002 | -.0022* | .0005 |
| Interstate highway | -.0016* | .0004 | .0010* | .0003 | .0122* | .0009 |
| Divided highway | .0036* | .0003 | .0017* | .0002 | .0114* | .0008 |
| Alcohol (v1) | .0023* | .0010 | .0049* | .0008 | .0090* | .0014 |
| Drugs (v1) | .0120* | .0008 | .0033* | .0006 | .0265* | .0015 |
| Age <21 (v1) | -.0006* | .0004 | -.0009* | .0004 | .0037* | .0010 |
| Age >60 (v1) | .0026* | .0003 | .0018* | .0003 | .0024* | .0009 |
| Male driver (v1) | -.0017* | .0002 | -.0014* | .0002 | -.0009 | .0006 |
| Young male (v1) | -.0015* | .0006 | .0004 | .0005 | -.0007 | .0012 |
| Occupants (v1) | .0001 | .0002 | .0001 | .0001 | .0006 | .0005 |
| Speeding (v1) | .0168* | .0018 | .0063* | .0017 | .0285* | .0020 |
| Alcohol (v2) | .0067* | .0009 | .0038* | .0006 | | |
| Drugs (v2) | -.0045* | .0008 | .0008 | .0006 | | |
| Age <21 (v2) | .0001 | .0004 | .0001 | .0003 | | |
| Age >60 (v2) | .0011* | .0003 | .0007* | .0002 | | |
| Male driver (v2) | .0015* | .0002 | .0004 | .0002 | | |
| Young male (v2) | .0002 | .0005 | -.0002 | .0004 | | |
| Occupants (v2) | .0008* | .0002 | .0002 | .0002 | | |
| Speeding (v2) | .0103* | .0020 | .0034* | .0013 | | |

Note: Year dummies (8) are excluded from this table.

* = Significant at a five percent level.


4.5 Robustness checks

Once again, I have used a converting scale of 20 incapacitating injuries to one fatality when estimating the effect of light trucks on traffic safety. To check whether the results from the Heckman model, in contrast to those from the Tobit model, depend on this converting scale, I have repeated the estimations with different converting scales, namely again 10 or 30 incapacitating injuries to one fatality. The equivalent fatalities per occupant for a converting scale of 20, 10 or 30 are shown in Table 11.
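To make the role of the converting scale concrete, the sketch below converts a crash outcome into equivalent fatalities per occupant. The exact definition of the dependent variable is given earlier in the paper and is not repeated in this section, so the formula used here (fatalities plus incapacitating injuries divided by the scale, per occupant) is an assumption, as are the function name and the example crash.

```python
def equivalent_fatalities_per_occupant(fatalities: int,
                                       incapacitating_injuries: int,
                                       occupants: int,
                                       scale: int = 20) -> float:
    """Assumed definition: `scale` incapacitating injuries count as one fatality."""
    return (fatalities + incapacitating_injuries / scale) / occupants

# Hypothetical crash: no deaths, one incapacitating injury, two occupants.
rates = {
    scale: equivalent_fatalities_per_occupant(0, 1, 2, scale)
    for scale in (10, 20, 30)
}
print(rates)
```

A smaller scale gives each injury more weight, which is why a converting scale of 10 should always produce the highest crash severity and a scale of 30 the lowest.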

If we look at these results, we again see that, as expected, the crash severity is always the highest for a converting scale of 10 and always the lowest for a converting scale of 30. We also see, as in the Tobit case, that for all three converting scales the first two differences are positive and the last one is negative. This again means that, no matter what the converting scale is, occupants of light trucks face a smaller risk than those of passenger cars in a two-vehicle crash and a higher risk in a single-vehicle crash.

Table 11: Robustness checks for crash severity

| Converting scale | 20 | 10 | 30 |
|---|---|---|---|
| Hcc | .0070 | .0099 | .0061 |
| Htc | .0043 | .0063 | .0037 |
| Hct | .0107 | .0145 | .0095 |
| Htt | .0059 | .0085 | .0050 |
| Hc | .0272 | .0333 | .0253 |
| Ht | .0300 | .0364 | .0280 |
| Hcc - Htc | .0027 | .0036 | .0024 |
| Hct - Htt | .0048 | .0060 | .0045 |
| Hc - Ht | -.0028 | -.0031 | -.0028 |

Furthermore, I have again estimated the expected marginal effects of the light truck variable for a converting scale of 10 and 30. These results, together with the effect already estimated with a converting scale of 20, are given in Table 12. We see that, as expected, a scale of 10 gives the highest marginal effects and a scale of 30 the lowest. Also, all the marginal effects are positive and significant at a five percent level. This again suggests that, in two-vehicle crashes, the second vehicle being a light truck increases the equivalent fatality rate for the first vehicle and that, in a single-vehicle crash, the vehicle being a light truck instead of a passenger car increases the fatality rate, no matter what the converting scale is.


Table 12: Robustness checks for marginal effects light truck

| Converting scale | v1 = passenger car | v1 = light truck | single vehicle |
|---|---|---|---|
| Scale 20 | 0.0038 (.0004) | 0.0015 (.0003) | 0.0050 (.0011) |
| Scale 10 | 0.0044 (.0005) | 0.0022 (.0004) | 0.0055 (.0011) |
| Scale 30 | 0.0036 (.0004) | 0.0013 (.0003) | 0.0048 (.0010) |

Note: Standard errors are in parentheses.

5

Comparison Tobit and Heckman model

I have now replicated the Tobit model of Li (2012) and carried out a Heckman estimation, which uses fewer assumptions. In this section I will compare both models.

5.1 Expected equivalent fatalities per occupant

For both models I have calculated the expected equivalent fatalities per occupant. The means for all three converting scales from Tables 6 and 11 are combined and shown in Table 13, with the standard errors, found by doing 1000 bootstraps of size 1000, in parentheses. The actual means of the data are also given in this table. In this table, all the expected equivalent fatalities marked with a "T" are from the Tobit model and those marked with an "H" are from the Heckman model.

Comparing these results, I see that the expected equivalent fatalities per occupant are always higher when I use the Tobit model than when I use the Heckman model, and that the two models have almost the same standard errors in all cases. For example, in two-vehicle crashes where both vehicles are cars, the Tobit model predicts an equivalent fatality per occupant of .0092 with a standard error of .0028 and the Heckman model predicts an equivalent fatality per occupant of .0070 with a standard error of .0025, when using a converting scale of 20.

When comparing with the actual means of the data, I clearly see that the expected equivalent fatalities per occupant for the Heckman model are closer to the actual means than those for the Tobit model. For example, the actual equivalent fatality per occupant is .0057 in the previous case, whereas the Heckman model predicted .0070 and the Tobit model .0092. Because both models also have approximately the same standard errors in all cases, this suggests that the Heckman model is better at predicting the equivalent fatalities per occupant.

Table 13: Predicted equivalent fatalities per occupant for Tobit and Heckman model

| Converting scale | 20 | 10 | 30 |
|---|---|---|---|
| Tcc | .0092 (.0028) | .0112 (.0029) | .0085 (.0028) |
| Hcc | .0070 (.0025) | .0099 (.0026) | .0061 (.0026) |
| Data | .0057 | .0087 | .0047 |
| Tct | .0128 (.0041) | .0156 (.0042) | .0120 (.0041) |
| Hct | .0107 (.0037) | .0145 (.0039) | .0095 (.0037) |
| Data | .0103 | .0142 | .0091 |
| Ttc | .0050 (.0020) | .0065 (.0020) | .0045 (.0020) |
| Htc | .0043 (.0018) | .0063 (.0019) | .0037 (.0018) |
| Data | .0037 | .0058 | .0030 |
| Ttt | .0070 (.0030) | .0090 (.0032) | .0064 (.0030) |
| Htt | .0059 (.0029) | .0085 (.0031) | .0050 (.0029) |
| Data | .0053 | .0080 | .0044 |
| Tc | .0353 (.0060) | .0394 (.0061) | .0340 (.0060) |
| Hc | .0272 (.0064) | .0333 (.0062) | .0252 (.0065) |
| Data | .0243 | .0308 | .0222 |
| Tt | .0389 (.0072) | .0433 (.0074) | .0375 (.0072) |
| Ht | .0300 (.0073) | .0364 (.0071) | .0280 (.0074) |
| Data | .0281 | .0349 | .0258 |


5.2 Wald test

A main difference between the Tobit and Heckman models is that the Tobit model assumes that the process is the same for all the data, whereas the Heckman model assumes that there are two different processes. The first process determines whether or not there are injured or dead people (that is, whether the dependent variable y is zero or greater than zero) and the second process determines the size of these injuries given that there are any (so given that the dependent variable y is greater than zero).

To check which of these assumptions is more likely, one can perform a Wald test on the parameters of the Heckman model that appear in both the participation and the outcome equation. The null hypothesis is that these parameters are equal to each other. When the Wald test rejects this null hypothesis at a five percent level, this is evidence that the two processes differ, so that the Heckman model is preferred over the Tobit model.

The Wald test works as follows. Let $\hat\theta_n$ be our sample estimator of the $p$ parameters we want to test. In the single-vehicle case, the Heckman model has 7 variables, including the constant, that appear in both the participation and the outcome equation, so that $p = 14$. In the two-vehicle cases there are 11 such variables, including the constant, so that here $p = 22$. This $\hat\theta_n$ is assumed to follow asymptotically a normal distribution with covariance matrix $V$, so that

$$
(\hat\theta_n - \theta) \xrightarrow{D} N(0, V).
$$

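The test of $q$ hypotheses on the $p$ parameters, where $q = 7$ in the single-vehicle case and $q = 11$ in the two-vehicle cases, is expressed with a $q \times p$ matrix $R$:

$$
H_0 : R\theta = r \qquad\qquad H_1 : R\theta \neq r
$$

where $r$ is a $q \times 1$ vector of zeros. The test statistic is

$$
(R\hat\theta_n - r)' \, [R \hat{V}_n R']^{-1} \, (R\hat\theta_n - r) \xrightarrow{D} \chi^2_q
$$

where $\hat{V}_n$ is the estimate of $V$.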
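The mechanics of the test statistic can be sketched on a toy problem. The example below assumes that the parameter vector stacks the participation coefficients first and the outcome coefficients second, so that $R = [I_q, -I_q]$ tests their equality; this ordering, the helper names and all numbers are hypothetical and are not the estimates behind Table 14. It uses plain Python lists so that no external library is needed.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def wald_statistic(theta_hat, V_hat, R, r):
    """(R theta - r)' [R V R']^{-1} (R theta - r)."""
    diff = [sum(c * t for c, t in zip(row, theta_hat)) - r_i
            for row, r_i in zip(R, r)]
    middle = mat_mul(mat_mul(R, V_hat), transpose(R))
    return sum(d * s for d, s in zip(diff, solve(middle, diff)))

# Hypothetical example with q = 2 shared variables, so p = 4.
theta_hat = [0.30, -0.10, 0.05, -0.02]   # participation (2), then outcome (2)
V_hat = [[0.0004, 0.0, 0.0, 0.0],
         [0.0, 0.0004, 0.0, 0.0],
         [0.0, 0.0, 0.0001, 0.0],
         [0.0, 0.0, 0.0, 0.0001]]
R = [[1.0, 0.0, -1.0, 0.0],
     [0.0, 1.0, 0.0, -1.0]]
r = [0.0, 0.0]

W = wald_statistic(theta_hat, V_hat, R, r)
print(W)
```

In this toy case the statistic would be compared with a $\chi^2_2$ distribution; in the paper the relevant degrees of freedom are 7 and 11.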


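Table 14: Wald test statistics

| Converting scale | v1 = passenger car | v1 = light truck | single vehicle |
|---|---|---|---|
| Scale 20 | $\chi^2_{11}$ = 2656.46 (0.000) | $\chi^2_{11}$ = 1703.35 (0.000) | $\chi^2_{7}$ = 2713.43 (0.000) |
| Scale 10 | $\chi^2_{11}$ = 2876.27 (0.000) | $\chi^2_{11}$ = 1847.83 (0.000) | $\chi^2_{7}$ = 2917.25 (0.000) |
| Scale 30 | $\chi^2_{11}$ = 2585.05 (0.000) | $\chi^2_{11}$ = 1655.70 (0.000) | $\chi^2_{7}$ = 2647.97 (0.000) |

Note: P-values are in parentheses.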

We see that for both the single-vehicle case and the two-vehicle cases, and for all three converting scales, the test statistics are very high and the p-values are very low. In fact, all the p-values are lower than 0.01, so that in all cases the null hypothesis is rejected at the one percent level in favor of the alternative hypothesis. This means that the parameters of the variables that appear in both the participation and the outcome equation differ significantly, which suggests that there are indeed two different processes. The Wald test therefore suggests that the Heckman model is preferred over the Tobit model.

5.3 Hold out sample

Another way to find out which model performs better is to look at how well the models forecast future expected equivalent fatalities per occupant. One can do this with out-of-sample testing, for example with a hold out sample. A hold out sample is often used for forecast models and is a popular way to test the likely accuracy of a forecasting method.

Just as when calculating the expected equivalent fatalities per occupant, I will use 1000 bootstrap replications of size 1000. To conduct the hold out test, one takes out the most recent periods as if they did not exist (the hold out sample). In my case I will only remove the observations in the bootstrap sample from the last year, which is 2006, to make sure that there are enough years left. After this, I estimate the Tobit and Heckman models on the remaining subsample. Then, using the estimation results, I forecast for both models the equivalent fatalities per occupant for the observations of the year 2006 that were left out. The better the model is at predicting, the closer these predictions will be to the real value. The results, together with their standard errors, are given in Table 15 for the Tobit and Heckman models and for all three converting scales. They are divided by whether or not the second vehicle is a light truck in the two-vehicle cases and by whether or not the vehicle is a light truck in the single-vehicle case.
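One replication of the hold out procedure described above can be sketched as follows. This is only a schematic illustration: the records are made-up, the field names `year` and `y` are hypothetical, and the training mean stands in for the Tobit and Heckman forecasts, which would require a full estimation routine.

```python
import random

def bootstrap_holdout(records, holdout_year, size, rng):
    """Draw one bootstrap sample, then split off the hold out year."""
    sample = [rng.choice(records) for _ in range(size)]
    train = [rec for rec in sample if rec["year"] != holdout_year]
    held_out = [rec for rec in sample if rec["year"] == holdout_year]
    return train, held_out

def mean(values):
    return sum(values) / len(values)

# Made-up observations: y is the equivalent fatality rate per occupant.
records = [
    {"year": 2004, "y": 0.00}, {"year": 2004, "y": 0.05},
    {"year": 2005, "y": 0.00}, {"year": 2005, "y": 0.00},
    {"year": 2006, "y": 0.00}, {"year": 2006, "y": 0.025},
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
train, held_out = bootstrap_holdout(records, holdout_year=2006, size=50, rng=rng)

if train and held_out:
    # Placeholder forecast: a real replication would fit the Tobit and
    # Heckman models on `train` and predict for the rows in `held_out`.
    forecast = mean([rec["y"] for rec in train])
    actual = mean([rec["y"] for rec in held_out])
    print(forecast - actual)
```

Repeating this over many replications and averaging the forecasts would give means and standard errors of the kind reported for the hold out year.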

When we look at these results, we see that the Tobit model overestimates the forecasted equivalent fatalities per occupant for the year 2006 in both the single-vehicle and the two-vehicle cases and for all three converting scales. The Heckman model forecasts an equivalent fatality per occupant that is much closer to the real values in every case. The only situation where the Heckman model is not that close to the real value, for all three converting scales, is a two-vehicle crash where both vehicles are cars, but even there it is still closer than the Tobit model. Also, for all cases and converting scales the Heckman model has a lower standard error than the Tobit model. All this suggests that the Heckman model is the better model for forecasting the future equivalent fatalities per occupant.


Table 15: Forecast values for the hold out sample year 2006 for the Tobit and Heckman model

| Converting scale | 20 | 10 | 30 |
|---|---|---|---|
| Tcc | .0114 (.0055) | .0142 (.0060) | .0106 (.0053) |
| Hcc | .0086 (.0048) | .0123 (.0054) | .0074 (.0048) |
| Data | .0068 | .0101 | .0057 |
| Tct | .0150 (.0070) | .0184 (.0075) | .0140 (.0068) |
| Hct | .0122 (.0064) | .0166 (.0070) | .0108 (.0063) |
| Data | .0118 | .0158 | .0105 |
| Ttc | .0063 (.0038) | .0081 (.0041) | .0057 (.0038) |
| Htc | .0047 (.0031) | .0068 (.0033) | .0039 (.0029) |
| Data | .0042 | .0066 | .0034 |
| Ttt | .0088 (.0064) | .0113 (.0070) | .0080 (.0062) |
| Htt | .0069 (.0051) | .0097 (.0056) | .0058 (.0052) |
| Data | .0063 | .0092 | .0053 |
| Tc | .0418 (.0118) | .0469 (.0126) | .0402 (.0117) |
| Hc | .0307 (.0088) | .0376 (.0094) | .0283 (.0087) |
| Data | .0303 | .0376 | .0279 |
| Tt | .0465 (.0133) | .0520 (.0140) | .0448 (.0130) |
| Ht | .0340 (.0095) | .0413 (.0101) | .0316 (.0094) |
| Data | .0338 | .0408 | .0313 |
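The comparison of forecast accuracy can be made explicit by computing the absolute forecast errors from the table. The sketch below does this for the converting scale of 20 only, using the rounded values from Table 15; the dictionary layout is chosen here for illustration.

```python
# (Tobit forecast, Heckman forecast, actual 2006 value) at converting scale 20.
table15_scale20 = {
    "cc": (0.0114, 0.0086, 0.0068),
    "ct": (0.0150, 0.0122, 0.0118),
    "tc": (0.0063, 0.0047, 0.0042),
    "tt": (0.0088, 0.0069, 0.0063),
    "c": (0.0418, 0.0307, 0.0303),
    "t": (0.0465, 0.0340, 0.0338),
}

abs_errors = {
    case: (abs(tobit - data), abs(heckman - data))
    for case, (tobit, heckman, data) in table15_scale20.items()
}

heckman_always_closer = all(h_err < t_err for t_err, h_err in abs_errors.values())

for case, (t_err, h_err) in abs_errors.items():
    print(f"{case}: Tobit error {t_err:.4f}, Heckman error {h_err:.4f}")
print(heckman_always_closer)
```

The largest Heckman error occurs in the car-car case, which matches the remark that this is the one situation where the Heckman forecast is not that close to the real value.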


6

Estimation based on the Cosslett model

In the last section, I compared the Tobit model with the Heckman model using three different measurements, namely the expected equivalent fatalities per occupant for both models, a Wald test on the coefficients of the Heckman model, and a check of their predictions of future equivalent fatalities per occupant using a hold out sample. All three measurements suggest that the Heckman model is the better model to estimate the effect of light trucks on traffic safety.

But there is something strange about the Heckman results. In equation (3) I found that

$$
E[y_2 \mid x] = \Phi(x_1'\beta_1)\, x_2'\beta_2 + \sigma_{12}\, \phi(x_1'\beta_1)
$$

where $\sigma_{12} = \rho\,\sigma_2$. This suggests that the expected value for $y_2$ is increasing in $\rho$, so that the higher $\rho$, the higher this expected value will be. This seems natural, since a higher value of $y_1$ should lead to a higher value of $y_2$.

However, when looking at the results from the Heckman estimation in Table 8, I notice that in all three cases $\rho$ is actually negative, and that it is even significant at a five percent level for the two-vehicle crashes where the first vehicle is a car and for the single-vehicle crashes. This suggests a strange negative relation between the two processes and may indicate that there is something wrong with the Heckman model. Therefore, in this section I will estimate everything again, but now using a semiparametric estimation technique introduced by Cosslett (1991).

6.1 The model

Cosslett's selection model may be seen as the semiparametric analogue of Heckman's two-step estimator, which is another version of the Heckman (1979) model that I used. It still assumes that the process determining whether or not there are injured or dead people differs from the process determining the size of these injuries, given that there are any. But instead of assuming that the error terms are normally distributed, it specifies a dummy-variable approximation of the selection correction.

The dummy variables are defined based on the value-ordered index $z_i'\gamma$. This $z_i'\gamma$ is estimated in the first step by a semiparametric estimator for the binary response model. In this paper I approximate the density of the error term in this first step by a Hermite form of a certain order $K$; see Gallant and Nychka (1987) and Stewart (2004) for more information. The predicted value $z_i'\hat\gamma$ is then cut into $M$ sections. Each section corresponds to one dummy $D_m$, and these dummies are used in the second step in the outcome equation, which is estimated by OLS. So the participation equation looks like

$$
y_{1i} = F(z_i'\gamma)
$$

and the resulting outcome equation is

$$
y_{2i} = x_i'\beta + \sum_{m=1}^{M} b_m \, D_{im}(z_i'\gamma) + \xi_i
$$

where $y_1 = 1$ if $y^* > 0$, $y_1 = 0$ if $y^* = 0$, and $y_2 = y^*$.
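The construction of the dummies from the estimated index can be sketched as follows. The text does not state how the index is cut into sections, so the equal-count split used here is an assumption, as are the function names and the example index values. The lowest section is treated as the omitted category.

```python
def assign_sections(index_values, M):
    """Cut the value-ordered index into M sections of (roughly) equal size.

    Returns, for each observation, its section number in 0..M-1, where
    section 0 holds the lowest values of the index.
    """
    n = len(index_values)
    order = sorted(range(n), key=lambda i: index_values[i])
    sections = [0] * n
    for rank, i in enumerate(order):
        sections[i] = rank * M // n
    return sections

def dummy_row(section, M):
    """Dummies for sections 1..M-1; section 0 is the omitted category."""
    return [1 if section == m else 0 for m in range(1, M)]

# Hypothetical first-step index values z_i' gamma_hat for six observations.
z_index = [-2.1, -0.3, -1.2, 0.4, -0.9, 0.1]
sections = assign_sections(z_index, M=3)
dummies = [dummy_row(s, M=3) for s in sections]
print(sections)
print(dummies)
```

In the second step these dummy columns would be appended to the outcome regressors before running OLS on the observations with a positive outcome.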

6.2 Cosslett estimation

Before using the Cosslett estimator, I first have to decide on the number of dummies M that I will use. We do not want the number of dummies M to be too small, because the dummies approximate the error term of the second step. If this error term fluctuates a lot and we have too few dummies, the approximation will not be good. We also do not want the number of dummies M to be too big, because then the dummies might be insignificant. So I choose the number of dummies in all my estimations to be 30.

Furthermore, I have to choose the order K of the Hermite form; see Stewart (2004). One can find the best order K by beginning with a relatively high order, say 8, and checking whether the coefficients are significant. If not, one can choose a lower order. Doing this, the coefficients are significant at a five percent level in the single-vehicle case for an order of K = 3. For the two-vehicle cases, however, the coefficients remain insignificant. But choosing a lower order would reduce the model to a normal probit model. So for all three cases I will use an order of K = 3.

Because the Cosslett model is considered the semiparametric analogue of Heckman's estimator, I will use the same variables in the first and second step as I did when using the Heckman model. That is, all the variables will be used in the first step in $z_i$, and the effect of light trucks, alcohol, drugs, the speed of the vehicles and the number of occupants will be used in the second step in $x_i$. The results for this Cosslett model with M = 30 and K = 3 are given in Table 16 below.


Table 16: Regression results of Cosslett models

| Variable | v1 = passenger car: para (1) | SE (2) | v1 = light truck: para (3) | SE (4) | single vehicle: para (5) | SE (6) |
|---|---|---|---|---|---|---|
| Outcome equation | | | | | | |
| v2 = light truck | .0031* | .0005 | .0003 | .0003 | | |
| Single = light truck | | | | | .0030* | .0012 |
| Seat belt | -.0030* | .0006 | -.0026* | .0005 | -.0163* | .0015 |
| Alcohol (v1) | .0005 | .0015 | .0061* | .0011 | -.0097* | .0019 |
| Drugs (v1) | .0122* | .0009 | .0033* | .0007 | .0279* | .0017 |
| Occupants (v1) | -.0019* | .0003 | -.0012* | .0002 | -.0080* | .0008 |
| Speeding (v1) | .0448* | .0032 | .0033 | .0029 | .0051 | .0036 |
| Alcohol (v2) | .0075* | .0013 | .0010 | .0010 | | |
| Drugs (v2) | -.0040* | .0009 | .0005 | .0007 | | |
| Occupants (v2) | .0008* | .0003 | -.0002 | .0002 | | |
| Speeding (v2) | .0150* | .0034 | -.0004 | .0022 | | |
| Intercept | .0055* | .0008 | .0032 | .0089 | .0468 | .0403 |
| Participation equation | | | | | | |
| v2 = light truck | .1093* | .0268 | .1485* | .0412 | | |
| Single = light truck | | | | | .0584* | .0156 |
| Small city | -.2705* | .0607 | -.2739* | .0726 | -.2040* | .0264 |
| Medium city | -.2073* | .0504 | -.2840* | .0784 | -.2562* | .0371 |
| Large city | -.1859* | .0423 | -.2043* | .0553 | -.1120* | .0187 |
| Seat belt | -.1256* | .0320 | -.2163* | .0575 | -.3644* | .0293 |
| Rain | -.0691* | .0268 | -.0654* | .0333 | .0211 | .0229 |
| Snow | -.1365* | .0601 | -.2161* | .0877 | -.5400* | .0645 |
| Dark | .1339* | .0324 | .1227* | .0361 | -.0084 | .0161 |
| Weekday | -.0662* | .0211 | -.0939* | .0307 | -.0601* | .0165 |
| Interstate highway | -.1173* | .0383 | .1140* | .0432 | .3578* | .0302 |
| Divided highway | .2383* | .0524 | .1954* | .0521 | .3376* | .0276 |
| Alcohol (v1) | .2845* | .0721 | .2751* | .0851 | .4599* | .0324 |
| Drugs (v1) | .0575* | .0287 | -.0326 | .0337 | .0691* | .0211 |
| Age <21 (v1) | -.0402 | .0255 | -.1080* | .0549 | .1096* | .0303 |
| Age >60 (v1) | .1613* | .0393 | .2088* | .0588 | .0506 | .0288 |
| Male driver (v1) | -.1155* | .0286 | -.1692* | .0467 | -.0235 | .0180 |
| Young male (v1) | -.0927* | .0428 | .0464 | .0623 | -.0283 | .0382 |
| Occupants (v1) | .1361* | .0299 | .1032* | .0281 | .1640* | .0120 |
| Speeding (v1) | .8231* | .1915 | .5934* | .1949 | .7724* | .0572 |
| Alcohol (v2) | .3151* | .0750 | .3544* | .1003 | | |
| Drugs (v2) | -.0454 | .0271 | .0657 | .0353 | | |
| Age <21 (v2) | .0093 | .0283 | .0120 | .0340 | | |
| Age >60 (v2) | .0738 | .0266 | .0770* | .0338 | | |
| Male driver (v2) | .0964* | .0255 | .0483* | .0227 | | |
| Young male (v2) | .0083 | .0362 | -.0221 | .0448 | | |
| Occupants (v2) | .0044 | .0078 | -.0019 | .0099 | | |
| Speeding (v2) | .5780* | .1486 | .4373* | .1410 | | |
| Intercept | -1.4578 | Fixed | -1.4186 | Fixed | -1.0250 | Fixed |
| SNP coefficient 1 | .0278 | .1630 | .0104 | .1892 | .1169* | .0447 |
| SNP coefficient 2 | .0936 | .1011 | -.0211 | .1913 | -.2966* | .0211 |
| SNP coefficient 3 | -.0382 | .0350 | .0050 | .0333 | .1131* | .0135 |
| Observations | 76586 | | 59733 | | 48813 | |

Note: Year dummies (8) and Cosslett dummies (30) are included in all regressions. The omitted category for vehicle type is passenger car. The base group for city size is rural area. Alcohol is 1 if the driver is found under influence of alcohol. Young male is 1 if the driver is male and younger than 21. Speeding is 1 if the travel speed is 10 miles per hour above the speed limit. Occupants is the number of occupants in the vehicle.

* = Significant at a five percent level.

The 30 Cosslett dummies are excluded from this table, because otherwise it would be too large. The first dummy, for the lowest $z_i'\hat\gamma$, is the omitted category. The coefficients of the other dummies are given in Figure 1. The coefficients of these dummies in the outcome equation are almost always positive and significant. This is exactly what we would expect, because we expect that the higher the expected value for $y_1$ is, the higher the expected value for $y_2$ is.

When comparing the results between the outcome and the participation equation, we see that they mostly have the same sign for the same variables. The only exceptions that are significant at a five percent level are the number of occupants of the first vehicle, which is significantly negative in the outcome equation and significantly positive in the participation equation in all three cases, and the driver being drunk in single-vehicle crashes, which is also significantly negative in the outcome equation and significantly positive in the participation equation.

The results are very comparable with the results from the Heckman (1979) model in Table 8. The coefficients almost always have the same sign. The exceptions are the coefficient in the outcome equation of the first driver being drunk when the first vehicle is a passenger car, and the coefficient in the outcome equation of the number of occupants of the second vehicle when the first vehicle is a light truck, but neither of these is significant at a five percent level. When the coefficients are significant at a five percent level, they always have the same sign. The only thing that is clearly different is the size of the coefficients in the outcome equation, which are larger in the Heckman model than in the Cosslett model. This is logical, because in the Cosslett model we have added 30 Cosslett dummies to the outcome equation, which take over some of the effects of the variables.

Figure 1: Cosslett dummies for v1 = passenger car, v1 = light truck and single vehicle crashes

The coefficients for the second vehicle being a light truck in two-vehicle crashes, or the vehicle being a light truck in the single-vehicle case, are positive and significant at a five percent level in both the participation and the outcome equation. The only exception is the coefficient of the outcome equation in the two-vehicle case where the first vehicle was a light truck, which is not significant at a five percent level. This is the same as in the Heckman case. These results again suggest that light trucks have a positive effect on both the process that determines whether or not there is an injury and the severity of the injuries, given that there is one.

6.3 Equivalent fatalities

Now I can again evaluate the relative safety of the two types of vehicles by calculating the expected equivalent fatality rate for every $y_{2i}$ based on the Cosslett regressions. The expected value for $y_i$ becomes

$$
E[y_i \mid x] = x_i'\hat\beta + \sum_{m=1}^{M} \hat{b}_m \, D_{im}(z_i'\hat\gamma) \qquad (4)
$$

So the expected value is the $x_i'\hat\beta$ part plus the effect of the dummy that is expected to be one. These expected values are reported in Table 17 below. The sample means and the standard errors are again obtained by doing 1000 bootstrap replications of size 1000, just as I did for the Tobit and Heckman models, where I still use M = 30 and K = 3.

When comparing the numbers across rows, we see the same pattern as for the Tobit model and the Heckman (1979) model. For a two-vehicle crash there is a positive difference, meaning that occupants of light trucks face smaller risks than those in cars: 63.8 percent of the risk, as opposed to 61.4 percent in the Heckman case, when the second vehicle is a car, and 52.4 percent, as opposed to 55.1 percent in the Heckman case, when the second vehicle is a light truck. However, this is again not the case in single-vehicle crashes. There we again see a negative difference, meaning that occupants of light trucks face a higher risk than those of passenger cars, namely 13.0 percent higher, as opposed to 10.2 percent higher in the Heckman model.


Table 17: Equivalent fatalities per occupant in the Cosslett model

| First vehicle | Two-vehicle crash, second vehicle: car | Two-vehicle crash, second vehicle: light truck | Single-vehicle crash |
|---|---|---|---|
| Car | Ccc = .0058 (.0020) | Cct = .0103 (.0037) | Cc = .0246 (.0051) |
| Light truck | Ctc = .0037 (.0015) | Ctt = .0054 (.0026) | Ct = .0278 (.0063) |
| Difference | Ccc - Ctc = .0021 (.0025) | Cct - Ctt = .0049 (.0045) | Cc - Ct = -.0032 (.0079) |

Note: Standard errors are in parentheses.

6.4 Marginal effects

Unlike for the Tobit and Heckman models, I cannot find the marginal effects of the Cosslett model by taking the derivative of the previous equation (4) with respect to a variable in $x_i$ if that variable is also in $z_i$. This is because a small change in this variable may cause a jump from one dummy to another, which means that the derivative will not be continuous for all i but will have jumps in it. And because I have chosen to put all the variables into $z_i$, this problem holds for all the variables.

To be able to calculate the marginal effects of the Cosslett model I will therefore use the numerical approximation of the first difference, which says that the marginal effect is

$$ME = \frac{f(x + \Delta x) - f(x)}{\Delta x}$$

So to calculate the marginal effect, I add a small amount of 0.1 to the variable whose effect we want to know, in both the first and the second step. By choosing 0.1 I make sure that this ∆x is not too big, but also not so small that there are too few jumps from one dummy to another. With this slightly higher x variable I recalculate the expected value for every y_i using the same β and γ as before. The change in the expected value divided by 0.1 is then an approximation of the marginal effect.

This means that there are two possible ways in which the dependent variable y_i may change. First, for all variables it holds that adding an amount of 0.1 will change the predicted value $z_i'\hat{\gamma}$. This may lead to a jump from one dummy to another dummy and therefore to another coefficient b_m. Second, for the variables that are also in x_i, there will be another effect due to the change in this x variable.
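The first-difference procedure described above can be sketched as follows. This is a minimal illustration, not the code used for the estimates: the function names, the `edges` cut points and the toy numbers are hypothetical, and averaging the individual changes over all observations is my assumption about how the per-observation effects are combined.

```python
import numpy as np

def expected_value(X, Z, beta_hat, gamma_hat, b_hat, edges):
    # ASSUMPTION: dummy m is one when z_i'gamma_hat lies in the m-th interval
    # delimited by the increasing cut points in `edges`.
    return X @ beta_hat + b_hat[np.digitize(Z @ gamma_hat, edges)]

def marginal_effect(X, Z, beta_hat, gamma_hat, b_hat, edges,
                    z_col, x_col=None, dx=0.1):
    """First-difference marginal effect, averaged over observations (an assumption)."""
    X = np.array(X, dtype=float)
    Z = np.array(Z, dtype=float)
    beta_hat, gamma_hat, b_hat = (np.asarray(a, dtype=float)
                                  for a in (beta_hat, gamma_hat, b_hat))
    base = expected_value(X, Z, beta_hat, gamma_hat, b_hat, edges)
    X_up, Z_up = X.copy(), Z.copy()
    Z_up[:, z_col] += dx      # channel 1: the index moves, possibly into another interval
    if x_col is not None:
        X_up[:, x_col] += dx  # channel 2: direct effect through x_i'beta_hat
    shifted = expected_value(X_up, Z_up, beta_hat, gamma_hat, b_hat, edges)
    return float(np.mean((shifted - base) / dx))

# Toy numbers: the first observation sits just below the cut point 2.0 and jumps.
args = ([[1.0], [2.0]], [[1.95], [0.0]], [0.5], [1.0], [0.0, 0.1], [2.0])
me_z_only = marginal_effect(*args, z_col=0)         # variable appears only in z_i
me_both = marginal_effect(*args, z_col=0, x_col=0)  # variable appears in x_i and z_i
```

In the toy example only the first observation crosses a cut point, so the jump in the dummy coefficient drives the whole effect when the variable is in z_i alone; when the variable is also in x_i, the direct effect of 0.5 per unit is added on top.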

The results from this are given in Table 18 below.³

When comparing these with the marginal effects obtained from the Heckman model in Table 10, I see that they all have the same sign, except for the number of occupants of the first vehicle in all three cases, which has a negative marginal effect in the Cosslett estimations but a positive, though not significant, marginal effect in the Heckman estimations. The results suggest again that the second vehicle being a light truck in two-vehicle crashes, or the vehicle being a light truck in single-vehicle crashes, increases the equivalent fatalities. For example, using the Cosslett model we expect that, in a two-vehicle crash where the first vehicle is a car, the second vehicle being a light truck instead of a car increases the expected equivalent fatality rate by 0.0043.

³ Because of the complexity I will not calculate the standard errors for these marginal effects.


Table 18: Marginal effects of Cosslett models

                        v1 = passenger car   v1 = light truck   single vehicle
                        dy/dx                dy/dx              dy/dx
                        (1)                  (2)                (3)
v2 = light truck         .0043                .0017
Single = light truck                                             .0050
Small city              -.0044               -.0024             -.0070
Medium city             -.0034               -.0025             -.0085
Large city              -.0031               -.0018             -.0038
Seat belt               -.0047               -.0045             -.0281
Rain                    -.0012               -.0006              .0007
Snow                    -.0023               -.0019             -.0173
Dark                     .0023                .0011             -.0003
Weekday                 -.0011               -.0008             -.0019
Interstate highway      -.0019                .0010              .0119
Divided highway          .0041                .0018              .0111
Alcohol (v1)             .0012                .0087              .0062
Drugs (v1)               .0130                .0030              .0302
Age <21 (v1)            -.0007               -.0010              .0037
Age >60 (v1)             .0028                .0019              .0017
Male driver (v1)        -.0019               -.0015             -.0007
Young male (v1)         -.0015                .0004             -.0009
Occupants (v1)          -.0006               -.0002             -.0025
Speeding (v1)            .0313                .0100              .0319
Alcohol (v2)             .0092                .0044
Drugs (v2)              -.0046                .0012
Age <21 (v2)             .0002                .0001
Age >60 (v2)             .0013                .0007
Male driver (v2)         .0017                .0005
Young male (v2)          .0001               -.0002
Occupants (v2)           .0009                .0002
Speeding (v2)            .0113                .0033


6.5 Robustness checks

Just like for the Tobit and Heckman models, I will now check whether the converting scale affects the results from the Cosslett model. I will again do this by changing the standard converting scale of 20 incapacitating injuries to one fatality to a converting scale of 10 and of 30, respectively. The first step does not change, because its dependent variable is a dummy variable that does not depend on the converting scale. The only differences that might occur are due to differences in the dependent variable in the second step. While doing this, I keep the number of dummies M = 30 and the order of the Hermite form K = 3 the same. The results are given in Table 19.
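To make the role of the converting scale concrete, the following sketch shows how the second-step dependent variable could change with the scale. It assumes that equivalent fatalities are the number of fatalities plus the number of incapacitating injuries divided by the converting scale, and that the rate is taken per occupant; the function name and the toy crash are hypothetical.

```python
def equivalent_fatalities_per_occupant(fatalities, incapacitating_injuries,
                                       occupants, scale=20):
    """Equivalent fatalities per occupant for one vehicle.

    ASSUMPTION: equivalent fatalities = fatalities + injuries / scale,
    where `scale` incapacitating injuries count as one fatality.
    """
    if occupants <= 0 or scale <= 0:
        raise ValueError("occupants and scale must be positive")
    return (fatalities + incapacitating_injuries / scale) / occupants

# Toy vehicle: 1 fatality, 10 incapacitating injuries, 4 occupants.
rates = {s: equivalent_fatalities_per_occupant(1, 10, 4, scale=s)
         for s in (10, 20, 30)}
```

A smaller scale gives each injury more weight, so under this definition the dependent variable is never lower for a scale of 10 than for a scale of 30.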

Looking at the results, we see that they are not much different from those of the Tobit and Heckman models. As expected, the converting scale of 10 again gives the highest values and the converting scale of 30 the lowest. Further we see that, as in the Tobit and Heckman case, for all three converting scales the first two differences are positive and the last one is negative. This means that also for the Cosslett model, no matter what the converting scale is, occupants of light trucks face a smaller risk than those of passenger cars in a two-vehicle crash and a higher risk in a single-vehicle crash.

Table 19: Robustness checks for crash severity

Converting scale    20        10        30
Ccc                 .0058     .0089     .0048
Ctc                 .0037     .0058     .0030
Cct                 .0103     .0141     .0090
Ctt                 .0054     .0081     .0044
Cc                  .0246     .0311     .0224
Ct                  .0278     .0346     .0255
Ccc - Ctc           .0021     .0031     .0018
Cct - Ctt           .0049     .0061     .0045
Cc - Ct            -.0032    -.0036    -.0031

Furthermore, I have again estimated the expected marginal effects of the light truck variable for a converting scale of 10 and of 30. These results, together with the effect already estimated with a converting scale of 20, are given in Table 20.⁴ We see that, as expected, a scale of 10 again gives the highest marginal effects and a scale of 30 the lowest. Also, all the marginal effects are again positive and significant at a five percent level. This suggests again that, in two-vehicle crashes, the second vehicle being a light truck increases the equivalent fatality rate for the first vehicle and, in a single-vehicle crash, the vehicle being a light truck instead of a passenger car increases the fatality rate, no matter what the converting scale is.

⁴ Because of the complexity I will again not calculate the standard errors for these marginal effects.

Table 20: Robustness checks for marginal effects light truck

            v1 = passenger car   v1 = light truck   single vehicle
Scale 20    .0043                .0017              .0050
Scale 10    .0049                .0023              .0054
Scale 30    .0041                .0014              .0047

7

Comparison Heckman and Cosslett model

In the previous section I estimated the Cosslett model and its marginal effects and did some robustness checks. In this section I will compare the Cosslett model with the Heckman (1979) model estimated in section 4. I compare with the Heckman model because in section 5 I found that all the comparison methods prefer this model over the Tobit model estimated in section 3.

7.1 Expected equivalent fatalities per occupant

First, just as when I compared the Tobit and the Heckman model, I will compare the expected equivalent fatalities per occupant for the Heckman and Cosslett models. That is, I will combine Table 11 and 19 in Table 21, with the standard errors, again obtained from 1000 bootstrap replications of size 1000, given between parentheses. The actual means of the data are again given in this table as well. In this table all the expected equivalent fatalities marked with an "H" are from the Heckman model and those marked with a "C" are from the Cosslett model.
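The bootstrap used for the sample means and standard errors can be sketched as below. This is a simplified illustration: it resamples a given vector of fitted values with replacement and does not re-estimate the model in each replication, which the actual procedure may well do. The function name, the seed and the toy inputs are hypothetical.

```python
import numpy as np

def bootstrap_mean_se(values, n_reps=1000, size=1000, seed=0):
    """Mean and standard error of the sample mean from `n_reps` bootstrap samples.

    ASSUMPTION: each replication draws `size` observations with replacement
    from `values` and records their mean; the model is not re-estimated.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    draws = rng.choice(values, size=(n_reps, size), replace=True)
    rep_means = draws.mean(axis=1)            # one mean per replication
    return rep_means.mean(), rep_means.std(ddof=1)

# Toy inputs: a constant vector has no sampling variation at all.
const_mean, const_se = bootstrap_mean_se([0.5] * 50)
# A mixed vector gives a positive standard error.
mixed_mean, mixed_se = bootstrap_mean_se([0.0, 0.0, 0.0, 0.1])
```

The standard error reported in parentheses then corresponds to the spread of the replication means, not to the spread of the individual fitted values.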
