
Faculty of Economics and Business


Differences in Wages Caused by Ethnicity in the Netherlands

A Parametric and Semiparametric Approach

Joah Rakeem Tjon-Affo

(10334882)

Master’s programme: Econometrics

Specialisation: Econometrics

Date of final version: July 15, 2017

Supervisor: Dr. J. C. M. van Ophem

Second reader: Dr. E. Aristodemou

Abstract

The aim of this thesis is to investigate the effect of ethnic background on wages in the Netherlands. To do so, a distinction is made between first and second generation immigrants on the one hand and Western and non-Western immigrants on the other. The effect is studied by the use of the common Ordinary Least Squares (OLS) and Instrumental Variables (IV) methods, as well as the (slightly) less common Heckman (1979) and Cosslett (1983) methods. Based on the different methods, this research concludes that on average immigrants earn 6% less than natives, where Western immigrants earn 6 to 9% less and non-Western immigrants have no disadvantage. Furthermore, first generation immigrants receive 8 to 10% less, while the second generation earns the same as natives. However, in the aggregate case some effects were averaged out. Thus, when lastly focussing on the individual effects, we find there is no significant difference between the wages of natives and first generation/Western or second generation/non-Western immigrants, while first generation/non-Western and second generation/Western immigrants earn 9 to 12% and [...]


This document is written by Joah Rakeem Tjon-Affo who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


1 Introduction
2 Literature
  2.1 Research in the Netherlands
  2.2 Research in the rest of the world
  2.3 Endogeneity of schooling
3 Model and methods
  3.1 Model I – OLS
  3.2 Model II – IV
  3.3 Model III – Heckman
  3.4 Model IV – Cosslett
4 Data
  4.1 General distribution
  4.2 Distribution per ethnic background
5 Results
  5.1 Monthly income
    5.1.1 Legislation instruments
    5.1.2 Regional instruments
    5.1.3 Aggregated effects
  5.2 Hourly wage
    5.2.1 Individual effects
    5.2.2 Aggregated effects
6 Conclusion

1 Introduction

At the end of the year 2015, more than a fifth of the Dutch population consisted of immigrants (Ooijevaar and Bloemendal, 2016), of whom more than half are considered to be of Non-Western origin. To clarify, the Dutch Central Bureau of Statistics (CBS) distinguishes two classes of origin: Western and Non-Western. All countries in Europe (apart from Turkey), North America and Oceania are considered to be Western countries. Next to that, Indonesia and Japan are also considered Western. All other countries are classified as Non-Western.

The relative number of immigrants in the Netherlands is only expected to grow over the coming years: the CBS predicts this percentage to have grown to over 30% in 2060. As might be clear from the high percentage of immigrants, Dutch society has become quite diverse. All this diversity leads to big differences in culture, ethics and other characteristics of the inhabitants of the Netherlands. Historically, differences and minorities together have often led to discrimination or racism. Since discrimination is undesirable, harmful and even against the law, it is important to continuously reevaluate its presence in a society.

A common way to assess some kind of wrongdoing from an economic point of view is by investigating whether there are any differences in wages between certain ethnic groups, based solely on ethnic background. Ample research has been done on this subject, in many societies and countries. Examples are Kee (1995) and Van Ours and Veenman (2004) in the Netherlands, Chiswick (1980) and Blackaby et al. (1998) in Britain and Neal and Johnson (1996) and Fan et al. (2017) in the US. It is, however, always important to keep addressing such moral issues over time to check, or even control, their development.

The main focus of this research will be to find the effect of ethnicity on wage in the Netherlands. To do so, a clear distinction between Western/Non-Western and first/second generation immigrants will be made. This will result in four types of immigrants, for which the individual as well as the aggregated effects will be estimated. Based on previous research, we expect a non-Western ethnicity to have a negative effect on income. Although former research has stated that Western immigrants experience a smaller (negative) effect, unobserved language complications or cultural differences might still result in a difference in wage. On the other hand, since the Dutch consider themselves to be very tolerant and non-racist, one might also expect (or even hope) the effect of ethnicity to be insignificant.

Schooling is often considered to be endogenous in the wage equation. Especially in the case of different ethnic groups, a selectivity problem might arise due to the different attitude of certain groups towards schooling. Turkish children for instance might be motivated to start working as soon as possible and quit school due to cultural influences of parents. Furthermore, some groups might already experience discrimination in the educational system, causing them to get selected into lower quality/level schools and have a lower return on education. Moreover, schooling in the country of origin for first generation immigrants might be inferior due to a different system, lower quality or language differences.

To get a first impression of the results, a simple Ordinary Least Squares (OLS) estimation will be conducted. The previously mentioned arguments however indicate heterogeneity or endogeneity of the schooling variable, causing OLS to give inconsistent results. Hence, a second method, namely Instrumental Variables (IV), controlling for this endogeneity, will be used. Harmon and Walker (1995) state that years of schooling should be treated as a categorical rather than a continuous variable. To implement this, they adopt a third approach, namely the Heckman selectivity model (Heckman, 1979), which will be addressed in detail in Section 2.3. Although it might be questionable whether years of schooling is actually nominal, the highest level of education completed (which will be used in this research) is unarguably categorical, justifying the use of the Heckman method in this paper. Lastly, since the Heckman model unfoundedly assumes the errors are normally distributed, it might be good to estimate a semiparametric model in addition, for which an extension of the Cosslett method (Cosslett, 1983) is chosen.

Following these steps, we expect to prefer the results of IV over OLS and likewise the results of the Heckman model over IV. Since the Cosslett method is semiparametric, we expect to lose some significance here, hence while this method is less restricted than the Heckman method, we cannot conclude beforehand which of the two we will prefer.

The data used originates from the Longitudinal Internet Studies for the Social sciences (LISS) panel, which is conducted by CentERdata (Tilburg University, The Netherlands). Information is available on e.g. gender, age, wage, origin and level of education of respondents.

The paper is structured as follows. Section 2 gives a review of relevant literature. Thereafter, Section 3 states the models that will be used in the remainder of the paper, as well as the methods used for inference. Next, Section 4 supplies more insights into the data and its source. Subsequently, Section 5 states and discusses the results. Lastly, Section 6 presents a short recap and conclusion.

2 Literature

2.1 Research in the Netherlands

One example of research done in the Netherlands on the effect of ethnicity on income is Kee (1995). He tries to capture the differences in wages caused by discrimination for male household heads in the Netherlands. For his research, he used different data sets for immigrants and natives, both from surveys conducted around 1985.

Kee (1995) considers four countries of origin of immigrants: the Netherlands Antilles, Suriname, Turkey and Morocco. For each individually, he estimates this effect not only by looking at observed wage differentials, but also by offered wage surplus between natives and each group. He follows the work of Reimers (1983), who only considers employees, since this main group is affected differently. Both Kee (1995) and Reimers (1983) use a Heckman model to correct for selectivity caused by the participation decision. Next to that, Kee (1995) presents a problem with identifying discrimination since the non-discriminatory wage is unobserved. To catch this discrimination, he decomposes the ‘native-immigrant observed wage differential’ into two parts. A first part, caused by differences in personal characteristics and a second part caused by discrimination, consisting of the overvaluation of characteristics of natives and the undervaluation of the characteristics of immigrants. This results in the following estimation equation.

$$
\begin{aligned}
\overline{\ln W_N} - \overline{\ln W_I} ={}& (\bar{X}_N - \bar{X}_{1I})\beta^{*} - \bar{X}_{2I}\beta_{2I} - \bar{X}_{3I}\beta^{**} \\
&+ \bar{X}_N(\beta_N - \beta^{*}) + \bar{X}_{1I}(\beta^{*} - \beta_{1I}) + \bar{X}_{3I}(\beta^{**} - \beta_{3I})
\end{aligned}
\tag{1}
$$

In (1), $N$ stands for natives, $I$ for immigrants, subscript 1 for characteristics immigrants have in common with natives, 2 for language and cultural characteristics and 3 for foreign training (schooling) characteristics. Furthermore, $\overline{\ln W}$ is the mean of the natural logarithm of wages and $\bar{X}$ the mean of the regressors. Lastly, $\beta^{*}$ is formed to be sensitive to relative proportions of natives and immigrants and $\beta^{**}$ as the non-discriminatory return to training acquired abroad. Their derivations can be found in Kee (1995, p. 308).
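To see that the two parts of (1) add up to the observed differential, the identity can be checked numerically. The sketch below is only an illustration: it treats every regressor mean and coefficient as a scalar, assumes each group's mean log wage equals its mean regressors times its own coefficients, and uses made-up values that are not taken from Kee (1995). The function name `decompose` is hypothetical.

```python
def decompose(x_n, x_1i, x_2i, x_3i, b_n, b_1i, b_2i, b_3i, b_star, b_2star):
    """Return (observed gap, characteristics part, discrimination part) of (1)."""
    # Observed gap in mean log wages, assuming mean log wage = mean X times beta.
    gap = x_n * b_n - (x_1i * b_1i + x_2i * b_2i + x_3i * b_3i)
    # First line of (1): differences in characteristics.
    characteristics = (x_n - x_1i) * b_star - x_2i * b_2i - x_3i * b_2star
    # Second line of (1): overvaluation of natives, undervaluation of immigrants.
    discrimination = (
        x_n * (b_n - b_star)
        + x_1i * (b_star - b_1i)
        + x_3i * (b_2star - b_3i)
    )
    return gap, characteristics, discrimination


# Illustrative (invented) values only.
gap, characteristics, discrimination = decompose(
    x_n=12.0, x_1i=10.0, x_2i=1.5, x_3i=4.0,
    b_n=0.08, b_1i=0.06, b_2i=-0.05, b_3i=0.02,
    b_star=0.07, b_2star=0.04,
)
print(gap, characteristics + discrimination)
```

Both printed numbers coincide up to floating-point rounding, which is what the term-by-term cancellation in (1) implies.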

Resulting from these models, Kee (1995) finds that the average native-immigrant offered wage surplus for Antilleans, Surinamese, Turks, and Moroccans respectively equals approximately 35%, 41%, 54% and 44%. Moreover, he found that 35% and 15% of the wage difference between natives and Antilleans or Turks respectively, is caused by tastes for discrimination (Kee, 1995, p. 315). For Surinamese no discrimination was found, while for Moroccans this discrimination was even positive.

Van Ours and Veenman (2004) is another example of research done in the Netherlands. They use data from a 1998 survey amongst 13 Dutch cities with a high percentage of immigrants. In their research, they consider the same four immigrant groups as Kee (1995), but only focus on the second generation. Due to this focus, their sample is relatively young and only at the start of their career. They conclude ethnicity does not have a big impact on wages, while they do find it influences the unemployment rate. They do however emphasise that while discrimination seems to diminish, they have only been able to consider individuals at the start of their career and cannot say much about the development of these careers.

2.2 Research in the rest of the world

Also in other countries, such as Britain, plenty of research has been done on discrimination in the labour market. An example of this is the paper of Chiswick (1980) using data from 1971, who similarly to Kee (1995) solely focuses on men, but instead uses different groups of immigrants: white and coloured. Furthermore, Chiswick (1980) splits the native population up correspondingly. By doing so, he is able to estimate the difference in earnings between natives and foreigners on the one hand and between white and coloured men on the other. Moreover, he can now analyse combinations of these groups, such as differences between the earnings of white natives and white foreigners, too.

Compared to natives, Chiswick (1980) finds no significant difference on earnings for white immigrants, but a negative effect of about 25% for coloured immigrants. He also concludes their relative earnings disadvantage increases with the number of years of schooling, meaning an extra year of schooling has less effect on the earnings of coloured immigrants than on the earnings of natives. Although Chiswick (1980) has a rather small sample of native-born coloured men (only 8 observations), he nevertheless finds significant results indicating they are at a much smaller disadvantage than coloured immigrants. He concludes this might imply a diminishing effect of ethnicity on the earnings in Britain since throughout the years more and more coloured people will become natives, but does note further research is preferable before drawing any hard conclusions due to the small sample of this group.

Blackaby et al. (1998) also take a similar approach to Kee (1995) for Britain. By using a decomposition analysis as in (1), they estimate the wage gap between natives and certain nonwhite ethnic minorities in the 1990s. In their analysis they consider three separate groups: Indians, Black Caribbeans and Pakistanis. They show that these groups earn approximately 12%, 7% and 31% less than whites respectively. Furthermore, discrimination explains 79%, 145% and 52% of these differences respectively, showing that although Indians have the lowest observed wage difference, the actual difference due to discrimination is the largest. Nevertheless, they still conclude Indians appear to be doing better, based on their characteristics.

Neal and Johnson (1996) study wage differences between ethnic groups in the US. In their model they include the Armed Forces Qualification Test (AFQT) score, a test the military used for their screenings and job assignments, which was administered to more than 90% of their panel. This test score is intended to be a measure for skill, which they consider to be of influence on income. In their paper they conclude that while wage differences between blacks, Hispanics and whites are present, these are almost annihilated when they control for the AFQT score. More specifically, when they split up the results between men and women, they find that while a small wage difference still exists for men, this difference is completely gone for women.

Fan et al. (2017) analyse the wage differentials for individual occupations in the US labour market. Here, they make a distinction between hard and nonhard-skills using four descriptors on the one hand and soft and nonsoft-skills using six descriptors on the other (see Fan et al., 2017, p. 1037 for a more detailed description of the four skill types). Next, they assign each occupation to one or more groups using k-means clustering. As a result, they firstly find that the wage difference between blacks and whites is usually smaller for hard and nonsoft-skills jobs than for the other two. Furthermore, they conclude self-selection of blacks into hard and nonsoft-skills jobs, as modelled by the Heckman model, does exist.

2.3 Endogeneity of schooling

As stated in Section 1, the inclusion of schooling in the standard OLS earnings equation is commonly believed to lead to endogeneity problems, causing inconsistent results. This is especially relevant here due to the focus on different ethnic groups with distinct characteristics and attitudes towards schooling.

Next to the use of IV, Harmon and Walker (1995) also exploit a Heckman selectivity model (Heckman, 1979) which allows for the endogeneity of schooling, as well as its discontinuous nature in the data. This model will be discussed in detail in Section 3.3.

What is interesting in Harmon and Walker (1995), however, is the way they construct instruments for the schooling variable. They created dummy variables indicating an exogenous change in legislation concerning minimum school leaving age (SLA), and used them as instruments. This resulted in a minimum SLA of 14, 15 and 16 for children who turned 14 before 1947, between 1947 and 1971 and after 1971 respectively.

The initial OLS regression results of Harmon and Walker (1995) showed that one extra year of schooling resulted in an increase in earnings of 6.13%. When taking into account the endogeneity of schooling by using IV, the increase more than doubled to 15.25%. This shows the importance of taking the properties of the used regressors into consideration. Moreover, the effect of an additional year of schooling increased even further to 16.88% when they used the Heckman model, which they believe to be consistent. Obviously, this increase in the coefficient for schooling had some effect on the coefficients of the other variables too, though this effect is much smaller in their findings. The coefficients of age and age squared for instance dropped from 0.0836 and -0.0009 to 0.0779 and -0.0008 respectively. Furthermore, a significant negative estimate of the selectivity correction term implied a negative bias for OLS, for which they give measurement error in OLS as explanation.

3 Model and methods

As stated in Section 1, four different models will be used to estimate the effect of ethnicity on income. This section will briefly present the ideas and equations of each of the four models.

3.1 Model I – OLS

The analysis of the effect of ethnicity on wage starts with conducting OLS on the structural equation

$$ y_i = X_i'\delta + \beta S_i + E_i'\gamma + \varepsilon_i \tag{2} $$

where the index $i$ stands for the observation, $y_i$ is the log monthly/hourly wage, $X_i$ is a vector of control variables (including dummies), $S_i$ contains the level of schooling, $E_i$ consists of ethnicity dummies depicting whether the individual is a first/second generation and Western/non-Western immigrant and $\varepsilon_i$ is a noise term. Since the main purpose of this research is to find the effect of [...] will be unbiased, consistent and efficient.

3.2 Model II – IV

In case $S$ is not exogenous, IV will be applied. From (2) a small alteration is made to get to the two-stage model used for IV, with structural equations

$$ y_i = X_i'\delta + \beta S_i + E_i'\gamma + \varepsilon_i \tag{3} $$

$$ S_i = Z_i'\alpha + v_i \tag{4} $$

where $Z_i$ are instruments and $S_i$ is obtained by OLS in (4). When in fact $S_i$ is endogenous, while all other regressors are exogenous, OLS will become inconsistent, but IV will still give consistent results.
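The difference between OLS and IV under endogenous schooling can be illustrated with a small simulation. This is a minimal sketch under strong simplifying assumptions, not the estimation carried out in this thesis: there is one binary instrument, no controls $X_i$ or ethnicity dummies $E_i$, and all data are simulated with an invented unobserved factor `u` that drives both schooling and wages. In that just-identified case the two-stage estimator reduces to a ratio of covariances. All names and parameter values are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance (divisor n) of two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def ols_slope(s, y):
    """OLS slope of y on s (with intercept)."""
    return cov(s, y) / cov(s, s)


def iv_slope(z, s, y):
    """IV slope of y on s, using z as the single instrument for s."""
    return cov(z, y) / cov(z, s)


def simulate(n=20000, beta=0.10, seed=1):
    """Simulate (z, s, y) where an unobserved factor u makes s endogenous."""
    rng = random.Random(seed)
    z, s, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                 # unobserved, affects s and y
        z_i = rng.choice([0, 1])            # instrument, independent of u
        s_i = 10 + 2 * z_i + u + rng.gauss(0, 1)
        y_i = 1 + beta * s_i + 0.5 * u + rng.gauss(0, 0.5)
        z.append(z_i)
        s.append(s_i)
        y.append(y_i)
    return z, s, y


z, s, y = simulate()
beta_ols = ols_slope(s, y)      # contaminated by the correlation between s and u
beta_iv = iv_slope(z, s, y)     # close to the true beta of 0.10
print(round(beta_ols, 3), round(beta_iv, 3))
```

The direction of the OLS bias in this sketch follows purely from the invented sign of `u` in the wage equation and says nothing about the direction of the bias in actual wage data.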

3.3 Model III – Heckman

Following Harmon and Walker (1995), the Heckman model is estimated additionally, allowing for endogeneity of schooling, as well as its discontinuous nature. The Heckman model results in the following structural equations

$$ y_i = X_i'\delta + \beta S_i + E_i'\gamma + \theta\hat{\lambda}_i + \varepsilon_i^{*} \tag{5} $$

$$ S_i^{*} = Z_i'\alpha + v_i \tag{6} $$

$$ S_i = j \quad \text{if } \mu_{j-1} < S_i^{*} \le \mu_j \tag{7} $$

with $\hat{\lambda}_i$ as in (8) and $\varepsilon_i^{*} = \varepsilon_i - \theta\hat{\lambda}_i$, a heteroskedastic error term. In this method, (6) is first estimated by ordered probit, after which these estimates are used to retrieve the inverse Mills ratio (in an ordered probit specification)

$$ \hat{\lambda}_i = \frac{\phi(\hat{\mu}_{j-1,i}) - \phi(\hat{\mu}_{ji})}{\Phi(\hat{\mu}_{ji}) - \Phi(\hat{\mu}_{j-1,i})} \tag{8} $$

which is derived in Chen (2008, p. 278). Thereafter, the inverse Mills ratio can be used for OLS in (5). The implications of this method are that as long as the model is correctly specified, $E(\varepsilon_i^{*} \mid S_i = j) = 0$, so that OLS on this extended specification will give consistent results again.

Finally, the standard errors are bootstrapped to correct for the extra variation caused by the two-step method.
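The correction term in (8) can be computed directly once the first-step estimates are available. The sketch below is an illustration only. It assumes that $\hat{\mu}_{ji}$ denotes the estimated cutpoint minus the fitted index, $\hat{\mu}_j - Z_i'\hat{\alpha}$, with the outermost cutpoints set to minus and plus infinity; the text does not spell this out, so this reading is an assumption. The function names are hypothetical and the first-step ordered probit itself is not shown.

```python
import math


def norm_pdf(x):
    """Standard normal density; evaluates to 0.0 at plus or minus infinity."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inverse_mills_ordered(category, index, cutpoints):
    """Correction term (8) for one observation.

    category:  observed schooling level j, numbered 1..J
    index:     fitted first-step index Z_i' alpha_hat
    cutpoints: estimated thresholds mu_1 < ... < mu_{J-1}
    """
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    lower = bounds[category - 1] - index   # assumed meaning of mu_hat_{j-1,i}
    upper = bounds[category] - index       # assumed meaning of mu_hat_{j,i}
    return (norm_pdf(lower) - norm_pdf(upper)) / (norm_cdf(upper) - norm_cdf(lower))


# With two categories and a single cutpoint at zero the term reduces to the
# familiar binary inverse Mills ratio phi(index) / Phi(index) for the top category.
top = inverse_mills_ordered(2, 0.0, [0.0])
bottom = inverse_mills_ordered(1, 0.0, [0.0])
print(top, bottom)
```

The resulting value would then enter (5) as an additional regressor, one value per observation.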

3.4 Model IV – Cosslett

As briefly mentioned in Section 1, the Heckman model unfoundedly assumes normally distributed errors. To get more general results that rely upon fewer assumptions, it might be wise to consider a semiparametric approach. One instance of such an approach is the Cosslett model (Cosslett, 1983), which is the semiparametric analogue of Heckman’s model according to Powell (1994).

Cosslett’s model adopts a minor change in (5) by using dummies instead of the inverse Mills ratio. As is well described in Hussinger (2008), this leads to the following equation.

$$ y_i = X_i'\delta + \beta S_i + E_i'\gamma + \sum_{m=1}^{M} \theta_m D_{im}(Z_i'\hat{\alpha}) + \varepsilon_i^{*} \tag{9} $$

Instead of performing an ordered probit in the first step, the semi-nonparametric method of Gallant and Nychka (1987) will be used to estimate $\hat{\alpha}$. Hereafter, $Z_i'\hat{\alpha}$ will be ordered to create $M + 1$ dummies based upon its quantiles. As for the Heckman model, this method will result in consistent OLS estimates on the extended specification and the standard errors are bootstrapped.
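The construction of the dummies in (9) can be sketched as follows. This is an illustrative reading rather than the exact procedure used here: the fitted index values are ranked and cut into equal-frequency bins, and each observation receives a one-hot row over those bins. The function name `quantile_dummies` and the example values are hypothetical. Since (9) sums over $M$ terms while $M + 1$ dummies are created, one plausible interpretation is that one dummy serves as reference category and is left out of the regression.

```python
def quantile_dummies(index, n_bins):
    """Return one-hot rows assigning each index value to an equal-frequency bin.

    index:  list of fitted first-step index values Z_i' alpha_hat
    n_bins: number of dummies to create (M + 1 in the notation of the text)
    """
    n = len(index)
    # Rank the observations by their index value.
    order = sorted(range(n), key=lambda i: index[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        # Observations with rank 0..n-1 are spread evenly over bins 0..n_bins-1.
        bins[i] = rank * n_bins // n
    return [[1 if b == m else 0 for m in range(n_bins)] for b in bins]


# Eight invented index values split into M + 1 = 4 quantile groups.
example_index = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7, 0.1, -2.0]
dummies = quantile_dummies(example_index, 4)
for row in dummies:
    print(row)
```

Each row contains exactly one 1, so the full set of dummies sums to a constant, which is why one of them has to be omitted when the regression already contains an intercept.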

4 Data

This section gives information on the data used in this research. Some descriptive features of this data such as the number of observations and the distribution of variables used in the model are presented. Next to that, it also states some hypotheses about the effects of these variables in the wage equation. First, the general results for the full sample of 6176 observations will be presented, after which the focus will shift to a subsample of only 2400 observations. Here, the distribution of the variables will be partitioned based upon the ethnic background and sex of the respondent. The reason behind the consideration of a smaller subsample will follow later.

Section 1 already mentioned that the data originates from the Longitudinal Internet Studies for the Social sciences (LISS) panel, which is conducted by CentERdata (Tilburg University, The Netherlands). This panel started in 2007 and is updated every month. It consists of around 4500 households and 7000 individuals of the Dutch population. From these households and individuals, information about their demographics, socio-economic characteristics and ethnic background is obtained. CentERdata claims the LISS panel is based on a true probability sample of households drawn from the population register by Statistics Netherlands, causing it to be a good representation of the population. For this research we have chosen to use the data of December 2016 and consider the adult subsample of the population of 18 years or over with a positive income.

4.1 General distribution

Table 1 displays the description of the studied data. We start off by discussing the dependent variable, which is the monthly gross income of the respondents. It seems that more than half of the sample earns between €1000 and €3000. Furthermore, the distribution is skewed to the right, which is in agreement with both economic theory and the argument about an older population that follows next.

As can be seen in Table 1, more than half of this sample is approaching or already over the pension age (which is around 66 years). This indicates a large part of the respondents has a steady income which is quite high (as opposed to newcomers in the labour market). A majority of elderly is also in accordance with the ageing of the Dutch population, as discussed by de Kruijf and Langenberg (2017).

One important regressor in the earnings equation is missing from the data set: work experience. Economically, more work experience leads to a higher productivity and thus more income. Luckily, work experience has a more or less linear relationship with age in the following sense: Age = 5 + Years of Schooling + Work Experience. Hence, age can be used as regressor instead [...] set, causing the linear relationship to not be entirely accurate. Still this relationship is applied and a quadratic term is added to allow for non-linearity.

Of the respondents, a slight majority is male, while according to the CBS a slight majority of women can be found in the Dutch population. One obvious reason is that men form a higher percentage of the working force. Nevertheless, these numbers do not differ that much with 50.4% men in the sample and 49.6% in the Dutch population. Most researchers find a negative effect of being female on income, where a large share of researchers (Chiswick, 1980; Harmon and Walker, 1995; Kee, 1995; Devereux and Hart, 2010) believe male and female income equations to be distributed differently. Hence, when focusing on finding the effect of other variables as ethnicity, they either split up the investigation for men and women or concentrate solely on the male income equation. Since the different immigrant groups already contain relatively few observations and splitting the analysis up even more would only reduce this further, this research will focus on estimating the effects together in one equation.

In our sample 56.4% is married, which is close to the percentage of the true population. Economically, marriage is believed to have a positive effect on income for males and negative for females, while discussion about the underlying reasons exists (Ginther and Zavodny, 2001).

Table 1: Description of studied data

Variable                          N       %
Observations in sample         6176     100
Monthly wage
  Less than €1000              1039    16.8
  €1000 - €2000                1672    27.1
  €2000 - €3000                1668    27.0
  €3000 - €4000                 959    15.5
  €4000 - €5000                 405     6.6
  €5000 or more                 433     7.0
Age
  18 - 29                       882    13.3
  30 - 49                      1919    31.1
  50 - 69                      2356    38.2
  70 and over                  1079    17.5
Gender
  Male                         3112    50.4
  Female                       3064    49.6
Marital status
  Married                      3486    56.4
  Other                        2690    43.6
Background
  Dutch                        4625    85.4
  First gen/Western             188     3.5
  Second gen/Western            287     5.3
  First gen/non-Western         196     3.6
  Second gen/non-Western        119     2.2
Urban
  1 - Extremely                 940    15.2
  2 - Very                     1628    26.4
  3 - Moderately               1405    22.8
  4 - Slightly                 1320    21.4
  5 - Not                       883    14.3
Education
  1 - Primary school            213     3.5
  2 - VMBO                     1361    22.0
  3 - HAVO/VWO                  624    10.1
  4 - MBO                      1615    26.2
  5 - HBO                      1614    26.1
  6 - University                749    12.1
Legislation
  0 - Original                 1955    31.7
  1 - 1969 Change               572     9.3
  2 - 1975 Change              2268    36.7
  3 - 1985 Change               731    11.8
  4 - 2007 Change               650    10.5

Next, we have come to the discussion of our variable of interest. With more than 85% of the sample being of Dutch nationality, our reference group will be the vast majority. This might lead to some problems, since the main purpose of this research is finding the effects of ethnic background on income. Especially for the more flexible estimation techniques such as Heckman and Cosslett, the small sample sizes of certain immigrant groups might lead to insignificant results due to lower efficiency.

As mentioned in Section 1, we consider four different immigrant groups. Firstly, a distinction is made between first generation, being individuals born in a country other than the Netherlands, and second generation, being individuals born in the Netherlands whose parents (or at least one of them) were born in a different country. Secondly, these generations are partitioned upon originating from a Western or non-Western country (see Section 1 for a definition of a Western country). As can be seen in Table 1, second generation/Western immigrants are the largest share of immigrants in our data with 5.3%. Hereafter follow, in descending order of share, first generation/non-Western immigrants (3.6%), first generation/Western immigrants (3.5%) and finally second generation/non-Western immigrants (2.2%).

The next variable that can be found in Table 1 displays whether the respondent lives in an urban neighbourhood. It can be seen that most people live somewhere between a very urban and slightly urban area. The first class, extremely urban, will serve as reference group. Together with the belief that urban areas tend to be more expensive and require more income, we expect all other classes to have a negative effect on wages.

The education variable might need some explanation for foreign readers, as it is specified by the Dutch educational system. First of all, we assume primary school and university to be commonly used and to speak for themselves. The second and third level of the variable, VMBO and HAVO/VWO however are not internationally used, but are similar to junior and senior high school in the US respectively. Furthermore, HBO and MBO have a close relation with college and junior college in the US respectively.


We find that less than 4% of the respondents quit after primary school, and about 32% dropped out after obtaining a high school degree. Since children are legally bound to continue after primary school nowadays, we expect the group who only obtained a primary school degree to only consist of elderly. Next to these two groups, we find that a majority of respondents has graduated from (junior) college, whereas only 12% obtained a university degree. This indicates that while most people in this data set are higher educated, university is still quite exclusive. Note again that in our data set the variable is nominal, stating the highest completed level of education, rather than continuous or discrete stating the total years of schooling. Since primary school is taken as the lowest level, we expect all other (higher) classes to have a positive effect on income.

Lastly, we have come to the legislation instruments, which were already discussed in Section 2.3. These dummies are based upon four changes in legislation which will be addressed first. Hereafter a brief explanation of how the variable was obtained exactly will follow.

In 1969 the compulsory education duration was changed from 6 to 9 years, with a final age of 15 years. Hereafter, in 1975 the duration was extended to 10 years, leading to a final minimum age of 16 years. Instead of raising the final age, the age of entering the educational system was lowered from 6 to 5 years in 1985. Finally, in 2007 if you were under the age of 18 and did not have an MBO level 2 or HAVO/VWO diploma yet, you were still obliged to go to school until being 18.

The legislation changes in 1969, 1975 and 2007 all concerned an increase of the minimum school leaving age. Respectively, they changed the age from 12 to 15, to 16 and to 18 years, which had an effect on people born after 1954 (1969 - 15 = 1954), 1959 (1975 - 16 = 1959) and 1989 (2007 - 18 = 1989). Only the change in 1985 regarded a decrease of the entrance age from 6 to 5, affecting people born after 1980 (1985 - 5 = 1980).
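The birth-year thresholds above suggest a simple way to assign each respondent to one of the five legislation regimes of Table 1. The following sketch is an assumption about how such a variable could be coded, not a description of the coding actually used: "born after" is read as a strictly later birth year, and the function name `legislation_category` is hypothetical.

```python
def legislation_category(birth_year):
    """Map a birth year to a legislation regime, numbered as in Table 1.

    0 = original legislation, 1 = 1969 change, 2 = 1975 change,
    3 = 1985 change, 4 = 2007 change.
    """
    # Thresholds taken from the text: people born after these years are affected.
    thresholds = [1954, 1959, 1980, 1989]
    # Count how many thresholds the birth year strictly exceeds.
    return sum(birth_year > t for t in thresholds)


for year in (1950, 1955, 1960, 1981, 1990):
    print(year, legislation_category(year))
```

Instrument dummies for the first-stage equation could then be formed from this category, with the original legislation as reference group.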

The question however remains how well these instruments will behave in practice. Since first generation immigrants might have had education elsewhere, these instruments might not be relevant for them. Hence it is possible we will have to find other instruments later on.


Table 2a: Mean of variables grouped by ethnic background

| Variable | Natives | First gen/Western | First gen/non-Western | Second gen/Western | Second gen/non-Western |
|---|---|---|---|---|---|
| Number of observations | 2052 | 76 | 92 | 118 | 62 |
| Monthly gross income | 2794.88 | 3106.02 | 2567.50 | 2855.11 | 2770.78 |
| Weekly hours worked | 31.02 | 32.41 | 33.67 | 33.86 | 33.87 |
| Hourly wage | 23.39 | 24.38 | 20.47 | 20.84 | 21.99 |
| Age | 46.08 | 46.64 | 44.59 | 45.64 | 34.79 |
| Female | 0.53 | 0.54 | 0.41 | 0.51 | 0.58 |
| Married | 0.56 | 0.55 | 0.58 | 0.42 | 0.29 |
| Urban level | 3.01 | 2.41 | 2.15 | 2.41 | 2.00 |
| Schooling level | 4.15 | 4.68 | 4.03 | 4.20 | 4.24 |

4.2 Distribution per ethnic background

So far, we have looked at the aggregated numbers and means over all ethnic groups. However, since we are interested in the differences in characteristics and income between these groups, we will now consider them separately and repeat the discussion of the variables. We will also include the weekly hours worked. Since there are a lot of missing values for this variable, the number of observations drops drastically to only 2400. The results can be found in Table 2a.

Firstly, note that first generation/Western immigrants not only have the highest average monthly income among the four immigrant groups, but that it is also higher than that of natives. Followed by their second generation, the Western immigrants seem to do quite well. This is not entirely surprising, since we expect Western immigrants to mostly move to the Netherlands when they are offered well paid jobs.

We however see that these income differences are partly explained by the number of hours they work. While natives only work around 31 hours a week on average, the first and second generation/Western immigrants respectively work more than 32 and almost 34 hours a week. The effect of these differences in hours worked is better displayed by the hourly wage, where we see that while first generation/Western immigrants still have a higher hourly wage than natives, the same cannot be said for their second generation. For the two non-Western immigrant groups, we find that they both have a lower monthly income, work more hours, and thus also have a lower hourly wage than natives. As opposed to the Western immigrants, we expect non-Western immigrants to come here for opportunities they did not have in their home country and to be satisfied with less, as it is often already more than they are offered back home, which may explain this result.


Table 2b: Mean of variables grouped by ethnic background and sex

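| Variable | Natives, Male | Natives, Female | First gen/Western, Male | First gen/Western, Female | First gen/non-Western, Male | First gen/non-Western, Female | Second gen/Western, Male | Second gen/Western, Female | Second gen/non-Western, Male | Second gen/non-Western, Female |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of observations | 968 | 1084 | 35 | 41 | 54 | 38 | 58 | 60 | 26 | 36 |
| Fraction | 0.47 | 0.53 | 0.46 | 0.54 | 0.59 | 0.41 | 0.49 | 0.51 | 0.42 | 0.58 |
| Monthly gross income | 3475.13 | 2187.42 | 3596.09 | 2687.67 | 2780.72 | 2264.51 | 3428.61 | 2300.74 | 3411.35 | 2308.15 |
| Weekly hours worked | 36.22 | 26.38 | 36.54 | 28.88 | 36.31 | 29.92 | 37.33 | 30.52 | 35.15 | 32.94 |
| Hourly wage | 25.28 | 21.70 | 24.05 | 24.66 | 19.84 | 21.37 | 23.21 | 18.54 | 24.18 | 20.40 |
| Age | 47.30 | 44.99 | 47.97 | 45.51 | 45.07 | 43.89 | 45.60 | 45.67 | 38.50 | 32.11 |
| Married | 0.60 | 0.53 | 0.54 | 0.56 | 0.57 | 0.58 | 0.41 | 0.42 | 0.38 | 0.22 |
| Urban level | 2.97 | 3.05 | 2.29 | 2.51 | 2.11 | 2.21 | 2.57 | 2.25 | 1.92 | 2.06 |
| Schooling level | 4.17 | 4.13 | 4.80 | 4.59 | 4.04 | 4.03 | 4.14 | 4.27 | 3.96 | 4.44 |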

Secondly, the second generation/non-Western immigrant group is by far the youngest on average. Where all the other groups have an average age of around 45 years, this group is only 35 years old on average. Hence we would expect them to have a lower wage than the others. This same group also has a considerably smaller fraction of married people, which might also be explained by the relatively young age of this group, as people tend to marry at an older age nowadays (that is, if they get married at all).

Thirdly, looking at the urban level we find that non-Western immigrants have the lowest average level (indicating living in a quite urban area), followed by Western immigrants and finally natives. Assuming Western immigrants are more similar to natives than non-Western immigrants are, since they are both of Western origin, it seems people tend to move away from cities the more Western/Dutch they become.

Lastly, first generation/Western immigrants have a considerably higher average schooling level than natives, which probably explains why they also have a higher hourly wage.

Additionally, we will make a distinction between the distribution for male and female respondents, which is presented in Table 2b. We firstly find that females have a considerably lower monthly income. However, we see that they also work up to ten hours a week less, which explains a large part of the difference in monthly income. When taking this into account it turns out that for natives and second generation immigrants, women still have a lower hourly wage than men, though this effect is already much smaller. Remarkably, first generation women even seem to get a higher average hourly wage than men. This cannot be explained by a higher educational level, since for these two groups women are less educated (although only marginally for the non-Western group).


5 Results

This section presents and explains the estimation results following from the different models of Section 3. Next to that, the results of each of the four methods are compared and discussed. Different instruments and variables are used to get to a conclusion.

5.1 Monthly income

5.1.1 Legislation instruments

Next to OLS, the research started with the use of legislation dummies as instruments. The Durbin-Wu-Hausman test indicates we can reject the null of schooling being exogenous (p-value 0.0000), justifying to some extent the use of IV, Heckman and Cosslett to correct for this endogeneity.

The results of the (final stage) regressions are presented in Table 3a. An important note to make is that, in addition to the variables in this table, the quantile dummies were also used in the Cosslett method (Column 4). However, these estimates are not listed in the table, but instead plotted in Figure 1. From this figure it seems the coefficients of the quantiles increase almost linearly. Furthermore, to avoid too much repetition, the Heckman model and Cosslett model will sometimes be referred to simply as ‘Heckman’ and ‘Cosslett’ respectively.

Figure 1: Coefficients of Cosslett’s method dummies

To begin with, all four models agree that women earn around 25% less per month. Although this is in accordance with findings from other researchers (Bobbitt-Zeher, 2007; Merluzzi and Dobrev, 2015), another cause might be present. The dependent variable is the log gross monthly income, which does not take into account the number of hours someone has worked. Since women usually play a bigger role in the raising of children, they might be more inclined to work part-time (or not at all) and hence fewer hours. Possible solutions to correct for this are to either include the log hours


Table 3a: Final stage regression results using legislation instruments

Dependent variable: Log income

| Variable | (1) OLS | (2) IV | (3) Heckman | (4) Cosslett |
|---|---|---|---|---|
| Constant | 4.9478 *** | 2.0176 *** | 4.7813 *** | 4.1751 *** |
| Female | -0.2562 *** | -0.2436 *** | -0.2540 *** | -0.2593 *** |
| Married | 0.1117 *** | -0.0341 | 0.1062 *** | 0.0162 |
| Female∗Married | -0.4451 *** | -0.2412 *** | -0.1195 *** | -0.3040 *** |
| Age | 0.0919 *** | 0.0611 *** | 0.0884 *** | 0.0791 *** |
| Age² | -0.0007 *** | -0.0004 *** | -0.0008 *** | -0.0006 *** |
| Schooling | 0.1913 *** | 0.9106 *** | 0.1859 *** | 0.1811 *** |
| First gen/Western | -0.0298 | -0.3858 *** | -0.1077 | -0.3100 *** |
| First gen/non-Western | -0.0884 | 0.1222 | -0.0613 | 0.0596 |
| Second gen/Western | -0.0333 | -0.0378 | -0.0420 | -0.0407 |
| Second gen/non-Western | 0.0137 | 0.1476 | 0.0383 | 0.0902 |
| Urban - 2 | -0.0169 | 0.1668 *** | 0.0512 | 0.0798 |
| Urban - 3 | -0.0363 | 0.2650 *** | 0.0410 | 0.1497 ** |
| Urban - 4 | -0.0277 | 0.3109 *** | 0.0448 | 0.1917 *** |
| Urban - 5 | -0.0798 | 0.2820 *** | 0.0009 | 0.1190 * |
| Selectivity correction | | | -0.0277 | |

Note: ***, ** and * are respectively 1%, 5% and 10% significant. Cluster robust standard errors are used for OLS and IV, while the standard errors of Heckman and Cosslett are bootstrapped with B = 1000.

worked as regressor in the equation, or use the hourly wage instead of the monthly wage. Since the log hours worked is most likely endogenous (for instance when women start working part-time when they get married or have children), we decide to avoid this specification. The other solution, where we consider the hourly wage instead, will be investigated in Section 5.2. We however lose a lot of observations due to missing values when considering the hours worked or hourly wage, thus the first part of this paper estimates a reduced form omitting the hours worked as explanatory variable.

Continuing with the reduced form, the different models do not give coherent results for the marriage variable. On the one hand OLS and Heckman agree marriage has a positive effect of around 11%. On the other hand, IV and Cosslett do not find a significant effect. Since the reference group is unmarried men and an interaction term between female and married is included, this effect is for men only. According to Ginther and Zavodny (2001) married men earn 10 to 20% more than other men, which is in agreement with the OLS and Heckman findings. For women, the effect can be found by adding both terms concerning marriage together, which results in a negative effect of roughly 30% for OLS, IV and Cosslett. Heckman however finds a negative effect of only 1%.
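The marriage effect for women described above is simply the sum of the Married coefficient and the Female∗Married interaction. As a small illustration, the following Python sketch adds the two Table 3a coefficients per model. The dictionary layout and names are chosen here for illustration only, and the sums are in log points, read as approximate percentages as in the text.

```python
# (Married, Female*Married) coefficients per model, copied from Table 3a.
TABLE_3A = {
    "OLS": (0.1117, -0.4451),
    "IV": (-0.0341, -0.2412),
    "Heckman": (0.1062, -0.1195),
    "Cosslett": (0.0162, -0.3040),
}

# Effect of marriage for women = main effect + interaction (in log points).
marriage_effect_women = {
    model: round(married + interaction, 4)
    for model, (married, interaction) in TABLE_3A.items()
}

for model, effect in marriage_effect_women.items():
    print(f"{model:9s} {effect:+.4f}")
```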

As a proxy for work experience, the estimates for age show a positive effect of 6 to 9%, as well as a decreasing quadratic effect. This is fully in agreement with the argument of Section 4 that more work experience leads to higher productivity and thus more income.
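To make the "decreasing quadratic effect" concrete, the sketch below evaluates the marginal effect of one extra year of age and the age at which the implied profile peaks, using the Age and Age² coefficients of Table 3a. The helper names are hypothetical. Because the printed coefficients are rounded to four decimals, the resulting peak ages are rough indications at best.

```python
# (Age, Age squared) coefficients per model, copied from Table 3a.
AGE_COEFS = {
    "OLS": (0.0919, -0.0007),
    "IV": (0.0611, -0.0004),
    "Heckman": (0.0884, -0.0008),
    "Cosslett": (0.0791, -0.0006),
}


def marginal_effect(b_age, b_age2, age):
    """Derivative of b_age*age + b_age2*age**2 with respect to age."""
    return b_age + 2 * b_age2 * age


def turning_point(b_age, b_age2):
    """Age at which the quadratic profile peaks (marginal effect is zero)."""
    return -b_age / (2 * b_age2)


for model, (b1, b2) in AGE_COEFS.items():
    print(model, round(marginal_effect(b1, b2, 46), 4), round(turning_point(b1, b2), 1))
```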

Coming to our endogenous variable, schooling, something peculiar seems to occur. While OLS, Heckman and Cosslett all find an effect of around 19% per increased level, IV finds an effect of 91%, which is more than four times bigger. This IV estimate seems very unrealistic, but note again that we consider a schooling level instead of the more standard years of schooling. Thus it might not be completely bizarre to experience a high increase in income when one completes another level. Nevertheless, a possible explanation could also be that the used instruments based on the legislation changes are weak and/or endogenous, leading to a bias in the estimates. The investigation of these instruments will follow. Yet, all methods find a positive and significant effect of schooling on income, which is consistent with the theory of return to schooling.
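The percentages quoted above read the log-income coefficients directly as percentage changes, which is a good approximation only for small coefficients. As a hedged illustration, the sketch below converts the Table 3a schooling coefficients into exact percentage changes via exp(b) - 1. The function name `pct_effect` is introduced here for illustration.

```python
import math

# Schooling coefficients per model, copied from Table 3a.
SCHOOLING = {"OLS": 0.1913, "IV": 0.9106, "Heckman": 0.1859, "Cosslett": 0.1811}


def pct_effect(coef):
    """Exact percentage change in income implied by a log-point coefficient."""
    return 100 * math.expm1(coef)


for model, coef in SCHOOLING.items():
    print(f"{model:9s} approx {100 * coef:5.1f}%  exact {pct_effect(coef):6.1f}%")
```

For the three coefficients near 0.19 the two readings differ by about two percentage points, while for the IV coefficient the gap is far larger.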

Not much significance is found for any of the ethnic background variables. Only for first generation/Western immigrants, significant estimates are found in the IV and Cosslett method. These models find a negative effect of 39% and 31% respectively. As was addressed in Section 4, while there might be little effect of background on income, the insignificant results might also be caused by the small number of observations in the immigrant groups.

From the urban dummies we would expect to see increasing negative signs such as for the OLS estimates. These estimates are insignificant, while positive significant estimates can be found for the IV and Cosslett models. This is counterintuitive since it contradicts the argument made in Section 4 of more expensive urban areas. Of course, one could think of other reasons that do support the data, such as wealthier people moving out of the city to buy bigger properties, in a better neighbourhood. Lastly, the selectivity correction term from the Heckman equation is insignificant, which might indicate there is no selection bias.

Table 3b shows the first stage regression results for IV, Heckman and Cosslett. We firstly note that while a constant is present for IV, it seems absent for Heckman and Cosslett. In fact, a constant is also included for these models, but it is absorbed by the thresholds and hence missing from the table. All three methods produce more or less similar results (same sign and magnitude), at least for the amount of interest we have in these auxiliary regression results. Since the results for ordered probit (Column 2) and the method of Gallant and Nychka (1987) (Column 3) are harder to


Table 3b: First stage regression results using legislation instruments

Dependent variable: Level of schooling

| Variable | (1) IV | (2) Heckman | (3) Cosslett |
|---|---|---|---|
| Constant | 4.0529 *** | | |
| Female | -0.0134 | -0.0054 | 0.0125 |
| Married | 0.1964 *** | 0.1542 *** | 0.1414 *** |
| Female∗Married | -0.2885 *** | -0.2146 *** | -0.2261 *** |
| Age | 0.0172 | 0.0149 | 0.0164 |
| Age² | -0.0004 *** | -0.0003 *** | -0.0004 ** |
| First gen/Western | 0.4827 *** | 0.4090 *** | 0.4449 *** |
| First gen/non-Western | -0.3009 *** | -0.2297 ** | -0.2440 ** |
| Second gen/Western | -0.0101 | -0.0019 | -0.0049 |
| Second gen/non-Western | -0.2179 | -0.1738 | -0.1431 |
| Urban - 2 | -0.2552 *** | -0.2367 *** | -0.1467 *** |
| Urban - 3 | -0.4022 *** | -0.3638 *** | -0.2679 *** |
| Urban - 4 | -0.4642 *** | -0.4105 *** | -0.3238 *** |
| Urban - 5 | -0.4856 *** | -0.4373 *** | -0.2847 *** |
| Legis - 0 | 0.3413 * | 0.2412 | 0.2069 |
| Legis - 1 | 0.2905 | 0.1978 | 0.1601 |
| Legis - 2 | 0.3231 ** | 0.2008 * | 0.2507 |
| Legis - 3 | 0.6644 *** | 0.4805 *** | 0.5425 *** |

Note: ***, ** and * are respectively 1%, 5% and 10% significant. Cluster robust standard errors are used for IV, while the standard errors of Heckman and Cosslett are bootstrapped with B = 1000.

interpret than OLS (Column 1), we will focus the discussion on the latter.

Column 1 indicates that our reference group, unmarried native men living in an extremely urbanised area who were born after 1989, have an MBO degree on average. Next to that, there is no significant difference between the level of schooling of men and women. Married men seem to be higher educated on average, while the opposite is true for married women. Age does not play a big role, as is expected since we consider the working subsample, who have all completed their education and thus most likely will not gather (a lot) more degrees over the years.

We find that first generation/Western immigrants are more highly educated than natives, which supports the argument made in Section 4.2 that this group mainly migrates for well paid jobs and thus probably also has a high level of education. Furthermore, first generation/non-Western immigrants seem to be lower educated, while neither second generation group differs significantly from natives.

There is a high correlation between level of schooling and the urban dummies, indicating that the less urbanised people live, the lower their level of schooling is. Lastly, the estimates for


Table 4: OLS estimates including legislation dummies

Dependent variable: Log income

| Variable | Estimate |
|---|---|
| Constant | 5.0441 *** |
| Female | -0.2533 *** |
| Married | 0.1078 *** |
| Female∗Married | -0.4555 *** |
| Age | 0.0636 *** |
| Age² | -0.0005 *** |
| Schooling | 0.1821 *** |
| First gen/Western | -0.0246 |
| First gen/non-Western | -0.1161 ** |
| Second gen/Western | -0.0484 |
| Second gen/non-Western | -0.0067 |
| Urban - 2 | -0.0185 |
| Urban - 3 | -0.0342 |
| Urban - 4 | -0.0281 |
| Urban - 5 | -0.0743 ** |
| Legis - 1 | 0.1315 *** |
| Legis - 2 | 0.3234 *** |
| Legis - 3 | 0.5458 *** |
| Legis - 4 | 0.0442 |

Note: ***, ** and * are respectively 1%, 5% and 10% significant. Cluster robust standard errors are used.

the legislation instruments are behaving oddly. Since we have taken the fourth legislation as the reference group, we would expect all the coefficients to be negative (the milder a legislation is, the lower the level of schooling), but the opposite is true. While they are jointly significant (p-value 0.0000), there is much less significance in Columns 2 and 3.

Continuing with the investigation of the instruments, we firstly have a look at the Montiel Olea-Pflueger robust weak instrument test, which with an effective F statistic of 17.83 suggests the instruments are not very weak, giving a worst case bias of at most 10% (with 95% confidence). Secondly, turning to exogeneity: while theoretically it seems odd that these changes would be endogenous, the data does suggest it. Based on the Sargan-Hansen test of the overidentifying restrictions, we find a p-value of 0.0002, indicating we can reject the null of valid/exogenous instruments. Furthermore, an additional OLS regression (Table 4) including the legislation dummies also shows that the instruments are correlated with log income, which is supported by a p-value of 0.0000 in a joint significance test. Note that while the reference legislation dummy has changed, this has no real implications for the results. To get more realistic results, other instruments will be used next.
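Why an instrument that is itself correlated with income can distort the IV estimate may be easier to see in a small simulation. The sketch below is not the estimation used in this thesis: it assumes a single binary instrument, a made-up data generating process with an unobserved "ability" term, and a true schooling effect of 0.2. With one instrument, the IV estimator reduces to cov(z, y) / cov(z, x). All names and parameter values are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equally long lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def simulate(direct_effect, n=50_000, beta=0.2, seed=0):
    """Return (OLS, IV) estimates of beta in a toy model.

    direct_effect > 0 means the instrument enters the income equation
    directly, i.e. it is not exogenous.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = 1.0 if rng.random() < 0.5 else 0.0
        ability = rng.gauss(0, 1)  # unobserved, drives both x and y
        xi = 0.5 * zi + ability + rng.gauss(0, 1)
        yi = beta * xi + ability + direct_effect * zi + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    ols = cov(x, y) / cov(x, x)
    iv = cov(z, y) / cov(z, x)
    return ols, iv


print("valid instrument:  ", simulate(direct_effect=0.0))
print("invalid instrument:", simulate(direct_effect=0.3))
```

In this toy setup OLS is biased upwards by the omitted ability term, a valid instrument brings the estimate back close to 0.2, and an instrument with a direct effect on income pushes the IV estimate far above the true value.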

5.1.2 Regional instruments

Since using the legislation dummies as instruments caused problems and resulted in odd estimates, alternative instruments need to be found. Few additional variables are available in the data set, and the urban dummies from Table 3a seemed irrelevant, which is supported by a p-value of 0.1115 in a joint significance test. Together this leads us to consider using these dummies as instruments for the schooling equation instead. While we do not act upon this, we must note that the estimates for these dummies in Table 3a were not insignificant for IV.

Using these instruments, we find a Durbin-Wu-Hausman p-value of 0.0389, indicating there is enough evidence to reject the null of schooling being exogenous. Furthermore, the first stage test results indicate that these instruments are valid (Sargan-Hansen p-value 0.3707), relevant (underidentification test p-value 0.0000) and strong (weak identification test, maximal relative bias around 5%). A theoretical justification for the use of these dummies as instruments is that proximity to a school encourages attendance. Since more schools, and definitely higher level schools, are present in an urban area, this might affect the highest degree a respondent has obtained. To illustrate, when a respondent lives within walking distance of a university, he/she is more inclined to go there than someone who lives 50 kilometers away. One downside of using these variables as instruments, however, is that people tend to move and might live elsewhere than they did when they attended school. Still, as discussed, the instruments seem to work quite well according to statistical tests.

The final stage regression results can be found in Table 5a. From scanning the results it is already rather obvious the estimates are much more coherent throughout the different models. Firstly, all models agree on a negative effect of 25% for women, which coincides with the previous results from Table 3a.

The male marriage premium is estimated less coherently. While OLS and Heckman both find a positive effect of 10%, the estimates of IV and Cosslett are somewhat lower at 9% and 7%


Table 5a: Final stage regression results using urban instruments

Dependent variable: Log income

| Variable | (1) OLS | (2) IV | (3) Heckman | (4) Cosslett |
|---|---|---|---|---|
| Constant | 4.9186 *** | 4.5148 *** | 4.888 *** | 4.4219 *** |
| Female | -0.2554 *** | -0.2547 *** | -0.2553 *** | -0.2500 *** |
| Married | 0.1054 *** | 0.0892 *** | 0.1035 *** | 0.0695 ** |
| Female∗Married | -0.4388 *** | -0.4082 *** | -0.4328 *** | -0.3722 *** |
| Age | 0.0823 *** | 0.0793 *** | 0.0864 *** | 0.0939 *** |
| Age² | -0.0007 *** | -0.0007 *** | -0.0008 *** | -0.0008 *** |
| Schooling | 0.1929 *** | 0.2993 *** | 0.1901 *** | 0.1872 *** |
| First gen/Western | -0.0252 | -0.0836 | 0.4156 *** | -0.1815 ** |
| First gen/non-Western | -0.0732 | -0.0559 | -0.2242 ** | -0.0402 |
| Second gen/Western | -0.0282 | -0.0331 | 0.0105 | -0.0481 |
| Second gen/non-Western | 0.0274 | 0.0337 | -0.1474 | 0.0429 |
| Selectivity correction | | | -0.0184 *** | |

Note: ***, ** and * are respectively 1%, 5% and 10% significant. Cluster robust standard errors are used for OLS and IV, while the standard errors of Heckman and Cosslett are bootstrapped with B = 1000.

respectively. Nevertheless, all methods agree on a significant positive premium, as opposed to Table 3a where significance and sign varied with model. Furthermore, all methods also find a marriage penalty for women between 30 and 33%, which is similar to Table 3a, although here Heckman is also in line with the other methods.

The estimates of age and age squared are again similar to the previous results, yet slightly more coherent over the different models, finding an effect of around 8% and -0.08% respectively.

For the effect of schooling most methods find a positive coefficient of around 0.19. IV, however, still estimates this effect to be considerably larger, with a coefficient of 0.30. Nonetheless, this difference is much smaller than in Table 3a.

It again turns out that not many significant estimates for the immigrant dummies are obtained. While OLS and IV result in insignificant estimates for first generation/Western immigrants, Heckman and Cosslett do find significance. However, the latter two contradict each other, with Heckman finding a positive significant effect of 42% and Cosslett a negative effect of 18%. Hence, taken together, not much can be said about this effect. For first generation/non-Western immigrants, only the Heckman method finds a significant effect (-22%). Nevertheless, all other methods agree on the negative sign of this effect, making it fair to assume a small disadvantage for this group can be found, though its magnitude seems unclear. For the two second generation immigrant groups no significance can be found in any of the models. Moreover, the sign of the effect even differs over


Table 5b: First stage regression results using urban instruments

Dependent variable: Level of schooling

| Variable | (1) IV | (2) Heckman | (3) Cosslett |
|---|---|---|---|
| Constant | 4.0736 *** | | |
| Female | -0.0176 | -0.0087 | -0.0192 |
| Married | 0.2027 *** | 0.1572 *** | 0.1447 *** |
| Female∗Married | -0.2743 *** | -0.2032 *** | -0.2091 *** |
| Age | 0.0299 *** | 0.0210 *** | 0.0281 ** |
| Age² | -0.0005 *** | -0.0004 *** | -0.0005 *** |
| First gen/Western | 0.4950 *** | 0.4156 *** | 0.5091 *** |
| First gen/non-Western | -0.2928 ** | -0.2242 ** | -0.1964 * |
| Second gen/Western | 0.0062 | 0.0105 | 0.0178 |
| Second gen/non-Western | -0.1861 | -0.1474 | -0.1114 |
| Urban - 2 | -0.2554 *** | -0.2351 *** | -0.1724 *** |
| Urban - 3 | -0.4189 *** | -0.3738 *** | -0.3126 *** |
| Urban - 4 | -0.4708 *** | -0.4127 *** | -0.3640 *** |
| Urban - 5 | -0.5030 *** | -0.4469 *** | -0.3408 *** |

Note: ***, ** and * are respectively 1%, 5% and 10% significant. Cluster robust standard errors are used for IV, while the standard errors of Heckman and Cosslett are bootstrapped with B = 1000.

the four methods.

Lastly, the estimate of the selectivity correction term is significant, suggesting a selection bias indeed existed, resulting in biased OLS estimates and justifying the Heckman method. If we believe the term corrects for unobserved ability, the negative sign of the estimate is not as expected, since we consider ability to be positively correlated with both schooling level and log income. However, we can also think of other unobserved effects, such as the different attitude of certain groups towards schooling (see Section 1), or technical insight, causing some individuals to quit school early and start working, whilst still being productive and hence obtaining high wages. This would then imply a negative coefficient for the selectivity correction term.

Table 5b shows the first stage regression results using the urban dummies as instruments. Comparing this table to Table 3b, which displayed the first stage results using the legislation instruments, the only noteworthy difference is that age does get significant estimates here. Moreover these estimates are all positive (although small, only 2 to 3%), indicating some positive relation between age and level of schooling exists. This would mean that the older people get, the more educated they get. Since employees sometimes obtain additional certificates/degrees to get a promotion, this seems fair. The small magnitude of the effect can also be explained, since only a small number of people actually reach a higher level of schooling by retraining.

5.1.3 Aggregated effects

The previous models estimated the effect of the four different immigrant groups on income. The results were mostly insignificant, possibly due to the small sample sizes of these groups as discussed in Section 4. Of course it could also be possible no difference actually exists. Hence the research will now focus on estimating the aggregated effects of ethnicity on income, for which three different specifications are estimated.

Table 6a presents the results for these three specifications. Firstly, the total aggregated effect of being an immigrant, irrespective of country of origin or generation, is estimated (Column 1). Secondly, a distinction is made between a Western or non-Western origin (Column 2). Lastly, the immigrants are partitioned based upon generation, resulting in separate estimated effects for first and second generation immigrants (Column 3). Since the results for OLS and IV (see Appendix) are quite similar to Heckman and Cosslett, although IV still estimates the effect of schooling around 29% in all specifications, only the latter two are displayed in Table 6a.
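The three specifications amount to regrouping the four immigrant categories into coarser dummies. A minimal sketch of that regrouping follows. It assumes the group labels are spelled as in the tables and uses "Native" as a hypothetical label for the reference group; the function name is likewise illustrative and not taken from the data set.

```python
def aggregated_dummies(group):
    """Map an ethnic background label to the dummies of the three specifications.

    Assumes labels of the form "<generation> gen/<origin>", with "Native"
    (a label chosen here for illustration) as the reference group.
    """
    if group == "Native":
        generation, origin = None, None
    else:
        generation, origin = group.split("/")
    return {
        # Specification 1: any immigrant
        "Immigrant": int(group != "Native"),
        # Specification 2: Western versus non-Western origin
        "Western": int(origin == "Western"),
        "Non-Western": int(origin == "non-Western"),
        # Specification 3: first versus second generation
        "First generation": int(generation == "First gen"),
        "Second generation": int(generation == "Second gen"),
    }


print(aggregated_dummies("First gen/non-Western"))
```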

First of all note that all the variables, apart from the ethnic dummies, give more or less the same results as in Table 5a. Thus, the discussion of these results will be skipped here.

Starting the discussion, Column 1 shows the effect of being an immigrant on income, compared to natives. As can be seen, Heckman estimates a negative effect of around 3%, while Cosslett finds a positive effect of less than 1%. The contradiction of the sign of the effect might seem troublesome, but since both estimates are insignificant (even at the 10% level), they both indicate there is not enough evidence to assume there is a difference in income between immigrants and natives, all else the same. This result is not so absurd, since Table 5a already showed some groups had an advantage while others had a disadvantage. When taken together these effects might cancel each other out.


15, 2017 Differences in Wages Caused by Ethnicity in the Netherlands J.R. Tjon-Affo

Table 6a: Aggregated final stage regression results using urban instruments

Dependent variable: Log income

| Variable | (1) Heckman | (1) Cosslett | (2) Heckman | (2) Cosslett | (3) Heckman | (3) Cosslett |
|---|---|---|---|---|---|---|
| Constant | 4.9045 *** | 4.4876 *** | 4.8873 *** | 4.4631 *** | 4.8802 *** | 4.4197 *** |
| Female | -0.2579 *** | -0.2585 *** | -0.2549 *** | -0.2552 *** | -0.2549 *** | -0.2530 *** |
| Married | 0.1094 *** | 0.0715 ** | 0.1101 *** | 0.0672 ** | 0.1094 *** | 0.0689 ** |
| Female∗Married | -0.4424 *** | -0.3706 *** | -0.4463 *** | -0.3670 *** | -0.4460 *** | -0.3662 *** |
| Age | 0.0863 *** | 0.0908 *** | 0.0872 *** | 0.0926 *** | 0.0873 *** | 0.0927 *** |
| Age² | -0.0008 *** | -0.0007 *** | -0.0008 *** | -0.0008 *** | -0.0008 *** | -0.0008 *** |
| Schooling | 0.1912 *** | 0.1860 *** | 0.1918 *** | 0.1873 *** | 0.1923 *** | 0.1873 *** |
| Immigrant | -0.0293 | 0.0038 | | | | |
| Western | | | -0.0355 | -0.1045 ** | | |
| Non-Western | | | -0.0370 | -0.0082 | | |
| First generation | | | | | -0.0654 | -0.1294 ** |
| Second generation | | | | | -0.0185 | -0.0231 |
| Selectivity correction | 0.0072 | | 0.0107 | | 0.0098 | |

Note: ***, ** and * are respectively 1%, 5% and 10% significant. The standard errors are bootstrapped with B = 1000.

Table 6b: Aggregated first stage regression results using urban instruments

Dependent variable: Level of schooling

| Variable | (1) Heckman | (1) Cosslett | (2) Heckman | (2) Cosslett | (3) Heckman | (3) Cosslett |
|---|---|---|---|---|---|---|
| Female | -0.0086 | -0.0059 | -0.0039 | -0.0011 | -0.0039 | -0.0039 |
| Married | 0.1568 *** | 0.1348 *** | 0.1609 *** | 0.1481 *** | 0.1549 *** | 0.1322 *** |
| Female∗Married | -0.2024 *** | -0.2035 *** | -0.2070 *** | -0.2196 *** | -0.2021 *** | -0.2070 *** |
| Age | 0.0206 *** | 0.0313 *** | 0.0210 *** | 0.0313 *** | 0.0207 *** | 0.0309 *** |
| Age² | -0.0003 *** | -0.0005 *** | -0.0004 *** | -0.0005 *** | -0.0003 *** | -0.0005 *** |
| Immigrant | -0.0982 *** | -0.1131 *** | | | | |
| Western | | | 0.1687 *** | 0.2092 *** | | |
| Non-Western | | | -0.1952 *** | -0.1619 ** | | |
| First generation | | | | | 0.0912 | 0.1205 |
| Second generation | | | | | -0.0318 | -0.0184 |
| Urban - 1 | 0.4463 *** | 0.3111 *** | 0.4476 *** | 0.3276 *** | 0.4243 *** | 0.2957 *** |
| Urban - 2 | 0.2129 *** | 0.1628 *** | 0.2096 *** | 0.1645 *** | 0.2026 *** | 0.1561 *** |
| Urban - 3 | 0.0711 * | 0.0295 | 0.0718 * | 0.0335 | 0.0673 | 0.0263 |
| Urban - 4 | 0.0325 | -0.0321 | 0.0326 | -0.0196 | 0.0296 | -0.0307 |

Note: ***, ** and * are respectively 1%, 5% and 10% significant. The standard errors are bootstrapped with B = 1000.


Next, Column 2 displays the effects of being either a Western or non-Western immigrant on income. First of all, both Heckman and Cosslett find a negative effect for Western immigrants, where the estimate of Cosslett is around 3 times bigger. While Heckman finds an insignificant effect of -4%, Cosslett finds an effect of -10%. Moreover, the estimate of Cosslett is significant. Together this might imply a negative effect of at most 10% indeed exists. Secondly, again Heckman and Cosslett agree on a negative effect for non-Western immigrants, but this effect is insignificant. Thus, there is not enough evidence non-Western immigrants earn differently than natives.

As we would expect non-Western immigrants to have bigger cultural differences with natives than Western immigrants, these results seem a bit contradictory. Another possible explanation might be present however. Individuals from Suriname and the Dutch Antilles form a large group of the immigrants in the Netherlands. Suriname is a former Dutch colony and the Dutch Antilles are still part of the Kingdom of the Netherlands, causing their inhabitants to speak Dutch (amongst other languages) and be taught some Dutch culture. Hence immigrants from these groups resemble natives more in some ways, while they are counted as non-Western immigrants. Thus immigrants from Suriname or the Dutch Antilles might positively bias the results, causing the estimates for non-Western immigrants to be insignificant.

Lastly, Column 3 presents the estimates for first and second generation immigrants. Looking at the results for first generation immigrants first, we again find a negative effect in both models. While Heckman finds an insignificant effect of -7%, Cosslett’s estimate is almost double that at -13%. Furthermore, Cosslett’s estimate is also significant, implying a negative effect of at most 13% is present for first generation immigrants. For second generation immigrants both models estimate a negative effect of 2%. Nevertheless, both these estimates are insignificant, so there is a lack of evidence to conclude these effects are actually present. These results are consistent with the belief that second generation immigrants are already more adapted to the Dutch culture and probably have a better command of the language than their parents, causing them to experience a smaller disadvantage in the labour market. A small note is that using OLS and IV no significance whatsoever for the ethnic dummies can be found in any of the specifications (see Appendix).

Shifting our focus to Table 6b, we again skip the discussion of the first few variables since there is not much difference with the previously obtained results (Tables 3b and 5b). We however find that immigrants on average have a lower level of education while, when split up, Western immigrants are higher and non-Western immigrants lower educated than natives, which is in agreement with Table 2a. When we instead split the immigrants up by generation, no significant difference can be found. This is most likely caused by the effects of Western and non-Western immigrants cancelling each other out when they are grouped by generation.

Note that the sign of the urban instruments is reversed. This is due to Urban - 5 being the reference group here as opposed to Urban - 1 earlier. While this happened by accident, it has no real implications (other than the sign and estimate changes). We do find less significance here, which could indicate these dummies are less relevant instruments here. However, when conducting a Sargan-Hansen and underidentification test, testing for validity and relevance of the instruments, we find p-values of around 0.37 and 0.00 for all specifications. This indicates that the instruments are valid and relevant. Moreover, a weak identification test indicates there is a relative maximal bias of around 5%, indicating the instruments are still strong too.

5.2 Hourly wage

As discussed in Section 5.1.1, we have not taken hours worked into account yet, while this is obviously an important regressor in the monthly income equation. The reason for this was that log hours worked is most likely endogenous, so we decided to consider a reduced form, whilst still keeping as many observations as possible. Recall that previously we had 6176 observations, while we end up with 2400 observations after considering the hourly wage, a considerable loss of roughly 60%. However, we expect to have omitted variable bias in the previous analysis, giving a bigger negative effect for women (who work fewer hours) and smaller negative effects for immigrants (who work more hours). Hence we repeat some of the steps using the hourly wage here instead.
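The expected direction of this bias can be made explicit with the standard omitted variable argument. The following is only a sketch and is not taken from the thesis itself: the symbols $\gamma$, $\pi$ and $\nu_i$ are introduced here for illustration, with $y_i$ monthly income, $h_i$ hours worked and $x_i$ the remaining regressors.

```latex
\begin{align*}
\log y_i &= x_i'\beta + \gamma \log h_i + \varepsilon_i
  && \text{(income equation including log hours)} \\
\log h_i &= x_i'\pi + \nu_i
  && \text{(auxiliary regression of log hours on } x_i\text{)} \\
\log y_i &= x_i'(\beta + \gamma\pi) + (\gamma\nu_i + \varepsilon_i)
  && \text{(reduced form with hours omitted)}
\end{align*}
```

With $\gamma > 0$, a group that works fewer hours ($\pi_k < 0$) obtains a reduced-form coefficient below $\beta_k$, while a group that works more hours ($\pi_k > 0$) obtains one above $\beta_k$. This matches the expected pattern of a more negative coefficient for women and a less negative one for immigrants.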

5.2.1 Individual effects

The small p-values of both the joint significance test (0.0260) on the legislation instruments when they are included in the OLS equation and the Sargan-Hansen test (0.0677) indicate that these dummies are still invalid/endogenous. For the regional dummies, however, the joint significance test in the OLS equation does not reject the null that they are jointly insignificant (p-value 0.2415). Thus we continue this research with the regional dummies as instruments instead. The Sargan-Hansen test indicates these instruments are valid (p-value 0.8286), which is in agreement with the previously mentioned joint significance test. The underidentification test indicates that they are relevant (p-value 0.0000), and the weak identification test indicates that they are reasonably strong (maximal relative bias around 10%). Lastly, the Durbin-Wu-Hausman test for endogeneity indicates that the schooling variable is still endogenous (p-value 0.0360), which (partly) justifies the use of IV and the Heckman and Cosslett models.
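To illustrate what the overidentification test measures, the sketch below runs 2SLS on simulated data and computes the simple (homoskedastic) Sargan statistic. It is only an illustration under stated assumptions: the data are simulated, the variable names are hypothetical, and the thesis uses cluster-robust Sargan-Hansen tests on the actual data, which this sketch does not reproduce.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data (purely illustrative, not the thesis data).
z = rng.normal(size=(n, 3))                  # three excluded instruments
u = rng.normal(size=n)                       # unobserved factor driving both x and y
x = z @ np.array([0.5, 0.3, 0.2]) + u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 0.15 * x - 0.5 * u + rng.normal(size=n)           # true slope is 0.15

X = np.column_stack([np.ones(n), x])         # regressors: constant + endogenous x
Z = np.column_stack([np.ones(n), z])         # instruments: constant + excluded z


def ols(A, b):
    """Least-squares coefficients of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]


beta_ols = ols(X, y)                         # biased because x is correlated with u

# 2SLS: project X on the instruments, then regress y on the projection.
X_hat = Z @ ols(Z, X)
beta_iv = ols(X_hat, y)

# Sargan statistic: n * R^2 from regressing the 2SLS residuals on all instruments.
resid = y - X @ beta_iv
fitted = Z @ ols(Z, resid)
sargan = n * (fitted @ fitted) / (resid @ resid)

# Three instruments and one endogenous regressor give two overidentifying
# restrictions; the chi-square(2) survival function is exp(-s / 2).
p_sargan = math.exp(-sargan / 2.0)

print("OLS slope:", beta_ols[1])
print("2SLS slope:", beta_iv[1])
print("Sargan statistic:", sargan, "p-value:", p_sargan)
```

A large p-value means the overidentifying restrictions are not rejected, which is how the p-value of 0.8286 above is read. A small p-value, as for the legislation instruments, casts doubt on instrument validity.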

To begin with, we analyse the individual effects. The results are presented in Table 7a. In a perfect world where everybody would work full-time, these results should be identical to the monthly income case (Table 5a), since monthly income would then just be a fixed multiple of the hourly wage. Only the constant term would increase by log(38).

Nevertheless, starting with females, we already find a considerable decrease in the disadvantage. While all methods agree that they still earn significantly less than men, the gap has narrowed to only 6% instead of 25%.

The male marriage premium has shrunk too. Whereas the estimates ranged from 7 to 10% in Table 5a, they now peak at about 5%. Moreover, in IV and Cosslett’s model the estimates are even insignificant.

More drastic, however, is the change in the marriage penalty for women. While in Table 5a the penalty lay between 30 and 33%, it now only ranges from 7 to 9%, whilst still being significant for all methods. Together with the remarkable decrease of the gender coefficient, this supports the argument that women often work part-time and thus receive a lower monthly income (see Section 5.1.1), which can also be seen in Table 2b.
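The range of 7 to 9% can be checked by adding the Married coefficient to the Female∗Married interaction for each method. The sketch below does this with the estimates from Table 7a; the dictionary layout and function name are hypothetical and chosen only for this illustration.

```python
import math

# Coefficients copied from Table 7a (log hourly wage equation).
table_7a = {
    "OLS":      {"married": 0.0523, "female_x_married": -0.1378},
    "IV":       {"married": 0.0394, "female_x_married": -0.1157},
    "Heckman":  {"married": 0.0534, "female_x_married": -0.1367},
    "Cosslett": {"married": 0.0361, "female_x_married": -0.1103},
}


def net_marriage_effect_women(coefs):
    """Marriage effect for women: main effect plus the interaction term."""
    return coefs["married"] + coefs["female_x_married"]


net = {method: net_marriage_effect_women(c) for method, c in table_7a.items()}

for method, log_points in net.items():
    # expm1 converts a log-point effect into an exact percentage change.
    exact_pct = 100.0 * math.expm1(log_points)
    print(f"{method:9s} log points: {log_points:.4f}  exact change: {exact_pct:.1f}%")
```

The text reads the log-point effects directly as percentages, which is a close approximation for effects of this size. The exact percentage change from `expm1` is slightly smaller in absolute value.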

The effects of age and its quadratic term have also diminished: the coefficient of age declined from more than 8% to less than 3%. This in turn suggests that younger people more often work part-time. Table 8 is broadly in line with this: the youngest group works fewer hours on average than the groups aged 31 to 60, although the group aged 61 or older works the fewest.


Table 7a: Final stage regression results of log wage

Dependent variable: Log wage

| Variable | (1) OLS | (2) IV | (3) Heckman | (4) Cosslett |
|---|---|---|---|---|
| Constant | 1.5640 *** | 1.0749 *** | 1.5415 *** | 1.4263 *** |
| Female | -0.0570 ** | -0.0620 ** | -0.0576 ** | -0.0589 ** |
| Married | 0.0523 ** | 0.0394 | 0.0534 ** | 0.0361 |
| Female∗Married | -0.1378 *** | -0.1157 *** | -0.1367 *** | -0.1103 *** |
| Age | 0.0266 *** | 0.0258 *** | 0.0281 *** | 0.0256 *** |
| Age² | -0.0001 * | -0.0001 | -0.0002 * | -0.0001 |
| Schooling | 0.1462 *** | 0.2531 *** | 0.1460 *** | 0.1436 *** |
| First gen/Western | -0.0416 | -0.0998 | -0.0559 | -0.0765 |
| First gen/non-Western | -0.1103 ** | -0.0916 * | -0.1206 ** | -0.0985 |
| Second gen/Western | -0.0778 ** | -0.0830 ** | -0.0813 ** | -0.0816 ** |
| Second gen/non-Western | 0.0603 | 0.0773 | 0.0514 | 0.0776 |
| Selectivity correction | | | -0.0045 | |

Note: ***, ** and * are respectively 1%, 5% and 10% significant. Cluster robust standard errors are used for OLS and IV, while the standard errors of Heckman and Cosslett are bootstrapped with B = 1000.

Table 7b: First stage regression results of log wage

Dependent variable: Level of schooling

| Variable | (1) IV | (2) Heckman | (3) Cosslett |
|---|---|---|---|
| Constant | 4.2813 *** | | |
| Female | 0.0484 | 0.0371 | 0.0249 |
| Married | 0.1614 ** | 0.1379 ** | 0.1627 * |
| Female∗Married | -0.1867 * | -0.1515 * | -0.1803 |
| Age | 0.0096 | 0.0008 | -0.0117 |
| Age² | -0.0004 * | -0.0002 | -0.0001 |
| First gen/Western | 0.4539 *** | 0.5135 *** | 0.6890 *** |
| First gen/non-Western | -0.2977 * | 0.2741 *** | 0.3948 *** |
| Second gen/Western | -0.0244 | 0.1465 ** | 0.2079 ** |
| Second gen/non-Western | -0.2826 | 0.0847 | 0.1451 |
| Urban - 1 | 0.5281 *** | 0.4368 *** | 0.5944 *** |
| Urban - 2 | 0.2803 *** | -0.2398 * | -0.2059 |
| Urban - 3 | 0.1505 * | -0.0258 | -0.0316 |
| Urban - 4 | 0.0698 | -0.2439 | -0.2136 |

Note: ***, ** and * are respectively 1%, 5% and 10% significant. Cluster robust standard errors are used for IV, while the standard errors of Heckman and Cosslett are bootstrapped with B = 1000.

Table 8: Average weekly hours worked per age group

| Age | Mean | N |
|---|---|---|
| 30 or younger | 30.6627 | 338 |
| 31 - 40 | 33.0136 | 516 |
| 41 - 50 | 32.0230 | 610 |
| 51 - 60 | 31.4067 | 627 |
| 61 or older | 28.1036 | 309 |
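As a quick consistency check on Table 8, the group sizes can be summed and the group means combined into an overall average. This is a minimal sketch; the list layout and variable names are hypothetical and only the numbers come from the table.

```python
# (age group, mean weekly hours, number of observations) copied from Table 8.
table_8 = [
    ("30 or younger", 30.6627, 338),
    ("31 - 40", 33.0136, 516),
    ("41 - 50", 32.0230, 610),
    ("51 - 60", 31.4067, 627),
    ("61 or older", 28.1036, 309),
]

# Total sample size across the age groups.
total_n = sum(n for _, _, n in table_8)

# Overall mean of weekly hours, weighting each group mean by its group size.
weighted_mean = sum(mean * n for _, mean, n in table_8) / total_n

# Age group with the lowest average weekly hours.
fewest_hours_group = min(table_8, key=lambda row: row[1])[0]

print("Total N:", total_n)
print("Weighted mean hours:", round(weighted_mean, 2))
print("Group with fewest hours:", fewest_hours_group)
```

The group sizes add up to the 2400 observations used in the hourly wage analysis.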


the models. IV however estimates this effect at 25%, which like the other models is a decrease of 4 percentage points.

Coming to the variables of interest, the ethnicity dummies, we firstly note that all four methods agree there is no significant (dis)advantage for first generation/Western immigrants with respect to hourly wage.

For first generation/non-Western immigrants however the disadvantage is present, ranging from 9% in IV to 12% in the Heckman model. One important note is that while the Cosslett method estimate is insignificant, the p-value actually equalled 0.1005, making it close to significant at the 10% level. The fact that the effect for first generation/non-Western immigrants is larger than for first generation/Western immigrants supports the argument that non-Western immigrants have bigger cultural differences with natives resulting in more (unexplained) characteristic differences.

According to all four methods, second generation/Western immigrants do earn a lower wage, in contrast to their parents' generation, with the effect estimated at around 8%. This seems to contradict the idea that immigrants adapt to the culture over generations. However, a Wald test of the hypothesis

\[ H_0: \delta_{\text{First gen/Western}} - \delta_{\text{Second gen/Western}} = 0 \qquad (10) \]

for both OLS and IV results in p-values of 0.59 and 0.83 respectively, so there is not enough statistical evidence to reject the null that these coefficients are equal. Furthermore, another Wald test for the joint significance of these coefficients,

\[ H_0: \delta_{\text{First gen/Western}} = 0, \quad \delta_{\text{Second gen/Western}} = 0 \qquad (11) \]

does reject the null that they both equal zero at the 10% level (p-values 0.059 and 0.029). Together this implies that although the effects for first generation/Western and second generation/Western immigrants might be equal, this common effect is significantly different from zero.
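The two Wald tests can be sketched for a pair of coefficients as follows. This is an illustration under loud assumptions: the point estimates are the OLS values from Table 7a, but the standard errors and the covariance are hypothetical placeholders because the thesis excerpt does not report them, so the printed p-values will not match the ones quoted above.

```python
import math


def wald_equality(b1, b2, var1, var2, cov12):
    """Wald test of H0: b1 - b2 = 0 (one restriction, chi-square with 1 df)."""
    diff = b1 - b2
    var_diff = var1 + var2 - 2.0 * cov12
    w = diff ** 2 / var_diff
    p = math.erfc(math.sqrt(w / 2.0))   # survival function of chi-square(1)
    return w, p


def wald_joint_zero(b1, b2, var1, var2, cov12):
    """Wald test of H0: b1 = 0 and b2 = 0 (two restrictions, chi-square with 2 df)."""
    det = var1 * var2 - cov12 ** 2
    # Quadratic form b' V^{-1} b written out for a 2x2 covariance matrix V.
    w = (var2 * b1 ** 2 - 2.0 * cov12 * b1 * b2 + var1 * b2 ** 2) / det
    p = math.exp(-w / 2.0)              # survival function of chi-square(2)
    return w, p


# OLS point estimates from Table 7a.
b_first_western = -0.0416
b_second_western = -0.0778

# HYPOTHETICAL standard errors and covariance, for illustration only.
se_first, se_second, cov = 0.050, 0.035, 0.0

w_eq, p_eq = wald_equality(b_first_western, b_second_western,
                           se_first ** 2, se_second ** 2, cov)
w_joint, p_joint = wald_joint_zero(b_first_western, b_second_western,
                                   se_first ** 2, se_second ** 2, cov)

print("Equality test:", w_eq, p_eq)
print("Joint zero test:", w_joint, p_joint)
```

The equality test looks only at the difference between the two coefficients, whereas the joint test asks whether either coefficient departs from zero. This is why the first can fail to reject while the second rejects.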


although they are insignificant. Hence by only looking at the estimates it seems the adaptation argument definitely applies here. The Wald tests for OLS and IV as in (10) agree there is enough statistical evidence, at the 10% level, to reject the null that the two effects for non-Western immigrants are equal (p-values 0.0457 and 0.0724). Thus, we can conclude there is no wage difference between second generation/non-Western immigrants and natives, all other things the same.

Lastly, the coefficient of the selectivity correction term in the Heckman model is insignificant. This implies there is not enough statistical evidence of self-selection and we cannot justify the use of the Heckman model.

The first stage results are listed in Table 7b. We firstly note that the estimates are much less coherent throughout the models than in the previous first stage results. The estimates for first generation/Western immigrants do still imply this group is higher educated than natives. For first generation/non-Western immigrants we find our first contradictory results. While both Heckman and Cosslett estimate a positive effect on schooling for this group, IV estimates a negative one. Looking at second generation/Western immigrants, the same applies, although here the estimate of IV is insignificant. Furthermore we find no significance for second generation/non-Western immigrants. Thus, apart from the fact that first generation/Western immigrants are still higher educated than natives, not much can be said.

The estimates for the urban dummies no longer behave well, showing contradicting signs and significance levels. Nonetheless, the underidentification and Sargan-Hansen tests still indicate the instruments are relevant and valid, with p-values of around 0.0 and 0.8 respectively. Moreover, the instruments still seem strong, based on a worst-case bias of around 10% resulting from a weak identification test and the Montiel-Pflueger robust weak instrument test. Hence, the credibility of the final stage regression results is not affected much.

Although we have found considerably more significant results than in the previous sections, we will still have a look at the aggregated results in the next section to see if the patterns discussed agree with those results.
