
Determinants of the wage of university graduates in the Netherlands


Academic year: 2021


Faculty of Economics and Business

Amsterdam School of Economics


Determinants of the Wage of University Graduates in the Netherlands

Melissa Paans

11083042

MSc in Econometrics

Track: Econometrics

Date of final version: 12-08-2016

Supervisor: J.C.M van Ophem

Second reader: K.J. van Garderen

Abstract

In this research the determinants of the wage are examined. We focused on the variables for the different study categories: alpha, beta, gamma and health. We found that the study choice has a significant influence on the wage and that choosing a health study instead of an alpha study increases the wage the most, namely by 16.17%. For the estimation we used a wage equation and a study choice equation and added terms for the unobserved heterogeneity. The endogeneity problem we faced is due to the correlation between these terms. We accounted for this by estimating the two equations jointly, and thereby the combinations of the terms for the unobserved heterogeneity. We also investigated the different types of individuals in the sample and found four distinct types: two types that accept a low wage and two that accept a high wage, each combined with how likely they are to choose a certain study.


Statement of Originality

This document is written by Melissa Paans who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 Introduction
2 Literature review
    2.1 The endogeneity problem
    2.2 The estimation method
    2.3 The explanatory variables of wage and study choice
3 The estimation method
    3.1 The linear model
    3.2 The multinomial logit model
    3.3 The final model
    3.4 The estimation
4 Data description
    4.1 The Dutch school system
    4.2 Study types
    4.3 The wage specification
        4.3.1 Education levels
        4.3.2 Study process
        4.3.3 Activities next to the study
        4.3.4 Personal factors
    4.4 The study choice equation
        4.4.1 Education levels
        4.4.2 Secondary school process
        4.4.3 Personal factors
5 Results
    5.1 Estimation results for the wage equation
        5.1.1 The interpretation of the wage determinants
    5.2 Estimation results for the study choice equations
        5.2.1 The interpretation of the study choice determinants
        5.3.1 The interpretation of the constant terms combinations
6 Conclusion
    6.1 Discussion
7 Appendix


List of Tables

1 Descriptive statistics dependent variable wage equation
2 Descriptive statistics education levels
3 Descriptive statistics study process
4 Descriptive statistics activities next to the study
5 Descriptive statistics personal factors
6 Descriptive statistics dependent variables study choice equation
7 Descriptive statistics extra variables education levels
8 Descriptive statistics secondary school process
9 Descriptive statistics extra variables personal factors
10 Estimation results wage equation
11 Estimation results study choice equation
12 Estimation results constant terms combinations
13 Studies categorized in different types

List of Figures

1 The Dutch school system


1 Introduction

There are plenty of different jobs with different wages, but even if people have the same type of job there is no guarantee that they have the same salary. It is therefore interesting to find out which factors influence the level of the wage. Perhaps there are factors that employees can influence themselves and thereby increase their own wage. One influence on the wage level could be the type of job, and for this the study choice is an important determinant. Last year, 43,876 new students enrolled in a university bachelor programme (VSNU/CBS, 2015), and it could be interesting for new students to know the effect of their study choice on their future wage. This brings us to the research question of this thesis:

What effect does the study choice have on the wage of university graduates in The Netherlands?

We investigate this effect together with other possible determinants of the wage to get a good idea of what leads to a certain wage. Together with estimating the determinants of the wage we also try to account for endogeneity. This endogeneity is due to the unobserved heterogeneity we face in this research: we are not able to observe all the personal characteristics which can have an influence on the wage or the study choice. The expectation is that the unobserved heterogeneity in the wage equation is correlated with the unobserved heterogeneity in the study choice equation, and this correlation causes the endogeneity. The main goal of this thesis is to account for the endogeneity and to find the wage determinants together with the different types of subjects in our sample. We do this by using the discrete factor approach introduced by Heckman and Singer (1984), which was also used by Van Ours and Williams (2011). The method is based on estimating a combination of constant unknown terms to find different types of subjects in the data set. In this research we use two linear models with different constant terms for the wage equation and two multinomial logit models with different constant terms for the study choice equation. By doing this we allow for four different groups of individuals in our sample. We estimate the four possible combinations of these four models, where each combination occurs with a certain probability.


For the estimation we used data from SEO Amsterdam economics. This is an independent foundation that does economic research and gives advice to different parties. The data is obtained from a yearly survey and consists of 58,081 university graduates distributed over sixteen years. The data set contains educational, work and life related information. Before deciding on the variables we try to find evidence from the literature. For the different study categories we use alpha, beta, gamma and health. Each category contains studies related to that study field. We include the variables beta, gamma and health because alpha is chosen to be the reference category. The other variables we include in the wage equation are related to the education levels the subject followed before going to university or after obtaining the master degree, in terms of the different diplomas one can get in the Netherlands. Furthermore, we include variables for the average exam grade at the university, for doing a shorter program and for doing a special research master. There are also variables included for activities next to the study and for working experience. The last set of variables we included are all personal, like ethnicity and sex.

The outline of this thesis is as follows. In Chapter 2 relevant literature about the endogeneity and unobserved heterogeneity problems is discussed. Other literature discussed is to find evidence for using this estimation method and for the variables we use in the estimation. In Chapter 3 an extensive description of the estimation method is given, together with a brief description of the estimation we did with Matlab. The data used is discussed in Chapter 4 and in Chapter 5 all the estimation results for the wage equation and the study choice equation are discussed, together with the estimation results of the different types of subjects in the data. Finally, in Chapter 6 the conclusions of this research are discussed.


2 Literature review

In this section the relevant literature is discussed. We used this literature to find evidence for the unobserved heterogeneity and to find out what the proposed solutions are in the literature. We also used this literature to find out which variables could have an influence on the wage and what drives people to choose a particular study.

In subsection 2.1 we discuss the evidence for the endogeneity problem, previous literature about the estimation method is discussed in subsection 2.2 and finally subsection 2.3 discusses literature about the variables used.

2.1 The endogeneity problem

The main problem in this research is the presence of endogeneity. This endogeneity is due to the correlation between the unobserved heterogeneity in the wage equation and the study choice equation. The study choice does not only affect the wage; the (expected) wage can have an influence on the study choice as well. If the wage for a certain study is higher, there is a chance that people decide to choose this study because of the wage expectations. Frick and Maihaus (2016) state that, according to human capital theory, rational individuals base their decisions on costs and expected returns. This means that individuals choose their study based on the expected return and pick the option with the highest outcome. Arcidiacono, Hotz, and Kang (2012) provide evidence for the relationship between the expected wage and the study choice. In their research students are asked to give their expectations about future earnings for their own study and for other studies. They estimated a model which took these expectations into account, and one of their main findings was that the expected future earnings are an important determinant of the study choice. They found that there are large earning differences between the different studies and that this has an influence on the study choice. Willis and Rosen (1978) did an even broader research. They tested the influence of expected earnings on the demand for schooling. Their main question was ‘are people triggered by future earnings when they have to decide whether they want to study or not’. They concluded that higher expected life earnings have an influence on the decision to attend university. They did this research for two different groups. Group A consisted of people who are currently attending college and group B consisted of people with only a high school diploma. They tested the ratio of the earnings between the two groups. The ratio is bigger than 1. This indicates that the influence of the earnings on the decision to attend college is bigger for group B, who are not already attending college. These studies confirmed our suspicions about this correlation and therefore confirm the endogeneity problem. The econometric consequence of this endogeneity is that the estimation results of an OLS estimation are inconsistent. Because of this there is need for an estimation method which accounts for the endogeneity. This is discussed in the next subsection.

In previous literature there is evidence that personal characteristics and family related characteristics have an influence on the wage. In subsection 2.3 these characteristics are discussed. It is not easy, and most likely impossible, to observe all the personal characteristics which can have an influence on the wage. There is also a measurement problem, since it is hard to measure, for example, how professional or how spontaneous someone is, while these characteristics can have an influence on the wage. Therefore, it is not possible to find all the personal variables which differ per subject. The same holds for the personal characteristics which can have an influence on the study choice. This is the problem of unobserved heterogeneity in both of the specifications.

2.2 The estimation method

The estimation method used in this thesis has been used before in two papers. Heckman and Singer (1984) introduced the estimation method for the analysis of duration models with unobservables. In economic theory it is commonly assumed that individuals are homogeneous, although this is not realistic in a real life scenario. Therefore, they used a model which accounts for the observed but, more importantly, the unobserved heterogeneity. They showed that the alternative models used at that moment were not able to control for unobserved heterogeneity in a correct way. Their proposed model was based on a given empirical distribution for the durations and an assumed functional form for the duration distribution. With this model, they were able to consistently estimate the population distribution of the unobserved variables together with the parameters of interest. This means that they were able to estimate the unobserved heterogeneity together with all the other parameters. Van Ours and Williams (2011) used the method of Heckman and Singer to investigate the relation between the duration of cannabis use and mental health. In this research there was a problem of unobserved heterogeneity as well, due to the unobserved personal characteristics or circumstances that can have an influence on mental health and cannabis use. For this estimation they used hazard functions for the decision to start and quit cannabis use and a Tobit model for the state of mental health. They used the discrete factor approach, which allows for correlation in unobservables across multiple equations without making assumptions about the distribution. This correlation was taken into account by using the joint density function for the two models. The model we used is similar to the model used in this paper, except that they used a duration model. In this respect our research differs from these previous studies. However, the endogeneity problem in our research is also due to the correlation between the unobservables, so we found evidence for using this method to account for the endogeneity by using a joint model of the wage and study choice equation. Both papers showed that the method worked well in their cases, which gave us confidence to use this model for our research as well.

2.3 The explanatory variables of wage and study choice

In this subsection the variables which are used in other studies are discussed. We start with the variables which can have an influence on the wage. Willis and Rosen (1978) included variables for the scores of tests and the education duration. Dahl (2002) states that the returns to having a university degree vary across the different states in the USA. This suggests that the different regions of the Netherlands can have an influence on the wage as well. Chia and Miller (2008) explore the determinants of the starting salary of graduates from the University of Western Australia. They included variables for age, gender, language spoken at home, country of birth, disability status and high school attendance. Their parameters of interest were based on the students’ academic performance and the field of study. The variables for the field of study are the variables of interest for our research as well. They found that the main determinant is the average grade achieved at the university. This result is of course not based on Dutch data, but we assume that these variables can have an influence in our research as well. Similar findings are reported in the papers of Hossain, Haque, and Haque (2015) and Pfeffer and Davis-Blake (1990). The first research is based on data from Bangladesh and analyses the determinants of wage differentials. They found that the main determinants of wage differences are education, age, gender and place of work. The situation in the Netherlands might not be really comparable with the situation there, but we think that these variables are important for the wage. The aim of our research is not to find determinants which cause the differences in wages, but these variables can still be important for determining the wage. The second research examines the determinants of the degree of wage dispersion within organizations. In their analysis they found that race, gender, type of education and region of living/working are important determinants. Finally, Waqas (2013) found evidence for differences in wage between natives and immigrants in England and Wales. Although these studies are different from ours, we think that these variables can have an influence on the wage.

We also found evidence for using variables for the study choice equation. Willis and Rosen (1978) found that schooling is related to a person’s ability and family background. They found a possible effect of the father’s and mother’s education and work on the decision to attend college. They also included variables like religion, family composition and ability. Ability was measured by variables for the scores on tests, e.g. reading, mechanical problem solving and mathematics. Finally, there can also be an effect of personal variables on the study choice. In the Netherlands 53% of the students in college are women (Merens & Van den Brakel, 2014). This is the same in our sample, where 53% of the university graduates are women. There are differences in which study men and women choose. In the Netherlands only 26% of the beta students and 65% of the health students are female. In our sample 53.25% of the beta students and 81.71% of the health students are female. The most remarkable number is the percentage of female students in beta studies, which is much higher in our sample. There are also more women than men who choose social studies. There are also differences in study choice between ethnicities, although these differences are smaller.


3 The estimation method

In this section the estimation method is discussed. Within this analysis the main challenge was to deal with the endogeneity problem. The endogeneity problem is due to the correlation between the unobserved heterogeneity in the wage equation and the study choice equation. The unobserved heterogeneity is due to the fact that the 53,582 subjects are not all the same and it is hard to find or observe all the personal characteristics which influence the wage and the study choice. These characteristics, for example professionalism or social abilities, are unobserved. To account for the endogeneity we used the discrete factor approach introduced by Heckman and Singer (1984). This method is also used in Van Ours and Williams (2011) to address unobserved heterogeneity. Both studies are already briefly discussed in subsection 2.2. The model we used consists of two parts, a linear model and a multinomial logit model. The models are connected with each other by the correlation between the unobserved heterogeneity in both models. In subsection 3.1 we discuss the linear model. In subsection 3.2 the multinomial logit model is discussed. We start with the general model, after which we add the extra terms. In subsection 3.3 the complete model is discussed and finally in subsection 3.4 we discuss some details about the estimation we did.

3.1 The linear model

The first model we used is a linear model for the wage specification. The most important explanatory variables for this research are the dummy variables for the study categories beta, gamma and health. The linear part of the model is

\[
\mathrm{lwage}_i = X_i\beta + \mathrm{beta}_i \cdot \alpha_1 + \mathrm{gamma}_i \cdot \alpha_2 + \mathrm{health}_i \cdot \alpha_3 + \varepsilon_i \tag{1}
\]

where $X$ are the explanatory variables and $\varepsilon$ is the error term. For the error term we can distinguish two parts: one part has a correlation with the study categories, say $\theta$, and the other part is just the rest of the error term, say $\nu$. If we apply this to the model in equation 1, we get the following model

\[
\mathrm{lwage}_i = X_i\beta + \mathrm{beta}_i \cdot \alpha_1 + \mathrm{gamma}_i \cdot \alpha_2 + \mathrm{health}_i \cdot \alpha_3 + \theta_i + \nu_i. \tag{2}
\]

Under the assumption of a normally distributed error term, this model can be estimated with maximum likelihood. For this estimation we need the residuals, the difference between the actual value and the estimated value of lwage. Therefore, the residuals become

\[
\mathrm{res}_i = \mathrm{lwage}_i - X_i\hat{\beta} - \mathrm{beta}_i \cdot \hat{\alpha}_1 - \mathrm{gamma}_i \cdot \hat{\alpha}_2 - \mathrm{health}_i \cdot \hat{\alpha}_3 - \hat{\theta}_i.
\]

Using this and the normality assumption, the likelihood function of this linear model is

\[
\mathrm{lik}_{i1} = \frac{1}{\sigma} \cdot \phi\!\left(\frac{\mathrm{res}_i}{\sigma}\right) \tag{3}
\]

where $\sigma$ is the standard deviation of the error term and $\phi$ is the standard normal probability density function.
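The likelihood contribution in equation 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesis estimation was done in Matlab, and the function names and all numbers below are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal probability density function phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def wage_likelihood(lwage, x, beta, study_dummies, alpha, theta, sigma):
    """Likelihood contribution (1/sigma) * phi(res/sigma) of one graduate.

    x and beta are equal-length lists (explanatory variables, coefficients);
    study_dummies = (beta_i, gamma_i, health_i), alpha = (alpha_1, alpha_2, alpha_3).
    """
    fitted = sum(xk * bk for xk, bk in zip(x, beta))
    fitted += sum(d * a for d, a in zip(study_dummies, alpha))
    res = lwage - fitted - theta  # residual as defined above
    return normal_pdf(res / sigma) / sigma

# Hypothetical values, chosen only to show the call.
lik = wage_likelihood(lwage=2.9, x=[1.0, 0.5], beta=[2.5, 0.2],
                      study_dummies=(0, 0, 1), alpha=(0.05, 0.08, 0.15),
                      theta=0.1, sigma=0.3)
```

The closer the residual is to zero, the larger the contribution, and a smaller σ makes the density more peaked around a zero residual.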

3.2 The multinomial logit model

The study choice is modelled by a multinomial logit model. We have chosen a logit specification because of the simplicity of the interpretation. We used a multinomial logit model, instead of another logit model, because the explanatory variables are the same for all four alternatives. We first explain the general multinomial logit model and then extend this to the model we used in this research.

For the general multinomial model, the probability that individual $i$ chooses study category $y_j$ is given by

\[
p_{ij} = \Pr[y_{ij} = 1]
\]

where $j = 1, 2, 3, 4$. Here $y_j$ is the variable beta for $j = 1$, gamma for $j = 2$, health for $j = 3$ and alpha for $j = 4$. These variables are all dummy variables which are 1 if the subject belongs to that category and 0 otherwise.


For each subject exactly one of the four dummy variables is 1. This means that all the subjects are categorized in one of the four groups. From this, the multinomial density per observation is as follows

\[
f(y_i) = p_{i1}^{y_{i1}} \cdot p_{i2}^{y_{i2}} \cdot p_{i3}^{y_{i3}} \cdot p_{i4}^{y_{i4}}. \tag{4}
\]

The probabilities in equation 4 are given by

\[
p_{ij} = \frac{\exp(x_i'\gamma_j)}{\exp(x_i'\gamma_1) + \exp(x_i'\gamma_2) + \exp(x_i'\gamma_3) + \exp(x_i'\gamma_4)}. \tag{5}
\]

In general there holds $p_{i1} + p_{i2} + p_{i3} + p_{i4} = 1$; therefore, to impose an identification restriction, we choose $\gamma_4 = 0$. Applying this to equation 5, we get the probabilities we used for the multinomial logit model, namely

\[
p_{ij} = \frac{\exp(x_i'\gamma_j)}{1 + \exp(x_i'\gamma_1) + \exp(x_i'\gamma_2) + \exp(x_i'\gamma_3)}. \tag{6}
\]
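Why a restriction such as $\gamma_4 = 0$ is needed can be made explicit with a short derivation. The vector $c$ below is an arbitrary shift introduced only for this illustration: adding it to every coefficient vector leaves all probabilities in equation 5 unchanged, so the coefficients are only determined up to such a shift.

```latex
\[
\frac{\exp\{x_i'(\gamma_j + c)\}}{\sum_{m=1}^{4} \exp\{x_i'(\gamma_m + c)\}}
= \frac{\exp(x_i'c)\,\exp(x_i'\gamma_j)}{\exp(x_i'c)\,\sum_{m=1}^{4} \exp(x_i'\gamma_m)}
= p_{ij}
\]
```

Taking $c = -\gamma_4$ yields the normalization $\gamma_4 = 0$, so the remaining $\gamma_j$ are interpreted relative to the reference category alpha.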

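Using these probabilities and the multinomial density in equation 4, the likelihood functions for the alternatives are given by the following equations

\[
\begin{aligned}
\mathrm{lik}_{i2.1} &= \frac{\exp(x_i'\gamma_1)}{1 + \exp(x_i'\gamma_1) + \exp(x_i'\gamma_2) + \exp(x_i'\gamma_3)} \\
\mathrm{lik}_{i2.2} &= \frac{\exp(x_i'\gamma_2)}{1 + \exp(x_i'\gamma_1) + \exp(x_i'\gamma_2) + \exp(x_i'\gamma_3)} \\
\mathrm{lik}_{i2.3} &= \frac{\exp(x_i'\gamma_3)}{1 + \exp(x_i'\gamma_1) + \exp(x_i'\gamma_2) + \exp(x_i'\gamma_3)} \\
\mathrm{lik}_{i2.4} &= \frac{1}{1 + \exp(x_i'\gamma_1) + \exp(x_i'\gamma_2) + \exp(x_i'\gamma_3)}
\end{aligned}
\tag{7}
\]

Each likelihood function consists of only one term because $y_{ij} = 0$ for every alternative $j$ other than the chosen one. The factors of the other alternatives in equation 4 therefore equal one, so that their contributions to the log likelihood are zero.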
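The probabilities in equation 6 and the likelihood contributions in equation 7 can be sketched as follows. This is a minimal illustration in Python, not the Matlab code used for the thesis; the function names and the numbers are hypothetical, and the categories are ordered beta, gamma, health, alpha.

```python
import math

def mnl_probabilities(x, gammas):
    """Choice probabilities for beta, gamma, health and alpha (in that order).

    gammas holds the coefficient vectors of the first three alternatives;
    alpha is the reference category, whose coefficient vector is fixed at zero.
    """
    scores = [sum(xk * gk for xk, gk in zip(x, g)) for g in gammas]
    exps = [math.exp(s) for s in scores]
    denom = 1.0 + sum(exps)  # the 1 is exp(0) for the reference category
    return [e / denom for e in exps] + [1.0 / denom]

def choice_likelihood(x, gammas, chosen):
    """Likelihood contribution: the probability of the chosen category (0..3)."""
    return mnl_probabilities(x, gammas)[chosen]

# Hypothetical explanatory variables and coefficients.
probs = mnl_probabilities([1.0, 0.3], [[0.2, -0.5], [0.6, 0.1], [-0.4, 0.8]])
```

By construction the four probabilities sum to one, and with all coefficients equal to zero every category has probability 0.25.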

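So far we discussed the general multinomial logit model and the corresponding likelihood functions. In our model we added one constant term to the model, say $\eta$. This constant is not the same for all the alternatives; therefore we added a different constant for every alternative. With these extra terms, the likelihood functions become

\[
\begin{aligned}
\mathrm{lik}_{i2.1} &= \frac{\exp(x_i'\gamma_1 + \eta_{11})}{1 + \exp(x_i'\gamma_1 + \eta_{11}) + \exp(x_i'\gamma_2 + \eta_{12}) + \exp(x_i'\gamma_3 + \eta_{13})} \\
\mathrm{lik}_{i2.2} &= \frac{\exp(x_i'\gamma_2 + \eta_{12})}{1 + \exp(x_i'\gamma_1 + \eta_{11}) + \exp(x_i'\gamma_2 + \eta_{12}) + \exp(x_i'\gamma_3 + \eta_{13})} \\
\mathrm{lik}_{i2.3} &= \frac{\exp(x_i'\gamma_3 + \eta_{13})}{1 + \exp(x_i'\gamma_1 + \eta_{11}) + \exp(x_i'\gamma_2 + \eta_{12}) + \exp(x_i'\gamma_3 + \eta_{13})} \\
\mathrm{lik}_{i2.4} &= \frac{1}{1 + \exp(x_i'\gamma_1 + \eta_{11}) + \exp(x_i'\gamma_2 + \eta_{12}) + \exp(x_i'\gamma_3 + \eta_{13})}.
\end{aligned}
\tag{8}
\]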

3.3 The final model

The problem in this model is the correlation between θ in the linear equation and η in the multinomial logit equation. This estimation method tries to solve this by estimating the product of the two models. For this analysis we used two different linear models and two different multinomial logit models. The explanatory variables are the same for the two different models of the same type, only the constant terms θ and η are different. Using this, the corresponding likelihood functions become

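\[
\begin{aligned}
\mathrm{lik}_{i11} &= \frac{1}{\sigma} \cdot \phi\!\left(\frac{\mathrm{lwage}_i - X_i\beta - \mathrm{beta}_i \cdot \alpha_1 - \mathrm{gamma}_i \cdot \alpha_2 - \mathrm{health}_i \cdot \alpha_3 - \theta_1}{\sigma}\right) \\
\mathrm{lik}_{i12} &= \frac{1}{\sigma} \cdot \phi\!\left(\frac{\mathrm{lwage}_i - X_i\beta - \mathrm{beta}_i \cdot \alpha_1 - \mathrm{gamma}_i \cdot \alpha_2 - \mathrm{health}_i \cdot \alpha_3 - \theta_2}{\sigma}\right) \\
\mathrm{lik}_{i2.k.1} &= \frac{\exp(x_i'\gamma_k + \eta_{1k})}{1 + \exp(x_i'\gamma_1 + \eta_{11}) + \exp(x_i'\gamma_2 + \eta_{12}) + \exp(x_i'\gamma_3 + \eta_{13})} \\
\mathrm{lik}_{i2.4.1} &= \frac{1}{1 + \exp(x_i'\gamma_1 + \eta_{11}) + \exp(x_i'\gamma_2 + \eta_{12}) + \exp(x_i'\gamma_3 + \eta_{13})} \\
\mathrm{lik}_{i2.k.2} &= \frac{\exp(x_i'\gamma_k + \eta_{2k})}{1 + \exp(x_i'\gamma_1 + \eta_{21}) + \exp(x_i'\gamma_2 + \eta_{22}) + \exp(x_i'\gamma_3 + \eta_{23})} \\
\mathrm{lik}_{i2.4.2} &= \frac{1}{1 + \exp(x_i'\gamma_1 + \eta_{21}) + \exp(x_i'\gamma_2 + \eta_{22}) + \exp(x_i'\gamma_3 + \eta_{23})}
\end{aligned}
\tag{9}
\]

Here $k = 1$ stands for the category beta, $k = 2$ for the category gamma and $k = 3$ for the category health. Finally, the likelihood functions for $k = 4$ are for the category alpha. For simplicity we only show the general function for the alternatives of the multinomial logit model. In total there are 10 likelihood functions which we want to estimate in combination with each other. The purpose of this research is to estimate the combinations of the constant terms, so we want to estimate the combinations $(\theta_1, \eta_{1k})$, $(\theta_1, \eta_{2k})$, $(\theta_2, \eta_{1k})$ and $(\theta_2, \eta_{2k})$. These combinations occur with a certain probability. We assumed that the first combination occurs with probability $p_1$, the second combination with probability $p_2$, the third combination with probability $p_3$ and the last combination with probability $p_4 = 1 - p_1 - p_2 - p_3$. The last probability is given by $1 - p_1 - p_2 - p_3$ because the probabilities of all the different options have to sum to 1. If we apply this and combine the likelihoods in equation 9, the final model we want to estimate is

\[
\mathrm{lik}_i = p_1 \cdot \mathrm{lik}_{i11} \cdot \mathrm{lik}_{i2.l.1} + p_2 \cdot \mathrm{lik}_{i11} \cdot \mathrm{lik}_{i2.l.2} + p_3 \cdot \mathrm{lik}_{i12} \cdot \mathrm{lik}_{i2.l.1} + p_4 \cdot \mathrm{lik}_{i12} \cdot \mathrm{lik}_{i2.l.2} \tag{10}
\]

with $l = 1, 2, 3, 4$ for the different study categories.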
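The mixture in equation 10 can be sketched for a single individual as below. This is a hedged Python illustration, not the Matlab implementation used for the thesis. The argument `resid` (lwage minus the fitted part without θ), the argument `scores` (the values x_i'γ_k for k = 1, 2, 3) and all numbers are hypothetical names and values introduced here.

```python
import math

def normal_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def wage_lik(resid, theta, sigma):
    """Linear-model likelihood for one type-specific constant theta."""
    return normal_pdf((resid - theta) / sigma) / sigma

def choice_lik(scores, eta, chosen):
    """Logit probability of the chosen category (0=beta, 1=gamma, 2=health, 3=alpha)."""
    exps = [math.exp(s + e) for s, e in zip(scores, eta)]
    denom = 1.0 + sum(exps)
    numerator = exps[chosen] if chosen < 3 else 1.0
    return numerator / denom

def mixture_lik(resid, scores, chosen, thetas, etas, probs, sigma):
    """Weighted sum over the four (theta, eta) combinations, as in equation 10."""
    p1, p2, p3 = probs
    p4 = 1.0 - p1 - p2 - p3  # probabilities must sum to one
    w1, w2 = (wage_lik(resid, t, sigma) for t in thetas)
    c1, c2 = (choice_lik(scores, e, chosen) for e in etas)
    return p1 * w1 * c1 + p2 * w1 * c2 + p3 * w2 * c1 + p4 * w2 * c2

lik_i = mixture_lik(resid=0.2, scores=[0.1, 0.4, -0.3], chosen=2,
                    thetas=(-0.1, 0.3),
                    etas=([0.0, 0.2, -0.1], [0.5, -0.2, 0.3]),
                    probs=(0.4, 0.1, 0.2), sigma=0.3)
```

If both constants θ and both sets of constants η coincide, the four terms are identical and the mixture reduces to the product of one linear likelihood and one logit likelihood, whatever the probabilities are.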

3.4 The estimation

The model in equation 10 is defined per individual. To define the likelihood function, it is important to determine, per subject, to which alternative the subject belongs. Therefore, for every subject we checked to which of the four categories he or she belongs, after which the corresponding likelihood function for the study choice equation is used. The corresponding likelihood function is $\mathrm{lik}_{i2.1}$ if the subject belongs to the category beta, $\mathrm{lik}_{i2.2}$ if the subject belongs to the category gamma, $\mathrm{lik}_{i2.3}$ if the subject belongs to the category health and $\mathrm{lik}_{i2.4}$ if the subject belongs to the category alpha. The multiplications and the summation in equation 10 are carried out at subject level. This results in a vector with the likelihood functions per subject as entries. The likelihood of the whole sample is the product of the individual likelihood functions, so the log likelihood is the sum of the individual log likelihood functions. We therefore first take the natural logarithm of the individual likelihood functions, which gives

\[
\mathrm{loglik}_i = \log(\mathrm{lik}_i) = \log\left(p_1 \cdot \mathrm{lik}_{i11} \cdot \mathrm{lik}_{i2.l.1} + p_2 \cdot \mathrm{lik}_{i11} \cdot \mathrm{lik}_{i2.l.2} + p_3 \cdot \mathrm{lik}_{i12} \cdot \mathrm{lik}_{i2.l.1} + p_4 \cdot \mathrm{lik}_{i12} \cdot \mathrm{lik}_{i2.l.2}\right)
\]

It is common to use the log likelihood function instead of the likelihood function itself. After taking the logarithm we summed all the individual log likelihood functions. The log likelihood function of the whole model then becomes

\[
\mathrm{loglik} = \sum_{i=1}^{53{,}582} \mathrm{loglik}_i
\]

This function is maximized.

In the estimation all the parameters are estimated. We used the negative log likelihood function and minimized this function with respect to all the different parameters. In general the estimates of a multinomial logit model and of a linear model are unique. That means that, regardless of the starting point of the optimization, the estimation results converge to the same point. This was also the case for the two separate models we used for this estimation, but not necessarily for the final model. A possible reason for this is that the log likelihood function of the final model is not well behaved enough to have only one minimum, so that the optimization can end in a local minimum. To account for this we did the estimation not only once, but sixteen times with different starting points. For the starting points we tried to choose the values closest to the possible boundaries. For some of the starting points all the estimation results were insignificant, which probably does not indicate a good minimum. After estimating the model sixteen times, we selected the model with the smallest negative log likelihood value and significant estimates. We used this starting point for the final estimation.
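The multi-start idea described above can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the thesis procedure: the objective is a hypothetical one-dimensional function with two local minima rather than the actual negative log likelihood, the local optimizer is plain gradient descent, and the sixteen starting points are placed on an evenly spaced grid.

```python
def objective(x):
    """Toy stand-in for a negative log likelihood with two local minima."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def gradient(x):
    """Derivative of the toy objective."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def local_minimize(x0, step=0.01, iterations=5000):
    """Plain gradient descent from x0; ends in a nearby local minimum."""
    x = x0
    for _ in range(iterations):
        x -= step * gradient(x)
    return x, objective(x)

def multi_start(n_starts=16, low=-2.0, high=2.0):
    """Run the local optimizer from a grid of starts and keep the best result."""
    starts = [low + (high - low) * k / (n_starts - 1) for k in range(n_starts)]
    results = [local_minimize(x0) for x0 in starts]
    return min(results, key=lambda r: r[1])

best_x, best_value = multi_start()
right_x, right_value = local_minimize(1.5)  # a single start can get stuck
```

In this toy case a single start at 1.5 ends in the local minimum with a positive argument, while the multi-start run returns the lower minimum with a negative argument, which is the behaviour the sixteen starting points are meant to guard against.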


4 Data description

For this thesis we used the data from SEO economic research. The data is obtained by a survey which contains education, work and life related questions. Every year this survey is given to approximately 8,000 graduates of higher education in the Netherlands. These are graduates from universities of applied sciences and research universities. The final data set contains sixteen years of data for 113,000 graduates. After selecting only the graduates from research universities there are 58,081 graduates left. For this research, it is important to know the wage of the subject. Therefore, subjects who did not fill in their wage are deleted. After deleting these subjects, there are 53,582 graduates left. This is the number of subjects we used in this research.

A common problem we had with the data was that not all the subjects answered all the questions. For some variables, like the wage, this makes the subject unusable. Because of the large number of missing answers to ‘yes/no’ questions, we made some assumptions. If a ‘yes/no’ question was not answered we assumed that the answer was ‘no’. We assumed that the reason for not answering was that the question was inapplicable for this subject. We did this for the variables diphve, diphve2, diphve1, dipunibach, dipunimas, dippostdoc, studabrd, studasso1, studasso2, internexp, studjobexp, studfldexp, ethnic, partner03, diphse, dippue, profile_cs, profile_es, profile_nh and profile_nt. Only for the variable mothercnt, which asked if the subject was born in the Netherlands, we assumed that not answering meant ‘yes’. These assumptions may make the data less reliable, but they make it more useful for this research.

The outline of this section is as follows. We start with the explanation of the Dutch school system in subsection 4.1. In subsection 4.2 we categorize the different studies into different study categories. After that, the explanatory variables for the wage equation and the study choice equation are discussed in subsections 4.3 and 4.4 respectively. For most of the variables we already gave evidence for using them in subsection 2.3.
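The rule for unanswered ‘yes/no’ questions described in this section can be sketched as follows. This is only an illustration in Python: the coding (1 for ‘yes’, 0 for ‘no’, None for no answer), the function name and the example record are assumptions made here, and only a few of the listed variables are used.

```python
def impute_yes_no(record, default_no, default_yes):
    """Return a copy of record with unanswered (None) yes/no questions filled in."""
    filled = dict(record)
    for name in default_no:
        if filled.get(name) is None:
            filled[name] = 0  # unanswered is treated as 'no'
    for name in default_yes:
        if filled.get(name) is None:
            filled[name] = 1  # unanswered is treated as 'yes'
    return filled

# Hypothetical survey record: 1 = yes, 0 = no, None = not answered.
record = {"studabrd": None, "internexp": 1, "mothercnt": None}
clean = impute_yes_no(record,
                      default_no=["studabrd", "internexp"],
                      default_yes=["mothercnt"])
```

Answers that were actually given are left untouched; only missing values are replaced, and the original record is not modified.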


4.1 The Dutch school system

Before we discuss the variables we used, it is important to describe the school system in the Netherlands. Every child has to go to school from the age of five. After eight years of primary school, every child goes to a secondary school. Secondary school has three levels. The lowest level is called ‘Lower Vocational Education’ (LVE) and takes four years. The middle level is called ‘Higher Secondary Education’ (HSE) and takes five years. The highest level is called ‘Pre-university Education’ (PUE) and takes six years. After primary school the pupil receives an advice for one of the three levels. Until a couple of years ago this advice was based on a test, but now it is based on the achievements of the pupil up to the last year of primary school.

The level of secondary school determines the level of further education. The further education level after LVE is called ‘Intermediate Vocational Education’ (IVE). This takes four years and has different levels. It is not part of the higher education levels in the Netherlands. The usual further education level after HSE is called ‘Higher Vocational Education’ (HVE), which has a bachelor and master structure. The bachelor takes four years and the master one or two years, depending on the study. After finishing the first year of the HVE bachelor, these students are also allowed to attend university. Alternatively, after completing the HVE bachelor, students can attend university to obtain a master degree. The usual further education level after PUE is the university. This level also has a bachelor and master structure. The bachelor takes three years and the master takes one or two years. With a PUE diploma the student can apply for all education levels.


Figure 1: The Dutch school system

4.2 Study types

The data contain 54 different studies. These studies can be classified into 15 study types, and the study types can in turn be combined into study categories. The effects of these study categories are the parameters of interest.

First we grouped studies on the same kind of topic into study types. The study types we distinguish are listed in Table 13. These study types can be classified into categories based on the subject matter of the study. The categories we distinguish are alpha, beta, gamma and health. Alpha studies examine the products of human actions, such as history. Beta studies examine non-human nature, such as mathematics. Gamma studies examine human actions, such as psychology. Health studies are about human or animal health. The study types are subgroups of the study categories. For alpha studies we distinguish ‘education studies’, ‘language studies’, ‘history studies’, ‘communication studies’, ‘religion studies’, ‘media, culture and art studies’ and ‘law and politic studies’. For beta studies we distinguish ‘Technical studies’, ‘Pure sciences’ and ‘Other studies’. The last type consists of all the beta studies for which it is not clear to which type they belong. For gamma studies we distinguish ‘Economic and Business’ and ‘Social studies’. For the last category, health, we distinguish ‘Human health’, ‘Animal health’ and ‘Research’.
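The two-level classification described above can be written down as a simple lookup structure. The following Python sketch only restates the study types and categories named in the text; the names `STUDY_TYPES_BY_CATEGORY` and `category_of` are hypothetical, and the assignment of the 54 individual studies to study types (Table 13) is not reproduced here.

```python
# Study types per study category, as listed in the text.
STUDY_TYPES_BY_CATEGORY = {
    "alpha": [
        "education studies",
        "language studies",
        "history studies",
        "communication studies",
        "religion studies",
        "media, culture and art studies",
        "law and politic studies",
    ],
    "beta": ["Technical studies", "Pure sciences", "Other studies"],
    "gamma": ["Economic and Business", "Social studies"],
    "health": ["Human health", "Animal health", "Research"],
}


def category_of(study_type):
    """Return the study category that a given study type belongs to."""
    for category, study_types in STUDY_TYPES_BY_CATEGORY.items():
        if study_type in study_types:
            return category
    raise KeyError(f"unknown study type: {study_type}")


n_types = sum(len(types) for types in STUDY_TYPES_BY_CATEGORY.values())
```

Counting the entries gives 7 + 3 + 2 + 3 = 15 study types, in line with the number mentioned at the start of this subsection.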

4.3 The wage specification

Many factors can influence the wage. This subsection discusses the dependent variable and the explanatory variables of the wage equation. The main variables of interest are the study categories alpha, beta, gamma and health. The dependent variable of the model is the natural logarithm of the wage. For the wage we used the gross monthly wage from the subject's current job, converted into 2016 euros so that the wages of the different years are comparable. The monthly wage is measured at the moment of the survey, which is approximately 1.5 years after graduation. By using the monthly wage we implicitly assume that all subjects work the same number of hours. Given the minimum and maximum wage in the sample, this assumption is unlikely to hold for every subject.

Table 1: Descriptive statistics dependent variable wage equation

| Variables (N=53,582) | Mean | SD | Min | Max |
|---|---|---|---|---|
| wage | 2759.571 | 1105.129 | 94.51963 | 22169.33 |
| lwage (log(wage)) | 7.8508 | .3994256 | 4.548808 | 10.00646 |
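Because the natural logarithm is a monotone transformation, the minimum and maximum of lwage in Table 1 should equal the logarithm of the minimum and maximum of wage. The short Python check below illustrates the transformation of the dependent variable on these two values. The function name `log_wage` is hypothetical, and the conversion to 2016 euros is assumed to have been applied already.

```python
import math


def log_wage(wage_2016_euros):
    """Natural logarithm of the gross monthly wage (already in 2016 euros)."""
    if wage_2016_euros <= 0:
        raise ValueError("the wage must be positive to take its logarithm")
    return math.log(wage_2016_euros)


# Minimum and maximum wage as reported in Table 1.
lwage_min = log_wage(94.51963)
lwage_max = log_wage(22169.33)
```

Both values agree with the Min and Max reported for lwage in Table 1 up to rounding.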

4.3.1 Education levels

The first set of explanatory variables describes the diplomas a subject holds. It can make a difference how the subject reached the university degree and what type of university degree the subject has. Therefore, the variables diphve, diphve2, dipunibach, dipunimas, dipunipmas and dippostdoc can be important for the wage equation. The variables diphve and diphve2 are 1 if the subject has a ‘HVE’ diploma. These diplomas are of the same level, but diphve is the precursor of diphve2. Therefore, we generated the variable diphve1, which is 1 if at least one of the two ‘HVE’ variables is 1. The variables dipunibach, dipunimas, dipunipmas and dippostdoc are all related to the university level. The variable dipunibach is 1 if the subject has a university bachelor degree and dipunimas is 1 if the subject has a university master degree. The variable dipunipmas refers to the precursor of the master degree. This variable is 1 for all subjects and is therefore collinear with the constant term, so it is omitted from the model. The fact that it is 1 for all subjects leads us to suspect a mistake in the data. We were not able to correct for this, because without this variable not all subjects would have a university degree, while all subjects indicated that they have one. The variable dippostdoc is 1 if the subject has a postdoctoral degree, which can be obtained by completing a specialisation after the university master degree.

Table 2: Descriptive statistics education levels

| Variables (N=53,582) | Mean | SD | Min | Max |
|---|---|---|---|---|
| Higher education diplomas | | | | |
| diphve | .1093091 | .3120295 | 0 | 1 |
| diphve2 | .0629129 | .2428085 | 0 | 1 |
| diphve1 | .1669404 | .3729261 | 0 | 1 |
| dipunipmas | 1 | 0 | 1 | 1 |
| dipunibach | .2863648 | .4520662 | 0 | 1 |
| dipunimas | .4586055 | .4982882 | 0 | 1 |
| dippostdoc | .0518458 | .2217176 | 0 | 1 |

4.3.2 Study process

For the wage it can also matter how the subject performed during the study: whether it was easy or hard to obtain the diploma, whether the study was completed within the nominal duration, and what the average grade was. The following variables could therefore influence the wage. The variable examavr is the average exam grade. The average in this data set is 7.29, the minimum is 5.5 and the maximum is 10. In the Netherlands grades are on a scale from 1 to 10, where 1 is the lowest grade and 10 is the highest, and the lowest passing grade is 5.5. It is unlikely that someone scored only tens on all exams; an average of 10 may be the result of subjects rounding their average. The variable shortprog is 1 if the subject followed a shorter program, that is, a special program which is shorter than the normal program. In this data set 13.27% followed a shorter program. The variable gradage is the age of the subject at graduation; the average age at graduation in this data set is 25.1 years. The variable ECTS tot gives the number of ECTS (European Credit Transfer System) credits the subject had to obtain to complete the bachelor and master degree. In the Netherlands every study year counts for 60 ECTS, so a standard bachelor has 180 ECTS and a master has 60 or 120 ECTS. For some masters it is necessary to do a pre-master first, which can be 30 or 60 ECTS. The average is 264.34 ECTS, which corresponds to 4.41 years. The last variable related to the study is research. This variable is 1 if the subject followed a special research master instead of a normal master. In this data set only 1.97% of the subjects followed a research master.

Table 3: Descriptive statistics study process

| Variables (N=53,582) | Mean | SD | Min | Max |
|---|---|---|---|---|
| examavr | 7.294736 | .5421452 | 5.5 | 10 |
| shortprog | .1326752 | .3392265 | 0 | 1 |
| gradage | 25.09578 | 2.379311 | 19 | 40 |
| ECTS tot | 264.3356 | 37.63558 | 240 | 420 |
| research | .0196894 | .1389321 | 0 | 1 |

4.3.3 Activities next to the study

We now discuss the variables related to activities next to the regular study. The variable studabrd is 1 if the subject studied abroad for at least three months. In this data set 32.41% studied abroad for at least three months. The variable studasso1 is 1 if the subject was a member of a study association and the variable studasso2 is 1 if the subject was a member of a student association. For a study association the emphasis is on study related activities, together with social activities with students from the same study. You can only become a member of a study association if you are a student in the field of the association. For a student association the emphasis is on social activities with students from all possible studies. In this data set 44.43% was a member of a study association and 20.76% was a member of a student association. The variable managexp, which is 1 if the subject has management experience, can have an influence as well. Examples of such experience are being a member of a board or a committee. In this data set 50.61% has management experience. Another factor that can be important for the wage is working experience. The variables workexp, internexp, studjobexp and studfldexp are related to working experience. The variable workexp is 1 if the subject has work experience outside the field of the study. In this data set 30.55% of the subjects have this kind of working experience. The variable internexp is 1 if the subject did an internship during the study; 46.66% of the subjects did this. The variable studjobexp is 1 if the subject has work experience in a typical student job. Of all the subjects, 49.00% had a typical student job. The variable studfldexp is 1 if the subject has work experience in the field of the study; 34.86% has this experience. The first variable, workexp, is the reference category because this is the most general work experience.

Table 4: Descriptive statistics activities next to the study

| Variables (N=53,582) | Mean | SD | Min | Max |
|---|---|---|---|---|
| Activities next to study | | | | |
| studabrd | .2445784 | .429841 | 0 | 1 |
| studasso1 | .4443097 | .4968935 | 0 | 1 |
| studasso2 | .2075884 | .4055842 | 0 | 1 |
| managexp | .5061028 | .4999674 | 0 | 1 |
| Working experience | | | | |
| workexp | .3054571 | .460605 | 0 | 1 |
| internexp | .4665933 | .4988874 | 0 | 1 |
| studjobexp | .4899593 | .4999038 | 0 | 1 |
| studfldexp | .3486059 | .4765334 | 0 | 1 |

4.3.4 Personal factors

The last category of explanatory variables is person related. The variable ethnic is 1 if the subject is subjectively an immigrant. This means that the subject may have been born in the Netherlands, but does not look Dutch or has a name that is not typically Dutch. This is the case for 5.71% of the subjects. The variable ethnicbzk is 1 if at least one of the parents of the subject was not born in the Netherlands. This is the case for 13.02% of the subjects. The variable mothercnt is 1 if the subject was born in the Netherlands. For this question we made a different assumption, and assumed that subjects who did not answer the question were born in the Netherlands. In this data set 94.29% of the subjects were born in the Netherlands. The variable man is 1 if the subject is a man. In this data set 47.07% are men and 52.93% are women. There are also variables for the region of residence: the variable north is 1 if the subject lives in the North of the country, east is 1 if the subject lives in the East and south is 1 if the subject lives in the South. The reference category is the West of the country. Most subjects live in the West, namely 61.31%, against 6.62% in the North, 16.59% in the East and 15.48% in the South. The last variables are related to the family situation of the subject. The variable partner03 is 1 if the subject lives together with his or her partner, which is the case for 39.07% of the subjects. The last variable that can influence the wage is child, which is 1 if the subject has children. In this data set this is the case for 4.73%.

Table 5: Descriptive Statistics personal factors

| Variables (N=53,582) | Mean | SD | Min | Max |
|---|---|---|---|---|
| Personal characteristics | | | | |
| ethnic | .0570714 | .2319812 | 0 | 1 |
| ethnicbzk | .1302116 | .3365393 | 0 | 1 |
| mothercnt | .9429099 | .2320168 | 0 | 1 |
| man | .4706805 | .4991443 | 0 | 1 |
| Region of living | | | | |
| north | .0662349 | .2486946 | 0 | 1 |
| east | .1658953 | .3719901 | 0 | 1 |
| south | .1547908 | .3617085 | 0 | 1 |
| west | .613079 | .4870499 | 0 | 1 |
| Family situation | | | | |
| partner03 | .3906909 | .4879098 | 0 | 1 |
| child | .047348 | .2123841 | 0 | 1 |

4.4 The study choice equation

The dependent variables in this case are the different study categories. For this specification we took the category ‘alpha’ as the reference category, as this is the biggest category with 34.01% of the subjects. The specification is written down for each of the three other study categories in comparison with the ‘alpha’ category. As mentioned before, there are no alternative-specific variables; all explanatory variables are characteristics of the individual.

Table 6: Descriptive statistics dependent variables study choice equation

| Variables (N=53,582) | Mean | SD | Min | Max |
|---|---|---|---|---|
| alpha (reference category) | .3400769 | .4737392 | 0 | 1 |
| beta | .3089844 | .462079 | 0 | 1 |
| gamma | .2122541 | .4089076 | 0 | 1 |
| health | .1386846 | .3456204 | 0 | 1 |
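To make the role of the reference category concrete, the sketch below computes standard multinomial logit choice probabilities when the linear index of ‘alpha’ is normalised to zero. It is a generic illustration of the multinomial logit form, not the estimation code of this thesis: the function name `mnl_probabilities` and the example index values are hypothetical, and the unobserved type-specific constants of our model are left out.

```python
import math


def mnl_probabilities(indices):
    """Choice probabilities for a multinomial logit with 'alpha' as reference.

    `indices` maps each non-reference category to its linear index
    (explanatory variables times coefficients for that category).
    """
    exp_terms = {"alpha": 1.0}  # reference index is 0, and exp(0) = 1
    for category, value in indices.items():
        exp_terms[category] = math.exp(value)
    total = sum(exp_terms.values())
    return {category: term / total for category, term in exp_terms.items()}


# Illustrative index values only, not estimates from this thesis.
example = mnl_probabilities({"beta": 0.5, "gamma": -0.2, "health": -1.0})
```

The ratio of the probability of a category to the probability of ‘alpha’ equals the exponential of that category's index, which is why all coefficients of the study choice equation are read relative to ‘alpha’.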

4.4.1 Education levels

For this specification some diplomas can influence which type of study a subject chooses. As explained before, there are three types of diplomas that give access to the university. These are captured by the variables diphse, dippue and diphve1. The variable diphse is 1 if the subject has a ‘HSE’ diploma and dippue is 1 if the subject has a ‘PUE’ diploma. We expected all subjects to have a secondary school diploma, but it turns out that some subjects answered all the diploma questions with ‘no’. A possible reason is that they attended a level with another name, or that they went to secondary school abroad and were not able to pick the corresponding level in the Netherlands.


Table 7: Descriptive statistics extra variables education levels

| Variables (N=53,582) | Mean | SD | Min | Max |
|---|---|---|---|---|
| Secondary school diplomas | | | | |
| diphse | .1746295 | .3796535 | 0 | 1 |
| dippue | .7605726 | .426738 | 0 | 1 |

4.4.2 Secondary school process

It is reasonable to include some variables that give information about the secondary school process. Therefore, we included the variables numbeta, profile cs, profile es, profile nh and profile nt. The variable numbeta gives the number of beta courses the subject took in secondary school. This value lies between 0 and 5 courses, with an average of 2.56 beta courses. The variables profile cs, profile es, profile nh and profile nt are 1 if the subject followed that profile in secondary school. The first profile is ‘Culture and Society’, which includes languages, history and art courses. The second profile is ‘Economics and Society’. This is virtually the same as the first profile, but with economics as a mandatory course, whereas in the first profile economics is optional. These two profiles are alpha and gamma profiles. The third profile is ‘Nature and Health’, which includes biology, mathematics and physics. The last profile is ‘Nature and Technique’, which consists only of exact sciences and is the most technical profile. These last two profiles are health and beta profiles. In this data set only 23.58% of the subjects followed at least one of the four profiles. This is due to the fact that this system was introduced in school year 1998-1999 (Spijkerboer, Maslowski, Keuning, van der Werf, & Béguin, 2012), and some subjects attended secondary school before this school year. It is also possible that a subject took more than one profile; a frequent combination is ‘Nature and Health’ and ‘Nature and Technique’.


Table 8: Descriptive statistics secondary school process

| Variables (N=53,582) | Mean | SD | Min | Max |
|---|---|---|---|---|
| School characteristics | | | | |
| numbeta | 2.555398 | 1.378013 | 0 | 5 |
| profile cs | .0515658 | .2211509 | 0 | 1 |
| profile es | .0646486 | .245907 | 0 | 1 |
| profile nh | .0848233 | .2786211 | 0 | 1 |
| profile nt | .0568848 | .2316245 | 0 | 1 |

4.4.3 Personal factors

The last variables are again person related. We again included the variables ethnicbzk, mothercnt, man, north, east and south. In addition, we included the variables educparents4 and educparents5. The first variable is 1 if the highest education level of the parents of the subject is ‘HVE’. This is the case for 31.80% of the subjects. The second variable is 1 if the highest education level of the parents is university, which is the case for 29.31% of the subjects.

Table 9: Descriptive statistics extra variables personal factors

| Variables (N=53,582) | Mean | SD | Min | Max |
|---|---|---|---|---|
| Education level parents | | | | |
| educparents4 | .3180359 | .4657179 | 0 | 1 |


5 Results

In this section the estimation results are discussed. We distinguish four study categories, ‘alpha’, ‘beta’, ‘gamma’ and ‘health’, where ‘alpha’ is the reference category. From the estimation of the model we get estimates for the parameters of the linear wage equation and the multinomial logit model. The main estimates are $\hat{\alpha}_1$, $\hat{\alpha}_2$ and $\hat{\alpha}_3$, which correspond to the estimates of the effect of the study choice on the log wages. The other estimates are the estimates of the $\beta$'s corresponding to the other explanatory variables for the wage equation. For the multinomial logit model we get estimates for the three different alternatives in comparison to the reference category.

In this section we also compare the estimation results from Ordinary Least Squares (OLS) for the linear wage equation and the multinomial logit (MNL) estimation for the study choice equation with the estimation results from our model where we accounted for the endogeneity problem.

In subsection 5.1 the estimates for the wage equation are discussed, the estimates for the multinomial logit model are discussed in subsection 5.2 and finally subsection 5.3 discusses the estimation of the combinations of the constant terms and their probabilities.

5.1 Estimation results for the wage equation

In this subsection the estimation results of the wage equation are discussed. We first estimated the linear model with Ordinary Least Squares (OLS), which probably gives biased estimates because of the endogeneity problem. We therefore also estimated the model with the method discussed in section 3. We included a constant term in the OLS estimation, because our model also has a constant term. The results from both estimations are given in Table 10.


Table 10: Estimation results wage equation

| Variable | OLS | Heckman & Singer |
|---|---|---|
| beta | .0501*** (10.49) | .0322*** (5.28) |
| gamma | .1095*** (23.72) | .0981*** (19.01) |
| health | .1770*** (25.85) | .1617*** (21.10) |
| diphve1 | .0094 (1.56) | .0055 (.75) |
| dipunibach | -.0451*** (-9.12) | -.0339*** (-6.27) |
| dipunimas | -.0244*** (-5.06) | -.0165*** (-3.03) |
| dippostdoc | -.0120 (-1.42) | -.0119 (-1.12) |
| examavr | -.0129*** (-4.08) | -.0143** (-2.07) |
| shortprog | .0431*** (-4.08) | .0385*** (3.87) |
| gradage | .0010*** (12.80) | .0111 (1.28) |
| research | -.1385*** (-11.46) | -.1449*** (-9.91) |
| ECTS tot | .0010*** (14.76) | .0009*** (4.87) |
| studabrd | .0030 (.74) | .0036 (.83) |
| studasso1 | .0123*** (3.01) | .0121*** (2.44) |
| studasso2 | .0316*** (7.21) | .0280*** (4.00) |
| managexp | .0590*** (17.22) | .0548*** (9.9161) |
| workexp | -.0048 (-1.30) | -.0057 (-1.16) |
| internexp | -.0222*** (-6.21) | -.0180*** (-2.61) |
| studjobexp | .0098*** (2.81) | .0040 (1.03) |
| studfldexp | .0591*** (16.51) | .0518*** (12.43) |
| ethnic | .0040 (.51) | .0093 (.91) |
| ethnicbzk | -.0023 (-.42) | .0034 (.52) |
| mothercnt | .0052 (.65) | .0030 (.31) |
| man | .1153*** (32.35) | .1119*** (30.19) |
| north | -.0924*** (-13.79) | -.0886*** (-9.12) |
| east | -.0376*** (-8.24) | -.0356*** (-5.50) |
| south | .0013 (.28) | .0001 (.01) |
| partner03 | .0427*** (11.86) | .0356*** (9.58) |
| child | -.0403*** (-4.94) | -.0294*** (-2.85) |
| constant | 7.281792*** (213.04) | |
| θ1 | | 5.6388*** (143.49) |
| θ2 | | 7.3157*** (206.98) |

Note: t-statistics in parentheses; coefficient is significant at a *** 1%, ** 5% or * 10% level of significance

In Table 10 we can see the difference between the OLS estimates and the estimates from our method. For most of the variables the estimates are close to each other. The most remarkable differences are for the variables gradage and studjobexp. In the OLS estimation these coefficients are significant even at the 1% level, but in our estimation they are insignificant at all conventional significance levels. This suggests that the OLS estimation overstates the significance of the age at graduation and of work experience in a typical student job.

5.1.1 The interpretation of the wage determinants

The interpretation of the estimates is now discussed. The model we used is a log-level model, and therefore a 1 unit change in an explanatory variable leads to a change of approximately β · 100% in the wage.
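The log-level reading of a coefficient is a first-order approximation. As a brief worked step, stated here in generic notation ($w$ for the wage, $x$ for one explanatory variable and $\beta$ for its coefficient, which are assumed symbols rather than the exact notation of section 3), the exact and approximate percentage effects of a 1 unit change are:

```latex
\[
\ln w = \dots + \beta x
\quad\Longrightarrow\quad
\frac{w(x+1) - w(x)}{w(x)} \cdot 100\% = \bigl(e^{\beta} - 1\bigr)\cdot 100\% \;\approx\; \beta \cdot 100\%
\quad \text{for small } |\beta| .
\]
```

For small coefficients the two readings nearly coincide. For the largest study category coefficient in Table 10, health with $\beta = .1617$, the exact effect is $(e^{.1617} - 1)\cdot 100\% \approx 17.6\%$, against 16.17% under the approximation.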

The main parameters of interest are all significant. The wage of a subject who did a study in the category beta is 3.22% higher, for gamma 9.81% higher and for health 16.17% higher than the wage of a subject who did a study in the category alpha. This shows that subjects with a health study have the highest wage and subjects with an alpha study the lowest. This supports our expectation that the study choice has a significant influence on the wage.

We found that not all the higher education diplomas have an influence on the wage. The variables diphve1 and dippostdoc do not have a significant influence on the wage. The variables dipunibach and dipunimas have a significant effect. The wage of subjects with a university bachelor degree is 3.39% lower than the wage of subjects who do not have a university bachelor degree. The same holds for subjects with a university master degree, whose wage is 1.65% lower. In the sample everyone has an old university master degree, so the negative sign of dipunibach means that the wage is higher if the subject obtained that master degree without doing a university bachelor. The only way to get a master degree without a university bachelor degree is with a HVE diploma. An explanation for the higher wage with a HVE bachelor instead of a university bachelor can be that a HVE study is more practical, so that these subjects have more experience and are therefore more useful for a company. The negative sign of dipunimas indicates that having a new master degree as well has a negative influence on the wage. This result indicates that it has a negative effect on the wage if the subject has more than one master degree.

Except for the age at graduation, all the variables related to the study process are significant. The age at graduation does not have a significant influence on the wage. If the exam average of a subject increases by 1 point, the wage decreases by 1.43%. The reason for this negative effect can be that subjects with higher grades choose jobs with a lower wage. An example is doing a PhD after the study, which pays a relatively low wage. Subjects who did a shorter program earn 3.85% more than subjects who did a normal program. Having a research master degree affects the wage negatively: it decreases the wage by 14.49%. This difference is large, so in monetary terms it does not pay off to do a research master. For every additional ECTS in the program the wage increases by .09%, which amounts to roughly 5.4% for an additional study year of 60 ECTS.
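Because ECTS tot is measured in credits rather than years, the size of its effect depends on the unit in which the change is expressed. The Python sketch below converts a log-level coefficient into a percentage effect for a change of a given size, using the estimates from Table 10. The function name `percent_effect` is hypothetical, and the `exact` option uses the exponential form instead of the linear approximation.

```python
import math


def percent_effect(beta, delta=1.0, exact=False):
    """Percentage change in the wage for a change `delta` in a regressor
    of a log-level model with coefficient `beta`."""
    if exact:
        return 100.0 * (math.exp(beta * delta) - 1.0)
    return 100.0 * beta * delta


ECTS_COEF = 0.0009      # estimate for ECTS tot from our method, Table 10
ECTS_PER_YEAR = 60      # one study year in the Netherlands
RESEARCH_COEF = -0.1449  # estimate for research from our method, Table 10

per_ects = percent_effect(ECTS_COEF)                 # effect of one extra ECTS
per_year = percent_effect(ECTS_COEF, ECTS_PER_YEAR)  # effect of one extra year
research_exact = percent_effect(RESEARCH_COEF, exact=True)
```

Under the linear approximation one extra ECTS corresponds to .09% and one extra study year of 60 ECTS to about 5.4%. For the research master the exact effect is about -13.5%, slightly smaller in absolute value than the approximate -14.49%.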

For the variables related to the activities next to the study, we found that studying abroad does not have a significant influence on the wage. Having been a member of a study association or a student association has a positive influence on the wage: the wage increases by 1.21% and 2.80%, respectively. Having management experience increases the wage by 5.48%. These positive effects may be explained by the personal growth gained by being part of an association or a board. Working experience is only an important determinant of the wage when it is related to the study. The variables for working experience outside the field of the study and for typical student jobs do not have a significant effect on the wage. On the other hand, experience from an internship decreases the wage by 1.80% and experience in the field of the study increases the wage by 5.18%. The negative effect of the internship can be due to the fact that subjects do not search for another job but simply accept the job offered by the company where they did the internship. They might therefore accept a lower wage than the wage they would accept after searching for a job.

The variables which indicate whether someone has a foreign background are all insignificant. This indicates that there are no significant wage differences between foreign (looking) subjects and Dutch subjects. If the subject is a man, the wage is 11.19% higher. People living in the North and East of the Netherlands earn less than people living in the West. The wages of subjects living in the South do not differ significantly from those of subjects living in the West. These regional differences can be due to the fact that many of the big companies are located in the South and West of the country. Living together with a partner increases the wage by 3.56% and having children decreases the wage by 2.94%.

5.2 Estimation results for the study choice equations

In this subsection the estimation results from the study choice equations are briefly discussed. We discuss these estimates only briefly because our main interests do not lie with this estimation. Again we compare the estimation results without the connection with the wage equation with the estimation results from our method. In Table 11 the estimation results for the three different categories are given for the multinomial logit estimation and our estimation method.

Table 11: Estimation results study choice equation

| Variable | beta MNL | beta H&S | gamma MNL | gamma H&S | health MNL | health H&S |
|---|---|---|---|---|---|---|
| diphse | -.3743*** (-9.11) | -.3914*** (-10.70) | -.2134*** (-5.89) | -.2301*** (-5.65) | -.0530 (-1.00) | -.0453 (-.92) |
| dippue | .0374 | .0502* | .0625* | .1114*** | .1344*** | .1880*** |
| diphve1 | .2689*** (6.29) | .2333*** (5.96) | .4529*** (12.33) | .5527*** (11.62) | -.0211 (-.36) | .0812 (1.59) |
| numbeta | 1.1043*** (84.97) | 1.1132*** (81.05) | .2972*** (25.76) | .3319*** (21.23) | 1.5062*** (83.77) | 1.5755*** (80.69) |
| profile cs | -1.4193*** (-13.59) | -1.4205*** (-14.74) | -.8857*** (-16.15) | -2.0913*** (-10.51) | -1.8421*** (-9.79) | -3.0786*** (-11.70) |
| profile es | -.3796*** (-6.11) | -.3851*** (-6.52) | -.1015** (-2.40) | -.1321** (-2.1416) | -.6234*** (-5.05) | -.6593*** (-5.81) |
| profile nh | -.8118*** (-15.01) | -.7964*** (-15.97) | -.8330*** (-12.09) | -.8950*** (-14.12) | -.1540*** (-2.87) | -.1642*** (-3.28) |
| profile nt | 1.5671*** (19.50) | 1.5706*** (20.35) | .4418*** (4.52) | -17.3701*** (-54.9053) | .0548 (.55) | -17.9081*** (-58.66) |
| man | 1.3000*** (46.65) | 1.3871*** (51.34) | .5767*** (22.46) | .5299*** (17.27) | -.3707*** (-10.47) | -.5756*** (-12.94) |
| ethnicbzk | -.0220 (-.50) | -.0097 (-.25) | -.0682* (-1.69) | -.0896* (-1.92) | .0240 (.44) | .0049 (.08) |
| mothercnt | -.3498*** (-5.60) | -.3768*** (-8.84) | -.0726 (-1.20) | -.0558 (-1.27) | -.1517* (-1.94) | -.1098*** (-2.46) |
| north | -.1072* (-1.93) | -.0926* (-1.77) | -.2256*** (-4.35) | -.2914*** (-3.85) | .2129*** (3.26) | .1725** (2.14) |
| east | .1024*** (2.77) | .1003*** (2.56) | -.2088*** (-5.76) | -.3059*** (-6.66) | .0998** (2.19) | .0342 (.59) |
| south | .2317*** (5.90) | .2418*** (6.06) | .3427*** (9.72) | .4353*** (10.79) | .4603*** (9.62) | .5429*** (9.94) |
| educparents4 | -.0120 (-.37) | -.0003 (-.0123) | -.0479 (-1.63) | -.0737** (-2.20) | -.1194*** (-2.95) | -.1432*** (-4.01) |
| educparents5 | -.0514 (-1.54) | -.0475 (-1.51) | -.0841*** (-2.69) | -.0892** (-2.21) | .0385 (.94) | .0406 (.77) |
| constant | -3.1895*** (-39.96) | | -1.1351*** (-15.69) | | -4.8820*** (-45.65) | |
| η1 | | -3.2436*** (-56.59) | | -1.8095*** (-40.57) | | -5.6294*** (-47.90) |
| η2 | | 2.9212*** (8.14) | | 33.1896*** (98.31) | | 29.3726*** (85.74) |

Note: t-statistics in parentheses; coefficient is significant at a *** 1%, ** 5% or * 10% level of significance

For most of the variables the differences between the MNL estimation and our method are small. The most remarkable differences are for the variables profile cs and profile nt. For the categories gamma and health the coefficients of these variables are much more negative in our estimation. For the category health the coefficient of profile nt is insignificant in the MNL estimation and highly significant in our estimation. For these variables, the MNL estimation thus understates the size of the effects for some study categories.

The other differences are for the variables dippue, mothercnt, north, east, educparents4 and educparents5. For these variables the differences are all in the significance of the coefficients. For the category beta the coefficient of dippue is not significant in the MNL estimation but is significant at the 10% level in our estimation. Something similar holds for the category gamma, where this coefficient is significant at the 1% level in our estimation but only at the 10% level in the MNL estimation. For the category health the coefficient of mothercnt is significant at the 10% level in the MNL estimation and at the 1% level in our estimation. For the category gamma the coefficient of educparents4 is insignificant in the MNL estimation and significant at the 5% level in our estimation.

For the variables north, east and educparents5 the opposite holds: for some study categories the coefficients of these variables are more significant in the MNL estimation than in our estimation. For the category health the coefficient of north is significant at the 1% level in the MNL estimation and at the 5% level in our estimation. For the same category, the coefficient of east is significant at the 5% level in the MNL estimation but not significant in our estimation. For the category gamma the coefficient of educparents5 is significant at the 1% level in the MNL estimation and at the 5% level in our estimation. This means that the MNL estimation overstates the significance of these variables.

5.2.1 The interpretation of the study choice determinants

The estimation results for the study choice equation are now briefly discussed. All the results are in comparison with the reference category alpha. A subject with a HSE diploma is equally likely to end up in a health or an alpha study at the university and less likely to end up in a beta or gamma study. A subject with a PUE diploma is equally likely to pick an alpha or beta study but more likely to pick a gamma or health study. A subject with a HVE diploma is equally likely to pick a health or alpha study but more likely to pick a beta or gamma study.

For the variables related to the secondary school process, we found that almost all the variables have a significant effect on the probability of choosing a given study. A 1 unit increase in the number of beta courses in secondary school increases the probability of choosing a beta, gamma or health study instead of an alpha study. The biggest increase is for the category health. The profiles also have significant effects. Subjects who had the Culture and Society profile are less likely to choose a beta, gamma or health study instead of an alpha study. Subjects who had the Economics and Society profile are less likely to choose a beta, gamma or health study, and the same holds for the Nature and Health profile. Subjects who had the Nature and Technique profile are less likely to choose a gamma or health study but more likely to choose a beta study, in comparison to choosing an alpha study.
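The size of the multinomial logit coefficients is easiest to read as an odds ratio relative to the reference category. As a hedged illustration, the Python sketch below exponentiates two coefficients from Table 11 (our method). The function name `odds_ratio` is hypothetical, and the interpretation holds all other explanatory variables and the unobserved constant term fixed.

```python
import math


def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds of a category versus 'alpha'
    for a change `delta` in the explanatory variable."""
    return math.exp(coefficient * delta)


# Coefficients from Table 11, our method (H&S columns).
numbeta_beta = odds_ratio(1.1132)  # one extra beta course, beta versus alpha
man_health = odds_ratio(-0.5756)   # man versus woman, health versus alpha
```

One extra beta course in secondary school multiplies the odds of a beta study relative to an alpha study by roughly 3, while for men the odds of a health study relative to an alpha study are a little under 0.6 times those for women.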

Male subjects are more likely to pick a beta or gamma study instead of an alpha study and less likely to pick a health study instead of an alpha study. People born in the Netherlands are less likely to pick a beta or health study than an alpha or gamma study. People living in the North are more likely to pick a health study, less likely to pick a gamma study and equally likely to pick a beta study. In the East, people are more likely to pick a beta study, less likely to pick a gamma study and equally likely to pick a health study. In the South, people are more likely to pick a beta, gamma or health study instead of an alpha study. The last variables are based on the education level of the pupils' parents. If the highest education level of the parents is HVE, pupils are less likely to pick a gamma or health study and equally likely to pick an alpha or beta study. If the parents' highest education level is university, pupils are only less likely to pick a gamma study instead of an alpha study.

5.3 Estimation results for the constant terms combinations

In this section the estimation results for the constant terms combinations are discussed. The estimates of the constant terms were already given in Table 10 and Table 11. All the terms are significant, so we have four combinations, and every combination is given for the three study categories in comparison with the alpha category. These combinations and their probabilities are given in Table 12.

Table 12: Estimation results constant terms combinations

Type     Combination   beta            gamma           health          Probability
Type 1   (θ1, η1)      (5.64, -3.24)   (5.64, -1.81)   (5.64, -5.63)   .0130*** (20.3316)
Type 2   (θ1, η2)      (5.64, 2.92)    (5.64, 33.19)   (5.64, 29.37)   .0011*** (188.1097)
Type 3   (θ2, η1)      (7.32, -3.24)   (7.32, -1.81)   (7.32, -5.63)   .8357*** (4.0814)
Type 4   (θ2, η2)      (7.32, 2.92)    (7.32, 33.19)   (7.32, 29.37)   .1503 (.0751)

Note: t-statistics of the probabilities in parentheses; a probability is significant at the *** 1%, ** 5% or * 10% level of significance.


5.3.1 The interpretation of the constant terms combinations

For this data set we can distinguish four types of individuals. The first type has the low, positive θ and a negative η; 1.30% of the subjects belong to this type. The second type has the low, positive θ and a positive η; 0.11% of the subjects belong to this type. The third type has the high θ and a negative η; 83.57% of the subjects belong to this type. The last type has the high θ and a positive η; 15.03% of the subjects belong to this type. The probability of this last type is, however, not significant. The estimates of the η's differ per study category, but their signs are the same.
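As a rough check on these shares, the following minimal Python sketch sums the type probabilities of Table 12 and collapses them into marginal shares for the two θ values and the two η values. It is not part of the original estimation; the dictionary layout and the helper name `marginal` are illustrative assumptions, and only the four probabilities are taken from the table.

```python
# Type probabilities copied from Table 12; the labels and layout are illustrative.
type_probs = {
    ("theta1", "eta1"): 0.0130,  # type 1
    ("theta1", "eta2"): 0.0011,  # type 2
    ("theta2", "eta1"): 0.8357,  # type 3
    ("theta2", "eta2"): 0.1503,  # type 4
}


def marginal(probs, position, label):
    """Sum the probabilities of all types whose mass point at `position` is `label`."""
    return sum(p for combo, p in probs.items() if combo[position] == label)


total = sum(type_probs.values())
p_theta_low = marginal(type_probs, 0, "theta1")   # types 1 and 2
p_theta_high = marginal(type_probs, 0, "theta2")  # types 3 and 4
p_eta_neg = marginal(type_probs, 1, "eta1")       # types 1 and 3
p_eta_pos = marginal(type_probs, 1, "eta2")       # types 2 and 4

print(f"sum of type probabilities: {total:.4f}")
print(f"share with low theta:      {p_theta_low:.4f}")
print(f"share with high theta:     {p_theta_high:.4f}")
print(f"share with negative eta:   {p_eta_neg:.4f}")
print(f"share with positive eta:   {p_eta_pos:.4f}")
```

The reported probabilities add up to one apart from rounding, and the two low-θ types together account for about 1.41% of the subjects.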

Most of our subjects belong to the third type, the type with the high wage and the biggest probability of choosing an alpha study. All the η's are negative, so subjects who belong to this group are less likely to choose a beta, gamma or health study than an alpha study. Individuals of this type thus accept a high wage and are most likely to choose an alpha study. The second biggest group is type 4. This type also has the high wage but the smallest probability of choosing an alpha study. For this type all the η's are positive, so these subjects are more likely to pick anything but an alpha study. From the estimates we can conclude that these subjects are most likely to pick a gamma study. Individuals of this type accept a high wage and are less likely to pick an alpha study. The probability of this type is insignificant, which means that we cannot be sure that there are subjects in this sample who belong to this type. We can conclude that the biggest part of the individuals in this sample belongs to the types with the high wage. A small part of the subjects belongs to the types with the lower θ. Subjects who belong to type 1 accept a low wage and are most likely to pick an alpha study. The smallest group of subjects is type 2. The subjects who belong to this type accept a low wage and are most likely to pick a gamma study. Almost nobody belongs to this type, so this combination almost never occurs in this sample.
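The ranking of the study categories per type can be illustrated with a minimal Python sketch. It assumes that the study choice equation has a multinomial logit form with alpha as the reference category (index fixed at 0) and that all covariates are set to zero, so that only the η constants of Table 12 enter. Both are simplifying assumptions made for illustration, and the function names are hypothetical, so the resulting probabilities are not estimates from the thesis.

```python
import math

STUDIES = ["alpha", "beta", "gamma", "health"]

# eta mass points from Table 12; the reference category alpha gets index 0.
ETA = {
    "eta1": [0.0, -3.24, -1.81, -5.63],  # types 1 and 3
    "eta2": [0.0, 2.92, 33.19, 29.37],   # types 2 and 4
}


def choice_probabilities(index_values):
    """Multinomial-logit probabilities for a list of linear-index values."""
    peak = max(index_values)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in index_values]
    total = sum(weights)
    return [w / total for w in weights]


def ranking(eta_label):
    """Study categories with their probabilities, ordered from most to least likely."""
    probs = choice_probabilities(ETA[eta_label])
    return sorted(zip(STUDIES, probs), key=lambda pair: pair[1], reverse=True)


for label in ETA:
    ordered = ranking(label)
    print(label, [(name, round(p, 4)) for name, p in ordered])
```

Under these assumptions the negative η's put alpha first and health last, while the positive η's put gamma first and alpha last, which matches the description of the four types.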


6 Conclusion

The aim of this thesis was to find the determinants of the wage and especially to find the effect of the study choice on the wage. We estimated these determinants with the discrete factor approach and found some interesting results.

The estimated coefficients of the study categories were all significant. Therefore we can conclude that the study choice has an influence on the wage. This effect is not the same for every study category. Picking a health study instead of an alpha study increases the wage the most, namely by 16.17%. The smallest increase occurs when the subject picks a beta study instead of an alpha study, namely 3.22%. Choosing a gamma study instead of an alpha study increases the wage by 9.81%. The conclusion is that picking a health study pays off the most and picking an alpha study pays off the least.

Another important determinant of the wage is the average exam grade, which has a negative effect. This is against expectations: it is strange that a higher grade leads to a lower wage. Doing a shorter program and doing a special research master also have an influence on the wage. The effect of the shorter program is positive and the effect of doing a research master is negative. This indicates that jobs after a research master have lower wages than other jobs. The number of ECTS of the study has a positive effect on the wage.

For activities next to the study, being a member of a study and student association has a positive effect on the wage. The same holds for management experience. The experience of an internship has a negative effect on the wage and working experience in the field of the study has a positive effect on the wage. Other working experience does not have a significant influence on the wage.

The variables related to ethnicity do not have an influence on the wage, but being a man increases the wage by 11.19%. The wages of people living in the South and West of the Netherlands are the same, and the wages of people living in the North and East are lower. Living together with a partner increases the wage and having children decreases the wage.

Furthermore, we have found variables which have an influence on the study choice. The most important variables are the number of beta courses and the profile the subject had in secondary school. These variables have a huge influence on the study choice. There are also personal variables which have an influence on the study choice, e.g. region of living and the education level of both parents.

Next to finding the determinants of the wage, another important aim of this research was to deal with the endogeneity which is due to the correlation between the unobserved heterogeneity in the wage equation and the study choice equation. By using the estimation method of Heckman and Singer we allow for four distinct types of individuals in our sample. We estimated the combinations (θ1, η1), (θ1, η2), (θ2, η1), (θ2, η2) and found estimates for these parameters. We found θ1 = 5.64 and θ2 = 7.32, and for the η's we got different estimates per study category with the reference category alpha. We found that η1 is negative for beta, gamma and health and η2 is positive for beta, gamma and health. Because of this we can generalize these results. The differences between the types are due to the fact that not all individuals have the same personal characteristics or circumstances. The four different types of individuals we found are: type 1 accepts a low wage and is most likely to pick an alpha study and least likely to pick a health study; type 2 accepts a low wage and is least likely to choose an alpha study and most likely to pick a gamma study; type 3 accepts a high wage and is most likely to pick an alpha study and least likely to pick a health study; and type 4 accepts a high wage and is most likely to pick a gamma study and less likely to pick an alpha study. In this data set 1.30% of the individuals belong to type 1, 0.11% belong to type 2, 83.57% belong to type 3 and 15.03% belong to type 4. The probability of the last type is insignificant and therefore we cannot be sure that there are subjects in this data set who belong to this type. The conclusion we can draw from this is that almost all the subjects belong to the type which accepts a high wage.

By estimating the terms for the unobserved heterogeneity together with the other determinants we tried to solve the endogeneity problem in the model. We found
