MASTER THESIS Gender Grading Bias in The Final Grade of Dutch Primary Education

Academic year: 2021


MASTER THESIS

Gender Grading Bias in The Final Grade of Dutch Primary Education

Abstract

This paper investigates gender grading bias in the final grade of Dutch primary education. The grading bias is measured as the difference between students’ scores on two mathematics tests, one graded by the student's own teacher and one graded by a machine. Then, the association between this grading bias and gender is investigated. The results suggest the existence of a gender grading bias favoring girls.

Keywords: Teacher bias, grading bias, gender, socioeconomic status

Author:

S. Peer | 2390760

MSc Economics


INTRODUCTION

In the Netherlands, women have consistently outnumbered men in universities during the past decade (CBS, 2019). Surprisingly, the share of women studying Science, Technology, Engineering, and Mathematics (STEM) has only been around 25% for the past ten years (CBS, 2019). The Netherlands is not the only country where this share has been this low.

Paradoxically, in more gender-equal countries, the number of women among STEM-related graduates is lower than in less gender-equal countries (Stoet and Geary, 2018). This is striking, since one would expect women to get more educational opportunities in more gender-equal countries, and the ratio of men to women in STEM-related degrees to be more balanced in these societies as a result.

The gender wage gap has been a widely debated topic lately. Interestingly, this gap could be diminished by increasing the participation rate of women in STEM-related studies, since STEM-related test scores are good predictors of expected income (Murnane et al., 1995). Yet, the question remains what the reason for this gender gap among STEM-related degrees is.

A paper by Lavy and Sand (2018) looking into the long-run implications of teacher biases may partially explain this gap. Teacher bias consists of conscious and unconscious behavior of a teacher that affects student outcomes. The biased behavior of teachers at an early stage in a student's education influences the test scores in advanced mathematics and science courses in secondary education. These test scores are a prerequisite for post-secondary education and thus affect the frequency at which students enroll in high-quality post-secondary education (Lavy and Sand, 2018).

mathematics. This may lead to a self-fulfilling prophecy as lower self-confidence leads to lower grades in mathematics, maintaining the gender gap in mathematics (Carlana, 2019).

It can, therefore, be seen that teacher biases against women may be a driver of the gender gap in mathematics performance. Overcoming these biases may help to partly alleviate this mathematics gender gap (Lavy and Sand, 2018). Accordingly, this may result in a more balanced percentage of men and women studying STEM-related degrees. This may in turn help to reduce the gender wage gap, because STEM-related degrees generally lead to higher-paid jobs (Murnane et al., 1995).

This paper investigates the presence of teacher bias in Dutch primary education. However, it will not dive into teacher stereotypes. A grading bias will be investigated by comparing two tests. One test is a non-blind mathematics test, which is graded by the teacher of the student. The other test is a blind test, which is graded by a machine. The difference between the scores on these two tests is assumed to be the grading bias in this paper. The relationship between this grading bias and gender is examined using a linear regression, where the socioeconomic status of the parents and the relationship between the student and the teacher are used as control variables. This relationship will be investigated in different contexts. The dataset contains four different school types, and the relationship between gender and grading bias will be viewed separately for each school type. The same will be done for the urbanization degrees of the schools. Furthermore, the difference between schools with one class and schools with two or more classes is investigated.

Measuring the systematic difference between two tests (blind vs. non-blind) across groups to say something about stereotypes or discrimination in economics was introduced by Blank (1991). Lavy (2008) was the first to apply this method to the economics of education, measuring the gender grading bias of teachers. This paper also claims to identify a gender grading bias in Dutch primary education, using a simplified version of the method used by Lavy (2008). My paper adds to the existing research stream by examining the relationship between gender and grading bias using a linear regression. By using a different methodology, new insights have been obtained that contribute to the knowledge of both scholars and practitioners.

The data allow me to investigate the relationship between grading bias and gender, controlled for socioeconomic status, in the Dutch setting. The grading bias will be investigated by looking at test scores from 2014 for students in the final grade of primary education. There are several primary school types in this dataset, and the relationship between grading bias and gender will be compared between those school types. The dataset also contains information on the degree of urbanization of the area in which the school is located. Moreover, information on the relationship between the student and the teacher was used.

The results suggest that there is a grading bias in the final grade of primary school in the Netherlands. Namely, there is a relationship between grading bias and gender. Girls tend to score significantly higher on the test graded by their own teacher. This gives reason to believe there is a positive grading bias for girls in the Netherlands during the final grade of primary school.

Therefore, this paper contributes to the existing literature on teacher bias by expanding the knowledge on grading bias in the Dutch context. Moreover, it presents striking results regarding the relationship between school characteristics and gender grading bias.

LITERATURE

also negatively influenced by discrimination. Another form of discrimination women experience regarding their mathematical skills was shown in an experiment. During this experiment, men and women performed an arithmetic task on which men and women generally perform equally well. The results show that when gender was the only information given, men were hired twice as often, by both men and women (Reuben et al., 2014).

This mathematics gender stereotype is already present at a young age. American second-grade students displayed the stereotype that mathematics is for boys. Moreover, boys in primary school identified more strongly with mathematics, suggesting that self-concepts regarding mathematics develop at an early stage, before there are actual disparities in mathematics performance (Cvencek et al., 2011). Fryer and Levitt (2010) report that there is no gender difference in the average mathematics score upon school entry. However, in the first six years of school girls lose 25% of a standard deviation relative to boys.

The consequences of these biases have also been examined extensively, and the literature reports opposing findings. For instance, when teachers favor girls in their grading, this influences students in the medium run, with boys showing less progress than girls (Terrier, 2016). Girls, on the other hand, profited from the gender bias in mathematics and enrolled in a science track more frequently in secondary school. Apperson et al. (2016) study the effect of grading bias on students’ achievement. They conclude that in the year after students received higher grades than they deserved, their test scores were lower than expected, suggesting that grading bias negatively influences future achievement. For mathematics test scores, these effects lasted only one year. Contrasting results were also found. A paper on grading bias on a high-stakes test in New York found that grading bias may have positive as well as negative effects. For students who would have dropped out of school had they not received the manipulated test results, the probability of graduating increased by 17%. Yet, some students preferred less advanced coursework after benefiting from the grading bias (Dee et al., 2016).

at which students enroll in high-quality post-secondary education. Therefore, the earnings during adulthood are affected by teacher bias (Lavy and Sand, 2018).

This difference in STEM-related fields due to teacher bias has been reviewed many times. For example, Ertl et al. (2017) claim that stereotypes corrupt the view of students concerning STEM-related subjects. A reason for this may be that STEM-related studies are seen as studies that require good grades in combination with an extremely high level of talent. Women with good grades may think they are not gifted enough to enroll in a STEM subject that is generally associated with men.

Female role models may be part of the solution to the STEM gender gap (Breda et al., 2018). An experimental study was done with female role models with a science background. The researchers looked at the influence of the role models on the attitude of students towards science-related careers. The results show that the number of girls enrolled in a STEM track increased by 30%. Three reasons were mentioned to explain this result. First, the interest in STEM-related careers increased. Second, the intervention reduced the stereotypes associated with STEM-related jobs. Third, the intervention improved the girls’ self-concept.

According to Card and Payne (2017), the STEM gender gap depends on some crucial differences between male and female students. One of these is that average grades in mathematics and science are about the same for men and women, whereas female students obtain higher grades in other courses. Women therefore have a comparative advantage in these other tracks. Moreover, this paper looked at students in Ontario, where pre-university schools tend to specialize early. So, to influence the ratio of women specializing in STEM-related fields, significant changes at the start of high school are necessary.


EMPIRICAL STRATEGY

Conceptual framework

In this paper two tests will be compared. One test is a blind test and the other is a non-blind test. A non-blind test is graded by the teacher of the students, while a blind test is not. In this case, the mathematics part of the Cito test is graded by a machine and is therefore the blind test in this paper. The regular mathematics test is the non-blind test in this research because it is graded by the teacher of the student.

The test outcomes used in this paper are based on the educational production function introduced by Bowles (1970). This author argues that a test score depends on several variables. For example, the school environment may influence the test score of a student. Moreover, environmental influences, like parents’ education, are considered to affect student outcomes. Student ability is also a variable in this function. This paper adds the grading bias to the function for the non-blind test, because teachers grade this test and are therefore able to influence the grade. For the test scores in this paper, the educational production functions are as follows, where equation (1) describes the regular mathematics test score and equation (2) the Cito test score of student i, S is the schooling environment, E the environmental influences, A the student's ability, and G the grading bias:

$$S^{M}_{i} = f(S, E, A) + G \qquad (1)$$

$$S^{C}_{i} = f(S, E, A) \qquad (2)$$

These educational production functions will be used to explain how the grading bias in this paper is constructed. The schooling environment does not differ much between the two tests, since they are taken a few months apart and in the same grade. Therefore, the students have the same teacher and are in the same class during both tests. Environmental influences on the student are also the same during both tests. For instance, the education of the parents does not change between the two tests.

the results of this paper. Another difference between the two tests is that the Cito test is a high-stakes test. This may influence how seriously the students take the test, which may lead to the following problem. If girls take both tests with the same amount of concentration, their scores will not differ much between the two tests. If boys, however, take the regular mathematics test less seriously than the Cito test, the gap between their two test scores will be bigger. In this case, boys will score higher on the mathematics part of the Cito test than on the regular mathematics test. This paper would interpret that gap as a grading bias of the teacher, which biases the results. Student stress may have similar consequences. When gender is related to the amount of stress experienced during a high-stakes test, the results of this paper become less reliable, because stress may lower the scores on the high-stakes test.

Another important note on the ability of students is that teacher behavior that indirectly affects student ability is also incorporated in this measure. For example, students may be influenced by how they are treated or taught, as well as by the evaluations of a teacher. These may influence the self-perceptions of a student, leading the student to function differently. However, these differences will influence the outcome of both test scores. Since the difference is calculated by subtracting one test score from the other, those differences will cancel out.

Several assumptions have been made in order to set the educational production functions equal to one another. First of all, the schooling environment and environmental influences have been assumed to be the same. Secondly, the assumption is made that both tests measure the same student ability. This assumption consists of two parts. The first part assumes that the questions on the tests are the same, and the second part states that the students perform the same on both tests.
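The role of these assumptions can be made explicit with a short derivation. It is a sketch only: it writes the regular mathematics test score of equation (1) as S^M_i and the Cito score of equation (2) as S^C_i, and it introduces Δ_i as shorthand for their difference. If the schooling environment, environmental influences and ability enter both functions identically, they cancel and only the grading bias remains:

```latex
\begin{align}
\Delta_i &= S^{M}_{i} - S^{C}_{i} \\
         &= \bigl[f(S, E, A) + G\bigr] - f(S, E, A) \\
         &= G
\end{align}
```

If any of the assumptions fails, for example because students take one test more seriously than the other, the two f(S, E, A) terms no longer cancel and the difference also picks up that discrepancy.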

Model

teacher. Both of these will find their way into the grading bias measure in this paper. This means that a higher bias is found when a student scores higher on the non-blind test compared to the blind test.

The difference between the blind and non-blind test will be regressed on gender using a linear regression. In this regression socioeconomic status of the parents and the relationship between the student and the teacher are used as control variables. The formula for this regression is as follows:

$$\Delta_i = \alpha + \beta_1 \mathit{Boy}_i + \beta_2 \mathit{SES}_i + \beta_3 \mathit{REL}_i + \varepsilon_i$$

In this regression ∆𝑖 denotes the difference between both tests for student i, which in this paper is assumed to be the grading bias. This means that a higher bias is found when a student scores higher on the non-blind test compared to the blind test.

$\mathit{Boy}_i$ is the dummy variable for gender and $\mathit{SES}_i$ is a categorical variable for the socioeconomic status of the parents of the student. $\mathit{REL}_i$ is a continuous variable that describes the relationship between the student and the teacher. $\beta_1$ is the coefficient of interest of this paper. It captures how the difference between the two test scores differs between boys and girls. Since the difference between both tests is assumed to represent grading bias, the coefficient measures the gender grading bias in this sample. If the coefficient is positive, there is a positive grading bias for boys in Dutch primary education: relative to the machine-graded test, boys score higher than girls on a test that is graded by their own teacher. Equivalently, there is a negative grading bias for girls in this sample. The $\beta_2$ coefficient measures the grading bias for the different categories of socioeconomic status of the parents of the student. $\beta_3$ measures whether the relationship between the student and the teacher influences the grading bias.
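To illustrate how the gender coefficient is read, the sketch below fits the regression without control variables on a handful of made-up standardized scores. The data, the variable names and the helper `ols_on_dummy` are hypothetical and are not taken from the thesis. With only an intercept and a 0/1 dummy, the OLS slope equals the difference between the two group means of the score difference, which is the sense in which the coefficient captures the gap between boys and girls.

```python
from statistics import mean


def ols_on_dummy(delta, boy):
    """Closed-form OLS of delta on an intercept and a single 0/1 dummy."""
    n = len(delta)
    mean_delta = mean(delta)
    mean_boy = mean(boy)
    # Slope = covariance(boy, delta) / variance(boy)
    cov = sum((b - mean_boy) * (d - mean_delta) for b, d in zip(boy, delta)) / n
    var = sum((b - mean_boy) ** 2 for b in boy) / n
    beta1 = cov / var
    alpha = mean_delta - beta1 * mean_boy
    return alpha, beta1


# Hypothetical standardized scores for six students (not thesis data).
math_score = [0.5, -0.2, 1.1, 0.0, -0.7, 0.3]   # non-blind test, graded by the teacher
cito_score = [0.6, -0.1, 1.0, -0.2, -0.9, 0.2]  # blind test, graded by a machine
boy = [1, 1, 1, 0, 0, 0]                        # 1 = boy, 0 = girl

# Difference between the non-blind and the blind score for each student.
delta = [m - c for m, c in zip(math_score, cito_score)]

alpha, beta1 = ols_on_dummy(delta, boy)
mean_boys = mean([d for d, b in zip(delta, boy) if b == 1])
mean_girls = mean([d for d, b in zip(delta, boy) if b == 0])

print(f"alpha = {alpha:.3f}, beta1 = {beta1:.3f}")
print(f"difference in group means = {mean_boys - mean_girls:.3f}")
```

Adding the socioeconomic status categories and the relationship variable requires a multiple regression, which this sketch deliberately leaves out.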

INSTITUTIONAL DETAILS

The Dutch education system

primary school students have to take a test and they receive advice from their teacher. This test is called the Cito test and consists of four parts: language, mathematics, study skills, and world orientation. World orientation is not always part of the Cito test because it does not count towards the final Cito score. The advice of the teacher is based on their experience with the student over the whole year. If the Cito test score differs much from the advice of the teacher, there are two possibilities. When the Cito test score is much higher than the advice of the teacher, the Cito test score weighs more heavily than the advice. However, when the Cito test score is much lower than the advice of the teacher, the advice weighs more heavily.

The two tests

This paper uses two tests in order to construct the grading bias measure. These two tests are a regular mathematics test and the mathematics part of the Cito test. The Cito test is a blind test since it is graded by a machine of Cito, the institution that is responsible for the test. The machine is able to check the answers because the students use an answer sheet on which they have to mark one box per question, and the machine only reads this box. The Cito test is also the most common test in Dutch primary education for the 8th grade. In 2014, the year this paper examines, 90% of Dutch students in the 8th grade took the Cito test. Since the blind test, the Cito test, has a strong influence on the future of the student, it is considered a high-stakes test. It is only a low-stakes test when the student underperforms compared to the advice given by the teacher, because then the advice of the teacher weighs more heavily than the Cito test.

the teacher knows which student is being graded. Therefore, this test is the non-blind test. This allows us to see whether there is a difference between the two test scores.

Assumption evaluation

Several assumptions have been made in order to create the grading bias measure for this paper. First of all, environmental influences and the schooling environment have been assumed to be the same. These assumptions are reasonable because the students are in the same class environment during both tests, which are taken only a few months apart. Secondly, the assumption is made that both test scores measure the same mathematical ability. This is a fairly big assumption because the questions in the two tests differ a little. Nonetheless, this assumption had to be made in order to create the grading bias measure in this paper. Because this assumption has been made, the results of this paper should be interpreted with caution. Another set of assumptions has to do with the difference in stakes between the two tests. First, it is assumed that both tests are taken with the same amount of seriousness. Secondly, it is assumed that boys and girls perform the same on high-stakes and low-stakes tests. These factors may influence the grading bias measure of this paper. However, no literature on these factors was found and they cannot be controlled for, so these assumptions had to be made. Therefore, the results of this paper should be interpreted with even more caution.


DATA

The data used in this paper come from Cool5-18, a study that was completed in 2016. It looks at students of primary and secondary school in the Netherlands from the age of 5 to 18. Several times during these years, information on the students was recorded in datasets. This information consists of many variables, for example the religion of the parents of the student, the number of brothers and sisters the student has, and the student's year of birth. This paper uses only a small part of the Cool5-18 dataset, namely the final grade of Dutch primary students in the year 2014.

In order to investigate the relationship between grading bias and gender, five datasets are used. These datasets are combined to compare the blind and non-blind test scores of the students. They are also used to check whether these results hold in different contexts. Namely, the relationship between grading bias and gender is investigated for different degrees of urbanization and for different school types, and schools with only one class are compared to schools with two or more classes. These datasets will also be used to control the relationship between gender and grading bias for the socioeconomic status of the parents and the relationship between the student and the teacher.

socioeconomic status resulted in 271 fewer observations. After these adaptations the dataset contained 9,380 students.

The second dataset contained information on the mathematics test students took. The test score reported in the dataset was a weighted score, so that mathematics scores could be compared over time. Scores on this test therefore range from 41 to 169 in this dataset. These scores were also standardized in order to be able to compare them to the Cito test score. After merging the two datasets, the observations that lacked information on the mathematics test were dropped. This resulted in 9,360 observations being deleted. After merging the datasets, the observations that did not have information on all of the variables were dropped as well, leaving a dataset with 8,714 students.

The third dataset this paper uses contains information on the Cito score of students. For this paper, only the mathematics part of the Cito test is used. This part contained 60 mathematical questions, so the score on this test ranges from 0 to 60. This score was standardized in order to be able to compare it to the other test score. Standardizing causes the test score to have a mean of zero and a standard deviation of one. A total of 3,997 observations were deleted because the information on the mathematics part of the Cito test was missing. After merging the data, 4,874 students were left in the sample.
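The standardization step can be sketched as follows. This is a minimal illustration under stated assumptions: the raw scores are invented, the helper name `standardize` is hypothetical, and the population standard deviation is used, since the text does not say whether the sample or the population formula was applied.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 (population SD, an assumption)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        raise ValueError("cannot standardize a constant score list")
    return [(s - mu) / sigma for s in scores]


# Hypothetical raw scores for five students (not thesis data).
raw_math = [41, 95, 120, 169, 88]  # weighted mathematics score, range 41 to 169
raw_cito = [0, 30, 45, 60, 25]     # number of correct Cito mathematics answers, 0 to 60

z_math = standardize(raw_math)
z_cito = standardize(raw_cito)

# After standardizing, the two scores are on the same scale and can be subtracted.
difference = [m - c for m, c in zip(z_math, z_cito)]

print([round(d, 3) for d in difference])
```

Because both scores are standardized over the same group of students, the differences average to zero by construction, which is consistent with the 0.000 in the total column of the Difference row in Table 1.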

The fourth dataset this paper uses contains information on the schools in the sample. Four school types are present in the sample: public, Protestant Christian, Roman Catholic, and other. Which school types fall under the category other is not clear from the data. Moreover, data on the degree of urbanization of the area the school is located in were available. These range from no urbanization to very strong urbanization. Both of these measures are used to see if grading bias differs among school types or degrees of urbanization. After merging these data and deleting observations that lacked data on any of the variables, 4,874 observations were left.
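The merge-and-drop procedure described for the datasets can be sketched in a few lines. Everything here is illustrative: the student identifiers, the field names and the helper `merge_complete` are hypothetical, and the actual Cool5-18 files are not reproduced. The sketch keeps only students who appear in every dataset and who have no missing value on any variable.

```python
def merge_complete(*datasets):
    """Join datasets keyed by student id and keep only complete records."""
    # Students must be present in every dataset.
    common_ids = set(datasets[0])
    for dataset in datasets[1:]:
        common_ids &= set(dataset)

    merged = {}
    for student_id in sorted(common_ids):
        record = {}
        for dataset in datasets:
            record.update(dataset[student_id])
        # Drop the student if any variable is missing (encoded here as None).
        if all(value is not None for value in record.values()):
            merged[student_id] = record
    return merged


# Hypothetical miniature datasets (not thesis data).
students = {
    1: {"boy": 1, "ses": "SVE with DB"},
    2: {"boy": 0, "ses": "LVE with MB"},
    3: {"boy": 0, "ses": None},          # missing socioeconomic status
}
math_test = {1: {"math": 101}, 2: {"math": None}, 3: {"math": 88}}
cito_test = {1: {"cito": 44}, 3: {"cito": 51}}  # student 2 has no Cito record

merged = merge_complete(students, math_test, cito_test)
print(merged)
```

In this toy example only student 1 survives, because student 2 lacks both a mathematics score and a Cito record and student 3 lacks a socioeconomic status.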


Table 1

Descriptive Statistics of students’ characteristics

| | Boy | Girl | Total |
|---|---|---|---|
| Number of students | 2,303 | 2,398 | 4,701 |
| Mathematics test | 0.109 (1.015) | -0.105 (0.975) | 0.000 (1.000) |
| Mathematics part of Cito | 0.129 (0.982) | -0.124 (1.002) | 0.000 (1.000) |
| Difference | -0.020 (0.646) | 0.019 (0.617) | 0.000 (0.632) |
| Shares of parents’ socioeconomic status | | | |
| LVE with MB | 0.105 | 0.088 | 0.096 |
| LVE with DB | 0.099 | 0.098 | 0.099 |
| SVE with MB | 0.078 | 0.070 | 0.074 |
| SVE with DB | 0.370 | 0.385 | 0.378 |
| HPE or UD with MB | 0.033 | 0.041 | 0.037 |
| HPE or UD with DB | 0.315 | 0.318 | 0.317 |
| Relationship between teacher and student | 1.760 (0.907) | 1.434 (0.656) | 1.593 (0.806) |
| Shares of urbanization degree | | | |
| Very strong urbanization | 0.196 | 0.187 | 0.191 |
| Strong urbanization | 0.188 | 0.173 | 0.180 |
| Urbanization to some degree | 0.190 | 0.183 | 0.187 |
| Little urbanization | 0.300 | 0.301 | 0.301 |
| No urbanization | 0.126 | 0.156 | 0.142 |
| Shares of school type | | | |
| Public school | 0.294 | 0.275 | 0.284 |
| Roman Catholic | 0.392 | 0.400 | 0.396 |
| Protestant Christian | 0.211 | 0.225 | 0.218 |
| Other | 0.103 | 0.101 | 0.102 |

Notes: Standard deviations are shown in parentheses. LVE, SVE, HPE or UD, MB, and DB are abbreviations for lower vocational education, secondary vocational education, higher professional education or university degree, migration background, and Dutch background respectively.

can be seen in this table is that girls have a better relationship with their teacher, since their score on the relationship variable is lower and a lower score on this variable means a better relationship between the teacher and the student. The table also shows the shares of boys and girls in the five degrees of urbanization and the four school types. The differences between boys and girls across the degrees of urbanization are not surprising. The only thing that stands out is that little urbanization is the most frequent of all degrees of urbanization. For the different school types, no surprising shares are present in the table. The table does show that students who attend Roman Catholic primary schools are most common in this dataset.

RESULTS

This paper investigates the grading bias in Dutch primary education by looking at the difference between a regular mathematics test and the mathematics part of the Cito test. This difference is regressed on gender, controlling for the socioeconomic status of the parents of the student and the relationship between the student and the teacher. This relationship is investigated under multiple scenarios. First, the grading bias is analyzed for schools with one class and for schools with two or more classes. Then, the relationship between gender and grading bias is examined for different degrees of urbanization. Finally, several school types are compared.

Table 2

Results of the models for various school types

| Variable | All schools (1) | All schools (2) | All schools (3) | One class (1) | One class (2) | One class (3) | Two or more classes (1) | Two or more classes (2) | Two or more classes (3) |
|---|---|---|---|---|---|---|---|---|---|
| Gender | -0.039 (0.034)** | -0.038 (0.038)** | -0.041 (0.030)** | -0.017 (0.455) | -0.017 (0.461) | -0.022 (0.354) | -0.082 (0.005)*** | -0.078 (0.007)*** | -0.075 (0.011)** |
| Parents’ SES (base is LVE with MB) | | | | | | | | | |
| LVE with DB | | 0.040 (0.317) | 0.037 (0.380) | | 0.007 (0.897) | 0.008 (0.882) | | 0.107 (0.117) | 0.108 (0.114) |
| SVE with MB | | 0.024 (0.585) | 0.033 (0.459) | | 0.065 (0.243) | 0.066 (0.236) | | -0.056 (0.451) | -0.057 (0.443) |
| SVE with DB | | 0.040 (0.210) | 0.049 (0.143) | | 0.006 (0.878) | 0.106 (0.800) | | 0.144 (0.008)*** | 0.143 (0.008)*** |
| HPE and UD with MB | | 0.030 (0.595) | 0.040 (0.480) | | 0.019 (0.775) | 0.023 (0.737) | | 0.069 (0.492) | 0.071 (0.480) |
| HPE and UD with DB | | 0.058 (0.086)* | 0.061 (0.074)* | | 0.034 (0.430) | 0.040 (0.355) | | 0.129 (0.018)** | 0.128 (0.020)** |
| Relationship with student | | | 0.009 (0.466) | | | 0.016 (0.288) | | | -0.007 (0.672) |

Subsample size: all schools 4,701; schools with one class 3,207; schools with two or more classes 1,494.

Notes: Model type 1 refers to the model without control variables, model type 2 to the model with one control variable (the socioeconomic status of the parents), and model type 3 to the model with both control variables. p-values are given in parentheses.

* indicates significant at 10% level, ** indicates significant at 5% level, *** indicates significant at 1% level

The results of the grading bias for all schools, schools with one class, and schools with two or more classes are shown in the table above. Looking at the relationship between grading bias and gender for all schools, the results show that the coefficient for gender is -0.039, which is significant at the 5% level. Since gender in this paper is a dummy variable coded 1 for boys and 0 for girls, the coefficient can be interpreted as follows. Relative to their score on the machine-graded test, boys on average score 0.039 standard deviation lower than girls on a test that is graded by their own teacher.
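As a rough consistency check on this interpretation, and assuming model (1) contains only an intercept and the gender dummy, the OLS coefficient on the dummy equals the mean score difference of boys minus the mean score difference of girls. Using the Difference row of Table 1 (boys -0.020, girls 0.019), and writing the group means with a bar as notation introduced here:

```latex
\begin{align}
\hat{\beta}_1 &= \bar{\Delta}_{\text{boys}} - \bar{\Delta}_{\text{girls}} \\
              &= -0.020 - 0.019 \\
              &= -0.039
\end{align}
```

This agrees with the gender coefficient of -0.039 reported for all schools under model (1) in Table 2.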

Looking at the same scenario but controlling for the socioeconomic status of the parents of the students, the coefficient does not change much. The coefficient, in this case, is -0.038, which is significant at the 5% level. This means that, relative to a test graded by a machine, boys score 0.038 standard deviation lower than girls on a test that is graded by the teacher. The coefficients of the socioeconomic status of the parents are all insignificant, except for parents with a Dutch background and HPE or UD. The insignificant results indicate that there is no relationship between these categories of socioeconomic status and grading bias. The significant result (p = 0.086, so significant at the 10% level only) indicates that students whose parents have a Dutch background and HPE or UD score 0.058 standard deviation higher than students whose parents have a migration background and LVE. This is surprising because it indicates that students whose parents have a higher socioeconomic status experience a positive grading bias. In the final model, the relationship between the student and the teacher is added as a control. This addition does not change the results mentioned above much.

with parents with a Dutch background whose highest attained education is SVE. Relative to tests graded by a machine, these students score 0.114 standard deviation higher on tests graded by their teacher than students whose parents have a migration background and whose highest attained education level is LVE. Also, students whose parents have a Dutch background and HPE or UD as highest attained level of education experience a positive grading bias of 0.129 standard deviation compared to students whose parents have a migration background and whose highest attained education level is LVE. Surprisingly, when controlling for the relationship between the student and the teacher, the grading bias among the different socioeconomic statuses diminishes, and the coefficient for gender changes to -0.075.

Table 3

Results of the models for various urbanization degrees

| Variable | Very strong urbanization (1) | Very strong urbanization (2) | Very strong urbanization (3) | Strong urbanization (1) | Strong urbanization (2) | Strong urbanization (3) | Urbanization to some degree (1) | Urbanization to some degree (2) | Urbanization to some degree (3) | Little urbanization (1) | Little urbanization (2) | Little urbanization (3) | No urbanization (1) | No urbanization (2) | No urbanization (3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gender | 0.011 (0.782) | 0.012 (0.753) | 0.003 (0.937) | -0.044 (0.333) | -0.040 (0.384) | -0.028 (0.553) | -0.066 (0.124) | -0.066 (0.130) | -0.071 (0.106) | -0.089 (0.005)*** | -0.091 (0.004)*** | -0.097 (0.003)*** | -0.045 (0.409) | -0.049 (0.381) | 0.052 (0.361) |
| Parents’ SES (base is LVE with MB) | | | | | | | | | | | | | | | |
| LVE with DB | | 0.063 (0.387) | 0.062 (0.389) | | 0.091 (0.337) | 0.093 (0.326) | | 0.096 (0.398) | 0.096 (0.397) | | -0.007 (0.968) | 0.002 (0.989) | | 0.067 (0.895) | 0.075 (0.883) |
| SVE with MB | | -0.012 (0.825) | -0.011 (0.843) | | 0.107 (0.240) | 0.101 (0.268) | | 0.040 (0.772) | 0.036 (0.790) | | 0.166 (0.480) | 0.179 (0.447) | | -0.178 (0.836) | -0.176 (0.838) |
| SVE with DB | | 0.076 (0.219) | 0.078 (0.205) | | 0.064 (0.383) | 0.055 (0.459) | | 0.031 (0.741) | 0.036 (0.702) | | 0.072 (0.666) | 0.085 (0.612) | | 0.065 (0.896) | 0.071 (0.887) |
| HPE and UD with MB | | -0.032 (0.679) | -0.032 (0.685) | | 0.117 (0.279) | 0.114 (0.291) | | 0.073 (0.628) | 0.078 (0.606) | | -0.028 (0.905) | -0.016 (0.944) | | 0.384 (0.503) | 0.390 (0.497) |
| HPE and UD with DB | | 0.033 (0.592) | 0.038 (0.548) | | 0.082 (0.271) | 0.069 (0.359) | | 0.085 (0.366) | 0.091 (0.336) | | 0.109 (0.521) | 0.123 (0.468) | | -0.004 (0.994) | 0.000 (0.999) |
| Relationship with student | | | 0.030 (0.208) | | | -0.030 (0.272) | | | 0.019 (0.510) | | | 0.021 (0.330) | | | -0.011 (0.787) |

Subsample size: very strong urbanization 899; strong urbanization 846; urbanization to some degree 877; little urbanization 1,413; no urbanization 666.

Notes: Model type 1 refers to the model without control variables and model type 2 refers to the model with one control variable. Model 3 refers to the model with both control variables. p-values are given in parentheses.

* indicates significant at 10% level, ** indicates significant at 5% level, *** indicates significant at 1% level
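The star notation in these notes can be expressed as a small helper. This is only a sketch of the convention stated in the legend: the function names are hypothetical, and treating the thresholds as strict inequalities is an assumption, since the notes do not say how a p-value exactly at a threshold is handled.

```python
def significance_stars(p_value):
    """Map a p-value to the star notation used in the table notes."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:   # significant at the 1% level
        return "***"
    if p_value < 0.05:   # significant at the 5% level
        return "**"
    if p_value < 0.10:   # significant at the 10% level
        return "*"
    return ""


def format_cell(coefficient, p_value):
    """Format a table cell as 'coefficient (p-value)' followed by its stars."""
    return f"{coefficient:.3f} ({p_value:.3f}){significance_stars(p_value)}"


# Two gender coefficients from Table 3, model (1).
print(format_cell(-0.089, 0.005))  # little urbanization
print(format_cell(-0.044, 0.333))  # strong urbanization
```

Under this convention the first cell carries three stars and the second carries none, matching the corresponding entries in the table.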

students are all insignificant. This suggests that the grading bias does not depend on the socioeconomic status of the parents of these students. Additionally, controlling for the relationship between the student and the teacher changes the gender coefficient to -0.097. Finally, these results suggest that, out of all urbanization degrees in the Netherlands, grading bias is only present in regions with little urbanization.

Table 4

Results of the models for various school types

| School type and model type | Public schools (1) | Public schools (2) | Public schools (3) | Roman Catholic (1) | Roman Catholic (2) | Roman Catholic (3) | Protestant Christian (1) | Protestant Christian (2) | Protestant Christian (3) | Other (1) | Other (2) | Other (3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gender | -0.132 (0.000)*** | -0.128 (0.000)*** | -0.129 (0.000)*** | -0.012 (0.671) | -0.011 (0.698) | -0.001 (0.977) | 0.007 (0.861) | 0.007 (0.866) | -0.007 (0.872) | 0.025 (0.679) | 0.018 (0.761) | -0.013 (0.834) |
| Parents' SES (base is LVE with MB) | | | | | | | | | | | | |
| LVE with DB | | 0.062 (0.412) | 0.063 (0.407) | | 0.007 (0.916) | 0.013 (0.848) | | 0.000 (0.997) | -0.005 (0.962) | | 0.210 (0.176) | 0.198 (0.200) |
| SVE with MB | | 0.073 (0.371) | 0.074 (0.362) | | 0.039 (0.619) | 0.038 (0.623) | | -0.151 (0.226) | -0.161 (0.198) | | 0.084 (0.379) | 0.089 (0.353) |
| SVE with DB | | 0.148 (0.017)** | 0.150 (0.016)** | | 0.003 (0.955) | 0.001 (0.992) | | -0.024 (0.791) | -0.025 (0.786) | | 0.013 (0.888) | 0.020 (0.822) |
| HPE and UD with MB | | 0.001 (0.990) | 0.011 (0.991) | | -0.116 (0.223) | 0.112 (0.239) | | 0.026 (0.106) | -0.262 (0.098)* | | 0.141 (0.238) | 0.125 (0.297) |
| HPE and UD with DB | | 0.204 (0.001)*** | 0.207 (0.001)*** | | -0.004 (0.937) | -0.011 (0.846) | | 0.010 (0.921) | 0.012 (0.893) | | -0.075 (0.366) | -0.058 (0.486) |

Relationship with student: -0.028 (0.116), 0.058 (0.046)**, 0.060 (0.088)*

Subsample size: Public schools 1,336; Roman Catholic 1,860; Protestant Christian 1,026; Other 479

Notes: Model type 1 refers to the model without control variables, model type 2 to the model with one control variable, and model type 3 to the model with both control variables. p-values are given in parentheses.

* indicates significance at the 10% level, ** at the 5% level, and *** at the 1% level.


the test is graded by their teacher. In the final model, which also controls for the relationship between the student and the teacher, the gender coefficient decreases to -0.129, and the coefficients for students whose parents are Dutch and have SVE, or HPE or UD, as their highest attained education level increase to 0.150 and 0.207, respectively. Since the gender coefficients for all other school types are insignificant, this suggests that there is no teacher bias in school types other than public schools. However, among Protestant Christian schools, the relationship between the student and the teacher is significant at the 5% level, indicating that a bad relationship between the student and the teacher increases the grading bias. Also, among Protestant Christian schools, students whose parents have a migration background and HPE or UD as their highest attained education level experience a negative grading bias compared to students whose parents have a migration background and LVE as their highest attained education level. Finally, the relationship between the student and the teacher is also significant for the school type "Other". In this case, too, a bad relationship between the student and the teacher increases the grading bias.

CONCLUSION


with two or more classes. These results suggest that the grading bias was only present in schools with two or more classes.

This paper also has a number of limitations due to time and data restrictions. First, it assumes that the two tests measure exactly the same knowledge, although the tests differ on some questions. Moreover, the mathematics test is a low-stakes test, whereas the Cito test is a high-stakes test. The class composition may also be non-random for schools with only one class. Because of these limitations, the results of this paper should be interpreted with care.


REFERENCE LIST

Alan, S., Seda, E., and Ipek, M. (2018). Gender Stereotypes in the Classroom and Effects on Achievement. Review of Economics and Statistics, 100, 876–890.

Apperson, J., Carycruz B., and Sass, T.R. (2016). Do the Cheated Ever Prosper? The Long-Run Effects of Test-Score Manipulation by Teachers on Student Outcomes. Mimeo.

Blank, R.M. (1991). The effects of double-blind versus single-blind reviewing: experimental evidence from the American Economic Review. American Economic Review, 81, 1041–1067.

Bowles, S. (1970). Towards an Education Production Function. In Education, Income, and Human Capital, ed. W. Lee Hansen. New York: National Bureau of Economic Research.

Breda, T., Grenet, J., Monnet, M., and Van Effenterre, C. (2018). Can Female Role Models Reduce the Gender Gap in Science? Evidence from Classroom Interventions in French High Schools. PSE Working Paper.

Caldas, S. J., and Bankston III, C. (1997). Effect of school population socioeconomic status on individual academic achievement. Journal of Educational Research, 90, 269–277.

CBS. (2019, June 28). Leerlingen, deelnemers en studenten; onderwijssoort, woonregio. Retrieved from: https://opendata.cbs.nl/statline/#/CBS/nl/dataset/71450ned/table?fromstatweb

Card, A. and Payne, A. (2017). High school choices and the gender gap in STEM, National Bureau of Economic Research.

Carlana, M. (2019). Implicit Stereotypes: Evidence from Teachers’ Gender Bias. The Quarterly Journal of Economics, forthcoming.

Cvencek, D., Meltzoff, A. N., and Greenwald, A. G. (2011). Math-gender stereotypes in elementary school children. Child Development, 82, 766–779.

Dee, T.S., Dobbie, W., Jacob, B.A., and Rockoff, J. (2016). The Causes and Consequences of Test Score Manipulation: Evidence from the New York Regents Examinations. National Bureau of Economic Research.

Ertl, B., Luttenberger, S., and Paechter, M. (2017). The impact of gender stereotypes on the concept of female students in STEM subjects with an underrepresentation of females. Frontiers in Psychology, 8, 703.

Hackman, D.A., and Farah, M.J. (2009). Socioeconomic status and the developing brain. Trends in Cognitive Sciences, 13(2), 65–73.

Lavy, V. (2008). Do gender stereotypes reduce girls’ or boys’ human capital outcomes? Evidence from a natural experiment. Journal of Public Economics, 92, 2083–2105.

Lavy, V., and Sand, E. (2018). On the origins of gender gaps in human capital: Short- and long-term consequences of teachers’ biases. Journal of Public Economics, 167, 263–279.

Murnane, R. J., Willett, J. B., and Levy, F. (1995). The Growing Importance of Cognitive Skills in Wage Determination. The Review of Economics and Statistics, 77(2), 251–266.

Reuben, E., Sapienza, P., and Zingales, L. (2014). How stereotypes impair women’s careers in science. Proceedings of the National Academy of Sciences, USA, 111, 4403–4408.

Stoet, G., and Geary, D. C. (2018). The Gender-Equality Paradox in Science, Technology,
