Social Network Dynamics in School Choice and School Segregation


Master’s Thesis


Author:

Nigel van Herwijnen

Supervisors:

dr. M. H. Lees

dr. W. R. Boterman

A thesis submitted in partial fulfilment of the requirements for the degree of Master of Science in Computational Science

in the

Institute for Advanced Study

and the

Computational Science Lab, Informatics Institute



I, Nigel van Herwijnen, declare that this thesis, entitled ‘Social Network Dynamics in School Choice and School Segregation’ and the work presented in it are my own. I confirm that:

This work was done wholly or mainly while in candidature for a research degree at the University of Amsterdam.

Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

Where I have consulted the published work of others, this is always clearly attributed.

Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

I have acknowledged all main sources of help.

Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date: February 8th, 2021


Abstract

Faculty of Science, Informatics Institute, Institute for Advanced Study

Master of Science in Computational Science

Social Network Dynamics in School Choice and School Segregation by Nigel van Herwijnen

School segregation can limit social cohesion and create unequal opportunities for children, and it is widely associated with social inequality. Understanding the true motivations for school choice is vital to effectively shape social reform policies and carry out desegregation interventions. Parents who have difficulty understanding the school landscape may perceive the quality of schools differently from what it actually is and may rely heavily on the information they receive through their social network. This thesis proposes an agent-based model to simulate the process of school choice based on residential patterns, school compositions, quality perceptions and social network dynamics. We find that moderate school segregation emerges when school choice is based purely on quality perceptions, in systems with segregated social networks and a large uncertainty in quality perception for half of the population. This result is largely attributed to well-informed parents who are satisfied with their own interpretation of school quality, whilst parents with high uncertainty about school quality search for the school they, as a group, rate highest. When combined with composition- and distance-based preferences, quality perceptions contribute relatively little to the overall school segregation in the simulated systems. Segregation levels decrease significantly only when quality perception is the driving force in the school choice process. This result suggests that information-based policy interventions have the most effect if parents are motivated to base their school choice on school quality rather than on composition preferences.


I would like to thank Dr. Mike Lees for his consistent support over such a long period of time. You guided me towards a thesis that is academically relevant, whilst giving me the freedom to explore a field I had never been a part of before. Thank you, Dr. Willem Boterman, for always encouraging me to be critical of the social implications of modelling in research. I also express my gratitude to Eric Dignum for always being available for a short brainstorm session.

I am extremely grateful to Thomas van der Veen, Efi Athieniti, Luyau Xu and everyone else at the IAS for making my thesis project a fun experience. It was the warm atmosphere, the lunch breaks, the seminars, the discussions, the drinks and everything else that made me feel part of a team. It is because of this that I can sincerely say that I thoroughly enjoyed this final part of my studies.

Finally, I would like to thank my friends and family for their endless support. Writing a thesis during a lockdown has been a struggle at times, but the immense support you all offered enabled me to push myself to finish the project in a way I can proudly look back on.


Declaration of Authorship iii

Abstract vii

Acknowledgements ix

Contents xi

List of Figures xv

List of Tables xix

List of Algorithms xxi

1 Introduction 1

1.1 Research question. . . 2

1.2 School choice as a complex system . . . 2

1.3 Research approach . . . 3

1.4 Thesis outline . . . 4

1.5 The COMPASS project . . . 4

2 Theoretical background 5

2.1 Factors affecting school choice . . . 6

2.1.1 Distance. . . 6

2.1.2 Composition and profile . . . 7

2.1.3 School quality. . . 8

2.2 Segregation measures. . . 9

2.3 Social networks . . . 10

2.3.1 Small-world networks . . . 11

2.3.2 Degree distributions . . . 12

2.3.3 Homophily and clustering . . . 15

2.3.4 Degree assortativity . . . 15

3 Literature review of existing models 17

3.1 Residential choice models . . . 17

3.1.1 Model extensions . . . 18

3.2 School choice models . . . 19

3.3 Social network models . . . 21

3.4 Social networks of parents . . . 23

4 Methods 25

4.1 COMPASS model. . . 25

4.1.1 Global model overview . . . 26

4.1.2 Agents . . . 26

4.1.2.1 Residential utility . . . 27

4.1.2.2 School utility . . . 28

4.1.3 Process overview and scheduling . . . 29

4.1.4 Initialization . . . 32

4.1.5 Design concepts. . . 32

4.1.6 Stochasticity . . . 32

4.1.7 School allocation mechanism . . . 33

4.2 Homophilic network model. . . 33

4.2.1 Edge utility and connection probability . . . 33

4.2.1.1 Social distance utility . . . 34

4.2.1.2 Residential distance utility . . . 35

4.2.1.3 Degree utility. . . 36

4.2.2 Process overview and scheduling . . . 36

4.2.3 Stochasticity . . . 37

4.2.4 Emergence . . . 37

4.3 Network integration in COMPASS model . . . 38

4.3.1 School utility . . . 38

4.3.2 Information spreading . . . 39

5 Model analysis 41

5.1 COMPASS model. . . 41

5.1.1 System parameters . . . 41

5.1.2 Agent’s preference parameters. . . 42

5.1.3 Residential and school choice model . . . 43

5.2 Homophilic network properties . . . 45

5.2.1 Warm-up period and selection pressure . . . 46

5.2.2 Networks based on single utility component . . . 46

5.2.3 Networks based on multiple utility components . . . 51

6 Experiments 57

6.1 Heterogeneity in school quality and quality perception . . . 57

6.1.1 Differences in quality perception . . . 58

6.1.2 Strongly opinionated uninformed parents . . . 60

6.2 Reversed ranking . . . 62

6.3 Desegregation intervention . . . 64

6.4 Interplay with ethnic and distance preference . . . 66

7 Conclusion 69

7.1 Main findings . . . 69


A Model analysis 73

A.1 Warm-up period network model. . . 73

A.1.1 Selection algorithms . . . 74

A.1.2 Strictness in selection algorithm . . . 76

A.2 Topological properties network model . . . 79

B Results 83


2.1 Probability distribution for a Poisson distribution and a binomial distribution, ⟨k⟩ = 100. . . 13

2.2 Degree distribution of a scientific collaboration network [1]. A Poisson distribution has been fitted to the data with λ = 1.79 ± 0.07. A power law distribution has been fitted with C = 4.16 ± 0.41, λ = −2.44 ± 0.04. . 14

3.1 Different utility functions used by Pancs and Vriend [2] in their extensions of the Schelling model. . . 18

3.2 Example of an agent’s local Moore neighbourhood (blue) and bounded neighbourhood (yellow) in a grid of 16 by 16 grid cells and 4 neighbourhoods [3]. . . 19

3.3 School segregation index in simulations by Stoica and Flache [4], using only school composition (c) or using both school composition and school distance (d) in behaviour rule. Image taken from [4]. . . 21

3.4 Connection probability distribution for two nodes with distance d_n(h_i^n, h_j^n) for different values of α_n and b_n in the social distance model by Boguñá et al. [5]. . . 22

4.1 UML diagram representing the COMPASS model. . . 26

4.2 Visualization of the 2D grid with 40 by 40 cells, filled with blue and yellow agents with capacity c = 0.9 and 4 schools (red cells). . . 27

4.3 Residential utility based on composition of a neighbourhood or school. Variable p represents the total number of agents in the environment, f the preferred fraction of same-group individuals and M the desirability of a homogeneous environment. . . 28

5.1 Examples of different residential patterns emerging for different values of b using Tu = 0.75 and radius = 3 after 20 iterations. Red lines indicate neighbourhood borders. . . 43

5.2 Segregation index in residential grid and schools. For variable b, Tu = 0.75; for variable Tu, b = 0.2; for all, f = 0.7. . . 44

5.3 Fraction of satisfied agents at end of simulation. For variable b, Tu = 0.75; for variable Tu, b = 0.2; for all, f = 0.7. . . 45

5.4 The degree distribution in networks generated with either ws = 1, wr = 1 or wd = 1. Each figure shows data from 5 simulations. In the figure showing networks created with only degree utility, the prediction from the BA model is shown. . . 47

5.5 The clustering coefficient as a function of network size in networks created with either ws = 1, wr = 1 or wd = 1. . . 47


5.6 The attribute assortativity as a function of social homophily. Networks are made using social distance utility (ws = 1) for a network of size N = 9003. . . 48

5.7 Networks created with ws = 1, N = 8971 and either αs = 2 or αs = 20. Colors represent the group the agent belongs to. . . 49

5.8 The average shortest path length as a function of network size in networks created with either ws = 1, wr = 1 or wd = 1. . . 49

5.9 The average shortest path length as a function of social homophily in networks created with ws = 1. . . 50

5.10 The average shortest path length as a function of the system size in networks created with ws = 1. . . 50

5.11 The degree assortativity as a function of system size in networks created with either ws = 1, wr = 1 or wd = 1. . . 51

5.12 The clustering coefficient for different values of the social homophily. . . . 52

5.13 The attribute assortativity for different values of the social homophily. . . 53

5.14 The modularity for different values of the social homophily. . . 53

5.15 The average shortest path length for different values of the social homophily. 54

5.16 The degree assortativity for different values of the social homophily. . . . 54

5.17 An example network created with utilities weighted by ws = 0.4, wr = 0.4, wd = 0.2 with N = 8971, αs = 20, αr = 20. . . 55

6.1 School segregation as a function of the standard deviation in school quality as sampled by the uninformed group. Only the quality component is considered in the utility function (α = β = 0, γ = 1). The informed group samples initial Q with σQ = 0.01. . . 58

6.2 Standard deviation in Q of uninformed group at end of simulation, as a function of the standard deviation in Q at the start of the simulation. The informed group samples initial Q with σQ = 0.01. . . 59

6.3 Mean difference in average Q between the two groups at the end of the simulation, as a function of the standard deviation in Q at the start of the simulation. The informed group samples initial Q with σQ = 0.01. . . 59

6.4 Mean difference in average Q between the two groups at the end of the simulation, as a function of the standard deviation in Q at the start of the simulation. Initial µQ is sampled uniformly on [0, 1]. The informed group samples initial Q with σQ = 0.01. All iterations allow information spreading without interaction error. . . 60

6.5 Average school score vs average school rank.. . . 61

6.6 School segregation as a function of the standard deviation in school quality as sampled by the uninformed group. In all systems µQ = 0.5 for all schools. Only the quality component is considered in the utility function (α = β = 0, γ = 1). The informed group samples initial Q with σQ = 0.01. . . 62

6.7 School segregation as a function of the standard deviation in school quality as sampled by the uninformed group. In all systems µQ is sampled uniformly on [0, 1] for each school. Only the quality component is considered in the utility function (α = β = 0, γ = 1). The informed group samples initial Q with σQ = 0.01. . . 63


6.8 Mean difference in average Q between the two groups at the end of the simulation, as a function of the standard deviation in Q at the start of the simulation. For all schools µQ is sampled uniformly on [0, 1]. The informed group samples initial Q with σQ = 0.01. . . 63

6.9 Mean absolute difference in average Q between the two groups at the end of the simulation, as a function of the standard deviation in Q at the start of the simulation. Initial µQ is sampled uniformly on [0, 1]. The informed group samples initial Q with σQ = 0.01. All iterations allow information spreading without interaction error. Interactions with out-of-group members diverge. . . 64

6.10 School segregation values when additional connections are made between the two groups. The number of extra connections per node is equal to f⟨k⟩. . . 65

6.11 School segregation values when additional connections are made between the two groups. The number of extra connections per node is equal to f⟨k⟩. After the simulation, all agents are forced to recreate school rankings. . . 65

6.12 School segregation with different weights for ethnic, distance and quality perception preference. Initial µQ = 0.5 for all schools. . . 66

6.13 School segregation with different weights for ethnic, distance and quality perception preference. Initial µQ is sampled uniform on [0, 1]. . . 67

A.1 The utility of each added edge over time (left column), the average utility at each time step (middle column) and the distribution of all utilities of all added edges. Networks are made only with residential utility (ws, wr, wd = 0, 1, 0) and parameters αr = 2, br = radius = 3. Top row is from a grid of 20 by 20 cells; bottom row is from a grid of 100 by 100 cells. Error bars signify the 0.05 and 0.95 quantiles. Results are from a single simulation per figure. . . 74

A.2 Iteration relative to the system size at which the average utility of added edge becomes stable. Stability is defined when the variance in the utility of added edges in the last N/10 steps falls below 10−6. Figure shows 30 runs per network size. . . 74

A.3 The utility of each added edge over time (left column), the average utility at each time step (middle column) and the distribution of all utilities of all added edges. Networks are made only with residential utility (ws, wr, wd = 0, 1, 0) and parameters αr = 2, br = radius = 3. Top row is from a grid of 20 by 20 cells; bottom row is from a grid of 100 by 100 cells. Error bars signify the 0.05 and 0.95 quantiles. Results are from a single simulation per figure. In the network model, the m edges with the highest available utility are made. . . 75

A.4 The fraction of the roulette wheel with utility above a certain threshold plotted over time, for different system sizes. Systems only use residential utility with tolerant agents (ws, wr, wd= 0, 1, 0 and αr= 2). . . 76

A.5 The fraction of the roulette wheel with utility above a certain threshold plotted over time, for different system sizes. Systems only use residential utility with intolerant agents (ws, wr, wd= 0, 1, 0 and αr = 20). . . 78

A.6 The fraction of the roulette wheel with utility above 0.5 at the end of the algorithm for different slopes in the utility function. . . 78


A.7 The utility of each added edge over time, the average utility at each time step and the distribution of all utilities of all added edges. Networks are made only with residential utility (ws, wr, wd = 0, 1, 0) and parameters br = radius = 3, αr = 20, N = 8971. Error bars signify the 0.05 and 0.95 quantiles. Results are from a single simulation. . . 78

A.8 The clustering coefficient for different values of the social homophily. . . . 79

A.9 The attribute assortativity for different values of the social homophily. . . 79

A.10 The modularity for different values of the social homophily. . . 80

A.11 The average shortest path length for different values of the social homophily. 80

A.12 The degree assortativity for different values of the social homophily. . . . 81

B.1 Time step at which agents become satisfied. Agents not shown in the figure did not become satisfied. Initial µQ is sampled uniformly on [0, 1]. The informed group samples initial Q with σQ = 0.01; the uninformed group with σQ = 0.5. Systems have 8971 agents. Shown are results from 5 runs. . . 83

B.2 Mean ranking as a function of the mean quality perception for each school in a single simulation. Initial µQ is sampled uniformly on [0, 1]; initial perceptions are sampled with σ_Q^A = 0.01, σ_Q^B = 0.5. Data shown is from a single simulation with 16 schools and 8971 households. . . 83

B.3 Ethnic preference utility P required for an agent to become satisfied as a function of distance preference utility D when all Q = 0.5. Plotted are different combinations for α, β and γ.. . . 84


5.1 System wide parameters used . . . 42

5.2 Parameters used in agent’s utility function. . . 42

5.3 Parameters used in residential choice and school choice model . . . 43

5.4 Variables in homophilic network model. . . 45


1 Residential choice model . . . 31

2 School choice model . . . 31

3 Network algorithm . . . 37

4 School choice model with quality perception . . . 39

5 Quality perception update . . . 40


Introduction

Segregation is widely associated with inequality between social groups. Although schools can be the first environment in which young individuals experience a variety of social norms, segregated classrooms limit or even negate the contribution schools can have towards social cohesion in society. Education has therefore been a high priority in social reform policies worldwide [6].

The Dutch educational system is uncommon because free school choice and school autonomy have been constitutional rights since the early 20th century [7]. Parents of all ethnicities, religions and socioeconomic backgrounds can choose any public school for their child to attend. Despite this large degree of freedom, school segregation persists [8]. Whilst researchers and policy makers worldwide often advocate freedom of choice as a solution to school segregation, the consequences of such policies are often difficult to predict. Studying the Dutch school system can help, because it allows researchers to study the true motivations of school choice in a free-choice context. As the Dutch government aims to reduce educational inequality, identifying the driving forces of school choice has become a topic of both political and academic value.

Literature on school choice agrees that residential segregation patterns and school composition preferences are among the largest contributing factors to school segregation [9] [10]. Parents are found to use the distance to a school as an initial filter of viable schools: schools outside the neighbourhood are generally not considered, although parents with a high socioeconomic status consider them more often. The schools that parents do consider are mainly ranked by composition preference, preference for certain pedagogical principles and overall school quality.
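The two-stage procedure described above (distance as a filter, then a ranking of the remaining schools) can be sketched as follows. This is a minimal illustration under stated assumptions, not the utility function of the model used in this thesis: the Euclidean distance cut-off, the linear weights `w_comp` and `w_quality`, the group labels and the fields of the hypothetical `School` class are all illustrative, and pedagogical profile is left out.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class School:
    name: str
    location: tuple[float, float]
    composition: dict[str, float]  # share of pupils per group (hypothetical groups)
    perceived_quality: float       # this household's perception of quality, in [0, 1]


def rank_schools(home, group, schools, max_distance, w_comp=0.5, w_quality=0.5):
    """Hypothetical two-stage choice: filter by distance, then rank by a weighted score."""
    # Stage 1: distance acts as a hard filter, not as a weighted term.
    considered = [s for s in schools if math.dist(home, s.location) <= max_distance]

    # Stage 2: rank the remaining schools by own-group share and perceived quality.
    def score(s):
        return w_comp * s.composition.get(group, 0.0) + w_quality * s.perceived_quality

    return sorted(considered, key=score, reverse=True)


schools = [
    School("North", (1.0, 0.0), {"blue": 0.8, "yellow": 0.2}, 0.4),
    School("East", (0.0, 2.0), {"blue": 0.4, "yellow": 0.6}, 0.9),
    School("Far", (10.0, 10.0), {"blue": 0.9, "yellow": 0.1}, 0.9),  # outside the filter
]

print([s.name for s in rank_schools((0.0, 0.0), "blue", schools, max_distance=3.0)])
# ['East', 'North']
print([s.name for s in rank_schools((0.0, 0.0), "blue", schools, 3.0, w_comp=0.8, w_quality=0.2)])
# ['North', 'East']
```

Shifting weight from quality to composition reverses the ranking of the two nearby schools, while the distant school never enters the comparison, however well it would score.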


Although Dutch parents are found to search for a school that matches the cultural values and principles of their home, ethnic minority parents mention that they focus on quality more often [11]. Amsterdam parents also mention that they do not use the information published by the Inspectorate of Education. Rather, a large share indicates that they obtain information through unofficial routes, for example around preschool playgrounds or day nurseries. As quality measures are often highly ambiguous, contextual and subjective [12], the information passed on through parental social networks can be a source of bias towards certain schools, which can influence the school choice parents eventually make.

1.1 Research question

Although qualitative studies have shown that parents use their social network to gain (sometimes ambiguous) information on the quality of a school, the effects of this information spreading on school segregation have not yet been thoroughly studied. School choice literature has mainly focused on the role of composition and the role of distance-based choices. Whether school choices based entirely on quality perceptions, combined with the sharing of quality perceptions in parental social networks, are sufficient to generate segregated schools remains an open question. More generally phrased, this thesis attempts to answer the following:

How can school quality perceptions and their spread through a parental social network influence school segregation?

Although the true characteristics of parents' social networks are unknown, echo chambers in other social networks are known to have steered the opinion of a subset of the network in a specific direction [13]. The research in this thesis therefore aims to study the conditions under which the social network, and the information passed through it, could be a cause of the segregated schools observed in reality.
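To make the echo-chamber mechanism more concrete, the sketch below uses the classic DeGroot averaging model, a standard opinion-dynamics rule swapped in purely for illustration; it is not the quality-perception update used in this thesis. Two hypothetical cliques of three parents hold different perceptions of one school's quality and are joined by a single bridge edge. At every step each parent adopts the mean of their own and their contacts' perceptions. The network, the initial values and the equal weights are all assumptions.

```python
def degroot_step(x, neighbours):
    """One DeGroot update: each node takes the mean of its own and its neighbours' values."""
    return [
        (x[i] + sum(x[j] for j in neighbours[i])) / (1 + len(neighbours[i]))
        for i in range(len(x))
    ]


def group_gap(x, group_a, group_b):
    """Difference between the mean perception of group A and that of group B."""
    mean_a = sum(x[i] for i in group_a) / len(group_a)
    mean_b = sum(x[i] for i in group_b) / len(group_b)
    return mean_a - mean_b


# Hypothetical network: two cliques {0, 1, 2} and {3, 4, 5} joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
neighbours = {i: set() for i in range(6)}
for i, j in edges:
    neighbours[i].add(j)
    neighbours[j].add(i)

group_a, group_b = [0, 1, 2], [3, 4, 5]
x = [0.8, 0.8, 0.8, 0.2, 0.2, 0.2]  # initial perceived quality of one school (assumed)

gaps = [group_gap(x, group_a, group_b)]
for _ in range(200):
    x = degroot_step(x, neighbours)
    gaps.append(group_gap(x, group_a, group_b))

print(round(gaps[0], 3), round(gaps[1], 3))
# 0.6 0.5
```

Within each clique perceptions equalize almost immediately, but the gap between the cliques decays only gradually, because every exchange between them passes through the single bridge; more links between the groups would typically close it faster. On a connected network this rule always ends in consensus, so lasting disagreement between groups would require additional mechanisms.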

1.2 School choice as a complex system

A wealth of qualitative studies on school choice is available. These studies, which involve e.g. parent surveys and interviews, have given the social sciences a thorough understanding of the individual factors parents incorporate in their decision making. Simultaneously, statistics on segregation patterns are used to study trends and the effects of policies. Although these statistics are often the most straightforward measurements for studying segregation effects, conclusive cause-and-effect relations are often difficult to construe due to the complex and non-trivial interactions between the micro-level of the households and the macro-level of the city-wide segregation patterns [14].

The research described in this thesis takes a complex systems approach to link the findings at the household level with findings at the school level. Complex systems lack a precise definition, but there is a consensus on their general properties [14]. These systems consist of many individual agents that can interact with each other and with their environment [15]. The individual interactions often have a non-linear effect on properties at the global scale of the system, resulting in emergent characteristics that are not directly predictable from the individual interactions and choices [16]. The agents' rule sets often depend on the environment, and the agents' actions in turn change the environment. By influencing the environment, agents create feedback loops that influence their future behaviour [16].

When interpreting school choice as a complex system, the agents represent households and schools. Each has specific individual properties, but all households share the same rule set, as do all schools. The school choice of an individual household does not by itself cause the system-wide segregation patterns; these patterns emerge from the collective decisions of all households. If school choice depends on composition, choosing a school changes that school's composition, which in turn affects the choices that follow: a feedback loop. This can have non-linear effects on the resulting global segregation patterns. Using a stylized environment allows us to study which elements of the individual rule set cause which global emergent properties. Taking a complex systems approach to school choice therefore allows us to study how individual choices affect global segregation patterns.
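The feedback loop described above can be demonstrated with a deliberately extreme toy model, sketched below under stated assumptions; it is not the COMPASS school choice rule. Households from two equally likely groups arrive one at a time and choose the school with the highest share of their own group, with empty schools treated as neutral (share 0.5) and ties broken at random. Segregation is measured with the index of dissimilarity, a common measure ranging from 0 (groups evenly spread over schools) to 1 (complete separation). The two-group population, the neutral value for empty schools and the strict maximization rule are illustrative assumptions.

```python
import random


def simulate_choices(n_households=200, n_schools=2, seed=0):
    """Toy sequential school choice: each household maximizes its own-group share."""
    rng = random.Random(seed)
    counts = [[0, 0] for _ in range(n_schools)]  # counts[s][g]: pupils of group g at school s

    for _ in range(n_households):
        g = rng.randint(0, 1)  # group of the arriving household, 50/50

        def own_share(s):
            total = counts[s][0] + counts[s][1]
            return 0.5 if total == 0 else counts[s][g] / total  # empty school: neutral (assumed)

        # The highest own-group share wins; the random second key breaks ties.
        best = max(range(n_schools), key=lambda s: (own_share(s), rng.random()))
        counts[best][g] += 1  # the choice changes what later households observe
    return counts


def dissimilarity(counts):
    """Index of dissimilarity between group 0 and group 1 across schools."""
    total0 = sum(c[0] for c in counts)
    total1 = sum(c[1] for c in counts)
    return 0.5 * sum(abs(c[0] / total0 - c[1] / total1) for c in counts)


counts = simulate_choices()
print(counts, dissimilarity(counts))
```

Although both groups arrive in equal proportions, the first, arbitrary choice fixes a composition that every later household reacts to, and with this strict rule the index reaches its maximum of 1.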

1.3 Research approach

A complex systems approach is taken to answer the research question. The research consists of four elements. First, an agent-based model [17] is created to simulate residential choice and school choice. This model (from now on referred to as the COMPASS model) is based on work by Stoica and Flache [4] and Athieniti [3]; it uses both local and neighbourhood compositions in the residential choice process, and school composition and distance in the school choice process. Second, a network is created to represent the social network of the parents. Third, a new component is introduced into the COMPASS model that represents a household's quality perception of a school. Finally, experiments are performed to find configurations that cause school segregation and to study the elements causing the segregated schools.


1.4 Thesis outline

This thesis is structured as follows. Chapter 2 presents the context of school segregation in the Netherlands and discusses the theoretical foundation of school choice and social networks. This is followed by a literature review in chapter 3, which discusses relevant literature on residential choice, school choice and social network models. Chapter 4 gives a detailed description of the models used to study school choice. An analysis of all model components is given in chapter 5. The experiments performed to answer the research question are discussed in chapter 6, followed by a brief conclusion in chapter 7.

1.5 The COMPASS project

This thesis is part of the Computational Modelling of Primary School Segregation (COMPASS) project. The COMPASS project is an interdisciplinary research collaboration between the University of Amsterdam and its Institute for Advanced Study, the University of Groningen, the City of Amsterdam and the Dutch Inspectorate of Education. The research team aims to uncover the factors that contribute to school segregation and to explore the dynamics of parental school choice.


Theoretical background

The Netherlands has a long history of separation in society between religious groups, known as pillarisation. This was still very visible until the 1950s, when society was highly segregated along religious lines: Protestant, Roman Catholic and secular [18]. This was reflected in schools that identified themselves along one of those pillars. Although high segregation based on religion was persistent, segregation based on other social properties such as socioeconomic status was not yet prevalent [19].

This changed when a large number of low-skilled guest workers were recruited from the Dutch colonies, Morocco and Turkey in the 1960s to fill the gap in the Dutch job market. Together with the secularization of Dutch society after the Second World War, this led to what are referred to as "black" and "white" schools: schools highly segregated by ethnicity, by the educational attainment and income of the parents, and by socioeconomic status overall [19] [20]. These schools highlighted the possibility of educational disadvantage in the Dutch school landscape. Government intervention against the rise of school segregation has proven difficult because of the constitutional right to free school choice and school autonomy. Committed to these rights, the Dutch government has few means to intervene in individual schools.

Instead, government policy focuses on city-wide or country-wide integration measures. In the 1980s, school funding was increased for schools with more students from disadvantaged families, in an attempt to give disadvantaged students a better chance of succeeding at the national average level [19]. More recently, Amsterdam put in place a limit on the voluntary contribution a school can ask from parents, in an effort to combat unofficial gatekeeping based on household income [21]. Other interventions involve better distribution of information on different schools and support for parent initiatives that promote integration [20]. More restrictive measures were used in Nijmegen, where student background is taken into account in a centralized student allocation mechanism [22].

2.1 Factors affecting school choice

Although the reasoning behind school choice in a pillarized society may be considered fairly predictable, motivations for school choice in the current secularized school system are less straightforward [23]. As school segregation policy remains on the political agenda, understanding the motivations of parents becomes more important. The following sections summarize what the literature currently identifies as the main motivations of parents to choose or reject certain schools. Although motivations are highly contextual, this section discusses the main motivations that recur across studies, as well as the Amsterdam context.

2.1.1 Distance

A strong component in primary school choice mentioned in the literature is the distance to the school. Although different groups of people tolerate different distances, attending a school close to home is found to be one of the main factors in school choice. A school's composition is therefore influenced by the local residential composition, and the resulting school segregation can partly reflect the segregation across neighbourhoods [9] [10]. In the interviews with Amsterdam-based parents described by Boterman [24], most of the parents mentioned that they prefer a school near their home. This is backed up by the results of a questionnaire completed by 931 parents in Amsterdam, as reported by Karsten et al. [11], who find that nearly 70% of the participating parents had not considered any schools outside of their postal code area. Although this significantly reduces the number of options, the high school density in Amsterdam leaves sufficient choice within the postal code area [25].

There is, however, heterogeneity in the distance parents are able or willing to travel. Parents with a higher socioeconomic status were found to take schools at a larger distance into consideration than parents with a lower socioeconomic status [26]. In line with literature from other Western countries, Karsten et al. [11] find a clear correlation between ethnic background and education level on the one hand and the willingness or ability to travel further for a school on the other. Higher educated or native Dutch parents tend to consider schools outside of the postal code area more often than lower educated or non-native Dutch parents.


Boterman [25] shows that children of highly educated parents in Amsterdam more often attend schools outside their neighbourhood if they are a minority in their own neighbourhood. He does, however, find contradictory consequences of gentrifying neighbourhoods. As high-demand schools become more popular and start to 'overflow', middle-class parents are forced to settle for schools with a higher concentration of children from a low socioeconomic background, thereby mixing segregated schools. Simultaneously, an influx of highly educated parents into a neighbourhood results in more parents willing to travel outside the neighbourhood, thereby strengthening segregation in already segregated schools.

2.1.2 Composition and profile

For the schools that are considered, the ethnic, social and cultural composition is found to have a large influence on the choice that is eventually made by parents. The preference for being part of a majority is found in literature on school choice in cities in the US [27] [28] [29], UK [30] [31] and other Western countries [32] [33] [12].

Neighbourhoods in Amsterdam are relatively diverse in comparison to cities in other Western countries, and most of its inhabitants celebrate a multicultural society [24]. Despite this, school segregation levels exceed neighbourhood segregation levels in most of the major Dutch cities [34]. In the interviews conducted by Boterman [24], most parents explicitly mention that they also prefer a multicultural school composition, one that reflects the diversity of the city. They did, however, express fear of the struggles their child would face if they chose a school with a diverse population. Some parents were anxious about putting their child in a minority position because they were afraid it would make them an outsider at a school with a non-Western majority. Others were concerned that schools with predominantly non-Western or lower-class pupils would not meet the same quality standards as schools with Dutch pupils. Although most parents prefer a school with a mixed composition, fears of lower school quality hold them back from choosing their ideal composition.

Searching for a mixed composition in a system where most schools are already segregated can in itself create a dilemma for parents. Some parents in the interviews by Boterman [24] mention that even though they preferred a mixed composition, they felt they had to choose between ’black’ and ’white’ schools. An already segregated school system, in combination with a preference to at least not be in a minority position, can create a positive feedback mechanism that strengthens the segregated state [35] [36].

Besides the composition of the school, the way the school profiles itself to the public can be important for the parents’ ”match between home and school” [11]. In the Dutch school system, schools have the constitutional right to offer a specific curriculum, be founded on religious views or follow a certain pedagogical principle [37] [7]. Since these are highly specific profiles, they only target a specific fraction of parents. If that fraction of parents (the school’s target group) is highly segregated, it is natural that the school’s composition follows that segregated configuration. Whilst it is possible to choose a school solely based on its profile, such a choice can indirectly be a choice for a specific composition as well.

2.1.3 School quality

Although Karsten et al. [11] found the main factor in school choice for native Dutch parents to be the ’match between home and school’, they found that for ethnic minority parents the degree of differentiation and the academic standard of the school were the most important components. Ethnic minority parents found elements such as the reputation of a school and the capacity for individual attention more important than the match with the cultural or religious views of the household. Strikingly, only 10% of all parents participating in the questionnaire reported using official information published by the Inspectorate of Education. This indicates that their perception of the academic standards of schools is mostly not based on officially published documents.

Since information from the Inspectorate of Education is rarely used, while ’quality’ is usually regarded as an important factor in school choice in and outside Amsterdam [7][12][9], the question arises what parents consider to be measures of quality. Standardized exam grades give an indication of how students perform at different schools. This method has been criticised because it reflects the level of the students rather than the quality of the school in guiding students to that level [7]. Other, more ambiguous concepts are often used as proxies for school quality as well, such as school climate, reputation, order and discipline, and the attractiveness of school buildings [12]. These concepts are highly subjective, and the quality of a school can therefore be perceived differently by different groups of parents.

Instead of using documents published by the Inspectorate, most parents mention that they gathered information on schools at the preschool playground or day nursery. Of the Dutch parents who participated in the questionnaire by Karsten et al. [11], 94% mentioned that their child had attended preschool, compared with 78% of children of non-Western parents. Of these parents, 61% had heard about the primary schools in the area at or around the preschool. Non-Western parents scored significantly higher than Dutch parents on the question of whether this information was useful to them in making a school choice. Furthermore, native Dutch parents were found to visit schools more often than non-Western parents before making a decision. This makes non-Western parents more dependent on the information they receive through unofficial routes.

The response to social network information differs between parents, as Ball and Vincent [38] found when studying the use of social network information in school choice in the London area. They find that parents who are suspicious of social network information, or who view it as ’gossip’, are often professional middle-class parents who have the resources to study the different schools extensively themselves. A different group of mixed-class parents used but doubted the information they received, and mainly used it to confirm suspicions they already had or things they had read about certain schools. For a third group, the school landscape was often difficult to understand, or information received from schools themselves was viewed as advertising and therefore not completely trustworthy. They relied more heavily on the information they received from other parents, which they deemed more reliable than the information they received from schools. A parent from this category mentioned, for example, that it is only possible to really know how a school is performing from a parent whose children go to that school.

2.2 Segregation measures

Generally, segregation is defined as the degree to which two or more groups are separated. This can be specified in multiple ways, as described by Massey et al. [39], who define five axes of segregation: evenness, exposure, concentration, centralization and clustering. Evenness describes the relative distribution of groups over areal units. Exposure describes the potential for interaction between the different groups. Concentration describes the relative amount of space occupied by the different groups. Centralization describes the distance of groups to the centre of urban areas. Clustering describes the distribution of individuals relative to individuals of the same group. The most commonly used definitions of segregation belong to evenness, which is also the focus of this thesis.

Reardon and Firebaugh [40] compare a number of evenness measures, such as the dissimilarity index D, the Gini index G and Theil’s entropy index H. They define a list of mathematical properties that segregation measures should satisfy and verify which measures satisfy them. They find that Theil’s information index H is the only index that obeys the ’principle of transfer’: its value is guaranteed to decline when an individual moves to a unit where the proportion of its group is smaller than in its previous location [40]. Furthermore, the index can be decomposed, which is a strong computational benefit. For these reasons, Theil’s entropy index H is used to measure segregation in this thesis.


The entropy index H, as proposed by Theil [39], describes the deviation of a unit’s local diversity from the system’s global diversity. The global diversity is described by the global entropy E, given by

$$E = P \log\frac{1}{P} + (1 - P)\log\frac{1}{1 - P}, \tag{2.1}$$

where $P$ denotes the fraction of the minority in the total system. The entropy $E_i$ of a single unit $i$ is defined as

$$E_i = p_i \log\frac{1}{p_i} + (1 - p_i)\log\frac{1}{1 - p_i}, \tag{2.2}$$

where $p_i$ is the fraction of the minority in unit $i$. For a system with $N$ spatial units, total population size $T$ and population size $t_i$ in unit $i$, the entropy index $H$ is given by

$$H = \sum_{i=1}^{N} \frac{t_i}{T}\,\frac{E - E_i}{E}. \tag{2.3}$$

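As shown in equation 2.3, the entropy index of a unit is defined as the deviation of the unit’s entropy from the global entropy, normalized by the global entropy. The total entropy index is the sum of the units’ entropy indices, weighted by their relative sizes. H therefore ranges from 0, when every unit has the same composition as the whole system, to 1, when every unit contains members of only one group.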
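To make equations 2.1 to 2.3 concrete, the following is a minimal sketch in Python, assuming two groups and units described only by their minority count and total size; the function names are illustrative and not taken from the thesis code. The natural logarithm is used, since the base of the logarithm cancels in the ratio $(E - E_i)/E$.

```python
import math


def binary_entropy(p):
    """Two-group entropy of eqs. 2.1/2.2 (natural log), with 0 * log(1/0) taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p * math.log(1.0 / p) + (1.0 - p) * math.log(1.0 / (1.0 - p))


def theil_h(minority, totals):
    """Entropy index H of eq. 2.3.

    minority[i]: number of minority members in unit i
    totals[i]:   total population t_i of unit i (must be > 0)
    """
    T = sum(totals)
    E = binary_entropy(sum(minority) / T)  # global entropy; H is undefined if E == 0
    return sum(t / T * (E - binary_entropy(m / t)) / E
               for m, t in zip(minority, totals))


print(theil_h([0, 50], [50, 50]))   # fully segregated units -> 1.0
print(theil_h([10, 10], [50, 50]))  # every unit mirrors the system -> 0.0
print(theil_h([5, 35], [50, 50]))   # partially segregated, between 0 and 1
```

Because each unit only contributes through its own entropy and size, units can be added or evaluated independently, which is the decomposability mentioned above.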

2.3 Social networks

Using networks to study complex social behaviour can help explain events as the result of social interactions. For a long time no easy method for analysing social networks existed, but the computing power now readily available has enabled large-scale data analysis, and with it a great number of new complex network theories. Deciding how to generate an artificial social network can be difficult, since the underlying structure, and therefore its properties, differ greatly between methods [41]. It is therefore important to establish which properties the desired network should have before choosing an algorithm. This section discusses the different properties a network can have and which algorithms produce networks with these properties.


2.3.1 Small-world networks

A long-believed property of social networks, known as ’Six Degrees of Separation’, became popular in the late 1990s, after being mathematically formalized by De Sola Pool and Kochen in 1978 [42]. The popular definition states that any two randomly selected individuals on the planet are separated by at most six social connections. Although this low fixed maximum, often seen in popular culture, has been debunked [43], the underlying implications for the network structure are still valid, as shown below.

We study a random Erdős–Rényi network [44], which was long used as a reasonable approximation of real complex social networks [45]. For a network with average degree $\langle k \rangle$, the expected number of nodes $N(d)$ within a distance $d$ of a given node is

$$N(d) \approx 1 + \langle k \rangle + \langle k \rangle^2 + \dots + \langle k \rangle^d \tag{2.4}$$

$$\approx \frac{\langle k \rangle^{d+1} - 1}{\langle k \rangle - 1} \tag{2.5}$$

$$\approx \langle k \rangle^d, \tag{2.6}$$

where the last step assumes $\langle k \rangle \gg 1$. The maximum distance $d_{max}$ between two nodes is reached when the expected number of nodes within distance $d_{max}$ equals the system size $N$. Substituting this into the approximation of $N(d)$ gives [45]

$$N(d_{max}) \approx N, \tag{2.7}$$

$$\langle k \rangle^{d_{max}} \approx N, \tag{2.8}$$

$$d_{max} \approx \frac{\ln N}{\ln \langle k \rangle}. \tag{2.9}$$

Based on this equation, the ’Six Degrees of Separation’ statement seems reasonable. However, empirical research has since shown that the statement is a better approximation of the average distance $\langle d \rangle$ than of the maximum distance [45]. This is to be expected, since large distances do occur in real networks, albeit rarely. The small-world property of a network is therefore often defined by

$$\langle d \rangle \approx \frac{\ln N}{\ln \langle k \rangle}, \tag{2.10}$$

which relates the average distance to the system size: the distance grows logarithmically with system size. A small-world network is therefore often described as a system whose diameter grows logarithmically with its size, instead of linearly or exponentially [45]. Real-world networks often deviate somewhat from this relation, since the approximation breaks down when $\langle k \rangle^d$ approaches $N$. Better approximations have been developed to correct for this [46]. Since the network properties in this thesis will not be fitted to real-world data, the approximation in equation 2.10 is sufficient to describe the small-world property of a system.
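The small-world relation can be checked numerically. The sketch below, written in plain Python for illustration (the function names and parameters are assumptions, not part of the thesis model), generates a $G(N, p)$ random network with $p = \langle k \rangle / (N - 1)$, measures the average shortest-path distance with breadth-first search from a sample of source nodes, and compares it with $\ln N / \ln \langle k \rangle$.

```python
import math
import random
from collections import deque


def erdos_renyi(n, mean_degree, seed=0):
    """G(N, p) random network with p chosen so that <k> = p (N - 1)."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def mean_distance(adj, sources):
    """Average BFS distance from the given sources to every node they can reach."""
    total = count = 0
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1  # exclude the source itself
    return total / count


n, k = 1000, 10
adj = erdos_renyi(n, k)
measured = mean_distance(adj, range(0, n, 20))  # 50 sampled sources
predicted = math.log(n) / math.log(k)
print(f"measured <d> = {measured:.2f}, ln N / ln <k> = {predicted:.2f}")
```

Sampling sources keeps the run fast; the measured value is typically somewhat larger than the estimate, consistent with the remark that real corrections exist [46].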

2.3.2 Degree distributions

As mentioned before, the random network was long studied as a reasonable approximation of real social networks. In such a random network, each possible edge is either present or absent, sampled independently from a Bernoulli random variable with probability $p$, chosen such that the system has average degree $\langle k \rangle = p(N - 1)$ [47]. The degree distribution is then the binomial distribution

$$p_k = \binom{N-1}{k} p^k (1 - p)^{N-1-k}, \tag{2.11}$$

where $N$ denotes the number of nodes in the system. Real networks are sparse. In the limit $\langle k \rangle \ll N$, the binomial distribution is approximated by

$$p_k = e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!}, \tag{2.12}$$

which is the Poisson distribution [48]. The binomial and Poisson distributions with $\langle k \rangle = 100$ are plotted in figure 2.1. The figure confirms that the approximation becomes more accurate as the ratio $\langle k \rangle / N$ decreases.
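The convergence of the binomial distribution to the Poisson distribution can be illustrated with a short sketch, assuming $\langle k \rangle = 100$ as in figure 2.1; the network sizes chosen below are arbitrary. Both probability mass functions are evaluated in log space with `math.lgamma` to avoid overflowing factorials, and the largest pointwise difference shrinks as $\langle k \rangle / N$ decreases.

```python
import math


def binomial_pmf(k, n_nodes, p):
    """Eq. 2.11: C(N-1, k) p^k (1-p)^(N-1-k), computed in log space."""
    log_pmf = (math.lgamma(n_nodes) - math.lgamma(k + 1) - math.lgamma(n_nodes - k)
               + k * math.log(p) + (n_nodes - 1 - k) * math.log1p(-p))
    return math.exp(log_pmf)


def poisson_pmf(k, mean):
    """Eq. 2.12: e^-<k> <k>^k / k!, computed in log space."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def max_abs_difference(n_nodes, mean_degree, k_max=300):
    """Largest |binomial - Poisson| over k = 0..min(k_max, N-1)."""
    p = mean_degree / (n_nodes - 1)
    return max(abs(binomial_pmf(k, n_nodes, p) - poisson_pmf(k, mean_degree))
               for k in range(min(k_max, n_nodes - 1) + 1))


for n in (200, 1000, 10000):
    print(n, f"{max_abs_difference(n, 100):.5f}")
```

The range of $k$ is capped at $N - 1$ because a node cannot have more neighbours than there are other nodes.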

When looking at these probability distributions, the question arises how realistic they are. Intuitively, one would expect individuals with a large number of connections to coexist with individuals with very few.

Figure 2.1: Probability distribution for a Poisson distribution and binomial distribution, $\langle k \rangle = 100$.

We use the standard deviation of the Poisson distribution to estimate how much spread in degree is expected in a random network. The standard deviation [45] is given by

$$\sigma_k = \sqrt{\langle k \rangle}. \tag{2.13}$$

For $\langle k \rangle = 100$ we find $\sigma_k = 10$; for $\langle k \rangle = 1000$ we find $\sigma_k \approx 31.62$. Relative to the mean, the peak of the Poisson distribution is therefore very narrow ($\sigma_k / \langle k \rangle = 1/\sqrt{\langle k \rangle}$, i.e. 10% and about 3% respectively), leaving negligible probabilities for outliers. A node in a random network is therefore expected to have a degree very close to the average.

Research since the late 1990s has led researchers to question whether a Poisson distribution is valid for describing the degree distribution of social networks, and of complex networks in general. When studying the connections in the network of a university domain, researchers found that the degree distribution followed a power-law distribution instead of a Poisson distribution [49]. In the period that followed, power-law distributions were found in a great number of different networks, such as networks of e-mail correspondents [50], scientific collaborations [51] and sexual contacts [52]. According to a power-law distribution, the probability that a node has degree $k$ is given by

$$p_k = C k^{-\gamma}, \tag{2.14}$$

where $C$ is a normalization constant and $\gamma$ determines the slope of the function on a log-log plot [45]. To highlight the difference between the power-law and Poisson distributions, both are plotted in figure 2.2, fitted to a data set of a network representing the scientific collaborations of 22990 researchers between 1993 and 2003 [1] [45]. The figure shows that the power-law distribution fits significantly better than the Poisson distribution. The main difference is the presence of high-degree nodes, which are improbable according to the Poisson distribution but present in the data. These nodes in the tail of the distribution function as hubs, connecting the abundant low-degree nodes with the rest of the network.

Networks with a power-law degree distribution are called scale-free networks. This name follows from the variance of the distribution, $\sigma_k^2 = \langle k^2 \rangle - \langle k \rangle^2$. Many common networks have an exponent $2 < \gamma < 3$. For these values, the second moment $\langle k^2 \rangle$ of a power-law distribution diverges in the limit $N \to \infty$, resulting in a diverging variance and hence no characteristic scale [45]. Without a bound on the variance, no accurate expectation can be made of the degree of a randomly chosen node: very large degrees are not improbable.
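The divergence of the second moment can be made visible with a small sketch. It assumes a discrete power law $p_k = C k^{-\gamma}$ on $k = 1, \dots, k_{max}$, with the upper cut-off $k_{max}$ standing in for a finite network size (an illustrative assumption), and $\gamma = 2.5$. As $k_{max}$ grows, $\langle k \rangle$ settles while $\langle k^2 \rangle$ and the variance keep growing, roughly as $\sqrt{k_{max}}$ for this exponent.

```python
def power_law_moments(gamma, k_min=1, k_max=10_000):
    """First and second moment of p_k = C k^-gamma on k_min..k_max (C normalises)."""
    ks = range(k_min, k_max + 1)
    weights = [k ** -gamma for k in ks]
    norm = sum(weights)  # equals 1 / C
    first = sum(k * w for k, w in zip(ks, weights)) / norm
    second = sum(k * k * w for k, w in zip(ks, weights)) / norm
    return first, second


for k_max in (10**2, 10**3, 10**4, 10**5):
    mean, second = power_law_moments(2.5, k_max=k_max)
    variance = second - mean ** 2
    print(f"k_max={k_max:>6}  <k>={mean:.3f}  <k^2>={second:.1f}  var={variance:.1f}")
```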

Figure 2.2: Degree distribution of a scientific collaboration network [1]. A Poisson distribution has been fitted to the data with λ = 1.79 ± 0.07. A power law distribution has been fitted with C = 4.16 ± 0.41, λ = −2.44 ± 0.04.

Although the variance theoretically diverges for $N \to \infty$, real networks do not have infinite size. Furthermore, in the context of a network of close friends or parents, an extremely large degree has been shown to be unrealistic. This has been attributed to the fact that maintaining a relationship costs mental and physical resources, which are finite. As a result, degree distributions often follow a power law in the low and middle range of degrees, but deviate earlier and more strongly from the power law for high degrees. Such distributions often do have a scale, and therefore a finite first and second moment [53].


2.3.3 Homophily and clustering

Humans tend to form connections with individuals they share some common ground with. This holds for any factor (personal interests, education level, etc.) and any type of connection (pen pals, book clubs, work, etc.) and is called homophily [54]. It results in networks with group structures, in which individuals are often connected to other individuals in the same group, whilst being less connected to individuals outside the group [41]. The emergence of clustering is typical for social networks and is often defined by the number of transitive relationships in an individual’s personal network [5]. Mathematically, it is given by the fraction of possible triangles through a node that actually exist, such that

$$c_u = \frac{2\,T(u)}{\deg(u)\,(\deg(u) - 1)}, \tag{2.15}$$

$$C = \frac{1}{N}\sum_{i=1}^{N} c_i, \tag{2.16}$$

where $T(u)$ is the number of triangles through node $u$, $\deg(u)$ is the degree of node $u$, so that $\deg(u)(\deg(u) - 1)/2$ is the number of possible triangles through $u$, and $C$ is the average clustering of the total network. Since the clustering coefficient counts transitive links, it indicates how well someone’s friends are connected to each other.
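A minimal sketch of the clustering measure in Python, assuming an undirected network stored as a dictionary of neighbour sets (an illustrative data structure, not necessarily the one used in the thesis). Each node with degree $\deg(u)$ has $\deg(u)(\deg(u) - 1)/2$ possible triangles, and nodes with degree below 2 are given $c_u = 0$ by convention, which is also how they enter the average $C$.

```python
from itertools import combinations


def local_clustering(adj, u):
    """Fraction of possible triangles through u that exist: 2 T(u) / (deg(u) (deg(u) - 1))."""
    neighbours = adj[u]
    deg = len(neighbours)
    if deg < 2:
        return 0.0  # convention: no possible triangles
    triangles = sum(1 for v, w in combinations(neighbours, 2) if w in adj[v])
    return 2 * triangles / (deg * (deg - 1))


def average_clustering(adj):
    """Average clustering C over all N nodes."""
    return sum(local_clustering(adj, u) for u in adj) / len(adj)


# Small example: a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
adj = {u: set() for edge in edges for u in edge}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print({u: round(local_clustering(adj, u), 3) for u in sorted(adj)})
print(round(average_clustering(adj), 4))
```

Node 2 has three neighbours and therefore three possible triangles, of which only one (with nodes 0 and 1) exists, giving $c_2 = 1/3$.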

A clear definition of a community is given by a hypothesis by Barabási [45], which states that “a community is a locally dense connected subgraph in a network”. Being locally dense means that individuals in a community have a higher probability of linking to other individuals within the community than to individuals outside it. Being a connected subgraph means that all individuals in the community should be reachable through other members of the community. A great number of community detection algorithms have been developed to study the clustering of networks. In the research described in this thesis, the individual clusters do not need to be identified. Therefore, the metric described here suffices to describe the presence of clustering in the networks studied.

2.3.4 Degree assortativity

A specific clustering tendency observed in humans is based on the number of connections. It has been shown that in real networks, individuals tend to connect with individuals that have a similar degree [55]. Degree correlations are seen in other networks as well, although they are not necessarily positive [56].

Degree assortativity can be described by the degree correlation coefficient, which is based on the Pearson correlation coefficient [57]. We define a matrix $e$ as the joint probability matrix of the degrees, such that $e_{ij}$ is the fraction of all edges that join a node of degree $i$ to a node of degree $j$. Since $e$ is the joint probability matrix for the whole network, it holds that

$$\sum_{ij} e_{ij} = 1. \tag{2.17}$$

Furthermore, we define the quantities $a_i$ and $b_j$ as

$$\sum_{j} e_{ij} = a_i, \tag{2.18}$$

$$\sum_{i} e_{ij} = b_j, \tag{2.19}$$

such that $a_i$ and $b_j$ are the fractions of edges that start and end at nodes with degree $i$ and $j$, respectively. Using these, the degree correlation coefficient is defined as

$$r = \frac{\sum_{ij} ij\,(e_{ij} - a_i b_j)}{\sigma^2}, \tag{2.20}$$

where $\sigma$ is the standard deviation of the distribution $a_i$ (which equals $b_i$ in an undirected network). The coefficient takes values in the range $-1 \le r \le 1$, where positive values are obtained for assortative networks (positive degree correlation), negative values for disassortative networks (negative degree correlation), and $r = 0$ indicates no degree correlation.
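For an undirected network, $r$ can be computed directly as the Pearson correlation between the degrees found at the two ends of each edge, counting every edge in both directions so that the two degree distributions coincide. The sketch below assumes an edge list as input; it is an illustration, not the thesis implementation, and it is undefined (division by zero) for networks in which every node has the same degree.

```python
def degree_assortativity(edges):
    """Degree correlation coefficient r as the Pearson correlation of the
    degrees at both ends of every edge, each edge counted in both directions."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    m = len(xs)
    mean = sum(xs) / m  # identical for xs and ys by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    return cov / var


star = [(0, i) for i in range(1, 6)]  # hub linked only to degree-1 leaves
k4_plus_edge = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5)]
print(degree_assortativity(star))          # fully disassortative
print(degree_assortativity(k4_plus_edge))  # fully assortative
```

In the star, every edge joins the hub to a leaf, the most disassortative configuration possible; in the second network, edges only join nodes of equal degree.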


Literature review of existing models

3.1 Residential choice models

One of the most studied models of residential segregation was proposed by Schelling [58]. In his model, he considers a system with two types of agents. These agents are randomly distributed (representing an integrated state) on a bounded lattice. The number of agents is chosen such that some grid cells remain empty. At each time step, all agents evaluate whether they are satisfied with the composition of their neighbourhood. They do so by checking the type of the agents in their Moore neighbourhood of a specific radius. If they are part of a minority in their neighbourhood, they become unsatisfied and move to an empty location where they are not a minority. This process is repeated until no agent is inclined to move.
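The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Schelling's original implementation: a bounded square lattice without wrap-around, a Moore neighbourhood of radius 1, "not a minority" taken to mean that at least half of the occupied neighbouring cells share the agent's type, and unsatisfied agents moving one at a time, in random order, to a random empty cell where they would be satisfied. The mean share of same-type neighbours serves as a simple segregation indicator.

```python
import random


def neighbour_types(grid, pos, size, radius=1):
    """Types of the occupied cells in the Moore neighbourhood of pos (no wrap-around)."""
    x, y = pos
    types = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            nx, ny = x + dx, y + dy
            if (dx, dy) != (0, 0) and 0 <= nx < size and 0 <= ny < size and (nx, ny) in grid:
                types.append(grid[(nx, ny)])
    return types


def same_type_share(grid, pos, kind, size):
    types = neighbour_types(grid, pos, size)
    return types.count(kind) / len(types) if types else 1.0


def satisfied(grid, pos, kind, size):
    # Assumed rule: not a minority = at least half of the occupied neighbours share the type.
    return same_type_share(grid, pos, kind, size) >= 0.5


def mean_similarity(grid, size):
    return sum(same_type_share(grid, p, k, size) for p, k in grid.items()) / len(grid)


def run_schelling(size=20, density=0.9, max_steps=50, seed=1):
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    rng.shuffle(cells)
    grid = {cell: rng.choice("AB") for cell in cells[:int(density * size * size)]}
    initial = mean_similarity(grid, size)
    for _ in range(max_steps):
        unhappy = [p for p, k in grid.items() if not satisfied(grid, p, k, size)]
        if not unhappy:
            break  # no agent is inclined to move
        rng.shuffle(unhappy)
        for pos in unhappy:
            if satisfied(grid, pos, grid[pos], size):
                continue  # earlier moves may have changed this neighbourhood
            kind = grid.pop(pos)  # lift the agent off the grid while it searches
            options = [c for c in cells if c not in grid and c != pos
                       and satisfied(grid, c, kind, size)]
            grid[rng.choice(options) if options else pos] = kind
    return initial, mean_similarity(grid, size)


initial, final = run_schelling()
print(f"same-type neighbour share: {initial:.2f} -> {final:.2f}")
```

Even though each agent only asks not to be in a local minority, the share of same-type neighbours typically rises well above its random starting value of about one half.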

The typical stable state the model converges to is highly segregated. The model shows that even though individuals do not prefer a segregated global system, their individual choices can drive the collective into a segregated state [2]. Before the Schelling model became popular, the consensus among sociologists was that housing discrimination was the main driving force of social segregation [59]. When residential segregation became a larger political issue in the US in the 1980s and 1990s, Schelling’s model provided an explanation in which individual preference alone was the driving force of segregation. The model highlighted the influence of preference dynamics, which has since been supported by later studies [59]. The model is intuitive and easy to implement, and extended forms of it are therefore still in use.


3.1.1 Model extensions

The Schelling model and its underlying assumptions have been studied thoroughly since its publication, and a number of extensions have been proposed in those studies. A point of criticism made by Pancs and Vriend [2] on the agent’s behaviour rule is that even though the agent does not prefer segregated states, it does not penalize them either. They propose using a utility function that depends on the number of same-group members in the agent’s neighbourhood. The functions used are shown in figure 3.1. At each iteration, all agents move sequentially to the vacant location that maximizes their utility. The first utility function in the figure resembles the original process described by Schelling, in which agents prefer not to be in a minority. The other three explicitly assign higher utilities to integrated neighbourhoods. In their simulations, each of the four utility functions leads to a segregated equilibrium state. Even when agents prefer integrated neighbourhoods, the quickly disappearing integrated locations force them to choose sub-optimal neighbourhoods. This results in a segregated system, even though the agents explicitly prefer integration.

Figure 3.1: Different utility functions used by Pancs and Vriend [2] in their extensions of the Schelling model.

An extension concerning the definition of a neighbourhood is proposed by Stoica and Flache [4]. The composition of an agent’s surroundings depends on how the agent’s neighbourhood is defined. The original Schelling model uses a Moore neighbourhood, whilst larger bounded neighbourhoods can replicate the hard boundaries often observed in large cities [60]. Studies have shown that what individuals perceive as their neighbourhood can span different length scales. Furthermore, in the case of Amsterdam, the large neighbourhoods are often heterogeneous with smaller homogeneous clusters. This calls for a combination of the two methods mentioned above.

Figure 3.2 shows an agent with both the bounded neighbourhood and the local Moore neighbourhood of radius r that it takes into account when computing the composition of its surroundings. The total composition $x$ is the average of the local composition $x_{local}$ and the bounded composition $x_{bounded}$, weighted by a variable $b$, as shown in equation 3.1:

$$x = b\,x_{bounded} + (1 - b)\,x_{local}. \tag{3.1}$$

Figure 3.2: Example of an agent’s local Moore neighbourhood (blue) and bounded neighbourhood (yellow) in a grid of 16 by 16 grid cells and 4 neighbourhoods [3].

Using both the bounded neighbourhood and the local neighbourhood allows different length scales to influence the resulting residential patterns. It allows an individual to choose a neighbourhood that matches their wishes, as well as a specific location within that neighbourhood. This is in line with the residential patterns observed in Amsterdam.
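Equation 3.1 can be illustrated with the following sketch. It assumes, for illustration only, that the composition is the share of the agent's own group among the occupied cells, that the agent itself is excluded from both neighbourhoods, that the grid wraps around (periodic boundaries), and that bounded neighbourhoods are square blocks of `block` by `block` cells; an empty neighbourhood is arbitrarily given composition 0. The helper names are hypothetical.

```python
def local_composition(grid, pos, kind, size, radius=1):
    """Own-group share in the Moore neighbourhood of radius r (periodic grid).
    Assumes 2 * radius + 1 <= size so that no cell is counted twice."""
    x, y = pos
    same = total = 0
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue
            occupant = grid.get(((x + dx) % size, (y + dy) % size))
            if occupant is not None:
                total += 1
                same += occupant == kind
    return same / total if total else 0.0


def bounded_composition(grid, pos, kind, block):
    """Own-group share among the other agents in the agent's square neighbourhood."""
    home = (pos[0] // block, pos[1] // block)
    members = [k for (cx, cy), k in grid.items()
               if (cx // block, cy // block) == home and (cx, cy) != pos]
    return sum(k == kind for k in members) / len(members) if members else 0.0


def combined_composition(grid, pos, kind, size, block, b, radius=1):
    """Eq. 3.1: x = b * x_bounded + (1 - b) * x_local."""
    return (b * bounded_composition(grid, pos, kind, block)
            + (1 - b) * local_composition(grid, pos, kind, size, radius))


# 4 x 4 grid split into four 2 x 2 neighbourhoods; empty cells are simply absent.
grid = {(0, 0): "A", (0, 1): "A", (1, 0): "B", (1, 1): "B", (3, 3): "B"}
for b in (0.0, 0.5, 1.0):
    print(b, round(combined_composition(grid, (0, 0), "A", size=4, block=2, b=b), 4))
```

For the agent at (0, 0), one of its four occupied Moore neighbours shares its group ($x_{local} = 1/4$), as does one of the three other agents in its block ($x_{bounded} = 1/3$); $b$ interpolates between the two.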

3.2 School choice models

A data-driven model to study the factors influencing secondary school choice is proposed by Ruijs and Oosterbeek [61]. The main goal of the study is to determine the importance of school quality in the choices made by students moving from primary to secondary school in Amsterdam. To study this, a conditional logit model is defined with a number of potentially important factors. The model is then fitted to data on school choices made by students between 2007 and 2010. The relative weights of the factors then indicate their relative importance. Among their main findings, Ruijs and Oosterbeek [61] highlight that the student’s distance to a school is the dominant factor in the final school choice.
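In a conditional logit model, each school $j$ in a student's choice set receives a utility $V_j = \sum_m \beta_m x_{jm}$ from its attributes $x_{jm}$, and is chosen with probability $e^{V_j} / \sum_k e^{V_k}$. The sketch below shows only this choice rule; the attribute names and coefficients are hypothetical and are not the estimates of Ruijs and Oosterbeek [61].

```python
import math


def choice_probabilities(schools, beta):
    """Conditional-logit choice probabilities over one student's choice set.

    schools: list of dicts mapping attribute name -> value, one per school
    beta:    dict mapping attribute name -> coefficient
    """
    utilities = [sum(beta[name] * value for name, value in school.items())
                 for school in schools]
    top = max(utilities)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(v - top) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical choice set and coefficients, for illustration only.
schools = [
    {"distance_km": 0.8, "quality": 0.1},
    {"distance_km": 2.5, "quality": 0.6},
    {"distance_km": 4.0, "quality": 0.9},
]
beta = {"distance_km": -1.2, "quality": 1.5}
print([round(p, 3) for p in choice_probabilities(schools, beta)])
```

Fitting such a model amounts to choosing the coefficients that maximise the likelihood of the observed choices; the relative size of the fitted coefficients then indicates the relative importance of each factor.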

A study using a school choice model that does focus on primary school segregation is published by Stoica and Flache [4], who use an extension of Schelling’s residential segregation model discussed in section 3.1. The study focuses on the trade-off between the distance to schools and the composition of a school, whilst also taking residential segregation into account. Since empirical research has shown that both preference dynamics and school distance play a role in school choice, a utility function is proposed that incorporates both factors. The utility function is shown in equation 3.2.

$$U = P^{\alpha} D^{1-\alpha} \tag{3.2}$$

In equation 3.2, an agent’s utility $U$ for a school is calculated from an ethnic component $P$, which depends on the agent’s ethnicity and the school’s composition, and a normalized distance component $D$. The variable $\alpha$ controls the weights of these two components. Initially, students are enrolled randomly in the schools. Satisfaction is determined by comparing all utilities with a satisfaction threshold $T$. All unsatisfied students are then simultaneously enrolled in a random school for which satisfaction holds. This process is repeated until convergence.
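The utility of equation 3.2 and the satisfaction test can be sketched as follows. The exact shapes of $P$ and $D$ are not given here, so the sketch assumes, hypothetically, a single-peaked $P$ that is 1 at an optimal own-group share and falls linearly to 0 at shares of 0 and 1, and a $D$ that falls linearly from 1 at the school to 0 at a maximum distance. Only the combination $U = P^{\alpha} D^{1-\alpha}$ and the comparison with the threshold $T$ follow the text.

```python
def ethnic_component(own_share, optimum):
    """Hypothetical single-peaked P: 1 at the optimal own-group share (0 < optimum <= 1)."""
    if own_share <= optimum:
        return own_share / optimum
    return (1.0 - own_share) / (1.0 - optimum)


def distance_component(distance, max_distance):
    """Hypothetical normalised D: 1 at the school, 0 at max_distance or beyond."""
    return max(0.0, 1.0 - distance / max_distance)


def utility(own_share, optimum, distance, max_distance, alpha):
    """Eq. 3.2: U = P^alpha * D^(1 - alpha)."""
    P = ethnic_component(own_share, optimum)
    D = distance_component(distance, max_distance)
    return P ** alpha * D ** (1.0 - alpha)


def is_satisfied(u, threshold):
    """A student is satisfied with a school when its utility reaches the threshold T."""
    return u >= threshold


for alpha in (0.0, 0.5, 1.0):
    u = utility(own_share=0.3, optimum=0.5, distance=2.0, max_distance=10.0, alpha=alpha)
    print(alpha, round(u, 3), is_satisfied(u, threshold=0.7))
```

With $\alpha = 0$ only proximity matters and this nearby school is acceptable; as $\alpha$ grows, the unfavourable composition lowers the utility below the threshold, which is the trade-off the model explores.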

The simulations are performed in a configuration with mild residential integration. It is therefore expected that when only the distance component is used in the utility function ($\alpha = 0$), the schools would be as integrated as the neighbourhoods. Furthermore, a simulation fully based on composition ($\alpha = 1$) could explain the finding that students travel outside their neighbourhood to attend a school in which they are not a minority if no such school is available in their own neighbourhood. The main topic of the study is intermediate values of $\alpha$, where a parent has to balance both factors. The question is whether global segregation arising from individual composition-based decisions can be curbed if parents choose closer, ethnically less favourable schools when the ethnically more favourable schools are deemed too far away.

The results of the simulations for varying optimal compositions $f_a$ and $f_b$ are shown in figure 3.3. The left panel shows the results for a fully composition-focused decision ($\alpha = 1$), which results in mild school segregation. The simulations shown in the right panel take distance into account as well, and no school segregation is measured. The results show that in an integrated residential system, a preference for short distances can curb school segregation. This raises the question of how much of the high school segregation observed in reality is due to ethnic preference dynamics and how much is a direct result of residential segregation.


Figure 3.3: School segregation index in simulations by Stoica and Flache [4], using only school composition (c) or using both school composition and school distance (d) in the behaviour rule. Image taken from [4].

3.3 Social network models

The concept of homophily, mentioned before as the driving force of community structures, has inspired researchers to create network algorithms based on the idea that social similarity contributes to the formation of connections. An example is the model proposed by Gilbert and Hamill [55], which is based on the concept of social circles. A two-dimensional social space is defined, in which each axis represents an arbitrary social trait. Each agent in the model is randomly assigned a point in this space and can connect with agents within a circle around that point, whose radius determines the social reach, or homophily, of the agent. The model allows agents to connect to other agents with similar social traits, which drives clustering and segregation.
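A minimal sketch of the social circle idea, assuming agents are placed uniformly in the unit square (without wrap-around) and that two agents are linked when their Euclidean distance is at most a common reach; these choices are illustrative and not necessarily those of Gilbert and Hamill [55]. Because the points depend only on the seed, a larger reach yields a superset of the links.

```python
import math
import random


def social_circle_network(n, reach, seed=0):
    """Agents at random points in a 2D social space, linked when within `reach`."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(points[i], points[j]) <= reach]
    return points, edges


points, edges = social_circle_network(300, reach=0.1)
print(len(edges), "edges; mean degree", round(2 * len(edges) / len(points), 2))
```

The hard cut-off at `reach` is visible directly in the code: a pair just inside the circle is always linked and a pair just outside never is.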

Gilbert and Hamill [55] note that the network does not perform as expected because of the hard cut-off at the edge of the circle and the low variability in reach. They propose an extension to the model in which different agents can have a different social reach. A more general, non-agent-based model built on the same concepts has been proposed by Boguñá et al. [5]. In this model, a social space $S$ with dimensionality $d_S$ is defined. Again, agents are placed randomly (or following a certain distribution) in the social space, where the assigned location of agent $i$ is defined as $\vec{h}_i \equiv (h_i^1, \dots, h_i^{d_S})$. As in the social circle model, each dimension of the social space represents a social trait, such as socioeconomic status, religion or ethnicity, and each dimension is independent of and uncorrelated with the other dimensions. The connection probability between agents $i$ and $j$ is then defined as

$$p(\vec{h}_i, \vec{h}_j) = \sum_{n=1}^{d_S} \omega_n\, p_n(h_i^n, h_j^n) \tag{3.3}$$

$$= \sum_{n=1}^{d_S} \omega_n \frac{1}{1 + \left[b_n^{-1} d_n(h_i^n, h_j^n)\right]^{\alpha_n}}, \tag{3.4}$$

where $\omega_n$ is a weight factor determining the importance of each social trait, $d_n(h_i^n, h_j^n)$ represents the distance between $i$ and $j$ in dimension $n$, $b_n$ determines the length scale of $p_n(h_i^n, h_j^n)$ and $\alpha_n$ determines the slope of $p_n(h_i^n, h_j^n)$. The connection probability $p_n(h_i^n, h_j^n)$ is plotted in figure 3.4 for different values of $b_n$ and $\alpha_n$. Parameter $b_n$ equals the distance at which $p_n(h_i^n, h_j^n) = 0.5$, and is therefore used as a translation parameter to tune the total expected number of connections. Parameter $\alpha_n$ influences the slope of the function: it determines how likely an agent is to connect to individuals at a large social distance, and is therefore used as a homophily parameter.
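Equations 3.3 and 3.4 translate directly into a short function. The sketch assumes, as an illustration, that the per-dimension distance $d_n$ is the absolute difference of the two coordinates; the function name and argument layout are hypothetical.

```python
def connection_probability(h_i, h_j, weights, lengths, homophilies):
    """Eqs. 3.3-3.4: sum_n w_n / (1 + (d_n / b_n) ** alpha_n), with d_n = |h_i^n - h_j^n|."""
    total = 0.0
    for x_i, x_j, weight, length, alpha in zip(h_i, h_j, weights, lengths, homophilies):
        distance = abs(x_i - x_j)
        total += weight / (1.0 + (distance / length) ** alpha)
    return total


# One dimension: at d_n = b_n the probability is exactly 0.5 for any alpha_n.
print(connection_probability((0.0,), (2.0,), (1.0,), (2.0,), (4.0,)))
# At twice the length scale, a larger alpha_n (stronger homophily) lowers the probability.
print(connection_probability((0.0,), (4.0,), (1.0,), (2.0,), (2.0,)),
      connection_probability((0.0,), (4.0,), (1.0,), (2.0,), (8.0,)))
```

If the weights $\omega_n$ sum to 1, the result always lies between 0 and 1.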

Figure 3.4: Connection probability distribution for two nodes with distance $d_n(h_i^n, h_j^n)$ for different values of $\alpha_n$ and $b_n$ in the social distance model by Boguñá et al. [5].

The model used in this thesis is a small simplification of the model above, as proposed by Talaga and Nowak [53]. They assume that the weights $\omega_n$, length scales $b_n$ and homophilies $\alpha_n$ are equal for each $n$, which greatly reduces the parameter space of the model. Both Boguñá et al. [5] and Talaga and Nowak [53] find that their models produce networks with high clustering and positive degree assortativity, two important properties of realistic social networks.
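With equal parameters in every dimension, generating a network reduces to drawing each possible link once with the probability of equation 3.4. The sketch below is a hypothetical illustration, not the thesis implementation: positions are drawn uniformly from the unit hypercube, the distance per dimension is the absolute coordinate difference, and the equal weights are set to $1/d_S$ (an assumption that keeps the summed probability between 0 and 1).

```python
import random


def sda_network(n, dims=2, length=0.05, alpha=8.0, seed=0):
    """Hypothetical simplified social distance network: equal weights 1/dims,
    equal length scale b and homophily alpha in every dimension."""
    rng = random.Random(seed)
    positions = [[rng.random() for _ in range(dims)] for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = sum(1.0 / (1.0 + (abs(a - b) / length) ** alpha)
                    for a, b in zip(positions[i], positions[j])) / dims
            if rng.random() < p:  # one Bernoulli draw per pair
                edges.add((i, j))
    return positions, edges


positions, edges = sda_network(300)
print(len(edges), "edges; mean degree", round(2 * len(edges) / len(positions), 2))
```

Because the random numbers are consumed in the same order for a fixed seed, increasing `length` only adds links, which makes $b$ convenient for tuning the expected number of connections while $\alpha$ controls homophily.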


3.4 Social networks of parents

Section 2.1.3 described the involvement of parental social networks in the school choice process. Despite this, little research has been conducted into the characteristics of these social networks. A study by Sheldon [62] focused on the influence of parental social networks on parents’ involvement at school. This was done by analyzing survey responses from 195 parents from elementary schools in the United States, containing information on which other parents they discussed their child’s education with. He found that the size of a parent’s network was positively correlated with their involvement in their child’s education. The study did not, however, examine other properties of the social network.

Since the common characteristics of parental social networks are unknown, it is difficult to validate network models with high certainty. Instead, the common, realistic network properties described in section 2.3 are used to validate the parental social networks created. The networks should have the small-world property. Their degree distribution should follow a power-law distribution for low to medium-high degrees. The networks should have a high degree of clustering. Lastly, they should have a positive degree assortativity. This ensures that the model creates realistic networks, under the assumption that parental social networks share the characteristics of other common social networks.


Methods

To study the influence of social dynamics on school choice, an agent-based model has been implemented. As a starting point, the COMPASS model was made. This model is a recreation of the model made by Athieniti [3], which is in turn based on a model developed by Stoica and Flache [4]. Stoica and Flache [4] and Athieniti [3] used their models to study the relative importance of ethnic composition and distance to schools in the school choice process. The COMPASS model has been refined so that new extensions can be implemented easily. An in-depth description of the COMPASS model is given in section 4.1.

Additionally, an artificial parental network is created based on homophily and residential patterns. The algorithm used to create this network is discussed in section 4.2. The implementation of the network in the decision process of parents in the COMPASS model is then discussed in section 4.3.

4.1 COMPASS model

The COMPASS model combines a residential choice model with a school choice model. During the residential part of the model, household agents move around a spatial environment to find a location that satisfies them, based on their surroundings. During the school choice part of the model, students move between schools until they find a school that satisfies them, based on its location and composition. The detailed processes by which this is simulated are discussed in this section.


4.1.1 Global model overview

Figure 4.1 gives a visual representation of the object-oriented structure of the model. It shows the interactions between different parts of the model and the relations between the different classes. High-level control is performed by the object called ‘SchoolSegregation’, which stores the environment and directs the agents during a run. Two of the classes displayed, the Neighbourhood and School classes, are part of the environment. The Student and Household classes represent the agents in the model and are connected by an aggregation relation.

Figure 4.1: UML diagram representing the COMPASS model.
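The class structure of figure 4.1 can be sketched as a set of Python dataclasses. This is a hypothetical skeleton assuming Python as the implementation language: only the class names come from the text, while every attribute and method (`move_to`, `has_space`, `add_household`) is an illustrative assumption. It shows the aggregation of students in a household and the one-to-one household/student relation used in this thesis.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


@dataclass
class Neighbourhood:
    """Environment: one square region of the grid."""
    index: int


@dataclass
class School:
    """Environment: a school with a limited number of places."""
    position: Cell
    capacity: int
    students: List[Student] = field(default_factory=list)

    def has_space(self) -> bool:
        return len(self.students) < self.capacity


@dataclass
class Student:
    """Agent: school choices are made per student."""
    group: str
    # Back-references are excluded from repr/eq to avoid infinite recursion.
    household: Optional[Household] = field(default=None, repr=False, compare=False)
    school: Optional[School] = field(default=None, repr=False, compare=False)


@dataclass
class Household:
    """Agent: moves on the grid and aggregates its students."""
    group: str
    position: Cell
    students: List[Student] = field(default_factory=list)

    def move_to(self, position: Cell) -> None:
        # Students have no position of their own, so they move along.
        self.position = position


@dataclass
class SchoolSegregation:
    """Controller: stores the environment and directs the agents."""
    size: int
    neighbourhoods: List[Neighbourhood] = field(default_factory=list)
    schools: List[School] = field(default_factory=list)
    households: List[Household] = field(default_factory=list)

    def add_household(self, group: str, position: Cell) -> Household:
        household = Household(group, position)
        # One student per household; the student shares the household's group.
        household.students.append(Student(group, household))
        self.households.append(household)
        return household
```

Because students only reference their household, a residential move updates a single position, while a school choice changes only the `school` field of an individual student.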

The environment is represented by a 2D grid of discrete cells with periodic boundary conditions. Each cell can be empty, contain a single household or contain a single school. An example is given in figure 4.2. The number of households on the grid is determined by the density $d$. The grid is divided into $n_{\text{neighbourhoods}}$ square neighbourhoods, and a total of $n_{\text{schools}}$ evenly spaced schools are placed on it. In the simulations performed, $n_{\text{neighbourhoods}} = n_{\text{schools}}$, which results in a school at the centre of each neighbourhood. Each school has capacity for a limited number of students. This capacity is expressed as a fraction $c$ of the number of agents in the school’s catchment area.
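The grid set-up described above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: the number of schools is a square number that evenly tiles the grid, the density $d$ is applied to the non-school cells, a school's catchment area is its own neighbourhood, and capacities are rounded up. All function names are hypothetical.

```python
import math
import random


def _layout(size, n_schools):
    """Neighbourhoods per grid side and their side length (square tiling assumed)."""
    per_side = math.isqrt(n_schools)
    if per_side * per_side != n_schools or size % per_side:
        raise ValueError("expects a square number of equal, square neighbourhoods")
    return per_side, size // per_side


def school_positions(size, n_schools):
    """One school at the centre of each neighbourhood (n_neighbourhoods == n_schools)."""
    per_side, block = _layout(size, n_schools)
    return [(i * block + block // 2, j * block + block // 2)
            for i in range(per_side) for j in range(per_side)]


def neighbourhood_index(cell, size, n_schools):
    """Index of the neighbourhood (and its school) containing `cell`; wraps periodically."""
    per_side, block = _layout(size, n_schools)
    x, y = cell
    return (x % size // block) * per_side + (y % size // block)


def build_grid(size, density, n_schools, rng):
    """Place schools, then fill a fraction `density` of the other cells with households."""
    schools = school_positions(size, n_schools)
    school_cells = set(schools)
    free = [(x, y) for x in range(size) for y in range(size)
            if (x, y) not in school_cells]
    cells = rng.sample(free, round(density * len(free)))
    # Two equally sized groups; the sample order is random, so alternating is fair.
    households = {cell: ("blue" if k % 2 == 0 else "orange")
                  for k, cell in enumerate(cells)}
    return schools, households


def school_capacities(households, size, n_schools, c):
    """Capacity of each school as a fraction c of the households in its neighbourhood."""
    counts = [0] * n_schools
    for cell in households:
        counts[neighbourhood_index(cell, size, n_schools)] += 1
    return [math.ceil(c * n) for n in counts]
```

For a 40 by 40 grid with 4 schools, as in figure 4.2, `school_positions(40, 4)` places the schools at (10, 10), (10, 30), (30, 10) and (30, 30).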

4.1.2 Agents

There are two types of agent in the simulation: the household and the student. Each student belongs to a household. During the residential movement steps, a household moves as a whole to a new grid cell, taking all of its students with it. School choices, on the other hand, are made for each student individually. It is therefore possible for different students in a single household to be enrolled in different schools. Although a household can contain multiple students in reality, the simulations in this thesis fix the relation between households and students to one-to-one, to prevent unexpected complex behaviour arising from this part of the simulation.

Figure 4.2: Visualization of the 2D grid with 40 by 40 cells, filled with blue and yellow agents with capacity c = 0.9 and 4 schools (red cells).

Each agent is assigned to a group. This group can represent any personal trait, such as ethnicity, socioeconomic status or the education level of the parents. This trait is the same for a student and the household it belongs to. Throughout the rest of this thesis, we use two groups of agents, blue and orange, which are equally represented on the grid.

All agents aim to reach a situation in which they are satisfied. Satisfaction is always based on a utility score that expresses how an agent values its current surroundings. Since the model is two-fold, the “surroundings” can describe either an agent’s residential surroundings or the school the agent attends. How exactly these utilities are defined is described in the paragraphs below.

4.1.2.1 Residential utility

During residential steps in the simulation, the utility $U_r$ of an agent is based on the fraction of agents of the same group in its surroundings. This ethnic preference $P$ is a single-peaked function defined by equation 4.1 [4]. The shape of the function is shown in figure 4.3.


$$
U_r = P(x, f, p, M) =
\begin{cases}
\dfrac{x}{fp}, & x \le fp,\\[4pt]
M + \dfrac{(p - x)(1 - M)}{p(1 - f)}, & x > fp,
\end{cases}
\tag{4.1}
$$

In this equation, $p$ denotes the total number of agents in the neighbourhood, $f$ the optimal fraction of agents of the same group, $M$ the utility in a homogeneous neighbourhood and $x$ the number of agents in the neighbourhood of the same group. Note that $x$ can be a weighted average of the local composition and the bounded composition, such that

$$
x = b\,x_{\text{bounded}} + (1 - b)\,x_{\text{local}},
\tag{4.2}
$$

where the weight $b$ specifies how much the bounded composition counts relative to the local composition. The residential utility function builds on the assumption that people do not favour a homogeneous neighbourhood with individuals of their own group, but often prefer a mixed neighbourhood [59]. Therefore, the utility is maximized at the optimal fraction $f$, and a penalty is applied when the fraction of same-group individuals exceeds it. The function increases linearly from zero at the origin to one at the optimal fraction and decreases linearly to $M$ when the composition is completely homogeneous.

Figure 4.3: Residential utility based on the composition of a neighbourhood or school. Variable p represents the total number of agents in the environment, f the preferred fraction of same-group individuals and M the desirability of a homogeneous environment.
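Equations 4.1 and 4.2 translate directly into code. The sketch below is a minimal illustration, not the COMPASS implementation: the function names are hypothetical, $x_{\text{local}}$ is assumed to be counted in a Moore neighbourhood of radius 1 on the periodic grid (excluding the agent's own cell), the two compositions in equation 4.2 are assumed to be on the same scale, and an empty neighbourhood ($p = 0$) is treated as an error because the equations leave it undefined.

```python
def local_counts(cell, households, group, size, radius=1):
    """Return (x, p): same-group and total households around `cell`.

    Hypothetical x_local: a Moore neighbourhood of `radius` on a periodic
    size x size grid, excluding the agent's own cell.
    """
    cx, cy = cell
    x = p = 0
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            if (dx, dy) == (0, 0):
                continue
            other = households.get(((cx + dx) % size, (cy + dy) % size))
            if other is not None:
                p += 1
                x += other == group
    return x, p


def single_peak_utility(x, f, p, M):
    """Equation 4.1: rises linearly to 1 at x = f*p, then falls linearly to M at x = p."""
    if p <= 0 or not 0 < f < 1:
        raise ValueError("requires p > 0 and 0 < f < 1")
    if x <= f * p:
        return x / (f * p)
    return M + (p - x) * (1 - M) / (p * (1 - f))


def weighted_composition(x_bounded, x_local, b):
    """Equation 4.2: weight b on the bounded composition, 1 - b on the local one."""
    return b * x_bounded + (1 - b) * x_local
```

Evaluating `single_peak_utility` at $x = fp$ returns 1 for any $M$, and at $x = p$ it returns $M$, which reproduces the peak and the homogeneous end point of figure 4.3.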

4.1.2.2 School utility

Similar to the residential utility, the school utility of an agent depends on the composition of its surroundings. In this case, the surroundings are defined as the