
Computational framework for modeling psychological and social dynamics

Author: Mathijs Maijer

Supervisors: Dr. Michael Lees, Maarten van den Ende

Abstract

Advancements in complexity science, formal theories, network science and data collection technologies make network analysis all the more important. One of the still open problems in network science is how to model, in a generic way, the coupling between dynamics on the network and dynamics of the network. Unfortunately, there is no mainstream framework readily available to support simulation and analysis of general dynamics on and of the network. DyNSimF allows modeling of generic methods for dynamics, while including a large number of functionalities such as automatic network formation, sensitivity analysis, and visualization. To analyze generic methods for modeling these dynamics, this thesis reimplements and extends a psychological model as well as an opinion dynamics model. Network formation based on utility and cost is explored using a school segregation model as well. After analyzing the core elements of models utilizing both social and psychological dynamics, the developed computational framework allows for modeling of dynamic social networks with inter- and intra-personal dynamics in a generic way.

Keywords: dynamic networks, Python, software package, simulation.


1. Introduction

The importance of complexity science has increased over the past decades as more fields collaborate to solve complex problems (Cohen and Havlin 2010; Forsman, Moll, and Linder 2014; Fortunato 2011; Fan, Meng, Ludescher, Chen, Ashkenazy, Kurths, Havlin, and Schellnhuber 2020; Arthur 2014). Networks are often at the root of models of complex phenomena. For instance, Social Network Analysis (SNA) has been used to model social and psychological phenomena. Interestingly, it has been shown that an integration of social science and psychological models is required to model phenomena such as depression or segregation (Cooley 2007; Clark 1991; Tennen, Hall, and Affleck 1995). Thus, networks are not only used to model social relationships, but also to model inter- and intrapersonal dynamics (Olivar-Tost, Gómez-Gardeñes, and Hurtado-Heredia 2018). One of the still open problems in network science is how to model properly, in a generic way, the coupling between dynamics on the network and dynamics of the network (i.e., structural changes). As an example, one might consider how people adapt their friendship groups in response to information (or rumors) spreading, or how people adapt their contacts in response to a spreading virus. Progress in this field requires new modeling techniques and software tools.

There are different programming languages that enable a user to model and simulate networks. We focus on Python as it is one of the fastest growing languages used by, among others, scientists and engineers (Millman and Aivazis 2011; Srinath 2017). Python is relatively easy to learn due to its simplified syntax with emphasis on natural language. At the moment, there are several Python solutions available to model or simulate network phenomena (Rossetti, Milli, Rinzivillo, Sîrbu, Pedreschi, and Giannotti 2018; Ahrenberg, Kok, Vasarhelyi, and Rutherford 2016; Miller and Ting 2020; Dobson 2020). Unfortunately, most of these packages have clear limitations or a different focus. For example, they may only support static networks, have no visualization framework, focus only on epidemic models, or have no support for continuous states. This leaves a few things to be desired, especially for modeling that integrates inter- and intra-personal dynamics.

This is where the dynamical network simulation framework (DyNSimF) comes in: the first open-source package that allows modeling of dynamics on and of the network in a generic way. It thus allows for the modeling of dynamic social networks with inter- and intrapersonal dynamics. Furthermore, it comes packed with analysis tools and allows for the validation of model predictions. Visualizations are available, together with sensitivity analysis and calibration options. By utilizing this package, modelers can focus less on how to implement their model and more on the model itself. Using the package will increase maintainability as well as readability, which will enhance cooperation and further advance created models. Another goal is to make modeling accessible to users who are not experts in programming by providing a platform that is easy to work in. This will hopefully accelerate the creation and development of models in this field.

The main methodology is explained in section 2. In subsection 2.3, the key concepts of utility and cost in a social network are explained. Extra functionalities such as property functions, visualization and sensitivity analysis are introduced in subsection 2.5. Several examples are provided in section 3, demonstrating the general use and flexibility of DyNSimF. Related software will be discussed in section 4; the specifications of other solutions will be described, as well as reasons to consider using DyNSimF instead. Implementations and execution times will be compared to the other software tools to provide a clear overview. The discussion of the results can be found in section 5. Afterwards, to provide more detail on the topics touched upon in the first sections, section 6 will focus more on the research that was done for this thesis, as sections 2 to 5 mainly focus on the computational framework.

1.1. Thesis contributions

The research shown in this thesis was executed in cooperation with the Centre for Urban Mental Health (UMH). The UMH aims to understand and intervene upon mental health problems in an urban environment, and complexity science can be used to capture the complexities and dynamics of these problems. The goal of the research presented in the appendix is to analyze the combination of social and psychological dynamics in a model. Models often focus on either a psychological aspect or a social aspect; however, by combining the two different dynamics in a single model, more complex dynamics can be captured and analyzed. This thesis revolves around answering the following research question: how can the dynamics on and of a network be modeled in a generic way? This research question has the following sub-questions: can these models be implemented in a single computational framework, can a model featuring only a single dynamic be transformed into a model that utilizes both dynamics, and how are models with static networks impacted when they are changed to incorporate dynamic networks? We hypothesize that, by using a concept of updates to change both internal states and network structures, general dynamics on and of a network can be modeled. By combining these updates with a concept of utility and cost, which will be discussed later, a generic method for coupling social and psychological dynamics will be created.

The findings acquired by answering these questions can be applied in many areas. Many phenomena relating to urban mental health show both psychological and social dynamics. The psychological dynamics can be understood as dynamics applied on a network, while the structural changes of the network caused by social dynamics can be seen as dynamics of the network. For example, in the case of addiction there is a clear impact of addictive substances on the psychology of a user (Koob and Volkow 2010). But addiction does not only affect someone's psychological dynamics, it may also affect someone's social dynamics. This could be in the form of self-isolation (Tateno, Teo, Ukai, Kanazawa, Katsuki, Kubo, and Kato 2019), or a search for new connections that share the same substance-abusive behavior (Jones 1994). Currently there are different models that focus on describing either the psychological dynamics (Grasman, Grasman, and van der Maas 2016; Solomon 1977; Koob 2009) or the social dynamics (Burke and Heiland 2007; Merry 1972; Rosenquist, Murabito, Fowler, and Christakis 2010); however, there are no clear strategies to combine the two. By analyzing the intra- and interpersonal dynamics of a specific phenomenon, such as addiction, more underlying mechanics should become clear. This will benefit society by informing new interventions and policies towards the promotion of mental health.

The first sections (2 - 5) of the thesis are written in the format of a paper and focus on the computational framework that was created in the process of researching psychological and social dynamics. The main goal of these sections is to show the novelty of DyNSimF and how it can capture dynamics on as well as of networks. The thesis was structured this way because of the aim to publish it as a paper in the Journal of Statistical Software in 2021, which explains the focus on the software and less so on the surrounding research and its applications. To compensate for a potential lack of background information or applications of the software, section 6 contains multiple subsections to provide this. The thesis part, which starts from the appendix, is similar to the paper but contains much more detail.

This thesis has two main contributions: one is the computational framework DyNSimF, which was created in the process of the research; the other is a set of case studies in the form of reimplementations and extensions of several models using DyNSimF. While the paper sections focus solely on the software and its capabilities, the thesis sections in the appendix focus more on answering the research questions and the applicability of the software. Two different models are reimplemented and extended using DyNSimF in section 6.3. Next to that, the example from section 3.2 is validated and analyzed against real-world data in section 6.4 to analyze the workings of a model relying on utility and cost. This is in contrast to the examples from section 3, which are basic reimplementations for the sake of showing the capabilities of the developed software. The examples in the appendix contain much more detail with regard to the research, as they are thoroughly analyzed.

2. Methodology

In this section the main reasons for creating and using DyNSimF are explained, as well as the required components to create a model. Several extra functionalities of the package are also shown, which help with validating and simulating a created model.

2.1. Rationale

The aim of DyNSimF is to provide an open-source framework for generic methods for dynamics on and of a network. By utilizing this package, modelers have to focus less on how to implement their model, allowing them to spend their time more efficiently on the model itself rather than on the implementation. This will increase maintainability as well as readability, which will enhance cooperation and further advance created models. Another goal is to make modeling accessible to users who are not necessarily programming or engineering experts and to give them a platform that is easy to work with. This will accelerate the creation and development of models, as fewer experts are necessary for the actual implementation and simulation of a model.

2.2. Components

There are several main components to DyNSimF that allow modelers to create a plethora of different models. This subsection will describe the general workflow of implementing a model using the package. The components mentioned in the modeling process will then also be described individually, with enough code to utilize them. To allow maximum flexibility when creating models, there are several available components such as updates, conditions, and schemes. Updates are used to apply changes on and of the network. This could be in the form of structural network changes, as well as state changes of individual nodes. When an update should take place and who should be affected are controlled using conditions and schemes. Conditions determine which nodes in the network change when updates take place, based on certain user-specified values. Additionally, schemes control in which part of the simulation updates should be active. As can be seen from figure 1, a scheme can have multiple updates and an update may have multiple conditions. The schemes can be used to sample a set of the nodes for updates and determine when the updates are active using a lower and upper iteration bound. The updates apply changes on and of the network. To control which specific nodes or agents are updated, an update can use one or multiple conditions.

Figure 1: The basic objects of a model created using DyNSimF. A scheme may have multiple updates. Schemes control which nodes are sampled for updates, as well as when the updates should be active. The update objects themselves apply changes on and of the network. An update may contain multiple conditions, which determine the specific nodes that can be updated. This can be decided using user-defined values, e.g., a specific threshold of a state.

Process

The standard approach for creating models using DyNSimF will be outlined in this section. By following this process, clear steps can be taken to create a model in a systematic fashion, while retaining clean code with a clear overview. Some parts of this process are mandatory, but most of it is optional. Every model that is created using DyNSimF should at least contain a NetworkX (Hagberg, Swart, and S Chult 2008) graph. The developed framework tracks all the nodes’ states and relations using this graph, which is why it is mandatory to first define a graph. If a model only has one individual, the network can simply be one node. If multiple independent processes should be simulated, a network with multiple nodes without edges can be used instead.

There are two parts, or sub-models, that a model can have: an internal model and a utility cost model. At least one of these has to be defined, but they can, and typically are, used together. The internal model defines how all the internal state variables change for each node. Manual relationship changes between nodes can also be defined in this internal model. The second part, a utility cost model, should be used when the network should automatically reconfigure itself by changing the relationships of the nodes so as to maximize a user-defined utility for each node. These sub-models essentially define how the states of the nodes change and how the network structure changes. The interaction of these models defines how the network structure reacts to changes in node state and how node state is affected by network structure. Both sub-models and their respective parts will be explained in more detail in later sections. For more information on the literature about these models, section 6.2 can be consulted. The general modeling process for a model that contains both an internal model and a utility cost model is outlined as follows:

• Configure the package

• Define a graph

  – Add internal states
  – Define constants and initial values
  – Create updates and conditions
  – Add iteration schemes

• Create a utility cost model

  – Define utility and cost
  – Choose an iteration method
  – Choose a sample method

• Simulate

• Analyze / Visualize

Setup

After installing the package, the first step is to set up a model. The setup involves several steps that each serve a different purpose. Normally, the first thing to do is to import all the required components that a model will need. Every subsection that describes a component will also state where it can be imported from. Every model will need at least three imports: the model class, the configuration class, and the NetworkX package. This can be done as follows:

import networkx as nx

from dynsimf.models.Model import Model

from dynsimf.models.Model import ModelConfiguration

Note that every time a NetworkX feature should be accessed, it can be done by typing nx instead of networkx due to its import. Now that the imports have been made, the next step is to create a graph. NetworkX contains a lot of documentation and examples on how to create a graph that fits the requirements for the desired model. One possible example of defining a graph would be g = nx.erdos_renyi_graph(n=1000, p=0.1), which stores into g an Erdős-Rényi, or binomial, graph (Erdős and Rényi 1960) with 1000 nodes and a 0.1 probability for edge creation.

After the network has been defined, the Model class should be configured using the ModelConfiguration class to determine which data should be kept in memory. A very basic setup would simply include creating a ModelConfiguration object and then creating a Model object using the defined NetworkX graph and the ModelConfiguration object.

g = nx.erdos_renyi_graph(n=1000, p=0.1)

cfg = ModelConfiguration()  # A configuration object for the model
model = Model(g, cfg)  # Initialize a model object

Most of the storage/memory configuration of the model can be specified using the previously empty ModelConfiguration object. This mainly serves as configuration for which parts of the model should be kept in memory or written to disk. There are four quantities that can be stored in memory during the simulation: states, adjacency, edge values, and utility. By default, only the states of each node are stored in memory for each simulation step; by using the ModelConfiguration object this can be specified by the user.

Almost any configuration specification from a user is done by using Python dictionaries. These data structures allow values to be paired to keys. By inputting a key in the dictionary, the value will be returned. The keys here refer to the option that should be specified, while the values are used to configure them. In this case the ModelConfiguration class takes in a dictionary when an object is created. In the case of memory/storage configuration, there are four possible keys that are expected: state_memory_config, adjacency_memory_config, edge_values_memory_config, and utility_memory_config. The value tied to such a key should be an object of the MemoryConfiguration class. This class requires two arguments: one indicating which type the configuration is for, while the second argument is another dictionary for all the specific options. For example, creating a model where the adjacency of the network is kept in memory at each simulation step would look as follows:

from dynsimf.models.Model import Model

from dynsimf.models.Model import ModelConfiguration

from dynsimf.models.components.Memory import MemoryConfiguration
from dynsimf.models.components.Memory import MemoryConfigurationType

cfg = {
    'adjacency_memory_config': \
        MemoryConfiguration(MemoryConfigurationType.ADJACENCY, {
            'memory_size': 0
        }),
}

model = Model(g, ModelConfiguration(cfg))

There are several options, or keys, to configure the MemoryConfiguration: save_disk, memory_size, save_interval, and memory_interval. The save_disk option is a boolean value that lets the user have all the values saved to a file on disk during the simulation. The memory_size option sets the maximum number of iterations for which values are stored in memory, where -1 indicates the values should not be stored at all and 0 or 1 indicates every iteration should be stored. The save and memory interval values determine how many iterations pass between stores.
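As an illustration of these options, the following sketch keeps only the last 50 iterations of the node states in memory and also writes them to disk every 10 iterations. The MemoryConfigurationType.STATE member is an assumption here, chosen by analogy with the ADJACENCY and UTILITY members used elsewhere in this section; the concrete numbers are only illustrative.

from dynsimf.models.components.Memory import MemoryConfiguration
from dynsimf.models.components.Memory import MemoryConfigurationType

cfg = {
    'state_memory_config': \
        MemoryConfiguration(MemoryConfigurationType.STATE, {  # STATE member assumed
            'memory_size': 50,    # keep the last 50 iterations in memory
            'save_disk': True,    # also write the values to a file on disk
            'save_interval': 10   # write every 10 iterations
        }),
}

model = Model(g, ModelConfiguration(cfg))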

At this point, all the actual importing and basic configuration has been completed. The next steps to complete the setup are to define any constants, states, and the initial values of the states. To define constants, a simple dictionary can be used to store values under a variable name, or key. Then the constants can be added to the model using the Python syntax to create a class member like so:

# Create the constants dictionary
constants = {
    'constant_1': 0,  # this can be any name or value
    'constant_2': 1   # this can be any name or value
}

model.constants = constants  # Set the constants

In the example above, strings are used as keys and integers as values, but these could be any keys or values the user desires, as this is completely user-defined. The final part of the setup involves setting up the states of the nodes and initializing them. To tell the package which states are used, the model.set_states() function should be used. It takes a list of strings with the states as input, like so: model.set_states(['state1', 'state2']). Before the simulation starts, the states need initial values, which are set using the model.set_initial_state() function. The first argument of this function is a user-specified function that should return a dictionary where each key refers to a state that was set and each value contains a list of values with the same length as the number of nodes in the network. The second argument is a dictionary that specifies the arguments of the user-defined function in the first argument.
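A minimal sketch of this last step is shown below, following the same pattern as the SIR example in section 3.1. The state name, the uniform random initialization, and the assumption that the constants dictionary contains the number of nodes under the key 'n' are all illustrative.

import numpy as np

model.set_states(['status'])  # illustrative state name

def initial_status(constants):
    # One random initial value per node; 'n' is assumed to hold the number of nodes
    return np.random.random(constants['n'])

initial_state = {
    'status': initial_status
}

model.set_initial_state(initial_state, {'constants': model.constants})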

Updates

Updates are the main logic of most models and determine how states are changed during the course of the simulation, as well as how the network should change. To update any of the states defined in the setup process of section 2.2.2, a user-defined function can be created. This function should return a dictionary where the keys refer to the state, while the values are a list indicating the values for that state per node. This means that one function can update all of the states at once. This is optimized using Numpy (Harris, Millman, van der Walt, Gommers, Virtanen, Cournapeau, Wieser, Taylor, Berg, Smith, Kern, Picus, Hoyer, van Kerkwijk, Brett, Haldane, del Río, Wiebe, Peterson, Gérard-Marchant, Sheppard, Reddy, Weckesser, Abbasi, Gohlke, and Oliphant 2020) arrays.

After a function has been created, it can be added to the model using the add_update() command of the Model class. The first parameter of this function is the update function created by the user. The second parameter is a dictionary that contains key/value pairs that should be passed to the function. E.g., if the update function has a parameter called constants, the second argument of the add_update() function will take in a dictionary with a key called 'constants' of string type, and the value will be passed as argument.

An example is shown below. A state called state_1 is added to the model, then a user-defined function is created that has constants as a parameter, which is used to create three new values. The return statement of the function shows a dictionary with a key that refers to the state being returned. The function is then added to the model using the add_update function, whose second argument is a dictionary that contains the values that should be passed to the function (shown after the code below). A third argument, a boolean called get_nodes, is optional; if set, the update function is called with a list of nodes that will be updated using the function, which means that the update function should have one more argument. In this example there are three nodes in the network, which is why only three values are returned.

model.add_states(['state_1'])

def update_1(constants):
    new_values = [constants['c'], 2, 3]
    return {'state_1': new_values}
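Following the description above, the update can then be registered with the model. The concrete value for 'c' is only illustrative.

# Register the update; the dictionary supplies the 'constants' parameter of update_1
model.add_update(update_1, {'constants': {'c': 1}})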


As many updates as desired can be added to the model, and the content of the updates is completely defined by the user. The only actual requirements are that an update should return a dictionary with values for at least one state in the model and that the number of values returned should match the number of nodes in the network.

When the setup has been completed and updates have been implemented, it becomes possible to simulate a simple model. The next subsections will explain how to model more complex situations, but if a model only contains logic that can be modeled using the updates explained in this subsection, it can be simulated. By taking the model object that has been created using the Model class, the simulate function can be run with the number of iterations like so: model.simulate(N). N here refers to the number of iterations the model should be run for. The output of this function can be stored in a variable so that the results can be used for further analysis or visualization. The output has the following structure: it is a dictionary containing three keys: states, adjacency, and edge values. Each key has another dictionary as its value, which contains a key-value pair for each iteration, where the iteration is the key and the value is a nodes × states matrix A such that its element Aij shows the value for state j of node i. The order of the columns, the states, matches the order of the states set using the set_states() function.
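A short sketch of running a simulation and reading its output is given below, assuming the output keys are the literal strings 'states', 'adjacency', and 'edge values' as described above.

output = model.simulate(100)  # run the model for 100 iterations

# The 'states' entry maps each stored iteration to a nodes x states matrix
last_iteration = max(output['states'].keys())
final_states = output['states'][last_iteration]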

Conditions

Conditions serve as potential triggers for updates on specific nodes that meet the conditions. Some updates should only occur when specific conditions are met for certain nodes. To achieve this, nodes can be filtered using conditions that can potentially be chained together and linked to specific update functions. Every condition can be seen as a filter, and only the nodes that meet all the conditions are updated by the update function that the conditions apply to. There are three condition types that can be used.

• Stochastic condition

  This is the most basic condition; it triggers an update for each node based on probability. It only requires a probability P. A random value p is generated in the continuous range [0, 1] for each node in the network. For every node where p ≤ P is satisfied, the update will be triggered.

• Threshold condition

  The threshold condition filters nodes based on a chosen variable, threshold, and operator. The threshold can be any given value and the possible threshold operators are ≤, <, >, and ≥. The variable can be a constant or a selected value from the model, for example a state of each node. Each node will be tested by applying the operator to the variable and the threshold. Then, the updates will be applied to each node that satisfies the condition.

• Custom condition

  Custom conditions give complete freedom to the user to filter the nodes an update is applied to by defining a custom function. If the model requires conditions that do not fit the stochastic or threshold conditions, a custom condition can be defined. The only requirement for the function is to return a list of the NetworkX nodes in the graph that satisfy the condition. Custom conditions are executed on every node each time step, just like the other conditions, so they should be implemented efficiently.
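As a purely illustrative sketch of a custom condition, the function below selects the nodes whose 'status' state exceeds 0.5. The argument-free signature and the way such a function is attached to an update via the package's condition class are assumptions and not shown here.

import numpy as np

def nodes_above_threshold():
    # Assumed signature: return the NetworkX node names whose 'status' value
    # exceeds 0.5; only these nodes would then receive the update.
    values = np.array(model.get_state('status'))
    node_names = list(model.graph.nodes)
    return [node_names[i] for i in np.where(values > 0.5)[0]]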

Schemes

Schemes are used to achieve two things: they set iteration boundaries on updates and they sample nodes. The iteration boundaries allow updates to be executed only when the current iteration is equal to or larger than the lower bound and less than the upper bound. The sampling function of a scheme allows the user to filter and order the nodes in a specific manner before updates are executed on them. In general, this means that a scheme can have multiple updates, while an update can have multiple conditions. Updates are only executed if a scheme is active, and they are only executed on nodes that were sampled by the scheme and meet the conditions attached to the update.

The example below shows how a scheme with updates can be added to a model. When the Scheme and Update classes are imported, first an update object can be created, which in this case is linked to a previously defined function named update_fun. Then a scheme object is created, which takes the sample function as the first argument. The second argument is a dictionary that takes four keys: the arguments for the sample function, the lower and upper bounds, and a list of updates. When the scheme is created, it can be added to the model using the add_scheme function.

from dynsimf.models.components.Scheme import Scheme
from dynsimf.models.components.Update import Update

u = Update(update_fun)

s = Scheme(sample_function, {
    'args': {'graph': model.graph},
    'lower_bound': 0,
    'upper_bound': 10,
    'updates': [u]
})

model.add_scheme(s)

In the example, the update function update_fun would be executed during the first 10 iterations (0 - 9). In this case the function sample_function takes in one parameter called graph, which is the model's graph. When creating a scheme, any of the keys except updates can be left out. If the lower bound is missing, it will assume the scheme should be active from iteration 0, while if the upper bound is missing it will be active until the end of the simulation.

Manual Network Updates

DyNSimF is intended to build models that involve dynamic networks, where nodes are removed, added, or have their connections changed. This part will explain how one can dynamically change the network using updates. DyNSimF also has the ability to dynamically reconfigure the network, but this will be explained in section 2.3. For now, we will show how the network can be manually updated. To add a network update function to the model, the add_network_update() function should be used, where the first argument is the function and the second argument is an optional boolean to get a list of updated nodes as argument for the function. To update the network using a function, a dictionary should be returned with specific keys indicating what kind of change the values of the dictionary represent. There are three options: 'add', 'edge_change', and 'remove'. The remove option is the most straightforward: if this key is included in the dictionary that is returned by the function, then the corresponding value should be a list of the names of all the nodes that should be removed from the network.

When using the 'add' key, the corresponding value should be a list of dictionaries, where each dictionary inside the list serves as the initialization for a new node. This dictionary should contain two keys: 'neighbors' and 'states'. The 'neighbors' key should have a list as value, where each entry in the list should be the name of a neighbor. This way new nodes can immediately be connected to other nodes when they are added to the network. The 'states' key should have another dictionary as value, in which every key can refer to a state with corresponding initial values for that node. This way, nodes can be added to the network and their connections and states can immediately be initialized.

Finally, the last key, 'edge_change', can be used to update the connections of individual nodes. This is done by creating a dictionary as the value for the 'edge_change' key, where every key should correspond to the name of the node that should be updated in the network. Then for each node in the dictionary, another dictionary with three possible keys can be created: 'overwrite', 'add', and 'remove'. This allows the user to add or remove nodes from that node's connections, or to overwrite the complete set of neighbors with new ones. All three keys should have lists as values, where each list contains the names of the nodes to add, remove, or overwrite. If a weight should be added for a certain edge, the value should be a list containing the node name, the in-going weight value and the outgoing weight value.
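A sketch of a manual network update following the dictionary format described above is given below; the node names, the state name, and the choice of changes are purely illustrative.

def network_update():
    # Illustrative: remove node 5, add a new node connected to nodes 0 and 1,
    # and add node 2 to the connections of node 3
    return {
        'remove': [5],
        'add': [
            {
                'neighbors': [0, 1],
                'states': {'status': 0.5}
            }
        ],
        'edge_change': {
            3: {
                'add': [2]
            }
        }
    }

model.add_network_update(network_update)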

Using the previously explained concepts of updates, conditions, and schemes, models with dynamics on and of a network can already be implemented. An example simulation timeline is shown in figure 2. In the figure, two schemes are active during different segments of the timeline. The first scheme is active from iteration 450 to iteration 900 and contains two updates. These updates are thus also only active when their scheme is active. Generally, updates are applied to all the nodes sampled by their scheme, which is by default all of the nodes in a network. If an update should only be applied to specific nodes, conditions can be used, such as condition 1 in the figure. Condition 1 filters the first 6 nodes in this case, which allows update 1 to affect only those nodes. These conditions could be of different types, such as only applying an update based on chance, or on a certain threshold of a state. Update 2 and update 3 are applied to all of the nodes in the network, albeit at different segments of the simulation, as dictated by their schemes.

2.3. Dynamic Network Formation

There are many models that heavily rely on social networks. Social networks can be important determinants of individual traits, such as socioeconomic performance. For example, as shown by Topa (2001); Cooley (2007); Acemoglu, Dahleh, Lobel, and Ozdaglar (2011), the number of social ties may affect employment prospects, job performance, risky behavior and health outcomes. By being able to infer this kind of data from a social network, it becomes possible to create models that use this data. Unfortunately, complete real-life social network information is not always readily available. Instead of using actual social networks, there has been a lot of research into how social networks can be formed and generated.


Figure 2: An example simulation timeline showing the three concepts of schemes, updates, and conditions. The lower part of the figure indicates nodes in a network that are affected by the updates. In this instance, the first scheme is active from iteration 450 to iteration 900. This means that updates 1 and 2 are also only active during these iterations. Update 1 has a condition attached, which limits the nodes affected by this update to the ones in the circle. Update 2 affects all nodes in the network. The third update, because it is connected to the second scheme, affects all nodes, but is only active from iteration 2000 to 5000.

One approach to analyze existing networks, as shown by Bala and Goyal (2000); De Marti and Zenou (2009); Currarini, Jackson, and Pin (2010); Jackson (2010), is to interpret an observed network as the equilibrium of a game. As shown in the literature mentioned, there are different approaches to network formation: strategic or random. Strategic models approach network formation using game theory (Myerson 2013), where individuals choose who to form connections with by weighing the costs of doing so against the potential benefits, or utility. When using a random network formation approach, a connection between individuals occurs depending on probability, which makes the network formation a stochastic process. Based on the previous research, a model of strategic network formation with heterogeneous players has been created by Mele (2013). His research shows that the empirical model of network formation converges to a unique stationary equilibrium, which should be a good indication for a generated network that reasonably matches a real-life network with the same properties. Even though Mele (2013) used his network formation model to study segregation within social networks, it can be used for many different models that rely on the same principles. This section will explain how DyNSimF enables modelers to form new social networks using the principles described by Mele (2013) whilst still being able to use all of the previous functionalities of the package. There are several assumptions that come with this method, such as that there is no inherent spatial component and that all nodes in the network interact in a common social context. However, additions such as a spatial component can be modeled using less straightforward means, which will become clear later. The basic idea that DyNSimF leverages is that every node in the network is seen as an agent that aims to maximize its own utility. An agent's net utility is determined by the sum of the utility from its connections. Each agent can gain a certain utility by forming new connections, but the action of forming a new connection comes with a certain cost. There is one global cost threshold defined, which may not be exceeded by any agent after summing their total costs. In social networks this cost represents a maximum amount of effort or time that an individual can spend making and maintaining friendships. This means that agents in the model will form new connections to maximize their utility, whilst staying below a cost threshold based on the cost of each connection they currently maintain.

To utilize the network formation functionalities of DyNSimF, several things should be configured. A utility and cost function should be defined, as well as a selection for how agents will determine which other agents are eligible for a connection. There are two options available for the definition of the utility/cost functions: they can either return a value for node pairs, where each pair of nodes will be provided as arguments to the function, or they can return a complete matrix. In the case of a matrix M, it should be an n × n matrix where the network has n agents. Each entry Mij should then represent either the utility that agent i will gain by forming a connection with agent j, or the cost of maintaining or forming that connection. After defining the utility function and the cost function, the next step is to select the restrictions for how agents can form new connections with other agents. There are three options available: the first option allows every agent in the model to form a new connection with any other node, the second option only allows new connections when two agents have a common connection, and the third option is to define a custom method that determines eligibility.

Note that during the network formation process, all the functionalities such as updates, conditions, and schemes are still available. This means that the states, or characteristics, of nodes inside the network can still be updated while the network is dynamically reconfiguring itself to maximize the utility of every node. To add dynamic network formation to a model, another class called UtilityCostModel should be used instead of the previously shown Model class. This can be done as follows:

from dynsimf.models.UtilityCostModel import UtilityCostModel
from dynsimf.models.Model import ModelConfiguration
from dynsimf.models.components.Memory import MemoryConfiguration
from dynsimf.models.components.Memory import MemoryConfigurationType

cfg = {
    'adjacency_memory_config': \
        MemoryConfiguration(MemoryConfigurationType.ADJACENCY, {
            'memory_size': 0
        }),
    'utility_memory_config': \
        MemoryConfiguration(MemoryConfigurationType.UTILITY, {
            'memory_size': 0
        })
}

cost_threshold = 1
model = UtilityCostModel(g, cost_threshold, ModelConfiguration(cfg))

As can be seen from the code above, to initialize a model that will use dynamic network formation, a cost threshold should be defined. If the adjacency and utility values should be stored during the simulation, for analysis or visualization, it is necessary to configure them to be stored for every iteration, because this is disabled by default. The next step is to define a utility function and a cost function and add them to the model. As said before, such a function can either return a matrix or one value by taking in a node pair as argument. To tell DyNSimF which kind of function is being used, the FunctionType enumeration class is used. As shown below, after importing FunctionType and defining a utility and cost function that return a (in this case random) matrix, they are added to the model using the add_utility_function() and add_cost_function() functions. These functions take their respective custom defined functions as the first argument, while the second argument should indicate what kind of function it is. There are two options for this: FunctionType.MATRIX and FunctionType.PAIRWISE. In case of a pairwise function, the utility or cost function should have two parameters, one for agent i and the other for agent j.

import numpy as np

from dynsimf.models.UtilityCostModel import FunctionType
from dynsimf.models.UtilityCostModel import SampleMethod

def utility_calculation():
    # n_nodes: the number of nodes in the network
    return np.random.random((n_nodes, n_nodes))

def cost_calculation():
    return np.random.random((n_nodes, n_nodes))

model.add_utility_function(utility_calculation, FunctionType.MATRIX)
model.add_cost_function(cost_calculation, FunctionType.MATRIX)
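The pairwise alternative is sketched below: the function receives the two agents as arguments and returns a single value for that pair. The random utility is again only illustrative.

def pairwise_utility(agent_i, agent_j):
    # One utility value for the connection between agent_i and agent_j
    return np.random.random()

model.add_utility_function(pairwise_utility, FunctionType.PAIRWISE)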

After adding the functions to the model, the final step is to select the manner in which agents should sample eligible connections. This can be done using the set_sampling_function(), which has one argument: an option of the SampleMethod enumeration class (a short sketch is given at the end of this subsection). This class has three options: SampleMethod.ALL, SampleMethod.NEIGHBORS_OF_NEIGHBORS, and SampleMethod.CUSTOM. In case of the custom sample function, a second argument should be provided with a custom function that returns an n × n binary matrix g, with entry gij = 1 if agent j is an eligible connection for agent i. When the sample method has been selected, the model can be simulated as usual using the simulate function. All the other functionalities, such as adding and initializing states and updating them using schemes and conditions, still work alongside the dynamic reconfiguration of the network. First all the updates are calculated for each node and state, and afterwards the network reconfigures itself in a way that optimizes the utility of the nodes. Section 3.2 shows an example of a model describing school segregation dynamics that relies on these concepts of utility. This model is further analyzed in section 6.4, where the impact of, for instance, the cost function on the simulation outcomes is analyzed.
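As announced above, selecting the sampling method is a single call; the sketch below uses the built-in neighbors-of-neighbors option, which is an illustrative choice.

# Agents may only form connections with neighbors of their neighbors
model.set_sampling_function(SampleMethod.NEIGHBORS_OF_NEIGHBORS)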

2.4. Helper functions

Many models require access to different parts of the network, such as the neighbors of a node, or the values of a node's states during past iterations. To help modelers, several functions have been implemented that allow a user to easily access this data. The list below shows the most prominent functions that users will generally use when creating a new model.

• get_state(state_name) Gets all the values of a state. This function is generally used to update the state for all nodes by retrieving the current values and modifying them.

• get_previous_nodes_states(n) Gets all the nodes' states from the n'th previous saved iteration. If T is the current iteration, then the states from iteration T − n will be returned. Using this function, it becomes possible, for example, to use differential equations or to get the average state values of the nodes over the past few iterations.

• get_adjacency() Gets the adjacency matrix. This is a square n × n matrix A such that its element Aij is 1 when there is an edge from vertex ui to vertex uj and 0 when there is no edge (Biggs, Biggs, and Norman 1993, definition 2.1, p. 7).

• get_neighbors(node) Gets all neighbors of a node. This function returns a list of the NetworkX node names that are adjacent to the node provided as the argument to the function.

• get_neighbors_neighbors_adjacency_matrix() Provides an adjacency matrix for neighbors of neighbors. This is also a square n × n matrix A such that its element Aij is 1 when there exists a vertex uk that has an edge from ui to uk and there is an edge from vertex uk to vertex uj, and 0 when there is no such vertex.

• get_neighbors_neighbors(node) Can be used to retrieve only the neighbors of neighbors of one node. It returns the row of the neighbors-of-neighbors adjacency matrix that matches the node given as the argument, but as a list of the NetworkX names of the neighbors of neighbors.

• get_utility() Returns a square n × n matrix A such that its element Aij shows the currently calculated utility for the edge from vertex ui to uj. In subsection 2.3 the concept of utility is explained.

• get_previous_nodes_utility(n) Returns the matrix from the function get_utility(), but from a previous iteration that is provided as argument. If T is the current iteration, then the utility from iteration T − n will be returned. Note that the memory has to be configured to store at least n previous iterations.
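As a small sketch of how these helpers can be combined inside an update, the function below computes for every node the average 'status' value of its neighbors. The state name is illustrative, and it is assumed that the adjacency matrix is returned as a Numpy array.

import numpy as np

def neighbor_average(constants):
    adjacency = model.get_adjacency()                # n x n adjacency matrix
    values = np.array(model.get_state('status'))     # current 'status' value per node
    degrees = np.maximum(adjacency.sum(axis=1), 1)   # guard against isolated nodes
    return {'status': adjacency.dot(values) / degrees}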


2.5. Extra functionalities

The previous subsections have shown how a model can be created and simulated. After the simulation, there are many different steps that can be taken. This section will, among other things, explain how DyNSimF can be used to analyze and visualize the results so that meaningful conclusions can be drawn. It will also show some other tools that can help with gathering desired results, such as visualization, metric collection, or sensitivity analysis.

Property functions

Property functions are user-defined metric functions, or measures, that allow the user to keep track of properties of the model during the simulation(s). This way, different algorithms can be implemented to, for example, keep track of network coefficients, network scale metrics, or any other measures. A property function consists of four things: a name, a function to execute, the arguments of the function, and an iteration interval. The name is used to keep track of which values correspond to which property, and the iteration interval is used to define when the function should be executed during the simulation. A property function object, which can be added to the model later, can be created by making an instance of its class as follows: PropertyFunction('property name', custom_function, iteration_interval, {'param_1': argument}). The returned object can then be added to the model using the add_property_function(property_function_object) function. During the simulation, when the simulation iteration modulo the iteration interval equals zero, the property function is executed and the results are stored in a dictionary. This dictionary contains a key matching the name of every property function that was added to the model, while the values are arrays containing the results of every execution of the function.
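A sketch of this workflow is shown below, tracking the average clustering coefficient of the graph every 10 iterations. The import path of PropertyFunction is assumed to follow the same pattern as the other components, and the clustering metric is purely illustrative.

import networkx as nx

# Assumed import path, analogous to the other components
from dynsimf.models.components.PropertyFunction import PropertyFunction

def average_clustering(graph):
    # Illustrative metric: the average clustering coefficient of the current graph
    return nx.average_clustering(graph)

pf = PropertyFunction('average clustering', average_clustering, 10,
                      {'graph': model.graph})
model.add_property_function(pf)
# After simulation, the collected values are stored under the name 'average clustering'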

Visualization

Another functionality of DyNSimF is the visualization of the network and the states of nodes during a simulation. This allows users to easily set up and analyze the behavior of a network in a visual manner. To keep the visualization as flexible as possible, there are a lot of configuration options available. To visualize the network of a model during the course of the simulation, a dictionary containing visualization options should be provided as the first argument of the configure_visualization function of a Model object. The second argument should be the output from the simulation function. An example of one frame of a visualization can be seen in figure 3. A few of the most important settings for the visualization are: the plot interval, which describes the interval of iterations that should be shown; the plot variable, which decides the state that should be used for the color of the nodes; and the plot output, which allows the user to save the animation of the simulation to disk. When the visualization has been configured, it can be shown using the visualize('animation') function on the Model object. This will launch a Matplotlib (Hunter 2007) window showing the animation. As shown in figure 3, the window displays the network, where the nodes are colored according to the selected state; by default, bar plots of all the states in the model are also shown beneath the network. Using the histogram_states key in the configuration, specific states can be selected if not all states should be shown.
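A small configuration sketch is given below, using the option names that also appear in the SIR example of section 3.1; the state name is illustrative, and output is assumed to hold the result of model.simulate().

visualization_config = {
    'plot_interval': 5,                      # show every 5th iteration
    'plot_variable': 'status',               # node colors follow this state
    'variable_limits': {'status': [0, 1]},
    'show_plot': True,
    'plot_title': 'Example simulation',
}

model.configure_visualization(visualization_config, output)
model.visualize('animation')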


Figure 3: One frame produced by the visualization tool. It shows the title, the current iteration, the network, and bar plots of the model states. To the right of the network, a colorbar showing the color scale of the nodes is displayed. The color of the nodes matches one selected state of the model, in this case state 'A', which stands for addiction.

Sensitivity analysis

Sensitivity analysis is often a part of the complete modeling process and is seen as good practice. Being able to study the uncertainty of a model's output based on its input has multiple benefits. Especially as models grow larger, the relationships between input and output can get convoluted. To gain more confidence in the output of the model, it is often necessary to find the strength and relevance of the inputs in determining the variation in the output (Saltelli, Ratto, Andres, Campolongo, Cariboni, Gatelli, Saisana, and Tarantola 2008).

There are a few options available for sensitivity analysis in Python. One of the more popular ones is the SALib package (Herman and Usher 2017). To help modelers, some basic functionalities of the SALib package have been incorporated into DyNSimF. This way users can run basic sensitivity analysis on their models without having to implement it from scratch. The specific method that is implemented using SALib is the Sobol sensitivity analysis (Saltelli 2002; Saltelli, Annoni, Azzini, Campolongo, Ratto, and Tarantola 2010; Sobol 2001).

There are several built-in methods that allow the sensitivity analysis of different properties of the model. Next to the standard input required by SALib, some more settings are required to configure what output the simulation should yield. In general, sensitivity analysis will be performed on the values of the states of the model, the network, or both. To deal with multiple nodes, multiple states, and a model being run for multiple iterations, there are four different output options for each iteration: the mean, variance, min, and max of the model states. Next to that, custom algorithms can also be used to shape the output of the model states into a format fit for analysis. Another option is to analyze the network as the output of the model instead. Algorithms from the NetworkX package can be used to gather the desired results that can be analyzed.

To perform sensitivity analysis using DyNSimF, first a configuration should be created using the SAConfiguration class. It takes the relevant parameters that provide the correct values for generating the samples that will be used as input for the simulations. The algorithm that decides the output should then be set, as well as the total number of iterations and samples. When the SAConfiguration object has been created, a SensitivityAnalysis object can be created using its class. This object takes the configuration object as its first parameter and the created model object as its second parameter. To run the Sobol analysis, the analyze_sensitivity() function can be called on the object that was just created. This will return the results of the analysis in the same format as SALib.

3. Examples

This section will show how DyNSimF can be used to implement several models from the literature. It should provide some inspiration on how the package can be utilized to implement different kinds of models with different requirements. The first example is an implementation of the SIR model (Kermack and McKendrick 1927), which shows how a basic model can be implemented. The second example is about segregation within schools and is heavily inspired by the model created by Mele (2013). It shows how utility and cost can be utilized by relying on edge values and manual network updates to perform network formation. This model is validated and analyzed thoroughly in section 6.4. Two more examples can be found in section 6.3. These two models have been reimplemented and extended as a means of answering the research questions posed in section 1.1. The GitHub repository of DyNSimF can be visited to explore the source code of any of the models described in this thesis.

3.1. SIR

The original SIR model was introduced by Kermack and McKendrick (1927); it shows how different nodes in a network can change their status from susceptible (S) to infected (I), and finally to removed (R). In this basic model, a fixed population is considered, with only the three compartments S, I, and R. The S compartment is used for individuals who have not been infected with the disease and are thus susceptible to it. The I compartment denotes the part of the population that is currently infected with the disease and capable of spreading it to other individuals. Finally, the R compartment is used for individuals who have recovered and been removed from the disease. In the simple case this happens only through recovery, but more advanced versions may also include death, which removes a person from the network. Individuals who have recovered are not able to transmit the disease or to be infected again. In general, an individual will start in the S compartment, move to the I compartment when infected, and finally end up in the R compartment.

There are different implementations of this model; the original one used ordinary differential equations. However, another version was introduced in Milli, Rossetti, Pedreschi, and Giannotti (2018), where the transitions between compartments are a stochastic process and the disease diffuses through the network. As this version shows more intuitively how the disease propagates through a network, it is the one implemented here. To test the implementation, a version using ordinary differential equations will be implemented as well. The version of Milli et al. (2018) assumes that if, during a generic iteration, a susceptible node comes into contact with a node from the infected compartment, it becomes infected with probability β. Every node in the I compartment has a probability γ to transition to the R compartment. The implementation of this model using DyNSimF is fairly straightforward. The first step is to initialize the network, model, constants, and initial states. It is helpful to follow the process from section 2.2.1 whenever a new model is being created. We first create a network with a thousand nodes (N = 1000) and initialize the model. (Note that the model could also have been designed as a one-node network, where the processes take place internally; in that case the node could, for example, represent a country in an international epidemic model.) Then we set the constants β = 0.4 and γ = 0.04, and we initialize the network with three infected nodes.

import networkx as nx
import numpy as np

from dynsimf.models.Model import Model

# Network definition
n = 1000
g = nx.random_geometric_graph(n, 0.05)
model = Model(g)

constants = {
    'n': n,
    'beta': 0.4,
    'gamma': 0.04,
    'init_infected': 3
}

To model the compartments, each node will have a single state that represents its compartment. The susceptible nodes will have a state of 0, infected nodes' state will be 1, and the state of the recovered nodes is 2. We can initialize the state of the nodes by creating an array, where each index will match a node and each entry value will match the compartment the node will start in. First an array is created where every entry is 0, which would mean that the state of all nodes is susceptible. To set the state of some nodes to infected, Numpy is used to sample three random nodes from the network and set their entries to 1, which will make them infected.

def initial_infected(constants):
    state = np.zeros(constants['n'])
    sampled_nodes = np.random.choice(np.arange(constants['n']),
                                     constants['init_infected'],
                                     replace=False)
    state[sampled_nodes] = 1
    return state

initial_state = {
    'state': initial_infected
}

model.constants = constants
model.set_states(['state'])
model.set_initial_state(initial_state, {'constants': model.constants})

After finishing the configuration of the model and initializing the state of each node, we can write the actual update function that will be executed each iteration. The goal of this function is to iterate over the susceptible neighbors of every infected node, draw a random sample, and change the state of the neighbor to infected if the random value is lower than β. After this, for every infected node, a random sample is drawn, and for each sample that is less than γ, the corresponding infected node has its state updated to 2, so that it becomes recovered. The code below shows how the indices of the infected nodes are first found using Numpy; afterwards, the infected nodes are iterated over and a helper function from section 2.4 is used to iterate over the neighbors of each infected node. As can be seen, the state is updated to infected by changing the entry of the neighbors to 1 if the sample is less than β. Finally, for every infected node, a random sample is drawn. Then, leveraging Numpy, a boolean array is created by comparing each sample to γ. By multiplying the boolean array by 2, all the entries where sample < γ become 2, while the other entries become 0. Because these are all infected nodes, the entries with 0 are changed to 1, so that the non-recovered infected nodes remain infected.

def update_state(constants):
    state = model.get_state('state')
    infected_indices = np.where(state == 1)[0]

    # Let infected nodes infect their susceptible neighbors
    for infected in infected_indices:
        nbs = model.get_neighbors(infected)
        for nb in nbs:
            if state[nb] == 0 and \
                    np.random.random_sample() < constants['beta']:
                state[nb] = 1

    # Update infected to recovered
    recovery_chances = np.random.random_sample(len(infected_indices))
    new_states = (recovery_chances < constants['gamma']) * 2
    new_states[new_states == 0] = 1
    state[infected_indices] = new_states

    return {'state': state}


Finally, the update function should be added to the model, completing the process. At this point, the model can be simulated, visualized, or analyzed. A code example of a simulation of 100 iterations and a corresponding visualization is shown below:

model.add_update(update_state, {'constants': model.constants})
output = model.simulate(100)

visualization_config = {
    'initial_positions': nx.get_node_attributes(g, 'pos'),
    'plot_interval': 2,
    'plot_variable': 'state',
    'color_scale': 'brg',
    'variable_limits': {
        'state': [0, 2],
    },
    'show_plot': True,
    'plot_title': 'SIR probabilistic model',
}

model.configure_visualization(visualization_config, output)
model.visualize('animation')

By running all of the code from this example, the model can be simulated and visualized. If the Numpy and Python random seeds are set to 0, snapshots of the visualization at iterations 0 and 14 can be seen in figure 4. The figure shows how the three initially infected nodes influence the whole network and how the disease propagates. Another plot that is often shown is the number of nodes per compartment for every iteration. Figure 5 shows how the number of susceptible nodes gradually decreases, while the number of recovered nodes starts increasing. The number of infected nodes increases at first, then gradually starts to decrease as more and more infected nodes recover while fewer susceptible nodes remain. This figure also shows the output of the other implementation of the SIR model, which uses ordinary differential equations (ODEs).
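A plot like figure 5 can be produced from the simulation output with Matplotlib. The exact structure of the object returned by model.simulate depends on the DyNSimF version; the sketch below assumes, purely for illustration, that output['states'] maps each stored iteration to its state array:

import matplotlib.pyplot as plt

# Hypothetical accessor: assumes output['states'][it] holds the state array at iteration it
counts = {0: [], 1: [], 2: []}  # 0 = S, 1 = I, 2 = R
for it in sorted(output['states'].keys()):
    state = output['states'][it]
    for c in counts:
        counts[c].append(np.sum(state == c))

plt.plot(counts[0], label='Susceptible')
plt.plot(counts[1], label='Infected')
plt.plot(counts[2], label='Recovered')
plt.xlabel('Iteration')
plt.ylabel('Number of nodes')
plt.legend()
plt.show()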

The remainder of this subsection demonstrates how the SIR model can be implemented with these ordinary differential equations in DyNSimF. The model using the ODEs was first described in Kermack and McKendrick (1927) and uses the same parameters β and γ, albeit with slightly different meanings. β describes the contact rate of the disease: an infected node makes contact with β other nodes per unit of time, and of the contacted nodes the fraction that is susceptible to contracting the disease is S/N. γ is the mean recovery rate, which means that 1/γ is the mean period of time during which an infected node spreads the disease. The differential equations describing the model are shown in equation 1; they describe the rate of change of each compartment.

\[
\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad
\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad
\frac{dR}{dt} = \gamma I \tag{1}
\]

The ordinary differential equations can be solved numerically or analytically on their own to obtain the number of individuals in each compartment, but they cannot be used directly to model the diffusion of a disease through a network, as the equations are not based on an actual network. However, because differential equations are used in many kinds of models, an example is shown of how the equations from equation 1 can be implemented as three states of a single node. This means that instead of modeling the dynamics of a disease spreading through a network, the internal dynamics of a node are modeled. This is done by creating a network with one node and three states: S, I, and R. The values of the internal states of this single node then give the number of individuals in each compartment. In a larger model, such differential equations could represent the internal processes of different nodes, where the nodes could, for example, represent countries in a meta-population model. The first steps remain the same: handle imports, create a network, create the constants, and initialize the states, as shown below:

import networkx as nx
import numpy as np

g = nx.random_geometric_graph(1, 1)
model = Model(g)

init_infected = 3
initial_state = {
    'S': 1000 - init_infected,
    'I': init_infected,
    'R': 0,
}

constants = {
    'N': 1000,
    'beta': 0.4,
    'gamma': 0.04,
    'dt': 0.01,
}

model.set_states(['S', 'I', 'R'])
model.set_initial_state(initial_state)

In this case, the single node in the network has three internal states, each of which is a single continuous variable. The other difference with the previous network implementation is the extra constant 'dt', which describes the amount of change per iteration, or step size. Here, every iteration corresponds to a time step of 0.01. The next step is, as before, to create an update function that updates the network or the states; in this case, the internal state of the single node is updated each iteration. This is done by numerically integrating the ODEs using the Euler method (Euler 1845). First, the ODEs from equation 1 are written as a separate function for convenience:

def deriv(S, I, constants):
    N = constants['N']
    beta = constants['beta']
    gamma = constants['gamma']

    dSdt = -beta * S * I / N
    dIdt = beta * S * I / N - gamma * I
    dRdt = gamma * I

    return dSdt, dIdt, dRdt
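For reference, a single Euler step with step size dt advances each state by adding dt times its derivative; this is exactly the scheme the update function below implements:

\[
S_{t+dt} = S_t + dt \cdot \frac{dS}{dt}, \qquad
I_{t+dt} = I_t + dt \cdot \frac{dI}{dt}, \qquad
R_{t+dt} = R_t + dt \cdot \frac{dR}{dt}
\]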

This function is used in the actual update function, where the states of the node are updated every iteration by integrating the ODEs. The update function retrieves all the states, which are in this case each a single continuous value, as there is only one node. The derivatives are then calculated, multiplied by the step size, and added to the states of the current iteration. This function can then be added to the model in the same manner as before. By simulating the model for 10000 iterations, 100 time units are covered, due to the step size dt = 0.01. After the simulation, each iteration holds a node with three internal states, each representing the number of individuals in the respective compartment. When the internal states are plotted, they show the result in figure 5: the same general behavior is obtained with the ODEs as with the network diffusion version. The differences are due to the stochastic processes in the network version, namely the random selection of the initially infected individuals, the random creation and structure of the network, and the probabilistic transitions between compartments. The ODE version of the SIR model assumes that the underlying network is fully connected. The rest of the code to run this simulation is shown below:

def update(constants):
    dt = constants['dt']
    S = model.get_state('S')
    I = model.get_state('I')
    R = model.get_state('R')

    dSdt, dIdt, dRdt = deriv(S, I, constants)

    return {
        'S': S + (dt * dSdt),
        'I': I + (dt * dIdt),
        'R': R + (dt * dRdt)
    }

model.add_update(update, {'constants': constants})
its = model.simulate(10000)
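As an independent sanity check outside DyNSimF, the same system can also be integrated with SciPy's solve_ivp (assuming SciPy is available); a minimal sketch with the same parameter values:

from scipy.integrate import solve_ivp
import numpy as np

def sir_rhs(t, y, beta, gamma, N):
    # Right-hand side of equation 1
    S, I, R = y
    return [-beta * S * I / N, beta * S * I / N - gamma * I, gamma * I]

sol = solve_ivp(sir_rhs, t_span=(0, 100), y0=[997, 3, 0],
                args=(0.4, 0.04, 1000), dense_output=True)
t = np.linspace(0, 100, 101)
S, I, R = sol.sol(t)  # trajectories to compare against the Euler results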

3.2. School Segregation

The example discussed in this subsection is a variant of the model discussed in Mele (2013), chosen to demonstrate how to implement a model that relies on manual network updates using edge values, i.e. the weights of the links. This agent-based model uses nodes to represent agents in a network formation game, while the network topology functions as the environment of the model. By simulating a network formation model, partially based on the network formation functionality described in section 2.3, network structures can be generated that resemble real-life scenarios. More information about different network generation algorithms can be found in section 6.1, and more literature on the model can be found in section 6.2.

Figure 4: A snapshot of the visualization of the probabilistic SIR model. N = 1000, β = 0.4 and γ = 0.04. The random seeds are set to 0. It shows the propagation of the disease from the middle of the network towards the edges. The left figure shows the network at iteration 0 with 3 infected nodes. The right figure, at iteration 14, shows how many nodes have become infected while some have recovered.

By generating network structures that resemble real-life scenarios, the effects of policy interventions could be studied, for example. The model that will be implemented has heterogeneous autonomous decision-making agents, where each agent has different characteristics, resembling real social networks. Each agent sequentially revises its social connections inside the network to maximize its own utility based on its own and other agents' attributes. The network is directed, which means that if agent i forms a connection with agent j, agent j does not necessarily have a connection with agent i. Strategic and random network formation are combined by selecting a random agent i who meets another random agent j based on a similarity probability distribution, as shown in equation 2, where the meeting m between agents i and j at time t is denoted m_t = ij and p_min > 0 ensures that all meetings are possible. The probability for meetings is unbiased. In a meeting m_t = ij, agent i changes his connection with agent j by either creating or removing the link, thus updating g_ij. The choice of forming or removing a link is made so as to increase the agent's utility, based on the characteristics of the agents. Each agent has three individual attributes: sex, race, and grade. The data used as input were taken from the Add Health Wave I survey (Harris and Udry 2015)³. For simplicity, every agent has full information on the agent selected for the meeting. The order of agents per iteration is determined randomly, and every agent has one chance per iteration to form or remove a connection; this differs from the implementation of Mele (2013) for the sake of faster computation.



Figure 5: The number of nodes per compartment per iteration. N = 1000, β = 0.4 and γ = 0.04. The random seeds are set to 0. It shows a comparison of the stochastic SIR network model and the implementation using ODEs. Even though the stochasticity has affected the model, both models show the same behavior. The lines with lower opacity show the behavior of the ODE model.

\[
\Pr(m_t = ij \mid X) = \frac{u(X_i, X_j) + p_{\min}}{\sum_{i,j}\left(u(X_i, X_j) + p_{\min}\right)} \tag{2}
\]

As mentioned before, agents try to maximize their utility every iteration by forming or removing links with other agents. The utility an agent receives from forming a connection with another agent is defined in equation 3. It shows the utility of an agent i for another agent j in a network g with population attributes X = (X_1, ..., X_n). The utility stems from direct friendship, reciprocated friendship, and being indirectly linked to friends of a friend. Forming and maintaining a link carries a cost, which prevents agents from simply forming connections with every other agent in the network to obtain maximum utility. The total net utility U of agent i is the sum of the net utilities in equation 3.

\[
U_i(g, X) =
\underbrace{\sum_{j=1}^{n} g_{ij} u_{ij}}_{\text{direct links}}
+ \gamma \underbrace{\sum_{j=1}^{n} g_{ij} g_{ji} u_{ij}}_{\text{mutual links}}
+ \delta \underbrace{\sum_{j=1}^{n} g_{ij} \sum_{k=1,\, k \neq i,j}^{n} g_{jk} u_{ik}}_{\text{indirect links}}
- \underbrace{d_i^2 c}_{\text{cost}} \tag{3}
\]
\[
\epsilon \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2) \tag{4}
\]


Here u_ij = u(X_i, X_j) = exp(−b|X_i − X_j|) can be seen as a similarity score; u_ij equals 1 if two agents have completely identical attributes. The cost d_i²c consists of the outdegree d_i of the agent and a constant cost c ∈ (0, 1). δ ∈ (0, 1) is the weight, or importance, of indirect links, and γ > 0 is the weight of mutual connections between agents. Before agent i updates his link with agent j, agent i receives an idiosyncratic shock ε to its preferences, which models unobservable events that influence utility, as shown in equation 4. ε is assumed to be a Type I extreme value shock, i.i.d. among links and across time. This shock causes agents to sometimes create suboptimal connections, adding stochasticity to the model, and it may also prevent the model from getting stuck in a local minimum. A link is formed from agent i to agent j if and only if:

\[
U_i(g_{ij}^t = 1, g_{-ij}^{t-1}, X) + \epsilon_1^t \;>\; U_i(g_{ij}^t = 0, g_{-ij}^{t-1}, X) + \epsilon_0^t \tag{5}
\]

and agent i will delete the link, or refrain from creating it, if:

\[
U_i(g_{ij}^t = 1, g_{-ij}^{t-1}, X) + \epsilon_1^t \;<\; U_i(g_{ij}^t = 0, g_{-ij}^{t-1}, X) + \epsilon_0^t \tag{6}
\]

The total utility of the network g given the population X is the sum of each agent’s utility:

\[
Q(g, X) = \sum_{i=1}^{n} U_i(g, X) \tag{7}
\]

To implement this model using DyNSimF, the model should first be configured to store adjacency and edge values using the MemoryConfiguration class. Afterwards, the network can be created. In this example, the network is randomly generated using Numpy with an average degree of 5: agents are connected to other agents with a uniform probability p_0 = d_0/n, where d_0 = 5. The following constants are set: δ = 0.05, γ = 0.65, c = 0.35, B1 = 0.1, B2 = 0.1, B3 = 0.2, min_prop = 1000, σ = 0.035. The B parameters determine how strongly the different traits impact utility. The constant 'X' used in the model was taken from Harris and Udry (2015) to obtain the traits of students from different schools. The first step is to create a matrix containing initial utility values based on the connections in the network, as well as an initial probability matrix containing the probabilities that agents meet, based on the similarity of their attributes.
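Before constructing those matrices, the random directed network described above has to exist. The exact generator used for the thesis runs is not shown in this excerpt; a minimal sketch under that description (uniform connection probability p_0 = d_0/n on a plain NumPy adjacency matrix) could look as follows:

# Sketch only: uniform random directed network with average out-degree d0
d0 = 5
n = constants['n']
p0 = d0 / n

adj = (np.random.random_sample((n, n)) < p0).astype(int)
np.fill_diagonal(adj, 0)  # no self-links

g = nx.from_numpy_array(adj, create_using=nx.DiGraph)

With the network in place, the initial utility and meeting-probability matrices are constructed as shown below.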

import math

def initial_utility():
    utility = np.zeros((constants['n'], constants['n']))
    race = list(constants['X']['race'])
    sex = list(constants['X']['sex'])
    grade = list(constants['X']['grade'])

    for i in range(constants['n']):
        for j in range(constants['n']):
            weighted_diffs = \
                [constants['B1'] * abs(sex[i] - sex[j]),
                 constants['B2'] * (0 if grade[i] == grade[j] else 1),
                 constants['B3'] * (0 if race[i] == race[j] else 1)]
            utility[i, j] = math.exp(-sum(weighted_diffs))

    return utility

def initial_prop():
    prop = np.zeros((constants['n'], constants['n']))
    utility = initial_utility()

    # Loop over the person and their peers
    for i in range(constants['n']):
        for j in range(constants['n']):
            if i == j:
                prop[i, j] = 0
            else:
                prop[i, j] = utility[i, j] + constants['min_prop']
        # Normalize
        prop[i, :] = prop[i, :] / np.sum(prop[i, :])

    return prop

constants['probability'] = initial_prop()
constants['utility'] = initial_utility()

The functions shown above are executed and their results saved as constants that are later used by the model. Next, a function is written to calculate the utility of a single node directly from equation 3. It is called whenever the utility of a node needs to be evaluated and considers the direct links, mutual links, and indirect links to determine the net utility. Its input is an individual node and the adjacency matrix of the network at the current time step.

def node_utility(node, adj):
    utility = constants['utility']

    # Degree, connection gain and cost calculations
    d_i = adj[node].sum()
    direct_u = np.sum(adj[node] * utility[node])
    mutual_u = np.sum(adj[node] * adj.T[node] * utility[node])

    # Indirect connection gain
    a = (adj.T.dot(adj[node, :]) * utility)[node]
    a[node] = 0
    indirect_u = np.sum(a)

    return direct_u + constants['gamma'] * mutual_u + \
        constants['delta'] * indirect_u - d_i ** constants['c']
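For reference, the total network utility from equation 7 is then simply the sum of this per-agent utility over all agents; a small helper (the name is chosen here purely for illustration) could be:

# Total utility Q(g, X) of the network (equation 7)
def network_utility(adj):
    return sum(node_utility(i, adj) for i in range(constants['n']))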

Finally, the last function, which manually updates the network based on the meetings, can be written. It randomly shuffles the order of the agents and then starts the meeting process for each agent, based on the similarity between agents as shown in equation 2. For each agent, the utility with and without the connection is compared. If the utility with the connection is higher, then, as explained in section 2.2.6, the connection is added; otherwise it is removed. When the constants have been added to the model, as well as the network update function using model.add_network_update(network_update, get_nodes=True), the model can be simulated.
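The thesis code for this network update function is not reproduced in this excerpt. The sketch below illustrates the meeting and link-revision logic just described on a plain NumPy adjacency matrix, using node_utility and the probability matrix from the constants. It is framework-agnostic: the function name is chosen here for illustration, the constant σ is assumed to be stored under the key 'sigma', the shock is drawn from N(0, σ²) following equation 4, and the DyNSimF-specific wrapping expected by add_network_update is deliberately omitted.

def revise_links_once(adj):
    # One round of meetings: every agent, in random order, revises one link
    # (a framework-agnostic sketch of equations 2, 5 and 6).
    order = np.random.permutation(constants['n'])
    for i in order:
        # Agent i meets agent j with the similarity-based probability (equation 2)
        j = np.random.choice(constants['n'], p=constants['probability'][i])

        # Utility of agent i with and without the link g_ij (equations 5 and 6)
        eps1, eps0 = np.random.normal(0, constants['sigma'], size=2)

        adj[i, j] = 1
        u_with = node_utility(i, adj) + eps1
        adj[i, j] = 0
        u_without = node_utility(i, adj) + eps0

        # Keep the link only if it increases agent i's utility
        adj[i, j] = 1 if u_with > u_without else 0
    return adj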
