
The design of dynamic incentive mechanisms in agent-based simulation models from a sustainability viewpoint

Layout: typeset by the author using LaTeX.

Enno R. Kuyt (11869399)
Bachelor thesis, Credits: 18 EC
Bachelor Kunstmatige Intelligentie
University of Amsterdam, Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisors: MSc. X. Zhou, dr. A.S.Z. Belloum
Informatics Institute, Faculty of Science, University of Amsterdam
Science Park 904, 1098 XH Amsterdam

June 26th, 2020


1 Introduction

The design of incentive mechanisms that aim to guide individuals towards desired behaviour is an important topic for any social system. Even institutional constructs such as markets, which let participants decide about their own transactions, typically require some higher authority to enforce consequences when expectations are not met. This third party needs resources to design and carry out its policies, resources which are eventually collected from the stakeholders in the system. These kinds of interactions between stakeholders and a third-party authority are often analysed using an evolutionary game theory framework. A common game is the Prisoner's dilemma [1], because it captures the behaviour of participants well when they have to make decisions under uncertainty. Table 1 shows an example. In the game, two rational agents can each choose to either cooperate or defect (work against the other); hence, they are called cooperators and defectors, respectively. The agents receive pay-offs based both on their own strategy and on the strategy of their peers. To create incentives to cooperate, one could introduce rewards for cooperating and punishments for defecting. Previous research has provided insight into the efficiency of several incentive mechanisms under different circumstances.

Table 1: Basic example of the Prisoner's dilemma

    Agent 1 \ Agent 2      C       D
    C                      1      −T
    D                      T       0

In 'Carrot or stick?' [2], it is shown that "punishment is a cheaper and more reliable way of inducing cooperation than is reward". Nakamura (2019) [3] examined the effect of risk attitude, showing that risk-averse individuals require a significantly lower frequency of punishment to achieve stable cooperation than risk-neutral and risk-prone individuals. Additionally, experiments with punishment variants related to the way defectors get punished showed that both punishing one defector and punishing all defectors effectively promote cooperation, in contrast to punishing only defectors that exploit cooperators.


The papers mentioned above provide relevant contributions on the effectiveness of different policies in various situations. However, these papers pay little attention to the sustainability of a given policy. Yet a third-party authority has to gather enough resources over time to maintain its policy continuously, so the sustainability of a policy should be regarded as an important consideration. Another point that goes unmentioned in the literature is that the policies are all static, meaning that they do not change over time. However, in a social system, the population distribution is usually subject to change due to the policy that the third-party authority is enforcing. Subsequently, the third-party authority's wealth is subject to change due to the changing population distribution. Therefore, it could be profitable for the sustainability of the third-party authority to enforce a policy that is not constant through time.

Hence, this research seeks to innovate by introducing dynamic policies: incentive policies that change over time, based on the distribution of the population. The aim is to explore the potential improvements that enforcing dynamic policies yields for the sustainability of a third-party authority's resources in agent-based simulation models. To ensure a holistic approach, the sustainability of the wealth of the groups of cooperators and defectors as a whole, as well as individual agent wealth, will also be taken into account.

This provides the outline for setting evaluation criteria, so that the policies can be compared fairly. The most important evaluation criterion will be the sustainability of the wealth of the third-party authority. The sustainability of the wealth of the other parties involved (cooperators, defectors and average agent wealth) will be a secondary evaluation criterion. A final evaluation criterion will be the speed of the population evolution. The exact implementations of these evaluation criteria are presented in the Method section.

The expectation is that dynamic changes over time will have a significant influence on the sustainability of a policy, because this research intends to exploit the dependence of a policy's sustainability on the population distribution of a model. Furthermore, it is expected that under the dynamic policies, the cooperators as a whole will acquire more wealth, while the group of defectors will end up with less wealth. This expectation follows from the idea that as the third-party authority acquires more wealth due to the dynamic policy, the group of cooperators will most likely benefit from this, whilst the group of defectors will probably have to compensate.


2 Method

The analysis framework utilised for this research is that of replicator dynamics from the field of evolutionary game theory. This approach was chosen because it has proved to be very robust in the past, as shown by Smith et al. (1973) [4]. In replicator dynamics, large groups of rational agents repeatedly play Prisoner's dilemma type games. The games are played in rounds, where each agent plays the game exactly once per round. At the end of each round, the agents have the possibility to update their strategy, based on the pay-offs received by themselves and their peers. The distribution of the population can then naturally alter over time. This is a suitable framework for comparing varying incentive policies, because it captures the development of a population through generations based on the pay-off received by individual agents.
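The round structure described above can be sketched as follows. This is a minimal illustration with hypothetical names, not the thesis code: every agent is randomly paired exactly once per round and collects the pay-off of its game.

```python
import random

def play_round(strategies, payoff):
    """One round: randomly pair all agents exactly once and collect
    pay-offs. `strategies` is a list of 'C'/'D'; `payoff(s1, s2)` returns
    the pay-off for the agent playing s1 against an opponent playing s2."""
    order = list(range(len(strategies)))
    random.shuffle(order)                      # random pairing
    gains = [0.0] * len(strategies)
    for a, b in zip(order[::2], order[1::2]):  # consecutive pairs
        gains[a] = payoff(strategies[a], strategies[b])
        gains[b] = payoff(strategies[b], strategies[a])
    return gains

# Plain Prisoner's dilemma pay-offs from Table 1, with T = 2
# (the value fixed later in this section).
T = 2
pd_payoff = {('C', 'C'): 1, ('C', 'D'): -T, ('D', 'C'): T, ('D', 'D'): 0}
gains = play_round(['C', 'D', 'C', 'D'], lambda s1, s2: pd_payoff[(s1, s2)])
```

An even population size is assumed so that every agent can be paired.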

The games played by the agents in this research were Prisoner's dilemma games with the structure shown previously in Table 1. However, there was a probability z that a third-party authority intervened, granting cooperators rewards and punishing defectors. These rewards and punishments were variable and depended not only on an agent's own choice, but also on the choice of the other agent. This meant there was a specific parameter for each situation. The reward for agents that both cooperated was denoted Cc, the reward for cooperating while the other agent defected Cd, the punishment for defecting while the other agent cooperated Dc, and the punishment for both defecting Dd. These four parameters will be referred to as the incentive values from now on. The resulting Prisoner's dilemma structure is shown in Table 2.

Table 2: Prisoner's dilemma structure for this research

    Agent 1 \ Agent 2      C              D
    C                      1 + Cc·z      −T + Cd·z
    D                      T − Dc·z       0 − Dd·z
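In a simulation, the entries of Table 2 can be realised by sampling the intervention with probability z per game, so that the table holds in expectation. A minimal sketch, with illustrative names and the Medium-policy incentive values used later in this section:

```python
# Fixed parameters and (illustratively) the Medium-policy incentive values.
T, z = 2, 0.35
Cc = Cd = Dc = Dd = 6

def payoff(own, other, intervened):
    """Pay-off of Table 2 for `own` against `other`. The incentive term
    applies only when the third party actually intervened (probability z
    per game); in expectation this reproduces the table entries."""
    base = {('C', 'C'): 1, ('C', 'D'): -T,
            ('D', 'C'): T, ('D', 'D'): 0}[(own, other)]
    incentive = {('C', 'C'): Cc, ('C', 'D'): Cd,    # rewards for cooperating
                 ('D', 'C'): -Dc, ('D', 'D'): -Dd}  # punishments for defecting
    return (base + incentive[(own, other)]) if intervened else base
```

With these values, a cooperator meeting a cooperator expects (1 − z)·1 + z·(1 + Cc) = 1 + Cc·z, matching the table.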

To experiment with different policies, an agent-based simulation model was created. This is an efficient way to create and track possibly thousands of agents over hundreds of steps, while all of them perform operations during those steps. Agent-based simulation models also allow for extensive documentation of the results, since data collection is available on all levels of a model. To explore the sustainability of a third-party authority's wealth under these varying policies, simulations have been run, experimenting with a different policy in every simulation. Aside from the incentive mechanisms defining the policies, the simulations have been kept identical to ensure an objective comparison. A number of criteria were determined for later evaluation.

2.1 Evaluation Criteria

First and foremost, in this research, the sustainability of the wealth of the parties involved was measured in terms of the growth-rate of that wealth; a higher growth-rate corresponded to a more sustainable policy. The growth-rate of the wealth of the third-party authority was the most important evaluation criterion for this research. To provide more insight, and to include all parties involved, the growth-rates of the total wealth of all cooperators, all defectors, and every individual agent were set as secondary evaluation criteria. A final evaluation criterion was the number of steps a model had taken before a policy could induce a stable population of cooperators, starting from a population with only defectors. This criterion was set to determine the efficiency of a policy in terms of speed. This is relevant because, if two policies are equally sustainable, one would most likely prefer the policy that achieves a stable population of cooperators fastest.

One might argue that the population distribution should be an evaluation criterion, as it indicates whether a population demonstrates the desired behaviour. However, in this research, the population distribution was not used as an evaluation criterion, as each policy was designed to at least accomplish a stable population of cooperators. This choice was made because if a policy does not achieve a stable state of cooperators, the agents do not behave as desired, and the sustainability of the wealth of any of the parties becomes less meaningful.

2.2 Design

One agent-based simulation model outline was constructed, utilising the Python Mesa [5] package in Jupyter Notebook. Then, for each policy to be tested, a model could be initialised and run with the desired parameters. These parameters were the population size N, the starting population profile x̄_s, the parameter T, the incentive values Cc, Cd, Dc and Dd, the third-party intervention probability z, the tax value, and a string denoting whether a static or dynamic policy should be enforced. The meanings and values of all parameters can be found in Appendix B. To limit complexity, and to make a fair comparison between the incentive mechanisms, some of the initialisation parameters were fixed throughout all experiments, after testing to find sensible values.
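The initialisation parameters listed above can be gathered in one configuration object. This is a sketch only: the field names, and the tax default (whose value the text does not specify), are assumptions, not the signature of the thesis's actual Mesa model.

```python
from dataclasses import dataclass

@dataclass
class PolicyConfig:
    N: int = 1000          # population size
    x_start: float = 0.0   # starting population profile (share of cooperators)
    T: float = 2.0         # temptation pay-off of Table 2
    Cc: float = 6.0        # reward when both cooperate
    Cd: float = 6.0        # reward for cooperating against a defector
    Dc: float = 6.0        # punishment for defecting against a cooperator
    Dd: float = 6.0        # punishment when both defect
    z: float = 0.35        # third-party intervention probability
    tax: float = 0.1       # per-agent tax per step (value assumed)
    mode: str = "static"   # "static" or "dynamic"

medium = PolicyConfig()                      # Medium-policy defaults
low = PolicyConfig(Cc=4, Cd=4, Dc=4, Dd=4)   # Low policy
```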


Fixed Parameters

The first of these fixed parameters is the population size, which was set to 1000 after testing showed that this value exhibited the best trade-off between reliability and simulation speed. The next fixed parameter is the population distribution at the start of the simulation, or starting population profile. This parameter was fixed at 0, meaning that there would be only defectors in the population at the start. This was necessary to measure the speed with which the population as a whole changed strategy, as only then would the policies be able to shift the distribution towards a population of cooperators over time. It also provided some insight into the other evaluation criteria during the transition from a population of defectors to one of cooperators.

Next is the parameter T, as shown in Table 2. The sustainability of all parties was dependent on this parameter. However, it does not define the policy of the third-party authority, and could therefore distort the results, so it was chosen to fix its value. This parameter was fixed at 2: it should naturally be higher than what two cooperators gain when meeting, but setting it too high would be unrealistic. Using a value of 2 gave sensible initial results.

Finally, there is the probability z of third-party intervention. To let the simulations imitate a social system more realistically, there was a cost for the third-party authority to intervene. Intervening more often meant more cost for the third-party authority, and thus less wealth. To compensate for this loss, agents were taxed; a higher intervention probability meant that the tax had to become higher as well. Thus, the probability z has significant influence on the sustainability of the wealth of the third-party authority. One could experiment with policies that dynamically change this probability z. However, this research focused on the incentive values of the policy, and not fixing this parameter would increase the number of results to such an extent that drawing any convincing conclusion would become impossible. Therefore, it was chosen to fix this parameter. If z were fixed too low, the incentive values would have to be set unrealistically high to achieve a stable population of cooperators. If it were set too high, the costs would become too high and the policies would not be sustainable for any of the parties involved. Initial tests showed that a probability of z = 0.35 gave credible results.


Model Steps

When a model was initialised, all agents were assigned a strategy based on the starting population profile, and the wealth of all parties involved was set to 0. The model utilised a step function to acquire step-by-step information on the process.

Each step started by letting all agents play the game. It was ensured that every agent would play the game exactly once per step. The agents were randomly paired and received pay-offs according to their strategy and whether or not there was a third-party intervention. After this, the agents were taxed and all relevant wealth parameters (such as total wealth, wealth per step, wealth of individual agents, etc.) were updated.

Next, the agents would update their strategy. Again, agents were randomly paired. This time, if they did not have the same strategy in the game played at that step, the cooperator would change its strategy with probability (1 − p), and the defector with probability p, with p as shown in Equation 1, where Cw and Dw are the average accumulated wealth for that generation per cooperating and defecting agent, respectively.

p = 1 / (1 + e^(−(Cw − Dw)))    (1)

This equation is a slight alteration of the equation used for calculating an imitation probability in research by Wardil and da Silva (2013), and proved to be a reliable way of calculating the probability of an agent changing strategy, based on pay-offs that can be both positive and negative. Pseudocode of the step function of the model can be seen in Figure 1.

Figure 1: Step Function of Model

1: while some agent has not yet played the game do
2:     pair two random agents and let them play the game
3:     collect taxes and update wealth accordingly
4: end while
5: while some agent has not yet updated their strategy do
6:     pair two random agents
7:     if the agents have the same strategy then
8:         both agents keep their strategy
9:     else
10:        the cooperator switches with probability (1 − p), the defector with probability p
11:    end if
12: end while


After the strategies had been updated, all relevant parameters were updated as well, and the model was ready for the next step. The models in the experiments were all run for 100 steps, as this proved to be sufficient for each policy to reach a stable population of cooperators.
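The imitation rule of Equation 1 can be sketched as a small function. Note that this is a sketch: the extracted equation lost its signs, and the form below assumes the logistic reading in which wealthier cooperators make a paired defector more likely to switch.

```python
import math

def switch_probability(Cw, Dw):
    """Equation 1 (reconstructed sign convention): probability p that, in
    a mixed pair, the defector adopts cooperation; the paired cooperator
    switches with probability 1 - p. Cw and Dw are the average accumulated
    wealth per cooperating and defecting agent for that generation."""
    return 1.0 / (1.0 + math.exp(-(Cw - Dw)))
```

When cooperators and defectors are equally wealthy, p = 0.5, so neither strategy is favoured.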

A 'stable population of cooperators' needs some elaboration. A mixed strategy s* is a so-called evolutionarily stable state (ESS) if the average pay-off of any agent deviating from this strategy is less than the average pay-off of an agent with strategy s*, given that the proportion of deviating agents is sufficiently small. It means that if such a mixed strategy was applied by the agents in the model, the population profile would never move away from it, despite sufficiently few agents occasionally deviating from it. Because the desired behaviour for the agents was to cooperate, it was sought that the mixed strategy s* = (1, 0) (always cooperate) was an ESS, so that the population of agents would continuously cooperate. Then, a stable population of cooperators was achieved.

To be sure that the policies implemented would always end up in an evolutionarily stable state, a mathematical analysis was performed. The full analysis can be found in Appendix A; the most important derivation is Equation 2, which shows the conditions under which the mixed strategy s* = (1, 0) is an ESS.

s* = (1, 0) is an ESS if:

    T ≤ z(Cc + Dc),    if Cd + Dd − Cc − Dc > 0
    T < z(Cd + Dd),    otherwise                        (2)

The policies were designed to satisfy these conditions at all times. Fortunately, satisfying these conditions immediately meant that the conditions for the mixed strategy s* = (0, 1) to be an ESS were not met. Thus, the policies ensured that if the population started with only defectors, that state would not be stable, and the population would deviate from it.
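As a quick numeric check of these conditions (using the reconstruction of Equation 2 given above, so treat it as a sketch), the three static incentive levels used later in this chapter can be tested against the fixed parameters T = 2 and z = 0.35:

```python
T, z = 2, 0.35

def cooperation_is_ess(Cc, Cd, Dc, Dd):
    """Equation 2: conditions under which s* = (1, 0), i.e. always
    cooperate, is an evolutionarily stable state."""
    if Cd + Dd - Cc - Dc > 0:
        return T <= z * (Cc + Dc)
    return T < z * (Cd + Dd)

# Uniform incentive values of the three static policies (Section 2.3):
# e.g. for "Low", T < z*(Cd + Dd) = 0.35 * 8 = 2.8, which holds for T = 2.
stable = {name: cooperation_is_ess(v, v, v, v)
          for name, v in {"Low": 4, "Medium": 6, "High": 9}.items()}
```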

To let sufficiently few agents occasionally deviate from the current mixed strategy, an epsilon function was implemented. This function would let a small proportion of the population randomly change strategy at each model step that ended with a population at a mixed strategy of either s* = (0, 1) or s* = (1, 0). At each step where this occurred, every agent had a probability of 1/1000 of randomly changing strategy.
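The epsilon function can be sketched as follows (illustrative names; the homogeneity check and the flip mechanics are assumptions consistent with the description above):

```python
import random

EPSILON = 1 / 1000  # per-agent deviation probability at a homogeneous step

def apply_epsilon(strategies, rng=random):
    """When a step ends with a homogeneous population (all 'C' or all
    'D'), let every agent flip its strategy with probability EPSILON;
    mixed populations are left untouched."""
    if len(set(strategies)) != 1:
        return strategies
    flip = {'C': 'D', 'D': 'C'}
    return [flip[s] if rng.random() < EPSILON else s for s in strategies]
```

With N = 1000 and epsilon = 1/1000, on average one agent deviates per homogeneous step, which is enough to test the stability of the state without disturbing it.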


2.3 Static Policies

Experiments were run with three static policies with increasing incentive values, to compare with dynamic policies that employed similar incentive values. These policies will be referred to as the Low, Medium and High policy, with corresponding incentive values. For simplicity, each policy had one value that was equal across all incentive parameters: 4 for the Low policy, 6 for the Medium policy, and 9 for the High policy. For example, for the Low policy, Cc, Cd, Dc and Dd were all set to 4. Again, these values can all be found in Appendix B.

2.4 Dynamic Policies

Then, experiments were run with the dynamic policies. Three policies were designed, referred to as the Rewarding policy, the Punishing policy, and the Fully Dynamic policy. All models were initialised with the incentive parameters set to 6, equal to that of the Medium policy. This time, however, these incentive parameters were variable through time, and dependent on the population distribution. The idea was that higher incentive values would be more profitable when the population predominantly consisted of defectors, while lower incentive values would be more profitable when the population consisted mostly of cooperators.

For the Fully Dynamic policy, if the population distribution was lower than 0.8, the incentive values were increased by a factor of 1.5, while otherwise they were decreased by a factor of 1.5. The Rewarding policy only changed the rewards dynamically in this way, keeping the punishments static at a value of 6. The Punishing policy only changed the punishments in this way, keeping the rewards at a static value of 6.
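The update rule described above can be sketched as a pure function of the cooperator share (names illustrative): scaling the base value of 6 up by 1.5 gives 9 (the High value) and scaling it down by 1.5 gives 4 (the Low value).

```python
BASE = 6.0  # Medium-policy incentive value that all dynamic policies start from

def dynamic_incentives(x_coop, policy="fully_dynamic"):
    """Incentive values (Cc, Cd, Dc, Dd) as a function of the population
    distribution: below a cooperator share of 0.8 the dynamic values are
    scaled up by 1.5 (to the High value 9), otherwise scaled down by 1.5
    (to the Low value 4). `policy` selects which half is dynamic."""
    dyn = BASE * 1.5 if x_coop < 0.8 else BASE / 1.5
    Cc = Cd = dyn if policy in ("rewarding", "fully_dynamic") else BASE
    Dc = Dd = dyn if policy in ("punishing", "fully_dynamic") else BASE
    return (Cc, Cd, Dc, Dd)
```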

The amounts by which the incentive values were increased and decreased were chosen so that they would end up equal to the incentive values of either the Low or the High policy, to provide a fair comparison between the static and dynamic policies. The choice for three separate dynamic policies was made to ensure a holistic approach: not only inspecting the effect of dynamically rewarding and punishing simultaneously, but also the effect of solely dynamic rewarding and of solely dynamic punishing.

For all policies, both static and dynamic, the models of 100 steps were run 10 times to ensure trustworthy results and to reduce the influence of random errors. Also, some of the graphs used for visualisation only display the first 40 steps of the model, because this sufficed to capture all relevant information.


3 Results

The following sections present the results per evaluation criterion, starting with the sustainability of the wealth of the third-party authority. After that, the results covering the sustainability of the accumulated wealth of all cooperators, all defectors and all agents combined are presented. Finally, results are shown that capture the number of steps the models had taken before a stable population of cooperators was achieved. In all figures, SL, SM, SH, DR, DP and DF denote the Low, Medium, High, Rewarding, Punishing and Fully Dynamic policy, respectively.

3.1 Third-party wealth analysis

Figure 2 shows the comparison between the wealth of the third-party authorities under the various policies. The total accumulated wealth of the third-party authority was plotted against the model steps for each of the policies. The legend shows the colour for each policy along with the corresponding growth-rate of the wealth of the third-party authority.


It is shown that the Low policy, the Rewarding policy and the Fully Dynamic policy had similar growth-rates when reaching a stable population of cooperators. Note how both the highest growth-rate and the highest total accumulated wealth were achieved by enforcing dynamic policies, namely the Rewarding and the Fully Dynamic policy respectively. The Medium, High and Punishing policy all had negative growth-rates, showing that these policies are not sustainable for the third-party authority.

3.2 Cooperators wealth analysis

Figure 3 shows the comparison between the wealth of all cooperators as a group under the various policies. Again, the total accumulated wealth is plotted against the model steps, and the legend shows each policy along with its growth-rate. This time, however, the figure only shows the first 40 steps, as the growth-rates had become stable after this.


As can be seen, all policies are sustainable for the cooperators, with the High, Medium and Punishing policy being the most lucrative. The policies that were sustainable for the third-party authority showed a lower relative growth-rate, but had very similar growth-rates among themselves.

3.3 Defectors wealth analysis

Next, Figure 4 shows the comparison between the wealth of all defectors as a group, in a similar fashion as was done for the cooperators. Again, the figure only shows the first 40 steps. The accumulated wealth of the group of defectors ended up negative under every policy. Yet the Low, Punishing and Fully Dynamic policy showed a positive growth-rate after the initial negative growth-rate while the population was still evolving. Thus, these policies were sustainable for the defectors as a whole once a stable population of cooperators was reached. Despite having negative growth-rates, the Medium and Rewarding policy turned out to be the least unprofitable after the first 100 steps of each model.


3.4 Average agent wealth analysis

The average wealth of all agents per step under the various policies can be seen in Figure 5. Naturally, the results were similar to those found in Figure 3, because the population consisted mostly of cooperators for the majority of the time. This figure, too, only shows the first 40 steps.

Figure 5: Average Agent Wealth

The three policies that were sustainable for the wealth of the third-party authority all produced a growth-rate of exactly 1.5, meaning that these policies are equally sustainable from an average-agent viewpoint.


3.5 Population evolution analysis

Finally, the evolution of the population under each policy can be seen in Figure 6. Here, the population distribution was plotted against the model steps. The legend shows each policy along with the average number of steps the model took before a stable population of cooperators was achieved.

Figure 6: Population evolution per policy

All policies took approximately the same number of steps to achieve a stable population of cooperators, except for the Low policy, which took significantly longer to reach this point. With an average of 29.8 model steps, it took twice as long under the Low policy as under the Rewarding policy, which only took an average of 14.9 model steps.


4 Discussion

First, the growth-rate of the wealth of the third-party authority under the various policies will be evaluated. Although the growth-rate under the Rewarding policy is the highest, it is not significantly higher than the growth-rate under the Low policy; the minor difference between these two policies in the results could be caused by randomness. Thus, the dynamic policies in this research do not improve on the most sustainable static policy in terms of sustainability.

In the following paragraphs, the growth-rates for all other parties involved will be evaluated. The High, Medium and Punishing policy are the three most sustainable policies for some of these groups. However, because these three policies are not sustainable for the third-party authority, their results for the other groups are not meaningful, and so they are not discussed.

That leaves only the Low, Rewarding and Fully Dynamic policy to evaluate regarding the sustainability of the remaining parties. Surprisingly, the Low policy is at least as sustainable as both the Rewarding and the Fully Dynamic policy, for all parties involved. Thus, the dynamic policies described in this research are not more sustainable for any party than the most sustainable static policy.

The final evaluation criterion to be discussed is the number of steps a model had taken before a policy could induce a stable population of cooperators. Interestingly, every policy takes a similar time to achieve this, except for the Low policy, which is notably slower. Thus, the sustainable dynamic policies introduced here are about twice as fast in reaching a stable population of cooperators as the only sustainable static policy.

Before any final conclusion can be drawn, one must seek an explanation for these results. An important question is why the dynamic policies could not achieve a higher growth-rate of the wealth of any of the parties than the Low policy. The answer can be found in the mathematical analysis of this problem. All policies were designed to at least satisfy the requirements for ending up in a stable state of cooperators. When this evolutionarily stable state is achieved, the growth-rate of the wealth of the parties involved stabilises as well. Thus, because the dynamic policies only apply the exact same parameter values as can be found in the static policies, the dynamic policies can never achieve a higher growth-rate of wealth than the most sustainable static policy. However, a difference in total wealth can be made in the early generations, before the evolutionarily stable state is reached. Thus, although the dynamic policies do not improve on the most efficient static policy from a sustainability viewpoint, they do make improvements regarding the total accumulated wealth per group, and the individual wealth of all agents.

To summarise: the Medium, High and Punishing policy are not sustainable for the third-party authority. The Low policy is at least as sustainable as any other policy for every party involved. The Rewarding and Fully Dynamic policy have a sustainability similar to that of the Low policy for all parties involved. These two dynamic policies also evolve to a stable population of cooperators faster than the Low policy, and produce higher accumulated wealth for most of the parties involved. Thus, the dynamic policies proposed in this research do not improve the sustainability of the wealth of the third-party authority, nor that of any of the other parties involved. However, the dynamic policies are able to outperform the static policies in terms of total accumulated wealth and evolution speed.

The results immediately spark ideas for future research. As said before, one could optimise existing policies by applying the dynamic changes described in this research, or one could experiment with other forms of dynamic policies: for example, a policy where the probability z of third-party intervention changes dynamically based on the population distribution, or a policy that dynamically changes the incentive parameter values based on the accumulated wealth of all parties instead of on the population distribution. To conclude, adding a dynamic factor to an incentive policy in a social system is a potential improvement on many levels, despite not increasing the sustainability.


Bibliography

[1] Anatol Rapoport, Albert M. Chammah, and Carol J. Orwant. Prisoner's dilemma: A study in conflict and cooperation, volume 165. University of Michigan Press, 1965.

[2] Christoph U Keller. Carrot or stick? Transfer, 110:1771–1779, 2009.

[3] Mitsuhiro Nakamura. Rare third-party punishment promotes cooperation in risk-averse social learning dynamics. Frontiers in Physics, 6:156, 2019.

[4] J Maynard Smith and George R Price. The logic of animal conflict. Nature, 246(5427):15–18, 1973.


Appendix A

Mathematical Analysis

To find the conditions for a mixed strategy to be an evolutionarily stable state, some mathematical derivations are required. This starts by finding the fixed points of the model. A fixed point is a population that satisfies Equation A.1.

∀i : ẋ_i = 0    (A.1)

In this research, the average pay-offs per strategy are:

π(C, x) = (1 + Cc·z)x_1 + (−T + Cd·z)x_2    (A.2)

π(D, x) = (T − Dc·z)x_1 + (0 − Dd·z)x_2    (A.3)

This brings the average pay-off over all strategies to:

π̄(x) = π(C, x)x_1 + π(D, x)x_2    (A.4)

This gives the corresponding rates of change for each strategy:

ẋ_1 = x_1(π(C, x) − π̄(x))    (A.5)

ẋ_2 = x_2(π(D, x) − π̄(x))    (A.6)

Since x_2 = 1 − x_1, these can be combined into:

ẋ_1 = −ẋ_2 = x_1(1 − x_1)[π(C, x) − π(D, x)]    (A.7)

Setting Equation A.7 equal to zero gives all fixed points of the model. Trivially, Equation A.7 equals 0 under three conditions: x_1 = 0, x_2 = 0, or π(C, x) − π(D, x) = 0. This third condition produces a third, interior fixed point.

For the fixed point x* = 0, let s* = (0, 1) and s = (p, 1 − p) ≠ s*, with 0 < p ≤ 1. This gives the following equations:

π(s*, s*) = −Dd·z    (A.8)

π(s, s*) = p(−T + Cd·z) + (1 − p)(−Dd·z)    (A.9)

π(s*, s) = p(T − Dc·z) + (1 − p)(−Dd·z)    (A.10)

π(s, s) = p²(1 + Cc·z) + p(1 − p)(−T + Cd·z) + p(1 − p)(T − Dc·z) + (1 − p)²(−Dd·z)    (A.11)

The mixed strategy s* = (0, 1) at fixed point x* = 0 is a Nash equilibrium if π(s*, s*) ≥ π(s, s*) for every s = (p, 1 − p) ≠ s*. Furthermore, if s* = (0, 1) is a Nash equilibrium and π(s*, s) > π(s, s), then s* = (0, 1) is an evolutionarily stable state. This calls for the following two equations:

A.8 − A.9 = −p(−T + Cd·z) + p(−Dd·z) = p(T − z(Cd + Dd))    (A.12)

A.10 − A.11 = p²(T − 1 − z(Cc + Dc)) + p(1 − p)(T − z(Cd + Dd))    (A.13)

So, A.13 has to be larger than 0 for the mixed strategy s* = (0, 1) to be evolutionarily stable at the fixed point x* = 0. Further derivation of Equation A.13 gives:

p²(T − 1 − z(Cc + Dc)) + p(1 − p)(T − z(Cd + Dd)) > 0
p(T − 1 − z(Cc + Dc)) + (1 − p)(T − z(Cd + Dd)) > 0
p·T − p − p·z(Cc + Dc) + T − z(Cd + Dd) − p·T + p·z(Cd + Dd) > 0
T − p > p·z(Cc + Dc − Cd − Dd) + z(Cd + Dd)    (A.14)

Because the value of p is restricted to 0 < p ≤ 1, the following conditions for the evolutionarily stable state can be derived:

s* = (0, 1) is an ESS if:

    T − 1 > z(Cc + Dc),    if Cc + Dc − Cd − Dd ≥ 0
    T − 1 ≥ z(Cd + Dd),    otherwise


For the fixed point x* = 1, let s* = (1, 0) and s = (1 − p, p) ≠ s*, with 0 < p ≤ 1. This gives the following equations:

π(s*, s*) = 1 + Cc·z    (A.16)

π(s, s*) = (1 − p)(1 + Cc·z) + p(T − Dc·z)    (A.17)

π(s*, s) = (1 − p)(1 + Cc·z) + p(−T + Cd·z)    (A.18)

π(s, s) = (1 − p)²(1 + Cc·z) + p(1 − p)(−T + Cd·z) + p(1 − p)(T − Dc·z) + p²(−Dd·z)    (A.19)

The mixed strategy s* = (1, 0) at fixed point x* = 1 is a Nash equilibrium if π(s*, s*) ≥ π(s, s*) for every s = (1 − p, p) ≠ s*. Furthermore, if s* = (1, 0) is a Nash equilibrium and π(s*, s) > π(s, s), then s* = (1, 0) is an evolutionarily stable state. This calls for the following two equations:

A.16 − A.17 = −p(T − Dc·z) + p(1 + Cc·z) = p(1 − T + z(Cc + Dc))    (A.20)

A.18 − A.19 = p²(z(Cd + Dd) − T) + p(1 − p)(1 + z(Cc + Dc) − T)    (A.21)

So, A.21 has to be larger than 0 for the mixed strategy s* = (1, 0) to be evolutionarily stable at the fixed point x* = 1. Further derivation of Equation A.21 gives:

p²(z(Cd + Dd) − T) + p(1 − p)(1 + z(Cc + Dc) − T) > 0
p(z(Cd + Dd) − T) + (1 − p)(1 + z(Cc + Dc) − T) > 0
p·z(Cd + Dd − Cc − Dc) + z(Cc + Dc) + 1 − T − p > 0
T + p − 1 < p·z(Cd + Dd − Cc − Dc) + z(Cc + Dc)    (A.22)

Because the value of p is restricted to 0 < p ≤ 1, the following conditions for the evolutionarily stable state can be derived:

s* = (1, 0) is an ESS if:

    T ≤ z(Cc + Dc),    if Cd + Dd − Cc − Dc > 0
    T < z(Cd + Dd),    otherwise


Appendix B

Tables

Table B.1: Notation table

    Cc, Cd, Dc, Dd                                  →  incentive values
    Policy with low incentive values                →  Low Policy / SL
    Policy with medium incentive values             →  Medium Policy / SM
    Policy with high incentive values               →  High Policy / SH
    Policy with dynamic rewarding incentive values  →  Rewarding Policy / DR
    Policy with dynamic punishing incentive values  →  Punishing Policy / DP
    Policy with fully dynamic incentive values      →  Fully Dynamic Policy / DF
    Initial incentive value                         →  T
    Population size                                 →  N
    TPA intervention probability                    →  z
    Starting population profile                     →  x̄_s

Table B.2: Fixed parameter table

    T = 2
    N = 1000
    z = 0.35
    x̄_s = 0
    Low policy        →  Cc = 4, Cd = 4, Dc = 4, Dd = 4
    Medium policy     →  Cc = 6, Cd = 6, Dc = 6, Dd = 6
    High policy       →  Cc = 9, Cd = 9, Dc = 9, Dd = 9
    Rewarding policy  →  Cc = (4 ∨ 9), Cd = (4 ∨ 9), Dc = 6, Dd = 6
    Punishing policy  →  Cc = 6, Cd = 6, Dc = (4 ∨ 9), Dd = (4 ∨ 9)
