
Master's thesis for the MSc Economics, University of Amsterdam

Student:

Li Ai

Student number:

10225897

Title:

Indirect reciprocity and choice of donation to strangers: A simulation-based study

Supervisor:

Dr. Aljaz Ule


Indirect reciprocity and choice of donation to strangers: A simulation-based study

Abstract:

Existing studies on indirect reciprocity have revealed that reputation plays a role in a person's decision about the degree of kindness shown to strangers. Image scoring, standing, and indirect punishment are criteria proposed in the literature to describe this mechanism. This thesis explores the topic by examining the evolution of the number of players using each strategy, and their respective payoffs, in a helping game with five strategies: defector, altruist, cautious defector, rewarder and punisher. Results from the simulations show that the defector strategy nonetheless performs best across the various scenarios considered.


Table of contents

1. Introduction

2. The model
2.1 The helping game
2.2 The world
2.3 The strategies
2.4 The interaction

3. The simulation
3.1 Simulation of the original experiment
3.1.1 Average payoff per strategy in each round
3.1.2 Average payoff per strategy overall
3.2 Learning behaviour integrated
3.2.1 Learning behaviour adopted from the beginning
3.2.2 Learning adopted after a period of time

4. Discussion
4.1 Results
4.2 Possible reasons for difference in results between experiment and simulation
4.2.1 Limitation of simulation
4.2.2 Pre-assigned strategies versus identified strategies

5. Conclusion

References

Appendix 1. Interface of the model
Appendix 2. Main body of codes with brief explanation


1. Introduction

The concept of reciprocity is deeply embedded in human social life. There are two kinds of reciprocity: direct reciprocity and indirect reciprocity. The former is the behaviour of helping someone who may later help oneself in return, while the latter differs slightly in form. In the case of indirect reciprocity, when one helps, one does not really expect a return from the current recipient, but hopes that similar help might later come from someone else. Indirect reciprocity, among all cooperative behaviours, is certainly interesting to an observer of human behaviour. As a way of making decisions, it has prevailed in human society throughout time (Alexander, 1979). In everyday life, we see people help strangers out of their troubles, on a scale ranging from small favours, such as giving directions, to significant acts, such as saving someone's life. Its impact on humans is considerable: even the moral system could, according to researchers, be seen as a system of indirect reciprocity (Alexander, 1987). Some also state that the emergence of indirect reciprocity is "a decisive step for the evolution of human societies" (Nowak and Sigmund, 1998, 573).

The reasons for the popularity of indirect reciprocity and for its divergence from direct reciprocity are worth exploring. It can seem puzzling to a calculating mind. Unlike in the case of direct reciprocity, where the parties involved help each other in turns, the helper here cannot expect guaranteed repayment from her current recipient in the future. In other words, when one decides to spend one's resources on someone in need, the expense can hardly be recovered from the person receiving it, whereas in the case of direct reciprocity, help is in fact an investment in the individual recipient, who will return the favour in the future. Admittedly, under some circumstances the helper may reasonably entertain the hope of being treated the same way someday by someone else, e.g. when living in a very small and closed community. However, this is rarely to be expected. The horizon of one's life is usually too broad to run into the same stranger twice, and sometimes the helper herself will never need the sort of help she has offered, as in the case of donations from the rich to poverty assistance programmes. It is therefore fair to ask how we should explain such cooperative behaviour.

To explore the underlying mechanism of indirect reciprocity, we need to understand the altruism of human society from the evolutionary perspective. There have been theories seeking to interpret indirect reciprocity from this perspective, developed on the basis of analytical models, computer simulations, and lab experiments. Nowak and Sigmund (1998) provided a theoretical framework to explain indirect reciprocity centred on the concept of image scoring. The theory focuses on reputation assessment among social group members, embodied in an image score given by others to every member of the community. The image score of a member expresses her "value" as a function of past altruistic behaviour. If the member is seen to help a fellow group member, her image score increases; if she does not, her image score decreases accordingly. Understandably, the society will favour those who have a history of cooperating with others, so a person who has not helped in the past will not receive help from new partners. Computer simulations are presented to further specify the conditions under which indirect reciprocity is evolutionarily stable.

This framework, albeit adequately explanatory, is drastically simplified. Leimar and Hammerstein (2001) point out an inherent weakness of the image scoring theory: it may backfire on the donor. It fails to suit the strategic interest of the donor, who is by assumption a rational player and will not ignore what she herself can attain. If she meets someone with a low image score, the image scoring theory dictates that she refuse to help. Her own image score will then also decrease, which in the next round will in turn put her in a less advantageous position to receive help from another member. Hence, to improve the framework they introduced an indicator of reputation, "standing", offered in the literature (Sugden, 1986). The standing strategy evaluates the motivation behind the action the individual has displayed. Analytically, a person endowed with a good standing from the beginning of the game keeps it by helping someone who also has a good standing, loses it by refusing to help such a person, but maintains the good standing when refusing to help someone with a bad standing. This strategy is supported to a certain extent by experimental data (Wedekind & Milinski, 2000), and by the computer simulations offered by Leimar and Hammerstein, which show that it is more robust than the image scoring strategy.

Further modification along this path is made by Ule, Schram, Riedl and Cason (2009), who add another option, "indirect punishment". The options available to the members are therefore no longer confined to "pass" or "help" depending on the recipient's reputation, but include punishing as well. An experiment is designed and conducted in the form of a helping game (Wedekind & Milinski, 2000; Seinen & Schram, 2006), where participants are randomly paired as donor and recipient. The donor has to choose one of three actions: to help, to pass, or to punish. Both the recipient's action history as a donor in previous rounds and the information she had about her opponents at the time are available to the current donor. That is to say, the donor has access to the details of what the recipient has chosen in the past and why. The researchers identified types of strategies among the participants. Depending on whether a strategy is self- or other-regarding, and discriminate or indiscriminate, the most common behavioural types in the game are recognised as defectors, altruists, cautious defectors, rewarders and punishers. The latter two can be further divided into image or standing rewarders/punishers according to how they make use of the second-order information.

It is discovered that if unkind strangers cannot be punished, defectors earn most, but if they can, it is image rewarders that earn most. In the control group, where punishment has no material influence on the recipient, the defector enjoys the highest average payoff, while in the treatment group, where the donor can choose to hurt the recipient, those who always defect earn significantly less than in the control group. The cautious defectors, image rewarders and standing rewarders are more successful than the indiscriminate defectors.

The aim of this paper is to further investigate indirect reciprocity by offering computer simulations based on the experiment of Ule et al. (2009). The two methods, simulation and experiment, naturally complement each other. Simulation allows the game to be carried on in a highly controlled, uninterrupted environment, though often greatly simplified under strict assumptions. In contrast, the lab experiment provides the opportunity to see how the mechanism works with a group of human subjects in a regulated real-life setting. The literature cited above has adopted both methods. This paper will thus contribute to the knowledge of indirect reciprocity from the perspective of reputation building by exploring it with an alternative methodology.

In addition, this paper will further modify the model and introduce a learning strategy on the basis of the experimental framework. Richerson and Boyd (2005) posit that imitating the most successful individuals in one's social circle is important in the real world. Bravo (2008) has further documented through several simulations that learning is the mechanism most likely to lead to the spread of cooperative behaviour. This paper therefore gives the agents the ability to learn and adjust their behaviour. Before each round of interaction begins, the agent will "look around" and, if she finds that she is not the most successful one, copy the strategy of the most successful neighbour. That is, the agent adopts the strategy of the neighbour with the highest payoff, provided that this payoff is higher than her own score. After that, the agent starts the next round of interaction as this newly modified type of player.

Furthermore, the aforementioned model with learning behaviour integrated will be considered in two scenarios. One is to have the agents use this learning rule from the very beginning of the simulation run, i.e. from Round 1. The other is to include it not immediately but at a later point in time, set here after Round 10. That is to say, for the first ten rounds the agents interact in the same manner as before, without learning. From the 11th round on, they start to look around before every round and copy the neighbour that appears to be doing best. This gives the agents some time to build up reputations and thus gives everyone a better view of each strategy's degree of success. The reason for introducing this setting is to move a step closer to reality: as is commonly the case in real life, some strategies may prove successful in the long run even though their advantage is not immediately visible.


The paper is structured as follows. Section 2 introduces the helping game to which the agents are subjected when interacting with each other, and describes the simulation model in detail, including the features of the world and the programmed strategies. Section 3 presents the simulations and their results, which are compared with those of the aforementioned lab experiment (Ule et al., 2009) in Section 4, together with a discussion of the implications and limitations of this study. Section 5 concludes.

2. The model

In this section, the structure and components of the model are introduced. Section 2.1 describes the helping game, Section 2.2 briefly describes the virtual world where interactions take place, Section 2.3 presents the strategies the agents adopt, and Section 2.4 explains the details of the interaction between agents.

2.1 The helping game

The game used here is the indirect helping game, in which players are randomly and anonymously paired and assigned the roles of donor and recipient. The donor has three options: to help, to pass, or to punish. The recipient only has to accept the donor's decision. The payoff structure is proportional to that used in the treatment of the aforementioned experiment of Ule et al. (2009) (hereafter, the experiment) and is described in detail as follows. If the donor chooses to help, the recipient's payoff increases by 5, while the donor's decreases by 4. If the donor decides to pass, no change in payoff occurs for either side. If the donor exercises "punish", the recipient's payoff decreases by 5, and the payoff of the donor herself also decreases by 1 as the cost of imposing the punishment on the recipient.
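For concreteness, these payoff changes can be summarised in a short NetLogo sketch (an illustration only, assuming a hypothetical donor procedure apply-action; the thesis code in Appendix 2 implements these changes separately for each strategy inside get-score). Table 1 below presents the same numbers.

;;a minimal sketch (not the thesis code): how a donor's action changes the pair's payoffs,
;;assuming a hypothetical turtle procedure APPLY-ACTION run by the donor
to apply-action [ action ]
  if action = "help"   [ set score score - 4  ask partner [ set score score + 5 ] ]
  ;;"pass" leaves both payoffs unchanged, so no command is needed for it
  if action = "punish" [ set score score - 1  ask partner [ set score score - 5 ] ]
end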

Table 1 Change in the payoffs of the pair by Donor's choice of action

                                    Help    Pass    Punish
Change in Donor's payoff             -4       0       -1
Change in Recipient's payoff         +5       0       -5

2.2 The world

The simulation model is constructed on the platform NetLogo 5.0.5 (Wilensky, 1999) (see the Appendices for more details of the code and the interface). The world is a square lattice with a side of 25 cells, containing 625 cells in total. The n agents, which are called "turtles" in NetLogo, interact for t rounds that make up the whole simulation run. At the very beginning of a simulation run, the agents are created and randomly distributed on the regular lattice surface. The performance of the agents is assessed by the score they carry, of which further details are explained in the following sections.

2.3 The strategies

As mentioned previously, seven strategies were identified among the lab experiment subjects. They differ in whether they take others' interests into account, whether they are indiscriminate towards everyone regardless of history, and, furthermore, whether they make use of second-order information. Nevertheless, in this study we omit the last criterion and do not distinguish between image and standing rewarders and punishers. Firstly, the world described above would be crowded with seven types of players. Also, merging the two subcategories "image" and "standing" does not substantially change the role played by indirect punishment. The types of players in this model are therefore as follows: 1) the defector, who never helps; 2) the altruist, who on the contrary always helps; 3) the cautious defector, who helps only when it is necessary in order to keep her own reputation at a good level; 4) the rewarder, who cooperates when the recipient has a good reputation; and 5) the punisher, who cooperates on the same condition as the rewarder but punishes when the recipient has a bad reputation. Note that all strategies are strategies of the donor. The type of the recipient in the pair is not significant to the present interaction, because an agent in the position of recipient only has to wait for and accept the donor's decision, which then incurs a change in her score.

Some details about how the five strategies are operationalised in the code need to be mentioned. For the first two indiscriminate strategies, the defector and the altruist, there is little complexity in the coding, owing to their unconditional response to all recipients and their disregard of the recipient's history. Neither are the other-regarding discriminate players, the rewarder and the punisher, difficult to describe: the donor only needs to integrate the first-order information on the recipient's history into the decision-making process. As to the cautious defector, however, such players by definition only cooperate to maintain the appearance of a good reputation in order to attract help from discriminate donors in future rounds. The help they offer from time to time is the result of a purely self-regarding calculation. To describe such behaviour in the program, it is important to define the threshold at which the player feels at risk of gaining a bad reputation by not having helped, and thus feels the necessity to help. For convenience of coding, it is assumed here that having helped on 50% of the occasions is enough for the cautious defector to keep a good reputation. In the model, the cautious defector is therefore set to cooperate every other round, starting with cooperation.
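A more general threshold rule could be sketched as follows. This is a hypothetical variant, not the code used in this study, and it assumes two additional turtle variables, own-help-count and own-donor-count, added to the turtles-own block; with a 50% threshold it reduces to the every-other-round rule described above, starting with cooperation.

;;a hypothetical variant of the cautious defector (not the thesis code), assuming two extra
;;turtle variables OWN-HELP-COUNT and OWN-DONOR-COUNT are added to the turtles-own block
to act-caudefct-threshold
  set own-donor-count own-donor-count + 1
  ifelse own-help-count < 0.5 * own-donor-count     ;;helped in fewer than half of own donor rounds?
    [ set defect-now? false                         ;;then help now to keep the reputation up
      set own-help-count own-help-count + 1 ]
    [ set defect-now? true ]                        ;;otherwise defect
end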

2.4 The interaction

At the beginning of each simulation run, the agents of each strategy are created in the numbers set on the sliders on the interface. They are also assigned an initial payoff of 5.

The steps of interaction are as follows. The agents randomly turn and walk around in order to partner up with another agent; some agents may not find a partner in a given round. The system then randomly picks one of the pair to be the donor, the other automatically becoming the recipient. The donor acts according to her own strategy and the recipient's reputation, and the scores of both sides change accordingly. After that, the partnered agents release each other, which concludes this round, and another round starts by repeating the procedure above. For every agent, the role of donor or recipient is cleared after each round, while the score can be carried into the next round and accumulated, depending on the setting. Specifically, in the scenarios with learning, before all of the above takes place the agents look around and copy the strategy of whichever of their eight neighbours bears the highest score.

The total score earned by each strategy is recorded by summing up the individual scores of all agents of that type who get to participate (i.e. are successfully paired) in the given round. The total number of interactions of every strategy in every round is also counted, so that the average score of each strategy can be obtained both per round and overall, as shown in the plots. In this way, we are able to see how well the strategies do over time.
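As an illustration of this bookkeeping, the per-round average for one strategy could be reported from the globals defector-score and num-defector-games roughly as follows (a sketch only; in the model itself the corresponding division is performed in the plot settings, see Appendix 1).

;;a minimal sketch (assumed, not part of the appendix code) of the per-round average for one strategy
to-report defector-average-this-round
  ifelse num-defector-games > 0
    [ report defector-score / num-defector-games ]
    [ report 0 ]
end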

3. The simulation

In this section two scenarios are investigated. The first follows the experiment (Ule et al., 2009): agents interact with each other according to their own strategies, and we observe the dynamics of the population and the plotted average payoff of each strategy group. The second scenario adds the skill of learning to the agents on the basis of the first, which means that the agents look at all their neighbours before the interaction of every period and then switch to the strategy of the neighbour bearing the highest score. We assign 100 agents to each of the five strategies: defector, altruist, cautious defector, rewarder and punisher.


3.1 Simulation of the original experiment

3.1.1 Average payoff per strategy in each round

Firstly, we simulate the situation as described in the experiment. Figure 1 below plots the average payoff of each strategy in each round.

The average payoff of a strategy in one round = (total payoff earned by all agents of this strategy in this round) / (number of players using this strategy in this round)

This means that the participants do not keep their payoffs accumulated from round to round, and that the number of interactions of each strategy is reset to zero before a new round begins. Figure 1 depicts the average payoff per strategy over 100 rounds.

According to the results shown below, the defector turns out to be the best-off strategy among the five, while the other four show no obvious advantage over one another. The average payoff earned every single round by defectors rises steadily above the other four as early as after ten rounds. Meanwhile, the earnings of altruists, cautious defectors, rewarders and punishers share a similar trend, the four plotted lines intertwining with each other and the scores varying only slightly without a significant rise. Within 30 rounds, the average payoff of the defectors reaches about 90. Compared to the scores of the other four, which rise modestly to approximately 30, the defector strategy displays a clear advantage, which is also illustrated by its growth rate. This pattern continues: the average payoff of the defector strategy keeps rising, to around 150 by the end of the 50th round and to about 320 by the 100th round. Meanwhile, the average payoffs of altruists, cautious defectors, rewarders and punishers also increase over this time, to about 30 after 50 rounds and 40 after 100, but still fail to prosper as much as the defectors.

Figure 1 Average payoff per strategy in each round simulating the experiment (100 rounds)

3.1.2 Average payoff per strategy overall

The average payoff of the five strategies overall is shown in Figure 2. Here, not only are scores accumulated, but the total number of interactions of every strategy is also added up round after round in the same manner. In other words, both the payoff and the number of interactions of previous rounds are added to those of the present round for each strategy type, which offers a view of the progress of the five strategies in the long run.


The average payoff of a strategy overall = (sum of payoff earned by all agents of this strategy so far) / (number of players using this strategy so far)
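One way to obtain this overall average is to keep cumulative sums alongside the per-round totals. The sketch below illustrates the formula above with two hypothetical globals for the defector strategy (they do not appear in the appendix code) that are updated once per round and never reset.

;;a minimal sketch with hypothetical cumulative globals for the defector strategy;
;;analogous sums would be kept for the other four strategies
globals [ defector-score-sum defector-games-sum ]   ;;to be merged into the existing globals block

to accumulate-defector            ;;called once per round, after do-scoring
  set defector-score-sum defector-score-sum + defector-score
  set defector-games-sum defector-games-sum + num-defector-games
end

to-report defector-average-overall
  ifelse defector-games-sum > 0
    [ report defector-score-sum / defector-games-sum ]
    [ report 0 ]
end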

From Figure 2 below, we can see that the overall average payoff follows more or less the same trend regardless of strategy, though on different scales. Consistent with the result in Figure 1, the defector strategy always earns more than the other four strategies, and the other four strategies almost share the same path over time.

This characteristic is already visible by the 10th iteration. After a drop at the very beginning, around Round 2, the overall average payoffs of all five strategies decrease mildly. After 30 rounds, the five lines of overall average payoff level off, and this tendency persists through 50 and further to 100 rounds.

Figure 2 Average payoff per strategy overall simulating the experiment (100 rounds)


3.2 Learning behaviour integrated

In this section, the agents do not simply stick to the strategy assigned to them at the very beginning of the game. Rather, their strategies evolve by copying the most successful neighbour before a new interaction begins. Accordingly, a model adjusted from the previous version to include this learning behaviour is examined in two scenarios. Briefly, in the first one the agents use this learning rule from the first round, while in the second they adopt it from Round 11 onwards. The plotting is still based on the two perspectives of the previous section, namely average payoff per strategy both in each round and overall.
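In terms of the code, the difference between the two scenarios only concerns when the imitation step is executed. A minimal sketch is given below, assuming the tick counter is used as the round counter; in the thesis the distinction is instead made by activating or de-activating the evolve-strategy call for each scenario (see Appendix 2).

;;inside the GO procedure: imitation from Round 1 (first scenario)
ask turtles [ evolve-strategy ]

;;inside the GO procedure: imitation only from Round 11 onwards (second scenario)
if ticks >= 10 [ ask turtles [ evolve-strategy ] ]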

3.2.1 Learning behaviour adopted from the beginning

In this sub-section, the agents start to look around for better strategies from the very first round. The result is not much at variance with the previous section. As can be seen in Figures 3(a), (b), (c) and (d), the defector strategy is still the best. The four other types of agents, namely altruists, cautious defectors, rewarders and punishers, go extinct quickly when learning by imitation is introduced, which is also why only 20 rounds are included here. Moreover, altruists, rewarders and punishers all cease to exist around Round 2, while cautious defectors survive for two more rounds. The defectors' average payoff per round generally remains the highest among the five, while their average payoff over all ten rounds declines overall.

Figure 3(a) Average payoff of strategies per round with learning adopted immediately (10 rounds)

Figure 3(b) Average payoff of strategies per round with learning adopted immediately (20 rounds)


Figure 3(c) Average payoff of strategies overall with learning adopted immediately (10 rounds)

Figure 3(d) Average payoff of strategies overall with learning adopted immediately (20 rounds)


3.2.2 Learning adopted after a period of time

Figures 4(a) and (b) show the change in average score by strategy over twenty rounds, per round and overall respectively. The first ten rounds are no different from Section 3.1: the agents interact with each other under the experimental conditions (Ule et al., 2009) without changing strategies during that time, and we observe a similar trend in the first half of Figures 4(a) and (b). From the 11th round on, the difference is obvious. The average scores of the four strategies other than the defector see a fluctuating increase in each round, although in (b) it can be seen that the overall average score of the defectors is still leading and even experiences a short increase at the beginning of the second ten rounds. The rise in the average score per round of altruists, cautious defectors, rewarders and punishers may simply result from the decrease in the number of agents of the corresponding types, which means that more agents are taking on the defector strategy. This is confirmed in both figures: the agents of the aforementioned four strategies go extinct around the 15th round; in (a) the lines end and in (b) the average scores become zero. Therefore, the defectors are the best off.


Figure 4(a) Average payoff of strategies per round with learning introduced after 10 rounds (20 rounds)

Figure 4(b) Average payoff of strategies overall with learning introduced after 10 rounds (20 rounds)


4. Discussion

4.1 Results

To sum up, the defector strategy turns out to be the best of the five according to the series of simulations in this study. This result differs from that of the experiment (Ule et al., 2009). In the treatment of the original experiment, the winning strategies are the cautious defectors and the rewarders (both image and standing ones), while in the control condition, where punishment has no material consequence for the recipient, the indiscriminate defector proves to be the winning strategy. The difference reveals the impact of the option to punish: when being punished is a real possibility, indiscriminate defection turns out to be a less desirable way to behave than it is in the classic game, or than we see in the control group. However, the first part of the simulation, formulated in the style of the experiment (Section 3.1), shows that the defector strategy is uniformly advantageous in terms of average payoff, whether measured per round or overall, while the other strategies yield much lower payoffs with no significant differences among them.

After the integration of the learning strategy in the second part of the simulation (Section 3.2), this tendency remains the same: the advantage of the defector strategy stays obvious. As seen in Section 3.2, agents of strategies other than the defector die out when every agent learns to follow the strategy of the most successful neighbour before every new round. Whether the learning behaviour is present from the start of the interactions or introduced after a certain number of rounds does not have a significant impact on the outcome.


That the other four kinds of agents die out quickly is not surprising given the results of the first part of the simulation (the simulation of the experiment). If defectors already hold the advantage, the learning rule only accelerates the process by which defectors take over the whole population, since agents copy their most successful neighbour whenever this neighbour outperforms them. The success of defectors therefore becomes visible much faster.

4.2 Possible reasons for difference in results between experiment and simulation

4.2.1 Limitation of simulation

The difference in outcomes between the simulations above and the experiment (Ule et al., 2009) can partly be explained by the difference in the environments of the two studies. The computer-modelled world is highly organised and well controlled. Even with the same payoff structure and similar principles dictating actions, a group of people and a set of virtual agents can exhibit different behaviour over time. Some factors that influence decision making, such as misreading information or personal disposition, are difficult to express in a simulation.

Specifically, in this simulation study many factors are greatly simplified for the convenience of coding. First of all, the cautious defector is supposed to be a self-regarding strategy that only offers help occasionally for the sake of maintaining a good reputation (i.e. making herself "look good") and not out of altruism. In this model, a cautious defector is set to alternate over the rounds in which she is selected as donor: she helps if she did not help the previous time she was a donor, and defects otherwise. Undoubtedly, it is much more complicated for a person to decide whether she is in danger of losing her good reputation, so this simplified way of coding could have an impact on the results.

Also, in the sections where the learning strategy is introduced, we could not run many rounds because, as can be seen in the graphs, agents of strategies other than the defector die out quickly. This small number of rounds may make the results different from what would be obtained if dozens or even hundreds of rounds could be run.

Thirdly, regarding how agents find partners in this model, there could be a more accurate way. Here, the agents randomly move forward a certain distance and check whether there is an unpartnered agent in the neighbourhood. If so, the two partner up; if not, the agent stays inactive in this round. While this is more like real life, where not everyone makes such decisions at the same time, it is unlike the situation in the lab, where all participants actively make decisions in every round. This could also be a reason for the difference in results between the simulation and the experiment.
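One such more accurate alternative, sketched below under the assumption of a hypothetical procedure pair-everyone that is not used in this study, would be to shuffle all agents and pair them exhaustively each round, so that no agent is left unpartnered whenever the population size is even.

;;a hypothetical alternative matching scheme, not used in the thesis
to pair-everyone
  let pool shuffle sort turtles        ;;all turtles in random order
  while [ length pool >= 2 ] [
    let a first pool
    let b item 1 pool
    set pool but-first but-first pool
    ask a [ set partnered? true  set partner b ]
    ask b [ set partnered? true  set partner a ]
  ]
end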

4.2.2 Pre-assigned strategies versus identified strategies

The difference in logic between this simulation and the experiment could also be responsible for the discrepancy in results. The simulation assigns each agent a strategy at the very beginning; the agents have clear instructions on how to behave, and they carry them out strictly in the well-controlled computerised world. In the experiment, however, participants are not asked in advance to behave in a certain way, i.e. they are not pre-assigned strategies. They are simply given the payoff table and allowed to behave as they like. The researchers then identify different types of players among the participants by statistical means and recognise several strategies. Since the strategies are a post-experiment finding, the participants cannot be expected to behave consistently according to the definition of each strategy throughout the experiment. In addition, a small percentage of participants in the experiment do not fall under any strategy category. The results obtained from such different designs, even with the same strategies and payoff structure, can easily differ.

5. Conclusion

This paper studies indirect reciprocity in a simulated environment by letting agents of five strategic types (defector, altruist, cautious defector, rewarder and punisher) interact with each other in a helping game. Two scenarios are introduced. The first is a standard scenario modelled on the experiment (Ule et al., 2009), where the agents interact in pairs over a number of rounds. The second scenario adds a learning strategy to the agents, making them switch to the strategy of their most successful neighbour. The results show that defectors perform best among the five strategies, and with the learning strategy the advantage of defectors becomes even clearer.

This exploration of helping behaviour towards strangers also offers opportunities for a better understanding of indirect reciprocity in future work, in the following respects. Firstly, more strategies could be added. On the basis of the type and amount of information that players use in the experiment, rewarders and punishers are further divided into image rewarders, standing rewarders, image punishers and standing punishers.


This simulation study has included the rewarder and the punisher but did not distinguish between the image and standing types. Some classic strategies such as tit-for-tat could also be included, although they were not observed in the experiment. Secondly, some coding details could be redefined. The cautious defectors could use a more sophisticated method for determining the right time to help in order to keep up a good reputation, instead of helping every other round. Also, the way agents are paired up here can leave some agents unpartnered in certain rounds, which could have an impact on the outcome. Future work could also broaden the context of economic interactions by including factors such as social prejudice, the emergence of language capabilities, and distorted information in the transmission process.


References

Alexander, R. D. (1979) Darwinism and Human Affairs, University of Washington Press, Seattle, WA

Alexander, R. D. (1987) The Biology of Moral Systems, Aldine de Gruyter, New York

Bravo, G. (2008) Imitation and Cooperation in Different Helping Games, Journal of Artificial Societies and Social Simulation, 11(1), 8, http://jasss.soc.surrey.ac.uk/11/1/8.html

Leimar, O. and Hammerstein, P. (2001) Evolution of cooperation through indirect reciprocity, Proc. R. Soc. B, 268, pp. 745-753

Nowak, M. A. and Sigmund, K. (1998) Evolution of indirect reciprocity by image scoring, Nature, 393, pp. 573-577

Nowak, M. A. and Sigmund, K. (2005) Evolution of Indirect Reciprocity, Nature, 437, pp. 1291-1298

Richerson, P. J. and Boyd, R. (2005) Not by Genes Alone: How Culture Transformed Human Evolution, The University of Chicago Press, Chicago, IL

Seinen, I. and Schram, A. (2006) Social status and group norms: Indirect reciprocity in a repeated helping experiment, European Economic Review, 50(3), pp. 581-602

Sugden, R. (1986) The Economics of Rights, Co-operation, and Welfare, Blackwell, Oxford

Ule, A., Schram, A., Riedl, A. and Cason, T. N. (2009) Indirect punishment and generosity toward strangers, Science, 326, pp. 1701-1704

Wedekind, C. and Milinski, M. (2000) Cooperation through image scoring in humans, Science, 288, pp. 850-852

Wilensky, U. (1999) NetLogo, Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL. http://ccl.northwestern.edu/netlogo/

Wilensky, U. (2002) NetLogo PD Basic Evolutionary model, Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL. http://ccl.northwestern.edu/netlogo/models/PDBasicEvolutionary

Wilensky, U. (2002) NetLogo PD N-Person Iterated model, Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.


Appendix 1. Interface of the model

Image (a) shows the interface of the NetLogo model employed in this paper. Its chief elements are the following: the buttons "setup", "go" and "go once"; the sliders "n-defector", "n-altruist", "n-caudefct", "n-rewarder" and "n-punisher"; the plot; and the view of the world on the far right.

Their respective functions are briefly explained here. The three buttons, which call coded procedures, control the running of the model. The five sliders allow the number of agents to vary according to the needs of the research; in this case, the number of agents of each strategy can be set between 0 and 100. The plot is automatically re-scaled and shows how the average payoffs change over time; it is where the simulation results are read. The view of the world gives a direct view of the interacting agents, with different colours standing for different strategies.

It is also worth noting that the drawing of the plot is coded in the plot settings rather than in the main body of the code (see Appendix 2), as shown in Image (b).
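For instance, the pen update commands stored in the plot settings could look roughly like the following. This is an assumption for illustration only; the exact commands live in the interface and are shown in Image (b).

;;possible pen update commands (one pen per strategy), entered in the plot settings:
;;  defector pen:  plot defector-score / max (list 1 num-defector-games)
;;  altruist pen:  plot altruist-score / max (list 1 num-altruist-games)
;;  ...and analogously for the cautious defector, rewarder and punisher pens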


Image (a) Interface


Appendix 2. Main body of codes with brief explanation

The code is included to allow re-examination and attempts to recreate the results. Because various simulations were run in this study and the simulation settings were modified for each scenario, the explanation also mentions where a certain block is activated or de-activated depending on the scenario. Explanations are inserted after semicolons next to the statements.

Part 1 Define the variables

The following code establishes the global variables for the number of turtles (agents) of each strategy, the number of interactions of each strategy, and the total score of all turtles playing each strategy.

globals [
  num-defector num-altruist num-caudefct num-rewarder num-punisher
  num-defector-games num-altruist-games num-caudefct-games num-rewarder-games num-punisher-games
  defector-score altruist-score caudefct-score rewarder-score punisher-score
]

The following defines the variables that apply only to the agents, here known as "turtles".

turtles-own [
  score
  strategy
  donor?              ;;donor or recipient?
  partner-defected?   ;;action of the partner
  partnered?          ;;am I partnered?
  partner             ;;WHO value of the partner (nobody if not partnered)
  previous-helped?    ;;action of self, whether I have helped in the previous round
  defect-now?         ;;whether I defect in this round
  partner-history     ;;a list containing information about past interactions with other turtles (indexed by their WHO numbers)
]

Part 2 Set-up Procedures

Pressing the "setup" button on the interface runs this part.

to setup
  clear-all
  set-default-shape turtles "person"   ;;agents now take the form of a person
  store-initial-turtle-counts          ;;record the number of turtles created for each strategy
  setup-turtles                        ;;set up the turtles and distribute them randomly
  reset-ticks
end

;;record the number of agents of each strategy according to the slider value

to store-initial-turtle-counts
  set num-defector n-defector
  set num-altruist n-altruist
  set num-caudefct n-caudefct
  set num-rewarder n-rewarder
  set num-punisher n-punisher
end

;;set up the turtles and distribute them randomly
to setup-turtles
  make-turtles              ;;create the appropriate number of turtles playing each strategy
  setup-common-variables    ;;set the variables that all turtles share
end

;;create the appropriate number of turtles playing each strategy and assign colours
to make-turtles
  crt num-defector [ set strategy "defector" set color brown - 2 ]
  crt num-altruist [ set strategy "altruist" set color red ]
  crt num-caudefct [ set strategy "caudefct" set color lime + 1 ]
  crt num-rewarder [ set strategy "rewarder" set color yellow - 1 ]
  crt num-punisher [ set strategy "punisher" set color blue ]
end

;;set the variables that all turtles share
to setup-common-variables
  ask turtles [
    set score 5              ;;set the initial score of a turtle to 5 so as to avoid negative scores
    set partnered? false
    set partner nobody
    set donor? false
    setxy random-xcor random-ycor
  ]
  setup-history-lists        ;;initialize PARTNER-HISTORY list in all turtles
end

;;initialize PARTNER-HISTORY list in all turtles
to setup-history-lists
  let num-turtles count turtles
  let default-history []     ;;initialize the DEFAULT-HISTORY variable to be a list
  ;;create a list with NUM-TURTLES entries for storing partner histories
  repeat num-turtles [ set default-history (fput false default-history) ]
  ;;give each turtle a copy of this list for tracking partner histories
  ask turtles [ set partner-history default-history ]
end

Part 3 Runtime Procedures

This part is responsible for the running when the “go” button is pressed.

to go
  clear-last-round
  ask turtles [ evolve-strategy ]                ;;active only in the scenarios with learning (Section 3.2); de-activated otherwise
  ask turtles [ partner-up ]                     ;;have turtles try to find a partner
  ask turtles [ if score < 0 [ set score 5 ] ]   ;;turtles with a negative score are given the initial score 5 again, deemed "reborn"
  let partnered-turtles turtles with [ partnered? ]
  ask partnered-turtles [ pick-role ]
  ask partnered-turtles [ select-action ]        ;;all partnered turtles select an action
  ask partnered-turtles [ play-a-round ]
  do-scoring
  tick
end

to clear-last-round
  set num-defector-games 0
  set num-altruist-games 0
  set num-caudefct-games 0
  set num-rewarder-games 0
  set num-punisher-games 0
  set defector-score 0
  set altruist-score 0
  set caudefct-score 0
  set rewarder-score 0
  set punisher-score 0
  let partnered-turtles turtles with [ partnered? ]
  ask partnered-turtles [ release-partners ]
  ask turtles [ set donor? false ]
end

;;release partner and turn around to leave
to release-partners
  set partnered? false
  set partner nobody
  rt 180
end

;;have turtles try to find a partner
to partner-up   ;;turtle procedure
  if (not partnered?) [                              ;;for those who have not been partnered
    rt (random-float 90 - random-float 90) fd 5      ;;move around randomly
    set partner one-of (turtles-at -1 0) with [ not partnered? ]
    if partner != nobody [                           ;;if successfully running into another turtle, partner up
      set partnered? true
      set heading 270                                ;;face partner
      ask partner [
        set partnered? true
        set partner myself
        set heading 90
      ]
    ]
  ]
end

to evolve-strategy
  ;;the agent copies the strategy of its most successful neighbour
  ;;if its own score is not the highest in the neighbourhood
  if any? turtles-on neighbors [
    if score < [ score ] of max-one-of turtles-on neighbors [ score ]
      [ set strategy [ strategy ] of max-one-of turtles-on neighbors [ score ] ]
  ]
end

;;assign the roles of donor and recipient within the pair
to pick-role
  let partnered-turtles turtles with [ partnered? ]
  ask one-of partnered-turtles [ set donor? true ]
end

;;choose an action based upon the strategy being played
to select-action   ;;turtle procedure
  if strategy = "defector" [ act-defector set color brown - 2 ]
  if strategy = "altruist" [ act-altruist set color red ]
  if strategy = "caudefct" [ act-caudefct set color lime + 1 ]
  if strategy = "rewarder" [ act-rewarder set color yellow - 1 ]
  if strategy = "punisher" [ act-punisher set color blue ]
end

to play-a-round   ;;turtle procedure
  get-score        ;;calculate the score for this round
  update-history   ;;store the results for coming rounds
end

;;calculate the payoff for this round
;;the donor makes the decision in the pair
to get-score
  if strategy = "defector"
    [ if donor? = true
      [ set score score - 0
        ask partner [ set score score + 0 ] ] ]      ;;defectors will not help, so neither payoff changes

  if strategy = "altruist"
    [ if donor? = true
      [ set score score - 4
        ask partner [ set score score + 5 ] ] ]      ;;altruists always help

  if strategy = "caudefct"                           ;;needs to look at the donor's own history
    [ if donor? = true
      [ set previous-helped? item ([who] of self) partner-history
        ifelse (previous-helped?)
          [ set score score - 0                      ;;if she has helped in the previous round,
            ask partner [ set score score + 0 ] ]    ;;the donor will not bother to help this round
          [ set score score - 4                      ;;the donor helps in this round if she has not helped in the previous one,
            ask partner [ set score score + 5 ] ]    ;;in order not to make her own history look too bad
      ] ]

  if strategy = "rewarder"                           ;;needs to look at the recipient's history
    [ if donor? = true
      [ set partner-defected? item ([who] of partner) partner-history
        ifelse (partner-defected?)                   ;;will only help those who have helped others in the previous round
          [ set score score - 0
            ask partner [ set score score + 0 ] ]    ;;and defects on those who defected previously
          [ set score score - 4
            ask partner [ set score score + 5 ] ]
      ] ]

  if strategy = "punisher"                           ;;needs to look at the recipient's history
    [ if donor? = true
      [ set partner-defected? item ([who] of partner) partner-history
        ifelse (partner-defected?)                   ;;will punish those who have defected in the previous round, at a small cost
          [ set score score - 1
            ask partner [ set score score - 5 ] ]
          [ set score score - 4
            ask partner [ set score score + 5 ] ]    ;;and will help those with a good reputation
      ] ]
end

;;update PARTNER-HISTORY based upon the strategy being played
to update-history
  if strategy = "defector" [ defector-history-update ]
  if strategy = "altruist" [ altruist-history-update ]
  if strategy = "caudefct" [ caudefct-history-update ]
  if strategy = "rewarder" [ rewarder-history-update ]
  if strategy = "punisher" [ punisher-history-update ]
end

Part 4 Strategies

This part defines each strategy and updates the history.

to act-defector

  set num-defector-games num-defector-games + 1
  set defect-now? true

end

to defector-history-update

;;uses no history. This is kept for unity in form with the other strategies

end

to act-altruist

  set num-altruist-games num-altruist-games + 1
  set defect-now? false
end

to altruist-history-update

;;uses no history. This is kept for unity in form with the other strategies

end

to act-caudefct

set num-caudefct-games num-caudefct-games + 1

  set previous-helped? item ([who] of self) partner-history
  ifelse (previous-helped?)
    [ set defect-now? true ]
    [ set defect-now? false ]

end

to caudefct-history-update
  set partner-history (replace-item ([who] of self) partner-history defect-now?)   ;;updates own history

end

to act-rewarder

set num-rewarder-games num-rewarder-games + 1

  set partner-defected? item ([who] of partner) partner-history
  ifelse (partner-defected?)
    [ set defect-now? true ]
    [ set defect-now? false ]
end

to rewarder-history-update
  if partner-defected? [
    set partner-history (replace-item ([who] of partner) partner-history partner-defected?)
  ]
end

to act-punisher

set num-punisher-games num-punisher-games + 1

  set partner-defected? item ([who] of partner) partner-history
  ifelse (partner-defected?)
    [ set defect-now? true ]
    [ set defect-now? false ]

end

to punisher-history-update
  if partner-defected? [
    set partner-history (replace-item ([who] of partner) partner-history partner-defected?)
  ]
end

Part 5 Plotting Procedures

This part defines the scoring procedures behind the plotting. Note that the details of the plot update are included in the interface, as shown in Image (b) in Appendix 1.

;;calculate the total score of each strategy
to do-scoring
  set defector-score (calc-score "defector" num-defector)
  set altruist-score (calc-score "altruist" num-altruist)
  set caudefct-score (calc-score "caudefct" num-caudefct)
  set rewarder-score (calc-score "rewarder" num-rewarder)
  set punisher-score (calc-score "punisher" num-punisher)
end

;; returns the total score for a strategy if any turtles exist that are playing it

to-report calc-score [ strategy-type num-with-strategy ]
  ifelse num-with-strategy > 0
    [ report (sum [ score ] of (turtles with [ strategy = strategy-type ])) ]
    [ report 0 ]
end
