Daredevils in a robot swarm: optimizing the trade-off between risk and reward

Jannes Moeskops
Student number: 01200867

Supervisors: Prof. dr. ir. Pieter Simoens, Dr. Yara Khaluf
Counsellor: Ilja Rausch

Master's dissertation submitted in order to obtain the academic degree of Master of Science in Computer Science Engineering

Academic year 2019-2020

Preface

When writing these last words, a tumultuous academic year comes to an end. Never could I have imagined that I would have to finish my master's thesis during a global pandemic. That is why it is appropriate to use this preface to thank everybody who helped me get through this difficult period.

First I would like to thank Prof. dr. ir. Pieter Simoens, Dr. Yara Khaluf and Ilja Rausch for providing me with feedback and help when needed.

I would not have been able to do this without the people around me. I would especially like to thank my dear parents for giving me the opportunity to pursue my ambitions for all this time. I would also like to thank my friends for enduring my ramblings about robots. And last but certainly not least, I want to thank my sweetest Viktoria for always being there for me.

Jannes Moeskops
August 16, 2020


Abstract

Robot swarms are characterized by their inherently high redundancy and decentralized control, which makes them capable of coping with the loss of agents. That should make robot swarms exceptionally suitable for tasks in dangerous environments. In this work a model for a robot swarm that is able to perform a foraging task in a risky environment is proposed. First a homogeneous swarm is considered, where all agents mitigate the risk in the same way. This model is then further optimized by adding riskier individuals, called daredevils, to the swarm. The influence of the number and the riskiness of the daredevils is then analyzed.


Daredevils in robot swarms:

a tradeoff between risk and reward

Jannes Moeskops

Promotors: prof. dr. ir. Pieter Simoens, dr. Yara Khaluf

Supervisor: Ilja Rausch

Abstract—Robot swarms are characterized by their inherently high redundancy and decentralized control, which makes them capable of coping with the loss of agents. That should make robot swarms exceptionally suitable for tasks in dangerous environments. In this paper a model for a robot swarm that is able to perform a foraging task in a risky environment is proposed. First a homogeneous swarm is considered, where all agents mitigate the risk in the same way. This model is then further optimized by adding riskier individuals, called daredevils, to the swarm. The influence of the number and the riskiness of the daredevils is then analyzed.

Keywords: Robot swarm, swarm engineering, risk, foraging

I. INTRODUCTION

Swarm robotics and swarm engineering are fields that research systems of robots that work together to complete a task. These tasks are typically better suited to be completed by a cooperating group than by a single individual [1]. Since the first introduction of the term "swarm robotics" around the end of the previous millennium [2], many applications have been described in the literature. Many problems in the field have been solved by now [3]. However, existing research has all taken place in controlled and, more importantly, safe environments [3]. Most solutions for problems like path finding and foraging spread the swarm out across the entire (often unknown) area in rather greedy ways [3][1], not taking possible risks into account. As robot swarms often consist of cheap agents and have inherently high redundancy, they are exceptionally suited to deal with risky situations. In this thesis a model for a robot swarm to deal with such a risky environment will be explored.

In real life, some individuals are more willing to take risks to achieve some form of reward. Such individuals can become an asset to a population if those rewards bring a benefit to the whole group. The same might be true for the population of a robot swarm. In this thesis the influence of such risky individuals, or daredevils, on a population will be analyzed. First a homogeneous model is established; then daredevils are defined within this model and a heterogeneous population is analyzed.

II. EXPERIMENTAL SETUP

A. Test scenario

In the experiments the agents need to perform a foraging task inside a walled-off arena. The survival of the swarm depends on an energy level that decreases over time. In order to maintain this energy level, the agents need to go out and explore a risky environment in search of resources. The experiment ends when the energy level reaches zero or when all agents are gone.

B. Environment

The environment consists of 3 parts:

• Nest area: This is where found resources need to be taken. The nest also serves as a safe zone for agents between explorations. It can easily be found by the lights positioned over it. Here agents are able to sense the energy level.

• Foraging area: This is where resources can be found. These resources are distributed uniformly. Each agent in the foraging area has a probability to die; this probability follows a sine wave over time. While in the foraging area, agents can sense the risk of dying.

• Graveyard: This is an area on the edge of the arena, outside the walls, where unlucky agents are moved. It is not reachable for living agents and secluded enough to prevent any unwanted interaction between the "dead" and the "living".

C. Agent model

The agents in the experiments are modeled by footbots. The foraging task can be split up in 3 main tasks:

• Exploring: The agents move randomly through the arena in search of resources. They maximize their spread using two techniques: anti-phototaxis and diffusion. First they move away from the nest by driving away from the lights (anti-phototaxis). Diffusion is achieved by calculating a weighted sum of the readings of the proximity sensors placed around the body of the agent and then changing the heading of the agent to the negative of this vector. The agents return to the nest when they find a resource or when their predetermined exploration time is up.

• Repatriation: When returning home, the agent performs phototaxis to reach the nest. The diffusion vector is added to the vector that points toward the light instead of outright moving away from other agents. When entering the nest, agents communicate the risk of dying they last sensed to other agents. After that they go to the resting state.

• Resting: Agents in the resting state switch between phototaxis and anti-phototaxis in order to provide better mixing with each other. This helps the spread of information. After resting a random number of ticks, they determine whether to explore or to rest again.

III. RISK MITIGATION

In order to increase their survival chances, agents need to make well considered decisions. This section will describe the measures taken by the agents to deal with the risks imposed on them by the environment.

The amount of risk an agent experiences is proportional to the time spent in the foraging area. Since the resources are distributed randomly across the entire arena, a longer exploration time means a higher chance of a reward. The optimal exploration time is therefore one of two critical parameters that need to be calculated. The second one is the exploration probability, which determines the fraction of the swarm that will expose itself to the risks outside the nest.

A. Optimal exploration time

To calculate the optimal time to search for resources, spending enough time outside the nest while simultaneously considering the risk, a utility function is defined as follows:

E_{exp} = (1 - P_{dpt})^t \cdot [1 - (1 - P_{rpt})^t] \cdot (1 - S) - [1 - (1 - P_{dpt})^t] \cdot S   (1)

where P_{dpt} is the probability of dying per tick, P_{rpt} the probability of finding a resource per tick, and S the selfishness of the agent. The first term is the probability of surviving a trip of t ticks and finding a resource in those t ticks, times the internal value the agent assigns to finding a resource. The second term is the probability of dying during a trip of t ticks, times the internal value agents assign to their own life.

Fig. 1. Utility in function of time for different risk levels

In figure 1 the utility is plotted as a function of time. It can be seen that the optimum moves towards zero as the probability to die grows. This utility function is concave in t with one maximum, so it is easy to find the t that maximizes the utility of an exploration: calculate the derivative, equate it to zero and solve for t. Solving for t gives:

t = \frac{\ln\left(\frac{\ln(1 - P_{dpt})}{(1 - S) \cdot \ln[(1 - P_{dpt}) \cdot (1 - P_{rpt})]}\right)}{\ln(1 - P_{rpt})}   (2)

Fig. 2. Optimal exploration time in function of risk of dying for different probabilities of finding a resource.

In figure 2 the optimal exploration time obtained with this equation is plotted for different probabilities of finding a resource. When the risk approaches zero, the exploration time goes to infinity. When the probability of finding a resource is high, the time function always allows short explorations, as the chance of the trip being worth the small risk is high; the time stays low because there is no need to risk longer trips. As the time needed to find a resource grows, the exploration time reaches higher values but drops below zero sooner, because more time is needed to make a trip worthwhile and the risk becomes too high.
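As an illustration of equation (2), the sketch below computes the optimal exploration time directly from the per-tick probabilities. It is a minimal example, not the controller code used in the experiments; the parameter values are only indicative of the ranges used later in the thesis.

```python
import math

def optimal_exploration_time(p_die_tick, p_res_tick, selfishness):
    """Optimal exploration time t from equation (2).

    Returns None when no positive optimum exists, i.e. when
    exploring is never worth the risk for this agent.
    """
    if not 0.0 <= selfishness < 1.0:
        return None
    ln_a = math.log(1.0 - p_die_tick)  # log of per-tick survival probability
    ln_b = math.log(1.0 - p_res_tick)  # log of per-tick "no resource" probability
    ratio = ln_a / ((1.0 - selfishness) * (ln_a + ln_b))
    t = math.log(ratio) / ln_b
    return t if t > 0.0 else None

# Example: P_dpt = 0.0003, P_rpt = 1/500, S = 0.5 gives roughly 670 ticks.
print(optimal_exploration_time(0.0003, 1 / 500, 0.5))
```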

B. Exploration probability

Now that the agents are able to calculate the optimal exploration time, they also need to determine the probability of leaving the nest. In order to prevent the unnecessary endangerment of agents, the fraction of agents outside the nest should be proportional to the required energy intake.

1) Maintaining a constant energy level: One way to model the exploration probability is to estimate the optimal probability from the slope of the energy level curve. On one hand, the agents want to prevent the energy level from going to zero, so the exploration probability needs to increase when the energy level is decreasing (a negative slope). On the other hand, the agents want to avoid taking unnecessary risks, so the exploration probability needs to drop when an excess of energy is being taken in (a rising slope). The update rate is inversely proportional to the energy level:

P_{exp}(t) = P_{exp}(t-1) - r \cdot \frac{E_{init}}{E_t} \cdot Slope   (3)

With:

• P_{exp}(t): the exploration probability at time t.
• r: the rate at which the slope increases or decreases the probability.
• Slope: the slope of the energy level curve, obtained with linear regression.
• E_{init}: the initial energy level.
• E_t: the energy level at time t.

C. Daredevils

Daredevils are riskier individuals. This is reflected in the model by giving daredevils a lower S value. In order to maintain a certain presence of daredevils, each agent randomly determines whether or not to be a daredevil. This selection happens periodically.

IV. RESULTS

The experiments were simulated using the ARGoS simulator.

A. Performance metrics

Two performance metrics were defined. Since there are two goals, keeping the swarm alive and maintaining the energy level, the slope of both curves is used. Since the end of the experiment is determined by both the energy level and the number of dead agents, these slopes can be used to calculate the projected survival time (PST): the number of ticks until one of these values reaches zero when extending the slopes beyond the duration of the simulation.

B. Base models

The PST for each S value is plotted in figure 3. As S rises, the PST remains more or less constant until S approaches one. Such agents will barely explore and so cannot maintain the energy level: they only explore when the risk of dying is low enough, and that period is too short to find enough resources, so the more selfish models starve. Only a small increase in PST is noticeable for the safer models. This is caused by the avoidance of explorations during high risk periods, which causes the energy to drop. A drop in energy results in a higher exploration probability and thus more explorers during medium risk periods. This increase in exploration during medium risk periods negates the effect of avoiding the high risk periods.

Fig. 3. The projected survival time

C. Influence of daredevils

First, daredevils with an S value of zero were added to the swarm. Figure 4 shows that the added daredevils have a positive effect on the PST. The added daredevils constantly explore, resulting in a more consistent energy intake. This allows the other agents to limit their explorations to the short periods where the probability of dying becomes negligible. With daredevils in the swarm, the previously non-viable models can compete. A clear peak can be seen where S equals 0.9 and the amount of daredevils is 10%. A greater amount of daredevils makes more selfish models viable, but also greatly increases the death rate, as more agents will be risking themselves.

Fig. 4. The projected time of survival for different percentages of daredevils

When looking at the PST for different amounts of daredevils in the swarm, it can be noted that there is a critical amount of daredevils needed to make the swarm viable. When increasing the amount further, the PST lowers, as more agents exposing themselves to higher risks leads to more casualties.

It would make sense for the PST to rise when safer daredevils are used, but this seems not to be the case. When looking at figure 6, it can be seen that the performance stays more or less level: the increased explorations during medium risk periods negate the effect of avoiding the high risk periods. As long as the S value of the daredevils allows for a consistent energy intake, the effects will be equally beneficial to a model that would otherwise fail.

Fig. 5. The projected survival time in function of the amount of daredevils

Fig. 6. The projected time of survival for different values of mdd

V. CONCLUSION

In the experiments it was clear that a robot swarm consisting of homogeneous agents is not sufficient to deal with the risky environment. While the most selfish models manage to greatly increase their survival time, they were not capable of maintaining the nest energy. It was shown that adding daredevils to those swarms proved beneficial to their performance in the experiments. While adding more risky individuals may seem counterproductive, they in fact manage to save otherwise non-viable models by increasing their energy intake. As daredevils tend to be more effective at accomplishing the foraging task, the rest of the swarm is less pressured to go out foraging. This allows the regular individuals in a swarm to be more selfish. The majority of the swarm only goes out exploring during a small time period when the probability of dying is almost negligible. These explorations provide the bulk of the energy intake. Since this safe window might be too short to consistently bring back resources to the nest, a safe base model has a high chance of starving. A small amount of daredevils provides a small but consistent income of resources to prevent the nest energy from going to zero before the more selfish agents get their chance to explore. The most effective approach for a robot swarm dealing with a risky environment is to limit the foraging to safe periods and have a small amount of daredevils to give the energy intake a little boost to prevent starvation. Having a smaller group of risky foragers puts a cap on the amount of casualties during higher risk periods. The amount of daredevils directly determines the trade-off between risk and reward. Models with more daredevils will perform better in the foraging task, but the swarm will not survive very long. The selfishness of the daredevils does not seem to matter for the PST as long as it stays under the critical value. This is because all viable models managed to maintain the energy level without getting close to zero. What the best performing model needed was just a small but consistent boost to prevent the minimums in the energy level from reaching zero.

REFERENCES

[1] Levent Bayındır. A review of swarm robotics tasks. Neurocomputing, 2014.

[2] S. Kazadi. Swarm engineering. Ph.D. thesis, California Institute of Technology, Pasadena, CA, USA, 2000.

[3] Manuele Brambilla, Eliseo Ferrante, Mauro Birattari, and Marco Dorigo. Swarm robotics: a review from the swarm engineering perspective. Swarm Intelligence, 2013.


Acronyms

Ad: amplitude of the dying probability
Pd: period of the dying probability
Pr: probability of finding a resource
mdd: daredevil multiplier
FA: Foraging Area
FB: Footbot
NA: Nest Area
NE: Nest Energy
PTS: Projected time of survival
S: Selfishness


List of Figures

3.1 The test environment
3.2 Utility in function of time for different risk levels
3.3 Optimal exploration time in function of risk of dying for different probabilities of finding a resource
3.4 Optimal exploration time in function of risk of dying for different values of S
4.1 Number of exploring agents for different S values
4.2 Nest energy for different S values
4.3 The projected time of survival
4.4 Number of lost agents for different values of S
4.5 Number of exploring agents for different values of S and % daredevils
4.6 Energy level for different values of S and % daredevils
4.7 The projected time of survival for different percentages of daredevils
4.8 Number of lost agents for different values of S and % daredevils
4.9 The projected survival time in function of the amount of daredevils
4.10 The projected time of survival for different values of mdd


Contents

Preface
Abstract
Acronyms
List of Figures
1 Introduction
1.1 Problem description
2 Literature
2.1 Swarm robotics
2.1.1 Foraging
2.1.2 Navigation
2.1.3 Information sharing
3 Research approach
3.1 Test scenario
3.2 Environment
3.2.1 Nest Area
3.2.2 Foraging Area
3.2.3 Graveyard
3.3 Agent controller
3.3.1 Sensors/actuators
3.3.2 Exploration
3.3.3 Repatriation
3.3.4 Resting
3.3.5 Dead
3.4 Risk mitigation
3.4.1 Optimal exploration time
3.4.2 Exploration probability
3.5 Defining daredevils
3.6 Information sharing
3.6.1 Mixing
3.6.2 Probability of dying
3.6.3 Time to find a resource
4 Results
4.1 Performance metrics
4.1.1 Death rate
4.1.2 Nest energy slope
4.1.3 Projected time of survival (PTS)
4.2 Environment parameters
4.3 Base model behaviour
4.3.1 Exploration
4.3.2 Energy level
4.3.3 Death rate
4.4 Influence of daredevils
4.4.1 Exploration
4.4.2 Energy
4.4.3 Death rate
4.4.4 Varying mdd
5 Conclusions
5.1 Conclusions
5.2 Future work


Chapter 1

Introduction

1.1 Problem description

Swarm robotics and swarm engineering are fields that study systems of robots that work together to complete a task. These tasks are typically better suited to be completed by a cooperating group than by a single individual [1]. Since the first introduction of the term "swarm robotics" around the end of the previous millennium [2], many applications have been described in the literature. Problems such as navigation, foraging and construction, among many others, have been solved by now [3]. However, existing research has all taken place in controlled and, more importantly, safe environments [3]. Most solutions for problems like path finding and foraging spread the swarm out across the entire (often unknown) area in rather greedy ways [3][1], not taking possible risks into account. As robot swarms often consist of cheap agents and have inherently high redundancy, they are exceptionally suited to deal with risky situations. In this thesis a model for a robot swarm to deal with such a risky environment will be explored.

In real life, some individuals are more willing to take risks to achieve some kind of reward. Such individuals can become an asset to a population if those rewards bring a benefit to the whole group. The same might be true for the population of a robot swarm. In this thesis the influence of such risky individuals, or daredevils, on a population will be analyzed. First a homogeneous model is established; then daredevils are defined within this model and a heterogeneous population is analyzed.


Chapter 2

Literature

2.1 Swarm robotics

As the name might give away, the field of swarm robotics deals with systems that consist of a large number of individual robots. Swarm robotics research can be distinguished from other robotic disciplines by five criteria [4]:

1. Robots are autonomous: they can move and interact within an environment without any form of centralized control.

2. Tasks can be performed as a group.

3. The swarm consists of only a few homogeneous groups. There is no large number of groups that are tasked with a specific sub-task beforehand.

4. The individual capabilities of a robot are limited: there is a limit on computational, sensing and communicative capabilities.

5. Agents only work on a local scale: communication and sensing can only work locally to ensure a distributed system without global coordination.

The locality constraints and the limitation on computational capabilities may seem like disadvantages, but in fact they are beneficial to the scalability and robustness of the system. Global coordination would only get more complex for increasing numbers, while more powerful agents would greatly increase costs. The main downside of this approach is that optimization of micro-behaviour might not result in optimal macro-behaviour [1].


Swarm robotics has been an active field of research for decades now, and many problems have been solved and published [3]. While none of the literature found incorporates the risk of losing agents, these examples can be used as a basis to build the models in this thesis.

2.1.1 Foraging

Foraging is a task inspired by a common activity of insects such as ants. Agents are required to find items located in an environment and move them to an area called the nest. The items that need to be retrieved are often modeled as energy needed by the swarm to survive, mirroring the retrieval of food in the animal world. A variant of foraging is multi-foraging, where agents can find different items at different sites.

The foraging task can be split up into different sub-tasks. While some agents are exploring, others are retrieving items, and some are spending their time doing nothing to reduce interference between agents [1]. Different approaches exist to allocate the sub-tasks to individual agents. They can be subdivided into probabilistic allocation and threshold-based allocation. In the former, each agent randomly chooses a task with a certain probability. This probability can be constant or changed dynamically based on observations of the environment. The randomness makes sure that not every agent chooses a task at the same time [5]. In the latter approach, agents choose a task based on observed values reaching a certain threshold. Just like the probabilistic approach, this threshold can be fixed or variable [6].
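To make the distinction concrete, here is a toy sketch of the two allocation schemes; the probability and threshold values are made up for illustration and do not come from the cited papers.

```python
import random

def probabilistic_choice(p_explore=0.3):
    """Probabilistic allocation: each agent independently picks a
    task with a certain probability each decision round."""
    return "explore" if random.random() < p_explore else "rest"

def threshold_choice(stimulus, threshold=0.5):
    """Threshold-based allocation: an agent switches task once an
    observed value crosses its (fixed or variable) threshold."""
    return "explore" if stimulus > threshold else "rest"
```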

2.1.2 Navigation

As simple agents are preferred in swarm robotics, navigational capabilities will often be limited. Some external solutions need to be present in order for agents to be able to navigate towards a destination. One way this is achieved is by using beacons to mark specific points of interest. In many foraging applications, the nest is marked using a gradient field that increases in intensity near the nest [7]. A way to physically implement such a gradient field is by using a light source. Even simple agents such as Braitenberg vehicles can navigate towards a light using phototaxis [8]. Phototaxis is the action of moving towards a light source; anti-phototaxis is moving away from one. Such gradient fields are not only useful for localization but can also be used for inter-agent organisation or collision avoidance. Designing such a field can be done using artificial physics, where objects exert an attractive/repulsive force on each other based on their distance [9]. Defining such a gradient field is easy, as proximity sensors often return a continuous value that is proportional to the distance to an obstacle.

2.1.3 Information sharing

There are multiple commonly used approaches to share information between agents: using a shared memory, the environment, or direct local communication. Using a shared memory goes against the constraint of limiting communication to local interactions, but is often used to get better insight in the information [10]. Communicating through the environment can be achieved by mimicking nature. Many insects leave a trail of pheromones behind as a way to communicate with their peers. This technique can also be used in swarm robotics, where agents leave behind a trail that can be sensed by other nearby agents. Another way to communicate using the environment is by having agents assume the role of markers [11]. A downside of this technique is that communication is limited to agents passing by the pheromone or marker. Lastly, if agents have the capability to share data with their neighbours, direct communication can be realized. A. Reina et al. proposed a way to share information in value-sensitive scenarios. In their test setup, agents explore to discover the quality of one of multiple "nesting areas". The swarm must then make a collective decision on which one is the best. To share findings with the group, each agent is capable of transmitting low-bandwidth data. Since there is the constraint of only local communication, agents have to mix to meet as many other agents as possible. Mixing was achieved by having agents walk randomly across the arena [12].


Chapter 3

Research approach

3.1 Test scenario

The experiments will not be performed with physical robots but rather simulated using the ARGoS simulator. In the experiments the agents will need to perform a foraging task inside a walled-off arena. Foraging is chosen as the main task because it has been studied thoroughly in the past [3], providing a good basis. Another important perk is that risks can be introduced intuitively here. Just like animals searching for food in the wild have to deal with predators, agents will experience a certain danger when foraging. The predators in the simulations are implemented in a more abstract way, as each agent has a certain probability to die each tick. The goal of the foraging task is to maintain the Nest Energy (NE) by retrieving resources scattered in the arena. To provide an incentive for the agents to go out, the NE lowers over time. The NE decreases at a rate proportional to the amount of surviving agents; this can be seen as the energy consumption of all agents. The experiment will end if the NE reaches zero or if there are no living agents left.

3.2 Environment

The environment consists of 3 main parts: the Nest Area (NA), the Foraging Area (FA) and the graveyard. The environment is controlled through a loop controller that sets up the arena and updates it every tick. The controller is responsible for resource placement, calculating the energy level and probabilistically determining which agents will die in the arena.


3.2.1 Nest Area

The NA is located in the center of the environment. This is where the agents start the simulation. It is centered in order to give the agents maximal access to the surrounding FA. As agents enter the nest, any resource they carry is automatically dropped and added to the NE. There are 5 yellow lights suspended above the NA, which make it possible for the agents to locate the NA with their light sensors. There is one light for each corner to ensure visibility from the entire arena; the light in the middle is used for the mixing algorithm of the agents. In order for the agents to know that they are in the nest, the floor of the NA is coloured gray so that it can be distinguished from the FA. Agents that reside in the NA are safe from the risks in the other parts of the arena. Inside the nest area agents can sense the NE level.

3.2.2 Foraging Area

The FA is the region around the nest where the agents go to explore for resources. When exploring the FA, agents expose themselves to a risk. More concretely: every tick of the simulation, there is a probability for each agent in the FA to die. This probability is distributed uniformly over the entire FA. It does, however, vary periodically over time, following a sine shape with amplitude Ad and period Pd. The sine wave allows for safe periods so that agents have a chance to forage safely. Resources, represented by black blobs on the ground, are distributed uniformly over the FA just like the dying probability. There is a maximum of 200 items, and they reappear at random locations when taken by a forager. Figure 3.1 gives an overview of the arena.

3.2.3 Graveyard

This is an area on the edge of the arena, outside the walls, where unlucky agents are moved. Outright removing them from the simulation tends to lead to inconsistent behaviour of the simulator, so moving them out of range of the experiment proved to be a more reliable solution. This area is not reachable for living agents and secluded enough to prevent any unwanted interaction between the "dead" and the "living".


3.3 Agent controller

Each agent in these experiments will be represented by a Footbot (FB). This section will detail how the FBs are modeled in order to complete their task. First an overview of the available sensors and their implementation is given. Then the different states of the controller are detailed. Each controller can be in one of 4 states: exploration, repatriation (with or without resources), resting or dead.

3.3.1 Sensors/actuators

The footbot model of the ARGoS simulator provides many implementations of different sensors and actuators. The ones that are used in these experiments are listed below along with a brief description.

• Motor ground sensor: The ground sensor can sense the colour intensity of the ground below the FB. Readings return a value ranging between zero and one for white and black respectively. The robots use this sensor to check whether they are in the gray coloured nest or not.

• Light sensor: The robot has 24 light sensors positioned in a circle along the body. Each sensor returns its angle along with a value between zero and one where zero means no light was detected in that direction. Values increase the closer the FB gets to a light source. The light sensor is used to perform phototaxis.

• Proximity sensor: Just like light sensors, proximity sensors are placed in a ring of 24 sensors around the body of the FB. They also return a value between zero and one where zero means no object was detected. The value rises the closer the object gets. Proximity sensors are used to avoid collisions.

• Range and bearing sensors: The range and bearing system allows for communication between nearby robots. The system only allows broadcast communication between robots that are in line of sight of each other. Every FB can write data into 10 slots of one byte each. Because of this limitation, extra steps are needed to send non-integer and larger values. Floats are first multiplied into an integer range. Then 2 slots are combined into a 16 bit slot: the first slot contains the value divided by 256 and the second contains the remainder. The original value can be reconstructed at the receiving side by doing the reverse process, as sketched below.
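A minimal sketch of this packing scheme; the scale factor is an assumption, since the text only states that floats are first multiplied into an integer range.

```python
SCALE = 1000  # assumed factor mapping floats onto a 16-bit integer range

def pack_float(value):
    """Split a scaled float over two one-byte slots (high byte, low byte)."""
    as_int = int(value * SCALE) & 0xFFFF  # keep the value within 16 bits
    return as_int // 256, as_int % 256

def unpack_float(high, low):
    """Reverse process on the receiving side."""
    return (high * 256 + low) / SCALE

high, low = pack_float(0.734)  # e.g. a sensed dying probability
assert abs(unpack_float(high, low) - 0.734) < 1 / SCALE
```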


Two other virtual sensors were implemented: one to detect the risk outside the nest and one to detect the energy level inside the nest. The loop controller directly communicates these values to the agent controllers, which add some Gaussian noise to simulate a more realistic sensor. This cheating method was chosen because no existing sensors were available for these use cases. In this scenario both nest energy and risk are abstract concepts, but in real life scenarios these may be values that can be deduced using more realistic sensors.

3.3.2 Exploration

The goal of exploration is to find resources in the FA. The resources are distributed uniformly around the entire FA, so in order to maximize the probability of finding a resource, agents need to spread out as much as possible. To achieve this, two methods are used. The first method is anti-phototaxis: agents move away from the lights above the nest that are detected with their light sensors. The central location of the NA and the random spread of the agents result in robots entering the FA in different directions. The other method is a diffusion algorithm. Each tick, a weighted sum of the readings from the proximity sensors is calculated. This results in a collision vector that points towards the closest obstacles. Diffusion is achieved by changing the heading of the agent to the inverse of the collision vector. Two (or more) nearly colliding agents will thus travel in opposite directions, increasing the area coverage.
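A sketch of the diffusion step described above, assuming each proximity reading comes as an (angle, value) pair; this illustrates the mechanism and is not the actual ARGoS controller code.

```python
import math

def diffusion_heading(proximity_readings):
    """New heading from proximity readings [(angle_rad, value), ...].

    Values lie in [0, 1]; higher means a closer obstacle. The
    collision vector is the reading-weighted sum of the sensor
    directions; the agent heads the opposite way.
    """
    x = sum(v * math.cos(a) for a, v in proximity_readings)
    y = sum(v * math.sin(a) for a, v in proximity_readings)
    # Invert the collision vector to steer away from the obstacles.
    return math.atan2(-y, -x)
```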

3.3.3 Repatriation

It is important for the agents to return to the NA as efficiently as possible, because each extra tick they stay outside is another chance to die without the possibility of finding another reward. Despite the different setups between both scenarios, the approach for this task is similar. The nest is easy to locate by the overhead lights, so all the agents have to do is move towards the yellow lights. This is implemented using the light sensor to perform phototaxis. In order to prevent collisions, the diffusion vector described above is added to the vector pointing toward the light to determine the heading of the agent. Robots that enter the nest will spend some time searching for a place in the nest before switching to the resting state. While doing so, they will share information about their latest trip as described in section 3.6.

(22)

3.3.4 Resting

Robots that are resting wait until they have rested for the minimum resting time. They then probabilistically decide whether or not to leave the nest; this decision-making process is detailed in section 3.4. If they leave the nest, they switch to the exploration state; otherwise the resting timer is reset to zero. While waiting, they listen to incoming transmissions from their returning peers. A mixing algorithm is used to make room for returning robots and to maximize the number of agents that get within line of sight of each other. This is described in section 3.6.

3.3.5 Dead

Agents that are in this state turn off all functionality. This way no communication with other agents, unwanted movements or collisions are possible.

3.4 Risk mitigation

In order to increase their survival chances, agents need to make well consid-ered decisions. This section will describe the measures taken by the agents to deal with the risks imposed on them by the environment.

The amount of risk an agent experiences is proportional to the time spent in the FA. Since the resources are distributed randomly across the entire arena, a longer exploration time means a higher chance of a reward. The optimal exploration time is therefore one of two critical parameters that need to be calculated. The second one is the exploration probability, which determines the fraction of the swarm that will expose itself to the risks outside the nest.

3.4.1 Optimal exploration time

To calculate the optimal time to search for resources, spending enough time outside the nest while simultaneously considering the risk, a utility function is derived as follows. First the expected value of an exploration during t ticks is considered. There are two outcomes: the agent either dies or returns to the nest. The expected value of the exploration thus equals the expected value of surviving weighted by its probability, minus the expected cost of dying weighted by its probability:

E_{exploring} = (1 - P_{dying}) \cdot E_{survival} - P_{dying} \cdot E_{dying}   (3.1)

With:

• E_{exploring}: the expected value of an exploration of t ticks.
• E_{survival}: the expected value of surviving the exploration.
• E_{dying}: the expected cost of dying during the exploration.
• P_{dying}: the probability of dying during the exploration of t ticks.

3.4.1.1 Calculating Pdying

The probability of dying in t ticks can be calculated by considering the fact that an agent can survive n − 1 ticks and die on the nth tick, for all n ≤ t:

P_{dying} = \sum_{n=1}^{t} (1 - P_{dpt})^{n-1} \cdot P_{dpt}   (3.2)

With P_{dpt} the probability of dying per tick. Calculating the series sum and simplifying gives:

P_{dying} = 1 - (1 - P_{dpt})^t   (3.3)

This can also be interpreted as the chance of not surviving t ticks.
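The simplification from (3.2) to (3.3) is a standard geometric series; written out in full:

\sum_{n=1}^{t} (1 - P_{dpt})^{n-1} \cdot P_{dpt} = P_{dpt} \cdot \frac{1 - (1 - P_{dpt})^t}{1 - (1 - P_{dpt})} = 1 - (1 - P_{dpt})^t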

3.4.1.2 Calculating Edying

The expected cost of dying is constant in this model and is a parameter of the agent:

E_{dying} = S   (3.4)

S is the Selfishness of the agent: a value between zero and one, where a value of one means the agent only values its own survival and a value of zero means it does not care about dying.

3.4.1.3 Calculating Esurvival

Calculating the expected value of surviving the exploration is more complicated, as there are again two possibilities when an agent survives a trip outside: the agent can return successfully with resources or return empty handed. This is calculated the same way as in (3.1):

E_{survival} = (1 - P_{success}) \cdot E_{no resource} + P_{success} \cdot E_{resource}   (3.5)


With:

• E_{no resource}: the expected value of returning with no resource.
• E_{resource}: the expected value of finding a resource.
• P_{success}: the probability of finding a resource in t ticks.

The probability of finding a resource can be calculated analogously to (3.3), as one minus the probability of finding nothing for t ticks:

P_{success} = 1 - (1 - P_{rpt})^t   (3.6)

With P_{rpt} the probability of finding a resource per tick.

In this model the value of returning to the nest with no resources is zero:

E_{no resource} = 0   (3.7)

As S defines the selfishness of an agent, the value of finding a resource needs to be inversely proportional to it. S is a value between zero and one, so the value of finding a resource is defined as follows:

E_{resource} = 1 - S   (3.8)

This way a selfish agent with an S value of one will assign no inherent value to finding a resource, and vice versa. Now, with (3.3)–(3.8) filled into (3.1), the following utility function is obtained:

E_{exploring} = (1 - P_{dpt})^t \cdot [1 - (1 - P_{rpt})^t] \cdot (1 - S) - [1 - (1 - P_{dpt})^t] \cdot S   (3.9)

In figure 3.2 the utility is plotted as a function of time. The probability to find a resource is 1/500 and the risk of dying per tick is 0.0001, 0.0003 and 0.0006 for low, intermediate and high risk respectively. It can be seen that the optimum moves towards zero as the probability to die grows. This utility function is concave in t with one maximum. This means it is easy to find an optimum for t that maximizes the utility of an exploration, by calculating the derivative, equating it to zero and solving for t:

\frac{\partial E_{exploring}}{\partial t} = 0   (3.10)

Solving (3.10) for t gives:

t = \frac{\ln\left(\frac{\ln(1 - P_{dpt})}{(1 - S) \cdot \ln[(1 - P_{dpt}) \cdot (1 - P_{rpt})]}\right)}{\ln(1 - P_{rpt})}   (3.11)


Figure 3.2: Utility in function of time for different risk levels

Figure 3.3: Optimal exploration time in function of risk of dying for different probabilities of finding a resource.


In figure 3.3 the optimal exploration time achieved with this equation is plotted for different probabilities of finding a resource, namely 1/100, 1/500 and 1/1500. For each Pr the optimal exploration time goes to infinity when the risk of dying becomes zero. When the chance of finding a resource is high, the optimal exploration time will be equal to the time needed to find something, disregarding the risk, as this time is small enough to keep the total probability of dying low. For the intermediate Pr, the exploration time only begins to rise below a certain Pd: endangering the agent for short times is not worth it, since the chance of the trip being successful is smaller. The curve then rises at a faster rate, as more time is needed to find a resource. When Pr becomes lower still, it takes a much lower Pd before the time exceeds zero, as short explorations have an even lower chance of being worthwhile. The curve then becomes steeper than the others, since the lower Pr requires a longer exploration time to be worth it.
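These curves can be reproduced numerically. The sketch below (an illustration, not the simulation code) evaluates the utility (3.9) on a grid of exploration times and checks that the grid maximum agrees with the closed form (3.11):

```python
import math

def utility(t, p_d, p_r, s):
    """Expected value of a t-tick exploration, equation (3.9)."""
    survive = (1 - p_d) ** t
    find = 1 - (1 - p_r) ** t
    return survive * find * (1 - s) - (1 - survive) * s

def optimal_t(p_d, p_r, s):
    """Closed-form optimum, equation (3.11)."""
    ln_a, ln_b = math.log(1 - p_d), math.log(1 - p_r)
    return math.log(ln_a / ((1 - s) * (ln_a + ln_b))) / ln_b

# Intermediate risk level and the intermediate resource probability above.
p_d, p_r, s = 0.0003, 1 / 500, 0.6
best_on_grid = max(range(1, 5000), key=lambda t: utility(t, p_d, p_r, s))
print(best_on_grid, optimal_t(p_d, p_r, s))  # both around 560 ticks
```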

3.4.2 Exploration probability

Now that the agents are able to calculate the optimal exploration time, they also need to determine the probability of leaving the nest. In order to prevent the unnecessary endangerment of agents, the fraction of agents outside the nest should be proportional to the required energy intake.

3.4.2.1 Maintaining a constant energy level

One way to model the exploration probability is to try and estimate the optimal probability based on the slope of the energy level curve. The goal of the agents is on one hand to prevent the energy level from going to zero, so the exploration probability needs to increase when the energy level is decreasing (a negative slope). On the other hand, the agents want to prevent taking unnecessary risks, so the exploration probability needs to drop when an excess of energy is being taken in (a rising slope). The exploration probability is optimal when the energy level remains constant over time (the slope is zero). If the energy keeps dropping for a longer duration, it becomes more and more critical to increase the exploration probability. The rate with which the exploration probability increases must be inversely proportional to the energy level. This leads to the following model:

P_{exp}(t) = P_{exp}(t-1) - r \cdot \frac{E_{init}}{E_t} \cdot Slope   (3.12)


With:

• P_{exp}(t): the exploration probability at time t.
• r: the rate at which the slope increases or decreases the probability.
• Slope: the slope of the energy level curve, obtained with linear regression.
• E_{init}: the initial energy level.
• E_t: the energy level at time t.

Having the energy level in the denominator of the formula means that when the energy closes in on zero, the probability of exploring shoots to one immediately. This iterative approximation will sometimes lead to a value greater than one or smaller than zero, so after each update the value needs to be truncated in order to maintain the properties of a probability.
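A sketch of the full update step, assuming the agent keeps a short window of recent (tick, energy) readings from which the slope is estimated by least squares; the window length and the rate r are illustrative assumptions.

```python
def energy_slope(samples):
    """Least-squares slope of (tick, energy) samples."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_e = sum(e for _, e in samples) / n
    num = sum((t - mean_t) * (e - mean_e) for t, e in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

def update_exploration_prob(p_prev, samples, e_init, r=0.01):
    """Equation (3.12) followed by truncation to [0, 1]."""
    e_now = samples[-1][1]
    p = p_prev - r * (e_init / e_now) * energy_slope(samples)
    return min(1.0, max(0.0, p))

# Example: energy dropping over the last five ticks raises the probability.
window = [(100, 980.0), (101, 976.0), (102, 973.0), (103, 969.0), (104, 966.0)]
print(update_exploration_prob(0.2, window, e_init=1000.0))
```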

3.5 Defining daredevils

With the risk mitigation model described above, every individual in the swarm will make more or less the same assessment when calculating its exploration time. To introduce the concept of daredevils in this model, the S parameter is used. Daredevils are defined in this context as agents that assign a higher inherent value to rewards. This value is reflected in the utility function (3.9) as 1 − S. The daredevil property is translated to the mathematical model by applying a multiplier to S, referred to as the Daredevil multiplier (mdd). Its value can range between zero and one. Using this multiplier, a daredevil will explore for a longer time than a regular agent. This definition can be interpreted intuitively: a daredevil will spend more time and effort acquiring resources despite facing the same risk. Daredevils have the same goal as their peers, namely maintaining a constant energy level, and when the energy level remains constant, exploring is not necessary for daredevils any more than for regular agents. Since it does not make sense for agents to explore when it is not necessary, daredevils use the same exploration probability as their peers. In figure 3.4 the influence of S on the optimal exploration time can be seen for S = 0.3, 0.6 and 0.8. While low values of S allow for a small exploration time for all risks, a high S value gives a much smaller range of dying probabilities in which the agents can explore.
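In terms of the earlier sketches, a daredevil simply evaluates its optimal exploration time with a scaled-down selfishness. This one-line reading of the definition is an assumption, as the exact composition of the multiplier is not spelled out here.

```python
def daredevil_selfishness(s, m_dd):
    """Effective S of a daredevil; m_dd in [0, 1] scales selfishness down,
    which lengthens the optimal exploration time for the same risk."""
    return m_dd * s
```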



Figure 3.4: Optimal exploration time in function of risk of dying for different values of S.

3.6 Information sharing

The information that a single agent can gather is limited. Luckily the swarm can use its numbers to aggregate more accurate data. The range and bearing sensors/transmitters are used to share the data between agents. To prevent the use of duplicate data, each agent has a randomly generated 8 bit id that is created at the beginning of each experiment. This id is sent along with the data. On the receiver side, a list of all received ids is kept, and transmissions containing an id that appears in this list are discarded. The list is cleared each time an agent switches from the resting state to the explore state.
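A compact sketch of this duplicate filtering; the class and method names are illustrative, not taken from the controller code.

```python
class MessageFilter:
    """Duplicate filtering for received range-and-bearing messages."""

    def __init__(self):
        self.seen_ids = set()
        self.received = []

    def on_message(self, sender_id, payload):
        # Discard transmissions whose 8-bit id was already received.
        if sender_id not in self.seen_ids:
            self.seen_ids.add(sender_id)
            self.received.append(payload)

    def on_start_exploring(self):
        # The id list is cleared on the switch from resting to exploring.
        self.seen_ids.clear()
        self.received.clear()
```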

3.6.1 Mixing

Since data can only be received between robots that are within line of sight of each other, mixing is necessary. Reina et al. [12] describe a random walk algorithm to improve mixing. However, in this case agents need to be careful not to leave the nest, so the algorithm used here makes the robots move in a more controlled way that keeps them in the nest. Robots that are resting alternate between phototaxis and anti-phototaxis towards the nearest light source for a few ticks. Colliding robots will move in opposite directions. Agents move towards the light source that has the highest sensor reading, so this diffusion increases the probability that agents focus on a different light source. This increases the number of robots that get within line of sight of each other.

3.6.2 Probability of dying

The most important piece of information in this scenario is the current risk of dying in the FA. Robots that are in the NA only have outdated knowledge of the risk, from the last tick they were in the FA. To be able to calculate the correct optimal exploration time, they would have to first enter the risky FA, which results in unnecessary risk taking. Instead, incoming agents transmit their last measurement of the probability of dying.

3.6.3 Time to find a resource

The other shared parameter is the time needed to find a resource. To get an accurate estimation, each agent keeps a list of the times it needed to encounter a resource. Returning agents transmit the mean value of their list. Each agent then uses the mean of the received times together with its own mean search time in further calculations.


Chapter 4

Results

In this chapter the results from simulations performed with the setup described above will be analysed. First the relevant performance metrics will be discussed. Then the performance of the heterogeneous and homogeneous models will be analysed and compared. To decrease the statistical dependency on one random seed, each simulation was repeated with 32 different random seeds.

4.1 Performance metrics

In order to evaluate and compare the performance of different models, two performance measures were used.

4.1.1 Death rate

The death rate is considered to quantify the ability of the agents to deal with the dangers of the FA. No matter how well a model tries to optimize its survival chances, there is always a non-zero probability to die when exploring, which results in an ever ascending number of casualties. Since the probability of dying fluctuates over time, it is expected that better models will concentrate their exploration efforts around the minimums while avoiding the maximums. A lower risk of dying means a lower death rate, meaning a more gentle slope of the death curve. Better models are thus differentiated by a lower death rate.


4.1.2 Nest energy slope

Keeping the swarm intact is key to survival, but the prime objective is to keep the NE above zero. A model with a death rate of zero will not be able to gather resources. That is why the ability of the agents to keep the NE stable is also considered as a quality metric. Since there is a constraint on the duration of the simulations, the overall slope of the energy curve is used. While a model can still have a certain amount of energy left at the end of a simulation, a descending slope means that the swarm will starve at some later point. A rising slope is also not preferred, as it means that too many agents are risking their lives; this will however be reflected by the death rate metric.

4.1.3 Projected time of survival (PTS)

Both performance metrics have significantly different scales, which makes them difficult to compare. Therefore, for each metric the projected survival time is calculated by extrapolating the data based on the previously calculated slopes. The time it takes for such an extrapolated curve to reach zero is the projected survival time. Since the simulation ends when either the energy level reaches zero or all agents die out, the actual survival time will be constrained by one of them. The minimum of the two PTS values is thus used as the final metric.
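A sketch of this computation, assuming the two slopes have already been estimated (for instance with the least-squares helper shown earlier); the names are illustrative.

```python
def projected_time_of_survival(energy_now, energy_slope,
                               living_agents, agents_slope):
    """Extrapolate both curves to zero and keep the earlier crossing.

    Both slopes are per tick; a non-negative slope means that curve
    is never projected to reach zero.
    """
    def ticks_to_zero(level, slope):
        return level / -slope if slope < 0 else float("inf")

    return min(ticks_to_zero(energy_now, energy_slope),
               ticks_to_zero(living_agents, agents_slope))

# Example: energy falls 0.5/tick from 4000; agents fall 0.002/tick from 40.
print(projected_time_of_survival(4000.0, -0.5, 40, -0.002))  # 8000.0 ticks
```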

4.2 Environment parameters

The parameters set for the environment were empirically determined in order to produce interesting results.

• Ad: The amplitude that gave the clearest results was 0.0005. Lower amplitudes made the risk irrelevant and higher ones prevented every model from succeeding. This amplitude range was a sweet spot for observing interesting behaviour.

• Pd: The period chosen for the dying probability was 9000 ticks. Shorter periods would be negligible compared to the time it takes to find an item, making it seem like there were no safe periods. Longer periods were not an option since simulations take a long time to perform and enough periods need to be simulated in order to properly analyze the behaviour.


• Amount of resources: The amount of resources is 200. This value made sure that finding a resource takes around 1000 ticks on average, resulting in a probability of 36% of surviving such a round trip at peak risk. This makes sure that although foraging during high risk periods is possible, it comes at a heavy cost.

4.3 Base model behaviour

The first range of experiments aims to create some insight into the behaviour of a homogeneous population of agents. Later, the performance of heterogeneous populations will be compared to these results in order to understand the influence of adding daredevils to the swarm.

To be able to see the influence of the risk mitigation, a model with S equal to zero is used as a baseline. Agents in this model only care about maintaining the NE, ignoring any risk.

4.3.1 Exploration

The influence of the selfishness is most noticeable when looking at the exploration behaviour. In figure 4.1 the percentage of exploring agents at each tick is plotted for a few values of S. Clear fluctuations are visible for models with a higher value of S. When the risk of dying is higher, exploration times become shorter, resulting in fewer exploring agents on average. While the number of explorers is lower during high risk periods, it peaks higher during low risk periods for high S values. This is because fewer resources are found when exploration times are short, resulting in a descending energy curve. The exploration probability increases when the energy curve descends, leading to an increased number of exploring agents once the lower risk allows longer explorations. The less selfish models achieve a more constant energy intake, resulting in a more constant exploration probability.


Figure 4.1: Number of exploring agents for different S values

4.3.2 Energy level

In figure 4.2 the curves of the nest energy for different values of selfishness are plotted. Each model, except where S approaches one, manages to maintain the energy level above zero. The higher the selfishness, the larger the fluctuations in the energy level. The peaks correspond to the minimums of the dying probability. This makes sense, because the agents will be exploring for a longer time during low risk periods, increasing the probability of finding a resource. Despite the fluctuations in the energy curve, most models seem to keep the energy level in a steady range. The safer models barely (or never) explore and most of their trips are unsuccessful, so they cannot keep up with the constantly dropping energy level. When S is 0.9, it can be seen that the swarm manages to survive one period. The energy level drops low during the second period, so it becomes critical to find resources to recover. But as the probability of finding enough resources during that short period is low, the swarm will starve for some random seeds.


4.3.3 Death rate

In figure 4.4 the amount of lost agents at each tick can be observed for different selfishness levels. The models with a higher S value manage to temper this slope a bit. However, the difference in death rate seems small: while there are fewer agents outside during high risk periods, there are more agents exploring on average to keep up with the energy level. The slope is steepest on the right side of each peak in the probability of dying. As more resources are gathered during the minimums, the probability of exploring is lower on the ascending side. Since exploration is down during the peak, the NE lowers, and that results in a higher exploration rate on the descending slope. Since more agents expose themselves to the medium risk moments, the effect of avoiding the high risk is voided. This effect can be seen in figure 4.3, where the PTS is plotted for different values of S. It is noticeable that safer models do not have a significantly higher PTS. Models with an S value close to one manage to keep the slope of the casualties really low, but as stated above, this extra safety results in a too low energy intake, leading to an early end of the simulation. This is also reflected by their PTS. There seems to be a critical S value at 0.9 where the time at which the energy runs out becomes lower than the survival time.


Figure 4.4: Number of lost agents for different values of S

4.4 Influence of daredevils

Now that the behaviour of the models with a homogeneous population is known, the experiments can be repeated with daredevils among the population. It was clear that safer models had problems maintaining the energy level, making their lower death rate irrelevant. The behaviour of the swarms is analysed for the previously used S values with different percentages of daredevils in the population, to try to mitigate this problem. At first the mdd used is zero, meaning that the daredevils will not care about the risks at all. This makes the influence of the daredevils less subtle, giving a clearer initial impression. Afterwards this value can be optimized by looking at the PTS.

4.4.1 Exploration

In figure 4.5 the overall exploration behaviour is shown. It is noticeable that as the percentage of daredevils rises, the fluctuations that were characteristic for high values of S become less distinct. Both the minimums and maximums of the curves are less pronounced. The higher minimums occur because the daredevils spend more time outside the nest: where regular agents tend to return home when the risk increases, daredevils persist in their search, resulting in more agents out of the nest on average. The lower maximums are caused by the higher success rate of exploring agents. As the daredevils explore longer, their chances of bringing home resources increase. This in turn results in a lower increase of the exploration probability.

4.4.2 Energy

The NE for different models is plotted in figure 4.6. Again it can be observed for higher values of S that the large fluctuations disappear for most models as the percentage of daredevils increases. As daredevils explore until they find something, there is a more consistent energy intake. For lower values of S, there are no noticeable differences in the energy curves, because the base models already managed well enough to maintain the energy. The most obvious difference can be seen in the case where S approaches one. Where these models previously could not compete at all with the other models, they can now keep up when the amount of daredevils is high enough. When S becomes 0.9, the energy fluctuates heavily but still stays consistently above zero. The heavy fluctuations are caused by the exploration probability, which becomes more sensitive as the energy gets low. The fast increasing probability pushes all agents to explore when the risk of dying allows it. While the base model did not succeed in gathering enough energy during this low risk window of time, the added daredevils, which are not bound by the risk outside, manage to increase the energy intake just enough. The probability then lowers again as the energy recovers. This pattern repeats, causing the fluctuations. As the selfishness increases further, a small amount of daredevils is not enough anymore to maintain the NE on their own.


Figure 4.5: Number of exploring agents for different values of S and % daredevils


4.4.3 Death rate

There are more exploring agents now for each value of S, so the death rate of each model is a bit higher, resulting in a lower PTS. This can be observed in figure 4.7 and makes sense, as the daredevils explore more recklessly than the homogeneous models: agents that keep exploring during high-risk periods have a higher probability of dying. Things get interesting when the amount of daredevils is 10%: the best performing model according to this metric is one that was not viable without daredevils. There is a significant increase when S becomes 0.9. As this model now manages to maintain the energy level, its lower death rate becomes relevant and the PTS rises. In figure 4.8 it can be seen that this model almost manages to straighten out the curve of casualties. By having only a small portion of the group be daredevils, the swarm can use their better foraging abilities while still limiting the number of agents that put themselves at risk.

Figure 4.7: The projected time of survival for different percentages of daredevils

As the amount of daredevils added to a safe model sets an upper limit on the number of explorers during higher-risk moments, it controls the death rate. Fewer daredevils result in better survival capabilities, but too few mean the swarm starves too early to benefit from the increased survival time. This can be seen in figure 4.9: the PTS is at its maximum at the critical amount of daredevils that just prevents the model from starving, and adding daredevils beyond that point lowers the performance of the models.
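
One plausible reading of the PTS metric, consistent with the trade-off described here, is a linear extrapolation that takes whichever projected end point comes first: the nest starving or the swarm losing its last agent. The exact definition is given earlier in the thesis; the function below is only a sketch under that assumption.

    def projected_time_of_survival(ne_history, deaths_per_tick, swarm_size):
        # Extrapolate linearly from the observed average energy drain and
        # death rate (assumed reading of the PTS, not the thesis's literal
        # definition).
        ticks = len(ne_history)
        drain = (ne_history[0] - ne_history[-1]) / ticks  # net NE loss/tick
        losses = sum(deaths_per_tick) / ticks             # agents lost/tick
        t_starve = ne_history[-1] / drain if drain > 0 else float("inf")
        remaining = swarm_size - sum(deaths_per_tick)
        t_extinct = remaining / losses if losses > 0 else float("inf")
        return ticks + min(t_starve, t_extinct)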


Figure 4.9: The projected survival time as a function of the amount of daredevils

4.4.4 Varying m_dd

The addition of daredevils to the otherwise homogeneous population resulted in a huge performance increase in terms of energy intake for otherwise non-viable models. Because the daredevils in those experiments are reckless, the performance dropped when looking at the death rate. It would seem plausible that the performance of the model could be improved by making the daredevils more risk-averse. Figure 4.10, where the PTS is plotted as a function of m_dd, shows that this is not the case: as long as the daredevils' S value stays below the critical point discovered previously, the energy intake is consistent enough to ensure that the NE stays above zero.
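
The experiment behind figure 4.10 can be sketched as a simple parameter sweep. Here run_simulation is a hypothetical stand-in for the full simulation loop (it returns a dummy value), and make_swarm is the helper sketched in section 4.4; neither is the thesis code.

    def run_simulation(swarm, ticks=10000):
        # Placeholder for the full simulation loop; assumed to return the
        # PTS of a single run. Dummy value only.
        return 0.0

    # Sweep the daredevils' own risk attitude m_dd at a fixed 10% daredevil
    # fraction; per section 4.4.4 the PTS should stay roughly flat as long
    # as m_dd remains below the critical value of about 0.9.
    for m_dd in (0.0, 0.2, 0.4, 0.6, 0.8):
        swarm = make_swarm(n=100, daredevil_fraction=0.1, s=0.9, m_dd=m_dd)
        print(f"m_dd={m_dd}: PTS={run_simulation(swarm)}")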


Chapter 5

Conclusions

5.1 Conclusions

The experiments made clear that a robot swarm consisting of homogeneous agents is not sufficient to deal with the risky environment. While the most selfish models manage to greatly increase their survival time, they are not capable of maintaining the nest energy. It was shown that adding daredevils to those swarms is beneficial to their performance in the experiments. While adding more risky individuals may seem counterproductive, they in fact manage to save otherwise non-viable models by increasing their energy intake. As daredevils tend to be more effective at accomplishing the foraging task, the rest of the swarm is less pressured to go out foraging. This allows the regular individuals in a swarm to be more selfish. The majority of the swarm then only goes out exploring during the small time window in which the probability of dying is almost negligible, and these explorations provide the bulk of the energy intake. Since this safe window might be too short to consistently bring back resources to the nest, a safe base model has a high chance of starving. A small amount of daredevils provides a small but consistent income of resources that prevents the NE from reaching zero before the more selfish agents get their chance to explore. The most effective approach for a robot swarm dealing with a risky environment is therefore to limit the foraging to safe periods and to have a small amount of daredevils that gives the energy intake a little boost to prevent starvation. Keeping the group of risky foragers small puts a cap on the number of casualties during higher-risk periods. The amount of daredevils directly determines the trade-off between risk and reward: models with more daredevils perform better in the foraging task, but the swarm does not survive very long. The selfishness of the daredevils does not seem to matter for the PTS as long as it stays under the critical value, because all viable models managed to maintain the energy level without getting close to zero. What the best performing model needed was just a small but consistent boost to prevent the minimums in the energy level from reaching zero.

5.2 Future work

As these experiments only considered a uniform resource and risk distribution, an obvious extension is one where resources and risks are distributed in space. Agents would then have to choose between different areas, each with its own associated risks and rewards. The influence of daredevils on the distribution of explorers over the different sites would be interesting to study.

Foraging is not the only task that can be studied in a risky environment; more real-life applications can be tested as well. As there is a plethora of already studied behaviours, each of those could be adapted to a scenario where there is a danger of losing agents.


Bibliography

[1] Levent Bayındır. A review of swarm robotics tasks. Neurocomputing, 2014.

[2] S. Kazadi. Swarm engineering. Ph.D. thesis, California Institute of Technology, Pasadena, CA, USA, 2000.

[3] Manuele Brambilla, Eliseo Ferrante, Mauro Birattari, Marco Dorigo. Swarm robotics: a review from the swarm engineering perspective. Springer Science+Business Media New York, 2013.

[4] E. Şahin. Swarm robotics: from sources of inspiration to domains of application. Springer, Berlin, 2005.

[5] T.H. Labella, M. Dorigo, J.L. Deneubourg. Division of labor in a group of robots inspired by ants' foraging behavior. ACM Trans. Auton. Adapt. Syst. 1, 2006.

[6] M.J. Krieger, J.B. Billeter. The call of duty: self-organised task allocation in a population of up to twelve mobile robots. Robot. Auton. Syst. 30, 2000.

[7] H. Hamann, H. Wörn. An analytical and spatial model of foraging in a swarm of robots. Springer, Berlin, 2007.

[8] V. Braitenberg. Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press, 1984.

[9] R. Fetecau, J. Meskas. A nonlocal kinetic model for predator–prey interactions. Swarm Intell. 7, 2013.

[10] R.T. Vaughan, K. Støy, G.S. Sukhatme, M.J. Matarić. Whistling in the dark: cooperative trail following in uncertain localization space. Proceedings of …

[11] N.R. Hoff, A. Sagoff, R.J. Wood, R. Nagpal. Two foraging algorithms for robot swarms using only local communication. IEEE International Conference on Robotics and Biomimetics, 2010.

[12] A. Reina, T. Bose, V. Trianni, J.A.R. Marshall. Effects of spatiality on value-sensitive decisions made by robot swarms.
