Optimizing Artificial Evolution Manager using Celery


Master thesis


August 5, 2020

Author: S. F. Ferwerda

Daily supervisor: M. De Carlo & D. Zeeuwe

Assessor: M. De Carlo


Abstract

This thesis addresses the performance issues that the virtual evolution software Revolve experiences when running offline artificial evolution with Neural Networks and Evolutionary Algorithms. We increase performance by integrating the Celery library and using it to implement Distributed Task Queues in Revolve. The scope of this thesis is the implementation of Celery in Revolve and the resulting speedup, such that the most significant bottleneck of Revolve (the robot evaluation stage) is removed or minimized. Before using Celery, we analyze Revolve for speedup possibilities using Amdahl’s Law. The implementation of Celery results in a speedup ranging from 7.4 times when running a single experiment to 2.3 times when running multiple experiments.

Furthermore, this work investigates how different fitness functions affect the resulting morphologies. We compare a displacement-based fitness function with a rotation-based fitness function, so that the experiments pursue different objectives. The research goal is to study the difference in morphology and behavior that results from changing the objective. We also compare the usage of modular components in these morphologies. Both experiments result in snake-like robots, which consist of only one long limb. However, there are small differences in morphology: the rotation-based robots are smaller and use fewer active hinges in their body structure.


Acknowledgements

I want to thank Guszti Eiben for his enthusiasm for the subject and that he introduced me to the Computational Intelligence team. Special thanks to Matteo De Carlo and Daan Zeeuwe for their everlasting support during my project as well as giving feedback on my writing. I would also like to thank my girlfriend Claudia, for always being open for discussion and bringing the motivation to always keep going.


Chapter 1 Introduction

Algorithms based on evolutionary processes, called Evolutionary Algorithms (EAs), are used in a wide range of optimization problems, especially in problems where the desired solutions are unknown[6]. Evolutionary Algorithms can help in cases where other artificial intelligence techniques cannot. An EA is a population-based algorithm that evolves towards better solutions using the concepts of selection and survival of the fittest. Evolutionary Algorithms are also used in robotics, creating the field of Evolutionary Robotics. In Evolutionary Robotics, robots learn a specific motor skill like walking, rotating, or climbing using Evolutionary Algorithms and neural networks. The EA is useful because it can help robots work in unknown environments, while the neural network represents the brain of the robot. EAs are ideal for robots learning how to perform necessary motor skills without prior knowledge of the terrain[21].

Revolve is a program that evolves modular robots using Evolutionary Algorithms and Neural Networks. The program is created by the Computational Intelligence (CI) group at the Vrije Universiteit (VU). Revolve manages the evolution of modular robots, where robots are simulated and evaluated based on a given objective, such as walking or rotating. This type of hybrid experimentation allows comparing simulated and real-world performance[14]. However, simulating generations of robots brings a high computational cost, which creates one of the issues that Revolve is facing: a lack of performance. One experiment can easily take a few days, making the gathering of empirical evidence a time-consuming task. Therefore, this research aims to increase the performance of Revolve. Revolve is broken down and analyzed so that we can find the performance bottleneck. Once the computationally costly stages are known, we can estimate the ideal speedup with Amdahl’s Law[1]. This law gives a bound on the achievable speedup for Revolve.

To increase the performance of Revolve, a library called Celery is introduced. This library supports the creation of parallel workers that communicate with various Distributed Task Queues[24]. Queues are essentially lists of commands or tasks that require execution. Tasks can be queued (added) or dequeued (removed and executed) sequentially. Furthermore, Distributed Task Queues allow the execution of tasks outside of the primary running process, enabling parallel execution. The processes that connect to this task queue are called workers.


Workers claim a task from the queue, execute it, and send back a result if necessary. Celery makes it easy to implement a Distributed Task Queue in a complex system with workers that effectively handle these tasks. Due to the simplicity and effectiveness of Celery, Celery is used by a large number of companies like Instagram, Udemy, Mozilla, and Trivago. After implementing Celery, the parallelized version of Revolve will be analyzed and compared against the baseline setup. We will compare the observed speedup to the theoretical speedup derived from Amdahl’s Law.

There are a lot of different animals on earth, and the morphology of those animals varies even within the same family tree. The family tree of deer includes the moose and the white-tailed deer. Anyone can distinguish these two deer by the differences in body size, the shape of the antlers, and the hooves. The survival of both deer depends on different qualities of their bodies. A white-tailed deer is small and is prey for many predators like wolves, coyotes, bears, and humans. Its survival strategy is to be quick and agile and to outmanoeuvre the predator. Thus, a white-tailed deer that is quicker and more agile than others is more likely to survive within the herd. The moose is the biggest deer on earth and can weigh over 460 kilograms, which is eight times heavier than a white-tailed deer. The moose still has some natural predators, such as packs of wolves, bears, or humans, but is much less likely to be attacked than a white-tailed deer. One reason is that a moose can defend itself effectively with its body weight. Therefore, a moose does not rely on speed or agility. It survives best if it can displace itself to new feeding areas while defending itself.

If we translate this example to Evolutionary Robotics, the survival of these two animals depends on two different performance functions. A performance function represents the goals that need to be achieved, like rotating, undirected locomotion, or directed locomotion[21]. The white-tailed deer evolves using a performance function that measures the capacity for movement and rotation, while a moose evolves using a displacement function. This example motivates the question of how robot morphology varies when two robot groups are evolved with different performance functions. The research goal is to study the differences in evolved morphology and behavior that result from using different performance functions. To study these differences, we will use the displacement performance function and introduce a rotational performance function, such that the main objective of a robot is to rotate without (much) movement. The Computational Intelligence group noticed that modular robots evolved using the displacement fitness function result in snake-like robots. Such a robot has only one long limb connected to its head, and the limb does not have any branches, making it look like an actual snake. The hypothesis is that changing the performance function allows other morphologies to evolve.

Chapter 2 will explain the vital building blocks of Evolutionary Algorithms, Evolutionary Robotics, Revolve, and Amdahl’s Law. In chapter 3, we will introduce the Celery library, which will increase the performance of Revolve. This chapter will also elaborate on the rotational performance function used in the experiments. The achieved speedup and the observed morphological differences are presented in chapter 4. We will discuss the results immediately, as they are essential for the derivations that follow. Finally, chapter 5 will present the conclusions drawn from the results, combined with the possibilities for future work.


Chapter 2 Background

Parallelizing a sequential program requires a lot of knowledge of the program’s structure, which can make parallelization a complex task. To grasp its complexity, Maurice Herlihy & Nir Shavit propose a clear hypothetical situation in their book[13, p.13]. The authors introduce a scenario in which a house with five rooms requires painting. If we hire five workers and the rooms are of equal size, then the work will be done efficiently in parallel, resulting in a five-fold speedup. However, when the rooms are of different sizes, the speedup depends on the size of the largest room. Furthermore, if everyone paints at a different speed, then the speedup will depend on the slowest member of the group. The combination of differing computational efficiency and task size thus complicates the conversion of a sequential program into a parallel one.

The proposed scenario shows us that many components influence the efficiency of parallelization. However, a more common problem in the parallelization of a sequential program is that some programs may perform better in sequence than in parallel, because some parts of the program require much communication when parallelized, creating communication costs that can outweigh the achievable speedup. This problem was first addressed by G. M. Amdahl in 1967[1], who stated that the possible speedup is limited by the percentage of time spent running sequential code. The equation Amdahl implied is commonly referred to as Amdahl’s Law and is used broadly by researchers and engineers who want to estimate the speedup of their programs[27]. The parallelization of software code requires a careful understanding of the program under consideration. It is important to research where bottlenecks occur and where sequential code can be converted into parallel code to benefit fully from parallel efficiency. In this case, the program under consideration is Revolve. Hence, we need to understand how Evolutionary Algorithms (EA) work, how we use them in Evolutionary Robotics, and how these algorithms are part of Revolve.

This chapter introduces the technical details of Amdahl’s Law in section 2.1, where we will discuss the impact of sequential code. Section 2.2 will introduce Evolutionary Computing and especially its applicability to Robotics. The combination of Evolutionary Computing and Robotics is called Evolutionary Robotics. The chapter concludes with section 2.3, in which the program under consideration, Revolve¹, is examined.


2.1 Amdahl’s Law

In a perfect world, we would parallelize every part of the software and observe a (hopefully) linear speedup. However, some computations or stages in software incur more computation cost in parallel than in sequence. These stages can contain dependencies, such that distributing the stage only requires more communication between different processors. These communication costs make some parts of the software better off in sequence. For an experimental program, this can be the setup of parameters or the collection of data. These stages remain sequential (even in the parallel program) and therefore impact the total computation time. Amdahl’s Law computes the theoretical speedup with a formula, which we introduce in this section.

Converting a sequential program to a parallel program pays off only when the parallelizable parts take up a large share of the computation time. The share of parallelizable code dictates a theoretical bound on the speedup. Amdahl stated that the speedup of any complex task is limited by the percentage of sequential (or parallel) run-time. Assuming that $T$ is the total computation time, we can divide $T$ into two parts, $T_s$ and $T_p$, which are the time spent in sequential and parallel code respectively. Using this notation and letting $n$ be the number of processors, we can write the maximum speedup as:

$$S_n = \frac{T}{T_s + \frac{T_p}{n}} \tag{2.1}$$

This form is one of many possible forms of Amdahl’s Law[27], and it is useful when the computation times $T_s$ and $T$ are known. Equation 2.1 does not clearly show the relationship between the speedup and the proportion of sequential code, because the equation depends on three variables: $T$, $T_s$ and $T_p$. However, if we substitute $s = T_s/T$ as the fraction of sequential code, then $1 - s$ is the fraction of parallel code. This substitution yields a more commonly used form of Amdahl’s Law:

$$S_n = \frac{T}{T_s + \frac{T_p}{n}} = \frac{T}{T \cdot s + \frac{T \cdot (1 - s)}{n}} = \frac{1}{s + \frac{1 - s}{n}} \tag{2.2}$$

However, often p is used as the portion of the parallelized code. This results in a different, but mathematically equivalent formula:

$$S_n = \frac{1}{1 - p + \frac{p}{n}} \tag{2.3}$$

These formulas make it easier to visualize the speedup as a function of the percentage of parallel code. If we take the limit $n \to \infty$ of Equation 2.2, the result still depends on the percentage of sequential code $s$: $S_\infty = 1/s$. This function is shown in Figure 2.1.

Figure 2.1: Theoretical speedup of executed code based on the percentage of sequential code.

The limit of Amdahl’s Law implies that even when our program consists of 60% parallel code, we will not be able to exceed a speedup of 2.5 times. However, many developers have achieved higher speedups in large-scale systems and wrongfully doubted Amdahl’s Law[20], which shows that determining the portion of sequential code is a delicate task. For large-scale systems, it is better to use the extension of Amdahl’s Law created by Gustafson[11].
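As a minimal sketch of Equation 2.2 and its limit, the following Python snippet computes the theoretical speedup for a given sequential fraction. The function names `amdahl_speedup` and `amdahl_limit` are illustrative and not part of Revolve; the value s = 0.4 corresponds to the 60% parallel code mentioned above.

```python
def amdahl_speedup(s: float, n: int) -> float:
    """Theoretical speedup for sequential fraction s on n processors (Equation 2.2)."""
    if not 0.0 < s <= 1.0:
        raise ValueError("s must lie in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (s + (1.0 - s) / n)


def amdahl_limit(s: float) -> float:
    """Bound on the speedup when n grows without limit: 1 / s."""
    if not 0.0 < s <= 1.0:
        raise ValueError("s must lie in (0, 1]")
    return 1.0 / s


s = 0.4  # 40% sequential code, i.e. 60% parallel code
for n in (1, 2, 4, 8, 64):
    print(f"n = {n:2d}: speedup = {amdahl_speedup(s, n):.3f}")
print(f"limit:  speedup = {amdahl_limit(s):.3f}")
```

Adding processors moves the speedup towards the bound but never past it, which is why the sequential fraction has to be estimated carefully before investing in parallelization.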

If the structure of the software does not lend itself to parallelization, then parallelization is sometimes not worth it, especially if it occupies processors that we could use for other programs on the same machine. Knowing the bound on the speedup therefore helps to determine the effectiveness of the parallelization. The parallelization efficiency is calculated using the following formulas:

$$E_{processor} = E_p = \frac{S_o}{n} \quad \text{or} \quad E_{speedup} = E_s = \frac{S_o}{S_n} \tag{2.4}$$

Here $S_o$ denotes the observed speedup. The equation for $E_p$ calculates efficiency based on the number of processors used, while $E_s$ calculates the efficiency of the program based on the expected speedup. $E_s$ should be maximized if there are no limitations on the number of processors, where the best $n$ relates to the maximum $S_o$. With unused processors in the machine, the main goal is achieving the highest speedup possible, no matter the value of $E_p$. However, if the processors can be used for something else while our program runs, $E_p$ should be maximized. Maximizing $E_p$ makes sure that we use the optimal number of processors for the experiment, such that the remaining processors on the machine stay available for other purposes.
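The two efficiency measures of Equation 2.4 can be sketched as follows, assuming that S_o denotes the observed speedup and S_n the theoretical speedup from Amdahl’s Law. The measurements in `observed` and the sequential fraction are hypothetical numbers chosen for illustration only; they are not results from Revolve.

```python
def processor_efficiency(observed_speedup: float, n: int) -> float:
    """E_p = S_o / n: observed speedup per processor used."""
    return observed_speedup / n


def speedup_efficiency(observed_speedup: float, theoretical_speedup: float) -> float:
    """E_s = S_o / S_n: fraction of the theoretical speedup that was reached."""
    return observed_speedup / theoretical_speedup


def amdahl_speedup(s: float, n: int) -> float:
    """Theoretical speedup S_n for sequential fraction s on n processors."""
    return 1.0 / (s + (1.0 - s) / n)


SEQUENTIAL_FRACTION = 0.1            # hypothetical value of s
observed = {2: 1.8, 4: 3.0, 8: 4.0}  # hypothetical S_o per number of processors

for n, s_o in observed.items():
    e_p = processor_efficiency(s_o, n)
    e_s = speedup_efficiency(s_o, amdahl_speedup(SEQUENTIAL_FRACTION, n))
    print(f"n = {n}: E_p = {e_p:.2f}, E_s = {e_s:.2f}")

# Shared machine: pick the n with the highest E_p. Idle machine: pick the highest S_o.
best_shared = max(observed, key=lambda n: processor_efficiency(observed[n], n))
best_idle = max(observed, key=lambda n: observed[n])
print(f"best n on a shared machine: {best_shared}, on an idle machine: {best_idle}")
```

With these made-up numbers the two criteria disagree: two processors give the best return per processor, while eight processors give the highest speedup.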

Even though Amdahl’s Law helps programmers to obtain a theoretical upper bound for the speedup, in practice the observed result might not even come close to it. Therefore, many researchers are working on extensions of Amdahl’s Law to make it more accurate[11, 25, 27]. These extensions have their limitations and can be biased towards a certain type of computation. A commonly used method to obtain a more accurate estimate from Amdahl’s Law is therefore to add a penalty to the denominator, which can represent communication, synchronization, or other program-dependent costs[27]. If parallelization demands inter-process communication², then Equation 2.1 can be rewritten as:

$$S_n = \frac{T}{T_s + \frac{T_p}{n} + T_c} \tag{2.5}$$

where $T_c$ stands for the extra communication costs: time spent sending or receiving messages³.
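As a worked illustration of the communication penalty, take the purely hypothetical values T = 100 s, T_s = 10 s, T_p = 90 s, n = 9 and T_c = 5 s (these are not measurements from Revolve) and compare the estimate with and without the penalty term:

```latex
\begin{align*}
S_9^{\text{no penalty}} &= \frac{T}{T_s + \frac{T_p}{n}}
  = \frac{100}{10 + \frac{90}{9}} = \frac{100}{20} = 5, \\
S_9^{\text{with penalty}} &= \frac{T}{T_s + \frac{T_p}{n} + T_c}
  = \frac{100}{10 + 10 + 5} = \frac{100}{25} = 4.
\end{align*}
```

Under these assumptions, a communication cost of only 5% of the sequential run-time lowers the estimated speedup from 5 to 4.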

2.2 Evolutionary Computing

Alan Turing first introduced the concept of Evolutionary Computing in 1950[26]. Even though it has fallen a bit out of favor compared to reinforcement learning and deep learning, Evolutionary Computing is still a growing field of research. Evolutionary Algorithms (EAs) are algorithms inspired by biological evolution. Hence, they are built like evolutionary processes, which means that EAs use recombination, mutation, and reproduction to solve complex problems that have large search spaces, especially search spaces that are too large for a brute-force algorithm to check every possible solution. An EA evolves towards high-quality solutions by using a selection process while maintaining diversity in the population. Nowadays, EAs are used in all kinds of research fields, one of them being the evolution of robots in unknown environments[21]. Within Evolutionary Robotics, Evolutionary Algorithms improve the controllers and morphologies of autonomous robots.

2.2.1 Evolutionary Algorithms

An Evolutionary Algorithm is a heuristic optimization algorithm that uses a generic population of candidate solutions to improve solution quality. These algorithms are inspired by biological processes like survival of the fittest. Therefore, the components of these algorithms represent all kinds of biological processes[18][19][22]. There are many varieties of EAs, but most of them share the same underlying idea[6].

First, every EA needs a population of candidate solutions. In this population, every candidate solution is an individual with a genetic code. Second, the quality of an individual solution, that is, the fitness of the individual, depends on a given fitness function. This fitness function evaluates the performance of an individual according to the task and environment. Third, every EA needs selection; only the fittest individuals may procreate and survive. Finally, all EA solutions evolve over several generations, creating new offspring based on recombination and mutation. Recombination and mutation create diversity throughout the evolutionary process by changing the genetic code of the offspring. Recombination (crossover) creates the offspring by combining the genetic code of its parents, while mutation is a random change that slightly alters the genetic material of an individual.

² Most of the time, data needs to be stored and read by different processors during the execution of a program.

This underlying idea of every EA can be framed by a general procedure, as shown in Figure 2.2. This process is based on the scheme given in the book Introduction to Evolutionary Computing[6, p.27]. In this diagram, termination should happen when the desired solution is found, or a generational threshold is reached. Another threshold which is independent of the stages of the algorithm is the computation time, but this is rarely used.

Figure 2.2: The workflow of an Evolutionary Algorithm starts with the initialization of a population. After the population is evaluated, parents are selected. The parents create offspring using the specified recombination and mutation techniques. The offspring are evaluated, and a new population is selected based on the quality of the individuals.
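The general procedure can be sketched in a few lines of Python. This is a toy example under stated assumptions and not Revolve’s algorithm: the genome is a bit string, the fitness function simply counts ones (the OneMax problem), and the operators (tournament selection, one-point crossover, bit-flip mutation, and survivor selection over parents plus offspring) as well as all constants are illustrative choices.

```python
import random

GENOME_LENGTH = 20
POPULATION_SIZE = 10
OFFSPRING_SIZE = 10
GENERATIONS = 30                      # generational threshold
MUTATION_RATE = 1.0 / GENOME_LENGTH


def fitness(genome):
    """Evaluation: count the ones in the bit string (OneMax)."""
    return sum(genome)


def select_parent(population, rng, k=2):
    """Parent selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(mother, father, rng):
    """Recombination: one-point crossover of two parents."""
    point = rng.randint(1, len(mother) - 1)
    return mother[:point] + father[point:]


def mutate(genome, rng):
    """Mutation: flip each bit with a small probability."""
    return [1 - gene if rng.random() < MUTATION_RATE else gene for gene in genome]


def evolve(seed=0):
    rng = random.Random(seed)
    # Initialization of a random population.
    population = [[rng.randint(0, 1) for _ in range(GENOME_LENGTH)]
                  for _ in range(POPULATION_SIZE)]
    history = [max(fitness(individual) for individual in population)]
    for _ in range(GENERATIONS):
        offspring = [
            mutate(crossover(select_parent(population, rng),
                             select_parent(population, rng), rng), rng)
            for _ in range(OFFSPRING_SIZE)
        ]
        # Survivor selection: keep the fittest of parents and offspring.
        population = sorted(population + offspring, key=fitness,
                            reverse=True)[:POPULATION_SIZE]
        history.append(fitness(population[0]))
        if history[-1] == GENOME_LENGTH:  # desired solution found
            break
    return population[0], history


best, history = evolve(seed=42)
print("best fitness per generation:", history)
```

Both termination conditions from the text appear here: the loop stops when the desired solution is found or when the generational threshold is reached.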

2.2.2 Evolutionary Robotics

In Evolutionary Robotics, autonomous robots use Evolutionary Algorithms to learn a specific skill like walking, rotating, or climbing. EAs help robots learn in unknown environments, and they can deal with the size of the search space more appropriately. The general form of an evolutionary robotics algorithm requires initialization followed by repeated evaluation, selection, and variation. The program terminates when the desired fitness (performance) is reached or a given number of generations has been run.

In most Evolutionary Algorithms, the evaluation of an individual is simple: the fitness defines the quality of the individual’s phenotype (solution representation). The difference between EA and ER is that robots need to be simulated (or tested in a real environment) before evaluation. The robot is evaluated based on the behavioral metrics from its test run. Some of these metrics can be: total distance travelled, distance from start- to end-point⁴, simulated time in seconds, and the rotations that the robot made. The performance measure dictates which behavioral metrics we want to use for evaluating robot performance. For example, if a performance function should encourage directed locomotion, the function requires body angles and the distance from the robot to a goal[21].

The evaluation of the robots in Evolutionary Robotics is either physical or virtual. We can distinguish two groups: virtual evolution and embodied evolution. During virtual evolution, the EA is centralized, and the evaluation takes place inside a simulator. Embodied evolution is the evolution of robots without a centralized EA, so the interaction among the robots drives evolution.

In virtual evolution, the use of simulators may cause problems when we transfer the results to the real world. This problem is known as the reality gap[17]. Simulated robots might evolve to exploit the simulator, so that their simulated performance surpasses what they can achieve in real life. When such robots are built and tested in a laboratory setting, they fail to replicate their simulated behavior. The effects of the reality gap can be mitigated by adding noise to the sensors and actuators[15] and by using minimal simulations[16]. In this way, we can minimize overfitting to the physics engine of the simulator, such that simulators remain useful for training evolving robots.

Evolutionary Robotics is still facing a lot of issues[23]. While embodied evolution does not suffer from a reality gap, it is very time-consuming and might require human intervention. However, whenever the issues are resolved or worked around, evolutionary robotics promises to be a powerful tool to create self-sustainable and self-learning robots for unknown environments[3, 23].

An example of embodied evolution for controllers and morphologies has been given in the Triangle of Life[7] concept. The Triangle of Life describes the repeating cycle of robots from birth to mature life and giving birth. This concept remains closely related to a normal EA cycle with the addition of an intermediary learning phase. The aim of the project is to spawn, train, and evolve robots without human interaction[5]. A birth clinic controls birth, where components can be 3D printed or salvaged from older robots. The birth clinic does not only give birth to new robots, but it can also act as a kill switch to prevent uncontrolled replication[9]. A robot goes into nurturing after it is born. There is a playground for the robot to learn how to walk with its new body. This step is necessary because the given brain may not fit the given morphology of the robot.

2.2.3 On- and offline Evolution

In Evolutionary Robotics, an offline architecture is an EA running on a machine. The machine hosts a population of individuals, representing the controllers and has access to the selection and mutation operators. For evaluation, the machine will send the robots to simulators to evaluate their controllers. This architecture allows execution of the whole process of evolutionary robotics inside a machine, which makes it a virtual evolution.

In online architectures, a machine can still be in control of the evolutionary operators like selection and mutation, but the robots operate and are evaluated at the same time. This means that the robots learn while running. The machine oversees them and sends new controllers to the robots to evaluate and operate on[6, p.253-256]. The main difference between offline and online evolution is that in offline evolution, the EA can wait for the evaluation of every robot, whereas in online evolution, the EA operates in real-time. Offline evolution also means that the assessment of the population happens in isolation: every individual is simulated without any other robots present[14]. Which method to use, offline or online evolution, depends on the research question.

2.3 Revolve

Revolve manages the evolution of modular robots using EA and Neural Networks, allowing the program to compare simulated and real-world performance. The use of modular robots makes sure that all robots are constructible in laboratory settings. Furthermore, Revolve uses Gazebo simulators⁵ to simulate the modular robots. The simulators are used to evaluate every individual of the population. Therefore, Revolve is a virtual evolution manager, which allows physical simulation as well.

Revolve is created by the Computational Intelligence group of the VU Amsterdam. Their work is open-source and can be downloaded from GitHub⁶. The motivation for Revolve originates from research on the Evolution of Things[8]. The goal of the Computational Intelligence group is embodied evolution of the robots’ controllers and bodies[5].

Unfortunately, the current technology for automated robot fabrication does not yet allow robot reproduction without human help. However, project ARE⁷ is working on a fabricator to resolve this issue[12]. Until such a robot fabricator is available, Revolve can be of great value for researching artificial evolution.

In subsection 2.2.3, we have discussed the difference between online and offline evolution. Even though a lot of research of the Computational Intelligence group uses offline evolution, Revolve offers the option for online evolution. Revolve is one of the programs that allows both offline and online evolution [4, 10].

2.3.1 An Overview of Revolve

The user creates a manager, which starts the experiment. The manager contains parameters such as population size, offspring size, number of generations, configurations, selection operator, mutation operator, and so on. To handle the manager and all simulation processes, Revolve uses the Python asynchronous I/O library asyncio⁸. Revolve works with asynchronous tasks, meaning that the manager starts a given number of Gazebo instances to simulate the robots, which are inserted and removed one after another. However, using more Gazebo instances does not mean that there are more parallel processes. Every robot needs to be inserted, removed, and evaluated by the same manager. More simulators can make the simulation of the robots run in parallel, but the manager still needs to insert, remove, and assess all of these robots in sequence. Therefore, simulators and robots might have to wait for other processes. This waiting time only increases when the number of Gazebo instances increases.

⁵ http://Gazebosim.org/
⁶ https://github.com/ci-group/Revolve
⁷ https://www.york.ac.uk/robot-lab/are/framework/#tab-2
⁸ https://docs.python.org/3/library/Asyncio.html
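The effect of a single manager serving several simulators can be illustrated with a small asyncio sketch. This is not Revolve’s code: the names, the timing constants, and the use of a lock to model the manager’s sequential insert and remove steps are all assumptions made for illustration.

```python
import asyncio

MANAGER_STEP_TIME = 0.001  # hypothetical cost of one insert or remove, in seconds
SIMULATION_TIME = 0.02     # hypothetical simulation time per robot, in seconds


async def evaluate(robot_id, manager_lock, simulators, stats):
    async with simulators:                    # wait for a free simulator
        async with manager_lock:              # the manager inserts one robot at a time
            await asyncio.sleep(MANAGER_STEP_TIME)
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(SIMULATION_TIME)  # simulations overlap with each other
        stats["running"] -= 1
        async with manager_lock:              # removal and assessment are sequential again
            await asyncio.sleep(MANAGER_STEP_TIME)
    return robot_id


async def run_generation(n_robots, n_simulators):
    manager_lock = asyncio.Lock()
    simulators = asyncio.Semaphore(n_simulators)
    stats = {"running": 0, "peak": 0}
    finished = await asyncio.gather(
        *(evaluate(i, manager_lock, simulators, stats) for i in range(n_robots))
    )
    return finished, stats["peak"]


finished, peak = asyncio.run(run_generation(n_robots=8, n_simulators=4))
print(f"evaluated robots: {finished}, peak concurrent simulations: {peak}")
```

In this sketch the simulations themselves overlap, but every robot passes through the same lock twice, so adding simulators increases contention on the manager rather than removing the sequential part.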

When running an offline evolution, Revolve is following a common EA scheme to evolve modular robots and simulates their phenotype in a simulator. The whole process of Revolve is visualized in the following diagram.

Figure 2.3: Structure of Revolve. The colors represent the computation time needed to pass the step. Green means below 10 seconds, orange between 10 and 60 seconds, and red are above 60 seconds.

An experiment executed by Revolve spends most of its computation time inside step 2 (evaluation) and step 3 (exportation). The evaluation step requires every robot to be evaluated separately from the others, which makes this step highly suitable for conversion to parallel code. Furthermore, the exportation also takes a lot of time, because the step is generic. Knowing this, we can use Figure 2.3 to conclude that the bottleneck of Revolve is located inside the evaluation step, but once this bottleneck is resolved, it might move to step 3.


Chapter 3 Method

In the last chapter, we located the bottleneck of Revolve: the evaluation of the robots. This step is computationally expensive because every robot needs to be evaluated in a simulator during this stage. However, with sufficient CPU and memory, this should not be a problem. Revolve evaluates robots independently in offline evolution, so an ideal system¹ would evaluate all robots in their own simulator and return the behavioral metrics to the Evolutionary Algorithm (EA). Thus, the evaluation time of a whole generation using offline evolution depends on the slowest evaluation². However, when an EA needs n individuals to be evaluated, using n simulators requires far too much CPU. A more viable option is the use of queues, which distribute robots over a few simulators. The library used to implement these queues is introduced in this chapter.

This chapter will elaborate on the use of Distributed Task Queues and especially the Python library Celery[24]. Celery is used to implement Distributed Task Queues in Revolve, to increase the performance of Revolve. Furthermore, this chapter aims to show how to implement Celery and how it works in the context of Revolve. Finally, we will use the last section of this chapter to elaborate on the experimentation between different fitness functions. We will introduce a performance measure that rewards robots based on their rotational movements.

3.1 Celery

Celery is a Python-based library that can be used for the implementation of Distributed Task Queues³. Distributed Task Queues are built upon the sequential processing of tasks. A task can be as simple as a single command, like opening a text file. However, it can also be a combination of multiple computations and actions (like the evaluation of a robot). In a task queue, tasks are ordered such that new tasks are put at the tail of the queue while the longest queued tasks are at the head. In this case, the order of the queue is First In, First Out (FIFO), making the longest queued task the next one to be executed.

¹ Assuming that we have unlimited CPUs and RAM.
² Communication costs are also included in the evaluation period.

A Distributed Task Queue (DTQ) is a task queue that operates asynchronously, outside of the current operating process. Where a regular task queue can only have one consumer processing the tasks, a Distributed Task Queue allows multiple consumers to process tasks at the same time. If there is a queued task, then other workers⁴ can acquire it and process it. Therefore, complex systems with a lot of repetitive tasks can benefit from a DTQ. The only trade-off when using these queues is between inter-process communication and performance. When tasks need information from other tasks or processors, the effectiveness of a Distributed Task Queue decreases. Fortunately, in offline evolution, every robot is evaluated independently of other robots or main processes.
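The idea of several consumers draining one FIFO queue can be sketched with Python’s standard library. This is only an in-process analogy: threads and `queue.Queue` stand in for the separate worker processes and the broker that a real Distributed Task Queue would use, and the `evaluate` function is a hypothetical placeholder for a robot evaluation.

```python
import queue
import threading


def evaluate(robot_id):
    """Hypothetical placeholder for an expensive, independent robot evaluation."""
    return robot_id ** 2


def worker(tasks, results):
    """Consumer: repeatedly claim the task at the head of the queue and execute it."""
    while True:
        robot_id = tasks.get()
        if robot_id is None:   # sentinel: no more work for this worker
            break
        results.put((robot_id, evaluate(robot_id)))


def run(n_tasks=10, n_workers=3):
    tasks = queue.Queue()      # FIFO: the longest queued task is served first
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for robot_id in range(n_tasks):
        tasks.put(robot_id)    # enqueue at the tail
    for _ in threads:
        tasks.put(None)        # one sentinel per worker
    for thread in threads:
        thread.join()
    return dict(results.get() for _ in range(n_tasks))


fitness_by_robot = run()
print(fitness_by_robot)
```

Because no task depends on the result of another, the workers never have to communicate with each other, which is the property that makes offline robot evaluation a good fit for this pattern.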

Celery also manages the workers that consume tasks from the DTQ. A command in the terminal initializes the workers, and after that, they will run as a background process. From that moment, workers can execute tasks from task queues that connect to them. Even though the worker system given by Celery is Python-only, it is possible to implement similar workers in other languages. Revolve is built with Python and C++, which means that we need workers for both languages. The implementation for these two languages is included in the appendix.

3.1.1 Requirements

To be able to use Celery, the user needs a broker to act as a communication medium for Celery workers⁵. A broker is software that handles the transmission and coordination of messages. Celery recommends brokers like RabbitMQ or Redis, but other brokers are supported as well. We chose RabbitMQ because of this recommendation and because it makes it possible to combine Celery with C++. The C++ library SimpleAmqpClient⁶ can be applied in such a way that Revolve’s simulators can connect to a Celery queue to obtain robots.

Apart from configuring the communication channels, Celery can also set up an (optional) result back-end. Result back-ends enable storing results or sending results from tasks back to the main process. In our case, Revolve needs the behavioral metrics and fitness value as a result of the robot evaluation. There are two kinds of back-end to choose from: databases or result messages. A large difference between the two is that a result message can be obtained only once, while a database stores the result such that it can be read multiple times. However, Revolve does not need the historical accounts of simulations to be stored separately. We want to process the behavioral metrics and fitness value inside the EA. The EA can then evaluate the whole population and store only the necessary data (like the fitness or the blueprint of body and brain). For result messaging, there is the Remote Procedure Call (RPC) back-end, which queues the results into a client-specific queue. The queue terminates when all results are collected. The RabbitMQ broker also supports the RPC back-end, which makes RabbitMQ better suited to Revolve. Therefore, we will use RabbitMQ as both broker and result back-end.

⁴ Threads, cores, workers, or consumers. There are many names for this.

⁵ https://docs.celeryproject.org/en/stable/getting-started/introduction.html

⁶


Celery has numerous configuration options, some of which are required, that influence its performance. The settings that we use for Revolve may not be applicable to other programs, but they can give insight into what configuration to choose and why. The Celery configurations used for Revolve are located in chapter 5.1.2.

3.1.2 Implementation

In subsection 2.3.1, we have discussed the five different stages of Revolve.

1. Revolve Initiation
2. Robot Evaluation
3. Data Exportation
4. Generational Management
5. Simulator Termination

We have located the bottlenecks of Revolve in the robot evaluation and data exportation stage. Therefore, Celery workers should alleviate the robot evaluation stage. If we want a Celery worker to be able to evaluate a robot, it requires three components: an analyzer, a simulator and the performance function. Once a worker has these three components, it can process robots independently. The manager starts Celery workers, and after initialization, the manager sends the information about the analyzer, simulator, and performance function to the workers.

The analyzer is a simulator instance that scans for morphological malfunctions. Each worker has access to only one analyzer, while the performance simulators initiate such that they become part of a simulator pool. This architecture allows the disconnection of a simulator without premature termination of the simulation. Whenever a simulator disconnects, the worker will notice this and a restart protocol will initiate.

Firstly, the workers obtain their tasks (robot assignments) from the EA using a Celery DTQ. This queue is unique for each experiment, making parallel experiments possible. Secondly, a worker uses its worker-bound analyzer to check the robot for malfunctions before putting the robot in a second, separate queue. This queue connects the workers to the simulators; every simulator connects to this one queue. Simulators are continually checking for new robots to appear in this queue. Thirdly, a simulator receives the robot and simulates it. Once the simulation is complete, the result of the simulation is sent back to the worker using the RPC back-end. Finally, the worker that queued the robot acquires the result, which is then evaluated into a fitness value and sent back to the EA. The implementation of Celery in the robot evaluation is given in chapter 5.1.2.
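The four steps above can be mimicked with a small, self-contained simulation. This is not the Revolve or Celery implementation: the standard-library `queue` and `threading` modules stand in for the DTQ, the simulator queue, and the RPC reply queue, and the functions `analyze`, `simulate`, and `fitness` as well as the robot dictionaries are hypothetical placeholders.

```python
import queue
import threading

def analyze(robot):
    # Hypothetical malfunction check: reject robots without modules.
    return len(robot["modules"]) > 0

def simulate(robot):
    # Hypothetical stand-in for a Gazebo run: returns behavioral metrics.
    return {"displacement": 0.1 * len(robot["modules"])}

def fitness(metrics):
    # Hypothetical performance function applied by the worker.
    return metrics["displacement"]

def simulator_loop(sim_queue):
    # Simulators keep polling the shared simulator queue for robots.
    while True:
        item = sim_queue.get()
        if item is None:          # sentinel: shut down
            break
        robot, reply = item
        reply.put(simulate(robot))

def worker_loop(task_queue, sim_queue, results):
    # Workers take robots from the task queue, analyze them, and hand
    # them to a simulator through the second queue.
    while True:
        robot = task_queue.get()
        if robot is None:         # sentinel: shut down
            break
        if not analyze(robot):
            results[robot["id"]] = None
            continue
        reply = queue.Queue(maxsize=1)   # per-request reply queue (RPC-like)
        sim_queue.put((robot, reply))
        results[robot["id"]] = fitness(reply.get())

def evaluate_population(robots, n_workers=2, n_simulators=2):
    task_queue, sim_queue, results = queue.Queue(), queue.Queue(), {}
    sims = [threading.Thread(target=simulator_loop, args=(sim_queue,))
            for _ in range(n_simulators)]
    workers = [threading.Thread(target=worker_loop,
                                args=(task_queue, sim_queue, results))
               for _ in range(n_workers)]
    for thread in sims + workers:
        thread.start()
    for robot in robots:          # the EA queues every robot
        task_queue.put(robot)
    for _ in workers:
        task_queue.put(None)
    for thread in workers:
        thread.join()
    for _ in sims:
        sim_queue.put(None)
    for thread in sims:
        thread.join()
    return results
```

Calling `evaluate_population` with a list of robot dictionaries returns a mapping from robot id to fitness, with `None` for robots that the analyzer rejects.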

The data exportation stage requires the attention of one processor and does not allow distribution over multiple workers. Another way of speeding it up is to do the data exportation while the manager has nothing to compute. The manager is waiting whenever the workers are evaluating the robots. It works as follows: when generation n needs to be simulated, the EA queues every robot, and once it has done so, it starts the exportation of generation n−1. When the data exportation completes, the EA retrieves the results of generation n from the workers. Relocating the exportation code in this way ensures that the data exportation runs in parallel with the robot evaluation, which increases the possible speedup.
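The ordering described above (queue generation n, export generation n−1, then collect) can be sketched as follows. The sketch uses `concurrent.futures` from the standard library in place of Celery, and `run_generation`, `evaluate`, and `export_previous` are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_generation(executor, robots, evaluate, export_previous):
    """Queue generation n, export generation n-1 meanwhile, then collect."""
    # 1. Queue every robot of generation n; the call returns immediately.
    futures = [executor.submit(evaluate, robot) for robot in robots]
    # 2. While the workers are busy, the manager exports generation n-1.
    export_previous()
    # 3. Only now block on the results of generation n.
    return [future.result() for future in futures]

if __name__ == "__main__":
    exported = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        fitnesses = run_generation(
            pool,
            robots=[1, 2, 3],
            evaluate=lambda robot: robot * 2,  # placeholder evaluation
            export_previous=lambda: exported.append("generation n-1"),
        )
    print(fitnesses, exported)
```

The design choice is that the export costs no extra wall-clock time as long as it finishes before the slowest evaluation does.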

Figure 3.1: The structure of Revolve containing Celery. The Celery workers are in control of the evaluation of robots; from analyzing and simulating the robots to evaluating the fitness. The two queues are the communication cylinders between the main program and the workers, and the workers and the Gazebo instances.

3.1.3 Architecture

The structure of Revolve (without Celery) is given in section 2.3. However, the structure is slightly different when using Celery. First of all, the evaluation requires putting the robots in a Distributed Task Queue. Furthermore, after the EA has successfully queued all the robots, it creates a snapshot of the previous generation, as discussed in the last subsection. Workers analyze the robots and send them to the simulators. Finally, when the workers receive the behavioral metrics, the performance measure is calculated, and the results are sent back to the EA. The whole structure of Revolve combined with Celery is given in Figure 3.1.


3.2 Experimentation

The differences in morphology between the moose and the white-tailed deer in the introduction set the motivation to research the evolved morphology using different performance functions. This section will elaborate on the experimental parameters used to compare these two performance functions, such that we can research their evolved morphology and compare the outcome. The performance functions that we will use for these experiments will be discussed in subsection 3.2.3.

3.2.1 Experimental Setup

Our evolutionary robotics experiments use several parameters, such as population size, offspring size, and number of generations. Other important parameters are the selection function and the crossover and mutation rates. All of the experimental parameters are listed in Table 3.1.

Parameter         Value   Description
Generations       200     Termination condition
Population Size   100     Individuals per generation
Offspring Size    50      Number of offspring per generation
Crossover rate    0.7     Probability of crossover
Mutation rate     0.8     Probability of mutating individuals
Evaluation time   30      Time per performance evaluation (in seconds)

Table 3.1: Experimental Parameters for the Evolution of Robot Morphology

The only distinct variable in our experiments is the performance function, so all the other parameters need to be equivalent. Therefore, both experiments will use the parameters given in Table 3.1. Furthermore, we will repeat every experiment 30 times such that our results are statistically significant while keeping the total computation time manageable. An experiment with a starting population of 100 individuals and 50 offspring per generation will require a total of 100 + 50 · 199 = 10050 fitness evaluations. Altogether, running 30 experiments of 10050 fitness evaluations corresponds to a total of 30 · 10050 · 30 s (evaluation time) = 9045000 s ≈ 2500 hours of simulated time.
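The evaluation count and the simulated time above follow directly from Table 3.1. The short calculation below reproduces them; the constant names are chosen here for illustration.

```python
GENERATIONS = 200
POPULATION_SIZE = 100
OFFSPRING_SIZE = 50
EVALUATION_TIME_S = 30
REPETITIONS = 30

# The initial population is evaluated once; each of the remaining
# 199 generations evaluates only the new offspring.
evaluations_per_run = POPULATION_SIZE + OFFSPRING_SIZE * (GENERATIONS - 1)

simulated_seconds = REPETITIONS * evaluations_per_run * EVALUATION_TIME_S
simulated_hours = simulated_seconds / 3600

print(evaluations_per_run)  # 10050
print(simulated_hours)      # 2512.5
```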

3.2.2 Computational Intelligence Machine Cluster

The Computational Intelligence (CI) group at the VU uses a cluster of machines called the Rippers to run its experiments. This cluster contains five identical machines, each having a 32-core processor and 64 GB of RAM. All experiments of this thesis (speedup and research related) are done on one of the five machines. Because the speedup is very dependent on machine specifications, the specification of the machine is given in Table 3.2. Due to the similarity of the machines, we expect similar behavior on the other machines.


Component   Details
CPU         AMD Ryzen Threadripper 2990WX 32-core Processor
RAM         64 GB RAM, 2400 MHz
Video       NVIDIA Corporation TU106 [GeForce RTX 2070]
Disk        1 TB SSD and 4 TB RAID mirror storage
Power       1000 W power supply

Table 3.2: The specifications of the machine used for the speedup and fitness-oriented experimentation. Speedup was measured without any other programs running.

3.2.3 Rotational Fitness

To see the influence of fitness functions more clearly, we compare the robot morphologies evolved with two different performance functions: displacement fitness and rotational fitness. If $B = (x_0, y_0)$ is the starting point and $E = (x_n, y_n)$ is the end-point, then the displacement function is given by the Euclidean distance between the two locations.

$$D(B, E) = \sqrt{(x_n - x_0)^2 + (y_n - y_0)^2} \qquad (3.1)$$

The goal of this performance function is straightforward: move as far away as possible. Thus an evolved solution would shape its morphology to achieve the furthest distance within the allotted simulation time.

The rotational fitness function rewards turning behavior, specifically turning left, and is given by Equation 3.2. Assuming that A (angle of rotation) is the total rotation to the left in radians, then A > 0 means that the robot turned left. Additionally, to contrast with the displacement experiments, the rotational fitness penalizes displacement, such that robots that rotate in place are favored. Therefore, the rotational fitness function used will be:

$$R_{fitness} = A - w_D \cdot D \qquad (3.2)$$

In this formula, A stands for the rotation to the left in radians, D stands for the total distance travelled, and $w_D$ is the weight of penalization. This function favors rotating robots that move as little as possible. Observations show a mean displacement D between 0 and 1.5 simulation distance units in 30 simulation seconds. We want to penalize a robot that moves large distances because the main goal should be to rotate. However, the penalty on displacement should not be too severe when the rotation is large, because the modular robots may have to move to be able to rotate. Therefore, to balance this out, the weight $w_D = 3$ is chosen.
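Equations 3.1 and 3.2 translate directly into code. The sketch below is illustrative only: the function and parameter names are assumptions and are not taken from the Revolve code base.

```python
import math

def displacement_fitness(begin, end):
    """Equation 3.1: Euclidean distance between start point B and end point E."""
    x0, y0 = begin
    xn, yn = end
    return math.hypot(xn - x0, yn - y0)

def rotational_fitness(angle_left, distance, w_d=3.0):
    """Equation 3.2: rotation to the left (radians) minus weighted displacement."""
    return angle_left - w_d * distance

if __name__ == "__main__":
    # A robot that ends 3 units right and 4 units up from where it started.
    print(displacement_fitness((0.0, 0.0), (3.0, 4.0)))
    # A robot that made half a turn to the left while moving 0.5 units.
    print(rotational_fitness(math.pi, 0.5))
```

With $w_D = 3$, a robot that moves 1.5 distance units needs to rotate more than 4.5 radians to reach a positive fitness.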


Chapter 4 Results

In this chapter, we will compute the theoretical speedup using Amdahl’s Law, compare this speedup with the observed speedup, and close the chapter with the experimental results. First, we will go over the distribution of computation time both with and without Celery. Second, we will compare the results and calculate the efficiency of Celery. Finally, we will study the results obtained from the rotational fitness experiment. The discussion of the results is integrated into this chapter because of the dependencies between the different results. For example, the measured computation time of each stage allows the calculation of a theoretical speedup.

4.1 Expected versus Observed Speedup

To fully understand the benefits of Celery, we need to know the computation time of every stage in Revolve, given by Figure 2.3. Combining the computation time of these stages with Amdahl’s Law gives an estimate of the bound on the speedup. After that, we will analyze the Celery adaptation of Revolve. The analysis enables a comparison between the theoretical and observed speedup to calculate the optimization efficiency.

4.1.1 Time distribution of Revolve

Revolve can be broken down into five stages: Revolve initiation, robot evaluation, data exportation, generational management, and simulator termination. The computation time is determined by running 30 experiments of 50 generations with 100 individuals. Each stage of the experiment was explicitly logged, creating an overview of the time spent per stage. The mean computation time µ, calculated from the experimental data, is given in Table 4.1. A summary of the results per stage is visualized in histograms given in chapter 5.1.2.


Stage                     Iterations   µ (s)   Total Time (s)
Revolve Initiation        1            38      38
Robot Evaluation          50           566     28300
Data Exportation          50           24      1200
Generational Management   50           17      850
Simulator Termination     1            9       9
Total computation time                         30397

Table 4.1: Computation time of the Revolve stages in seconds. The mean µ is obtained from the experimental data of repeating the 50-generation experiment 30 times. The termination time is calculated by subtracting all the other stages from the total computation time.

Table 4.1 corresponds to the time distribution given in Figure 4.1. The time spent in the robot evaluation stage is far higher than in all other stages combined.

Figure 4.1: Revolve is broken down in five stages. The bottleneck is located at the evaluation stage. This distribution is created using the experimental data of 30 runs.

The percentages in Figure 4.1 can be used in Amdahl’s Law to calculate a theoretical speedup. As discussed in the method, we converted the evaluation and exportation steps to parallel code. Since the evaluation stage accounts for 93.1% of the computation time and the exportation stage for 3.95%, we can estimate the speedup with p = 97.05%. Amdahl’s Law then estimates the speedup in the limit of infinitely many workers.

$$\lim_{n\to\infty} S_n = \lim_{n\to\infty} \frac{1}{1 - p + \frac{p}{n}} = \lim_{n\to\infty} \frac{1}{0.0295 + \frac{0.9705}{n}} = 33.9 \qquad (4.1)$$

So the maximum possible speedup implied by Amdahl’s Law is 33.9 times, reached only with an infinite number of processors n. Unfortunately, it is not realistic to assume that the 97.05% of parallel code computes in zero seconds, which the limit implies. Even if all the robot simulations run simultaneously, the simulations themselves will not be instant. Therefore, we should assume a lower bound for the term $\frac{93.1\%}{n}$. A reasonable candidate for the lower bound is the time taken by the slowest robot per generation, because the parallel computation time depends on this robot. However, in early generations the robots are small, and the size of a robot has a large influence on its simulation time. So the bound should be based on the last generations of robots, since they are more representative in the long run.
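Equation 4.1 can be checked numerically. The sketch below implements the standard form of Amdahl’s Law with the parallel fraction p = 0.9705 from the text; the function names are chosen here for illustration.

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: speedup with parallel fraction p on n workers."""
    return 1.0 / ((1.0 - p) + p / n)

def amdahl_limit(p):
    """Speedup in the limit of infinitely many workers."""
    return 1.0 / (1.0 - p)

if __name__ == "__main__":
    p = 0.9705
    for n in (4, 8, 16, 32, 64):
        print(n, round(amdahl_speedup(p, n), 2))
    print("limit", round(amdahl_limit(p), 1))  # limit 33.9
```

The speedup for any finite number of workers stays below this limit, and it approaches the limit only slowly once the sequential 2.95% starts to dominate.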

Figure 4.2: The size of a modular robot versus simulation time.

In Figure 4.2, 10650 robots are plotted with their size on the x-axis and the simulation time on the y-axis. The correlation between these variables is equal to 0.74, which indicates that the complexity and size of the body largely determine the simulation time of a robot.

The simulation time of these 10650 robots is not normally distributed, since it is influenced by body size. Therefore, we choose the bound not to be the slowest robot (which is an outlier), but the simulation time that only the slowest 2.5% of the robots exceed, which is 26.58 seconds. Note that the EA exports the metrics of the past generation before it requests the results of the robots. We know from Table 4.1 that the exportation takes 24 seconds on average, which means that a bound of 26.58 seconds should suffice.
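A bound of this kind is a 97.5th percentile of the measured simulation times. The sketch below shows one way to compute it with the standard library; the helper name and the example data are hypothetical, and the thesis data itself is not reproduced here.

```python
import statistics

def percentile_97_5(simulation_times):
    """Return the time exceeded by roughly the slowest 2.5% of the robots."""
    # n=40 splits the data into 40 equal-probability groups; the last of the
    # 39 cut points is the 97.5th percentile.
    return statistics.quantiles(simulation_times, n=40)[-1]

if __name__ == "__main__":
    # Placeholder data: 100 evenly spread simulation times in seconds.
    times = [float(t) for t in range(1, 101)]
    bound = percentile_97_5(times)
    slower = sum(t > bound for t in times)
    print(bound, slower)
```

Using a percentile instead of the maximum keeps a single outlier from dictating the bound.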


With these limits, we can write an extension to Equation 2.1 which fits Revolve better. This extension is given by Equation 4.2. The sequential time $T_s$ is the total computation time of the Revolve Initiation, Generational Management, and Simulator Termination stages. $T_p$ is the total computation time spent in parallel, which covers Evaluation and Exportation. Furthermore, we choose a lower limit of 26.58 seconds per generation, because it is likely that at least one robot per generation will take this long to evaluate. This creates a lower bound on $T_p$ over our whole experiment (of 50 generations) equal to 50 · 26.58 = 1329 seconds.

$$\lim_{n\to\infty} S_n = \lim_{n\to\infty} \frac{T}{T_s + \frac{T_p}{n}} = \lim_{n\to\infty} \frac{30397}{38 + 850 + 9 + \max\left(1329, \frac{28300}{n}\right)} = 13.65 \qquad (4.2)$$

Here the numerator is the average computation time of Revolve using 50 generations, and the denominator consists of the sequential¹ and parallel² stages. The expected speedup of 13.65x is much lower than the one implied by the non-extended version of Amdahl’s Law in Equation 4.1. The lower speedup confirms the inaccuracy of Amdahl’s Law caused by the oversimplification of the equation. With Equation 4.2, we can also calculate the bound for a given number of workers. The expected speedup for several numbers of workers is given in Table 4.2.

Workers   Expected Speedup (times)
4         3.8
8         6.9
16        11.4
32        13.6

Table 4.2: The expected speedup for multiple numbers of workers, using an extended version of Amdahl’s Law to fit Revolve. The expected speedup is calculated with Equation 4.2.
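The entries of Table 4.2 can be recomputed from Equation 4.2 and the stage times of Table 4.1. The sketch below does so; the constant and function names are chosen here for illustration.

```python
T_TOTAL = 30397.0                   # total time without Celery (Table 4.1)
T_SEQUENTIAL = 38.0 + 850.0 + 9.0   # initiation + management + termination
T_EVALUATION = 28300.0              # robot evaluation over 50 generations
T_PARALLEL_FLOOR = 50 * 26.58       # 26.58 s lower bound per generation

def expected_speedup(n_workers):
    """Equation 4.2: Amdahl's Law with a floor on the parallel time."""
    parallel = max(T_PARALLEL_FLOOR, T_EVALUATION / n_workers)
    return T_TOTAL / (T_SEQUENTIAL + parallel)

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, round(expected_speedup(n), 2))
```

With these inputs, the floor of 1329 seconds only becomes active between 16 and 32 workers, so adding workers beyond that point no longer changes the expected speedup.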

Now that we have estimated the bound on the speedup for various numbers of workers, the computation time of Revolve with Celery can be discussed and compared to these results.

4.1.2 Time distribution of Revolve with Celery

The addition of Celery changed the structure of the data exportation stage, such that the manager handles the exportation while Celery workers are completing the robot evaluations. The computation time data is given in Table 4.3, where the exportation and evaluation stage are combined to form the generation stage. More information about the time spent in the generation stage can be found in Figure 5.1.2 along with the distributions of computation time for the different number of workers.

¹ Revolve Initiation, Generational Management and Termination stage

² Robot Evaluation and Data Exportation stage


Stage \ workers               4      8     16     32
Revolve Initiation           27     30     39     92
Generation                  274    117     69     73
Generational Management      15     13     12     13
Simulator Termination        10     10     10     10
Total of 50 generations   14497   6540   4099   4402

Table 4.3: The computation time (in seconds) per stage per number of workers, with a final row containing the total computation time of the experiment of 50 generations.

The findings in Table 4.3 reveal that increasing the number of workers effectively decreases the computation time. However, when we use 32 workers, the time spent in the Revolve initiation and generation stages increases again. We presume that this number of workers saturates either the 64 available threads or the RabbitMQ messaging system.

Efficiency

In a perfect world, using X workers would create an X times speedup. Unfortunately, this is rarely the case due to communication, work-balancing, and other parallelization costs between workers. However, efficiency can still be used to quantify the overall performance of the parallelization of Revolve. The efficiency is calculated using the formulae given in Equation 4.3.

$$E_{processor} = E_p = \frac{S_o}{n} \quad \text{or} \quad E_{speedup} = E_s = \frac{S_o}{S_n} \qquad (4.3)$$

With the use of efficiency, an overview of Celery’s performance can be drawn based on the number of workers. These results are shown in Table 4.4.

Number of Workers   S_o   S_n    E_p    E_s
4                   2.1   3.8    0.53   0.55
8                   4.6   6.9    0.58   0.67
16                  7.4   11.4   0.46   0.65
32                  6.9   13.6   0.22   0.51

Table 4.4: The observed speedup S_o and the theoretical speedup S_n are compared using efficiency. E_p is the efficiency based on the number of workers used, while E_s is the efficiency based on the theoretical speedup S_n.
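The efficiencies of Equation 4.3 can be recomputed from the totals in Table 4.3 and the theoretical speedups in Table 4.2. The sketch below is illustrative; the names are chosen here, and the printed values agree with Table 4.4 up to rounding in the last digit.

```python
T_BASELINE = 30397  # total time without Celery (Table 4.1), in seconds

# Total time of 50 generations per number of workers (Table 4.3).
TOTAL_TIME = {4: 14497, 8: 6540, 16: 4099, 32: 4402}
# Theoretical speedup per number of workers (Table 4.2).
THEORETICAL = {4: 3.8, 8: 6.9, 16: 11.4, 32: 13.6}

def efficiencies(observed, theoretical, n_workers):
    """Equation 4.3: return (E_p, E_s) for an observed speedup."""
    return observed / n_workers, observed / theoretical

if __name__ == "__main__":
    for n, total in TOTAL_TIME.items():
        s_o = round(T_BASELINE / total, 1)  # observed speedup, one decimal
        e_p, e_s = efficiencies(s_o, THEORETICAL[n], n)
        print(f"{n:>2} workers: S_o={s_o} E_p={e_p:.3f} E_s={e_s:.3f}")
```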

One versus multiple experiments

Table 4.4 shows us that running one experiment is best done with 16 workers because it achieves the highest speedup. However, the Computational Intelligence group at the VU runs multiple experiments at the same time, where every experiment uses four simulators. Assuming a simulator uses between 1.6 and 2 threads³, a computer with 64 threads can run 8-10 experiments at a time. The efficiency of the Celery program is important due to the possibility of parallel experiments on the same machine.

The speedup with 16 workers is equal to 7.4x and gives a total computation time of 4099 seconds when running 50 generations. Unfortunately, we can only run two experiments of 16 workers at the same time, because every worker demands between 1.6 and 2 threads. Assume that we run 24 experiments with 16 workers on a 64-thread machine. This requires running two experiments⁴ in parallel 12 times in a row. In this case, the 24 experiments are done in $\frac{24}{2} \cdot 4099$ s = 12 · 4099 s = 49188 seconds. Revolve without Celery uses four simulators and completes an experiment in 30397 seconds. When running eight experiments in parallel, to get a total of 24 experiments, we should repeat eight parallel experiments three times. This results in a total computation time of 3 · 30397 s = 91191 seconds. Hence, Revolve with 16 Celery workers achieves a 1.85x speedup.

To obtain the optimal number of workers, we need to maximize the efficiency $E_p$, because we want to use the minimum number of processors while achieving maximum speedup. In Table 4.4, the maximum efficiency is achieved using only eight workers. Using eight workers per experiment allows four experiments at any moment in time. Therefore, the total computation time of 24 experiments is $\frac{24}{4} \cdot 6540$ s = 6 · 6540 s = 39240 seconds. This results in an overall speedup of 2.3x if 24 experiments need to run.
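The comparison for 24 experiments is a small scheduling calculation: the number of sequential rounds times the duration of one experiment. The sketch below reproduces the three totals; the function name and the variable names are chosen here for illustration.

```python
import math

def batch_time(n_experiments, parallel_slots, seconds_per_experiment):
    """Total time when experiments run in rounds of `parallel_slots` at once."""
    rounds = math.ceil(n_experiments / parallel_slots)
    return rounds * seconds_per_experiment

# Revolve without Celery: eight experiments at a time, 30397 s each.
baseline = batch_time(24, 8, 30397)
# Celery with 16 workers: two experiments at a time, 4099 s each.
celery_16 = batch_time(24, 2, 4099)
# Celery with 8 workers: four experiments at a time, 6540 s each.
celery_8 = batch_time(24, 4, 6540)

if __name__ == "__main__":
    print(baseline, celery_16, celery_8)
    print(round(baseline / celery_16, 2), round(baseline / celery_8, 2))
```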

³ Observed during multiple experiments on the ripper machine of which the specifications are given in the method section.


4.2 Experimental results

In Evolutionary Robotics, the fitness function favors the evolution of specific morphology and behavior. This gives reason to believe that the evolved morphology differs when using a different objective. In our experiments, we used the displacement- and rotation-based fitness functions. The morphologies of the three best-performing robots evolved by the displacement fitness are given in Figure 4.3.

The phenotype of the robots is built using RoboGen[2]. Revolve uses the modules of RoboGen in its simulators because this allows the replication of the robots in real hardware. Components have a color based on their type and on whether they are attached horizontally or vertically. In Figure 4.3 there are four different colors: Core Component (yellow), Fixed Brick (blue) and Active Hinges (red/pink), where red is the vertical and pink is the horizontal Active Hinge.

Figure 4.3: Evolved robot morphologies using the displacement fitness function. The bodies all have a snake-like shape with a lot of active hinges and containing only one fixed brick.

The behavior of these robots is recorded, and the recordings are available on Dropbox⁵. The Computational Intelligence group already classified the behavior of the displacement robots given in Figure 4.3 with the help of biologists. These particular robots move similarly to the sidewinding snake, rolling snake, and undulating worm. The CI group made a video of these comparisons, which is available on YouTube⁶.

The best-performing robots that evolved using the rotation objective are given in Figure 4.4. These robots still have snake-like shapes, but they have fewer active hinges and are smaller in total size. The robots evolved for displacement are built with a mean of 26 modules, while robots evolved for rotation have only 13 components on average. This makes the rotation-based robots half the size. Furthermore, the average module usage is the same except for the active hinges. Rotation-based robots have eight active hinges, which is 61.5% of the total body size. This is considerably less than the 21 active hinges of the displacement-based robot, representing 80.7% of the robot.

⁵ https://www.dropbox.com/sh/qkczcc0i8jaswk1/AAC9tPWtN0J6sB-tQ7tz5KpTa?dl=0

⁶


Figure 4.4: Evolved robot morphologies using the rotation fitness function. The bodies are much smaller in size, but still have snake-like shapes.

If a rotation-based robot contains a lot of fixed bricks, like the first robot in Figure 4.4, then the collection of fixed bricks acts as a rotation point. The robot’s head (yellow) will rotate around it. This behavior was observed while simulating the first robot. However, the second and third robots have a slightly different rotation behavior and morphology. The reduced size of the tail limb does not allow the robot to move the heavy head, resulting in a limb that turns the robot while the head stays stationary. This gives reason to believe that, in the case of small limbs, the robot rotates around its center of mass: the head. In the case of larger robots, the head is not the center of mass anymore, allowing the robots to move. However, the large robots seem to rotate around heavy parts of their bodies, i.e., a collection of fixed bricks. In Figure 4.5, the average component usage is plotted for the different fitness functions, showing that the component usage is very similar. However, the use of active hinges is decreased by 61.9%.


Chapter 5 Conclusion

This thesis aims to improve the performance of Revolve and research the morphological effects of different fitness functions. The initial stages involved breaking down Revolve to analyze its different computational phases. Revolve executes the typical Evolutionary Algorithm iterations of evaluation, selection, and mutation, with the addition that robot evaluation happens completely inside simulators. Additional analysis led to the bottlenecks, which are specifically caused by the Robot Evaluation and Data Exportation stages. With the bottlenecks and computation time known, we have estimated the theoretical speedup using Amdahl’s Law. We used this Law as a guideline and as a bound on the possible speedup. Increasing the performance depended on the evaluation stage getting a better way of distributing robots over simulators. Therefore, we used the Python-based library Celery. Celery implements a Distributed Task Queue, which enables distributing robots over several workers and simulators efficiently. Furthermore, robots are exported during the evaluation of the next generation of robots, improving parallel performance.

The new architecture consists of a worker pool that has access to a pool of simulators. We calculated the maximum theoretical speedup based on the number of workers and compared it to the observed speedup. The best speedup on our machines, specified in Table 3.2, is achieved using 16 workers, where the theoretical speedup is 11.4x, and the observed speedup is 7.4x. However, increasing the number of workers leads to fewer experiments that we can run in parallel. We used a formula to calculate the efficiency of using multiple workers, which showed us that a single experiment on a machine works best with 16 workers. If Revolve needs to execute multiple experiments, then the best option is eight workers per experiment, resulting in a 2.3x speedup. Thus, the number of workers depends on the number of experiments we want to compute in parallel.

Another goal of this research was to observe the differences in optimal morphology when using different fitness functions. We have used the displacement fitness to evolve robots whose morphology should fulfill the desire to move, while the rotational fitness function requires robot bodies to rotate left easily. The results showed a small difference in morphology, but all robots still evolve towards snake-like bodies. Even though both groups of robots consist solely of one limb, the usage of hinges, size, and behavior are different. The robots evolved for displacement are large and require many active hinges to achieve maximum displacement. Robots evolved for rotation thrive best with small bodies containing a rotation point and a limb to rotate. However, the rotation of these robots looks clumsy and simplistic, and they are still snake-like robots.

5.1 Future Work

In this section, we will discuss the possibilities to improve the performance of Revolve even more. We will distinguish between technical and research opportunities for improvement, because the possibilities concerning Revolve are technical, while the speculations about robot morphologies are research-centric. In the research section, we will discuss viable solutions to the simplistic morphologies observed in the different fitness function experiments.

5.1.1 Technical

Broker improvements

The Celery version of Revolve requires a messaging system to ensure that robots and their results are sent to the right workers. The current messaging system, RabbitMQ, sometimes saturates due to the number of messages it needs to process. When monitoring Celery with the monitoring tool Flower¹, we can see that the Evolutionary Algorithm does not instantly collect completed tasks. Increasing the efficiency of this messaging system can improve the observed speedup. A possible solution could be restructuring Revolve such that the messages received by RabbitMQ are distributed uniformly over time instead of arriving in a burst per generation.

Distribution of data exportation

In the current architecture, a single process in the Evolutionary Algorithm manages the data exportation. A snapshot of the whole population is saved at once, making this part sequential². If the robot exportation can be done individually instead of on a population level, then this work can also be distributed to the workers. This can both increase the workers’ effectiveness, since they do more of the work, and decrease the content of a result message from the workers, which in turn speeds up RabbitMQ.

Multiple robots per instance

Revolve uses several simulators to evaluate robots, such that at any given time there is only one robot in each simulator. However, an empty Gazebo instance still uses processing power. If the robots are appropriately spaced, Revolve can be restructured so that multiple robots are simulated in one simulator without colliding. This structure allows all the processing power to be used by one simulator, meaning that the Gazebo simulator can be multi-threaded such that every robot has its own thread.

¹ https://flower.readthedocs.io/en/latest/

5.1.2 Research

We have observed the difference between evolved robot morphologies while using different performance functions. The performance functions include a displacement- and a rotation-based function. The first function favors robots that move, while the other favors rotation and penalizes movement. In this subsection, we will discuss two possible implementations that may allow more complex morphologies and behavior.

Creating complex behavior

The morphologies evolved in our experiments for the displacement- and rotation-based tasks consist of one long limb, which makes both the movement and the rotation simple in form. The number of components in the limbs varied with the performance function involved, but the morphology and behavior are still simplistic. Adjustments in the brain-body cooperation might create coordination between multiple limbs. The problem may exist due to the limited global communication between modular parts, where components of the body only communicate with other components inside the same limb. Researching the communication between body parts may unlock a variety of complex brains and bodies, allowing for complex behavior.

Morphology constraints

In our experiments, snake-like robots dominate the diversity in morphology. Snake-like robots seem to have an advantage at the start of an experiment, which makes it possible for the whole population to slowly converge to snakes. Although it is a locally optimal morphology, it raises the question of how other morphologies would perform. The snake-like morphologies of the robots may be preventable by adding constraints through the performance function. The displacement is penalized during the rotation-based task experiments, which results in a locally optimal outcome of the robot head staying stationary while the limb rotates the robot around. These snake-like robots may have been especially favorable when the displacement is penalized. We have seen that the performance function does change the morphology, so changing the penalizing system might increase the morphological complexity. Such a penalty could also be based on the number of limbs or the length of the limb. Researching the morphologies evolved by these fitness functions might give insight into the domination of the snake-like morphology.


References

[1] Gene M. Amdahl. “Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities”. In: Proceedings of the April 18-20, 1967, Spring Joint Computer Conference. New York, NY, USA: Association for Computing Machinery, 1967, pp. 483–485. url: https://doi.org/10.1145/1465482.1465560.

[2] Joshua Auerbach et al. “Robogen: Robot generation through artificial evolution”. In: Artificial Life Conference Proceedings 14. MIT Press. 2014, pp. 136–137.

[3] Stéphane Doncieux et al. “Evolutionary Robotics: What, Why, and Where to”. In: Front. Robotics and AI 2015 (2015).

[4] Miguel Duarte et al. “JBotEvolver: A Versatile Simulation Program for Evolutionary Robotics”. In: Artificial Life Conference Proceedings 14. MIT Press. 2014, pp. 210–211.

[5] A. Eiben. “EvoSphere: The World of Robot Evolution”. In: vol. 9477. Dec. 2015, pp. 3–19. isbn: 978-3-319-26840-8. doi: 10.1007/978-3-319-26841-5_1.

[6] A. E. Eiben and James E. Smith. Introduction to Evolutionary Computing. 2nd. Springer Publishing Company, Incorporated, 2015. isbn: 3662448734.

[7] A. E. Eiben et al. “The triangle of life: Evolving robots in real-time and real-space”. In: Artificial Life Conference Proceedings 13. MIT Press. 2013, pp. 1056–1063.

[8] A. E. Eiben and J. Smith. “From evolutionary computation to the evolution of things”. In: Nature 521 (2015), pp. 476–482.

[9] Alex Ellery and A. E. Eiben. “To Evolve or Not to Evolve? That is the Question”. In: Artificial Life Conference Proceedings 31 (2019), pp. 357–364.

[10] Iñaki Fernández Pérez, Amine Boumaza, and François Charpillet. “Comparison of selection methods in on-line distributed evolutionary robotics”. In: Artificial Life Conference Proceedings 14. MIT Press. 2014, pp. 282–289.

[11] John Gustafson. “Reevaluating Amdahl’s law”. In: Communications of the ACM 31 (May 1988).

[12] Matthew F. Hale et al. “The ARE Robot Fabricator: How to (Re)produce Robots that Can Evolve in the Real World”. In: Artificial Life Conference Proceedings 31 (2019), pp. 95–102. doi: 10.1162/isal_a_00147.


[13] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2008. isbn: 0123705916.

[14] Elte Hupkes, Milan Jelisavcic, and A. E. Eiben. “Revolve: A Versatile Simulator for Online Robot Evolution”. English. In: Applications of Evolutionary Computation. Ed. by Kevin Sim and Paul Kaufmann. Springer, 2018, pp. 687–702. isbn: 9783319775371. doi: 10.1007/978-3-319-77538-8_46.

[15] Nick Jakobi. “Evolutionary Robotics and the Radical Envelope-of-Noise Hypothesis”. In: Adapt Behav 6.2 (Mar. 1998), pp. 325–368. issn: 1059-7123.

[16] Nick Jakobi. “Minimal simulations for evolutionary robotics”. In: 1998.

[17] Nick Jakobi, Phil Husbands, and Inman Harvey. “Noise and the Reality Gap: The Use of Simulation in Evolutionary Robotics”. In: vol. 929. Jan. 1995, pp. 704–720.

[18] James Kennedy and Russell Eberhart. “Particle swarm optimization”. In: Proceedings of ICNN’95-International Conference on Neural Networks. Vol. 4. IEEE. 1995, pp. 1942–1948.

[19] Scott Kirkpatrick, C. Daniel Gelatt, and Mario P. Vecchi. “Optimization by simulated annealing”. In: Science 220.4598 (1983), pp. 671–680.

[20] S. Krishnaprasad. “Uses and abuses of Amdahl’s law”. In: Journal of Computing Sciences in Colleges - JCSC (Jan. 2001).

[21] Gongjin Lan et al. “Directed Locomotion for Modular Robots with Evolvable Morphologies”. In: Sept. 2018, pp. 476–487. doi: 10.1007/978-3-319-99253-2_38.

[22] Duc Truong Pham et al. “The bees algorithm—a novel tool for complex optimisation problems”. In: Intelligent production machines and systems. Elsevier, 2006, pp. 454–459.

[23] Fernando Silva et al. “Open Issues in Evolutionary Robotics”. In: Evolutionary Computation 24.2 (2016), pp. 205–236.

[24] Ask Solem. Celery User Manual. https://docs.celeryproject.org/en/latest/index.html. 2009 (accessed regularly, 2019/2020).

[25] X.-H. Sun and L. M. Ni. “Another view on parallel speedup”. In: Supercomputing ’90: Proceedings of the 1990 ACM/IEEE Conference on Supercomputing. 1990, pp. 324–333.

[26] A. M. Turing. “Computing machinery and intelligence”. In: Mind 59.236 (1950), p. 433.

[27] Fei Xia et al. “Amdahl’s Law in the Context of Heterogeneous Manycore Systems - A Survey”. In: IET Computers & Digital Techniques (Feb. 2020). doi: 10.1049/iet-cdt.2018.5220.

[28] Petr Zemek. Consuming and Publishing Celery Tasks in C++ via AMQP. https://blog.petrzemek.net/2017/06/25/consuming-and-publishing-celery-tasks-in-cpp-via-amqp/. 2009 (accessed regularly, 2019/2020).


Appendices

Appendix A

This appendix gives an overview of the Celery configuration used for Revolve.

    from celery import Celery

    app = Celery('pycelery')

    # Setting configurations of celery.
    app.conf.update(
        broker_url='pyamqp://localhost:5672//',
        result_backend='rpc://',
        task_serializer='yaml',
        result_serializer='json',
        accept_content=['yaml', 'json'],
        enable_utc=True,
        result_expires=600,                # seconds
        result_persistent=False,           # key spelled as in the Celery settings reference
        include=['pycelery.tasks'],        # modules that contain the tasks
        worker_prefetch_multiplier=1,
        task_acks_late=True,
        worker_max_tasks_per_child=1,      # Celery's name for "max tasks per child"
    )

We have already discussed the broker and result back-end in chapter 3. The other configurations are:

Accept content: The content types (serializers) that the Celery workers accept.

Result expires: Maximal time (in seconds) a stored result is available before deletion.

Result persistent: Whether result messages should survive a restart of the broker.

Include: Location of the tasks created by the user.

Worker prefetch multiplier: Number of messages a worker can reserve for itself, per concurrent process, before the completion of its current task.

Task acks late: Enables the late acknowledgment of tasks, meaning a task is acknowledged after it has been executed instead of just before. Together with a prefetch multiplier of one, a worker only reserves a new task once its current task has been acknowledged.

Max tasks per child: Maximum number of tasks a worker process executes before it is replaced by a new process.

The reasons for using these configurations are as follows. Robots can already be converted to a YAML string, and the behavioral metrics can be sent in JSON packages. Therefore the accepted content equals these two values. If results are not collected after 10 minutes, the EA has probably disconnected. Hence the results can be deleted, and Revolve restarted. The same goes for the result persistence. Robots may take a long time to simulate. Therefore we do not want workers to reserve robots before they are done with the current one. This is achieved by setting the prefetch multiplier to one and enabling late acknowledgment. All the workers have one analyzer, and we do not want a worker to simulate two robots at a time, so each worker runs with a concurrency of one (see Appendix B) and the maximum number of tasks per child is set to 1, which gives every robot a fresh worker process.
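The effect of the prefetch setting can be illustrated with a small simulation. The sketch below is not Revolve code and does not use Celery. It assumes a simplified broker model in which every worker holds at most `prefetch` unacknowledged tasks (including the running one) and receives a new task only after finishing one. The function name `makespan` and the task durations are made up for illustration.

```python
import heapq
from collections import deque


def makespan(durations, n_workers, prefetch):
    """Time until all tasks are done under a simplified prefetch model."""
    pending = deque(durations)                     # tasks still at the broker
    waiting = [deque() for _ in range(n_workers)]  # tasks reserved per worker
    # Initial delivery: round-robin until every worker holds `prefetch` tasks.
    for _ in range(prefetch):
        for buffer in waiting:
            if pending:
                buffer.append(pending.popleft())
    running = []                                   # heap of (finish_time, worker)
    for worker, buffer in enumerate(waiting):
        if buffer:
            heapq.heappush(running, (buffer.popleft(), worker))
    now = 0
    while running:
        now, worker = heapq.heappop(running)
        # Finishing a task frees one slot, so the broker delivers one more.
        if pending:
            waiting[worker].append(pending.popleft())
        if waiting[worker]:
            heapq.heappush(running, (now + waiting[worker].popleft(), worker))
    return now


robots = [10, 1, 1, 1, 1, 1]        # one slow robot, five fast ones
print(makespan(robots, n_workers=2, prefetch=1))
print(makespan(robots, n_workers=2, prefetch=3))
```

In this model, reserving one task at a time finishes after 10 time units, because the second worker handles all fast robots while the first one simulates the slow robot. Reserving three tasks per worker takes 12 time units, because two fast robots wait behind the slow one.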

Appendix B

This appendix covers the technical side of the Celery implementation. Revolve needs to start Celery workers and connect simulators to queues, and we need to call a particular function to use the Celery workers.

To start multiple workers, we call the following command:

$ celery multi start worker1 worker2 -Q robots$port -A pycelery -c 1

This command tells Celery to start two workers with the names worker1 and worker2. Apart from starting the workers, it also creates a RabbitMQ queue named robots$port. Celery knows which configuration to use through the argument -A, which should name the folder containing the Celery app, tasks, and configurations. The last argument, -c, sets the concurrency of each worker.
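When the number of workers or the port changes between experiments, the command can be assembled programmatically. The following is a minimal sketch that only uses the arguments shown above (-Q, -A, -c). The helper name `celery_multi_command` and the port value 11345 are assumptions for illustration and are not part of Revolve.

```python
import shlex


def celery_multi_command(n_workers, port, app="pycelery", concurrency=1):
    """Build the argument list for `celery multi start` (illustrative helper)."""
    names = [f"worker{i}" for i in range(1, n_workers + 1)]
    return ["celery", "multi", "start", *names,
            "-Q", f"robots{port}",      # queue name, one per simulator port
            "-A", app,                  # package holding the Celery app
            "-c", str(concurrency)]     # concurrency per worker


command = celery_multi_command(n_workers=2, port=11345)
print(shlex.join(command))
```

The resulting list can be passed directly to `subprocess.run`, which avoids quoting problems that arise when the command is built as a single string.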

The Gazebo simulators that Revolve uses are built in C++, so to use Celery from C++ we need to add the SimpleAmqpClient library. This library allows the connection between C++ and RabbitMQ (or Celery) queues, which makes consuming messages possible with both Python and C++ code. Using C++ in combination with Celery is therefore possible, but I recommend the writings of Petr Zemek [28] for a better understanding of this matter. The part of the implementation where the connection with RabbitMQ is made is given by the following code.

    this->celeryChannel = AmqpClient::Channel::Create("localhost", 5672);

    // Consumer tag
    this->consumer_tag = this->celeryChannel->BasicConsume(
        /*queue*/"cpp",
        /*consumer_tag*/"",
        /*no_local*/true,
        /*no_ack*/false,
        /*exclusive*/false,
        /*message_prefetch_count*/1
    );
    this->envelope = this->celeryChannel->BasicConsumeMessage(this->consumer_tag);

    // read the message
    auto message = this->envelope->Message();
    auto body = message->Body();
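Once the body has been read, it still has to be decoded. The sketch below shows, in Python for brevity, the layout a consumer can expect under the assumption that the message was published by Celery with message protocol version 2 and the JSON serializer, where the body is a list of positional arguments, keyword arguments, and an options object. Revolve configures YAML for tasks, so this only illustrates the structure. The helper name `decode_celery_body` and the example payload are hypothetical.

```python
import json


def decode_celery_body(body: str):
    """Split a JSON-serialized Celery (protocol 2) body into its three parts."""
    args, kwargs, embed = json.loads(body)
    return args, kwargs, embed


# Example payload with a single positional argument (placeholder robot string).
example_body = json.dumps([
    ["robot_yaml_string"],
    {},
    {"callbacks": None, "errbacks": None, "chain": None, "chord": None},
])

args, kwargs, embed = decode_celery_body(example_body)
print(args[0])
```

A C++ consumer would perform the same three-way split on the string returned by `Body()` using a JSON or YAML parser of its choice.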
