
Locomotion learning for evolving modular robots of arbitrary shapes

Author:

Dmitry Egorov

Supervisors:

Dr. A.E. Eiben

Dr. Valeria Krzhizhanovskaya

A thesis submitted in fulfillment of the requirements

for the degree of Master of Science in Computational Science

in the

Computational Science Lab

Informatics Institute

Faculty of Science

University of Amsterdam

HPC Lab

Faculty of IT and Programming

ITMO University


UNIVERSITY OF AMSTERDAM

Abstract

Faculty of Science Informatics Institute

Master of Science in Computational Science

Locomotion learning for evolving modular robots of arbitrary shapes by Dmitry Egorov

In this work we develop a method for the automatic generation of locomotion controllers for modular robots of arbitrary shapes, intended for use in an embodied robotic evolution system called the Triangle of Life. The robots consist of building blocks that can be assembled in different ways to create various body morphologies with different numbers of limbs and degrees of freedom.

We investigate two approaches to the automatic design of robotic locomotion controllers. The first approach learns the topology and the parameters of a neural network controller using an evolutionary algorithm with locomotion speed as the fitness function. The topological structure of the network is learned from scratch, without relying on any prior knowledge about the network. The algorithm used is called Neural Evolution through Augmenting Topologies (NEAT). This algorithm, as the name implies, develops the topology of a network by augmenting it with new neural structures. We take a critical look at this algorithm; specifically, we investigate the redundancy of the neural networks it creates.

The second method is to design a neural network for locomotion by modelling Central Pattern Generators (CPGs) – the neural structures that are responsible for animal locomotion. Numerical models of CPGs are commonly used for locomotion of legged robots. We create our own model of CPGs using the same neural network implementation as in the first method, and optimize its numerical parameters for locomotion speed.

Finally, we compare the learning performance of the two methods by using them to learn locomotion in two different body morphologies.


Acknowledgements

I would like to thank my supervisor, Dr. A.E. (Gusz) Eiben, for making this work possible. His futuristic vision inspired me to engage in this research, and working with him was a great pleasure.

I would also like to thank Elte Hupkes, who created most of the software that was necessary for this research and advised me on its technical details.

Furthermore, I want to thank Milan Jelisavcic for creating a friendly atmosphere in the workplace, which allowed me to be more productive.

Finally, I thank Dr. Alexander Boukhanovsky for giving me the opportunity to study in an international Master's program and spend a year in one of the best universities in Europe.


Contents

Abstract

Acknowledgements

1 Introduction
  1.1 What is evolutionary computing
  1.2 What is embodied evolution
  1.3 Locomotion learning
  1.4 Triangle of Life introduction
  1.5 Research goal

2 Related work
  2.1 Online evolution
  2.2 Situated evolution
  2.3 Evolution of controllers
  2.4 Co-evolution of bodies and controllers
  2.5 Learning algorithms for neural networks
  2.6 Gait learning
    2.6.1 Central Pattern Generators
    2.6.2 HyperNEAT
    2.6.3 Other methods

3 Methods
  3.1 Principles of NEAT
    3.1.1 Genetic encoding
    3.1.2 Mutation
    3.1.3 Tracking homologous genes
    3.1.4 Crossover
    3.1.5 Innovation protection
      Similarity metric
  3.2 Central Pattern Generator controllers

4 Platform description
  4.1 Robot components
  4.2 Robot Morphologies
  4.3 Neural network
      Simple neurons
      Sigmoid neurons
      Oscillator neurons
  4.4 Actuators
  4.5 NEAT implementation
    4.5.1 Genotype
    4.5.2 Initial population
    4.5.3 Tournament selection
    4.5.4 Mutations
    4.5.5 Crossover
    4.5.6 Controller evaluation
    4.5.7 Protecting innovations by means of fitness sharing
  4.6 CPG controller implementation
  4.7 Experiment details

5 Results and Discussion
  5.1 NEAT results
    5.1.1 Population size
    5.1.2 Speciation threshold
    5.1.3 Augmentation / no augmentation
    5.1.4 Genotype size analysis
  5.2 CPG optimization results

6 Conclusion


Chapter 1

Introduction

1.1 What is evolutionary computing

Evolutionary Algorithms (EAs) are powerful tools for solving various problems in computer science and engineering, such as optimisation, software design, and machinery design. They are inspired by biological evolution, the process that gave birth to the variety of life on Earth. EAs have been successfully applied in aerospace engineering to improve the performance of structural elements of satellites [25], rocket engines [35], aircraft propeller blades [32], aircraft wings [39] and entry heat shields [24]. Other popular applications of EAs include design and control of electric power systems [4], computational finance [1] and resource scheduling [16].
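All algorithms in this family share the same generate-and-test loop: maintain a population of candidate solutions, select the better ones, and produce new candidates by variation. A minimal sketch in Python, with the representation, operators, and parameter values as illustrative placeholders rather than anything prescribed in this thesis:

```python
import random

# A minimal generic evolutionary loop, for illustration only: the
# representation, operators, and parameter values are placeholders.

def evolve(init, fitness, mutate, crossover, pop_size=50, generations=100):
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        # evaluate and rank the current candidates
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # truncation selection
        # fill the next generation with mutated offspring of parent pairs
        offspring = [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring
    return max(population, key=fitness)
```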

1.2 What is embodied evolution

Embodied Artificial Evolution (EAE) is a concept of artificial evolution of real physical objects, such as robots, taking place in real physical space. In the context of robotics, embodied evolution implies evaluating the performance of the candidates using real physical robots. An embodied evolutionary robotics setup can be as simple as a single robot that evolves control policies for a given task and evaluates them by applying them in real life. But it can also be something much more complex, with multiple robots coexisting and interacting with each other in a shared environment, and new generations of robots being physically constructed and released into the world.

Embodied evolutionary systems are much more demanding and complicated to create and use than those that take place in computer simulations, and have more limitations. They are confined to real time, while a simulation can be sped up given powerful enough hardware. They may also require a lot of space and resources to create new candidates. Real-life hardware is prone to failures, and evaluations are more noisy and unpredictable than in simulation.

Then why are we interested in embodied evolution? The answer is that solutions that performed well in simulation may not perform as well in real life. This is due to the fact that simulating real life in a computer with perfect accuracy is impossible: a computer simulation is always a simplified version of reality. A solution discovered in simulation may demonstrate unrealistic behavior that exploits flaws in the simulator to achieve high fitness [29]. Therefore, once transferred from the simulation into the real world, it will perform much worse. This problem is known in evolutionary robotics as the reality gap. The best way to overcome this gap is to evaluate all the candidates under the same conditions in which they will eventually be used.


The second advantage of embodied evolution is the possibility to continuously adapt solutions to changing working conditions in an online fashion, that is, without stopping the problem-solving activities. For example, an evolving robot can adjust its control policies to changes in the environment as it performs the given task.

While evolution of robot controllers adapts the behavior of a robot with a fixed body morphology, the next step in EAE is a system where robot bodies co-evolve alongside controllers. This work addresses one of the important problems of such a system – the Control Your Own Body (CYOB) problem, which is the task of finding a general way to control a robot body with an arbitrary morphology.

1.3 Locomotion learning

A controller in the context of robotics is the programmable part of a robot that accepts signals from the sensors as input and produces output signals that are fed into the robot's actuators. This process may or may not lead to some meaningful action performed by the robot. One such meaningful action, perhaps the most basic one, is locomotion – the movement of the robot through the environment. The complexity of this action depends on the concrete robot body structure. For a wheeled differential-drive robot it can be achieved by simply rotating the wheels at appropriate rates, but for a legged robot it can be much more complicated, requiring all individual moving parts to work together in a coordinated manner to achieve locomotion of the robot as a whole. In the context of legged robots, this type of movement is called a gait.

Evolution of robot morphologies in an embodied evolutionary system produces robots with new morphologies for which no suitable controllers exist. This means that a new robot will most likely be unable to effectively control its own body and perform meaningful actions.

This research is focused on designing an algorithm for the automatic generation of controllers for a given arbitrary body morphology that would make the robot capable of moving through the environment. This problem is known in robotics as the locomotion learning problem.

An important requirement for the algorithm is the ability to work with a wide variety of body morphologies. This means that the algorithm should not make too many assumptions about the morphologies (such as an assumption of symmetry), and should be able to generalize well.

1.4 Triangle of Life introduction

The grand vision of this research is to implement an embodied robotic evolution framework known as the Triangle of Life [13]. Within this framework, embodied robotic agents exist in real space and real time, and can reproduce and evolve. The life cycle of the agents consists of 3 stages: birth, infancy and mature life (see Figure 1.1). The Triangle of Life is a generic framework, which means it can be implemented with different hardware platforms, reproduction mechanisms, and genetic encoding mechanisms. The only requirements are that real-life agents should be representations of their pieces of genetic code called genotypes, and that reproduction should happen through the application of genetic operators to the genotypes.


Once a new genotype is created, the process of constructing a new agent from that genotype is started. This process is called Birth and is the first stage of the agent's life. The mechanism of agent creation is not specified, but possible options include 3-d printing of robot bodies, or assembly from pre-made modules.

A newly constructed organism enters the second stage of its life – Infancy. This is the stage when the organism’s mind (controller) adapts to its body. This stage is necessary because it is unlikely that random genetic recombination of the parents’ genotypes will result in a brain and a body that perfectly fit each other. During infancy, organisms cannot conceive offspring, and the living conditions at this stage may be milder than in the adult stage. To spread their genetic information, the organisms have to live until the end of infancy, which means that this stage serves as an initial test to filter out the least fit organisms early.

The criterion for transition from Infancy to Mature life can be arbitrarily specified by a user, for example, time since birth, or some performance threshold achieved by the organism. Once in Mature life, the organism becomes fertile, and can produce offspring. If the artificial life system was set up to solve a particular problem, the mature organism also starts the activities associated with problem-solving.

Figure 1.1: Life cycle of organisms in the Triangle of Life. The corners of the triangle are the moments of: 1) Conception: construction of a new organism is started. 2) Delivery: construction of the organism is completed. 3) Fertility: the organism becomes ready to conceive offspring.

1.5 Research goal

The goal of this research is to design a method of fast and robust automatic generation of locomotion controllers for robots with arbitrary body morphologies.

To that end, we take a look at two different approaches to designing locomotion controllers. First, we use a widely known general algorithm that generates neural networks using principles of artificial evolution, called Neural Evolution through Augmenting Topologies (NEAT). This algorithm is able to develop neural networks that perform a given task without prior knowledge about the topology of the network, since it evolves the topology through augmenting a very simple starting network (hence the name). We use NEAT to evolve neural networks capable of performing locomotion in a number of test body morphologies.


The second approach relies on hand-designing a topology of a neural network controller in advance, and then optimizing its numerical properties (connection weights, parameters of activation functions of the neurons) for the task of locomotion. As discussed earlier, the algorithm must be able to generate controllers for arbitrary body morphologies. Obviously, we cannot design a topology of the controller network for every conceivable body shape, but we can define a generic process that creates such a topology for a given body shape using a set of rules.

We compare these two methods in terms of performance, and identify which one is more suitable for integrating into the Triangle of Life framework.


Chapter 2

Related work

2.1 Online evolution

In many design applications artificial evolution is used as a method of generating solutions to a given fixed problem. Once an appropriate solution is found, the evolutionary process stops, and the solution is extracted and exploited. The evaluation of candidates happens either in computer simulation, or in laboratory conditions. This is a traditional approach to evolutionary computation which can be used for optimization problems, engineering and design. This kind of evolution is what Eiben et al. [13] call offline evolution.

In contrast, they call online evolution a process of continuous production of new solutions even during the problem-solving stage. New changes are introduced into the population of candidates as they try to solve the problem at hand, and the candidates' performance is constantly evaluated. Online evolution allows the solution to adapt to changing working conditions, especially if the conditions are only partially known to the designers.

The problem of robot control is one such case. If the robot must work in unknown environments and without maintenance, it is crucial to have the ability to adapt its behavior to changes in the environment, as well as to changes in the robot itself (wear, contamination, loss of actuators, etc.).

This approach can be useful for solving problems of robotic exploration, where a robot (or a population of robots) is sent to a distant and/or unknown environment, and is required to autonomously survive and accomplish certain tasks for a very long period of time. The potential applications of such a scenario include deep space [36] and deep ocean exploration.

2.2 Situated evolution

In many artificial evolution systems the process is separated into isolated well-defined stages: evaluation of the candidates, selection, reproduction. There exists a central managing agent that executes these stages in a fixed sequence, and explicitly ranks and selects candidates for reproduction based on their fitness for a specific goal. The candidates are usually evaluated separately; they do not interact with each other.

Consider an alternative scenario: candidates coexist in the same space and time and can interact with each other. The selection process is not governed by any central agent; instead, it arises from interactions between candidates and the environment they exist in. There are no explicit acts of evaluation of the candidates, fitness values may not even exist at all, and the chances of candidates to spread their genetic code are determined implicitly by the actions they perform during their lifetime. The restrictions of the simulated environment lead to competition for resources, which imposes selection pressure. This type of distributed, agent-based, goalless evolution is much closer to biological evolution as it is understood today. It is sometimes called situated evolution [37], open-ended evolution [6, 38] or artificial life (ALife) [41, 2].

This concept is intuitively applicable to evolutionary robotics if we think of robots as simulated living beings. It has been used to design evolving adaptive teams of robots. For example, Prieto et al. [37] designed an asynchronous situated evolution process with robotic agents that coexisted and co-evolved in the world. Each robot had a certain amount of energy that decreased over time and could be replenished by performing actions that were in line with the objective. The robots had to maintain their energy values above a certain threshold in order to survive. The entire process was asynchronous, meaning that the events of birth, death and mating of robots happened whenever certain criteria were met, rather than being centrally orchestrated. Since this algorithm was designed to work with a limited number of real robots (which means that robots could neither be created nor destroyed), it reused “dead” robots by means of embryo-based reproduction: each robot, apart from its own active controller, carried an “embryo” controller that took over the robot after the active controller “died”. The authors applied this algorithm to a problem of collective surveillance of an area in a simulated 2-d world. In [38] they adapted the algorithm for an experiment with real-life wheeled robots presented with the task of cleaning an area.

In [8] a group of mobile e-puck robots were allowed to reproduce only when they were within a certain distance from each other, which made them evolve aggregation behaviour. The robots were placed into an environment where the only requirement was to survive and reproduce. Like in [37], the survival of the robots was determined by their energy levels, which could be increased by gathering food items. Robots reproduced by broadcasting their genomes to nearby robots, and the broadcasting range was limited. Thus, successful genomes had to be able to perform aggregation behaviour as well as foraging behaviour.

In [6] the authors implemented a goalless multi-agent evolutionary process in a simulated 2-d environment. Agents in this work were controlled by lists of rules that consisted of conditions and actions. These rule lists were subject to evolution. Similarly to [8], the environment was constructed in such a way that agents had to gather food items to survive, which made them evolve foraging behaviour.

Situated evolution has been utilized to solve other kinds of problems not associated with robotics, such as function optimization [41] and engineering tasks [2, 53].

Satoh et al. [41] use simulated colonization of the domain space by artificial organisms (Artorgs) to find global optima of non-convex functions. The artorgs could move through space, consume, produce and trade resources, mate and produce offspring, and die due to lack of resources. These interactions collectively resulted in artorgs gathering at the objective function optimum. The multi-agent nature of this algorithm allows for its distribution over multiple computational units.

Yang et al. [53] used an artificial life algorithm similar to the one used in [41] to optimize the design of journal bearings. Anh et al. [2] applied a combination of an artificial life algorithm and tabu search to design a fluid engine mount with conflicting requirements of mechanical stiffness and vibration isolation.

Some work has been done in simulating generic evolving multi-agent ecosystems. Such simulations can be useful for studying biological evolutionary processes, as studying them in real life is usually impossible due to their large time scales. For example, Gras et al. [14] considered a predator-prey ecosystem, where agents were controlled by Fuzzy Cognitive Maps (FCMs). These maps are graph-like structures that take input data from the environment and produce an action that an agent needs to take. The population of agents was heterogeneous (each individual had its own potentially unique FCM), and underwent evolution in a distributed bottom-up manner. Yaeger et al. [52] used an artificial life simulator called Polyworld to study how the complexity of organisms increases due to evolution.

2.3 Evolution of controllers

Evolutionary computation has been able to generate robot controllers capable of amazingly complex behaviors.

In [5], robots evolved a complex phototaxis behavior that included cooperation and memory. The robots were placed into an arena and were required to move towards the light source, but only approach it from a certain direction according to marks drawn on the arena floor. The experiments were done in a computer simulation as well as on real differential-drive robots controlled by recurrent neural networks.

Tuci and Trianni [48] evolved a 2-robot team that was capable of performing different roles despite having identical controllers. The controllers were recurrent neural networks with a fully interconnected hidden layer of 6 neurons. The robot body was a differential-drive chassis equipped with IR sensors, ambient light sensors and a camera. The team was required to guard the “nest” and forage items around it at the same time, and the robots were expected to autonomously allocate these roles and perform them.

Duarte et al. [12] generated neural controllers capable of a complex task that required an e-puck robot to find an item in a double T-maze and bring it to a certain location. The location of the item was signalled to the robot by flashing lights near the entrance of the maze. The controller they designed for this task had a hierarchical structure composed of multiple relatively simple neural networks. Each of those networks solved a certain sub-task and was evolved separately to do only that. Then a behavior arbitrator was evolved, which carried out the task of delegating control of the robot to one of those sub-controllers based on sensory inputs. Evidently, this work contained a significant amount of engineering by hand, as evolving such a complex behavior without it would be problematic.

2.4 Co-evolution of bodies and controllers

All the aforementioned robot evolution experiments have one thing in common: they consider evolvable controllers within fixed robot bodies. Other kinds of experiments consider evolution of the bodies themselves.

The paper by Sims [42] is one of the earlier works that describes generation of simulated creatures using artificial evolution. In this work, both body morphologies and controlling neural networks co-evolved. The genetic encoding described positions of various components of the neural network within the body of a creature, and allowed for replicating parts of the network throughout the body. The author evolved creatures capable of various types of locomotion, such as jumping, swimming and walking.

Bongard [7] investigated how allowing robots to evolve bodies along with minds can accelerate discovery of useful behaviors. In his work, robot morphologies could parametrically change from a snake-like shape to a legged upright shape. Robots started with random neural network controllers and snake-like shapes, and evolved controllers capable of phototaxis (movement towards a light source), while their body postures became more upright. It was discovered that morphological change does accelerate the evolution of the desired behavior. Note that in this work, changes of body morphologies were not driven by evolution, but followed a plan set by the experimenter in advance. Only the controllers were subject to evolution.

In [10] the authors evolved locomotion in voxelized robots made of different kinds of soft materials. They used Compositional Pattern-Producing Networks (CPPNs) to encode robot morphologies. The CPPNs will be described in detail shortly, but the basic idea is to “ask” the CPPN to provide an output value for given spatial coordinates by feeding the coordinates into the input nodes of the CPPN. The CPPNs in this work were evolved with the fitness function being the speed of the robots' locomotion. One interesting aspect of this work is that these robots did not have controllers in the traditional sense. Instead they moved by contracting and expanding some of their voxels in a periodic fashion, while other voxels served as supporting and connective tissue. This means that the way those voxels were spatially arranged completely determined the robots' behavior.

Weel et al. [51] implemented the Triangle of Life framework in a computer simulation using a modular robotic platform called Roombot. In this implementation, robots learned locomotion in the infancy stage using an algorithm called RL PoWER [27]. This algorithm learns gaits by optimizing a set of parametrized cyclic splines, where each spline represents the angular position of a robot's joint as a function of time. The robots entered the mature stage if they moved far enough away from their location of birth. The evolutionary process itself did not contain an explicit fitness evaluation stage or selection process. Robots mated whenever they found themselves within a certain distance from each other, which implicitly promoted genetic code that makes organisms move fast.

The work by Weel et al. is a good proof of concept for Triangle of Life, as it incorporates co-evolution of minds and bodies that is distributed and driven by the environment. Its drawback, however, is that the robots were controlled by sets of time-dependent functions that were interpreted as target joint angles, which limited their capabilities to open-loop locomotion without incorporating sensory input.

2.5 Learning algorithms for neural networks

Artificial Neural Networks (ANNs) are a very widespread architecture for robot controllers in evolutionary robotics. Learning methods for artificial neural networks may be divided into supervised and unsupervised methods. In general, we want the network to produce correct outputs given some inputs. In supervised learning, a set of training data is available, for which we know the correct output for each input. Using a supervised learning algorithm such as error backpropagation, the network is trained on the training dataset, and then exploited on data for which correct outputs are unknown. However, training data is not available for many classes of problems. In the case of robot control, the network must produce outputs that are fed into the robot's motors, and the net effect of these outputs must result in some useful actions. That is, only a high-level description of the desired behavior is available. For the task of locomotion this description may be expressed as the speed of movement. In this situation, unsupervised learning methods are more useful.

In this work we stick with the evolutionary approach to learning because of its generality. There are a number of other algorithms for ANN training, such as Hebbian learning and self-organizing maps, though all of them require some prior assumptions about the ANN. An evolutionary algorithm that changes both the topology of the network and its parameter values requires very little prior knowledge about its structure. In this work the only parts of the network that we fix are the input and output interfaces of the network, which are determined by the concrete robot body morphology.

2.6 Gait learning

There exist a number of wheeled differential-drive robotic platforms aimed at researchers in the area of artificial intelligence. They are very simple in terms of motion control. There also exist a number of legged robot designs, as well as modular reconfigurable robotic designs. The latter two groups of robots can potentially be more useful in field conditions with irregular terrain, but the locomotion control problem for them is much harder. Legged robots must maintain balance, perform efficient gaits and adapt to the current terrain. Modular robots present an even bigger challenge since their morphologies are not fixed and might be unknown at design time. In this section, an overview of several classes of controller architectures and learning methods is presented.

2.6.1 Central Pattern Generators

There has been a substantial amount of work investigating robot locomotion control by mimicking neural structures found in living organisms, called Central Pattern Generators (CPGs). These structures produce rhythmic output signals without any rhythmic input from sensory feedback. In robotics, a CPG is usually modelled as a system of first-order differential equations [17, 26, 40]. These equations form a feedback loop that produces rhythmic output with properties dependent on external non-rhythmic inputs. The solutions of the equations provide target positions to the robot's actuators.

Ijspeert et al. [17] used a model for biological CPGs to design a locomotion controller for a simulated salamander. They modelled the CPGs using a neural network that consisted of identical groups of interconnected neurons for each body segment of the salamander, called segmental oscillators. The neurons themselves were modelled as leaky integrators that obeyed the following system of differential equations:

$$\dot{\xi}_+ = \frac{1}{\tau_D}\left(\sum_{i\in\Psi_+} u_i w_i - \xi_+\right) \tag{2.1a}$$

$$\dot{\xi}_- = \frac{1}{\tau_D}\left(\sum_{i\in\Psi_-} u_i w_i - \xi_-\right) \tag{2.1b}$$

$$\dot{v} = \frac{1}{\tau_A}\,(u - v) \tag{2.1c}$$

$$u = \begin{cases} 1 - e^{(\Theta-\xi_+)\Gamma} - \xi_- - \mu v, & \text{if } u > 0 \\ 0, & \text{if } u \le 0 \end{cases} \tag{2.1d}$$

Here $w_i$ are coupling weights, $\Psi_+$ and $\Psi_-$ are the sets of excitatory and inhibitory neurons that are connected to this neuron, and $\tau_D$, $\tau_A$, $\mu$, $\Theta$ and $\Gamma$ are constant parameters of the neuron.
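To make the dynamics concrete, the sketch below advances one such leaky-integrator neuron by a forward-Euler step. It is a minimal illustration of equations 2.1a-2.1d, not the authors' implementation; the function name and all parameter defaults are placeholders.

```python
import numpy as np

# Forward-Euler step for a single leaky-integrator neuron (Eqs. 2.1a-2.1d).
# Parameter defaults are illustrative placeholders.

def step_neuron(xi_p, xi_m, v, u_exc, w_exc, u_inh, w_inh,
                tau_d=0.1, tau_a=0.5, mu=1.0, theta=0.0, gamma=1.0, dt=0.01):
    """Advance the neuron state (xi_p, xi_m, v) by one time step dt.
    u_exc/w_exc and u_inh/w_inh are arrays of presynaptic outputs and
    weights for the excitatory and inhibitory inputs (the sets Psi+, Psi-)."""
    xi_p += dt * (np.dot(u_exc, w_exc) - xi_p) / tau_d      # Eq. 2.1a
    xi_m += dt * (np.dot(u_inh, w_inh) - xi_m) / tau_d      # Eq. 2.1b
    u = 1.0 - np.exp((theta - xi_p) * gamma) - xi_m - mu * v
    u = max(u, 0.0)                                         # Eq. 2.1d
    v += dt * (u - v) / tau_a                               # Eq. 2.1c
    return xi_p, xi_m, v, u
```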

Such models of computational processes that involve interconnected networks of neurons are sometimes called connectionist models. The authors generated the neural networks using the idea of staged evolution: complex control structures evolve incrementally using simpler elements evolved at the previous stage. They first evolved a single CPG composed of multiple interconnected neurons, and then evolved the connections between different CPGs to produce gait in the robot. Their controller was capable of swimming and walking and could switch between those states smoothly depending on the level of the input signal.

In another work by Ijspeert et al. [22], they modelled the CPGs using a set of coupled nonlinear oscillators, where the i-th oscillator is expressed with the following system of differential equations:

$$\tau \dot{v}_i = -\alpha\,\frac{x_i^2 + v_i^2 - E_i}{E_i}\,v_i - x_i + \sum_j \left(a_{ij} x_j + b_{ij} v_j\right) + \sum_j c_{ij} s_j \tag{2.2a}$$

$$\tau \dot{x}_i = v_i \tag{2.2b}$$

Here $a_{ij}$ and $b_{ij}$ are coupling weights, $s_j$ and $c_{ij}$ are sensory inputs and sensor weights respectively, and $\alpha$, $\tau$ and $E_i$ are positive constants. This system demonstrates limit cycle behavior: it converges to stable harmonic oscillations with amplitude $\sqrt{E_i}$ and period $2\pi\tau$ from any starting conditions.

Marbach and Ijspeert [31] later explicitly incorporated phase differences into the system of coupled nonlinear oscillators (2.2) and used this configuration for online gait learning in a simulated modular robot. The parameters of the oscillators were optimized using Powell's method. The authors showed that the robots were capable of adapting their gaits to motor failures.


In [21], the CPGs were implemented as a system of coupled oscillators governed by a system of second-order differential equations, where the phase offsets between oscillators were again explicitly included into the equations:

$$\dot{\theta}_i = 2\pi v_i + \sum_j w_{ij} \sin(\theta_j - \theta_i - \phi_{ij}) \tag{2.3a}$$

$$\ddot{r}_i = a_i \left(\frac{a_i}{4}\,(R_i - r_i) - \dot{r}_i\right) \tag{2.3b}$$

$$x_i = r_i\,(1 + \cos\theta_i) \tag{2.3c}$$

Here $a_i$, $R_i$ and $v_i$ are constant parameters of the oscillator, $\phi_{ij}$ is the phase offset between the $i$-th and $j$-th oscillators, and $w_{ij}$ are coupling weights.
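A small simulation illustrates how a chain of such phase oscillators settles into coordinated rhythmic output. This is a sketch of equations 2.3a-2.3c only; the frequencies, amplitudes, coupling weights and phase lags are illustrative choices, not values from [21].

```python
import numpy as np

# Euler integration of the coupled phase oscillators of Eq. 2.3 for a
# chain of n oscillators. All parameter values below are illustrative.

def simulate_cpg(n=4, steps=5000, dt=0.001):
    theta = np.zeros(n)           # oscillator phases
    r = np.zeros(n)               # amplitudes
    dr = np.zeros(n)              # amplitude derivatives
    v = np.full(n, 1.0)           # intrinsic frequencies (Hz)
    R = np.full(n, 0.5)           # target amplitudes
    a = np.full(n, 20.0)          # amplitude convergence rates
    # couple each oscillator to its chain neighbours with a fixed phase lag
    w = np.zeros((n, n))
    phi = np.zeros((n, n))
    for i in range(n - 1):
        w[i, i + 1] = w[i + 1, i] = 4.0
        phi[i + 1, i] = np.pi / 4      # oscillator i+1 lags oscillator i
        phi[i, i + 1] = -np.pi / 4
    outputs = []
    for _ in range(steps):
        # element [i, j] of the argument is theta_j - theta_i - phi_ij
        coupling = (w * np.sin(theta[None, :] - theta[:, None] - phi)).sum(axis=1)
        theta += dt * (2 * np.pi * v + coupling)           # Eq. 2.3a
        ddr = a * (a / 4 * (R - r) - dr)                   # Eq. 2.3b
        dr += dt * ddr
        r += dt * dr
        outputs.append(r * (1 + np.cos(theta)))            # Eq. 2.3c
    return np.array(outputs)      # per-joint setpoints over time
```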

A comprehensive review of applications of Central Pattern Generators can be found in [20]. The key features of CPGs noted by the author are: the stability of the oscillations to perturbations, the distributed nature of CPG controllers (which is useful for modular and/or reconfigurable robots), and the ability to modulate the CPG behavior by a few non-rhythmic driving signals.

The last property is especially interesting: it means that with a properly implemented CPG controller, one can change locomotion speed and direction and even cause a transition to a different type of gait by varying the drive signals. Thus, a CPG locomotion controller can serve as an abstraction layer between low-level control of each individual actuator and high-level control of gait properties. For example, in the work of Ijspeert [19] a controller for a simulated salamander-like robot was developed that was able to produce two distinct types of locomotion – swimming and trotting – and switch between them smoothly in response to a change in the level of the constant input signal.

Data from various sensors can be fed back into a CPG controller to produce a closed control loop that is able to adapt to the current situation and produce stable gaits on uneven terrain.

Kimura et al. [26] designed a locomotion controller for a quadruped using a combination of CPGs and a reflex mechanism that was able to produce an adaptive gait on irregular terrain. The reflexes responded to signals from contact sensors on the robot's feet, which prevented the robot from stumbling when its leg met an obstacle on uneven terrain.

Buchli et al. [9] showed that a CPG-based controller can produce gaits for a robot with under-actuated legs (one of two joints was actuated by a motor, and the other one was spring-loaded). The controller inputs included data from an acceleration sensor and joint angular position sensors. The authors showed that the controller was able to account for the dynamics of the under-actuated robot body and to adapt to its resonant frequency.

In [18] the authors designed a CPG-based gait for a quadruped robot that took into account sensory inputs from gyroscopes and cameras to remain stable on rough terrain. They first designed a static, non-responsive gait using CPGs, and then wired sensor inputs to the parameters of the CPGs to make the gait dynamically adapt to the terrain. They used Particle Swarm Optimization to learn the mapping weights from the inputs to the CPG parameters.


As was already mentioned, CPG controllers can be implemented in a distributed way and used in modular robots. Sproewitz et al. [43] addressed the problem of CPG-based gait learning for reconfigurable modular robots with arbitrary morphologies. They used real-life robots made up of multiple identical connected modules with 1 degree of freedom per module. Their CPG model was implemented as a network of coupled phase oscillators with one oscillator per module, communicating with each other via Bluetooth. The parameters of the CPG network were optimized in real time using the Powell optimization algorithm.

2.6.2 HyperNEAT

Another approach to locomotion learning is to use Compositional Pattern Producing Networks (CPPNs) to encode neural network controllers. A CPPN is a directed graph that takes some input values and produces output values. Each node of a CPPN is a simple math function, and these functions are connected by weighted directed edges. Thus, outputs of a CPPN are complex (possibly nested) functions of its inputs.

A common way to build a neural network using a CPPN is to query the CPPN for weights of neural connections. Consider a CPPN with 4 inputs and 2 outputs and a simple neural network with 3 layers of neurons (input, hidden and output layers). A pair of Cartesian coordinates of a hidden neuron and an input neuron can be fed into the CPPN, and the two output values can be treated as weights of the input-to-hidden connection and the hidden-to-output connection. By repeating this procedure for every pair of input and hidden neurons, one can determine all the weights in the neural network. If we view the CPPN as a genotype, and the neural network it produces as the corresponding phenotype, we can say that the CPPN encodes the neural network. This type of encoding is called generative encoding, because the genotype is used as a rule to generate the phenotype (as opposed to direct encoding, where a genotype is a direct description of its phenotype).
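The sketch below illustrates this querying scheme. The CPPN body here is a hypothetical fixed function of four coordinates; in HyperNEAT it would itself be a network evolved by NEAT, and the neuron coordinates and layer sizes are illustrative.

```python
import math

# A sketch of querying a CPPN for connection weights, as described above.
# The CPPN here is a stand-in function; in HyperNEAT it is evolved by NEAT.

def cppn(x1, y1, x2, y2):
    """Two outputs for a pair of neuron coordinates."""
    w_a = math.sin(x1 * x2) * math.exp(-(y1 - y2) ** 2)   # input-to-hidden
    w_b = math.tanh(x1 + y1 - x2 - y2)                    # hidden-to-output
    return w_a, w_b

input_coords = [(-1.0, -1.0), (0.0, -1.0), (1.0, -1.0)]
hidden_coords = [(-0.5, 0.0), (0.5, 0.0)]

# Query the CPPN once per (input, hidden) pair; following the description
# above, the two outputs fill both weight matrices of the 3-layer network.
w_in_hidden, w_hidden_out = {}, {}
for i, (xi, yi) in enumerate(input_coords):
    for h, (xh, yh) in enumerate(hidden_coords):
        w_in_hidden[(i, h)], w_hidden_out[(h, i)] = cppn(xi, yi, xh, yh)
```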

The CPPNs themselves can be considered a form of neural networks, therefore they can be generated by an evolutionary algorithm, such as NEAT. The algorithm that uses NEAT to evolve CPPNs and then generates neural networks is called HyperNEAT and was first introduced in [44].

Clune et al. [11] applied the HyperNEAT algorithm to the gait learning problem of a simulated 4-legged robot. They used a neural network controller with a single hidden layer and with inputs from joint angular positions, touch sensors and torso orientation sensors. They also compared the performance of HyperNEAT with that of an evolutionary algorithm that acted directly upon fixed-topology neural networks. The HyperNEAT algorithm performed much better, which the authors attribute to two reasons. Firstly, HyperNEAT tends to produce modular neural networks, and can reuse neural modules in such a way that all legs are controlled by identical or similar modules. Thus, a mutation of one module will affect other modules, which will change the behavior of all legs at once. Secondly, it can exploit the symmetry of the problem by producing symmetric neural networks.

In [54] the authors applied HyperNEAT to learn a gait for a real-life 4-legged robot with 9 degrees of freedom, and compared it to a hand-designed parametrized gait with learned parameter values.


This work is notable because it describes experiments that are highly embodied. All the computations related to learning were done within the robot's on-board hardware, and evaluations of candidate gaits were done in the real world. Because the experiments were done in real time, the number of evaluations was limited to only 180 per run. Despite that fact, the HyperNEAT algorithm outperformed parametrized gait learning. In a later study [30] they first learned gaits in simulation and then transferred them to the real robot, which allowed the algorithm to perform many more evaluations and discover better gaits.

Haasdijk et al. [15] used HyperNEAT to evolve locomotion in a group of simple identical modules that were physically connected to each other to form a 4-legged robot. Each module was controlled by its own independent copy of the controller, which meant that they had to demonstrate cooperative behavior in order to move. Furthermore, different modules had to play different roles in the body (e.g. vertebrae, hips, feet), so they had to be capable of specialization. To provide specialization capabilities, the authors used the same CPPN (evolved with NEAT) to produce different controllers for different modules by feeding the CPPN the location of the module inside the robot’s body.

Since CPPNs can produce regular repeating patterns in multi-dimensional space, they can also produce time-dependent signals that can be fed directly into the robot actuators. Morse et al. [34] exploited this idea to evolve gait controllers where the activation value of a neuron was calculated by feeding its spatial coordinates and a time coordinate into a CPPN at each time step. The CPPNs themselves were evolved by NEAT, as usual.

2.6.3 Other methods

Some of the aforementioned works use HyperNEAT to evolve gait controllers for artificial creatures. In that approach, the NEAT algorithm is used to evolve pattern-producing networks, which are then used to generate the neural networks. A drawback of this approach is that the user is required to specify the topology of the underlying neural network. In [11, 54] a network with a single hidden layer was used that had no recurrent connections and no hidden-to-hidden connections, which could limit the complexity of evolved behaviors. Another approach to neural evolution is to use NEAT directly to generate neural networks and use them as controllers. This approach does not require the user to specify the topology of the neural network, because the topology evolves automatically, starting from the simplest one.

Allen et al. [3] used NEAT to automatically produce gait controllers for animated humanoid characters in video games, and Tibermacine et al. [46] did the same for simulated virtual creatures made of simple 3-d shapes.

In [47] the authors applied NEAT to generate locomotion controllers for cellular creatures in a 2-d cellular world. Here a creature was represented by a group of cells that could form multiple disconnected clusters, and the movement of each cell was controlled individually by its own neural network.

Inden et al. [23] extended the traditional NEAT algorithm to handle large neural networks. They argued that some problems could be solved by large neural networks that have a regular repetitive structure (i.e., consist of a large number of identical or similar sub-networks). One such problem is legged robot locomotion. Indeed, it is sometimes enough to evolve a controller for one leg and replicate it for the other legs with small changes [49]. This led them to develop an extension of NEAT called NEATfields, which evolves hierarchical neural structures. On the top level, such a structure consists of fields, which are treated like individual neurons by evolution. Each field is a sheet of similar interconnected sub-networks, and each sub-network consists of individual neurons. This architecture makes it possible to encode large regular networks with relatively small genotypes that can be efficiently synthesized using NEAT.

It has been shown that modular gait controllers have an advantage in solving the problem of multilegged locomotion. In [49] the authors constructed a gait controller for a 4-legged robot in the form of a neural network that consisted of 4 identical modules, one for each leg. They used an evolutionary algorithm to learn the parameters of a single module and then constructed the entire network by duplicating that module with permutations of its inputs. This controller design was compared with a non-modular neural network where all parameters were learned independently, and was found to be superior. This was to be expected, since the 4-legged robot body has symmetry, and therefore can be controlled efficiently by a neural network with the same kind of symmetry. The same conclusions can be drawn from the works that use HyperNEAT [11, 54, 30].

Moriguchi et al. [33] described a way to learn a forward model for robot locomotion. Forward models predict the outcomes of motor commands, which allows a robot to plan its movement. The learning method they propose is Symbolic Regression, which uses an evolutionary algorithm operating in the space of mathematical expressions.

While in the paper the authors applied their method to a simulated underpowered double pendulum problem, it can be applied to other kinds of motion planning problems, including locomotion.

One of the challenges of using evolution for gait learning in real hardware is measuring the performance of controllers. Authors often use methods that are not viable outside of the laboratory. For example, in [54] the speed of the robot's locomotion was measured using an overhead camera that tracked the position of an infrared LED mounted on the robot, which is only possible in laboratory conditions.

The work of Wawrzynaski et al. [50] addresses the problem of measuring the speed of movement of a legged robot in field conditions using only readouts from on-board sensors of the robot, which include an Inertial Measurement Unit (IMU), servomotor position sensors and feet contact sensors.


Chapter 3

Methods

This chapter describes two methods for robot locomotion learning that were used and compared in this work. In both cases the robots were controlled by recurrent neural networks where input nodes were connected to the robot's sensors, and output nodes to the motors. In order to fully describe a neural network, one must describe its topology (the numbers of neurons of different types and their connectivity) and the properties of all the neural structures (weights of connections, activation behavior of neurons). Once we have a network with a given topology, we can optimize its numerical parameters to solve a given problem. But the topology itself can also be optimized.

The first method develops a locomotion controller network by searching the space of possible topologies and numerical parameters. We use an evolutionary algorithm for evolving neural networks called Neural Evolution through Augmenting Topologies (NEAT). This algorithm is able to optimize the numerical parameters of a given topology, as well as augment the topology itself with new structures if this proves to increase problem-solving performance.

The second method involves optimizing only the parameters of a given network topology that was hand-designed with locomotion control in mind. In Chapter 2 we described some works on using Central Pattern Generators for robotic locomotion control. In particular, some authors modelled CPGs using networks of neurons with specific activation behavior (the so-called connectionist models). In this study we use some ideas from those works to design our own locomotion control neural network that possesses CPG-like qualities. We then optimize its numerical parameters (keeping the topology fixed), and compare the results to the first method.

3.1 Principles of NEAT

NEAT [45] is an evolutionary algorithm specifically designed to work with neural networks. NEAT learns both the network topology and the parameters of its neurons and connections. It starts with a neural network of minimal complexity and develops augmentations incrementally through structural mutations. New individuals are created by crossover between two existing individuals, and then undergo stochastic mutations. Mutations can augment the topology as well as change numerical parameters of the network.

The three core features of NEAT are: starting with the simplest networks and augmenting them, matching genes of the same origin, and protection of innovations. These features are described in detail below.


3.1.1 Genetic encoding

NEAT uses direct encoding of the network structure. A genotype consists of a list of neuron genes and a list of connection genes. A neuron gene contains all relevant information about its corresponding neuron, while a connection gene contains the weight of its connection as well as references to two neuron genes.
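A minimal sketch of this encoding in Python (field names are illustrative placeholders; the actual genotype used in this work is described in Chapter 4):

```python
from dataclasses import dataclass, field

# Illustrative sketch of NEAT's direct encoding as described above.

@dataclass
class NeuronGene:
    mark: int                 # historical mark, unique and immutable
    layer: str                # "input", "hidden", or "output"
    neuron_type: str          # e.g. "simple", "sigmoid", "oscillator"
    params: dict = field(default_factory=dict)   # bias, gain, ...

@dataclass
class ConnectionGene:
    mark: int                 # historical mark
    src: int                  # mark of the source neuron gene
    dst: int                  # mark of the destination neuron gene
    weight: float

@dataclass
class Genotype:
    neurons: list             # list of NeuronGene
    connections: list         # list of ConnectionGene
```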

3.1.2 Mutation

Random changes are introduced into the networks by means of random mutations, which can be of three kinds. Structural mutations either add a new connection between two existing neurons, or add a new neuron in the middle of an existing connection. Weight mutations change the weights of existing neural connections. Parameter mutations change the values of parameters of existing neurons. Obviously, only structural mutations add new genes to the genotype, whereas the other kinds of mutations act on existing genes.
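Continuing the genotype sketch above, the three kinds of mutation could look as follows. This is again a hedged illustration reusing the `NeuronGene`/`ConnectionGene` dataclasses; `_marks` is a hypothetical counter issuing fresh historical marks (introduced in the next subsection), and splitting a connection by removal rather than disabling is a simplification.

```python
import random
from itertools import count

_marks = count(1000)   # hypothetical counter issuing fresh historical marks

def mutate_add_connection(g):
    """Structural mutation: connect two existing neurons."""
    src, dst = random.sample(g.neurons, 2)
    g.connections.append(ConnectionGene(mark=next(_marks), src=src.mark,
                                        dst=dst.mark,
                                        weight=random.gauss(0.0, 1.0)))

def mutate_add_neuron(g):
    """Structural mutation: split an existing connection with a new neuron."""
    old = random.choice(g.connections)
    new = NeuronGene(mark=next(_marks), layer="hidden", neuron_type="sigmoid")
    g.neurons.append(new)
    g.connections.remove(old)
    g.connections.append(ConnectionGene(next(_marks), old.src, new.mark, 1.0))
    g.connections.append(ConnectionGene(next(_marks), new.mark, old.dst,
                                        old.weight))

def mutate_weight(g, sigma=0.5):
    """Weight mutation: perturb the weight of a random connection."""
    random.choice(g.connections).weight += random.gauss(0.0, sigma)
```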

3.1.3 Tracking homologous genes

If during crossover genes are exchanged randomly without regard to what parts of the network they express, different kinds of imperfections can be introduced into the network, such as: duplicate neurons and connections; neurons that are not attached to the network; connections that are not bound to any neurons. In addition, if a useful complex structure expressed by multiple genes happens to develop in one of the parents, it will likely be destroyed by such indiscriminate crossover. Thus it is important to keep track of what structures the genes represent. In biology, homologous genes are genes that express the same trait in organisms. Homologous gene matching makes it possible to perform genetic crossover in a meaningful way, without introducing imperfections and losing genetic information. This problem is resolved in NEAT by means of gene origin tracking. When a new gene is added to the genotype as a result of a structural mutation, it receives a unique integer number called a historical mark. This mark is assigned to the gene once at creation and never changes. The fact that these marks are unique means that if two genotypes both happen to have a gene with the same mark, these genes are of the same origin (they started out as the same gene and then were copied by the reproduction process). This means that they necessarily express the same structure in the network, even though the numerical parameters of that structure may be different because of the different mutations they went through. Simply put, two neuron genes with equal historical marks always express the same neuron, although with potentially different parameter values; similarly, two connection genes with equal historical marks always express the same connection, although with potentially different weights.

3.1.4 Crossover

When two genotypes are crossed over, the genes with matching historical marks (homologous genes) are paired with each other. The child then inherits only one gene from the pair. If a gene does not have a matching pair, the decision whether to inherit it or not must be made using a consistent rule (that is, non-randomly). For example, in our implementation the unpaired genes are inherited if they come from the more fit of the two parents. The consistency of the rule allows the child to inherit complex structures represented by more than one gene, which would otherwise be broken into components by random inheritance.
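A sketch of this crossover rule over the gene lists introduced earlier. The random choice for homologous pairs is one reasonable reading of "inherits only one gene from the pair"; the exact implementation used in this work is described in Chapter 4.

```python
import random

# Crossover with historical-mark alignment: matched genes are inherited
# from either parent at random, unmatched genes only from the fitter one.

def crossover(fit_parent_genes, weak_parent_genes):
    """Cross two lists of genes keyed by historical mark."""
    fit = {g.mark: g for g in fit_parent_genes}
    weak = {g.mark: g for g in weak_parent_genes}
    child = []
    for mark, gene in fit.items():
        if mark in weak:
            child.append(random.choice([gene, weak[mark]]))  # homologous pair
        else:
            child.append(gene)   # unpaired: inherit from the fitter parent
    return child
```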

3.1.5 Innovation protection

Another important feature of NEAT is a mechanism for protecting newly introduced neural structures from dying out. When a new structure is introduced, it most likely will not immediately increase the fitness of the individual. It takes time to adjust the new weights and parameter values to be beneficial. Therefore, without the protection mechanism new structures would die out due to poor performance, and the neural network would never develop augmentations.

In NEAT, new structures are protected by speciation. Individuals are grouped into species based on their genetic similarity, and are forced to share fitness among all individuals of the same species. Thus, a species cannot afford to become too large even if it consists of well-performing organisms. Therefore, any single species is unlikely to take over the entire population. This mechanism also allows the evolutionary process to better avoid local maxima. It effectively penalizes organisms for being similar to others, and encourages the evolutionary process to explore different solutions rather than converge to a group of similar solutions.
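A minimal sketch of speciation with fitness sharing. The representative-based grouping and the `.fitness` attribute are illustrative conventions, not the exact scheme used in this work (described in section 4.5.7).

```python
# Group genotypes into species by genetic distance to a representative,
# then divide each individual's raw fitness by its species size.

def speciate_and_share(population, distance, threshold):
    species = []
    for g in population:
        for s in species:
            if distance(g, s[0]) < threshold:   # compare to representative
                s.append(g)
                break
        else:
            species.append([g])                 # no species fits: start one
    # fitness sharing: large species cannot dominate the population
    return [(g, g.fitness / len(s)) for s in species for g in s]
```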

Similarity metric

To implement speciation, we need some quantitative way of measuring similarity between two neural networks. Luckily, we already have a mechanism for tracking gene origins (historical marks), which can be used to devise a similarity metric.

The fewer matching genes two genotypes have, the less similar they are. Thus, we can calculate a measure of dissimilarity of two genotypes by counting the number of disjoint and excess genes. A gene of genotype A is disjoint relative to genotype B if its historical mark lies within the range of historical marks of B, but B does not have a matching gene. A gene in A is excess relative to B if its mark lies outside the range of historical marks of B (see Figure 3.1 for a visual representation).

Figure 3.1: Genes of two genotypes aligned by their historical marks
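Counting these two gene classes is straightforward once genes carry historical marks; a sketch under the same illustrative conventions as above (the full metric used in this work may also weight parameter differences):

```python
# Count disjoint and excess genes of genotype A relative to genotype B.

def dissimilarity(genes_a, genes_b):
    marks_a = {g.mark for g in genes_a}
    marks_b = {g.mark for g in genes_b}
    max_b = max(marks_b)
    unmatched = marks_a - marks_b
    excess = {m for m in unmatched if m > max_b}    # outside B's mark range
    disjoint = unmatched - excess                   # inside B's mark range
    return len(disjoint), len(excess)
```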

As we can see, NEAT has three important features:

1. Augmentations of the network structure: unlike other ANN learning techniques, NEAT changes the topology of the network.


2. Intelligent crossover based on historical origin of genes: the algorithm makes sure that crossover happens in a meaningful way and does not destroy genetic information.

3. Innovation protection through fitness sharing: the algorithm encourages a variety of solutions.

3.2 Central Pattern Generator controllers

Locomotion controllers based on Central Pattern Generators can be implemented in a variety of ways, most of which involve finding solutions to systems of coupled differential equations by numerical integration. A good review of research on CPG-based robotic locomotion control can be found in [20]. In many cases CPG-based controllers are distributed, meaning that the controllers consist of homogeneous modules that are located throughout the body of a robot and have the same connectivity as the body. This makes such controllers useful for reconfigurable robot morphologies, which is essential for our purposes. We implement our own version of a CPG controller using a neural network made of a special kind of neurons. In Chapter 4 a more detailed description of our implementation of CPG-based controllers will be given.


Chapter 4

Platform description

For this study we carried out all locomotion learning experiments in a computer simulation. We used the Gazebo robotic simulation software [28], which provides rigid-body 3-d physics simulation using the Open Dynamics Engine (ODE, http://www.ode.org), though it can also work with other physics solvers, such as Bullet Physics (http://bulletphysics.org/wordpress) and the Dynamic Animation and Robotics Toolkit (DART, https://dartsim.github.io). For managing the simulation we used the Revolve robotic simulation framework (https://github.com/ElteHupkes/revolve), which provides a convenient Python interface to Gazebo and contains tools for constructing simulated modular robots based on the RoboGen platform.

4.1 Robot components

RoboGen is a robotic platform developed for evolutionary robotics applications. It contains a set of specifications for mechanical parts that can be used to construct modular robots. The structural components of these parts are 3d-printable, while the electronic components (servos, sensors, etc.) can be purchased from online stores.

The robots used in this study are constructed from a set of body part types that are based on those available in RoboGen. Most of the part types that we use are identical to their RoboGen counterparts, but some are modified to suit our purposes better (see Figure 4.1).

Figure 4.1: RoboGen components: (a) core component, (b) fixed brick, (c) active hinge, (d) passive hinge.

1. Core component.

This is the part that hosts the robot's brain, battery, and an Inertial Measurement Unit (IMU). The IMU measures accelerations along 3 axes, and rotation rates around 3 axes. We increased the size of this component compared to the original specification to accommodate the Raspberry Pi board that we use as the brain in our hardware experiments. This component is completely 3d-printable.

2. Fixed brick.

This is a structural component that does not have any function other than to connect and space other parts of the robot. This component is completely 3d-printable.

3. Hinge joint.

This is a two-part joint with a single pivot axis. It can either be powered by a servo (active hinge), or be free-rotating (passive hinge). This component is assembled from 3d-printable structural components and a servo motor (or a free-rotating axle).

The body parts can be attached to each other at defined places called slots. A core component and a fixed brick have 4 attachment slots, one on each side of the part excluding the top and bottom sides. A hinge joint has 2 attachment slots. The relative orientation of two adjacent body parts is determined by an angle of rotation around the axis normal to the slot. In practice we only use angles that are multiples of 90°.

4.2 Robot Morphologies

All the experiments with the gait learning algorithm were conducted using 2 robot morphologies, to which we refer as the spider and the snake (see Figure 4.2).

Figure 4.2: Screenshots of the test robot morphologies in the simulation software: (a) spider, (b) snake.

The spider is a quadruped with 90° symmetry. It consists of the core component with a leg attached to each of its 4 sides. Each leg consists of 2 hinge joints and 2 bricks. The joints are positioned so that their axes are at 90° to each other. This morphology has 8 degrees of freedom.

The snake consists of the core component with a tail attached to its side. The tail consists of 4 hinge joints with alternating orientations, spaced by 4 bricks. These morphologies were chosen partially because they have real-life counterparts in the laboratory (see Figure 4.3; the snake morphology is not shown), so the gait learning algorithm could be tested in hardware. Despite the fact that these morphologies were hand-designed rather than generated by an evolutionary process, they are sufficiently different from each other to assess the ability of the gait learning algorithm to generalize to arbitrary morphologies. Of course, to be completely sure of the generalization power of the algorithm, we would need to integrate the learning process into the morphological evolution and test it with new, unpredictable morphologies, which is beyond the scope of this work.

Figure 4.3: Real-life robots in the laboratory.
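With the hypothetical BodyPart sketch from Section 4.1, the spider morphology can be assembled as follows. The exact ordering of hinges and bricks within a leg is an assumption made for illustration; only the part counts and the perpendicular hinge axes are taken from the description above.

def make_leg() -> BodyPart:
    # One leg: 2 active hinges and 2 bricks, with the hinge axes at 90 degrees.
    hinge1 = BodyPart("active_hinge", n_slots=2)
    brick1 = BodyPart("brick", n_slots=4)
    hinge2 = BodyPart("active_hinge", n_slots=2)
    brick2 = BodyPart("brick", n_slots=4)
    hinge1.attach(1, brick1, orientation=0)
    brick1.attach(1, hinge2, orientation=90)  # rotate so the two hinge axes are perpendicular
    hinge2.attach(1, brick2, orientation=0)
    return hinge1

spider = BodyPart("core", n_slots=4)
for side in range(4):                         # one leg on each of the 4 sides
    spider.attach(side, make_leg(), orientation=0)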

4.3 Neural network

This section provides a detailed description of the robot controllers used in the experiments. The robots are controlled by artificial neural networks. An artificial neural network is a directed graph whose nodes are called neurons and whose directed edges are called connections. We will denote the set of neurons of a neural network as N and the set of its connections as C. A neuron n_i ∈ N can be regarded as a mathematical function that calculates the output value output_i of the neuron from one or more input values. A connection c_ij ∈ C that is directed from the neuron n_j to the neuron n_i has a weight w_ij associated with it. All neurons belong to one of 3 layers: the input layer, the output layer and the hidden layer.

The input layer neurons receive readouts from the robot’s sensors as input values, and pass them unchanged as output values. Each input neuron is connected to only one sensor and therefore receives only one input value.

The neurons in the hidden and the output layers receive their inputs from the incoming connections, then calculate output values (one per neuron) and pass them along the outgoing connections. Each output or hidden neuron can have an unlimited number of outgoing connections that can lead to either output neurons or hidden neurons (including the neuron itself). Additionally, each output neuron must be connected to one actuator in the robot’s body.

A neuron n_i calculates its output value using an activation function f_i^act:

    output_i = f_i^act(input_i)    (4.1)

The input value input_i of the neuron n_i is a weighted sum of the activation values of the neurons that have connections directed at n_i:

    input_i = Σ_{j ∈ inputs_i} w_ij · output_j    (4.2)

Here inputs_i denotes the set of neurons with connections directed at n_i.

In all the experiments that used NEAT, the neural networks consisted of 3 types of neurons with different activation functions. The types of neurons are called Simple, Sigmoid and Oscillator. Each of these types has its own set of parameters. The NEAT algorithm was allowed to augment the network with any of these types of neurons.

Simple neurons

Simple neurons have a linear activation function expressed by the following formula:

    f_i^act = (input_i − bias_i) · gain_i    (4.3)

Here bias_i and gain_i are the parameters of the neuron n_i.

Sigmoid neurons

The activation function of the sigmoid neurons is called the logistic function:

    f_i^act = 1 / (1 + e^(−(input_i − bias_i) · gain_i))    (4.4)

Here bias_i and gain_i are parameters of this particular neuron (see Figure 4.4).

Figure 4.4: Graph of the logistic function with different values of bias and gain.

Oscillator neurons

The oscillator neurons differ from the other types in that their outputs do not depend on the inputs, but rather on the time elapsed since the start of the simulation:

    f_i^act = 0.5 · (1 + A_i · sin((2π / T_i) · (time − T_i · φ_i)))    (4.5)

Here time is the current time of the simulation. This type of neuron has 3 parameters: A_i is the amplitude of the oscillations, T_i is the period and φ_i is the initial phase. The function value oscillates harmonically between 0.5 − 0.5·A_i and 0.5 + 0.5·A_i.
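For concreteness, the three activation functions can be sketched in Python as follows; the function and parameter names are ours, not the actual Revolve implementation.

import math

def simple(input_value, bias, gain):
    # Linear activation, equation 4.3
    return (input_value - bias) * gain

def sigmoid(input_value, bias, gain):
    # Logistic activation, equation 4.4
    return 1.0 / (1.0 + math.exp(-(input_value - bias) * gain))

def oscillator(time, amplitude, period, phase):
    # Time-dependent harmonic output, equation 4.5; ignores neural inputs
    return 0.5 * (1.0 + amplitude * math.sin(2.0 * math.pi / period * (time - period * phase)))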

Each neuron is associated with a certain body component of the robot. This means that the neuron is treated by morphological evolution as if it were physically inside that component. This is done with the full-fledged integration into the Triangle of Life framework in mind: when body parts of parents are inherited by their children during morphological crossover, the neural structures inside those body parts will be inherited as well.

4.4 Actuators

Active hinge joints of the robots are powered by servo motors. The output values of the controller are used as target angular positions for these motors. The motors currently used in the real-life robots are Turnigy XGD-11HMB digital servos, which are claimed to provide torque from 2.2 kg·cm to 2.5 kg·cm. The simulated motors are limited to a maximum torque of 1.8 kg·cm. To reach the supplied target angle, the motors use PID controllers with the proportional coefficient set to 0.5 and the integral and derivative coefficients set to 0.
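With the integral and derivative coefficients at zero, the position control reduces to a purely proportional controller. A sketch of one control step, with illustrative names:

def servo_control_step(target_angle, current_angle, kp=0.5):
    # Proportional control: the output is proportional to the position error.
    error = target_angle - current_angle
    return kp * error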

4.5 NEAT implementation

4.5.1 Genotype

The genetic encoding of the network is implemented as a straightforward description of its structure, with each gene encoding either one neuron or one connection between neurons. The historical marks of genes act as their unique identifiers. The way crossover is done in NEAT guarantees the uniqueness of the gene marks on the individual level (an individual cannot have more than one gene with a given mark), though uniqueness is not guaranteed on the population level, since genes are copied during the reproduction process. It is also guaranteed that two genes with the same mark encode the same structural component of the neural network.
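A minimal sketch of this encoding is given below; the class and field names are illustrative, not the actual implementation.

from dataclasses import dataclass

@dataclass
class NeuronGene:
    mark: int         # historical mark, unique within one genotype
    neuron_type: str  # "Simple", "Sigmoid" or "Oscillator"
    params: dict      # e.g. {"bias": ..., "gain": ...}

@dataclass
class ConnectionGene:
    mark: int         # historical mark
    src_mark: int     # mark of the source neuron gene
    dst_mark: int     # mark of the destination neuron gene
    weight: float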

4.5.2 Initial population

The initial population of controllers is derived from a minimal neural network that follows from the robot morphology and the specifications of its body parts. In the default specification used in Revolve, each type of body part contains a set number of input and/or output neurons; they do not contain any hidden neurons or neural connections. Therefore a neural network consisting only of these specified input and output neurons is the simplest possible controller for a given morphology. However, in the actual gait learning experiments a hidden bias neuron with a sinusoidal activation function was added and connected to the output neurons inside all active hinges of the robot (see Figure 4.5). The activation function of this type of neuron, as explained earlier, does not depend on the input value, but produces a cyclic output that depends on time and the values of the neuron parameters (amplitude, phase offset and frequency). The justification for this decision is that gait is a cyclic process, and oscillating bias neurons would have been added to the network by mutations anyway. By adding this neuron explicitly we bypass an initial stage of evolution in which controllers produce non-cyclic motions that stop shortly after the evaluation begins. Ideally, the cyclic output of the oscillator would be modulated by sensory inputs as it propagates through the network, which would produce locomotion patterns that respond to the sensory information.


Figure 4.5: Minimal starting neural networks for the NEAT experiments: (a) spider, (b) snake.

This being said, when we further speak of the initial, simplest neural networks for given morphologies, we will mean neural networks with a hidden oscillator neuron connected to all output neurons in the active hinges of the robot body, unless stated otherwise.

The initial population of controllers is generated by applying non-structural mutations to the simplest controller. The population size is a fixed parameter of the algorithm. Each new generation can contain a number of new children created by crossover and mutations, and a number of the best-performing parents from the previous generation that are copied into the next generation unchanged. This practice of carrying a number of individuals over into the next generation is commonly known in evolutionary computation as elitism. The percentage of the population occupied by the “elite” individuals is also a parameter of the algorithm. All the non-elite candidates are deleted as soon as a new generation is produced. A sketch of this replacement scheme is given below.
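This sketch assumes a population of (genotype, fitness) pairs and a make_child function that performs selection and reproduction; both names are illustrative.

def next_generation(population, elite_fraction, make_child):
    # Sort by fitness, keep the best individuals unchanged (elitism),
    # fill the rest of the new generation with children.
    population = sorted(population, key=lambda pair: pair[1], reverse=True)
    n_elite = int(len(population) * elite_fraction)
    elite = [genotype for genotype, _ in population[:n_elite]]
    children = [make_child(population) for _ in range(len(population) - n_elite)]
    return elite + children  # the non-elite candidates are discarded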

4.5.3 Tournament selection

A tournament-type selection is used to select candidates for reproduction. This is done by randomly selecting a subset of the population without repetition, and then selecting the two best individuals from that subset. These two individuals then become the parents of one child for the next generation. The tournament selection is repeated once for every child that needs to be created.

The size of the subset (the tournament size) can vary from 2 to the total size of the population. The tournament size affects the selection pressure. The minimal tournament size imposes no selection pressure whatsoever, because any 2 candidates have equal chances to be selected regardless of their fitness values. Conversely, setting the tournament size equal to the size of the entire population means that the two absolute best individuals will be selected to produce every child for the next generation. We set the tournament size as a percentage of the total population size.
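A sketch of this selection scheme, again assuming a population of (genotype, fitness) pairs and illustrative names:

import random

def tournament_select(population, tournament_size):
    # Sample a subset without repetition and return its two fittest members.
    contestants = random.sample(population, tournament_size)
    contestants.sort(key=lambda pair: pair[1], reverse=True)
    return contestants[0][0], contestants[1][0]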

4.5.4 Mutations

As mentioned in Chapter 3, mutations can either add new structural components or change parameter values of the existing components. The neuron addition mutation is done by randomly selecting an existing connection and splitting it into 2 new connections with a new neuron in the middle. The old connection is removed. One of the 2 new connections receives the weight of the old connection, while the other receives a weight of 1. The connection addition mutation chooses a random pair of neurons between which no connection exists, and adds a new connection between them. The weight of the new connection is drawn from a normal distribution with zero mean and a user-specified variance. It is worth mentioning that since the connections are directed, the ordering of the pair of neurons matters: a new connection from neuron A to neuron B can be added even if the connection from B to A already exists.

The numerical parameters of the neurons are mutated by adding a normally distributed random variable to the current value. The random variable is drawn from a normal distribution with zero mean. The neuron parameters have upper and lower bounds, and the variance of the random variable is specified as a fraction of the range of the parameter. The new value of the parameter is checked against the bounds. This mutation is applied to only one parameter of the neuron, which is chosen with equal probability from the set of parameters (e.g. for a sigmoid neuron either gain or bias is mutated, each with 50% chance).

The connection weights are mutated in the same way as the neuron parameters, except that the variance for the weights is specified in absolute values, since the weights are unbounded. The non-numerical properties of the neurons, such as the type of the activation function, are not mutated.

The probability of structural mutations determines whether a structural mutation is applied to a genotype. At most one structural mutation can be applied to a genotype regardless of its size, which means the genotypes can grow at most linearly with time. The probability of parameter mutations, on the other hand, determines whether a mutation is applied to each single gene in the genotype, which means that in principle all genes in a genotype can be mutated, and the probability of any single gene undergoing mutation does not depend on the number of genes in the genotype. A sketch of the bounded parameter mutation is given below.
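The names in this sketch are illustrative, and the actual implementation may differ in details:

import math
import random

def mutate_parameter(params, bounds, variance_fraction):
    # Pick one parameter with equal probability, perturb it with zero-mean
    # Gaussian noise whose variance is a fraction of the parameter's range,
    # and clamp the result to the bounds.
    name = random.choice(list(params))
    low, high = bounds[name]
    sigma = math.sqrt(variance_fraction * (high - low))
    params[name] = min(high, max(low, params[name] + random.gauss(0.0, sigma)))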

4.5.5 Crossover

The crossover of two genotypes is done by constructing a list of pairs of their genes with equal historical marks. The child genotype inherits one gene from each pair with 50% chance. For genes that have no pair in the other parent, all such genes from the fitter of the two parents are inherited; the unpaired genes of the less fit parent are not inherited.
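A sketch of this crossover, representing each parent's genotype as a mapping from historical mark to gene (names are illustrative):

import random

def crossover(fitter_genes, weaker_genes):
    # Matching genes (same historical mark) come from either parent with
    # equal probability; unmatched genes come only from the fitter parent.
    child = {}
    for mark, gene in fitter_genes.items():
        if mark in weaker_genes and random.random() < 0.5:
            child[mark] = weaker_genes[mark]
        else:
            child[mark] = gene
    return child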

4.5.6 Controller evaluation

All the controllers in the population are evaluated for a set period of time. The evaluation is done by activating a controller and measuring the linear horizontal displacement of the robot from its starting position at the end of the evaluation period. The fitness value is the displacement divided by the length of the evaluation period:

    fitness = √(Δx² + Δy²) / Δt    (4.6)

Thus, fitness is effectively the speed of linear locomotion of the robot averaged over the evaluation period.


4.5.7 Protecting innovations by means of fitness sharing

As explained earlier, NEAT has a speciation mechanism to protect innovations. It is implemented by adjusting the fitness value of each individual to account for its similarity to others. After the fitness value f_i of the individual i is evaluated, the shared fitness value f'_i is calculated by comparing i to every other individual in the population:

    f'_i = f_i / Σ_{j=1}^{n} sh(δ(i, j))    (4.7)

Here n is the population size, δ is the pairwise dissimilarity metric and sh is the sharing function:

    sh(δ) = 1 if δ < τ, and 0 otherwise    (4.8)

where τ is the speciation threshold specified by the user. As we can see, this formula counts the individuals that are similar to a given organism, and divides its fitness by that number. Fitness sharing effectively penalizes organisms for being too similar to others and promotes variety in the population. By changing the speciation threshold we can regulate the greediness of the evolutionary process.
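Equations 4.7 and 4.8 translate into a few lines of Python; here distance stands for the dissimilarity metric δ and tau for the speciation threshold (a sketch, not the actual implementation):

def shared_fitness(fitnesses, distance, tau):
    # For each individual, count the individuals within distance tau
    # (including itself, since δ(i, i) = 0) and divide its fitness by that count.
    n = len(fitnesses)
    adjusted = []
    for i in range(n):
        niche_size = sum(1 for j in range(n) if distance(i, j) < tau)
        adjusted.append(fitnesses[i] / niche_size)
    return adjusted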

4.6 CPG controller implementation

Controllers based on Central Pattern Generators are constructed using the same neural network implementation, but with a different type of neuron that we call differential neurons. The activation behavior of these neurons obeys the following differential equation:

    u̇_i = Σ_{j ∈ Ψ_i} w_ij · u_j + bias_i    (4.9)

Here u_i is the output, bias_i is the constant bias, and Ψ_i is the set of incoming neurons of the i-th neuron. As we can see, this neuron calculates the rate of change of its activation value based on the input signals, rather than calculating the activation value directly. We will shortly demonstrate that we can obtain CPG-like behavior from a network of these differential neurons.

Let us consider a system of two differential neurons connected by a two-way connection, as shown in Figure 4.6. Let us denote the outputs of the two neurons as x and y, and write the system of first-order differential equations that governs this neural structure:

    ẋ = w_xy · y + bias_x    (4.10a)
    ẏ = w_yx · x + bias_y    (4.10b)

where w_xy and w_yx are the weights of the connections from the neuron Y to the neuron X and vice versa. This system has harmonically oscillating solutions for certain values of w_xy and w_yx, namely, when they have opposite signs (see Figure 4.7). The frequency of the oscillations depends on the concrete values of the weights, while the phase offset and amplitude depend on the initial values of x and y. We will call this structure a differential oscillator hereafter.

Figure 4.6: A single differential oscillator made of two interconnected differential neurons.

Figure 4.7: Outputs of a single differential oscillator made of two interconnected differential neurons. The weights of the connections have equal absolute values and opposite signs.

Figure 4.8: Output values of differential oscillators that are coupled with a two-way connection.
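The behavior shown in Figure 4.7 can be reproduced by numerically integrating equations 4.10, for instance with the forward Euler method (a sketch; the simulation uses its own solver):

def simulate_oscillator(w_xy=-1.0, w_yx=1.0, bias_x=0.0, bias_y=0.0,
                        x0=1.0, y0=0.0, dt=0.001, steps=10000):
    # Weights of equal magnitude and opposite signs give harmonic oscillations;
    # the initial values x0, y0 set the amplitude and phase offset.
    x, y = x0, y0
    trajectory = []
    for _ in range(steps):
        dx = w_xy * y + bias_x
        dy = w_yx * x + bias_y
        x, y = x + dx * dt, y + dy * dt
        trajectory.append((x, y))
    return trajectory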

Given an arbitrary robot body with a number of motorized joints, we will add a differential oscillator to each joint. Further, let us couple the oscillators by chaining them with two-way connections, as shown in Figure 4.10.

Connecting the oscillators in this way adds a term for every incoming connection to the equations 4.10:

    ẋ = w_xy · y + Σ_{j ∈ Ψ_x} w_xj · u_j + bias_x    (4.11a)
    ẏ = w_yx · x + Σ_{j ∈ Ψ_y} w_yj · u_j + bias_y    (4.11b)
