
University of Amsterdam

Research Project I

Master Brain & Cognitive Sciences

Track: Cognitive Science

Donders Institute, Nijmegen & ILLC, Amsterdam

An Embodied Cognition Perspective on Minimally Cognitive Agents in a Joint Action Task

Author: Simon Hofmann, Javastraat 43-1, 1094 GZ Amsterdam, UvA-ID: 10865284
Supervisors: Katja Abramova, Dr. Willem Zuidema
Co-Assessor: Dr. Julian Kiverstein
Amsterdam, October 6, 2016

Contents

Acknowledgements

1 Theoretical Framework
  1.1 Joint Action: Emergent and Planned Coordination
  1.2 The role of co-representations in the Joint Action Theory
  1.3 Enactivism and the Research Paradigm of Minimally Cognitive Agents
  1.4 A Minimally Triadic Joint Action Paradigm
  1.5 Research Hypotheses

2 Continuous Time Recurrent Neural Networks in Joint Action
  2.1 Implementation
    2.1.1 CTRNN - a Dynamical Neural Architecture
    2.1.2 Parameter Analysis
    2.1.3 Morphology of the Minimally Cognitive Agents
    2.1.4 Evolutionary Algorithm
    2.1.5 Testing the Implementation: a Toy Case
  2.2 Minimal Joint Action Paradigm: Experimental Setup

3 Results
  3.1 General Performances of Agents in All Conditions
  3.2 Analysis of the Best Agent-Pair
    3.2.1 Trial Performances and Behavioural Strategy
    3.2.2 Neural Activations and Network Lesioning
    3.2.3 Evolved Connection Strengths
    3.2.4 Correlations of Neural Activations

4 Discussion and Conclusion

References

Appendix
  Additional Figures

List of Figures

1 Minimal Joint Action Paradigm by Knoblich and Jordan (2003)
2 Performances over three blocks of trials
3 Perceptron: Neural input & output
4 CTRNN: Update of Neurons
5 Behaviour of a single node
6 Behaviour of two randomly initialized 8-node CTRNNs
7 Connectome: Minimally Cognitive Agent
8 Genome Lists and Reproduction
9 Minimally Cognitive Agent: Rolling Bot
10 The Best Rolling Bot of Generation 13000, Fitness: 3.36
11 Evolution-Driven Fitness Trajectories within 2 Groups
12 Influence of the Mutation Rate
13 Joint Agents: Positions of Target and Tracker & Key-activations
14 Single Agent: Positions of Target and Tracker & Key-activations
15 Joint Agents: Trained for One Target-Turn
16 Joint Agents: Neural States and Input
17 Joint Agents: Sensory Lesioning
18 Joint Agents: Sensory Lesioning in Second Half of Simulation
19 Joint Agents: Neural and Sensory Weights
20 Joint Agents: Correlation Matrix of the Neural Activations
21 Joint Agents: Correlation Plot of the Neural Activations
22 Sigmoid Activation Function
23 Joint Agents: Correlation Matrix of the Neural Activations

Acknowledgements

Thanks to Katja Abramova and Jelle Zuidema for the inspiring supervision of the whole project. To Julian Kiverstein, Jelle Bruineberg and Marc Slors for the talks on representations, embodied cognition and the greater picture. To Gisela Govaart for being a great listener, reader and adjuster of my thoughts. To Casper Hesp for the inspiring talks about minimally artificial agents, their evolution and the technical challenges. To Dieuwke Hupkes for introducing me to the world of servers. To the CLClab at the ILLC in Amsterdam for hosting an exotic project, and to the CCS at the Donders Institute in Nijmegen.

1 Theoretical Framework

In this chapter, the theoretical framework of minimally cognitive agents in a joint action task will be presented. Such a computational modelling paradigm is motivated by two streams of theoretical discourse in the cognitive sciences. First, an outline of the joint action theory will be given. It is one of the major recent research fields within neuroscience, cognitive neuroscience, psychology and cognitive science, and its outcomes are highly influential in, and affected by, related disciplines such as artificial intelligence, robotics and cybernetics. The second stream concerns the assumption that representations guide cognitive processes in jointly operating agents; this assumption drives many debates in the field and beyond. In the following, a promising candidate shall be introduced, namely the enactive minimally cognitive models of continuous time recurrent neural networks, which are supposed to enrich and clarify the debates of the joint action theory and the related theoretical assumptions about representation.

1.1 Joint Action: Emergent and Planned Coordination

A crucial ability of basically any life form is to coordinate its individual actions with those of others. Humans, for instance, carry objects together, construct buildings, participate in city traffic and play in one team against another. Wolves, dolphins and chimpanzees, among other species, hunt in complex group structures while taking complementary roles. And even seemingly primitive lifeforms like ants jointly carry leaves or build complex nests in a coordinated fashion. So far, a joint action can be seen as any form of coordinated social interaction of two or more individuals which has an impact on the agents' environment. Despite its crucial role in the survival and success of many living beings, the research field within the (neuro-)cognitive sciences is quite young and the underlying mechanisms are hardly known. According to Sebanz et al. (2006), a major factor is that the neuroscience of the last decades predominantly focused on single brains. Only in recent years have research endeavours on joint attention, action observation, task sharing and other sub-fields of the Joint Action Theory increased (ibid.). Nonetheless, the research on the neural underpinnings remains at an early stage¹.

In light of this, it is useful to distinguish between two notions of joint action: the weak and the strong notion. Going back to ideas by Gilbert (1990), one view underlines that agents participating in a cooperation form a coupled system and therefore a joint agency or "plural subject". This perspective forms the weak notion of joint action, since its requirements are low². Knoblich et al. (2011) call this form of joint action "emergent coordination". Emergent coordination appears due to action-perception couplings and is independent of planning or common knowledge of the participating agents. In more detail, emergent coordination encompasses the processes of entrainment (synchronization or temporal coordination), shared affordances (common objects in the environment increase the probability of related actions) and internal action simulation (observed actions activate similar action tendencies). In Chapter 1.3 we will see that related arguments for emergent behaviour are made within the theory of Enactivism. Taking this all together, the presented view allows even non-human lifeforms like ants to fulfil the requirements for coordinated behaviour, which Knoblich, among others, already calls joint action.

In contrast, proponents of the second view on joint action would deny such an ascription to primitive lifeforms. Moll and Tomasello (2007) state that even among Taï chimpanzees, which hunt in complex group structures, individuals do not show signs of Gilbert's "jointness" or what others have called "we-intentionality" (Searle, 1990; Tuomela and Miller, 1988). In other words, the primates neither possess an idea of the complementary and reciprocal contributions of each group member, including themselves, nor do they jointly plan and commit to a shared goal (Bratman, 1992). This perspective reflects the strong notion of joint action. In this view, mental capabilities of the participating agents, such as mind-reading, planning, complex communication and the successful establishment of shared intentionality, are crucial requirements for joint actions, or what Knoblich et al. (2011) call "planned coordination". According to the authors, nearly all planned actions also employ emergent coordination. Similar to Moll and Tomasello (2007), many scientists argue that only humans have such unique abilities³.

¹ An overview of the work on some neural mechanisms in joint action is given by
² To take a step further, one could claim to withdraw from the dichotomy of joint and single actions altogether, as basically any life form's action perturbs the shared environment of other individuals and constitutes new affordances for all members of the group.

1.2 The role of co-representations in the Joint Action Theory

Even though the neural and cognitive mechanisms of joint actions are poorly understood (e.g. Chersi, 2011), the majority of joint action experiments is guided by (and interpreted with) the assumption that, in order to successfully perform a task together (e.g. lifting a table), agents need to co-represent each other and the other's actions (Knoblich et al., 2011; Müller et al., 2011; Sebanz et al., 2005; Tsai et al., 2008). A co-representation involves a mental model of another agent, its actions and possibly its mental states (e.g. intentions)⁴. However, the employment of co-representations as an explanatory vehicle is a rather surprising trend: cognitive science has no widely accepted definition of representation in general (Dietrich and Markman, 2003), hence the co-representation term also remains undetermined. In addition, the indefinite conceptualization of representations remarkably influences the continuing debates about how a joint action is defined (see Chapter 1.1). Consequently, we can ask what role co-representations play in the joint action theory, and how the two suggested notions of joint action⁵ can facilitate the disentanglement of the representation term.

⁴ This shall serve as a preliminary definition. We will see later why such an understanding turns out to be troublesome.
⁵ Butterfill (2012) indicated a similar distinction of the requirements for joint action, namely the separation of joint intentionality proposed by Bratman (1992) (strong notion) and joint goals in plural activity (weak notion).

Advocates of the strong view often appeal to our subjective experience of what a representation could be. For instance, in order to carry a table with somebody, I have to plan the whole transport, I have to become aware of how strong the other person is, and I have to mentally visualize whether I or rather she should hold the table at the front and walk backwards with it. Finally, I must instruct the person. In case she does not know which table to carry, I have to take her perspective to understand that she could not have known, because there are many tables in the room. So described⁶, this instantiates a complex form of joint action. However, according to Knoblich et al. (2011, p. 65), even in minimally planned coordination "some awareness"⁷ of the other agent's contribution is required to successfully act together. Nonetheless, the authors remain unspecific about the term, so we can only hold that a co-representation is some form of subjectively experienced mental model which is somehow related to another real agent and that agent's current and future actions. In this sense, the representation term fulfils the role of a pragmatic placeholder or theoretical construct for unknown cognitive processes and neural realizations (Bennett et al., 2007; Pitt, 2008). Furthermore, this conceptualization of representation pertains to the classical computational understanding in cognitive science that cognition is nothing else than rule-based operations on symbolic mental representations which relate to the world (Kiverstein, 2010; Van Gelder, 1995). This view, sometimes called "cognitivism" (Kiverstein, 2016; Varela et al., 1991), led to "dead end" (Ramsey et al., 1991, p. 181) scientific endeavours, as in artificial intelligence (AI) research until the late 1980s, which nowadays is referred to as "Good Old Fashioned AI" (GOFAI) (Dreyfus, 2007; Haugeland, 1985).

⁶ That is, described with a psychological vocabulary.
⁷ Knoblich and colleagues might refuse the strong idea of a conscious awareness of such processes. However, they explicitly speak of a certain degree of awareness of the necessary contribution the other agent provides in a joint task.

However, the information theory of Shannon (1948) inspired more fine-grained definitions of mental representations. Dietrich and Markman (2003) propose that a representation is "any internal state that mediates or plays a mediating role between a system's inputs and outputs in virtue of that state's semantic content". In other words, mental representations are states, internal to an agent, which correlate with distal properties of the environment (including other agents) and are information-bearing processes that have a causal role in the agent's behaviour (Dretske, 1986). Furthermore, even in the absence of the external property, a representation that "stands in" for this property must be enduring (Bechtel, 1998; Beer and Williams, 2015; Haugeland, 1991; Van Gelder, 1995). Under such a definition, Dretske (1986) and others consider even bacteria⁸ to be representation-driven biological systems. We see that this conception of representation allows for (co-)representations in the weaker notion of joint action. Within this conception we can also speak of low-level representations: for instance, a co-representation could be any form of neural circuit in an ant brain that correlates with states of another ant and that is causally effective even in (short) perceptual/sensory absence of the other group member during a coordinated action.

⁸ A common example is anaerobic bacteria, which possess small magnets (magnetosomes) to orient along the magnetic field towards the earth's north pole into more oxygen-free waters.

Such an understanding of representation has its attraction. However, neuroscientific research is far from unveiling causally efficacious neural circuits, due to methodological constraints such as limited device portability, low temporal resolution (fMRI), low spatial resolution (EEG) and the subsequently insurmountable computational complexity of the statistical detection of dynamical causal relationships (Daunizeau et al., 2011; Eklund et al., 2016; Logothetis, 2008; Lohmann et al., 2012; Vul et al., 2009). In addition, others have argued that, similar to the underlying assumptions of GOFAI research, the more fine-grained understanding of mental representations is also prone to fall into the "Cartesian trap", i.e. the misleading conceptual distinction between the mind and the body (Bennett et al., 2007; Kiverstein, 2010).

In the next chapter, a theoretical framework will be introduced which is widely known as Enactivism. From an enactivist point of view, the assumption of representations is troublesome (Clark and Toribio, 1994). Within this theory, an alternative conception of the interrelation of mind, brain and environment is formulated. Moreover, a common modelling framework within this theoretical branch will be introduced, which captures the dynamical nature of minimally cognitive systems and allows for an empirical approach to revise strongly representational explanations of many cognitive phenomena.

1.3 Enactivism and the Research Paradigm of Minimally Cognitive Agents

Under the aegis of the Embodied Cognition theory, Enactivism has had an increasing impact on the understanding of cognition in the past two decades.

This theory emphasizes that cognition is not an isolated phenomenon, but takes place in an agent's body (embodiment), which is (inter-)acting upon an environment and its affordances, i.e. possibilities for action (embeddedness) (Clark, 1998; Jones, 2003; Varela et al., 1991). For instance, Van Gelder (1995) rejects the idea of cognitive processes as computational, rule-based operations on mental symbols. He argues that such a complex system as the human mind, which is inevitably coupled with the environment, can only be understood in terms of a dynamical system⁹, continuously evolving due to the "mutually determining states" of the mind, the brain and the environment.

⁹ The Dynamical System Theory is a mathematical subfield which describes complex interrelationships with differential equations, depicting the continuing change of states that such coupled systems can have.

Simultaneously with the increasing recognition of the embodied cognition theory (EC), a new modelling approach has been developed. A group around Randall D. Beer started to employ artificial neural networks, in particular continuous time recurrent neural networks (CTRNN), embodied in minimally cognitive agents. These agents and their emerging dynamics are examined within simple cognitive tasks (Beer, 1996). Influenced by adaptive behaviour and dynamical system theories, this approach aligns with debates on the nature and necessity of representations in cognitive behaviour (Clark and Toribio, 1994).

The network architecture has three major advantages. First, the simplicity of the neural network and its parameters is still within a feasible range for mathematical analysis. The minimally cognitive agents mostly consist of no more than three to six fully interconnected neurons, with additional attached motor neurons, which allow the agents to perform motor executions in given tasks (e.g. Agmon and Beer, 2013; Beer and Williams, 2015). Second, even though the agents are minimally constructed, highly complex behaviour emerges from their dynamics. Third, CTRNNs have a certain degree of biological plausibility, due to their recurrent architecture and the applied Evolutionary Algorithms (EA), which guide the development of CTRNNs by heritable variation with differential reproduction of the network parameters (Beer and Williams, 2015; Husbands and McHale, 2004). The EA approach is "relatively prejudice-free" due to the open parameter search of the fully connected network¹⁰, which leads to the generation of insightful behavioural strategies within paradigmatic constraints derived from minimal theoretical assumptions (Harvey et al., 2005). Consequently, as Bedau (1999) and Di Paolo and Noble (2000) argued, such "emergent, computational thought experiments" can inform us about theories of natural cognitive phenomena. Crucially, since the states of a CTRNN are defined by differential equations, which describe the network's continuous inner dynamics and those with its environment (embeddedness), it is claimed that these networks do not encode knowledge in internal symbolic representations (Beer, 2014). Instead of perceiving information, then formulating a "symbolic plan" and executing it, as proposed by proponents of cognitivism (ibid.)¹¹, the neural networks evolve dynamical computations in a distributed manner¹².

¹⁰ Due to the exhaustive connections, no architectural pre-assumptions are made besides the number of neurons.
¹¹ Arguments of that kind, i.e. in the form of artificial life thought experiments, against this understanding of cognition were already made by Braitenberg (1986).

Taken together, Beer and other advocates of enactivism provide not just theoretical arguments in favour of the embodied cognition framework, but can also precisely describe the evolved behaviour in dynamical mathematical terms. Consequently, this approach can contribute to a methodological toolset which captures the dynamics of natural mind-body-environment systems. Heretofore, minimally cognitive agents have mastered sequential learning, decision and perceptual categorization tasks, as well as chemotaxis and legged locomotion (Yamauchi and Beer, 1994).

Recently, however, new attempts have been made to investigate social interaction with such agents in minimal experimental paradigms (Di Paolo et al., 2008). So far, minimally cognitive agents have performed dyadic actions (agents interacting with each other) (Di Paolo et al., 2008), but little has been done concerning triadic interactions (agents interacting together with an object). This form of interaction is considered the highest level of joint behaviour and starts to develop in children at the age of 5-7 months (Metcalfe and Terrace, 2013). Without neglecting the complexity of triadic interactions, the successful dynamical-systems approaches to dyadic actions by Di Paolo et al. (2008) give hope that the above-described methodology of minimally cognitive agents can also succeed in triadic joint action tasks. If so, formerly widely accepted assumptions of co-representationalism could be revised.

1.4 A Minimally Triadic Joint Action Paradigm

One primary goal of the presented research is to use minimally cognitive agents, as introduced by Beer and others, in a triadic joint action task. Such a task is supposed to be as minimal as possible, to take full advantage of the minimal architecture of the dynamical agents with respect to the mathematical dynamical analysis. A suitable task was developed by Knoblich and Jordan (2003). In this task, two participants control the acceleration of either the rightward or the leftward movement of a tracker. The goal of the task is to catch a target which moves horizontally with a constant velocity on a two-dimensional pathway to the left or the right until it bumps into a wall and immediately, ergo non-gradually, changes direction (see Figure 1). By contrast, to change the orientation of the tracker, which is moving e.g. to the right, one participant has to repeatedly activate the key, in this case the left one, until the tracker velocity gradually reaches zero and then increases again towards the new direction.

Figure 1: Minimal Joint Action Paradigm by Knoblich and Jordan (2003). For the analysis, the task environment was split into three regions (two border regions, one middle region). In the border regions, Knoblich and Jordan emphasize the necessity of an anticipatory coordination strategy (ACS) to successfully follow the target, i.e. the tracker must be decelerated before it hits the wall, in order to change direction and follow the target after its turn.

Knoblich and Jordan argue that, in order to successfully perform the task, participants have to follow two strategies. First, in the middle region of the screen a compensatory coordination strategy (CCS) is the best choice, i.e. adjusting action selection and timing based on immediate feedback/cues from the environment. More precisely, depending on the position of the tracker relative to the target, the tracker has to be decelerated or accelerated. Such adjustments can be executed by one of the agents alone, hence they are arguably independent of the other group member. Second, in the border regions, group members have to plan their moves in relation to the anticipated actions of the other; the authors call this the anticipatory coordination strategy (ACS). In other words, to follow the target as well as possible, the tracker has to be decelerated in advance when approaching the border region and accelerated into the new direction after the abrupt turn of the target. That is, both members have to give up their CCS: while one member no longer presses his or her key, the other executes the required de- and acceleration. Accordingly, such coordination depends on both agents.

Furthermore, Knoblich and Jordan claim that during anticipatory coordination it remains difficult to predict all action alternatives of the other agent; therefore, external and reliable feedback about the other subject's action might improve the performance. As a result, they included a sound condition, in which participants received auditory cues accompanying either of the key activations (left, right). This led to a 2 × 2 design, since all group results were compared to single-subject performances, i.e. where one participant controls both keys.

Figure 2: Performances over three blocks of trials. Already in Block 2, the performance of groups in the sound condition (+) was as good as the performance of individuals. The figure is borrowed from Sebanz et al. (2006), who redrew the results of Knoblich and Jordan (2003).

Individuals were more successful in the task, since they had fewer problems than groups in effectively de- and accelerating the tracker right in time, i.e. applying the ACS. However, as predicted, the auditory feedback about the exact timing of the other person's action improved group performances significantly (see Figure 2). Nevertheless, this action feedback was not necessary to apply an ACS, but enhanced the learning of it within the group condition (see the performance over the three blocks in Figure 2). Thus, Knoblich and Jordan argue that this pattern is not simply explained by the assumption that the task becomes easier due to the sound cue. Instead, the auditory feedback supposedly facilitates the acquisition of the ACS; that is, group members learn to anticipate or co-represent the action of the other member more easily. Accordingly, Knoblich and Jordan (2003, p. 1015) suggest that "one uses one's own experience in the braking situation to simulate the other's braking performance".

Even though the authors do not explicitly mention the co-representation term in the presented paper, it can be argued that they presuppose such a cognitive mechanism. As we have seen in Chapter 1.1, Knoblich and colleagues distinguish between two forms of interaction: emergent and planned coordination (Knoblich et al., 2011). The former describes spontaneous unplanned joint actions, whereas the latter stands for actions planned and executed under awareness of the participating agents. Concerning the described interaction task, the authors explicitly assume that planning and anticipation by both participants is necessary to successfully catch the target. Consequently, we can call such an action "planned coordination" with respect to Knoblich et al. (2011). Moreover, referring to a study by Guagnano et al. (2010), they argue that agents represent their co-actors even if it is not necessary for performing the task at hand effectively. However, in the task presented above, subjects have complementary roles. Knoblich and Jordan concluded that a better performance unfolds when participants do not just coordinate their own actions, but also anticipate and integrate the other's actions into their own action plans. Hence, the participants co-represent each other.

1.5 Research Hypotheses

Translating all this to the modelling paradigm of continuous time recurrent neural networks embedded as "nervous system" in minimally cognitive agents, the following key question arises: are such agents capable of solving the triadic interaction task proposed by Knoblich and Jordan (2003)? In this task, the authors assume the necessity of co-representations in both of the participating group members. If the simulation is successful, a revision of the co-representation term is required, since its necessity within minimally cognitive agents is then far from tenable.

As a next step, the following three hypotheses can be formulated:

The first hypothesis is that two CTRNN architectures, which evolve through an EA, can successfully act together in the joint action task developed by Knoblich and Jordan (2003).

The second hypothesis states that the best strategy applied includes a de- and acceleration of the tracker before the target changes direction, which in turn leads to an improved performance. This strategy (the ACS) was verified to be the best solution among human participants (ibid.).

The third hypothesis claims that the joint performance improves when the minimally cognitive agents can send each other a social cue representing the activation of the button each of them controls. This hypothesis refers to the human experiment by Knoblich and Jordan, who found a positive effect when participants heard a tone which accompanied each keypress.

Additionally, all hypotheses will be compared with the achievement of a single minimally cognitive agent, which performs the tracking task by itself. Rather than being a hypothesis, one aim of the research is to keep the complexity of the network architectures low. This means that the two developed CTRNNs have no more than eight interneurons each, while being capable of successfully solving the task; this morphology stays close to various designs of minimally cognitive agents by Beer and colleagues (Agmon and Beer, 2013; Beer and Williams, 2015). Other neurons either transfer the input (sensory neurons) or execute the output (motor neurons) and are restricted not by the hypotheses but by the experimental necessities.

To sum up, the aim of the research is to examine the dynamics and behavioural strategies of minimally cognitive agents in a joint action task. This will inform us about the plausibility and necessity of co-representations. In Chapter 1.2, two notions of such specific mental representations were introduced: one is more psychologically appealing, the other resembles a more fine-grained conceptualization going back to Dretske (1986) and others. The proposed research tries to replace at least the former interpretation with a fully dynamical and mathematical description of the emerged behaviour, and shall provide a conceptual clarification of the latter, in particular where such an understanding goes along with the symbolic ideas of classical cognitivism. Thus, the aim is not to rigorously reject the conceptual idea of co-representations in general, but to contribute to a complementary toolset for formalizing cognitive behaviour.

2 Continuous Time Recurrent Neural Networks in Joint Action

In the following chapter, the continuous time recurrent neural network will be presented, which resembles the "nervous system" of the minimally cognitive agents. Then the morphology of such agents will be described, followed by an explanation of the applied evolutionary algorithms, which guide the development of the network parameters. Last but not least, the experimental setup will be introduced, which is an adaptation of the triadic interaction task developed by Knoblich and Jordan (2003).

2.1 Implementation

2.1.1 CTRNN - a Dynamical Neural Architecture

A continuous time recurrent neural network (CTRNN) was applied. CTRNNs are "the simplest non-linear, continuous dynamical neural network model" and "universal dynamical approximators" (Beer, 1995; Funahashi and Nakamura, 1993; Harvey et al., 2005). They are distinct from standard recurrent neural networks (RNN) with respect to the differential equations which describe the change of the neural states over time (see Formula 1). While RNNs receive discrete input vectors that are propagated through the network sequentially, CTRNNs constantly update their neural states and include continuously changing external input over time. Nevertheless, in a model simulation on a digital computer this continuous update is approximated by discrete time steps, at a much finer scale than inputs are typically discretized in standard RNNs. A possible objection would be that, due to this numerical integration (explicit forward Euler method, see Formula 2), the continuous stream of information that real neural circuits follow is not reflected, and the network therefore loses its biological plausibility. However, the stepwise integration roughly corresponds to the cortical 10 Hz alpha rhythm (Buffalo et al., 2011; Kachergis et al., 2014), which correlates with the phenomenon of "perceptual framing" (Valera et al., 1981; Varela et al., 1991): for instance, if the time interval between two flashing lights is smaller than 0.1 sec, the two lights will be perceived as simultaneous. Either way, CTRNNs are also intriguing because of their simplicity, which gives them computational advantages over networks with more biological plausibility, such as those proposed by Hodgkin and Huxley (1952).

In order to understand the continuous update mechanism of a CTRNN, let us first briefly recapitulate the simplest neural model, the perceptron (see Figure 3).

Figure 3: Perceptron: Neural input & output. The perceptron was introduced by Rosenblatt (1958).

A perceptron receives an input vector x of n entries (x_i ∈ {x_1, x_2, ..., x_n}). Each entry is multiplied by its corresponding weight w_i, i.e. every entry has a defined impact on the neural activation. These weighted entries are summed together with the bias term θ, which functions as a threshold. Then the activation function, e.g. the sigmoid function σ(x)¹³, receives the sum and transforms it into the normalized output of the neuron. In this way, the perceptron can be used as a linear classifier: in case a sigmoidal activation function is applied, values greater than or equal to 0.5 are classified into one category, and values below 0.5 into the other.

¹³ Also called the standard logistic activation function, σ(x) = 1/(1 + e^{-x}).
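The computation just described fits in a few lines. The following is a minimal sketch (not code from the thesis); the input and weight values are purely illustrative:

```python
import math

def sigmoid(x):
    # standard logistic activation: sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def perceptron(inputs, weights, theta):
    # weighted sum of all entries plus the bias term theta,
    # squashed into (0, 1) by the sigmoid
    s = sum(x * w for x, w in zip(inputs, weights)) + theta
    return sigmoid(s)

# used as a linear classifier: outputs >= 0.5 fall into one category,
# outputs < 0.5 into the other
output = perceptron([0.3, -1.2], [2.0, 0.5], theta=0.1)
category = 1 if output >= 0.5 else 0
```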

As mentioned above, CTRNNs are capable of solving far more complex problems than linear classification. We find all parameters of the perceptron also in CTRNNs, but differently arranged, which allows for a continuous and simultaneous update of the activations of all nodes in the network. In other words, as opposed to the discretized feed-forward processing of the perceptron, the CTRNN recurrently updates all its nodes with respect to all their states, as depicted in Figure 4.

Figure 4: CTRNN: Update of Neurons. The figure shows the update mechanism of one node (Neuron 1) of a CTRNN of size n; for all neurons this update happens simultaneously. Each node takes the neural activations of all neurons (including itself) into account, i.e. the network is fully connected. The update process is defined as continuous (see Formula 1). New here is the τ-parameter, called the time constant, which influences how fast the activation of a neuron changes over time. The dashed lines indicate that, depending on the application, not every neuron receives external input (I_i), nor does it necessarily produce output for processes external to the CTRNN. The bubble with −y_i represents the shrinkage of neural activation over time, which occurs if the internal (Σ) plus the external input is smaller than the current state of the neuron. In short, a neuron loses energy over time if its input is too small. Neurons with such properties are called "leaky integrators".

A CTRNN is defined in the following form:

\frac{dy_i}{dt} = \frac{1}{\tau_i}\left(-y_i + \sum_{j=1}^{N} w_{ji}\,\sigma(y_j + \theta_j) + I_i\right), \qquad i = 1, 2, \ldots, N \tag{1}

The formula defines the change of a neuron over time, represented by dy_i/dt. The current state of neuron i is denoted by y_i, τ_i is the neuron's time constant (τ > 0), and w_{ji} describes the fixed connection strength (weight) from the neighbouring neurons and from the neuron itself (recurrent connection). The bias term is θ, and the external input to neuron i (in our case the sensory input) is denoted by I_i. All neurons are interconnected, indicated by the sum (Σ) over all products of the weights (ji) and the outputs of the neurons (j), transformed by the sigmoid function. As can be seen from the negative sign of the current state (−y_i), the neuron is considered a "leaky integrator": if a node receives no input at all, its neural activation tends to zero¹⁴.

For the simulation on a computer, numerical integration is applied (Izquierdo et al., 2008; Williams et al., 2008):

\text{Euler method:}\quad y_1 = y_0 + h\,f(t_0, y_0), \tag{2}

with the initial value y_{t_0} = 0 and a preferably small time step h = 0.01. We use the (explicit) forward Euler method to solve the differential equation:

y_{t+1} = y_t + h\,\frac{dy}{dt} \tag{3}

Following Agmon and Beer (2013), the parameter ranges applied are: τ from 1 to 10, and all weights w and biases θ from -13 to 13.
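Formulas 1-3 translate directly into a vectorized update step. The sketch below is illustrative rather than the thesis implementation; it assumes numpy, and the function and variable names follow the notation above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def euler_step(y, tau, w, theta, I, h=0.01):
    # Formula 1 for all N neurons at once:
    #   dy_i/dt = (1/tau_i) * (-y_i + sum_j w_ji * sigma(y_j + theta_j) + I_i)
    # where w[j, i] holds the connection strength from neuron j to neuron i
    dydt = (-y + w.T @ sigmoid(y + theta) + I) / tau
    # Formula 3: one explicit forward Euler step of size h
    return y + h * dydt

# initial value y_0 = 0, here for an 8-node network (cf. Chapter 2.1.3)
y = np.zeros(8)
```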

In the next section we will see a simple example of neural activation and the corresponding role of the network’s parameters. This should provide a more intuitive understanding of the behaviour of CTRNNs.

2.1.2 Parameter Analysis

In the following, a single node is examined. Following a report by Potter (2006), some parameters were set to values which have no effect on the neural state, while others were "switched on". In Figure 5 we see, for example, the states of single nodes with different parameter settings for the recurrent "self-weight" w and the bias term θ over 500 time steps. The time constant τ has no effect here, since it was set to 1. In the second half of the simulation the external input (I = 1) was cut to 0. We notice that a CTRNN can be excited without any external input.
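Reusing the euler_step sketch from Chapter 2.1.1, this single-node experiment could be reproduced along the following lines; the self-weight and bias values here are hypothetical stand-ins, not the settings used for Figure 5:

```python
import numpy as np

# single node: tau = 1, a "switched on" self-weight w and bias theta;
# the external input I = 1 is cut to 0 after 250 of the 500 time steps
y, states = np.zeros(1), []
for t in range(500):
    I = np.array([1.0 if t < 250 else 0.0])
    y = euler_step(y, tau=np.ones(1), w=np.array([[5.0]]),
                   theta=np.array([-2.0]), I=I)
    states.append(float(y[0]))  # trace of the neural state, cf. Figure 5
```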


Figure 5: Behaviour of a single node. An example of stable behaviour: the neural state slides over time to a stable energy level. The input I_i "only" shifts this energy level slightly up. In the second half of the simulation (after 250 time steps) the input is cut to zero; as a result, the neural activation slowly falls back to its intrinsic stable level.

The time constant τ determines how fast a state is reached: the bigger τ is, the longer the change of the neural state takes. The network is prone to primarily stable states¹⁵, but its behaviour can also be oscillatory and even chaotic (Beer, 1995) (see Figure 6).

Figure 6: Behaviour of two randomly initialized 8-node CTRNNs. The two CTRNNs consist of 8 nodes each; all parameters are randomly initialized, and each colour represents the neural state of one node over 5000 time steps. On the left, the network shows oscillatory behaviour, whereas the network on the right approaches a stable state after around 3000 time steps. Both networks received no external input I.

2.1.3 Morphology of the Minimally Cognitive Agents

In the previous section, we have seen the neural activation patterns of a single node. When such nodes are connected in a network, they can generate more interesting and complex behaviours. In the presented study, the network consists of eight interneurons and a sensory apparatus of two visual sensors, perceiving the position of the tracker and the position of the target, and a pair of auditory sensors, hearing either the right or the left sound of the corresponding keypress (depending on the sound condition); the experimental setup of the joint action task is described in detail in Chapters 1.4 & 2.2. Two effectors or motor controllers activate either the left or the right key (see Figure 7). All sensors and motor controls have a weighted connection to a particular set of interneurons. The weight range is the same as for the weights between the interneurons (see Chapter 2.1.1). This morphology stays close to the architecture reported by Agmon and Beer (2013), which was implemented in minimally cognitive agents in a chemotaxis task. However, in the presented study two additional neurons were attached, due to the extra auditory sensors.

2.1.4 Evolutionary Algorithm

The question arises how the behaviour of minimally cognitive agents consisting of a CTRNN can be shaped to solve specific cognitive tasks. Established learning approaches such as backpropagation or Hebbian learning are not applicable. Instead, Beer and colleagues apply evolutionary algorithms (EA), which allow for a relatively prejudice-free search in the parameter space (see also Chapter 1.3). In contrast to supervised learning rules like recurrent backpropagation, EAs only incorporate the overall performance (fitness) of an agent, rather than requiring whole datasets of, for instance, the agent's motor output trajectories (Beer, 1997).

In the presented research, the genome of an agent is composed of all parameters of the network: the time constants (n_τ = 8), the bias terms (n_θ = 8), and the weights of each of the eight interneurons (n_w = 8 × 8), including those to the input sensors (visual n_{w_v} = 4, auditory n_{w_a} = 4) and to the output controllers (motor n_{w_m} = 4). Each of these parameter vectors resembles one genome-section. Consequently, the search vector has 6 sections with overall 92 positions¹⁶.
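A flat search vector of 92 positions can be cut into the six genome-sections just named. The following sketch (with hypothetical names, not the thesis code) illustrates the layout:

```python
import numpy as np

# six genome-sections: 8 tau + 8 theta + 64 interneuron weights
# + 4 visual + 4 auditory + 4 motor weights = 92 positions
SECTIONS = [("tau", 8), ("theta", 8), ("w", 64),
            ("w_v", 4), ("w_a", 4), ("w_m", 4)]

def split_genome(genome):
    # cut a flat 92-vector into named parameter sections
    assert len(genome) == sum(n for _, n in SECTIONS)
    parts, i = {}, 0
    for name, n in SECTIONS:
        parts[name] = np.asarray(genome[i:i + n])
        i += n
    return parts
```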

Figure 7: Connectome: Minimally Cognitive Agent. Connectomes, i.e. neural architectures, for agents in the joint and the single condition do not differ. In the centre of an agent there is a fully connected recurrent neural network (interneurons). Visual sensors are connected to Neurons 8, 1 and 2 (in blue). Audio sensors are connected to Nodes 7, 5 and 3, respectively (in green). Motor controls (effectors) connect to Neurons 6 and 4 (in red).

In the single condition, i.e. where one agent performs the task alone, the population size was 110. In the joint condition, that is, where two agents perform the task together, there were two populations of 55 agents each: one for the agents which control the left key and one for those which control the right key (see Chapter 2.2).

There are many ways to implement an EA. For instance, Agmon and Beer (2013) used a fitness proportionate selection (FPS) in combination with an elitist selection, that is, the top-performing agents are copied to the next generation. For the joint action task, this approach was extended further. Testing on the toy case, which will be introduced in the following section, indicated that the extended approach was more successful than a pure fitness proportionate selection. Hence, a mixed EA was employed, which incorporates the strategies listed in Table 1.

After the evaluation of a population (trial phase), i.e. sorting the agents by their performance (fitness), the reproduction phase was initialized. In this phase, sexual and asexual reproduction were carried out (mixed evolution). A new generation was composed by the following steps (see also Table 1):

Table 1: Evolutionary Setup

                                                   Joint Condition     Single Condition
1. Elitist selection                               2 × 2 (≈ 3.6%)      2 (≈ 1.8%)
2. Breeding (sexual reproduction)                  2 × 10 (≈ 18.2%)    10 (≈ 9.1%)
3. Fitness proportionate selection (FPS)
   (asexual reproduction)                          2 × 27 (≈ 49.1%)    60 (≈ 54.5%)
4. Random fill-up (genetic immigration)            2 × 16 (≈ 29.1%)    38 (≈ 34.5%)
Population size                                    2 × 55              110

An elitist selection (1.) means that the best-performing agents (here two) are copied to the next generation without any mutation; hence the lifetime of the best agents is not limited to one generation. The number of children (2.) was restricted to at most 10, since beyond 10 there would be a relatively high chance that too many children are bred with the same genetic code (see also Figure 8). The FPS (3.) is a selection based on the performance of an agent: the better the performance, the higher the agent's chance to produce offspring in the next generation, which is a copy of its own genome plus a mutation. The random fill-up (4.) delivered new genetic material to the following generation in order to spread the parameter search. The slightly diverging percentages between the two conditions (for 3. and 4.) are due to the limited number of children (2.).

First, the two best agents were copied to the new generation. This elitist selection warrants a monotonically increasing performance, but also has some drawbacks. For instance, this selection strategy is not able to adapt to changing environments, especially within small populations (Bäck and Schwefel, 1993). In the research presented here, however, such an adaptation capacity was not necessary, due to the fixed experimental setup; on the contrary, this form of selection rather allowed the evolution to converge faster. Moreover, since an FPS was also applied, the probability increased that the best agents reappeared both unmodified and modified in the new generation, which allowed for an active search around the current best parameter settings.

Next, the two best agents (parents) produced 10 children. A child was produced by taking a full copy of the best agent's genome; then, with a crossover chance of 100%¹⁷, one randomly selected genome-section of the second-best agent was implanted into that copy (sexual reproduction) (see Figure 8).

¹⁷ The 100% chance of a crossover is due to the elitist selection: since the best agents reappear unchanged in the new generation, the children are supposed to differ from their parents with certainty (100%) for the purpose of genetic variation.
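A sketch of this reproduction step follows (hypothetical, not the thesis code; genomes are assumed to be stored as dictionaries mapping a section name to a parameter vector):

```python
import random

SECTION_NAMES = ["tau", "theta", "w", "w_v", "w_a", "w_m"]  # cf. the six sections above

def breed_child(best, second_best):
    # full copy of the best agent's genome ...
    child = {name: list(best[name]) for name in SECTION_NAMES}
    # ... then one randomly selected genome-section of the second-best
    # agent replaces the corresponding section (crossover chance: 100%)
    section = random.choice(SECTION_NAMES)
    child[section] = list(second_best[section])
    return child
```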

Figure 8: Genome Lists and Reproduction. The current population list contains the genome string of each agent (rows, coloured bars), vertically sorted according to the fitness evaluation. Each colour represents a different genome-section; its size indicates the number of parameter positions the section has (e.g. the interneuron weights (w) form the biggest section, with 64 positions). Among others, the new generation contains copies of the best two agents (elitist selection) and up to 10 children of the previous two best agents (sexual reproduction). To produce a child, two steps were taken: first, another full copy of the best agent is transferred to the new generation; then one randomly selected genome-section (e.g. the θ-parameters) of the second-best agent replaces the corresponding section in that copy. The chance of this random selection is 100%, i.e. a genetic crossover is guaranteed.

This was followed by the fitness proportionate selection (FPS)¹⁸. In this form of selection, the probability p_i of agent i being selected to produce offspring in the next generation (asexual reproduction) is defined by:

p_i = \frac{f_i}{\sum_{j=1}^{N} f_j}, \tag{4}

with the number of individuals N and the agent's fitness f_i. Consequently, the chance of an agent being reselected was proportionate to its fitness: the higher the fitness of an agent in comparison to another agent, the higher its chance to transfer its genetic material to the following generation. In the joint condition, half of the agents selected via the FPS were randomly shuffled into new pairs to ensure a partly independent evolution of the two populations (left, right).
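A literal sketch of Formula 4 as a roulette-wheel draw is given below. Note that in this thesis the fitness is a distance, where lower is better, so the raw values would presumably be transformed before being used as selection weights; that transformation is not spelled out here:

```python
import random

def fps_select(population, fitnesses):
    # Formula 4: agent i is drawn with probability p_i = f_i / sum_j f_j
    total = sum(fitnesses)
    r = random.uniform(0.0, total)
    acc = 0.0
    for agent, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return agent
    return population[-1]  # guard against floating-point round-off
```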

The remaining positions in the new generation were filled with new, randomly initialized minimally cognitive agents, to refresh and spread the search into new regions of the parameter space.

Finally, all but the two best agents underwent a mutation by a Gaussian random variable with a mean of zero and a variance of σ_mut = 0.25, as suggested by Agmon and Beer (2013). After approaching a local minimum in the evolution, the mutation variance was adjusted to σ_mut = 0.02, which led to a slight improvement. In Figure 12 of Chapter 3.1 we see the impact that different mutation rates have on the evolutionary learning curves. Nonetheless, no comprehensive experiments with the parameters of the evolution (mutation rate, population size) were conducted, due to limited computational resources and time.

In many EA designs a concrete termination criterion is defined beforehand: one possibility is to end the evolution after a given number of generations; another option is to terminate the process when a desired fitness value is reached. In the presented research, the termination of the EA was initialized ad hoc. The evolution was stopped after a critical number of generations had passed without improvement of the best agents' performance.

The learning process turned out to be computationally expensive. This was not due to the applied mixed EA approach, but was caused by the long evaluation process each agent had to run through, i.e. the simulation of the different trials. The evolution ran on two servers in parallel. To enhance computational speed, a CPU splitter was implemented in the code and executed via the terminal command xargs (see code in Appendix). The population list was split into six equally big sections, and the trial phase (evaluation) of each section was then simulated on a different processor¹⁹. This led to an increase in processing speed by approximately a factor of 6. As a result, the processing time was reduced to about 1:10 min per generation and was nearly the same for each condition²⁰.

¹⁹ Splitting over more CPUs would be possible, but the experimental 2 × 2 design (see Chapter 2.2) already required running four evolutions in parallel, which results in a 4 × 6 usage of CPUs, while the most powerful server had just 32 CPUs (and other users).
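The thesis dispatched the split population via xargs; a comparable effect in pure Python could be sketched with the standard multiprocessing module (evaluate_agent is a hypothetical stand-in for the trial-phase simulation, not the thesis code):

```python
from multiprocessing import Pool

def evaluate_agent(genome):
    # hypothetical stand-in: simulate all four trials for this genome and
    # return the average target-tracker distance (the agent's fitness)
    return 0.0  # placeholder

def evaluate_population(population, n_workers=6):
    # spread the costly per-agent evaluations over six processes,
    # analogous to the xargs-based CPU splitter described above
    with Pool(n_workers) as pool:
        return pool.map(evaluate_agent, population)
```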

2.1.5 Testing the Implementation: a Toy Case

In order to test the implementation of the network architecture and the evolution, a simplified version of the task was constructed, in which an agent had to navigate to given targets. In each of the eight trials, the target was presented at a different fixed location; over all trials, these fixed locations circulated around the starting position of the agent at varying distances. Inspired by the above-mentioned architecture by Agmon and Beer (2013), the 6-node nervous system was equipped with two wheels and could drive in a two-dimensional space. Each of its two eyes received its angle to the target (see Figure 9).

Figure 9: Minimally Cognitive Agent: Rolling Bot. The visual input of this rolling bot was the angle between each of the bot's eyes and the target. Since there was a small distance between the eyes, the angle for each eye was slightly different. The angle had no sign; hence the only way the agent could determine the direction to the target was by subtracting the inputs of the two eyes.

Since the morphology was smaller than that of the minimally cognitive agents in the joint action task, the genome length (search vector) of the rolling bot was also smaller (56 positions). The population size was set to 10. After approximately 3800 generations, the fitness of the best agent reached the value of 10, i.e. the average distance to all targets²¹. In Figure 10 we see the best agent after 13000 generations. For an animation of this figure please click: Rolling Bot Animation²².

²¹ If the bot did not move throughout all trials, the average distance was about 38.

Figure 10: The Best Rolling Bot of Generation 13000, Fitness: 3.36. Each coloured trajectory is the pathway of the rolling bot to a given target. The simulation length of each trial was kept constant at 5000 time steps. The agent has no vision to the back; consequently, the best strategy that evolved is to circle at the beginning of a trial to compensate for the restricted field of vision.

The following conclusions can be drawn. First, these results show that the implementation was successful. Second, the evolution managed to find a reasonable parameter setting, and thus a good behavioural strategy, which is nearly perfect. Third, the CTRNN architecture was capable of solving a minimally cognitive task whose complexity is comparable to the task given by Knoblich and Jordan (2003). Therefore, the implementation seemed ready for an adaptation to the task of interest.

2.2 Minimal Joint Action Paradigm: Experimental Setup

As a next step, the joint action paradigm formulated by Knoblich and Jordan (2003) was adapted for the computer simulation. The research design is as follows. First, a simulated environment was constructed in Python 3.5, in which the minimally cognitive agents could act (see code in Appendix). The described joint action task (see Chapter 1.4) was implemented in this virtual environment. The 2 × 2 design allows the comparison of single vs. joint actions under a social-cue condition vs. a non-cue condition, that is, whether sound stimuli were presented when either of the control keys was pressed.

A central goal of the implementation was to keep it similar to the human experiment with respect to the velocities of the target and tracker, the acceleration, the morphological restrictions of the agents and the simulation time. One could debate whether it is necessary, or at least makes sense for our purpose, to keep the simulated time scale equivalent to that of the human experiment. However, it is theoretically and conceptually appealing to keep these scales approximately the same; moreover, such an implementation allows for a more feasible comparison of the two experiments. Consequently, the following implementation decisions were made.

The agents in the single condition have two motor neurons (effectors), which activate either the left or the right button in the given task, whereas agents in the joint condition have effective control over just one button. Crucially, the number of interneurons was kept the same for each condition, to allow for comparability. An agent could press each key at most twice per whole time-unit (t), i.e. every 50 time-steps (ts)²³. Yet both keys could be activated simultaneously or with a delay, which could even be smaller than the restriction for a single key (< 50 ts). Hence, the setting allowed for up to four keypresses per time-unit (left and right)²⁴.

²³ 1 t equals 100 ts if h = 0.01 (see Chapter 2.1.1). We distinguish here between time-units and time-steps, since the choice of Euler's h (see Formula 2) has a significant impact on the outcome of the computation.
²⁴ Knoblich and Jordan (2003) noted that the "optimal performance was computed under the assumption of a maximal keypress rate of 5 keypresses per second". It can be argued that this assumption does not necessarily correspond to the real performance of participants, since more than 2-3 clicks per second are highly unlikely for a controlled acceleration.

In the condition with a social cue, agents received auditory input. Each sensor (ear) is activated by just one of the sounds, which accompany either the left or the right keypress. The duration of the sounds was 0.1 t or 10 ts. As in the human experiment, this sound cue is the only social feedback a participant gets about the other person's action (cue duration = 100 ms). The magnitude of that input was I_i = 1 (for neurons i = 3, 5, 7), multiplied by the corresponding auditory weight (w_a). In the single condition, these cues can be considered a recurrent input, or feedback, of the agent's own motor neuron activation.

Since Knoblich and Jordan (2003) did not report the size of the screen, a virtual screen size of 40 ([-20, 20]) was chosen. Consequently, the visual input I_{2,8} was in the same range, multiplied by the visual weights w_v. The input range for neuron 1 (I_1) was double this size, since it receives the information about the positions of both the tracker and the target.

In the computer simulation, the magnitudes of the velocities (in visual angle per second, °γ/sec) were kept the same as those reported by Knoblich and Jordan (2003). Hence, we can transform the unit of visual angle per second into a computational distance-unit per time-unit (d/t):

1.0\ \frac{^{\circ}\gamma}{\mathrm{sec}} = 1.0\ \frac{d}{t} \tag{5}

Each agent (single) or agent-pair (joint) ran four trials. The initial starting positions of the target and tracker were at the centre of the screen (zero point). A trial terminated after three target-turns, when the target reached the zero point again (see Figure 1 for a visualization). In each trial the target velocity and initial moving direction changed. This ensured that agents did not just evolve a blind strategy, i.e. a strategy which is successful even without information about the target and tracker positions; instead, to perform the task successfully, agents have to make sense of the sensory information in their environment. There were two degrees of target velocity (slow, fast) and two initial target directions (left, right), thus four trials²⁵. When the target velocity was fast (4.3 d/t), the impact of a keypress on the tracker velocity was high (de-/acceleration by ±1.0 d/t); when the velocity was slow (3.3 d/t), the impact of a keypress was low (±0.7 d/t). In the fast trials it took the target approx. 9.3 t to travel from one side of the screen to the other, resulting in a whole trial duration of 27.89 t, whereas for a slow target 36.35 t passed before it arrived at the zero point. The ratio between the two durations and between the two velocities stays the same (0.767); hence agents primarily had to adjust the frequency of their actions in the different trials.

²⁵ The order of the trials has no influence, since for each trial the neural states of an agent were set to zero. Hence, given the agent's genome, the performance in each trial is deterministic.
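Using the constants just given, one simulation step of such an environment could be sketched as follows. This is a hypothetical reconstruction, not the thesis code; `impact` is ±1.0 d/t in fast trials and ±0.7 d/t in slow ones, and the time step h matches the Euler integration:

```python
SCREEN = (-20.0, 20.0)  # the virtual screen of size 40 chosen above

def env_step(target_pos, target_vel, tracker_pos, tracker_vel,
             left_press, right_press, impact=1.0, h=0.01):
    # target: constant speed, instant (non-gradual) turn at the walls
    target_pos += target_vel * h
    if target_pos <= SCREEN[0] or target_pos >= SCREEN[1]:
        target_vel = -target_vel
    # tracker: each keypress de-/accelerates it by the trial's impact value
    if left_press:
        tracker_vel -= impact
    if right_press:
        tracker_vel += impact
    tracker_pos += tracker_vel * h
    return target_pos, target_vel, tracker_pos, tracker_vel
```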

Next, it will be shown that keeping the simulation close to the human setup was not straightforward throughout. In particular, the chosen simulation length is based on a heuristic, which was validated as follows. While the screen size was not mentioned, Knoblich and Jordan (2003) did report the distance between participants and the screen, d_screen = 80cm. Derived from simple geometry, one can transform a visual angle (γ) into any other unit of length with the following equation:

|ac| = 2 × tan(γ/2) × d_screen    (6)

where |ac| denotes an absolute distance (e.g. an object size) on a screen in front of a person's eyes. Keeping the distance to the screen (d_screen) constant, one can use Formulas 5 & 6 to transform the velocity applied in the computer simulation into an approximate real-world unit of speed:

1.0 du/tu = 1.0 °γ/sec ≈ 1.4 cm/sec    (7)


With Formula 7, we can estimate the real trial durations for the human experiment, assuming a screen size of 40cm^26. A fast target moved at 6.01 cm/sec, which leads to a trial duration of approx. 20 sec. The slow target, with a speed of 4.61 cm/sec, needs approx. 26 sec before it reaches the zero point after three turns. However, a screen size of 55.87cm in the human experiment, which corresponds to most of today's standard widescreen computer displays, would already result in about the same trial durations as in the simulation.

^26 This would be a legitimate estimation with respect to the size of the screen in Figure 2. However, the image could be an idealization and might not depict the real experimental set-up.
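
As a sanity check on Formulas 6 & 7, these conversions can be reproduced in a few lines; the function name is an illustrative choice.

```python
import math

D_SCREEN = 80.0  # distance between participant and screen (cm)

def visual_angle_to_cm(gamma_deg, d_screen=D_SCREEN):
    """Formula 6: absolute size |ac| on the screen for a visual angle gamma."""
    return 2.0 * math.tan(math.radians(gamma_deg) / 2.0) * d_screen

visual_angle_to_cm(1.0)        # ~1.4 cm, i.e. Formula 7
4.3 * visual_angle_to_cm(1.0)  # fast target: ~6.0 cm/sec
3.3 * visual_angle_to_cm(1.0)  # slow target: ~4.6 cm/sec
```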

Finally, after an agent or agent-pair had run all four trials, its performance, i.e. fitness, was evaluated. The fitness was defined as the average distance between the tracker and the target throughout all four trials; a smaller fitness value is therefore better. These values were the basis for the evolutionary algorithm described in Chapter 2.1.4.
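
A minimal sketch of this fitness evaluation, assuming the target and tracker positions are recorded at every time-step of all four trials:

```python
import numpy as np

def fitness(target_trace, tracker_trace):
    """Average absolute target-tracker distance over all recorded
    time-steps; lower values are better."""
    return float(np.mean(np.abs(np.asarray(target_trace) -
                                np.asarray(tracker_trace))))
```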

3 Results

3.1 General Performances of Agents in All Conditions

All four conditions were simulated on two servers. When only slow to no improvement of the performances within the four populations could be discovered, the evolution was terminated; in other words, the evolution had converged, or was at least trapped in local minima. The development during the first 1000 generations was approximately continuous, whereas in later generations there was occasionally no progress over several thousand generations (see e.g. Figures 11 & 12). Table 2 shows the best agent (single) and agent-pair (joint) of each population.

Table 2: Best fitness for each condition

Fitness (F)   Sound(+)             Sound(-)             F̄
Single        7.26 (18,000 gen.)   6.98 (21,000 gen.)   7.12
Joint         6.01 (18,000 gen.)   6.31 (24,000 gen.)   6.16
F̄             6.635                6.645                6.64

The fitness is the average distance between the target and the tracker; the lower the value, the better. If the tracker stayed constantly on the target, the fitness would be around zero. If the tracker did not move at all, the fitness would be around 10. And if the tracker moved in the opposite direction, the values would be greater than 10.



After 18,000 generations, the performance of the agent-pair in the sound condition turned out to be the best (F_Joint,Sound(+) = 6.01). In general, agents in the joint condition were more successful than single agents (F̄_Joint = 6.16 < F̄_Single = 7.12), whereas the sound condition did not generally improve the performance (F̄_Sound(+) = 6.635 ≈ F̄_Sound(-) = 6.645). Nevertheless, the auditory feedback improved the performance within the joint condition (F_Joint,Sound(+) = 6.01 < F_Joint,Sound(-) = 6.31), but appeared to be distracting for individual performers (F_Single,Sound(+) = 7.26 > F_Single,Sound(-) = 6.98).

These results are not statistically backed. To allow for a statistical analysis, the evolution would have had to be run many times; this was not possible due to limited time and computational resources. It would have been possible to run statistical tests with, e.g., the best 10 agents or agent-pairs within each group. This did not turn out to be promising after all, since the fluctuation of fitness between generations within one group was too high, as we can see in Figure 11. Consequently, such statistical results could have changed notably if the evolution had stopped just one generation earlier or later. In Figure 11 we see that even the fifth best agent (purple) regularly approximates the fitness value of 10. This value was reached by many randomly initialized agents; for instance, if the tracker was not moved at all, the average distance to the target was about 10, ergo so was the agent's fitness.

Figure 11: Evolution-Driven Fitness Trajectories within 2 Groups
We see two fragments of the evolutionary learning curves of two populations. These are the performances (fitness) of the five best agents within a group, each represented in a different colour. On the left are the trajectories for the joint condition with sound-cue (+) from generation 15001 to 16000. The final best fitness was reached already at this stage (F_Joint,Sound(+) = 6.01). On the right are the trajectories of the single agents without the sound feedback (-) from generation 1251 to 2250. We see in both cases that no progress was made within these intervals.

Similarly, Figure 12 shows how much impact the mutation rate has on the performance of succeeding generations. With a smaller mutation rate a population stabilizes more effectively on a certain range of fitness values (here: σ_mut = 0.02), whereas a higher mutation rate (here: σ_mut = 0.1) results in strong fitness fluctuations, even among the best agents or agent-pairs. In other words, at this mutation rate the performance of the 3rd-5th best agents is not much better than the performance of most randomly initialized agents. Hence, even in such an advanced state of the evolution (generation 17001-18000), as can be seen in Figure 12 (right), the population does not generally improve under a high mutation rate^27 (a minimal sketch of such a mutation operator follows the figure caption below).

Figure 12: Influence of the Mutation Rate
Left and right are the fitness trajectories of the five best agents in the same population (Joint, Sound(+)) over 1000 generations, but in different evolutionary phases with two distinct mutation rates (left: σ_mut = 0.02, right: σ_mut = 0.1). The bottom line represents the best agent-pair; its fitness was not outperformed by any other agent-pair throughout this evolutionary phase.
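
The exact mutation operator is described in Chapter 2.1.4; purely as an illustration of the role of σ_mut discussed above, one may think of it as additive Gaussian noise on the flattened genome. This is an assumption for illustration, not the original implementation.

```python
import numpy as np

def mutate(genome, sigma_mut=0.02):
    """Add zero-mean Gaussian noise to every gene of the flattened genome."""
    return genome + np.random.normal(0.0, sigma_mut, size=genome.shape)
```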

3.2 Analysis of the Best Agent-Pair

3.2.1 Trial Performances and Behavioural Strategy

Next, the results of the best (joint) agent-pair of the sound condition will be analysed. This pair performed the task most successfully among all evolved agents. Table 3 shows the performance for each single trial.

Table 3: Fitness for each trial of the best evolved agent-pair

Fitness (F)   fast   slow   F̄
left          5.72   5.5    5.61
right         6.5    6.34   6.42
F̄             6.11   5.92   6.01

Initial target direction (left, right) and trial speed (fast, slow).

When the initial target direction was to the left, the agents were more successful in following it with the tracker (F̄_left = 5.61 < F̄_right = 6.42)^28. Furthermore, during the trials with slow target velocity the performance was better than in those with a fast moving target (F̄_slow = 5.92 < F̄_fast = 6.11). Consequently, the best performance of the agent couple occurred during the slow trial with the initial target direction to the left (F_slow,left = 5.5). For an animation of this best trial, follow this link: Best Agents in Joint Action.

^28 This is not a general phenomenon, hence not driven by the agents' morphology. Some other agents showed stronger performances when the target initially moved to the right.

Figure 13 shows the trajectories of the target and tracker over the full length of this trial and the corresponding key activations of the agent couple.

Figure 13: Joint Agents: Positions of Target and Tracker & Key-activations
Best trial of the agent couple in the sound condition (F_trial = 5.5). The initial target direction is to the left, and the target moves slowly (v̄_target = 3.3 du/tu). In the graph at the bottom, the blue squares indicate the key activations and the surrounding yellow circles represent the corresponding sound-cues.

Even though the first turn of the target is seemingly ignored by the agents, the tracker stays close to it for the following two turns. For comparison, Figure 14 shows the best trial of the single agent in the non-sound condition. Similarly to the agent-pair, there is no (behavioural) reaction to the first turn of the target; the second turn is then followed quite well, but for the last turn no convincing behaviour is revealed. In general, at this stage, no strong solution for the complete task at hand had evolved in any of the conditions. All best evolved agents show the same reluctance towards the first turn of the target. Nonetheless, all these agents are able to anticipate at least one turn of the target with the same strategy, as can be seen in Figures 13 & 14.

Figure 14: Single Agent: Positions of Target and Tracker & Key-activations
Best trial of a single agent without sound cues (F_trial = 6.524). The initial target direction is to the left, and the target moves fast (v̄_target = 4.3 du/tu).

In addition to that, another evolution was initiated. In order to check whether the minimally cognitive agents are capable of reacting to the movement of the target right after the task begins, all four conditions were evolved for 10,000 generations on just the very first turn (see Figure 15). Agents in all conditions successfully reacted instantly to both initial target directions (left, right) and revealed the same strategy mentioned above (see the black box in Figure 15). When tested on the following two turns as well, these agents were not able to successfully follow the target. It can be concluded that the CTRNN is indeed capable of reacting immediately to the moving target. Nevertheless, when a network is trained on, and has to perform over, three target-turns, the reluctance towards the first turn seems to be an attractive strategy for increasing fitness during the evolutionary development. At a later stage, such evolved networks then appear to have difficulties adapting their inner dynamics to the first turn of the target, i.e. the evolution is trapped in a local minimum.

Figure 15: Joint Agents: Trained for One Target-Turn
Depicted is the learning curve of agent-pairs from generation 1001 to 10,000, trained on the first target-turn. The mutation variance is σ_mut = 0.1. Here, the fitness values are better overall, since the average distance to the target is smaller for just one target-turn than for three. In the black box, we see the positions of the target and tracker for the best evolved agent-pair.

3.2.2 Neural Activations and Network Lesioning

Coming back to the best evolved agent-pair, the neural states of each agent were examined (see Figure 16). The different magnitudes of activation of the neurons connected to the sensory apparatus of the two agents, visible in Figure 16a, are mainly due to the differently evolved sensory weights presented in Chapter 2.1.3. These high activations were flattened out by the dynamics of the network, in particular due to the properties of the sigmoidal activation function (see Formula 1 and Figure 22 in the Appendix). However, these neural activations are not very informative unless they are compared, for example, to those of the same but lesioned network. In Figure 17 we see two types of lesions: for the first networks, the auditory information was cut off; for the other networks, no input at all was available (for their trial performances, see Tables 4 & 5).
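
Conceptually, both lesion types simply zero out parts of the external input vector before each network update. A minimal sketch, assuming the NumPy input vector and index convention from the earlier sketch:

```python
import numpy as np

def lesion_input(I, lesion_type):
    """Zero out parts of the external input vector I (a sketch).

    'auditory' cuts the input to neurons 3, 5 and 7; 'all' removes
    every sensory input. I is assumed to be an array indexed 0..7.
    """
    I = np.array(I, copy=True)
    if lesion_type == "auditory":
        I[[2, 4, 6]] = 0.0   # auditory neurons 3, 5, 7 (0-indexed)
    elif lesion_type == "all":
        I[:] = 0.0
    return I
```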



Table 4: Fitness of the Lesioned Network Pair: No Sensory Input

Fitness (F)   fast    slow    F̄
left          17.36   18.43   17.9
right         16.67   16.93   16.8
F̄             17.02   17.68   17.35

Initial target direction (left, right) and trial speed (fast, slow). We notice that all performances drop significantly.

Table 5: Fitness of the Lesioned Network Pair: No Auditory Input

Fitness (F)   fast   slow   F̄
left          6.79   5.34   6.07
right         8.43   8.48   8.46
F̄             7.61   6.91   7.26

Initial target direction (left, right) and trial speed (fast, slow). Despite the lesioning, the general performance remains at a good level relative to other intact networks.

When no input at all is available, the nervous systems of both agents tend to stabilize over time, similar to the randomly initialized 8-node CTRNN in the right half of Figure 6 in Chapter 2.1.2. The key-activations are completely off and, over long periods of the trial, the two keys are even pressed simultaneously. As a result, the tracker collides with the right border and remains there. Nonetheless, one conclusion that can be drawn is that the evolved (intact) network makes sense of the sensory data and adapts its behaviour appropriately. Interestingly, the lesioned networks reveal the same reluctance to activate any key during the first turn of the target; it therefore seems that the inner dynamics of the networks suppress the motor activations independently of the external input.

Figure 16: Joint Agents: Neural States and Input
Neural states and input during the trial with slow target velocity and initial target direction to the left.
a) shows the neural states of each neuron in both agents. The stronger colour stands for the right agent, whereas the more transparent colour represents the left agent. Neurons 1, 2 and 8 show high activations in comparison to the other neurons, due to their connection to the visual sensors. In b) these neurons were removed to get a clearer image of the states of neurons 3-7. c) shows the neural input. Here we see how neuron 2 (red) and neuron 8 (orange) integrate the information about the target and tracker position, respectively, and how neuron 1 (blue) receives input about both positions. Neuron 3 (green) and neuron 7 (grey) hear the right and left tone, respectively, and neuron 5 (yellow) integrates both auditory cues. In d) the activations of motor-neurons 4 and 6 are plotted in relation to the distance between target and tracker. The capital letters L and R represent the activation of the keys ((L)eft, (R)ight).



Figure 17: Joint Agents: Sensory lesioning
Neural states and input during the trial with slow target velocity and initial target direction to the left.
a)-c) show the behaviour of the evolved network of the best agent-pair when no sensory input is received throughout the whole trial. Trial and overall fitness decreased tremendously (F = 17.35, F_trial = 18.43; for the performances of all trials, see Table 4).
In d)-f) only the auditory input is cut off. The trial fitness is even slightly better than for the non-lesioned network, but the overall fitness decreased (F = 7.26, F_trial = 5.34) (see Table 5). Nevertheless, the performance remains reasonably good.

In contrast, when only the auditory input was cut off, the networks' behaviour remained similar to that of the intact ones, and the drop in performance was moderate^30 (F_Sound-Lesion = 7.26 > F_Joint,Sound(+) = 6.01). The auditory feedback appears to be a meaningful input for fine-tuning behaviour and adjusting the dynamics between the two minimally cognitive agents. This is not surprising, since the sound cues constitute the only physical connection both agents have to each other.

One more type of lesion is of interest. In Figure 18 we see the neural and behavioural outcome of the coupled networks when they stop receiving input in the second half of the simulation.
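
This manipulation amounts to a simple schedule on top of the input computation; a minimal sketch, with the names assumed for illustration:

```python
import numpy as np

def scheduled_input(I, t, t_trial):
    """Return the external input I, cut to zero in the second trial half."""
    return I if t < 0.5 * t_trial else np.zeros_like(I)
```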

Figure 18: Joint Agents: Sensory lesioning in Second Half of Simulation
Neural states and input during the trial with slow target velocity and initial target direction to the left. The networks received no auditory and no visual input in the second half of the simulation (I_i = 0).
a) & b) show the neural activations during the trial. In a) we see how the neural activations, in particular of neurons 1, 2 and 8, flatten out during the second half of the simulation. Even though we find shifts in activation when we compare the neural states in b) with those of the intact network in Figure 16b), the keypress patterns in d) and in Figure 13 remain nearly the same.

Consequently, the neural activations of the lesioned and the intact networks start to diverge in the second half of the trial. Nevertheless, even though we also see this shift specifically in neurons 4 and 6, which are the motor neurons, the keypress pattern remains nearly the same (compare Figure 18d with Figure 13).
