
Undermining the Opponent: Extending BDI Agents with Disruptive Behaviour for the Multi-Agent Programming Contest

Yassin Abdelrahman (11822155)

Bachelor thesis
Credits: 18 EC
Bachelor Kunstmatige Intelligentie (Artificial Intelligence)

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisors
Mostafa Mohajeri Parizi & Dr. Giovanni Sileno
Complex Cyber Infrastructure (CCI)
Faculty of Science
University of Amsterdam
Science Park 904
1098 XH Amsterdam


Abstract

This work was done as part of a team project participating in the Multi-Agent Programming Contest of 2020. The objective of this contest is to create a multi-agent system in an environment where agents need to score more points than the opponent to win the game. Points are scored by delivering constructs of building blocks to goal states. Our framework is built on the belief-desire-intention (BDI) model of agency and is based on the agent-oriented programming language AgentSpeak(L). This thesis focuses on delaying the enemy team from delivering constructs by creating the blocker agent: an agent that stands on one of the goal states and performs or fakes a clear action on an enemy that is trying to deliver a construct to the goal. Because up to 100 agents can be loaded into the environment, the agents must be implemented with computational efficiency in mind. The effectiveness of sub-symbolic and symbolic AI is discussed; the latter was used in the implementation of the blocker because of its far greater efficiency.


Contents

1 Introduction
  1.1 Research Project
  1.2 Research Question
  1.3 Thesis Overview

2 Background
  2.1 The Multi-Agent Programming Contest
  2.2 Belief-Desire-Intention Model and Architecture

3 Framework
  3.1 SPADE and AgentSpeak(L)
  3.2 BDI implementation structure in Python
    3.2.1 Server
    3.2.2 Agent
    3.2.3 Builder
    3.2.4 Attacker
    3.2.5 Scout
    3.2.6 SuperAgent
    3.2.7 Strategist
  3.3 Graph
  3.4 Navigation

4 Blocker Agent
  4.1 Reasoning
  4.2 Approach
    4.2.1 Machine learning
    4.2.2 Symbolic AI
    4.2.3 Choice
  4.3 Implementation

5 Experiments
  5.1 Scenarios
  5.2 Results

6 Discussion and Future Developments


Chapter 1

Introduction

Real-world scenarios and problems can often be modelled in terms of competition. Examples of such scenarios include team sports [8], team games, animal species fighting over certain locations or prey, and market or other social human behaviour. In academic settings, competitions are generally targeted towards stimulating research; on certain occasions, academic competitions might be about optimizing strategic decisions in scenarios like the ones given above.

One such competition is the Multi-Agent Programming Contest (MAPC) [4], a competition intended to stimulate research in the area of multi-agent system (MAS) development and programming. A MAS is a system consisting of multiple autonomous intelligent agents that interact with each other and the environment they are set in. These systems can be used to solve problems that individual agents find difficult or impossible to deal with. A MAS can therefore be used to research how agents could cooperate and use teamwork [14], a quality that is crucial in recreating human behaviour and intelligence, and consequently important in the area of artificial intelligence (AI).

OpenAI has recently made important progress in the area of agent-oriented programming. OpenAI Five [2, 9] was a project that used reinforcement learning to optimize decision making in the popular video game Dota 2. However, the technological requirements necessary to apply such a method from scratch are very high. Several trends of research in AI are trying to reduce this burden by studying possible interactions between machine learning and symbolic AI. The present research project can be seen as part of this stream of work.


1.1 Research Project

The original research proposal was to design and develop a team of agents (possibly) participating in the MAPC. The work has been distributed between five bachelor students, each taking a different role and/or a different approach to creating a certain role. For the work of the other students, refer to [5, 12, 1, 13].

This year's MAPC presents a scenario in which two teams of agents have to build constructions and turn them in for points after gathering the resources from dispensers. The team with the highest number of points wins the game. Last year's contest had the same concept apart from a handful of small differences. Most of last year's contestants could build constructs, explore the environment and navigate to their goals consistently. To our knowledge, no team has considered the possibility of disrupting the enemy from turning in constructs and preventing them from completing their goals. Because the main goal of the agents is winning the game, hindering the enemy could be used to widen the gap in points between the allied and the opposing team, increasing the chances of winning. In this thesis, the focus has been on implementing this disruptive behaviour. This project therefore focuses on the agents disrupting the opposing team (specifically the blockers). These agents can attack the enemy agents by performing clear actions to disable important enemies, or block them from reaching their desired destinations.

1.2

Research Question

The general research question associated with this objective would be:

• What is the best strategy to interrupt the opponent’s progress the most while using the least computational resources possible?

In order to tackle this question, we need to identify:

• What are the main approaches to implementing the interruption strategy in the agents?



Because approaches can generally be divided between symbolic and sub-symbolic ones, some considerations need to be elaborated:

• What are the pros and cons of symbolic vs. sub-symbolic approaches in the context of interruption strategy?

Then we reach the most specific research question, which would help to answer the higher-level research question:

• How can either or both of these approaches be incorporated as a role in a BDI agent?

1.3 Thesis Overview

The remainder of this thesis is organized as follows. The background needed for understanding the concepts used in this thesis is explained in the next chapter. Chapter 3 presents the BDI framework that was made for this project. The attacker subrole that was implemented is discussed and explained in chapter 4. The experiments and their results are shown in chapter 5, followed by the discussion of these results in the last chapter.


Chapter 2

Background

2.1 The Multi-Agent Programming Contest

The MAPC takes place every year, with a new scenario introduced every two to four years. In each year after the introduction of a new scenario, small changes are made to keep the competition interesting. The purpose of repeating the same problem is to discover innovative approaches not thought of in previous years and to perfect those implementations.

Last year, Agents Assemble [3] was introduced. In this scenario, a group of All-Terrain Planetary Vehicles is sent into hazardous environments to obtain materials necessary for the survival of the human race. In the simulation, these materials are obtained from various dispensers (figure 1a) located all over the environment and have to be assembled into constructs (figure 1b) previewed on the taskboard (figure 1c). These constructs then have to be moved to certain goal states (figure 1d) to earn points. Because it is a competition, two teams of robots (agents) are put into the environment at the same time, and the team with the most points at the end of a round wins.


Figure 1: (a) Dispensers, (b) Task example, (c) Taskboard, (d) Goal states

While navigating through the environment, as seen in figure 2, agents can encounter black obstacle terrain. Agents cannot pass through this terrain, but can remove it by performing a clear action (figure 3). The agent has to stand still and perform a clear action for three turns to clear an area of terrain. When used on an opposing agent, the clear action disables it for a set number of turns and destroys all blocks attached to it.


Figure 3: Before (left) and after (right) the clear action of an agent

During the contest, clear events can occur at random moments and at random locations. These are marked similarly to clear actions performed by agents, but with a random size (figure 4). The clear event is shown for a number of steps before it resolves, functioning the same as an agent's clear action, with the addition that obstacle terrain can be added at random locations in the center of the event.

Figure 4: A clear event

The agents' perception of the environment is limited. They have a vision radius of five blocks (nodes), as shown in figure 5. The key to navigating through the environment is storing the percepts on the map and sharing this information with friendly agents. More on this is explained in section 3.3.


Figure 5: The vision radius of an agent

The game server controls the simulation in discrete time steps; every t = 4 seconds it requests an action from all agents. The simulation stops after a certain number of steps.

In our approach to the contest, agents are not only available on the playing field itself, but also as background characters, i.e., agents that are managing the team to success. A good metaphor for this would be soldiers on a battlefield, who represent the agents on the field, taking orders from strategists away from the battle. They are told to perform small tasks without an explanation of how they should be done. The soldiers have to find a way to do these small tasks, while the strategists manage all the soldiers to eventually complete the bigger task: winning the battle.

The MAPC implemented small but important changes to last year's scenario. One of those changes is the addition of the taskboard, where agents pick up the tasks they want to do. Another important change is that more agents will be on the field. Last year every team had around ten agents on the field; this year the MAPC increases that number to up to 50 agents per team. This creates the need for agents that use less computational power than others, to make the framework more efficient.


2.2 Belief-Desire-Intention Model and Architecture

The belief-desire-intention (BDI) model of agency is an abstract architecture for rational agents [11]. Agents can use these concepts to achieve their goals. Their beliefs are the information the agent considers to be true about the environment and itself, and what it might infer from this information. Beliefs can be summarized as the agent's interpretation of the world. They are the source of information BDI agents act upon, until the information is proven false by new beliefs. Desires are the general goals or objectives the agents would like to accomplish. The sum of these desires can be seen as the motivational state of the agents. They deliberately choose which desires they want to commit to and fulfill.

Finally, the intentions are the actions and plans the agent has chosen to perform to fulfill its desires. Plans are specifications of means by which an agent should satisfy certain ends [10]. They consist of two parts: the head and the body. The head itself also consists of two parts, namely the triggering event and the context. The triggering event is the specific condition necessary for the plan to start. The context is the set of conditions required for the plan to be executable; these conditions usually involve desires or beliefs. The body consists of all the actions an agent performs, or the subgoals it has to achieve, to successfully execute the plan.
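To make this structure concrete, the following minimal Python sketch models a plan with a head (triggering event plus context) and a body. The names, the dictionary-based belief representation and the event encoding are illustrative assumptions, not code from the project.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Plan:
    # Head: a triggering event and a context condition over the beliefs.
    trigger: str                      # e.g. "+enemy_near_goal" (hypothetical)
    context: Callable[[dict], bool]   # must hold for the plan to be applicable
    # Body: the actions or subgoals executed when the plan is adopted.
    body: List[Callable[[], None]]

def applicable(plan: Plan, event: str, beliefs: dict) -> bool:
    # A plan is selected when its trigger matches the incoming event
    # and its context holds in the agent's current beliefs.
    return plan.trigger == event and plan.context(beliefs)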

We can use the soldier metaphor again: their beliefs are everything they think they know about the battlefield they are on. Their desires are the goals they have to accomplish to bring the battle to a conclusion; an example could be defeating the opposition at a certain location. Their intentions are the plans and actions they have chosen to perform right now to eventually fulfill this desire. These could be throwing a grenade, moving to a certain location, or subgoals like defeating a certain person at this location. Subgoals also need plans to be successfully completed.

BDI allows for the production of accessible, explainable and understandable rules agents can follow, which is the reason that BDI has been a popular agent architecture in the MAPC.


Chapter 3

Framework

Our experience with and review of the AgentSpeak(L) BDI framework with the SPADE [15] package in Python is presented in section 3.1. Section 3.2 explains the final product: the new BDI framework made in Python. This framework consists of multiple components (figure 6), and the following sections go into detail about each of them. We first go over the idea behind the framework, followed by an explanation of the classes an agent is and is able to be, the way BDI is implemented and communicated to the server, and ultimately the manner of navigation.


3.1 SPADE and AgentSpeak(L)

As mentioned before, the group initially tried out the SPADE package with the spade_bdi plugin, which integrates AgentSpeak(L) into Python. It allows the usage of Python functions and integrates the AgentSpeak(L) BDI framework functionality. However, this approach was eventually abandoned because of problems with connecting the agents to the server the simulation was run on. The main issue was connecting multiple agents to the server and letting them send messages to the server simultaneously. For more information about these problems, see the work of D.P. Jensen and T. Stolp in [5, 12]. Because of this problem and the inability to find a solution, we ultimately chose to create a BDI architecture within Python. AgentSpeak(L) was nevertheless a good example of how a framework could be built; the constructed framework therefore follows roughly the same structure as AgentSpeak(L).

3.2 BDI implementation structure in Python

The custom framework is built with the architecture seen in figure 6. Every block in the framework is a class in Python. As the figure shows, every arrow can be interpreted as "inherits from". This means that, for example, the SuperAgent class inherits all the functionality of the Scout class, which in turn inherits all functionality of the Agent class. This is done for two reasons. The first is that all the basic functionality every agent needs is combined into one class, Agent. The second is that agents in the field have to be flexible regarding the role they are currently assigned to. An agent with the Builder class has to be able to switch to the Attacker class when the Strategist deems it necessary. This allows agents to pick up a task whenever a situation demands it, be it that an agent is closer to certain blocks so that it can build a construct, or that it is able to block an enemy agent from reaching a goal state.
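A minimal sketch of this hierarchy is given below. The chain Agent to Scout to SuperAgent follows the description above; where exactly Builder and Attacker sit in the chain is an assumption made for illustration, as is all method content.

class Agent:
    """Basic functionality shared by all agents (section 3.2.2)."""
    def move(self, direction): ...
    def attach(self, direction): ...

class Builder(Agent):
    """Gathers blocks and assembles constructs (section 3.2.3)."""

class Attacker(Builder):
    """Disrupts the opposing team (section 3.2.4)."""

class Scout(Attacker):
    """Explores the environment and maintains the map (section 3.2.5)."""

class SuperAgent(Scout):
    """Inherits everything, so an agent can switch roles at runtime."""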

The implementation of the BDI model is as follows. Beliefs are stored in a graph. Intentions are modelled as a tuple containing lists. Each of these lists has a different purpose; the two most important are the method and arguments lists. The method list contains the functions the agent wishes to perform, and the arguments list contains the arguments these functions need. Together they closely resemble plans in AgentSpeak(L). Methods can contain primitive and non-primitive functions. Examples of primitive functions are moving to the west, clearing a certain location, or attaching a block to the agent. Non-primitive functions are more complex functions that require primitive functions to be executed. In the implementation the focus was more on the use of the beliefs and the intentions, not so much the desires. For more information about how BDI was implemented in this framework, see the remainder of this section.
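The sketch below illustrates this intention structure under stated assumptions: the tuple-of-parallel-lists layout follows the description above, while the agent methods (move, navigate_to), the stub class and the step loop are hypothetical.

# Intentions as a tuple of parallel lists: methods[i] is the function to
# run and args[i] holds its arguments (other bookkeeping lists omitted).
methods, args = [], []

def add_intention(method, arguments):
    methods.append(method)
    args.append(arguments)

class StubAgent:
    def move(self, direction): print("move", direction)
    def navigate_to(self, node): print("navigate to", node)

agent = StubAgent()
# A primitive intention: a single server action such as moving west.
add_intention(agent.move, ("w",))
# A non-primitive intention: expands into primitive moves internally.
add_intention(agent.navigate_to, ((10, 4),))

# Each simulation step, the agent pops and executes the next intention.
method, arguments = methods.pop(0), args.pop(0)
method(*arguments)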


3.2.1 Server

The Connector class makes a connection between the agents and the server, enabling agents to send and receive messages to and from the server. All agents use this class at every step in the simulation, because they need to send a message to the server to perform an action.

3.2.2 Agent

The Agent class contains the basic functionality all agents are able to utilize, such as moving in a certain direction or to a certain block, or attaching blocks to themselves to build a construct. All agents use this class at all times, because all other classes depend on these basic functions.

3.2.3 Builder

The Builder class focuses on building constructions. Builders obtain tasks from the taskboards and acquire the necessary blocks from a dispenser, either by themselves or with the help of another builder. They then build the constructions by combining their blocks with the blocks of other builders, and submit their work at the nearest goal state node. For more information and implementation details on the builder, refer to the work of L. Weytingh and T. Stolp in [13, 12].

3.2.4 Attacker

The Attacker class focuses on making it difficult for the opposing team to complete their tasks. These agents can block the opposing agents from moving into goal states, which makes them unable to turn in tasks. Additionally, they are able to perform a clear action on opponents to disable them and clear the blocks around them, forcing them to walk back to a dispenser to obtain the blocks again. Thus, the attacker agents are supposed to increase the difference in points between the allied and opposing teams in a different manner than the builder agents.

The attacker role was split into two subroles: the clearer and the blocker. This thesis focuses on the blocker, the agent that delays enemies from entering goal states. For information and implementation details on the clearer, refer to D.P. Jensen's work [5].


3.2.5 Scout

The Scout class moves around the environment to create a map all agents can make use of. It discovers the different dispensers the builders need to build their constructs of blocks, and examines opposing agents to provide the attackers with the information they need for their disruptive tasks. For more information and implementation details on the scout, refer to the work of D.J. Bekaert in [1].

3.2.6 SuperAgent

The SuperAgent class inherits all the functionality of all the other classes. It is able to transition from being one type of agent to another whenever necessary, which is why every agent is ultimately a SuperAgent.

3.2.7 Strategist

The Strategist is an agent that is not necessarily present in the environment itself. This agent assigns roles to the agents on the field with a static script. Strategists can be compared to the army strategists in the metaphor.

3.3 Graph

Information about the environment is stored in the Graph class; this is where the beliefs of an agent are stored. An agent can obtain all the information it needs at a certain step to perform a certain action by requesting it from the Graph: information about a node, an agent, anything an agent can see. The information about agents includes which team an agent is on, which is especially important for the attacker, whose actions focus on enemy agents. The graph is updated at every step of the simulation. Whenever an allied agent is seen, it is possible to merge the graphs of these agents, sharing all the information they have and synchronizing their locations. For more information about the graph, see D.J. Bekaert's work in [1].
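As a rough illustration of such a belief store, the sketch below keeps node beliefs in a dictionary keyed by coordinates and merges another agent's map after aligning origins. All names and the offset-based merge are assumptions made for illustration, not the actual Graph implementation from [1].

class Graph:
    """Hedged sketch of a belief store; not the project's actual Graph."""
    def __init__(self):
        self.nodes = {}   # (x, y) -> node beliefs, e.g. {"terrain": "obstacle"}
        self.agents = {}  # (x, y) -> agent beliefs, e.g. {"team": "enemy"}

    def update(self, percepts):
        # Newest percepts overwrite older beliefs every step.
        for pos, info in percepts.items():
            self.nodes[pos] = info

    def merge(self, other, offset):
        # When an allied agent is seen, merge its map after shifting its
        # coordinates into this agent's frame (offset is assumed known).
        for (x, y), info in other.nodes.items():
            self.nodes.setdefault((x + offset[0], y + offset[1]), info)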


3.4 Navigation

Navigation to certain nodes is done with the D* Lite algorithm [6]. This algorithm was chosen because the environment can change at any moment through the removal of black terrain and through random clear events. D* Lite is a path planning algorithm featuring path replanning to accommodate this changing environment. For more information about navigation and D* Lite, see L. Weytingh's work in [13].


Chapter 4

Blocker Agent

The blocker agent is the main focus of this thesis. It stays on a goal state that is crowded by enemy agents. The blocker senses enemy agents nearing the node and attempts to block them from turning in their tasks by performing one turn of a clear action on the location the builder is going to be in the next step. Because clear actions are considered dangerous, the builder usually moves to a safer location, which is usually further away from the goal. If it does not move away, a full clear is used instead to destroy the construct and disable the enemy. In this fashion the opposition wastes steps turning in a construction, which could mean the difference between winning and losing the game.

4.1 Reasoning

The idea behind the blocker was the concept of an efficient attacker that focuses solely on hindering the opposition from completing their tasks. Because it stands still on the goal state, it does not waste computing power on navigation; its computational resources can be put entirely into its main objective, blocking. Because of this computational efficiency, this is the supernumerary class of choice in the framework, especially when a large number of agents needs to be loaded in.

Because actions are only triggered by enemies closing in on the goal state, the blocker can detect multiple agents simultaneously or in quick succession, enabling it to block multiple agents from turning in their tasks. While the blocker is doing this, other friendly agents can make use of this advantage by building and submitting constructs.


The main focus of programming this agent is how and when to perform a clear action on a certain location to effectively hinder an opposing builder: the blocker has to block builders from the goal for as long as it can in order to be as useful to its team as, for example, a builder.

4.2 Approach

There were two feasible approaches to creating this blocking attacker agent. First, the strengths and weaknesses of implementing the blocker with either a machine learning or a symbolic AI approach are discussed in 4.2.1 and 4.2.2 respectively. Subsequently, the choice of symbolic AI over machine learning is explained in 4.2.3.

4.2.1 Machine learning

Applying certain machine learning algorithms (sub-symbolic AI) to this type of agent has its upsides and downsides. Let us use the metaphor of the black box. The input would be a list of navigation routes of adversary agents, moving from their origin to their destination. These destinations could be goal states, dispensers, other agents, taskboards or random locations on the map. The algorithm has to predict when an opposing agent moves towards the goal state occupied by the blocker. Furthermore, it has to determine whether the agent is worth blocking; opposing agents carrying a construction are the only targets. Finally, it has to determine when and where to perform a clear action to maximise the efficiency of its energy usage. Thus, the variables to be adjusted by the algorithm are when and where the clear action has to take place. The location and timing of the clear actions are the output of the black box.

Machine learning would be effective in this scenario, because the optimal location and step to perform a clear action could be determined more efficiently by such an algorithm than manually. Another upside is that the algorithm is arguably more creative than a human solving the same problem, since it can approach the problem from more angles than a human can. Finally, this approach generalizes better over different navigation algorithms. The different teams in the MAPC could approach navigation in a variety of manners; when the program trains with different navigation algorithms, the prediction will always be more accurate than a symbolic AI approach. In this case, however, machine learning has its downsides. The training data could only consist of routes made with one navigation algorithm, because it was not possible to collect the navigation algorithms of the other teams or the algorithms of previous years. The reason for the latter was the number of changes made to the contest. Furthermore, creating another navigation algorithm would be too time-consuming. Having only one method of navigation also means that path prediction is not needed if the agent uses the same manner of movement from point A to point B as allied agents do. If an agent always advances towards a certain destination with the same approach, the location and timing of the clear action the blocker is supposed to perform is telegraphed. And although different teams could use different manners of navigation in the actual contest, it could be argued that path prediction is not needed because the goal of enemy builders carrying constructs is always the same: a goal state. If a clear is done in the direction of one of those builders, there is a high chance the builder will react.

Another problem is the limited vision of the agents. They cannot see opposing agents outside their percept unless there coincidentally is an allied agent around the location the enemy builder is coming from (figure 7). The algorithm learns to predict enemies closing in on the goal state by learning their route from beginning to end, but in the actual scenario it only has vision of the enemy when it is close, which makes the prediction of their route less accurate. Although following enemies as a blocker would solve this problem, namely by predicting their behaviour, navigation costs are too high to be computationally justifiable when there are 100 agents on the field.

Figure 7: Textual representation of the vision of an agent before (left) and after (right) an enemy agent (B?) enters

4.2.2 Symbolic AI

A symbolic AI approach is effective in this specific scenario, because it does not rely on the enemy builders using one specific navigation algorithm: whatever the algorithm, agents carrying constructs approach the goal state in the same way. For this reason, the timing and location of the blocker's clear action is calculable, and it is not required to predict enemy paths. This leaves us with two problems. The first is detecting whether an enemy is moving closer to the occupied goal state; it takes a maximum of five steps from any location in the percept of an agent to reach the agent itself, so to prevent an enemy from accessing the goal state, the blocker has to react as soon as an enemy enters its percept. The second is defining the conditions that must hold in order to classify an agent as a builder. An example could be defining a minimum number of blocks the opposing agent has to carry to be classified as a builder.

4.2.3 Choice

It is possible that a blocker supported by machine learning would perform better than the symbolic AI blocker in the actual contest. Despite this possibility, with the available resources the latter approach is the optimal one in this case. Both approaches would be effectively the same in this scenario, but the machine learning approach would be computationally unjustifiable, while blocking in a reactive manner, as is done in the symbolic AI approach, is much more efficient. Thus, in the following experiments, the symbolic AI approach was evaluated.

4.3 Implementation

This implementation relies on the logical assumption that builders move to goal states whenever they carry constructs. This means that the use of a different navigation algorithm should not make the blocker perform vastly differently compared to the use of D* Lite.

The chosen approach consists of several functions. The blocker initially uses the function get_crowded_goal to determine the goal state most crowded by enemy agents and navigates to the center of it. This function uses the beliefs of the agent itself and of nearby allies to see how many enemy and allied agents are within a certain radius around each goal state. If no allies are detected around the most crowded goal state, the blocker adopts the goal of navigating to that goal state. If at least two allied agents are detected, the next most crowded goal is chosen.
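A minimal sketch of this selection logic is given below; the graph interface (goal_clusters, known_agents), the radius value and the helper names are assumptions made for illustration, not the project's actual function.

def manhattan(a, b):
    # Agents cannot move diagonally, so Manhattan distance counts moves.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def get_crowded_goal(graph, radius=5):
    candidates = []
    for goal in graph.goal_clusters():          # assumed interface
        enemies = sum(1 for a in graph.known_agents()
                      if a.team == "enemy" and manhattan(a.pos, goal.center) <= radius)
        allies = sum(1 for a in graph.known_agents()
                     if a.team == "ally" and manhattan(a.pos, goal.center) <= radius)
        candidates.append((enemies, allies, goal))
    # Try goals from most to least enemy-crowded, skipping goals that
    # already have two or more allied agents nearby.
    for enemies, allies, goal in sorted(candidates, key=lambda c: -c[0]):
        if allies < 2:
            return goal
    return None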

After navigating to the goal state, the function same_agents_closing_in is used to detect whether enemy agents are closing in on the goal state the blocker is on. It uses same_agents to determine whether an enemy from the previous step is the agent now on a location right next to it. If it is the same agent, same_agents_closing_in calculates whether it is getting closer to the blocker, by computing the Manhattan distance from the enemy's current location to the blocker and comparing it to the Manhattan distance from the location the enemy was on in the previous step. The Manhattan distance is used because agents cannot move diagonally.
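The check reduces to two Manhattan-distance comparisons, sketched below with positions as (x, y) tuples. The adjacency-based identity heuristic follows the description of same_agents above, but the exact signatures are assumptions.

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def same_agent(prev_pos, curr_pos):
    # Without stable IDs, an enemy seen at most one node away from where
    # an enemy stood last step is assumed to be that same agent.
    return manhattan(prev_pos, curr_pos) <= 1

def closing_in(prev_pos, curr_pos, blocker_pos):
    # The enemy is closing in if this step brought it strictly closer.
    return manhattan(curr_pos, blocker_pos) < manhattan(prev_pos, blocker_pos)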


If the enemy is getting closer, its location is added to the closer_agents list. Whenever an enemy in this list is closer than a Manhattan distance of six nodes and is classified as a builder, meaning it carries at least two blocks (to simulate a construct), the blocker uses clear_relative_node. This performs one turn of a clear action in the relative direction of the enemy builder. These directions take the form of the cardinal and intercardinal directions, e.g., northwest and southeast. We call this process blocking. If the enemy moves away, the blocker performs one turn of a clear action on the next agent approaching the goal state. If the enemy does not move away, however, the blocker performs a clear action for three turns, disabling the enemy and destroying the construct it is holding.
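Putting the pieces together, a hedged sketch of the per-step blocking decision could look as follows. The two thresholds come from the text; the enemy fields, the direction mapping (with y growing southwards) and the blocker methods are hypothetical names, not the actual implementation.

BLOCK_RANGE = 6   # Manhattan distance at which blocking starts (from the text)
MIN_BLOCKS = 2    # blocks carried for an enemy to count as a builder

def relative_direction(src, dst):
    # Map the offset to a (inter)cardinal direction such as "nw" or "se",
    # assuming y increases towards the south.
    ns = "n" if dst[1] < src[1] else ("s" if dst[1] > src[1] else "")
    ew = "w" if dst[0] < src[0] else ("e" if dst[0] > src[0] else "")
    return ns + ew

def block_step(blocker, closer_agents):
    for enemy in closer_agents:
        dist = abs(enemy.pos[0] - blocker.pos[0]) + abs(enemy.pos[1] - blocker.pos[1])
        if dist < BLOCK_RANGE and enemy.num_blocks >= MIN_BLOCKS:
            if enemy.ignored_last_clear:   # hypothetical flag: it did not move away
                blocker.clear(enemy.pos)   # full three-turn clear: disable the enemy
            else:
                blocker.clear_relative_node(relative_direction(blocker.pos, enemy.pos))
            return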

For video examples of blockers blocking builders, see the showcases.md file in our shared GitHub repository [7].


Chapter 5

Experiments

This chapter explains the experiments that were done to test the blocker agent, and its performance in these experiments. First, the different scenarios in which the blocker was tested are discussed in 5.1. Subsequently, the results of these experiments are shown in 5.2.

5.1 Scenarios

There are multiple scenarios possible for a blocker when defending its goal state from builders delivering their constructs. The two variables are the size of the goal state group and the number of builders closing in on the blocker. In the experiments below, both sizes of goal state groups were evaluated. For the remainder of this thesis, the goal on the left in figure 8 is the small goal and the goal on the right is the large goal.

Figure 8: Different sizes of goal state groups

Because a single blocker has a hard time defending a larger goal state all by itself, a second blocker was added and set in the configuration seen in figure 9. The number of agents approaching the blocker was set as 1 ≤ X ≤ 3 for both goal sizes. A maximum of three was chosen because it is unlikely to have more than three builders entering one group of goal states at the same time or close behind each other. Every combination of a number X of enemy agents and a goal size Y was tested five times. The starting locations of the enemies were randomized for every one of those tests.

Figure 9: Placement of two blockers on the large goal

The agents of the competing team are programmed to imitate the behaviour of a builder agent. Each agent moves to a nearby goal state with blocks attached to one of its sides, simulating a construct being carried to be turned in. The navigation algorithm used is D* Lite [13, 6], the one used for this project. Whenever the agent is on a node being cleared, it moves back to the previous location it was on to avoid being cleared. When an agent sees that the next location on its path is being cleared, it stands still until it is safe to walk further. When it stands on a goal state, however, it does not have to run away, because it can submit a construct and leave before a full clear action can take place. Ultimately, the goal for the blocker is to prevent builders from accessing any goal states around it for as long as possible. The most important statistic is the number of steps every enemy builder wastes attempting to reach the goal state. This statistic is calculated by subtracting the number of moves a builder would normally take to enter the goal state from the number of moves it took when delayed by the blocker. The calculation starts when the simulation starts and stops when all enemies have reached a goal state.
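As a worked example of this statistic, with hypothetical numbers: if a builder's undisturbed route to the goal state takes 12 moves but it needs 16 moves while being blocked, it wasted 16 - 12 = 4 steps.

# Steps wasted by a builder, using hypothetical route lengths.
baseline_moves = 12   # moves to the goal state without a blocker present
delayed_moves = 16    # moves actually taken while being blocked
wasted_steps = delayed_moves - baseline_moves   # -> 4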


5.2 Results

Small goal    1 builder    2 builders    3 builders
1 blocker     4            3.4, 3.6      1.8, 2, 2.8

Table 5.1: Results for blocking a small goal (average steps taken in excess)

Large goal    1 builder    2 builders    3 builders
1 blocker     2            1.6, 0.8      1.4, 0.8, 1.2
2 blockers    4.4          3.6, 3.2      2, 2, 2.4

Table 5.2: Results for blocking a larger goal (average steps taken in excess)

The rows of the tables signify the number of blockers on the goal state. The columns give the number of builders approaching the blocker. The numbers are the average steps taken in excess by each builder. For example, in the second row and second column of table 5.2, two builders approached the larger goal state defended by two blockers; over five tests, an average of 3.6 steps was wasted by the first builder and 3.2 steps by the second.

Table 5.1 shows that on a small goal the blocker can face off against two builders with almost the same performance as against one. It can consistently block one builder approaching from any direction for four steps. When two builders approach from different distances, the blocker makes them both waste four steps. When they approach from the same distance, however, the blocker has no time to block each enemy twice, which is why the average steps wasted is below four. When there are three builders, the attention of the blocker is divided further, which shows in the results.

For the large goal (table 5.2), however, one blocker does not perform well. The goal is too large for the blocker to block any single enemy twice, and it can only clear two agents when they approach at the same time from the same direction. The row showing the performance of the two blockers, however, is interesting: together, their performance on a large goal state almost equals that of one blocker on a small goal state.

Interestingly, whenever an enemy is blocked, it usually wastes at least two steps, because it has to step back and forth once. This is why most of the average numbers are close to multiples of two. In cases where the locations of two builder agents were close to each other, the blocker would block both agents, which improved its results even though that was not its purpose. As mentioned previously, these results are based on builders using only the navigation algorithm developed for this project. However, because this approach is not based on previously obtained data but on logical rules, it should in principle generate the same results for other manners of navigation, as explained in chapter 4.


It should be noted that, since every scenario was only run for five iterations, the results can vary. In some situations the builders started in the most favourable locations for the blocker, making them waste six steps; in other situations the builders were positioned poorly with respect to the blocker, letting it waste only a maximum of two of their steps. However, the combinations of directions from which enemies can approach the goal state are limited. For example, two builders approaching from the north and the east generate the same results as two approaching from the west and the south, and this pattern holds for all other mirrored direction combinations. This means that even with a low number of iterations the results are relatively accurate.


Chapter 6

Discussion and Future Developments

This project has been conducted by studying the interaction of the symbolic AI and machine learning approaches in the context of this year's MAPC. Machine learning has become popular in the last decade and, in terms of performance, is arguably a better approach to most problems in AI than the symbolic approach, because machine learning algorithms learn from previous behaviour and results and act upon them, while symbolic AI does not. This does not mean, however, that machine learning algorithms are always the better option of the two.

For the MAPC there was a need for efficient use of computational resources, because of the increase in the maximum number of agents on the field. An example of choosing a symbolic approach over a machine learning approach is the use of D* Lite, a lightweight navigation algorithm, for navigation purposes. A similar dilemma occurred when deciding how to design the blocker agent. There were upsides to using a machine learning approach instead of a symbolic approach, e.g., creativity and the possibility of greater performance against different navigation algorithms. However, the symbolic approach was eventually chosen because of the logical assumption about the goals of builders, namely turning in a task at the nearest goal state, and because a blocker implemented in this way is computationally efficient.

One important aspect of the design of this approach has been its computational efficiency. In the MAPC 2020, the scenario has changed to include up to 50 agents in each team (100 in total) and, expectedly, much larger map sizes. From the benchmarks already performed, it is obvious that this will be a challenge for many teams, and it is important that at least some of the agents in our team are less resource-hungry. As shown in the implementation, after navigating to the goal state the blocker agents rely only on their local perception to reason about their actions. This means that they scale easily with respect to both map size and number of agents. Furthermore, the fact that the blockers stand still for most of the time makes them much more desirable, especially for settings with higher numbers of agents. The main downside of this implementation is its lack of persistence. A blocker can only know whether an enemy is moving closer by comparing its location in the previous step with its location in the current step. This means that only after the second step can the blocker execute its intention of performing one part of a clear action on the agent's path or location. Thus, for the detection of movement and the performance of one turn of a clear action, a total of three steps is needed.

If there were a way to tag agents with an ID so that they could be tracked, it would be possible for the blocker to remember that a certain enemy is a builder. The blocker could then block that agent for an unlimited number of moves if it is the only agent approaching the goal state, because it could keep performing one turn of a clear action in the path of the builder. The builder would then have to move back every other step instead of every three steps.

For future research, this symbolic AI approach could be compared to machine learning approaches. Another possible approach is predicting which goal state node an enemy is going to end up on: two steps before the opposing builder lands on the node, the blocker initiates a full clear action, clearing for three turns in a row, so that when the enemy lands on the goal, the construct it is holding is destroyed and the agent becomes disabled. This could be done by combining functionality of the blocker from this thesis with the clearer, an agent that predicts where enemies will be in three steps. Ultimately, the navigation algorithms of previous MAPC contestants could be used to further evaluate the implementation of the blocker in this thesis. Even though the scenario underwent several fundamental changes, those navigation algorithms could be recreated and applied to this year's contest.


Bibliography

[1] D.J. Bekaert. "Using Topology-Grid Hybrid Map-based Exploration in an Unknown Environment". Unpublished. 2020.

[2] Christopher Berner et al. "Dota 2 with large scale deep reinforcement learning". In: arXiv preprint arXiv:1912.06680 (2019).

[3] The Multi-Agent Programming Contest. The 2019 contest: Agents Assemble. URL: https://multiagentcontest.org/2019/.

[4] The Multi-Agent Programming Contest. The 2020 contest: Agents Assemble II. URL: https://multiagentcontest.org/2020/.

[5] D.P. Jensen. "Teaching belief-desire-intent agents how to learn: an integration of machine learning and BDI agents". Unpublished. 2020.

[6] Sven Koenig and Maxim Likhachev. "Fast replanning for navigation in unknown terrain". In: IEEE Transactions on Robotics 21.3 (2005), pp. 354–363.

[7] MAPC-UVA. mapc-uva. URL: https://github.com/DanielPerezJensen/mapc-uva.

[8] Sung-Wook Park et al. "Development of a multi-agent system for robot soccer game". In: Proceedings of International Conference on Robotics and Automation. Vol. 1. IEEE. 1997, pp. 626–631.

[9] Jonathan Raiman, Susan Zhang, and Filip Wolski. "Long-term planning and situational awareness in OpenAI Five". In: arXiv preprint arXiv:1912.06721 (2019).

[10] Anand S. Rao. "AgentSpeak(L): BDI agents speak out in a logical computable language". In: European workshop on modelling autonomous agents in a multi-agent world. Springer. 1996, pp. 42–55.

[11] Anand S. Rao, Michael P. Georgeff, et al. "BDI agents: from theory to practice". In: ICMAS. Vol. 95. 1995, pp. 312–319.

[12] T. Stolp. "Extending the BDI model with plan cost estimation for the Multi-Agent Programming Contest". Unpublished. 2020.

[13] L. Weytingh. "Extending the BDI model with optimisation-based goal-selection in the Multi-Agent Programming Contest". Unpublished. 2020.

[14] Jing Xie and Chen-Ching Liu. "Multi-agent systems and their applications". In: Journal of International Council on Electrical Engineering 7.1 (2017), pp. 188–197.

[15] Mohammed J. Zaki. "SPADE: An efficient algorithm for mining frequent sequences". In: Machine Learning 42.1-2 (2001), pp. 31–60.
