
Layout: typeset by the author using LaTeX.


Using Topology-Grid Hybrid Map-based Exploration in the Multi-Agent Programming Contest

Combining the TGHM exploration algorithm with the BDI framework for the Multi-Agent Programming Contest: Agents Assemble II

Dorian J. Bekaert
11308974

Bachelor thesis
Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisors
Mostafa Mohajeri Parizi & Dr. Giovanni Sileno

Complex Cyber Infrastructure (CCI), Informatics Institute
Faculty of Science
University of Amsterdam
Science Park 904
1098 XH Amsterdam


Abstract

This thesis is part of a larger project to create a team for the multi-agent programming contest (MAPC), an annual competition designed to stimulate research on multi-agent systems. This thesis focuses on exploration in the MAPC environment, as agents in this competition only have a limited vision. The exploration algorithm is based on a topology-grid hybrid map (TGHM), which uses distance and information gain to determine the best exploration route. This algorithm is applied to the MAPC environment and tested against a random exploration algorithm. The results show that the TGHM algorithm explores at least twice as fast as the random algorithm. The agents use a belief-desire-intention model to rationally select which action they are going to perform.


Contents

1 Introduction

2 Background information
   2.1 The Multi-Agent Programming Contest
   2.2 Literature review
   2.3 Research questions

3 Method and Approach
   3.1 BDI agent
   3.2 Game setup
       3.2.1 Provisioned Tools
   3.3 Different agent roles
       3.3.1 Navigation
   3.4 Mapping the environment
       3.4.1 From perceptions to graph
   3.5 Exploration
       3.5.1 What does exploration mean in the MAPC?
       3.5.2 Random exploration
       3.5.3 Topology-Grid Hybrid Map Algorithm
   3.6 The strategist
       3.6.1 Role assignment
       3.6.2 Agent identification
       3.6.3 Graph merging
       3.6.4 Determining the dimensions of the simulation

4 Results
   4.1 Exploration
       4.1.1 Determining the range for random exploration
       4.1.2 Determining the parameters for the TGHM algorithm
       4.1.3 Comparing random exploration vs. TGHM exploration

5 Conclusion

6 Discussion

References


1 Introduction

In situations where it is hard or impossible for people to reach a certain area, such as rescue operations after an earthquake or missions on other planets, a (mobile) agent can be deployed. The problem in these situations is that the communication between the robot and its supervisor is either not reliable enough, in the case of rubble, or too slow, if the robot were on Mars. For this reason it is advantageous if the agent can perform its own exploration in these unknown environments. It is also possible to use multiple agents to perform this exploration, as this saves time. The study of multi-agent systems (MAS) focuses on multiple intelligent agents interacting with each other to solve problems that are either too complex or too time-consuming for an individual agent.

This thesis is part of a larger project to create a multi-agent team for the multi-agent programming contest. The multi-agent programming contest (MAPC) is an annual competition created to stimulate research in the area of multi-agent systems. The project is a combination of Abdelrahman [1], Jensen [2], Stolp [3], Weytingh [4] and this thesis. The agents in the team can be assigned three different roles: attacker, builder and scout, all of which use a belief-desire-intention model that uses an agent's desires/goals and information about the game state to determine its intention/plan of action. The strategist and task manager are agents which do not actually participate in the MAPC simulations themselves, but focus on managing the agents who do play.

This thesis will focus on the scouts, the strategist and the manner in which agents store information about their environment. The scouts are the agents focused on exploration, which they perform using a topology-grid hybrid map (TGHM) algorithm[5]. This algorithm explores by generating candidate target points and using the distance to and information gained by moving towards those points to determine the best possible candidate. The algorithm has already been successfully applied to a mobile robot exploring a 3D environment, in which it performs better than a forward simulation-based algorithm[6] and a frontier-based approach[7]. In this thesis the TGHM algorithm will be applied to the 2D environment of the MAPC, after which it will be compared to a random exploration algorithm.

The strategist decides which role is assigned to which agent, and it also aids the agents in sharing environment information amongst themselves. The agents store this environmental information in a graph, which consists of a rectangular grid of nodes where each node represents a cell in the grid of the MAPC environment.


This paper is organized as follows: Section 2 gives more information on the multi-agent programming contest and reviews the belief-desire-intention framework and previous work on exploration. Section 3 primarily describes the setup for the larger project itself, i.e. the agents' multi-agent system and their BDI model. After that it explains the agents' graph, the TGHM exploration algorithm and the strategist. Section 4 presents the results of the implemented exploration algorithm and compares it to random exploration. Sections 5 and 6 provide a conclusion, discussion and future work.

2 Background information

2.1 The Multi-Agent Programming Contest

The Multi-Agent Programming Contest (MAPC) [8], as the name suggests, is an annual programming contest where teams design a multi-agent system to compete against other teams in a certain scenario. The competition is organized to stimulate research in the area of multi-agent systems by helping to identify problems, gather test cases, and debug and analyze existing systems for their weak and strong aspects. In this year's contest, Agents Assemble II, two teams of multiple agents compete in a 2D grid world. The goal of each team is to score points by gathering blocks, assembling them into a complex structure and delivering the outcome to predestined goal sites. The simulations last a certain number of steps, usually 750, in which the agents of each team have to work together to achieve the highest score by the end of the simulation.

The agents have a limited local vision and can only see the area around them (which is in the shape of a diamond, reaching 5 cells to each side). At the beginning of a simulation the agents do not know their location or the dimensions of the world. Furthermore, the world loops horizontally and vertically, meaning that if an agent is on the far-right edge and moves to the east, it will end up on the far-left.

If an agent wants to create and deliver a structure, the first thing it has to do is find and move towards a taskboard, where it can see which tasks are available. Tasks are blueprints that illustrate which blocks should be delivered in what shape. Larger structures with different block types score more points for the team, as they are harder to assemble. After the agent has received its task, it has to gather the blocks for the structure. There are three different types of building blocks, b0, b1 and b2, which can be requested from dispensers of the corresponding type. The different block types and dispensers can be seen in Figure 1, where blocks are represented by flat squares and dispensers by 3D squares (both with their corresponding type written on them). Once the agent has collected and assembled the structure as instructed by the task, it has to move towards a red goal site to submit it and gain the points. Besides taskboards, blocks and dispensers there are also obstacles in the environment, which are represented by the black squares. The obstacles can be cleared by the agents to reveal more taskboards, blocks, dispensers and goal sites.

Figure 1: A MAPC Agents Assemble I simulation

Every step the agents can perform an action to interact with the environment. An agent can perform a move action to move one cell to the north, east, south or west. Furthermore, agents can attach, detach or disconnect things to or from themselves. This way they can transport items such as blocks from a dispenser to a goal site, for example carrying a block with them as they move. An agent can also request a block from a dispenser, submit a task at a goal site, accept a task or clear an area. Every agent also has a certain energy level, which gets recharged by 1 energy every step. If an agent decides to clear an area, it costs 30 energy and three steps to 'charge' the action (during which the agent cannot move). A clear action can not only destroy obstacles, but can also be used to destroy blocks and temporarily disable other agents. In particular, a clear action clears the selected cell and the 4 adjacent cells.


The game environment is run on a separate server to which the agents have to connect. The communication between the game server and the agents is done by sending JSON messages back and forth. The server sends different types of messages to the agents to inform them about the current state of the simulation. When the simulation starts the agents receive a 'sim-start' message informing them that a new simulation is starting, which contains the name, team, team size and total number of steps of that simulation. Agents can also receive a 'sim-end', 'bye' and 'request-action' message. While 'sim-end' and 'bye' are fairly self-explanatory, a 'request-action' message is a bit more complicated, as this message contains the perceptions of the agent's local vision and the tasks it received. If an agent accepts a task, only that agent can submit it; another agent can still submit that same task, but only after accepting the task itself.

These perceptions contain the location of all things within the agent’s vision relative to the agent, such as: other agents (from both teams), blocks, dispensers, obstacles, clear markers and goal sites. The message also tells the agent whether its previous action has succeeded.

When an agent receives a request-action message, it has 4 seconds to choose its next action. If after those four seconds the server has not received a next action from the agent, it continues to the next step and the agent does nothing that round. If the agent does respond in time, it sends a JSON message such as the one in Figure 2, containing the agent's ID, the action type and possible action parameters. It would be possible for an agent to enter another agent's ID instead of its own, in the same way it would be possible to have one 'central' agent send all the messages (more information about the game scenario and the different types of messages can be found in the MAPC's GitHub repository[9]).

    {
        "type": "action",
        "content": {
            "id": 1,
            "type": "move",
            "p": ["e"]
        }
    }

Figure 2: The JSON message from an agent to the server, instructing the server that agent 1 requests to move east.
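To make the protocol concrete, the sketch below builds the payload of Figure 2 in Python and writes it to an already connected socket. The helper name send_action, the port number and the null-byte message terminator are assumptions for illustration; the project's own Server class handles the actual connection and framing details.

    import json
    import socket

    def send_action(sock: socket.socket, agent_id: int, action: str, params: list) -> None:
        """Encode an action request as JSON and write it to the game-server socket."""
        message = {
            "type": "action",
            "content": {"id": agent_id, "type": action, "p": params},
        }
        # The terminator is an assumption; the real protocol framing may differ.
        sock.sendall(json.dumps(message).encode() + b"\0")

    # Example: agent 1 asks to move one cell to the east.
    # sock = socket.create_connection(("localhost", 12300))
    # send_action(sock, agent_id=1, action="move", params=["e"])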


2.2 Literature review

Belief-Desire-Intention Model

The belief-desire-intention (BDI) model[10], which is used by the agents, selects a desire and uses information about the current game state to rationally select an intention/plan of action. In the model, beliefs are information about the game state, such as facts (X is a dog) but also inferential rules (if X is a dog, then X is an animal). Desires are goals that the BDI agent is trying to accomplish by selecting the right plan of action. If an agent is hungry, for example, it could have the desire to eat some food. Lastly, an agent has plans of action, which describe a sequence of actions that, when performed successfully, will accomplish a certain desire. If an agent chooses a desire ('find something to eat') it can trigger a plan of action, which then becomes the agent's intention. If an agent is hungry, has a desire to eat some food and has beliefs that it is located in the living room, that there is a refrigerator in the kitchen and that there is pizza in the refrigerator, it could select the intention to eat food. Based on its beliefs it can then split up this 'macro-intention' into smaller sub-intentions like 'walk to the location where the food is located', 'get food from the refrigerator' and 'eat food', each with their own sub-goals. These sub-intentions can then be split up even further; 'get food from the refrigerator' could be split up into 'open refrigerator door', 'pick up pizza' and 'close refrigerator door'.

The agent can dynamically use new beliefs to change its intention, which makes the model reactive in real time; this is an advantage in a dynamic environment. Let us say that the agent is hungry and is performing its sub-intention 'walk to the location where the food is located' with the belief that there is pizza in the kitchen. Once it has walked to the kitchen and receives the new belief that the pizza is located in the living room, it will move towards the living room before checking the condition (is the food at the same location as the agent?) and moving on to the next intention.

AgentSpeak

A programming language that is based on such a BDI model is AgentSpeak(L)[11]. An agent script in AgentSpeak(L) consists of three parts: the initial beliefs and inferential rules, which represent the BDI model's beliefs; the initial goals, which represent the desires; and the plans to reach those goals, which (if chosen) are the agent's intentions. Each agent executes its own AgentSpeak(L) script and manages its beliefs, which it updates with new information it receives from the environment, and an intention stack, in which it stores all its intentions to be executed.

Topology-Grid Hybrid Map algorithm

Liu et al.[5] propose an autonomous exploration and map construction method for a mobile robot using a Topology-Grid Hybrid Map. The paper applied the algorithm on a mobile robot with sensors in a 3D environment. The robot used LIDAR sensors to scan the surrounding area and an odometer to measure the distance it travels itself. It then processes this environmental data to determine what location it has to move to in order to achieve the best performing exploration. The TGHM algorithm was measured against two other approaches, namely a forward simulation-based autonomous exploration algorithm[6] and a frontier-based approach for autonomous exploration[7]. The former uses Monte Carlo planning[12] to generate potential paths and then computes their 'reward' value; the path with the highest value is chosen as the target for the mobile robot to move to. The frontier-based approach describes a manner to detect and navigate to frontiers in grids.

These two other exploration approaches, however, have a low mapping efficiency, while the TGHM algorithm selects its target points by taking into account information gain and motion cost, resulting in a greatly reduced exploration time.

2.3 Research questions

In its original paper the TGHM algorithm performs its exploration using a mobile robot in a 3D environment. In this thesis the TGHM exploration algorithm will be implemented in the 2D Multi-Agent Programming Contest environment and tested against a random exploration algorithm.

3 Method and Approach

3.1 BDI agent

As previously written, the BDI model uses beliefs, desires and intentions to rationally choose its next goal. Part of this thesis is to create a BDI model in Python based on AgentSpeak(L). Beliefs are facts about the state of the world, and in the Python BDI model they are represented as the perceptions that the agent receives from the game server. The intentions are plans that the agent has chosen to perform to fulfill a desire, which are represented as Python functions. Once an intention is selected it is added to an intention stack (LIFO), which the agent uses to select its next intention to complete. An intention consists of a tuple containing a method, args, context, a description and a Boolean variable stating whether it is a primitive function or not. Because the intention should not be executed immediately, the name of the function (method) and its arguments (args) are separated. The context is used to give the function a condition which must be satisfied in order for the agent to start the intention. The description is a string which describes the goal of the intention. Lastly, there are macro-intentions (e.g. explore the environment or complete a task) and primitive ones (e.g. move north, clear an area), which are distinguished by the primitive Boolean variable being True, in which case the function is primitive, or False otherwise. Primitive functions are immediately added to the intention stack, while non-primitive functions are decomposed until they are multiple primitive functions (after which they are also added to the intention stack).

In AgentSpeak(L), desires are states which the agent wants to bring about based on its beliefs and can thus be seen as goals. In the Python BDI model, these goals are represented as the conditions of the if-statements in an agent's macro-intentions, which decide what plan will be chosen as the next intention.
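As a minimal sketch of this design (with illustrative names rather than the project's exact classes), an intention can be stored as the tuple described above and pushed onto a last-in-first-out stack:

    from collections import deque

    class BDIAgent:
        """Minimal sketch of the intention stack described above."""

        def __init__(self):
            self.intentions = deque()  # used as a LIFO stack

        def add_intention(self, method, args=(), context=None, description="", primitive=True):
            # An intention is the tuple (method, args, context, description, primitive).
            self.intentions.append((method, args, context, description, primitive))

        def execute_next(self):
            if not self.intentions:
                return
            method, args, context, description, primitive = self.intentions.pop()
            if context is not None and not context():
                return  # the intention's condition is not satisfied yet
            # A primitive intention is a basic game action; a macro-intention is a
            # function that pushes its own sub-intentions back onto the stack.
            method(*args)

    # Example usage with a primitive intention:
    agent = BDIAgent()
    agent.add_intention(print, args=("move north",), description="move one cell north")
    agent.execute_next()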

3.2 Game setup

Before the agents can start creating intentions they need to be able to connect to the game server. The agents do this by connecting to the sockets of the game server, which is done in the Python Server class (not to be confused with the game server running the simulation). The Server class uses multi-threading, where each agent gets its own thread so they can all run simultaneously.

Initially, spade [13] (Smart Python Agent Development Environment) was to be used for running the agents and their communication with the game and each other. This choice was mainly made because spade also provides an AgentSpeak plugin, which was to be the initial setup for the agents' BDI models. At a later stage of the project, however, as the idea of using AgentSpeak was discarded, so was spade, due to complications [2][3].

The Agent class, which inherits the Server class, provides the functions which the agents invoke to perform the basic game actions as specified in the MAPC’s Github repository under docs/scenario.md.

Every agent is ultimately an object of the SuperAgent class. This is the agent class which inherits the classes used to play a certain role and the BDI agent class. Because an agent has to be able to play multiple roles (not simultaneously), it also has to inherit every role class. These role classes, attacker, builder and scout in this case, in their turn inherit the Agent class. The role classes provide the SuperAgent with BDI intentions (e.g. the scout returns an intention to explore the environment, the builder an intention to build a task, etc.). The SuperAgent is the class which handles the messages from the game server, updates the information and gets its intention from the role classes (depending on which role the agent plays at the moment). The SuperAgent inherits the BDI Agent class, which handles the intentions the SuperAgent receives from the role classes by breaking the intentions down into primitive functions (i.e. the basic agent functions) and inserting them into the intention stack.

Once every agent is running their SuperAgent class, they can communicate with the Strategist (who assigns the agents their roles) and the Task Manager (who selects the task the builder will create).

Figure 3: The inheritance architecture which creates an agent.

3.2.1 Provisioned Tools

All the agent classes, including the BDI agent, are written in Python 3.7 [15]; the entire project's code can be found on GitHub[14]. Python was chosen because everyone contributing to this project is most familiar with this programming language, which allowed for faster and more advanced development. The results of the simulations are plotted using the Python package Matplotlib [16].

3.3 Different agent roles

Now that the agents can connect to the game server and have their basic setup, they can be divided into the different roles.

As briefly mentioned in the introduction, an agent can be assigned one of three roles: a builder, an attacker or a scout. The builder, as the name suggests, is in charge of gathering and assembling the structures. Builders are essential to the team as they score the points which will help win the contest. But the way to win is not only to score points, but to score more points than the opponent. This can be done either by building structures for your own team or by obstructing the opponent so they cannot score points. This is exactly what the attacker does. It tries to obstruct and disable enemy agents so they cannot accomplish their tasks or score points (more information about the attacker can be found in Jensen [2] and Abdelrahman [1]). Lastly there is the scout agent, whose main goal is to explore the environment. Agents exploring the map are necessary for builders, as they need to know where blocks, dispensers and goal sites are. Attackers are also aided by the scouts, as they need to know where the opponent is in order to stop them. Another goal of the scout is to find teammates. Each agent holds an internal map of the environment in which it stores the information it perceives. If two friendly agents see each other they can combine those internal graphs, so they not only share the information they collected but also their current location. This internal graph will be further explained in a later subsection.

It is possible to create more agents than the simulation can register. These unregistered agents can help the others with role assignment or task selection. The first is done by the strategist, who assigns the registered agents their roles (attacker, builder or scout) but also helps them merge their internal graphs with each other. The task manager decides which tasks will be constructed and by which agents (more information about the task manager can be found in Weytingh [4] and Stolp [3]).

3.3.1 Navigation

Agents need to navigate to be able to go from point A to point B. This is done by using the incremental heuristic search algorithm D* Lite[17]. An incremental search algorithm stores the weights it calculates so it can reuse that information instead of having to calculate it again. Heuristic search algorithms (e.g. A*[17]) use the distance to the goal state to guide the direction in which the algorithm is searching. Incremental heuristic search algorithms combine both features to speed up the search of similar search problems, which is important in environments that are unknown or change dynamically (both of which apply to the MAPC environment). The D* Lite implementation uses the Manhattan distance[18] to calculate the distance to the goal state in the heuristic part of the algorithm. The Lite version of the D* algorithm was chosen because it is a simpler version of the algorithm and at least as efficient as regular D* (more information about navigation can be found in Weytingh [4]).
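The heuristic itself is straightforward; a short sketch of the Manhattan distance used to guide the search (ignoring the horizontal and vertical wrap-around of the world, which a full implementation would have to account for):

    def manhattan_distance(a, b):
        """Grid distance between cells a and b, each an (x, y) tuple."""
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    assert manhattan_distance((2, 3), (5, 1)) == 5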



3.4 Mapping the environment

3.4.1 From perceptions to graph

The agents store the information they perceive from their surroundings in a graph. This way they can easily recall where dispensers, goal sites, other agents, etc. are located. They also store the locations of obstacles, which helps them navigate through the environment more efficiently. The MAPC environment is dynamic, which is why the agents update their graph with their new perceptions at the very beginning of every step. This way new obstacles, dispensers, agents, etc. are added to the graph, but also, for example, a location where there used to be an obstacle but that is now empty will be updated, so the agents know the obstacle was cleared.

The Graph is a Python class that stores the agents' current location, the game step, other relevant game information and the actual graph itself. The graph itself consists of a rectangular grid of nodes, where each node represents a cell in the environment. The nodes are stored in a Python dictionary where the keys are the coordinates of the nodes and the values are the node objects themselves (another Python class). Each node stores information about the terrain (empty, obstacle or goal site) and its neighbouring nodes, and it maintains a dictionary of things that have been located on the node at a certain step. As a result the nodes not only possess information about the current state of the game but also about its past states. This information can be used and processed to, for example, deduce the opponent's movements.
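A minimal sketch of such a structure (simplified, with illustrative names rather than the project's exact classes) could look as follows:

    class Node:
        """One cell of the environment."""

        def __init__(self, coords):
            self.coords = coords          # (x, y) coordinate of the cell
            self.terrain = ("empty", 0)   # terrain type and the step at which it was last updated
            self.things = {}              # step -> list of things seen on this node at that step

        def add_thing(self, step, thing):
            # Avoid reporting the same thing twice on a node for the same step.
            seen = self.things.setdefault(step, [])
            if thing not in seen:
                seen.append(thing)


    class Graph:
        """Rectangular grid of nodes, keyed by coordinate."""

        def __init__(self):
            self.nodes = {}               # (x, y) -> Node

        def get_node(self, coords):
            # Nodes are created lazily as the agents discover new cells.
            if coords not in self.nodes:
                self.nodes[coords] = Node(coords)
            return self.nodes[coords]

    # Example: record a b1 dispenser seen at step 12 on cell (4, -2).
    graph = Graph()
    graph.get_node((4, -2)).add_thing(12, ("dispenser", "b1"))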

When an agent receives a 'request-action' message from the server, it uses the perceptions of that agent's local vision to update its graph. The perceptions in this message contain the current game score, the agent's current task and information about its previous action and whether that action succeeded. But the information that is actually added to the graph is the things seen in the agent's vision, which consist of the coordinates of each thing relative to the agent, what type of thing it is, e.g. entity (agent), block, dispenser, etc., and more detailed information about that thing. In case the thing is an entity, it shows what team it is on; if it is a block or dispenser, it shows what type it is (b0, b1 or b2). The message also shows on which coordinates the terrain is non-empty (e.g. an obstacle or goal site) and it shows where and what type of blocks the agent has attached to itself.

Each SuperAgent object holds its own graph object. If two agents decide to merge their graphs, one of them will transfer its graph information to the other's and both agents will from then on point to that same graph object. This way it is possible that in the next step these two agents both update the same node, as they both see it. But as their information about that node is identical, this will not cause problems: if they both update the terrain to the same thing, then it will still result in the correct terrain type. Also, the dictionary which keeps track of every thing that has been located on a node at a certain step removes any duplicate things, so that it will not report the same dispenser twice on a node. Graph merging is explained in more detail in Section 3.6.3.

Initially each agent possessed its own graph, but this ended up creating a lot of problems when the graphs were being merged. This led to the approach where agents share the same graph object. Each agent still begins the simulation with its own graph, but once they merge, one of the agents adopts the other's graph. Besides the fact that this approach does not cause problems, it also eventually leads to fewer graphs being used during the game. Another added bonus is that because the agents now update the same graph object, they also know each other's current, up-to-date location and information.

For the internal graph of the agents a Python library (e.g. networkx) could have been used. However, a custom graph was created because this allows for a graph which is specifically tailored to the MAPC scenario.

3.5 Exploration

Now that the general setup of the project is clarified, the exploration algorithms will be presented by first explaining what exploration means in the context of the multi-agent programming contest.

3.5.1 What does exploration mean in the MAPC?

Every agent that plays has a graph in which it stores the information perceived from the environment. This process, together with graph merging, slowly creates a map of the environment, but to make full use of every aspect of the game it is more advantageous to assign some agents to explore the environment. In the Multi-Agent Programming Contest, agents have a limited vision of their environment, which means that they have to move if they want to explore. Exploring in this context means moving in a certain direction, adding the newly discovered area to the graph as nodes (if the nodes were not previously discovered yet) and adding the new information to these nodes. In the context of BDI, exploring means creating a main intention to discover the entire environment, with sub-intentions to move to a certain goal node. In the next part two different exploration algorithms will be discussed, each of which selects its goal node in a different way. The agent keeps selecting the next goal node until the main condition is satisfied: it has completed its exploration. These exploration completion conditions differ per exploration approach.

3.5.2 Random exploration

Random exploration was created as a basic version of exploration, which could later be used for comparison with other approaches. Instead of choosing a random direction to move to each step, the agent adopts an intention to move towards a random goal where the x and y coordinates are selected within a certain range. If range r is selected, this means the x and y coordinates will be numbers selected between −r and r. The decision to let the agent select a random goal, instead of a random direction each step, was made because statistically this (random goal) approach makes the agent less prone to walking around in circles (or squares in this case), which results in a better exploration. The exploration completion condition for random exploration is that 99% of the environment has to be discovered. Once this is done the agent has successfully discovered the environment and can signal the strategist that it wants to change its role. If for some reason the agent cannot reach its random goal, it simply adopts a new intention to select a new random goal to move towards.
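A sketch of this goal selection and completion check (helper names are illustrative; the goal is an offset relative to the agent's current position):

    import random

    def random_goal(r):
        """Pick a random relative goal with x and y offsets between -r and r."""
        return random.randint(-r, r), random.randint(-r, r)

    def exploration_complete(discovered_nodes, width, height, threshold=0.99):
        """Exploration counts as done once 99% of the cells are in the graph."""
        return discovered_nodes / (width * height) >= threshold

    # Example with the range of 25 chosen in Section 4.1.1:
    goal = random_goal(25)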

3.5.3 Topology-Grid Hybrid Map Algorithm

The second and main exploration algorithm is the topology-grid hybrid map algorithm[5]. This exploration algorithm (from here on referred to as the TGHM algorithm) uses distance and information gain to select its next target point, i.e. goal node. In the original paper the algorithm is used by a robot in a real 3D environment using sensors to detect distance, while in the MAPC the agents operate in a 2D virtual environment. The following section explains the TGHM algorithm along with what parts need to be adjusted to fit the MAPC scenario, while remaining scientifically equivalent to the paper. Secondly, as the TGHM algorithm has two parameters which can be tweaked so the algorithm better suits the scenario, it elaborates on these parameters and their influence on the results.

The main idea of the TGHM algorithm is that it uses information gain and distance to decide which target point (a location in the simulation) to move to, in order to explore in the most optimal way. It starts an exploration round by generating a set of candidate target points and adding these to the candidate topology point set, which contains the candidate target points from previous exploration steps. These candidate topology points are then filtered based on how much information would be gained if the agent were to move to that point. Then for each of these filtered points a utility value is calculated based on the information gain and the distance towards that point. The candidate topology point with the highest utility value is selected as the next target point to move towards. The agent repeats this process until there are no more candidate topology points, meaning that all the remaining candidate topology points did not pass the 'information gain filter' and were thus discarded.

Now that the general structure of the algorithm has been clarified, a more detailed explanation of the algorithm can be given. At the beginning of the algorithm three different objects are initialized: 1) the topology node list, which contains all the points the agent selected as the next target point (not to be confused with candidate target points); 2) the candidate target point set, which contains all the possible target points the agent can move to at its current position; and 3) the candidate topology point set (not to be confused with the topology node list), which contains the candidate target points generated each round. Candidate target points are added to the candidate topology point set, and the topology points which did not pass the information gain filter are removed from the candidate topology point set each round.

In the original paper the candidate target points are selected at the maximum possible distance from the robot. This distance limit is either reached by the robot's sensors, which categorizes the target point as a type I frontier, or the sensors are blocked by an obstacle before they can reach their maximum detection distance, categorizing the target point as a type II frontier.

Frontier Type I

Frontier type I points are target points which lie at the limit of the robot's sensors. Converting this to the agent's scenario means selecting all the cells on the outer bounds of the agent's local vision (the red squares in Figure 4b).

Frontier Type II

Defining the frontier type II points in the agent's grid world is a bit harder, as the agent can see through obstacles whereas the robot cannot. If the robot's sensors hit an obstacle, the target point will be the furthest point it can reach before that obstacle. Although the agent's vision cannot be blocked by an obstacle, a target point on the outer bounds of the agent's vision can itself be an obstacle (the red square in Figure 5b). When this happens, a line is formed between the agent and the blocked target point. The final target point will be the cell furthest from the agent which is also a) crossed by that line and b) not an obstacle; in Figure 5b that is the green square on the bottom-right. Once this process is done, these candidate target points are added to the list of candidate topology points, which contains the candidate target points generated in the previous exploration rounds.

Figure 4: (a) Target point as selected by the robot's sensors. (b) The agent (middle blue square) and its local vision (the cells within the diamond).

Figure 5: (a) Selected target point when the robot's sensors hit an obstacle. (b) The agent (middle blue square), its local vision (the cells within the diamond) and a target point which is an obstacle.

Then each candidate topology point is filtered on its information gain, which is defined as the number of unknown cells that the agent would see if it were to move to that topology point. (This number is calculated by generating the coordinates in the local vision of the agent at the topology point and then counting how many of those coordinates are not yet in the agent's graph.) The information gain has to be higher than the fixed parameter N0, or the topology point is removed from the list of candidates.


The value of N0 can be adjusted to better fit a certain scenario. A high N0 value means that only topology points which will acquire a lot of information remain among the candidate topology points. A high N0 therefore creates an algorithm that discovers the environment quickly, but not in as much detail as a lower N0 would.

$N^{k}_{unknown} > N_0$    (1)

$V = F(T) = N_{unknown} \cdot \exp(-\lambda L(T))$    (2)

Hereupon the remaining candidate topology points use Equation (2) to calculate their utility. N_unknown is the number of unknown cells at that topology point, and L(T) is the distance between the agent's location and the topology point, calculated as the length of the path the agent would walk if it were to move to that point. λ is the algorithm's second parameter, which is used to weigh the information gain against the motion cost. A small λ means that motion is cheap and information gain is more important; this creates an algorithm that primarily explores the environment before filling in every detail. A large λ does the opposite, as it values motion cost over information gain, resulting in an agent that fills in every gap while slowly progressing in its exploration.

Once every candidate topology point has a utility value, the one with the highest value is selected as the next target point and the agent gets the intention to move towards it. This point is also added to the topology node list (which contains all selected target points). After the next target point is selected, the remaining candidate topology points are used in the next exploration round to determine the target point after the current one. When the agent has reached its new target point, the whole process starts over again, until enough of the environment is explored that the remaining candidate topology points no longer pass the information gain filter.
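Concretely, the filtering and utility computation could be sketched as follows (diamond-shaped vision of radius 5 as in the MAPC; the parameter defaults are the values chosen in Section 4.1.2, and path_length_to stands for a path planner such as D* Lite; names are illustrative):

    import math

    def vision_cells(center, radius=5):
        """All cells within the diamond-shaped local vision around a point."""
        cx, cy = center
        return {(cx + dx, cy + dy)
                for dx in range(-radius, radius + 1)
                for dy in range(-radius, radius + 1)
                if abs(dx) + abs(dy) <= radius}

    def information_gain(point, known_cells):
        """Number of cells the agent would newly discover at this candidate point."""
        return sum(1 for cell in vision_cells(point) if cell not in known_cells)

    def utility(n_unknown, path_length, lam):
        """Equation (2): weigh information gain against motion cost."""
        return n_unknown * math.exp(-lam * path_length)

    def best_target(candidates, known_cells, path_length_to, n0=20, lam=0.2):
        """Filter with Equation (1) and return the highest-utility candidate, if any."""
        qualified = [(p, information_gain(p, known_cells)) for p in candidates]
        qualified = [(p, n) for p, n in qualified if n > n0]
        if not qualified:
            return None  # exploration is finished
        return max(qualified,
                   key=lambda pn: utility(pn[1], path_length_to(pn[0]), lam))[0]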


Algorithm 1: The implemented TGHM algorithm

    Initialize the topology node list T with the origin position P0;
    Initialize the candidate target point set C0 and the candidate topology point set N0: C0, N0 = ∅;
    Initialize t as the round number of the exploration: t = 1;
    while Nt ≠ ∅ or t == 1 do
        Update Ct with candidate target points;
        if Ct ≠ ∅ then
            Update Nt with the candidate target points from Ct: Nt = Nt−1 ∪ Ct;
            Filter the candidate topology points according to Equation (1): Nt = Nt \ Nt{unqualified};
        else
            Nt = Nt−1;
        end
        if Nt ≠ ∅ then
            Choose the topology point with the highest value from Nt by Equation (2) as the next target point Pt;
        else
            The exploration finishes;
        end
        Move the agent towards Pt;
        Turn Pt into a topology node and append it to T;
        Update Nt accordingly: Nt = Nt \ Pt;
        t = t + 1;
    end

3.6 The strategist

As mentioned previously, the strategist handles the role assignment of the agents and other processes which require interaction with multiple agents, such as graph merging. The communication between the strategist and the agents is achieved by queues. The strategist holds an input queue, in which agents can put messages to let the strategist know they have updated their graph, want to merge their graph or want to be assigned a new role. At the beginning of every game step the first thing the agents do is update their graph with the newly received perceptions and send a message to the strategist that they have done so. The strategist then locks that agent's thread until it has received a message from every agent that it has updated its graph, after which it releases all the threads. Then the agents send a message to the strategist, asking whether their graph can be merged with another agent's. For proper graph merging, all the agents need to have their graphs up to date, which is why the strategist locks their threads until they have all updated their graphs. The messages the agents send to the strategist consist of a tuple containing the task (e.g. 'merge graph' or 'new role') and a pointer to the object of the agent sending the message.


After this, it is the agents' turn to check whether the strategist has sent them any messages. The strategist also has a list of n output queues, where n is the number of agents on the team. These queues are used by the strategist to send messages to each individual agent (the first queue belongs to agent 1, the second to agent 2, etc.) and contain messages such as new role assignments (which consist of a tuple with the first element being the task 'role assignment' and the second being the actual role). The strategist creates the output queues, but both the strategist and the agents hold them (agent 1 only has the output queue belonging to agent 1, etc., while the strategist has the list of all queues).
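A minimal sketch of this queue-based messaging, using the standard library (class and message names are illustrative, not the project's exact ones):

    import queue

    class Strategist:
        """Sketch of the strategist's message plumbing, not the full role/merge logic."""

        def __init__(self, n_agents):
            self.input_queue = queue.Queue()                               # agents -> strategist
            self.output_queues = [queue.Queue() for _ in range(n_agents)]  # strategist -> agent i

        def send_role(self, agent_index, role):
            self.output_queues[agent_index].put(("role assignment", role))

    # An agent announces that its graph is up to date, and receives a role back:
    strategist = Strategist(n_agents=2)
    strategist.input_queue.put(("graph updated", "agent 1"))
    strategist.send_role(0, "scout")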

3.6.1 Role assignment

At the beginning of the simulation, the strategist chooses a role for each agent, which it does by randomly selecting a role (attacker, builder or scout) with probabilities 0.4, 0.1 and 0.5, respectively. Once the scout agents are done exploring the environment, they send a message to the strategist informing it and requesting a new role assignment. The strategist then assigns the role of builder to the now-former scout agents.
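The initial assignment can be sketched as a weighted random choice (probabilities taken from the text above; the helper name is illustrative):

    import random

    ROLES = ["attacker", "builder", "scout"]
    ROLE_WEIGHTS = [0.4, 0.1, 0.5]

    def initial_roles(n_agents):
        """Randomly assign each agent a starting role with the probabilities above."""
        return random.choices(ROLES, weights=ROLE_WEIGHTS, k=n_agents)

    # Example: starting roles for a team of 10 agents.
    print(initial_roles(10))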

The probabilities of 0.4, 0.1 and 0.5 for the attacker, builder and scout respectively were chosen because this seemed to create a fair division. At the beginning of the simulation the agents do not know anything about the environment, so it would make no sense to create an abundance of builders, as they would not know where and what to build. That is why initially half of the agents become scouts. Once the scouts are done exploring they become builders so they can score points. The percentage of attackers remains the same because, regardless of their role, they have to protect the other agents, for which they do not necessarily need environmental information.

Ultimately, there was no completely working team that could (somewhat decently) play a simulation, so this role division is purely theoretical. This is the reason why different probability combinations could not be tested to determine the optimal role division.

Because this standard role division is chosen, which does not take into account game information such as the team score, the opponent's score or the agents' locations in the environment, there was no need for the strategist to also become a BDI agent, as it does not require 'thinking'. If the strategist were a BDI agent, it would have two action plans: one to initially give every agent a role and another to assign a new role to an agent. The first plan could be a sub-intention of a macro-intention with the goal/desire to create, for example, an offensive team (a higher percentage of attackers), a constructive team (a higher percentage of builders), an all-round team (evenly divided probabilities), etc. For the second plan the strategist could have a desire to assign a role to everyone; it could then use its beliefs (which currently consist only of the agents' graphs but could include their roles) to notice that an agent is not assigned or wants a new role. From this it could create an intention to use its beliefs and other game information to determine the best possible role assignment. These plans could even be extended so the strategist uses machine learning to determine the best possible role for an agent given the current game state.

3.6.2 Agent identification

Before two agents can merge their graphs, two things need to happen. First, every agent needs to be up to date, meaning they need to have updated their graph with the latest perceptions. This is achieved by the strategist locking the threads until every agent has updated its graph. Second, the two agents need to identify each other, because the perceptions from the server only tell which team an agent belongs to, not that agent's ID.

When identifying an agent, the first step the strategist takes is determining where in the target agent's local vision friendly agents are (the target agent is the agent whose graph the strategist will attempt to merge). It does so by going over every node that is located in the target agent's local vision (all nodes within the blue diamond in Figure 5b) and checking whether at the current step an agent from the same team is present. The strategist now knows at which locations in the target agent's local vision teammates are present. Now it has to determine who the agents at those locations are. The locations are the coordinates on which the agents are seen from the perspective of the target agent. For example, if the target agent is at coordinate (2, 3) and another agent is at coordinate (4, 5), then the target agent sees another agent at location (2, 2), as this is the location at which the agent is seen relative to the target agent's location.

First the strategist checks if the to-be identified agent already shares its graph with the target agent. In this case no identification is needed as the target agent already knows the other agent’s location.

If the to-be identified agent is not familiar, then the strategist will create a list of potential ‘suspects’ who could be the ones at that location. It does so by checking for all agents (besides the target agent itself) if they see an agent (the target agent) at the reverse location on which the target agent has seen an agent. For example, if the target agent sees another agent at coordinates (2, 3), then the strategist checks if anyone has seen an agent at coordinates (-2, -3).

Now the strategist has a location of the to-be identified agent and a list of possible agents for that location. If there is only one possible agent, then that one is identified as the agent at the location. If, however, there is more than one possible agent, then the strategist will attempt to eliminate possible suspects based on their graph information. The first thing the strategist does is calculate which cells both the target agent and the to-be identified agent can see. This is done by taking the coordinates in the target agent's local vision (relative to the target agent, such that the target agent is located at (0, 0)), adding the coordinates at which the to-be identified agent is located, and finally checking which of those coordinates are still located within the target agent's original local vision. The coordinates which are, are the ones that both agents can see. Once those coordinates are determined, the strategist compares the information of the nodes at those coordinates between the target agent and each of the possible agents. It compares the nodes on their type of terrain and the things which are currently present on the node. If one of the nodes is not similar, the possible agent is disqualified as a potential to-be identified agent.

If after the elimination there is only one possible agent left, then the agent identification is successful. If however there is more than one possible agent, then the identification has failed and the target agent will not be merging its graph with the agent at that location.
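The overlap test at the heart of this elimination can be sketched as follows, with the graphs simplified to plain dictionaries mapping cells to node information (the names and the dictionary representation are illustrative):

    def shared_vision(offset, radius=5):
        """Cells, relative to the target agent at (0, 0), seen by both the target
        agent and a candidate agent located at the given offset."""
        ox, oy = offset
        own = {(dx, dy)
               for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)
               if abs(dx) + abs(dy) <= radius}
        shifted = {(x + ox, y + oy) for x, y in own}
        return own & shifted

    def consistent(target_graph, candidate_graph, offset):
        """Keep a candidate only if both graphs agree on every cell they both see."""
        return all(target_graph.get(cell) == candidate_graph.get(cell)
                   for cell in shared_vision(offset))

    # Example: an agent seen two cells to the east shares a large overlapping area.
    print(len(shared_vision((2, 0))))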

3.6.3 Graph merging

Once the target agent has identified the agent within its local vision, they can merge their graphs. Let us say that the strategist has identified the agent in the target agent's local vision as agent 2. If the agent ID (which is given to every agent by the game server) of the target agent is lower than that of agent 2, then agent 2's graph will be merged into the target agent's. If the target agent's ID is higher, then it is vice versa [19, Chapter 15.2].

Let us assume that the target agent has agent ID 1, meaning that agent 2's graph will be merged into the target agent's. To merge the target agent's graph, graph 1, and agent 2's graph, graph 2, the strategist passes both graph objects as arguments to the function merge_graphs(), along with both agents' IDs and the offset (the location of agent 2 as seen from the target agent's perspective).

The initial step is to calculate the shift that the coordinates of the nodes in graph 2 have to make to adjust to the coordinate system of graph 1. Because both graphs' coordinate systems do not have their (0, 0) coordinate at the same node, this shift has to be calculated. The shift is the coordinate of the target agent's current location in graph 1, plus the offset (which is given as an argument to the merge function), minus the coordinates of agent 2's current location in graph 2.

This shift is then applied to the coordinates of every node in graph 2 to calculate what their coordinates will be in graph 1. If that new coordinate is already occupied by a node from graph 1, then graph 2's node will transfer its information regarding terrain and things to graph 1's node. The nodes are created in such a way that when their terrain is updated, they also store the step at which it was updated; this way the most up-to-date terrain information can be chosen when merging graphs, by comparing the steps at which they were last updated. If there is not already a node at the location of graph 2's node's new coordinates, then that node is added to graph 1. After all the nodes from graph 2 have been transferred to graph 1, the agents in graph 2 also shift their current coordinates and are added to graph 1.
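A sketch of the coordinate shift and node transfer (graphs again simplified to coordinate-keyed dictionaries; the real implementation merges full node objects and compares update steps):

    def merge_graphs(graph1, pos1, graph2, pos2, offset):
        """Merge graph2 into graph1 and return the shift applied to graph2's coordinates.

        pos1 and pos2 are the agents' current coordinates in their own graphs;
        offset is agent 2's position as seen from agent 1."""
        shift = (pos1[0] + offset[0] - pos2[0], pos1[1] + offset[1] - pos2[1])
        for (x, y), node in graph2.items():
            new_coord = (x + shift[0], y + shift[1])
            if new_coord in graph1:
                graph1[new_coord].update(node)  # the real code compares last-updated steps
            else:
                graph1[new_coord] = node
        return shift

    # Agent 1 at (0, 0) sees agent 2 two cells to the east; agent 2 is at (5, 5) in its own graph.
    g1 = {(0, 0): {"terrain": "empty"}}
    g2 = {(5, 5): {"terrain": "empty"}, (6, 5): {"terrain": "obstacle"}}
    print(merge_graphs(g1, (0, 0), g2, (5, 5), (2, 0)))  # shift == (-3, -5)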

After the merging is done, graph 1 contains the information from both graphs. The strategist will then point the graph of every agent in graph 1 (and formerly in graph 2) to the newly merged graph 1. By pointing to the same graph object the agents all update the same graph, which means they not only know their own information and current location, but also that of the other agents in that graph.

As mentioned when explaining the implementation of the TGHM algorithm, the agent maintains a set of candidate topology points, which are coordinates of locations it can possibly move to. If the graph of said agent gets merged into another graph (meaning the agent now moves in a different coordinate system), then the coordinates in the candidate topology point set will map to a different location in the environment. That is why, after the strategist merges two agents' graphs, it sends a message to the agents whose graphs got merged, containing the offset (the shift the coordinate system has made). With this offset the agent can update its candidate topology point set so the coordinates inside once again map to the correct locations.

3.6.4 Determining the dimensions of the simulation

If the strategist attempts to merge the graphs of two agents, but they already share the same graph, it can use that opportunity to try to determine the width and height of the simulation. Say agent 1 and agent 2 already share the same graph, but the coordinates of the current location of agent 1 plus the offset between the two agents are not identical to the coordinates of the current location of agent 2; this means that at some point one of the two agents has looped around the simulation. The dimensions of the simulation can only be determined for agents that already share their graph, because this 'dimension determination' process relies on such a discrepancy between the two locations, and for agents that have only just merged their graphs the location of agent 1 plus the offset is by construction identical to the location of agent 2.


After it is determined that at least one agent has looped the simulation, both agents' locations and their distance are used to determine the width or height of the graph. To avoid accidental errors, a dimension of the graph only gets updated when either a) no dimension has been set yet, or b) the current dimension is a multiple of the new dimension. Once a width or height has been set, the new location of the nodes is calculated as the x-coordinate mod the width and the y-coordinate mod the height. This way, if the simulation is 70 cells wide, the node that was first at location (72, 5) will now be at (2, 5).

After a dimension is set, every graph gets updated to fit into the new 'bordered' world. Determining the dimensions is very helpful to the agents, because before the width and height of the graph are set, the agents walk around in an infinite space, which means that there is no way to explore the entire environment.
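Once a dimension is known, wrapping a coordinate back into the bordered world is a simple modulo operation, as sketched below (illustrative helper; dimensions that are still unknown are passed as None):

    def wrap(coord, width=None, height=None):
        """Map a coordinate into the bordered world once dimensions are known."""
        x, y = coord
        if width is not None:
            x %= width
        if height is not None:
            y %= height
        return x, y

    # The node that was at (72, 5) in a 70-cell-wide world ends up at (2, 5).
    assert wrap((72, 5), width=70) == (2, 5)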

4 Results

4.1 Exploration

The exploration algorithms were tested in a 50x50 environment (the simulation configuration, mapper_config1.json, can be found on the GitHub repository[14] under /server-configuration) by running 10 simulations, each lasting 750 steps. To measure the performance of the agent, multiple features were measured each step:

1. The percentage of the area that was discovered, determined by the total number of nodes in the agent's graph divided by the height times the width of the environment. This number is then multiplied by 100 to create a percentage.

2. The percentage of the system-wide CPU usage during the step.

3. The percentage of the system memory usage during the step.

4. The time it took the agent to decide its next goal (either random goal or next target point).

From the area discovered per step, the average over the 10 simulations was calculated and plotted. For the other features, the per-step average over the 10 simulations was also calculated, after which those values were averaged again to create the overall average of each feature per step.



4.1.1 Determining the range for random exploration

The range from which the x and y coordinates of the random goal are selected was tested with multiple values to determine which choice was optimal. The tested ranges were 10, 25 (half the length of the simulation) and 50 (the full length of the simulation). As can be seen from the results in Figure 6, a range of 25 resulted in the highest percentage of area discovered and is thus determined to be the most suitable range.

Figure 6: Area discovered per step by random exploration using different ranges.

4.1.2 Determining the parameters for the TGHM algorithm

The exploration data collected using different N0 values is shown in Figure 7 and Table 8. The results show that N0 = 30 discovered less than half of the area. This is deemed inadequate when compared to the other versions of the algorithm, which all score above 90%. Regarding the other three values (N0 = 5, 10, 20): even though N0 = 20 has a lower percentage of area discovered than the other two, it completes its exploration at least 103 steps earlier. And although N0 = 5 and N0 = 10 score better with regard to CPU and system memory usage, N0 = 20 scores better on decision time. So while N0 = 5 and N0 = 10 eventually discover more of the environment, N0 = 20 does so in significantly fewer steps. Because of this, N0 = 20 is chosen as the most suitable parameter setting, as it combines both a fast exploration and a high percentage of area discovered.


Figure 7: Area discovered per step by the TGHM algorithm using different N0 values.

N0                             5       10      20      30
Area Discovered (%)            99.5    99.28   91.92   41.56
Exploration Completed (step)   514     455     352     112
CPU Usage (%)                  56.46   57.29   59.72   63.25
System Memory Usage (%)        26.35   27.19   27.81   28.19
Time (ms)                      55.19   54.82   52.16   51.8

Table 8: Average per-step data for the TGHM algorithm using different N0 values.

The exploration data collected using different λ values is shown in Figure 9 and Table 10. As can be seen in the results, the four different parameter values all produce fairly similar measurements. But because λ = 0.2 discovers more area and scores better when it comes to CPU and system memory usage, it is chosen as the most appropriate parameter setting.


Figure 9: Area discovered per step by the TGHM algorithm using different λ values.

λ                              0.2     0.4     0.6     0.8
Area Discovered (%)            93.82   91.92   91.92   91.92
Exploration Completed (step)   357     352     352     352
CPU Usage (%)                  59.58   60.35   60.85   64.46
System Memory Usage (%)        30.76   31.68   32.28   32.95
Time (ms)                      53.17   51.72   52.14   50.57

Table 10: Average per-step data for the TGHM algorithm using different λ values.

4.1.3 Comparing random exploration vs. TGHM Exploration

The random exploration algorithm and the TGHM algorithm were then tested using their optimal parameters and compared to each other. For comparing the two exploration algorithms the same test environment, environment settings and measurements were used as before, only this time both algorithms ran 20 simulations instead of 10.

The results in Figure 11 and Table 12 show that the TGHM algorithm discovers more area almost twice as fast as the random exploration does. The TGHM algorithm does score worse on CPU usage, system memory usage and decision time, which is sensible as it requires more calculations to select its next goal.

Figure 11: Area discovered per step by random and TGHM exploration.

Exploration Algorithm          Random   TGHM
Area Discovered (%)            92       93.82
Exploration Completed (step)   750      357
CPU Usage (%)                  57.64    59.58
System Memory Usage (%)        29.84    30.76
Time (ms)                      51.38    53.17

Table 12: Average per-step data for random and TGHM exploration.


5 Conclusion

In conclusion, a team is created for the MAPC. First the general setup of the project is described, along with the different agent roles and the implementation of the BDI model in Python. Secondly, a description of the agents' internal graph and how it stores perceptions is provided. Thirdly, the random and TGHM exploration algorithms are explained, as well as how the TGHM algorithm is implemented in the MAPC. When testing both algorithms, the results showed that although both eventually reach nearly the same percentage of discovered area, the TGHM algorithm does so more than twice as fast (visual results of the different agents can be found on the project GitHub repository[14]). Lastly, it is explained how the strategist assigns the agents their roles and how it aids them in merging their graphs and determining the dimensions of the environment.

6 Discussion

The Python implementation of the BDI model is successful, as it gives agents the ability to set goals (represented by if-statements within the plans) and use those to select the correct action plan as their next intention. The beliefs are represented by the agents' graphs, which are held by the agents themselves. If two agents use the same graph, it means that they both point to the same graph object. In future work a separate agent (similar to the strategist or task manager) could maintain all the agents' graphs, such that the (registered) agents send their perceptions to this separate agent, which then updates the graph. A registered agent could then send a message to the 'graph agent' if it wants to access its graph. This does not necessarily change the program in a practical sense, but by storing the graphs separately instead of agents directly sharing them, the program will more closely resemble an actual multi-agent system.

The paper which proposed the topology-grid hybrid map-based exploration algorithm[5] compared it to a forward simulation-based autonomous exploration algorithm[6] and a frontier-based exploration approach[7]. This thesis researched whether it is possible to apply the TGHM algorithm to a 2D environment, as the original paper tested it in a 3D real-life environment. However, after the algorithm was successfully implemented, it was only tested against a random exploration algorithm. The TGHM paper also applies the exploration algorithms to only one mobile robot, and even though in the MAPC multiple agents can become scouts and perform the TGHM algorithm, this has not been tested. In future work the TGHM algorithm could be applied to multiple agents, along with the forward simulation-based algorithm and frontier-based exploration, to determine whether the TGHM algorithm still surpasses them.

References

[1] Y. Abdelrahman. Undermining the opponent: Extending BDI agents with disruptive behaviour for the Multi-Agent Programming Contest. Unpublished, 2020.

[2] D.P. Jensen. Teaching belief-desire-intent agents how to learn: An integration of machine learning and BDI agents. Unpublished, 2020.

[3] T. Stolp. Extending the BDI model with plan cost estimation for the Multi-Agent Programming Contest. Unpublished, 2020.

[4] L. Weytingh. Extending the BDI model with optimisation-based goal-selection in the Multi-Agent Programming Contest. Unpublished, 2020.

[5] Shuang Liu, Shenghao Li, Luchao Pang, Jiahao Hu, Haoyao Chen, and Xiancheng Zhang. Autonomous exploration and map construction of a mobile robot based on the TGHM algorithm. Sensors, 20(2):490, 2020.

[6] Mikko Lauri and Risto Ritala. Planning for robotic exploration based on forward simulation. Robotics and Autonomous Systems, 83:15–31, 2016.

[7] Brian Yamauchi. A frontier-based approach for autonomous exploration. In Proceedings 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA'97): 'Towards New Computational Principles for Robotics and Automation', pages 146–151. IEEE, 1997.

[8] The Multi-Agent Programming Contest. The 2020 contest: Agents Assemble II. URL https://multiagentcontest.org/2020/.

[9] The Multi-Agent Programming Contest. MASSim 2020. URL https://github.com/agentcontest/massim_2020.

[10] Anand S. Rao, Michael P. Georgeff, et al. BDI agents: From theory to practice. In ICMAS, volume 95, pages 312–319, 1995.

[11] Anand S. Rao. AgentSpeak(L): BDI agents speak out in a logical computable language. In European Workshop on Modelling Autonomous Agents in a Multi-Agent World, pages 42–55. Springer, 1996.


[12] Nicholas Metropolis and Stanislaw Ulam. The Monte Carlo method. Journal of the American Statistical Association, 44(247):335–341, 1949.

[13] PyPI. spade: Smart Python Agent Development Environment. URL https://pypi.org/project/spade/.

[14] Y. Abdelrahman, D.J. Bekaert, D.P. Jensen, T. Stolp, and L. Weytingh. The UvA's entry into the Multi-Agent Programming Contest. URL https://github.com/DanielPerezJensen/mapc-uva.

[15] Python. Python: Programming language. URL https://www.python.org/.

[16] Matplotlib. Matplotlib: Visualization with Python. URL https://matplotlib.org/.

[17] Steven Bell. An overview of optimal graph search algorithms for robot path planning in dynamic or uncertain environments. 2010.

[18] Eugene F. Krause. Taxicab Geometry: An Adventure in Non-Euclidean Geometry. Courier Corporation, 1986.

[19] SK Basu. Parallel and Distributed Computing: Architectures and Algorithms. PHI Learning Pvt. Ltd., 2016.
