Layout: typeset by the author using LaTeX.

Extending the BDI model with plan cost estimation for the Multi-Agent Programming Contest

Tim Stolp
1184878

Bachelor thesis
Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisors
M. Mohajeri Parizi & Dr. G. Sileno
Complex Cyber Infrastructure (CCI)
Faculty of Science
University of Amsterdam
Science Park 904
1098 XH Amsterdam


Abstract

The Belief-Desire-Intention (BDI) model is a commonly used model for creating intentional agents. One of the disadvantages of this model is its lack of machine learning capabilities. This thesis proposes a new multi-agent BDI framework written in Python to bridge the gap between non-symbolic machine learning and intentional BDI agents for the Multi-Agent Programming Contest. The Multi-Agent Programming Contest provides a testing environment for multi-agent systems and thereby aims to stimulate research in the development of multi-agent systems. As a proof of concept, machine learning is used to enhance the plan selection stage in the sequence of BDI reasoning steps. This is done by estimating the cost of a plan using multivariate polynomial regression to find the plan with the optimal cost-reward balance. The result is a multi-agent BDI framework compatible with Python's extensive machine learning libraries, and a machine learning application that can accurately estimate an agent's navigation cost. The estimated costs are used in a genetic algorithm to find the optimal division of tasks across multiple agents, with the sum of the resulting costs giving the total cost of a plan.


Contents

1 Introduction

2 Background
  2.1 The Multi-Agent Programming Contest
    2.1.1 The 2020 contest
  2.2 BDI

3 Literature review
  3.1 Learning in BDI
  3.2 BDI Tools
    3.2.1 AgentSpeak(L)
    3.2.2 Jason
    3.2.3 GOAL
    3.2.4 Akka Actors
    3.2.5 Python Spade
  3.3 Designing a new framework

4 Research Method
  4.1 Strategy
  4.2 Framework
    4.2.1 Server connector
    4.2.2 Basic agent and Navigation
    4.2.3 Concurrency
    4.2.4 World Graph
    4.2.5 BDI
  4.3 Learning in BDI

5 Results
  5.1 Framework
  5.2 Learning

6 Discussion
  6.1 Evaluation
    6.1.1 Framework
    6.1.2 Learning
  6.2 Future work
    6.2.1 Framework
    6.2.2 Learning

7 Conclusion


Chapter 1

Introduction

Multi-agent systems are a powerful tool that has been on the rise since the 1990s, and they have received attention from many different scientific disciplines. Be it for their scalability or their potentially complex behaviours, multi-agent systems have found their way into many applications in industry as well as the academic world (Leitão and Karnouskos 2015). One widely known model used in the creation of intentional agents is the Belief-Desire-Intention (BDI) model. This model, based on human reasoning, describes the reasoning process from perceiving the world to performing a certain behaviour. One of the downsides of this model is that it does not inherently support learning.

This thesis will go over how to integrate non-symbolic machine learning into a multi-agent system of intentional agents. This is done by creating a new framework that bridges the gap between learning algorithms and the BDI model. Within this framework it will be shown how plan cost estimation can be used for optimized plan selection within the BDI model. This framework will be created as part of the 2020 Multi-Agent Programming Contest. This contest provides the environment in which the agents can be tested so that their behaviour can be observed and analysed.

Chapter 2

Background

2.1 The Multi-Agent Programming Contest

The Multi-Agent Programming Contest was created as a means to research new and innovative multi-agent systems as well as to test existing ones. It aims to help identify key problems, collect suitable benchmarks, and create scenarios in which coordination between agents is required. To determine the performance of a system, it plays multiple games in which the system competes against other systems. While winning the competition is not the main goal, it does provide an incentive to create well-performing systems. With this the contest hopes to stimulate research in the area of multi-agent systems.

2.1.1 The 2020 contest

The 2020 contest scenario consists of two teams of agents competing against each other in a grid-based world. This world loops, which means that if an agent goes off the map on one side it will appear on the opposite side; this works both horizontally and vertically. The goal of the scenario is to fulfil certain tasks. A task is collected from a task board located somewhere in the world. These tasks consist of creating complex structures of multiple types of blocks and handing them in at goal locations. The structures range from a single block to multiple blocks attached to each other. A bigger and more complex structure rewards more points than a smaller one. Throughout the game random tasks are generated and each task has its own deadline; the older the task, the less rewarding it will be.

To construct these tasks the agents have to explore the world and navigate through obstacles to find the necessary blocks. These blocks can be obtained by requesting them at different types of block dispensers. Once a block has been obtained, the agents have to find each other and connect the blocks to each other. An agent cannot connect multiple blocks to each other by itself. This enforces cooperation between agents when constructing the more complex structures.

The agents in this scenario do not know their absolute position in the world. They only know their current energy level and whether they are disabled or not. Energy is used for clear actions; if the energy reaches zero, the agent will be disabled and unable to perform any action for a preset number of steps.

The game runs in steps. For each step the agents can submit an action that they want to perform. The actions that an agent can perform are as follows.

Skip

The agent will do nothing. This is to make sure the game server can continue to the next step once all agents have submitted their action.

Move

The agent will move in a certain direction. This can be north, east, south, or west.

Attach

The agent attaches to a block or allied agent. This action requires a direction (north, east, south, or west) to be specified. The agent has to stand next to the object it wants to attach to.

Detach

The agent detaches from a block or allied agent. This action requires a direction (north, east, south, or west) to be specified.

Connect

This action allows two agents to connect their blocks to each other. For this action, both agents and the locations of the blocks they want to connect have to be specified. The blocks have to be next to each other for the connection to succeed.

Disconnect

The agent disconnects two of its attachments. The locations of both attachments have to be specified.

Request

The agent requests a new block from a dispenser. The agent has to be adjacent to the dispenser and has to specify in which direction (north, east, south, or west) the dispenser lies.

(10)

Submit

The agent submits a block construction to fulfil a task. The agent has to have the block construction attached to itself and has to be located on a goal location.

Clear

This action allows an agent to clear a cell and the four adjacent cells. The effect of a clear depends on what is affected by it: an obstacle gets removed and becomes normal terrain, a block gets destroyed, and an agent becomes disabled. To perform a clear action the agent has to submit and successfully perform the clear action a number of consecutive times before the area gets cleared. After the first time, a marker appears on the map that can also be perceived by enemy agents. This marker indicates that a clear is going to happen and allows agents to move out of the way. The number of times the clear action has to be performed depends on the game's settings. The agent also has to have enough energy to perform a clear action, with the required amount of energy likewise depending on the game's settings. It is also possible for an agent to initiate a clear and not finish it; this does not cost energy.

Accept

The agent accepts a task at a task board. Accepting a task will overwrite the previously accepted task in case the agent has one. To successfully accept a task the agent has to be within two cells of the task board and is required to specify the task name.

For every step there is a certain number of agents that will randomly fail their action; this number depends on the game's settings. Furthermore, for each step there is a chance for a clear event to happen: an event where a clear happens at a random location. The size of this clear is also set in the game's settings. The difference between the agents' clear actions and the random clear event is that the event also leaves behind some new obstacles around the cleared area. This causes a substantial amount of change in the environment that the agents have to traverse. Further details are available in the Multi-Agent Programming Contest documentation (Fiekas, Ahlbrecht, and Krausburg 2020).

2.2 BDI

The BDI model is a model that tries to describe human reasoning. The beliefs describe the information present about the current state of the world. The desires represent objectives that a person would like to accomplish. Once a desire has been chosen it becomes a goal: a desire that is actively pursued. While multiple desires can counteract each other, once a goal is chosen it is impossible to choose another goal that would interfere with the current goal. An intention is something that has been chosen to be done. Intentions come in the form of plans. A plan is a sequence of actions; these actions can themselves contain other plans, making complex behaviour possible. The essence of this model is that it allows for a separation between desires and intentions, and what distinguishes the two is commitment. This separation makes it easier to plan ahead given the current intentions, by eliminating the desires that cannot be fulfilled under a certain intention (Bratman et al. 1987).

This model provides the theoretical framework to create autonomous agents. These autonomous agents, also known as intelligent agents, have some form of independence when it comes to task execution. This allows them to perform higher-level tasks without needing supervision while also being more robust, adapting to new plans when the current plan turns out to be impossible. The agent is capable of using independent reasoning to come up with ways to fulfil the desire to finish the task, which makes it more likely that the desired result is obtained.

Chapter 3

Literature review

3.1 Learning in BDI

In previous research there have been attempts at solving some of the weaknesses of BDI. In particular, BDI agents are not inherently able to learn from past actions. A common solution to the lack of learning capability is to extend the BDI model with decision trees to learn the probability of a plan's success, given data on the success rate of past executions of said plans. Another way these past experiences were taken into account is by improving the context conditions of the plans in the plan library, to figure out what conditions are necessary for a plan to succeed (Airiau et al. 2008).

3.2 BDI Tools

Before creating a multi-agent system for participating in the contest, multiple tools and frameworks were explored. This section gives a brief overview of the tools that were considered and elaborates on the final choices that were made.

3.2.1 AgentSpeak(L)

AgentSpeak(L) is a first-order-logic-based programming language based on the BDI model. The current state of an agent and its knowledge about the environment it is embedded in are the beliefs. The states that the agent wants to move to based on its beliefs are the desires; these desires are seen as the mapping from beliefs and events to goal adoption. The ways the agent chooses to bring these states about are then seen as the intentions. Because AgentSpeak(L) is a logic-based language with a theoretical foundation in a theory of mind, the behaviour of an agent is explainable. This is useful in various types of research, especially when it comes to modelling human behaviour (Rao 1996).

3.2.2 Jason

Jason is an AgentSpeak(L) interpreter written in Java. This combines the advantage of the strong theoretical foundation of AgentSpeak(L) with the robustness and cross-platform capabilities of Java. Because of this, Jason is a great choice for creating multi-agent systems across networks of different systems. Jason also provides tools for inter-agent communication. The customizability of the agents' architecture and the presence of various implemented internal actions provide a strong set of tools for straightforward extensibility by the user (Bordini and Hübner 2005).

3.2.3 GOAL

GOAL is an agent programming language used for programming cognitive agents. Similarly to the BDI framework, agents created with GOAL derive their actions from their beliefs and goals. The language provides the building blocks to create a cognitive agent whose beliefs and goals can be manipulated and whose decision-making can be structured (Hindriks 2009).

3.2.4 Akka Actors

The Akka actor model provides a high-level abstraction for writing concurrent and distributed systems, and is supported for both Scala and Java. The model relieves the developer of complicated concurrency mechanisms such as locking and thread management, which makes it much easier to write concurrent systems. It does this by restricting the way actors interact with each other: each actor runs in its own thread, and actors communicate by sending messages to each other's addresses. This type of communication removes the need for lock-based communication between threads, which is often a pitfall for concurrent systems, causing undefined behaviour that is difficult to debug (Gupta 2012).

3.2.5 Python Spade

Spade is a Python module for creating multi-agent systems based on XMPP instant messaging, which makes it useful when running multiple agents across different systems. It is based on asynchronous code. The agents are built from different types of behaviours, such as cyclic behaviour that keeps running until stopped, or finite state machine behaviour where the agent moves from one state to another. Spade also has a BDI plugin. This plugin allows for parsing AgentSpeak(L) and running it as an agent in the Spade multi-agent system framework. It is also possible to call Python functions from the AgentSpeak(L) code, giving access to a much broader tool set. A BDI agent in the Spade framework uses the cyclic behaviour to run the AgentSpeak(L) code. Furthermore, it is possible to manipulate the agent's beliefs through Python functions within the Spade BDI plugin. Altogether it creates a BDI-based multi-agent system framework with easy message passing between agents (Palanca 2020).

3.3 Designing a new framework

The first tool that was looked into was Jason, which has been widely used by a majority of the contestants in previous years of the Multi-Agent Programming Contest. The contest has also provided a basic agent framework written in Java that is capable of communicating with the contest's game server. Jason implements an extended version of AgentSpeak(L) (Bordini and Hübner 2005). Since AgentSpeak(L) is closely tied to the BDI model, it became clear that it would play a major role in this project.

GOAL and Akka Actors were found when looking for tools to create multi-agent and BDI systems. GOAL was a good option for programming BDI agents, but it was lacking when it came to scaling up to a multi-agent system. Akka Actors did provide the multi-agent capabilities but was less closely related to BDI. Compared to these, Jason was still the more appropriate choice.

For the machine learning part of the research, Python 3 was the preferred language. Python has broad support for machine learning tool sets and was preferred due to the project members' familiarity and expertise with the language. Keeping in mind the importance of AgentSpeak(L) for its relatedness to BDI, a Python module named Spade was found that allowed for the creation of multi-agent systems while also having a plugin that combines this framework with AgentSpeak(L) by parsing and combining both AgentSpeak(L) and Python code. Because of the combined advantages for both machine learning and BDI compatibility, Spade was chosen for the creation of the multi-agent system. Unfortunately, while trying to work with Spade it became clear that its documentation was lacking and that the BDI plugin, besides being buggy due to still being in development, did not provide all the functionality necessary to adhere to the BDI model. This made it difficult to work with. Furthermore, it eventually turned out that, because the Spade code is based on Python's asynchronous library, it was impossible to have multiple agents connected to the game server: awaiting a message from the server by one agent would block the other agents from performing their tasks. Making the message receipt non-blocking caused the Spade framework to still prioritise the execution of a single agent rather than divide the computing time across all agents. This caused a lot of undefined behaviour that was impossible to work with.

After being unable to continue with Spade, it was decided that a new framework would be created in Python. This framework would be based on some of the functionality of AgentSpeak(L), adhering to the BDI model to a certain extent, which would make the agents' behaviour explainable, while the choice of language would allow for easy incorporation of machine learning. Essentially, the creation of this new BDI multi-agent framework aimed to spark interest in research combining the power of machine learning and the versatility of multi-agent systems.


Chapter 4

Research Method

This project and the decisions made during it are a collaborative effort among five students. For each part the students responsible will be credited and their theses will be referenced for further detail.

4.1 Strategy

The collaboration between agents in multi-agent systems is difficult to coordinate, because the actions or performance of one agent can affect the performance of the others. As a result, many decisions need to be kept track of: for each agent it is important to know what action to take, how to perform it, and when to do it. This coordination can happen in multiple ways. For this project a hierarchical decision-making approach is used. In practice this means that at the top there is a strategist agent that handles the highest-level decision making, such as dividing the number of agents across different tasks. The main tasks of the agents, identified at the beginning of the project as relevant to the contest, are scouting, attacking, and building. This thesis focuses especially on the builders. Further information on the scout is found in the thesis of Bekaert 2020, and information on the attacker is found in the theses of Jensen 2020 and Abdelrahman 2020.

Because of the importance of coordination between multiple builders, since most construction tasks cannot be completed by a single agent, a builder manager was created. This manager listens to the strategist, from which it hears how many agents are given to a certain role. Afterwards the manager chooses which construct to build. This is done using a genetic algorithm. This algorithm takes in estimations of the cost of the sub-plans and actions for each agent, which are then divided as efficiently as possible across the available agents. This results in a ranking of costs per construct that can be built. This cost is combined with the rewards the constructs give, and from this the most cost-reward-efficient construct is chosen to be built. Afterwards the divided sub-plans and actions are sent to the builder agents. These agents adopt them as their goals and reason themselves about how to fulfil these goals most efficiently. The strategist and builder manager are not present within the contest's game, but rather operate in the background. Due to the unpredictable nature of the game's environment, some plans do not have a clearly defined execution time; this is most prevalent in navigation. To handle this, machine learning is used to estimate the cost of the plans, after which the genetic algorithm can use these estimated costs for efficient task division across the agents.

4.2 Framework

4.2.1 Server connector

The first thing that was done was to set up the Multi-Agent Systems Simulation Platform (MASSim). This is the simulation software on which the contest's game is played and what the agents connect with to perform their actions. To connect to the server, a Python class was created to establish and maintain this connection. This class parses incoming messages and provides functions to put outgoing messages in the right format. The messages are in the JSON format. This module was developed as part of the theses of Weytingh 2020 and Jensen 2020.
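A minimal sketch of such a connector is shown below. The null-byte message framing and the auth-request fields are assumptions based on the general MASSim protocol layout and should be checked against the protocol documentation; the class is illustrative rather than the thesis's actual implementation.

```python
# Sketch of a JSON-over-TCP connector, assuming null-byte-terminated messages.
import json
import socket

class ServerConnector:
    def __init__(self, host: str = "localhost", port: int = 12300):
        self.sock = socket.create_connection((host, port))
        self.buffer = b""

    def send(self, message: dict) -> None:
        # Encode an outgoing message as JSON and terminate it with a 0 byte.
        self.sock.sendall(json.dumps(message).encode() + b"\0")

    def receive(self) -> dict:
        # Read until a full null-terminated message has arrived, then parse it.
        while b"\0" not in self.buffer:
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("server closed the connection")
            self.buffer += chunk
        raw, _, self.buffer = self.buffer.partition(b"\0")
        return json.loads(raw.decode())

    def authenticate(self, name: str, password: str) -> dict:
        # The field names here are an assumption, not a verified listing.
        self.send({"type": "auth-request",
                   "content": {"user": name, "pw": password}})
        return self.receive()
```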

4.2.2 Basic agent and Navigation

On top of this class an agent class was made. This agent contains all the functionality to send the primitive actions to the server; the list of actions is found in the background section. In addition to the basic actions, the agent is capable of navigating to a specified coordinate. Navigation is treated as a primitive action. This abstraction of navigation to a primitive action makes it easier to use when constructing complex behaviours, and it is made possible by the solid theoretical background of navigation, where the existing algorithms are already efficient and effective. The navigation algorithm used is D* Lite, an algorithm based on the LPA* algorithm, which combines ideas of A* and Dynamic SWSF-FP (Ramalingam and Reps 1996). This algorithm is especially useful in dynamic environments because it assumes that unknown cells, which are very prevalent due to the limited perception of the agents, are regular empty cells. Additionally, the algorithm saves the calculated weights of each cell. Because of this, the calculation for each step is reduced to verifying the existing route and updating the cells that have changed, instead of having to recalculate everything when the environment has changed. This module was developed as part of the thesis of Weytingh 2020.

4.2.3 Concurrency

To be able to run multiple agents on the same system, the Python threading library was used (Palach 2014). This was done by having the server class inherit from the threading library's Thread class. To test this addition, multiple agents were connected to the game server. With this addition it was possible to maintain multiple connections to the server without one agent's blocking message receipt preventing the others from receiving their messages. This showed that it was possible to control multiple agents within the game on the same system.

To communicate between the threads, the agents make use of message queues. These queues are thread-safe and provide the necessary functionality for different types of message passing, such as blocking and non-blocking message receipt. These message queues are the only way to communicate between agents. This more accurately models the more common scenario in which each agent runs on a different system; the benefit is that this method of communication could be replaced with any method that allows inter-system communication as well. Having one agent run per thread gives a clear boundary between agents. This solution is reusable in other multi-agent systems and is easily scalable, either by upgrading the system's hardware or by having multiple systems run multiple threads. The practical implementation was developed as part of this thesis.
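A minimal sketch of this thread-per-agent design with queue-based messaging follows. The names (AgentThread, inbox, the registry dict) are illustrative assumptions; the thesis's actual classes may be structured differently.

```python
import queue
import threading
import time

class AgentThread(threading.Thread):
    def __init__(self, name: str, inboxes: dict):
        super().__init__(daemon=True)
        self.name = name                 # Thread.name is a settable attribute
        self.inboxes = inboxes           # shared registry: agent name -> queue
        self.inbox = queue.Queue()       # thread-safe mailbox for this agent
        inboxes[name] = self.inbox

    def send(self, recipient: str, message: dict) -> None:
        # Message queues are the only channel between agent threads.
        self.inboxes[recipient].put(message)

    def run(self) -> None:
        while True:
            try:
                # Blocking receipt with a timeout, so an idle agent can still
                # interleave its own reasoning between messages.
                message = self.inbox.get(timeout=0.1)
                print(f"{self.name} received {message}")
            except queue.Empty:
                pass

inboxes: dict = {}
agents = [AgentThread(f"agent{i}", inboxes) for i in range(3)]
for agent in agents:
    agent.start()
agents[0].send("agent1", {"performative": "inform", "content": "hello"})
time.sleep(0.5)  # give the daemon threads time to process the message
```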

4.2.4 World Graph

Since the agents in this scenario have limited perception, one of the tasks was creating a representation of the world and updating it whenever new information is perceived. This is done by representing each cell as a node object. This object contains all the information a certain cell has, including, but not limited to, the entities currently on the cell, what type of cell it is, and when the information about the cell was last updated. The latter is important in a dynamic environment because it gives a measure of how reliable a piece of information is: the older the information, the more likely it is to have changed since then. These nodes are stored in a dictionary where a tuple of coordinates acts as the key and the node as the value. This dictionary is seen as the graph. The navigation makes use of this graph to find the optimal paths.
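A minimal sketch of this node-and-dictionary representation, with assumed field names, could look as follows.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    terrain: str = "unknown"                       # e.g. "empty", "obstacle", "goal"
    entities: list = field(default_factory=list)   # agents/blocks seen on the cell
    last_updated: int = -1                         # game step of the last observation

# The graph: coordinate tuples act as keys, node objects as values.
graph: dict[tuple[int, int], Node] = {}

def update_cell(x: int, y: int, terrain: str, step: int) -> None:
    # Create the node lazily and stamp it with the current step, so the
    # staleness of the information can be judged later.
    node = graph.setdefault((x, y), Node())
    node.terrain = terrain
    node.last_updated = step
```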

At the start of a game each agent remembers its initial position as the origin of its graph. Because of the limited perception of the agents, the locations of the allied agents are not known yet, so each agent builds up its own graph. This continues until two agents appear in each other's perception range. When this happens it is possible to combine the two graphs of the agents to construct a more comprehensive representation of the world. To combine the graphs, the agents send their graphs to the strategist agent, which combines them by taking one agent's origin and converting the coordinates of the other agent's graph to be relative to the first agent's origin. Afterwards the strategist sends the combined graph back to the agents. This module was developed as part of the thesis of Bekaert 2020.
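Reusing the Node dictionary sketched above, the strategist's merge step could be sketched as follows. The offset derivation and the conflict rule (preferring the fresher observation) are illustrative assumptions.

```python
def merge_graphs(graph_a: dict, graph_b: dict,
                 offset: tuple[int, int]) -> dict:
    """Return a combined graph relative to graph_a's origin.

    `offset` is the position of graph_b's origin expressed in graph_a's
    coordinates, derived from where the two agents observed each other.
    """
    merged = dict(graph_a)
    ox, oy = offset
    for (x, y), node in graph_b.items():
        key = (x + ox, y + oy)
        # For overlapping cells, keep the most recently updated observation.
        if key not in merged or node.last_updated > merged[key].last_updated:
            merged[key] = node
    return merged
```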

4.2.5 BDI

To integrate the BDI model into the framework, it was important to model the correct sequence of reasoning steps. The beliefs are the knowledge that an agent has about itself, the other agents, and the environment. The self-knowledge of an agent is stored as variables, and the knowledge about the environment and the other agents is stored in the graph. The graph stores the agent's beliefs about the world and also keeps track of how reliable the information is, as explained in the graph section. Desires are states that the agent wants to see become true in the environment. In accordance with the approach taken by AgentSpeak(L), the desires only materialize in the form of instantiated goals, which are reactions to events in the environment or changes in the agent's beliefs. The recipe, or exact steps, of the reaction to an instantiated goal is provided to the agent in the form of plans. A plan comes down to any high-level function describing what sub-plans or primitive actions to take, or any behaviour that would allow the agent to contribute positively towards winning the game.

Intention Selection

There are two ways that an agent can decide on which plan to perform: it can either listen to an upper manager to receive its task, or it can decide via internal reasoning. Both of these are based on the agent's constant maintenance desire. This desire implies that the agent always wants to have something to do. In practice this means that when an agent's intention queue is empty, it will try to get a new intention.

In the case of a hierarchical multi-agent system, one way to get a new intention is from the agent's manager. Listening to a manager is done when the agent has to cooperate with other agents; in this situation centralised task division is more efficient because it requires less communication. This also helps to lower the computational power requirements of the agents in the field. Instead of using their power and time to perform managing tasks, they can focus on processing the perception data, spend more time optimizing their behaviour to fulfil goals, and perhaps even think ahead. E.g., builder agents could spend their time precalculating certain paths to important destinations.

The internal reasoning can be done through regular if-statements that check whether certain prerequisites hold to perform a plan. This can also be enhanced with machine learning, so that complex behaviours can be composed by the agent itself depending on the environment, rather than needing to be implemented beforehand. Once a plan has been chosen it becomes the current intention of the agent. A minimal sketch of this selection loop is shown below.
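The names in this sketch (step, deliberate, request_task) are hypothetical; it only illustrates how the maintenance desire drives intention selection.

```python
from collections import deque

class BDIAgent:
    def __init__(self, manager=None):
        self.intentions = deque()  # the intention queue
        self.manager = manager     # upper manager, if cooperating

    def deliberate(self):
        # Internal reasoning: precondition checks over the beliefs; this is
        # also where learned plan selection (Section 4.3) could be plugged in.
        return None

    def step(self):
        # Constant maintenance desire: an empty intention queue triggers the
        # search for a new intention, from the manager or by deliberation.
        if not self.intentions:
            new_intention = (self.manager.request_task(self)
                             if self.manager else self.deliberate())
            if new_intention is not None:
                self.intentions.append(new_intention)
        # ...execution of the front intention would follow here
```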

Plan Refinement

The intention or intentions are then added to the agent's intention queue. Each intention consists of multiple pieces of data: the plan, the information required to execute the plan, the context, a description, and whether the plan is primitive or not. The context consists of the prerequisites of the intention. This allows the agent to check whether the current intention is still possible during the execution of the plan. By doing this it can discard impossible intentions, which saves time. E.g., if the intention is to retrieve a block, but for some reason the block has moved, the intention should be removed so that the agent can try to find a different way to fulfil its current desire, or, in case this is impossible, all the intentions linked to the impossible intention should be removed, since they rely on its success. Afterwards the agent can move on to get a new intention. The description is useful for explaining the agent's behaviour: these descriptions can be logged so that the behaviour becomes easy for a human to understand and analyse.

Lastly, the primitive flag indicates to the agent whether an intention needs to be further unpacked into lower-level intentions or primitive actions that can be directly executed. During the unpacking of an intention, the sub-intentions or primitive actions inherit the description of the parent intention. This makes it easy to understand the bigger picture behind smaller behaviours. The unpacked intentions are added to the front of the intention queue so that they are first to be executed.
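A minimal sketch of such an intention record and the unpacking step follows; the field names are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

@dataclass
class Intention:
    plan: Callable                      # function implementing the (sub)plan
    args: tuple = ()                    # information required to execute it
    context: Callable = lambda: True    # prerequisites, re-checked during execution
    description: str = ""               # human-readable label for logging
    primitive: bool = True              # primitive actions execute directly

def execute_next(intentions: deque) -> None:
    intention = intentions.popleft()
    if not intention.context():
        return  # drop an impossible intention (and, ideally, its siblings)
    if intention.primitive:
        intention.plan(*intention.args)
    else:
        # Non-primitive: the plan yields sub-intentions, which inherit the
        # parent's description and go to the FRONT of the queue.
        for sub in reversed(intention.plan(*intention.args)):
            sub.description = intention.description
            intentions.appendleft(sub)
```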

This way of unpacking higher-level intentions was added because it allows sub-intentions to be reused in multiple high-level plans. E.g., retrieving a block consists of multiple sub-intentions: move to a block, grab the block, and move it to the goal location. One of the main benefits of being able to combine smaller intentions into complex behaviours is that this could be done through learning algorithms such as reinforcement learning. These algorithms could combine the sub-intentions into new ones to create new behaviour even while the agent is playing the game. This kind of real-time learning is especially strong in dynamic environments where it is impossible for the developer to predict what will happen, and where the developer is thus unable to create a plan of actions to deal with the situation beforehand. The two core modules of the BDI framework, namely intention selection and refinement, were developed as part of this thesis in collaboration with the thesis of Jensen 2020.

4.3 Learning in BDI

One of the BDI model's weaknesses is that it does not inherently support learning. One stage in the BDI reasoning process that can be enhanced with machine learning is plan selection. Optimal plan selection can be done with the use of optimization algorithms, which require data about the cost of the plans. This is where machine learning can be used: the cost of a plan is not always clear, especially in a dynamic environment, where an action can have a varying cost depending on the world and the events in it.

For the estimation of the cost, a polynomial regression can be used. In the contest scenario the most evident action whose cost needs to be estimated is navigation, because during navigation the agents are susceptible to the effects of the environment, such as clear events and changes in obstacles. To estimate the cost of these navigations, data was gathered about the number of steps it took an agent to perform a navigation. This data consists of the x and y coordinates of the destination relative to the agent and the number of steps it took to reach those coordinates. The goal is to be able to estimate the cost of a navigation given an x and y coordinate.

The data was gathered by having ten agents perform random navigations. The destination coordinates of the navigations were randomized between -30 and 30 for both x and y with respect to the agent. The agents kept track of how many steps they had taken. Once the destination was reached, the coordinates, number of steps, and navigation path were logged. These navigations were performed in the contest scenario with a world size of 100x100 for 1000 steps at a time. The obstacles in the world were randomly generated with a seeded random number generator. The seed for this generator was kept the same throughout all data gathering, resulting in the same world layout for every run. As mentioned in the scenario description, for each step in the game a predefined number of agents randomly fail their action. During the data gathering this number was set to one, meaning that each agent had a probability of 0.1 (one divided by the number of agents) of failing its action.

Additionally, for each step there is a probability that a clear event will happen at a random location. An agent hit by a clear event is disabled for four steps. The probability that an agent will be hit by a clear event increases as the navigation distance or the clear event probability increases. Because of the impact this can have on the cost of a navigation, a data set was gathered for eleven different clear event probabilities, ranging from zero to one hundred percent in increments of ten percent per run. This resulted in eleven data sets of around 250 to 300 recorded successful navigations. Enemy interference is not taken into account in this data, as doing so would require predicting the enemy's behaviour; more on this can be found in the thesis of Jensen 2020.

To perform a polynomial regression on this data, the polynomial features are first obtained from the input data. E.g., for a second degree polynomial regression the data is transformed from $x$ and $y$ to $1, x, y, x^2, xy, y^2$. Afterwards the data is split into a training and test set, with the test set consisting of thirty percent of the data. Then a regression is done on the data, giving the estimation model. The model is tested on the test data, giving a score. The score is defined by the coefficient of determination.

$$SS_{\mathrm{res}} = \sum_i (y_i - f_i)^2$$

where $y_i$ is the observed data value and $f_i$ is the estimated value. This is called the residual sum of squares.

$$SS_{\mathrm{tot}} = \sum_i (y_i - \bar{y})^2$$

where $y_i$ is the observed data value and $\bar{y}$ is the mean of the observed data. This is called the total sum of squares.

$$R^2 \equiv 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}$$

where $R^2$ is the coefficient of determination. In the case that the model always estimates the mean $\bar{y}$ (a baseline model), $R^2$ will be 0. If the model estimates worse than a baseline model, $R^2$ will be negative.
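The pipeline described above can be sketched with scikit-learn as follows. The synthetic placeholder data merely stands in for the logged navigation records, whose exact format is not shown in the thesis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.integers(-30, 31, size=(300, 2)).astype(float)  # relative (x, y) destinations
# Placeholder step counts: a Manhattan-distance-like cost plus noise.
y = np.abs(X).sum(axis=1) + rng.normal(0.0, 2.0, size=300)

# Thirty percent of the data is held out as the test set, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# PolynomialFeatures expands (x, y) into 1, x, y, x^2, xy, y^2, ... and
# LinearRegression then fits the coefficients.
model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
model.fit(X_train, y_train)

# .score() returns the coefficient of determination R^2 defined above.
print(model.score(X_test, y_test))
```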

In multi-agent systems, plans consisting of multiple actions can be divided across multiple agents, which affects the number of steps it takes to finish a plan. This is where the genetic algorithm is used. For each action in the plan, the cost is estimated per agent. This data can then be used in the genetic algorithm to divide the actions across the agents optimally. Once the best configuration is found, the number of steps can be summed to find the total estimated cost of a plan. Lastly, the rewards of the plans are taken into account by dividing the number of points awarded by the number of steps the plan takes, giving the point density of a plan. The plan with the highest point density is then chosen to be executed. More in-depth information about the optimization algorithm can be found in the thesis by Weytingh 2020.
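As an illustration, a minimal genetic algorithm for this task division could look like the sketch below. The cost matrix, the sum-of-steps fitness, and the truncation-selection and one-point-crossover operators are assumptions made for illustration; the actual algorithm, its operators, and any constraints are described in Weytingh 2020.

```python
import random

def fitness(assignment, costs):
    # Lower is better: the summed estimated steps of the whole plan, where
    # costs[i][j] is the estimated step count for agent j to do action i.
    return sum(costs[i][agent] for i, agent in enumerate(assignment))

def evolve(costs, n_agents, pop_size=50, generations=200, mutation=0.1):
    n_actions = len(costs)  # needs at least two actions for crossover
    population = [[random.randrange(n_agents) for _ in range(n_actions)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda a: fitness(a, costs))
        survivors = population[:pop_size // 2]        # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, n_actions)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n_actions):                # random mutation
                if random.random() < mutation:
                    child[i] = random.randrange(n_agents)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda a: fitness(a, costs))

costs = [[4, 9, 7], [8, 3, 6], [5, 5, 2], [7, 8, 4]]  # 4 actions, 3 agents
best = evolve(costs, n_agents=3)
print(best, fitness(best, costs))
```

A construct's point density could then be obtained by dividing its reward by the fitness of the best assignment found.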


Chapter 5

Results

5.1 Framework

To integrate machine learning into a BDI framework, a new framework has been created in Python. This framework is capable of handling single and multiple agents. These agents choose goals through a constant maintenance desire that causes an agent to get new intentions. The intentions can either be obtained through internal reasoning or by requesting them from another agent (usually some kind of upper manager). These intentions can then be executed, or, if an intention is high-level, it can be expanded into multiple sub-intentions or actions which are then executed. The sub-intentions hold the information required for checking whether they are still possible, and they hold information about the current goal they are trying to fulfil. This framework is capable of having multiple agents execute complex behaviour within the game of the Multi-Agent Programming Contest.


5.2 Learning

The regression was performed on the eleven data sets for polynomial degrees up to five, resulting in the following scores.

Deg. \ Event prob.    0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1      mean
1st                   0.01  -0.00  -0.04  -0.17   0.00  -0.00  -0.04  -0.01  -0.02  -0.06  -0.08  -0.03
2nd                   0.89   0.94   0.91   0.86   0.88   0.89   0.87   0.88   0.83   0.90   0.89   0.89
3rd                   0.89   0.94   0.91   0.86   0.88   0.88   0.87   0.88   0.81   0.89   0.86   0.88
4th                   0.94   0.96   0.93   0.90   0.93   0.93   0.93   0.91   0.88   0.93   0.92   0.93
5th                   0.94   0.96   0.90   0.89   0.92   0.93   0.93   0.91   0.86   0.92   0.90   0.91

Table 5.1: Coefficients of determination per clear event probability for various degrees of polynomial regression.


Chapter 6

Discussion

6.1 Evaluation

6.1.1 Framework

The agents' reasoning is clear and the agents are capable of handling complex higher-level behaviours. It is easy to add more types of behaviour by combining lower-level plans. The behaviour of the agents is easily logged, which makes debugging faster and more effective. The plan selection, and possibly other parts, can be enhanced with machine learning. This is because the design of the framework is based on exposing different components of a BDI agent to the underlying language for any needed modification. The framework is written in Python and is thus compatible with the strong machine learning libraries available in Python, opening the way to more research into BDI and learning.

Although the framework is capable of executing complex behaviour in the game, the plans required to construct shapes larger than one block within the game of the contest have yet to be implemented. Furthermore, the framework in its current state cannot be used across multiple computer systems, because the agents run one per thread with inter-thread communication.

6.1.2 Learning

The results of the polynomial regression of multiple degrees across all data sets indicate that a fourth degree polynomial gives the most accurate estimation of the cost of a navigation. The average score of 0.93 indicates that the estimated costs deviate little from the actual costs. The fact that the first degree polynomial, also known as linear regression, has such low scores clearly indicates that the data is not linear.

In figures 5.1a and 5.1b the regression models for the third and fourth degree can be seen. On the bottom side of the third degree polynomial it is visible that this model does not seem to accurately estimate the cost of short-distance navigations. On the other hand, the fourth degree polynomial seems to flatten at long distances. Intuitively, the cost of longer distances should grow at an increasing rate, since a longer navigation has a higher chance of encountering events that hinder it. Because of this, the flattening of the fourth degree polynomial could be a negative trait.

The high accuracy of the estimations suggests that navigation cost estimation is possible and can be used in the optimization algorithm for improved plan selection and task division. This indicates that task cost prediction is a good way to improve the performance of multi-agent systems in which the costs of the tasks to be performed are not clear.

One of the limitations of these results is that they do not show how the estimation scores would change in an environment with more or fewer blocking obstacles, although the obstacles did change in proportion to the clear event chance. More often than not, the agents had little trouble finding a path between the obstacles. A bigger agent, for example, would have a harder time navigating through these sporadically placed obstacles, requiring it to take longer routes, which adds more unpredictability.

6.2 Future work

6.2.1 Framework

To improve the handling of intentions, the context of an intention could be used to check whether the intention is still possible during execution. If it is deemed impossible, all the intentions that are linked together could be removed, which allows the agent to continue with other tasks. Although the data structures to perform this check are already implemented in the framework, the machinery to do so is delegated to future work.

The current implementation of communication between agents is done by inter-thread communication. To allow the framework to be used with agents running on multiple systems, a different message passing protocol could be implemented. However, because of the similarities between inter-thread communication and inter-system communication, this should be a rather simple extension.


6.2.2 Learning

To further investigate the anomaly seen in the fourth degree polynomial, data could be gathered for longer distances. This would give more insight into longer distances, and it would become clear whether the fourth degree polynomial holds up as the best model.

One way to take the obstacles in the world into account is to add the map layout to the training data. By combining this with a neural network rather than polynomial regression, the network could learn what effects certain obstacle layouts have on the navigation cost and thereby give a more accurate estimation. To decrease the computation time, the amount of data about the world that is used could be limited by focusing only on the direction the agent will be moving in.


Chapter 7

Conclusion

This thesis has shown that it is possible to integrate machine learning into a multi-agent system of intentional agents. This was achieved by using an optimization algorithm for improved plan selection. This algorithm made use of cost estimations of the plans obtained through polynomial regression. This opens the way for enhanced multi-agent systems that combine the advantages of learning algorithms and explainable intentional agents. The framework currently allows agents to perform actions in the game of the Multi-Agent Programming Contest, but to be able to fully play the game, more plans need to be added from which the agents can choose to achieve their goals.


Bibliography

[1] Y. Abdelrahman. “Undermining the opponent: extending BDI agents with disruptive behaviour for the Multi-Agent Programming Contest”. Unpublished. 2020.

[2] Stéphane Airiau et al. “Incorporating learning in BDI agents”. In: Workshop AAMAS: adaptive and learning agents and MAS (ALAMAS+ ALAg). ACM Estoril. 2008, pp. 49–56.

[3] D.J. Bekaert. “Using Topology-Grid Hybrid Map-based Exploration in an Unknown Environment”. Unpublished. 2020.

[4] Rafael H Bordini and Jomi F Hübner. “BDI agent programming in AgentSpeak using Jason”. In: International workshop on computational logic in multi-agent systems. Springer. 2005, pp. 143–164.

[5] Michael Bratman et al. Intention, plans, and practical reason. Vol. 10. Harvard University Press, Cambridge, MA, 1987.

[6] Niklas Fiekas, Tobias Ahlbrecht, and Tabajara Krausburg. GitHub: agentcontest/MASSim 2020, Agents Assemble II. 2020.

[7] Munish Gupta. Akka essentials. Packt Publishing Ltd, 2012.

[8] Koen V Hindriks. “Programming rational agents in GOAL”. In: Multi-agent programming. Springer, 2009, pp. 119–157.

[9] D.P. Jensen. “Teaching belief-desire-intent agents how to learn: an integration of machine learning and BDI agents”. Unpublished. 2020.

[10] Paulo Leitão and Stamatis Karnouskos. Industrial agents: emerging applications of software agents in industry. Morgan Kaufmann, 2015.

[11] Jan Palach. Parallel programming with Python. Packt Publishing Ltd, 2014.

[12] Javi Palanca. GitHub: Javipalanca/spade, Smart Python Agent Development Environment. 2020.

[13] Ganesan Ramalingam and Thomas Reps. “An incremental algorithm for a generalization of the shortest-path problem”. In: Journal of Algorithms 21.2 (1996), pp. 267–305.

[14] Anand S Rao. “AgentSpeak(L): BDI agents speak out in a logical computable language”. In: European workshop on modelling autonomous agents in a multi-agent world. Springer. 1996, pp. 42–55.

[15] L. Weytingh. “Extending the BDI model with optimisation-based goal-selection in the Multi-Agent Programming Contest”. Unpublished. 2020.
