
Extending the BDI model with optimisation-based goal-selection in the Multi-Agent Programming Contest

Luc Weytingh


Layout: typeset by the author using LaTeX.


Extending the BDI model with optimisation-based goal-selection in the Multi-Agent Programming Contest

Luc Weytingh
11672323

Bachelor thesis
Credits: 18 EC

Bachelor Kunstmatige Intelligentie
University of Amsterdam
Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisors
Mohajeri Parizi & Dr. Giovanni Sileno
Complex Cyber Infrastructure (CCI)
Faculty of Science, University of Amsterdam
Science Park 907, 1098 XG Amsterdam


Abstract

In light of the complementary advantages of symbolic and sub-symbolic approaches to multi-agent systems (MAS), this research project investigates the possibility of extending a symbolic approach to MAS with sub-symbolic components. The methods are investigated in an annual event devoted to stimulating research and development on MAS: the Multi-Agent Programming Contest (MAPC). With a genetic algorithm (GA) used for optimisation, the results show an accuracy of 0.9 on goal-selection in belief-desire-intention (BDI) agents. Given the dynamics of the MAPC environment, such results would be difficult or impossible to achieve with a purely symbolic approach. Ultimately, the results are placed in a broader context, shedding light on the requirements for extending a BDI implementation of MAS with sub-symbolic components.


Contents

1 Introduction
  1.1 Background
    1.1.1 The environment
    1.1.2 The agent
    1.1.3 Tasks
    1.1.4 Agent roles
  1.2 Distribution of the work
  1.3 Outline of this research project
2 Literature Review
  2.1 The belief-desire-intention (BDI) model
  2.2 Planning
  2.3 Learning
  2.4 The shortest-path problem
    2.4.1 Traveling salesman problem
  2.5 Frameworks
3 Method
  3.1 The Server
  3.2 The Agent
  3.3 The BDI Agent
    3.3.1 The beliefs
    3.3.2 The plans
    3.3.3 The intentions
  3.4 The Builder
    3.4.1 Individual builder
    3.4.2 Collaborative builder
  3.5 The SuperAgent
  3.6 The Strategist
  3.7 The Task Manager
    3.7.1 Evaluating individual tasks
    3.7.2 Ranking tasks
    3.7.3 Time complexity
  3.8 Evaluation
    3.8.1 Evaluating the genetic algorithm
4 Results


1 Introduction

In the most general definition, multi-agent systems (MAS) are systems consisting of multiple agents that interact in an environment in order to reach individual or shared goals. Integrating a variety of sub-fields of Artificial Intelligence (reasoning, knowledge representation, machine learning, planning, coordination, communication, and so on), work on MAS explores and provides solutions to current real-world problems.

Approaches to MAS can be divided into symbolic AI and sub-symbolic AI. Contributions taking a symbolic AI approach, i.e. based on logic and search methods, are valued for their explainability but struggle to adapt to their environments. In contrast, contributions taking a sub-symbolic approach perform well at learning from past behaviour and optimising their policies to the current state of the environment, but struggle to explain the origins and causes of their decisions and actions.

In light of the complementary advantages of both (explainability and optimising policies from experience), this research project investigates the possibility of combining the strengths presented by both approaches. It does so by extending a symbolic approach, known as the belief-desire-intention (BDI) model, with a sub-symbolic optimisation method, known as a genetic algorithm (GA).

The methods are investigated in an annual event devoted to stimulating research and development on MAS: the Multi-Agent Programming Contest (MAPC). The purpose of this contest is to identify key problems, develop benchmarks, compare languages and platforms, and compile test cases for MAS [3].

Each year, the MAPC provides a concrete problem scenario in the form of a game. Agents Assemble II, the 2020 scenario, is the second iteration of the scenario used in the MAPC 2019 (see Figure 1). It consists of two teams of agents moving on a grid. The goal of the game is to acquire blocks from the world and assemble them into complex patterns provided by the game server. Agents can decide to construct for their team, counter the enemy team, or explore the environment.

This year's problem is tackled by a team of five bachelor students with the help of two supervisors. Ultimately, the results of the presented methods are placed in a broader context, shedding light on the requirements for extending a BDI-inspired model with non-symbolic methods.


Figure 1: A grid scenario used in MAPC 2019.

1.1 Background

The 2020 “Agents Assemble II” contest consists of two teams of agents moving on a grid. The goal of the game is to explore the environment, acquire blocks, and assemble them into complex patterns.

Agents can attach blocks to each of their four sides. Blocks rotate with the agent and can be connected to blocks of other agents to create complex shapes. To gain points, agents have to construct the shapes requested by the game server over the course of the game and deliver them to predefined goal locations.

The game is played in three rounds with an increasing number of agents per round (15, 30, and 50 respectively). Each round takes 750 in-game steps, i.e. all agents can execute 750 actions. Subsequently, the scores of the teams are compared and tournament points are awarded.

1.1.1 The environment

The environment is represented as a rectangular grid. Agents are limited to a perception of their local surroundings and are therefore unaware of the dimensions of this environment. The grid loops both horizontally and vertically. For example, if an agent moves off the edge at the top of the map, it appears at the bottom of the map.


There are three different types of terrain in the game: empty cells, obstacle cells, and goal cells. Empty cells represent empty space agents can move across freely. In contrast, obstacle cells are not traversable, i.e. they block moves and rotations that would involve the cell. Agents are required to submit their tasks at goal cells, i.e. an agent has to be on a goal cell to submit the shape attached to it.

In addition to the terrain, the game provides a number of 'things' that can inhabit a cell, including entities, blocks, dispensers, taskboards, and markers. Entities are controlled by agents and are able to perform the actions defined by the game. Blocks are used to build shapes; they come in multiple types (in the current scenario there are three types: b0, b1, and b2), and can be retrieved from one of the sources of blocks distributed around the map, referred to as dispensers. Before agents can submit tasks at the goal location, they are required to accept the task at one of the taskboards. Only agents that have accepted a task are able to submit it. Note that this does not grant that agent sole rights to the task: teams can work on a task simultaneously. In addition, the taskboards are solely required for accepting a task, as agents receive updates about all available tasks at each game-step. Finally, cells tagged as markers indicate that clear events or actions will occur in a cell.

Clear events are randomly generated events that occur in areas of varying sizes. They clear the target area and generate new obstacles around the center. Agents that are located in the target area during this event become disabled for a set amount of steps. To provide agents with the option to exit the targeted area, clear events can be perceived shortly before they occur.

1.1.2 The agent

Every agent controls an entity in the game. Agents receive updates from the server and are able to perform one primitive action at every game-step. Actions can range from moving and rotating to connecting blocks. An overview of all actions agents are able to perform can be found below.

Skip Don’t perform any actions during this step.

Move Move in a given direction (north, east, south, or west). Attach Attach an adjacent block to the agent.

Detach Detach an adjacent block from the agent.

Rotate Rotate the agent (including attachments) clockwise or counter-clockwise.

(9)

Connect Two agents can use this action to connect blocks attached to them. Disconnect Disconnect two blocks from the agent.

Request Request a block from an adjacent dispenser. Submit Submit an attached shape to complete a task.

Clear Clears out obstacles and disables agents present in the targeted area (target position and the four adjacent cells). The clear action takes a set amount of consecutive calls to be performed (three in the current scenario). Moreover, a fixed amount of energy is used for a successful clear action.

Accept Accept a (specified) task from a nearby taskboard.

Agents are updated about their local surroundings, energy level, and whether they are currently disabled at every game-step. The local percept provides information about cells within a radius of 5 cells (e.g. the relative coordinates of a nearby enemy agent or obstacle). Each agent has an energy level that is automatically recharged by one level at each step. In the current scenario, this energy is only consumed by successfully completing a clear action. When an agent's energy runs out, it becomes disabled, causing it to lose all attachments and remain inactive for a set number of steps.

1.1.3 Tasks

Tasks are randomly generated over the course of the game. They range from easy tasks consisting of one block to hard tasks consisting of multiple blocks of different types (e.g. Figure 2). The number of points rewarded for a task depends on its difficulty. In addition, tasks have a completion deadline, and the reward decreases by one point with each step. Therefore, the time required for creating a shape has to be taken into account when picking a task.


Figure 2: an example of a generated task as proposed by the game. The red square indicates the location of the agent with respect to the blocks.

1.1.4 Agent roles

To encourage a variety of methods and establish feasible projects, the workload is distributed across five bachelor students. It includes setting up a framework for interacting with the game scenario and providing the agents with the means to reason about the game with a BDI model. The abilities agents need to successfully play the game have been identified in four roles: builders, scouts, attackers, and a strategist. Let us first define the roles and then elaborate on the distribution of the workload.

The builders are responsible for building patterns of blocks. They are required to pick between generated tasks and build the required constructs (see Figure 2). Builders can be divided into (1) individual builders, (2) collaborative builders and (3) a task manager. Individual builders are required to single-handedly collect blocks from dispensers and deliver them to a goal state. They differ from collaborative builders, whose goal is to obtain blocks, coordinate building, and deliver constructs with multiple agents. A task manager evaluates the tasks proposed by the game, and assigns them to the available agents best fit for them. Depending on the circumstances, the task manager can assign individual builders or collaborative builders to a task. Note that the task manager is not physically present in the game. Instead, it operates centrally 'behind the scenes' as a 'meta-agent'.

Scouts explore the map and infer information relevant to the agents. They are responsible for building a map that can be used for navigating to and reasoning about points of interest (e.g. goal states, dispensers). The created map will form a basis for the beliefs of the agents.

Attackers halt the opponents by disabling enemy agents. More specifically, they disable opponent agents by following and targeting clear actions at them. Moreover, attackers attempt to identify the most valuable agents of the enemy team. In most cases, this implies targeting enemy agents that have blocks attached, as these agents form the direct source of points for the enemy team.

A strategist observes the current state of the game, prioritises roles, and distributes resources across teams of agents (e.g. teams of attackers, scouts, and so on) accordingly. Depending on the computational requirements and necessity, this can be done at every step or periodically. Note that the strategist is not physically present in the game. Similar to the task manager, it operates centrally 'behind the scenes'.

1.2 Distribution of the work

The following distribution of the work was agreed on. Note that some roles are assigned multiple times to ensure their completion for the final framework. Moreover, the distribution mentioned below does not include the individual research topics of the students. Instead, it refers to all preliminary work required for competing in the MAPC and tackling the individually researched topics:

• Jensen is assigned the attacker role [11]. Moreover, Jensen tackles the basis for agent communication with the server and the implementation of the BDI model.

• Bekaert is assigned the scout role [5]. Additionally, Bekaert focusses on providing agents with the means of saving their beliefs about the environment.

• Stolp is assigned the builder role [22]. Furthermore, Stolp provides the agents with the means for reasoning concurrently. Besides this, Stolp is involved in tackling the implementation of the BDI model.

• Abdelrahman is assigned the attacker role [2].

• This research project tackles the builder role. Moreover, it focusses on providing a basis for agents to communicate with the server and navigate concurrently.

After the builder, scout, and attacker roles have been completed, the strategist role is tackled. Depending on the workload of the respective roles, the strategist is created by one or multiple students, in a static (i.e. predefined assignments) or dynamic (i.e. based on the state of the game) manner. In any case, working on the other roles provides insights required for creating the Strategist. For example, the effectiveness of the roles can be taken into consideration, as well as the features the dynamic role assignments could be based on.

1.3 Outline of this research project

This paper is composed of four themed chapters. The first chapter begins by laying out the theoretical dimensions of the research. Literature on the topic of MAS (the BDI framework, GAs and so on), agent navigation, and relevant frameworks is reviewed. The second chapter is concerned with the methodologies used for this study; it presents the framework built for both MAPC and our individual research topics. Through this framework, the methods for combining BDI and GA are presented. The third chapter presents the results of these methods. Subsequently, the results are analysed and discussed in the final chapter.

2 Literature Review

This section provides an overview of the literature on topics relevant to the present project. First, the different approaches to MAS are discussed and elaborated. Historically, research on MAS started from a focus on individual agents' architecture, mental models, and behaviors. In more recent years it has shifted towards adaptivity, environment, openness and the dynamics of these systems [1]. The context in this section provides insight into the decision of combining BDI with a GA. Second, techniques for solving the shortest-path problem are examined; this is required for the navigation of the agent. Finally, frameworks for building the BDI model and implementing the GA are examined.

2.1 The belief-desire-intention (BDI) model

One of the most famous symbolic approaches to cognitive agents in MAS is the BDI model [18]. Inspired by Bratman’s notions of mental attitudes [6], the model is characterised by specifying computational counterparts of the agent’s beliefs, desires, and intentions.

Beliefs form the agent's knowledge of the environment. The term is used in favor of knowledge to emphasise that an agent's beliefs may not necessarily be true. Desires represent objectives or goals the agent would like to achieve. For example, to gain points in the game or to find the best possible task to complete. The means for achieving these future world states and the options available to agents are represented in plans. These contain a body of primitive actions or goals the agent has to perform to achieve a given desire. An exemplary sub-goal would be to complete a certain task (higher-level objective) or navigate to a dispenser (lower-level objective). Because of the connections between plans and sub-goals, combined with the connection between sub-goals and primitive actions, agents' actions are made explainable during execution. Plans are provided with an invocation condition and a precondition, specifying the circumstances under which the plan is 'triggered' and the conditions that have to hold for the plan to be executable. When the invocation condition of a plan triggers and the preconditions hold, the plan turns into an intention. At this stage, the agent starts executing the plan.

Although BDI models proved effective in various cases, especially in single-agent scenarios, the model also exhibits limitations. One of the main downsides is that the model lacks a form of learning, i.e. adapting to new environments and learning from past behaviour. Moreover, the model does not provide agents with any estimates about future states, making it susceptible to undesirable side-effects created by (often irreversible) actions.

2.2 Planning

In light of these problems, the demand for methods capable of taking future states into account grew. Planning, an approach central to AI research, became a widely recognised paradigm for realising action selection strategies [7, 15]. Solutions are often found offline, and can be adjusted online when necessary (e.g. in dynamic environments). The main downside to AI planning, however, is its need for pre-defined complete world models. In addition, the complexity of creating strategies makes AI planning undesirable in environments with heterogeneous agents and sizeable action spaces [10].

2.3 Learning

For creating strategies that are not limited to general solutions, i.e. for heterogeneous agents and unknown environments, systems that learn coordination strategies have been proposed [16, 9, 20]. These approaches can roughly be classified into three types: (1) individual reinforcement learning (RLI) methods, (2) grouped reinforcement learning (RLG) methods, and (3) GA methods [10].

As the name suggests, RLI methods approach learning in a distributed manner, i.e. agents run local copies of reinforcement learning algorithms. Although RLI methods generally allow for scalability, finding solutions that satisfy the greater cause in multi-agent environments is not guaranteed. Because agents only have access to local perceptions, it is generally difficult to adequately model other agents' behaviour.


RLG methods approach learning in a centralised manner. They do a considerably better job at modelling multi-agent behaviour, at the cost of high sample complexity due to the combinatorics of grouping [8]. RLG methods thus pose issues with scalability.

GA methods are often compared to natural selection. Groups of agents are analysed, the best of which reproduce into a new generation. Therefore, GA methods do not consider all potential groups, but only a sub-set (this is known as beam-search). This means that the complexity can be controlled by limiting the size of subsets, although this could yield suboptimal performance.

For the above mentioned reasons, and the requirement for scalable methods in the current scenario, the project continues the focus on GAs in particular. Thus, a description of the inner workings of a GA is provided.

GA methods involve three biologically inspired steps: mutation, crossover, and selection. Initially, the algorithm generates a population consisting of random individuals. Then, an iterative process begins where each iteration is referred to as a generation of the population. For every new generation, the fitness (e.g. a cost function) of all individuals is evaluated. A set number of the fittest individuals are stochastically selected for the next generation. They continue to the most significant step in a GA: crossover. This is the process where individuals combine their 'genes' to create 'offspring'. A point in the genes of two individuals, referred to as the crossover point, is selected before or after which they exchange their 'genes' (see Figure 3). The resulting offspring is added to the new population. Finally, some of the individuals are randomly mutated, changing part of their genes. The iterative process terminates after a set number of generations or when a satisfactory fitness level has been reached for the population. However, with regard to controlling the complexity of a GA, a set number of generations is preferred.
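To make these steps concrete, the following is a minimal sketch of a GA for bit-string individuals. The fitness function, population size, and mutation rate are illustrative placeholders and do not correspond to the implementation presented later in this thesis.

import random

def evolve(fitness, length=16, pop_size=50, generations=100, mutation_rate=0.05):
    # initial population of random bit-string individuals
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # selection: keep the fitter half of the population
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        # crossover: swap genes after a random crossover point
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            a, b = random.sample(parents, 2)
            point = random.randrange(1, length)
            offspring.append(a[:point] + b[point:])
        # mutation: randomly flip a gene in some of the offspring
        for child in offspring:
            if random.random() < mutation_rate:
                i = random.randrange(length)
                child[i] = 1 - child[i]
        population = parents + offspring
    return max(population, key=fitness)

# usage: maximise the number of ones in the bit string
best = evolve(fitness=sum)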


Figure 3: the crossover step in GA methods. A crossover point is randomly selected for two parent individuals. The parents swap their ‘genes’ after the crossover point forming offspring for the new generation.

2.4 The shortest-path problem

There have been numerous proposals of algorithms for finding the shortest path between two nodes in a weighted graph (often referred to as the shortest-path problem). The Dijkstra algorithm and the A* algorithm are two well-known approaches to the problem. They both present their advantages and shortcomings, accentuated by the circumstances the algorithms are subjected to. Attempts at minimising the shortcomings or optimising the algorithms for more specific use cases can be found in the numerous variations of the algorithms that currently exist.

The Dijkstra algorithm finds the shortest path between a given node and every other node present. The algorithm does not make use of any goal-oriented heuristics. For this reason, it can be used in environments where these heuristics are computationally expensive or unknown. Owing to the fact that the algorithm expands in every direction and its sole consideration for determining the next node is its distance to the start node, the algorithm can be relatively slow in some environments.

The A* algorithm can be seen as a version of the Dijkstra algorithm extended with goal-oriented heuristics. It finds the next node in the path by minimising the current and estimated remaining costs. More specifically, the function f(n) = g(n) + h(n) is minimised, where n is the next node, g(n) is the cost so far, and h(n) is a heuristic function that estimates the remaining cost to the goal node. This approach works well in most static environments. However, in environments where the terrain is initially unknown and the path needs to be recalculated frequently as a result of newly perceived obstacles, the operation becomes expensive.

Incremental heuristic search methods, a variation of the A* algorithm, provide a solution to the above-mentioned problem. Reusing information from previous searches allows them to find solutions to series of similar searches considerably faster. An algorithm presented by Sven Koenig and Maxim Likhachev named D* lite implements an incremental heuristic search method specifically targeted at goal-oriented navigation in unknown terrain [12]. In contrast to A*, D* lite does not require the map to be known. Instead, the algorithm makes assumptions about unseen terrain, e.g. that it is traversable. It plans a path based on the current beliefs from the goal node to the starting node. As the solver searches, it saves the weight values used in determining its path for the next iteration. When the agent observes new obstacles along the path it is traversing, the solver updates the weights around the obstacle and recalculates the path.
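To make the f(n) = g(n) + h(n) formulation concrete, the following is a minimal A* sketch for a 4-connected grid with a Manhattan-distance heuristic. It is a generic illustration only, not the D* lite implementation used in the framework.

import heapq

def a_star(start, goal, passable):
    # Manhattan distance serves as the heuristic h(n)
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]  # entries are (f, g, node, path)
    visited = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if passable(nxt) and nxt not in visited:
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no path exists

# usage: a 10x10 grid with one blocked cell
blocked = {(2, 2)}
path = a_star((0, 0), (4, 4),
              lambda c: 0 <= c[0] < 10 and 0 <= c[1] < 10 and c not in blocked)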

2.4.1 Traveling salesman problem

Agents tasked with visiting multiple locations in the environment (e.g. for retrieving blocks from different dispensers before moving to a goal state) are required to optimise the order of visiting these locations.

Solving the problem of finding the shortest path between multiple nodes, visiting each node and returning to the start, is referred to as the traveling salesman problem (TSP). This combinatorial optimisation problem is NP-hard. Therefore, depending on the number of nodes to visit, lines of attack amount to an exact approach (for small problem sizes) or a 'suboptimal' heuristic approach (for larger problem sizes) [4].

A method used for a heuristic approach is the two-opt algorithm. It is a simple heuristic search algorithm that solves a TSP consisting of n nodes with a complexity of O(n²) [13]. In contrast, a brute-force approach, i.e. considering all possible combinations of nodes, has a complexity of O(n!).
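A straightforward (unoptimised) version of the two-opt improvement loop can be sketched as follows: it repeatedly reverses a segment of the tour whenever doing so shortens it. The distance matrix below is an arbitrary example.

def two_opt(tour, dist):
    def length(t):
        # total length of the closed tour
        return sum(dist[t[i]][t[i + 1]] for i in range(len(t) - 1)) + dist[t[-1]][t[0]]

    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if length(candidate) < length(tour):
                    tour, improved = candidate, True
    return tour

# usage with a symmetric distance matrix for four nodes
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 8],
        [10, 4, 8, 0]]
print(two_opt([0, 1, 2, 3], dist))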

2.5 Frameworks

Multiple languages and frameworks exist for implementing a BDI model [14]. We adopt the view that choosing an appropriate BDI system depends on the rational behaviours required for that system. The focus in the current research is on extending BDI with optimisation and learning. Therefore, the chosen framework should fit these requirements.

Python is a widely known programming language that provides the user with extensive AI-oriented packages and tools. As it is a high-level programming language, it is able to perform many actions in relatively little code. Python also has its downside, however: the language is not designed to run concurrently. This implies that agents reason on multiple threads (e.g. the reasoning of multiple agents is quickly alternated) instead of 'truly' running simultaneously. In spite of this, the advantages Python presents for implementing ML methods outweigh this negative.

One of the most notable systems for implementing a BDI model is a language specifically designed for it: AgentSpeak [17]. A package called Spade provides a way of implementing this language in Python. However, due to poor documentation and server synchronisation issues, this option was discarded at an early stage of the project.

Consequently, the choice was made to build a custom BDI framework in Python instead. The advantage of this approach was that it would allow us to understand the framework down to its finest details. For a more detailed description of the choice of writing a custom BDI framework, the reader is referred to [22].

3 Method

The current research presents a framework for interacting with the scenario provided by the MAPC. The framework is written in Python and consists of classes in an inheritance architecture (see Figure 4). Each agent is provided with an instance of the SuperAgent class, inheriting all functionalities and tools for reasoning an agent requires. This includes communication with the server through the Multi-Agent Systems Simulation Platform (MASSim) protocol, providing agents with the ability to execute primitive actions, and reasoning about higher-level plans for the respective roles through an implementation of the BDI model. The present project extends the model with a GA to optimise task-selection based on the current state of the game.

As it is crucial that individual agents do not interfere with the other agents’ time for reasoning, the reasoning for each agent runs concurrently in multiple threads. This circumvents the non-concurrent scenario where one agent takes an excessive amount of time and leaves little or no time for the remaining agents to reason. Note, however, that reasoning from multiple threads does not entirely solve this problem as the execution is simply alternated between them. Therefore, the system could be overloaded in scenarios where multiple agents require highly expensive reasoning.


The remaining part of the framework is explained by traversing the classes from the lowest to the highest level of the hierarchical inheritance structure. Note that the implementation description of the attacker and scout roles is excluded from the current research.¹

Figure 4: the class inheritance architecture of the presented framework. The SuperAgent inherits all functionalities and tools to reason from different roles and communicate with the server. Moreover, SuperAgents communicate with the Strategist and Task Manager for all centralised reasoning.

¹ Readers are referred to the theses by Jensen and Abdelrahman for a description of the attacker role implementation and to the thesis by Bekaert for a description of the scout role implementation [11, 2, 5].


3.1 The Server

The Server class handles the messages from and to the game server. Agents send and receive JSON messages through standard TCP sockets to send actions and receive relevant information. This messaging format is referred to as the MASSim protocol.

The socket library, a well-known low-level networking interface, is used to establish a connection with the server.

import socket

sock = socket.socket()
sock.connect((host, port))

When the connection to the socket is successful, an authentication request is sent to the server including the agent’s name and password.

import json

auth_request = {
    "type": "auth-request",
    "content": {
        "user": agent_name,
        "pw": agent_password
    }
}

sock.sendall((json.dumps(auth_request) + "\0").encode())

Each message to the server is terminated by a separate zero byte and encoded in UTF-8. The server buffers everything up to the zero byte, parses a JSON string from it, handles the request, and returns a response. A message is received from the server by reading data from the socket, decoding it, and parsing the result into a (Python) dictionary.

msg = json.loads(sock.recv(4096).decode())

When the server sends an auth-response message indicating a successful authentication, the agent is connected to the server. Connected agents are able to send primitive actions and receive messages from the server during the game in an identical manner.

3.2 The Agent

The Agent class provides templates for the basic functionalities agents need to operate. More specifically, it provides templates for the actions defined in Section 1.1.2. These primitive actions can be combined sequentially to have agents act in the environment. To facilitate interaction with the agents, agents are additionally provided with the means to navigate through the environment. Note that providing the agents with navigation improves transparency with respect to the programmers, as navigating to a predefined goal is more informative than sequences of move actions.

Navigation. The D* lite algorithm was chosen as the path planning algorithm for its fast and efficient adaptation to unknown terrain. The implementation by Stephens was used as a basis and expanded for the current scenario [21]. Obstacles, unlike blocks, are not strictly untraversable in the game, since agents are able to clear obstacles using the clear action. This could, in some cases, significantly reduce the number of steps needed to reach a certain goal. Moreover, after the obstacle is cleared, the newly created path can be used by other agents. Note, however, that an agent should be less likely to clear an obstacle if its energy level is low, to prevent it from running out of energy. For these reasons, the cost of transitioning to an obstacle was changed from infinite to a function of the energy level $p$ of the agent, $c(s, s', p) = 32 \cdot e^{-0.008 \cdot p} - 1$ (see Figure 5). More specifically, the original cost function $c(s, s')$ determining the transition costs between locations $s$ and $s'$ (i.e. the cost of moving between them) was changed to

$$
c(s, s', p) =
\begin{cases}
32 \cdot e^{-0.008 \cdot p} - 1, & \text{if } s' \in O \\
\infty, & \text{if } s \in B \lor s' \in B \\
1, & \text{otherwise}
\end{cases}
$$

where $O$ denotes the set of obstacle locations and $B$ denotes the set of block locations.

Figure 5: the obstacle transition costs. $p$ is the energy level and $c(s, s', p)$ is the cost of transitioning from a location $s$ to an obstacle location $s'$.
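The modified cost function translates directly into code. The sketch below assumes obstacle and block locations are stored in sets; it illustrates the formula rather than the framework's exact implementation.

import math

def transition_cost(s, s_next, energy, obstacles, blocks):
    # cost of moving from cell s to cell s_next, given the agent's energy level
    if s_next in obstacles:
        return 32 * math.exp(-0.008 * energy) - 1
    if s in blocks or s_next in blocks:
        return math.inf
    return 1

At an energy level of 0 the obstacle cost equals 31, while it decays towards small values as the energy level grows, which is exactly the 'discouraging' and 'encouraging' behaviour described below.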


This newly presented cost function is a heuristic identified empirically as the best performing during our tests. It 'encourages' the D* lite algorithm (i.e. via a low cost) to plan paths through obstacles if an agent is high on energy and 'discourages' it otherwise. For example, in the situation portrayed in Figure 6, D* lite finds the path portrayed in option A if the agent is low on energy, and B if the agent is high on energy. Note that D* lite does not reason about clear actions but merely finds a path 'through' obstacles; when an agent moving along the path generated by D* lite observes that the next node is an obstacle, a clear action is executed.

In addition, the presented methods for navigation should allow agents to navigate with attached blocks (as attached blocks cannot traverse obstacles). Therefore, every location in the map is attributed a variable surr_obstacles, indicating the number of obstacles directly surrounding the location. The value of surr_obstacles is increased for locations surrounding a newly perceived obstacle, and decreased when an obstacle is cleared. An agent with blocks attached cannot plan a path to a location with surrounding obstacles (i.e. the cost is infinite). Therefore, the agents keep more distance from the obstacles, allowing them to navigate with attached blocks.


Figure 6: an example of two options for moving to the goal. In option A, moving to the goal takes 8 steps in total: 8 move actions that take 1 step each. In option B, moving to the goal takes 7 steps: 1 clear action that takes 3 steps and 4 move actions that take 1 step each.

3.3 The BDI Agent

The BDIAgent class provides all tools necessary to reason following the BDI model. Every agent is provided with the means for storing beliefs about the environment. The states which the agent wants to bring about form its desires. The current implementation does not contain an explicit representation of an agent's desires. Instead, each agent role is assigned a 'meta-desire' (i.e. not explicitly defined in the framework). That is, builders are assigned the meta-desire of 'finding the best and most efficient way to construct shapes', attackers are assigned the meta-desire of 'finding the best way of hindering enemy agents', and scouts are assigned the meta-desire of 'finding the best way of exploring the environment'. Every game-step, the intention queue is checked: if it is empty, a plan is adopted from the agent role assigned to that agent by the strategist; otherwise, the agent executes the intention at the head of the queue.


3.3.1 The beliefs

The agents’ beliefs form a basis for all reasoning about the environment, and they contain all relevant information sent to an agent by the server. The beliefs are represented in an object containing a grid graph for the map representation, and additional attributes for fast access to frequently used information. When two agents recognise each other, they merge their beliefs forming a single belief object accessed by both agents.

The grid graph consists of node objects and is accessed through a Python dictionary. As the agents do not have a perception of their global position, the origin of the graph is located at their initial position. At every step, the nodes are updated based on the percept received from the server. If the percept includes information about a location unknown to the agent’s beliefs, a new node is created and added to the graph.

In addition to a representation of the map, the belief object stores supplementary attributes relevant to the agents. This includes, but is not limited to, an agent's energy level and the locations of all known dispensers, taskboards, and goal states. The latter prevents agents from repetitively searching through their map representation.

These attributes, as well as an agent’s grid graph, are merged with other agents’ beliefs over the course of the game. More specifically, when two agents receive a percept of a friendly unknown entity, they send a message to the strategist including the relative coordinates and obstacles visible for both agents. If both entities sent a message, share the inverse relative distance, and their surrounding obstacles overlap, the two agents merge their beliefs. This implies that both agents now update and request knowledge from the same belief object. For the reason that, after merging, multiple agents update and access the same beliefs, the beliefs are stored in a central location. In the current implementation of the framework, the graph objects are therefore stored centrally in the Strategist class. A more in-depth explanation of the process of merging and saving the beliefs can be found in [5].

At the beginning of each game, a few agents are assigned the task of finding the dimensions of the environment. With this knowledge, the horizontal and vertical edges of the map are linked (e.g. a node on the right edge links to a node on the left edge of the graph). This allows for more efficient path planning. For example, a path planned from (-15, 0) to (20, 0) would normally take 35 steps (assuming there are no obstacles), but with the knowledge that the map is 40 cells wide, the agent can loop around the map horizontally and reach its goal in only 5 steps. An in-depth explanation of how agents find the dimensions of the environment can be found in [5].
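Once the dimensions are known, this looping can be captured by measuring Manhattan distance on a torus. A small illustrative sketch (not the framework's navigation code):

def torus_distance(a, b, width, height):
    # Manhattan distance on a map that wraps horizontally and vertically
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)

# the example from the text: (-15, 0) to (20, 0) on a map that is 40 cells wide
print(torus_distance((-15, 0), (20, 0), 40, 40))  # 5 instead of 35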


3.3.2 The plans

Every agent role in the framework provides specific plans for the agent to adopt. If the intention queue is empty, plans are adopted from the roles assigned to agents using role.get_intention(), where role refers to the object of the selected role. The conditional (if-else) statements in these functions represent the invocation conditions of the plans. If a plan is 'triggered', the goals of this sub-plan are added to the intention queue. They consist of lists of:

1. Functions referencing either sub-plans or primitive actions.
2. Arguments for the function calls.
3. Contexts (i.e. evaluation functions) acting as the preconditions of the plan.
4. Informal label-identifiers, i.e. descriptions, to help explain the actions in the plan.
5. Booleans indicating if the respective functions are sub-plans or primitive actions.

3.3.3 The intentions

The intention queue holds the agent's current intentions. At each step, the intention queue is evaluated: if it is empty, a new plan is adopted (Section 3.3.2); otherwise, all intentions are checked for their validity based on the contexts of the plan. For instance, when an attacker loses the agent it is following, the intention of following that agent is dropped. Subsequently, the agent recursively expands the head of the intention queue until it reaches a primitive action. This action is returned and executed by the agent. An advantage of expanding an intention only when it is at the head of the queue is that the methods representing these plans are only called at the moment they are required. Consequently, the reasoning from these plans can be done according to the most up-to-date beliefs. For a more detailed description of the implementation of the intention queue, the reader is referred to [11, 22].
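A simplified sketch of this evaluation loop is shown below, assuming intentions are stored as (function, arguments, context, description, is_primitive) tuples in the order listed in Section 3.3.2; the helper expand_subplan and the attribute names are hypothetical.

def next_action(agent):
    # adopt a new plan from the assigned role if the queue is empty
    if not agent.intention_queue:
        agent.intention_queue.extend(agent.role.get_intention())

    # drop intentions whose context (precondition) no longer holds
    agent.intention_queue = [i for i in agent.intention_queue if i[2]()]

    # expand the head of the queue until a primitive action is reached
    while agent.intention_queue:
        func, args, context, description, is_primitive = agent.intention_queue[0]
        if is_primitive:
            return func, args, description
        agent.intention_queue.pop(0)
        agent.intention_queue[:0] = expand_subplan(func, args)  # hypothetical helper
    return None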

3.4 The Builder

The builder role contains (1) individual building plans and (2) collaborative building plans. Agents assigned the builder role 'await instructions' from the task manager by iteratively scanning a task queue for their agent id. The details provided by the task manager are represented in tuples consisting of an agent id (i.e. the assigned agent) and a dictionary of details about the optimised 'plan of attack' for a task. This dictionary of details contains the following properties (i.e. keys):


is_individual indicates if the task is performed individually or collaboratively. The agents perform a plan relative to the number of agents working on the task. In other words, a task for one agent triggers the individual building plans and a task for multiple agents triggers the collaborative builder plans.

task_name is the name of the task at hand. Agents can use this name to retrieve information about the task, e.g. its shape, reward or deadline.

pickup contains an ordered list of locations and the required blocks to retrieve or tasks to accept. The last item of the list is the location of the goal state the agent should navigate to.

task_master is the id of the agent responsible for handing in the task. This is the agent that accepts the task and that the shape will be constructed onto. Note that this property is empty if the task is assigned to an individual builder.

comm_queue is the queue object provided to all agents working on the same task. It is used for communication between these agents, e.g. for connecting two blocks. Similar to the task_master property, it is empty if the task is individual.

The following is an example of an item in the task queue where agent 5 is instructed to retrieve three b0 blocks from a dispenser located at (13, 2), and move to the goal at (5, 2). The task_master property indicates that agent 7 is task_master, i.e. agent 7 will accept the task, and the shape will be constructed onto agent 7.

(
    5,  # these instructions are for agent 5
    {
        'is_individual': False,
        'task_name': 'task13',
        'pickup': [('b0', (13, 2), 3), ('goal', (5, 2))],
        'task_master': 7,  # agent 7 accepts and submits the task
        'comm_queue': <comm_queue object>
    }
)

3.4.1 Individual builder

The plans of the individual builder enable the agent to complete tasks with up to four adjacent blocks. Among the plans is the highest-level plan do_task, which breaks down into get_task, get_blocks and submit_task.

The do_task plan is provided with the details from the task manager to complete the task. The plan returns sub-plans for accepting the task, retrieving blocks, and submitting the task. The sub-plans are provided with a description: getBlocks, getTask, and submitTask respectively. These descriptions provide insight into an agent's actions. Note that the order of the returned task and block retrieval is in accordance with the order provided by the details' pickup property.

# an example of do_task in pseudo code
def do_task(details):
    intentions = sort_by_details([get_blocks, get_task, submit_task])
    contexts = task_exists  # evaluation function checking if the deadline has passed
    arguments = get_arguments_from_details()
    descriptions = sort_by_details(["getBlocks", "getTask", "submitTask"])
    primitives = [False, False, False]  # these are not primitive actions
    return intentions, contexts, arguments, descriptions, primitives

The plan named get_task navigates the agent to the provided taskboard and accepts the task as specified by the task manager. Likewise, get_blocks navigates the agent to the provided dispenser, requests a block, and attaches it. However, requesting a block is not a primitive action; it is achieved by calling the sub-plan orient_and_request. This sub-plan reasons about the location at which the block should be attached with respect to the other attached blocks (if there are any). It then calculates its optimal rotation strategy for rotating to an empty slot. The agent executes these rotations, requests a block, and attaches it to the adjacent slot.

# an example of get_task in pseudocode
def get_task(taskboard, task):
    intentions = [nav_to, accept]
    contexts = task_exists
    arguments = taskboard, task
    descriptions = ["navToTask", "acceptTask"]
    primitives = [True, True]
    return intentions, contexts, arguments, descriptions, primitives

The submit_task plan navigates the agent to the provided goal state and submits the task. The submission of the task is performed by the sub-plan turn_and_submit. Like its name suggests, it rotates the agent according to the orientation of the attached task and submits it.

3.4.2 Collaborative builder

The plans of the collaborative builder provide the means to build complex shapes. Similar to the individual builder, the highest-level plan of a collaborative builder is do_task. The plan breaks down into get_blocks, get_task, position_at_goal, and prepare_build, depending on the details provided by the task manager through the task queue. Due to time constraints, the collaborative builder has not been completed in its entirety. However, the theoretical architecture is presented and elaborated on.

The do_task plan is provided with the details of the task at hand. Similar to the individual builder, the do_task plan breaks down into get_task and get_blocks according to the details' pickup property. For collaborative builders, the attach location of the blocks when retrieving them from dispensers is not relevant, as the shape is rebuilt later. Therefore, orient_and_request inspects the dispenser and looks for the smallest number of turns required to rotate to an empty slot. Subsequently, the agent checks if it is the task_master. If this is not the case, the agent navigates to the edge of the goal state (prepare_build) and awaits the task_master. If the agent is the task_master, it navigates to the center of the goal state (position_at_goal) and sends a message through the communication queue indicating its arrival. The agents then take turns attaching blocks to the task_master, starting with the blocks closest to the agent and ending with the furthest. The agents communicate when they are done attaching blocks, and leave the area to create space for the other agents.

3.5 The SuperAgent

Each agent is controlled by an instance of the SuperAgent class. It inherits all agent functionalities, and is therefore able to reason from any role using the BDI model. The SuperAgent contains one vital function: run(). It acts as the main loop for every agent.

The run function continuously receives messages from the server. If the agent receives an “action request”, it updates its beliefs and listens to the strategist to decide which role it should reason from. If the intention queue is empty, the agent retrieves an intention from the assigned role. Subsequently, the agent retrieves a primitive action to be executed from the head of the intention queue (Section 3.3.3), and returns it to the server. Note that switching roles does not flush the agent’s intention queue, i.e. agents finish their current tasks before reasoning from the newly assigned role.
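Schematically, the control flow of run() looks as follows. The message field names and helper methods are assumptions for illustration; only the overall loop structure is taken from the description above.

def run(self):
    while True:
        msg = self.server.receive()                 # blocking read from the game server
        if msg.get("type") != "request-action":     # assumed message type for action requests
            continue
        self.beliefs.update(msg["content"])         # update beliefs from the new percept
        role = self.strategist.get_role(self)       # which role to reason from this step
        if not self.intention_queue:
            self.intention_queue.extend(role.get_intention())
        action = self.next_action()                 # expand the queue head to a primitive action
        self.server.send_action(action)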

3.6 The Strategist

Due to time constraints, the strategist implements a static distribution of the roles.

At the start of each round in the game, 40 percent of the agents are assigned the attacker role, 10 percent the builder role, and 50 percent the scout role. When the scout agents are done exploring the map, the strategist reassigns them as builders. Therefore, the final distribution of the agents is 60 percent builders and 40 percent attackers. For a more detailed description of the strategist implementation, the reader is referred to [5].

3.7 The Task Manager

The task manager is responsible for evaluating the tasks (i.e. shapes) provided by the game server, and assigning them to the agents best fit for them (an NP-complete problem). The presented solution seeks to optimise this task-selection for the current state of the game without resorting to an exhaustive (exponential-time) search. Therefore, the chosen method should allow for scalability. Based on this requirement and its goal-oriented features, the method chosen for this optimisation is an implementation of a GA.

The GA’s objective is to select the best possible task for the agents to complete given the current state of the game. Relating to the BDI framework, the objective is to convert the meta-desire of ‘finding the best and most efficient way to construct shapes’ into a concrete goal (a specific task) that can be completed by means of a plan.

The ranking of tasks is achieved in two steps. First, the optimal combination of agents is evaluated for individual tasks given the current location of the agents and their availability. Second, a ranking of the tasks is made based on the estimated time spent on the tasks and the reward corresponding with each task.

3.7.1 Evaluating individual tasks

To find the optimal combination of agents for individual tasks, the GA initially generates random distributions of the task between available agents, i.e. the task's components are assigned to random agents. The algorithm then selects the distributions of agents with the lowest fitness scores for the next generation, 'mates' the selected distributions, and randomly mutates them to form a new generation. Note that in this case the fitness is minimised, as the fitness of an individual is the sum of steps required for completing the task.

An individual in the population of the GA is represented by a binary vector $\vec{i}$ consisting of a task and the available agents. Here, tasks are represented as sets of blocks to pick up. As individual agents are able to attach one block on every side, agents are able to retrieve four blocks in total. Moreover, every unique block type has to be retrieved from a different dispenser. Therefore, a set of blocks to pick up can consist of up to four blocks of the same type. For example, five blocks of the same type are treated as two sets of blocks to pick up: one set of four blocks, and one set of one block.


The contents of $\vec{i}$, representing a task with $n$ sets of blocks to pick up and $v$ available agents, can best be conceptualised by considering the following $(n + 1) \times v$ matrix

$$
M =
\begin{pmatrix}
\text{set 1, agent 1} & \dots & \text{set 1, agent } v \\
\vdots & \ddots & \vdots \\
\text{set } n \text{, agent 1} & \dots & \text{set } n \text{, agent } v \\
\text{taskboard, agent 1} & \dots & \text{taskboard, agent } v
\end{pmatrix}
$$

The result of flattening $M$ (i.e. concatenating the rows to form one row) is $\vec{i}$. Each cell contains a boolean (0 or 1) indicating if the respective agent will pick up (and deliver) the respective set, or navigate to the taskboard to accept the task. An example of $\vec{i}$ consisting of one b0 block (represented as set A), three b1 blocks (represented as set B), and two available agents can be found in Figure 7. In this example, agent 1 will pick up set B and accept the task, and agent 2 will pick up set A.

Figure 7: an example of a binary vector consisting of one b0 block (set A), three b1 blocks (set B), and two available agents. Agent 1 will pick up set B and accept the task, agent 2 will pick up set A.
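Written out for the Figure 7 example, the matrix $M$ and its flattened form $\vec{i}$ are simply (a small illustration using numpy):

import numpy as np

# rows: set A (one b0 block), set B (three b1 blocks), taskboard; columns: agent 1, agent 2
M = np.array([[0, 1],   # agent 2 picks up set A
              [1, 0],   # agent 1 picks up set B
              [1, 0]])  # agent 1 accepts the task at the taskboard
i = M.flatten()         # the individual: array([0, 1, 1, 0, 1, 0])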

Multiple constraints are set on $\vec{i}$ to prevent:

1. A set of blocks from being picked up twice.
2. A task from being accepted twice.
3. An agent from being assigned more blocks than it can carry (i.e. more than four).

Constraints (1) and (2) are met by allowing only a single positive (i.e. 1) value for the taskboard and each set of blocks, as conceptually represented by a row in $M$. Constraint (3) is violated when the sum of the number of blocks contained in the sets assigned to an agent becomes greater than four. As this constraint relies on information that is not represented in $\vec{i}$ (e.g. the number of blocks in a set), it is enforced in the crossover and mutation steps.

The fitness of an individual is the sum of steps required for accepting the task, retrieving the sets of blocks, and bringing them to the goal state. For agents that have more than one assigned goal (e.g. picking up set A and accepting the task before navigating to the goal), finding the optimal route is a form of the TSP (Section 2.4.1). Representing these locations in an undirected weighted graph, the shortest path for visiting all nodes is found using a Python implementation of TSP called tspy. The dynamic solver used in this implementation is configured to use the two-opt algorithm (Section 2.4.1). To ensure an optimal path starting at the agent and ending at the goal state, a dummy node is added with an edge weight of zero to both the goal and the agent node. The edges to the remaining nodes are given an infinite weight to prevent them from connecting. The dummy node is removed after the path has been optimised, resulting in a path from the agent to the goal state.
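The dummy-node construction can be sketched on a plain distance matrix as follows; this is a generic illustration of the idea rather than the exact tspy-based code.

import numpy as np

def open_path_matrix(dist, agent_idx, goal_idx):
    # append a dummy node with zero-weight edges to the agent and goal nodes and
    # effectively infinite edges to all others, so the optimal closed tour through
    # the extended matrix corresponds to an open path from the agent to the goal
    n = dist.shape[0]
    big = 1e9  # stands in for an infinite weight
    extended = np.full((n + 1, n + 1), big)
    extended[:n, :n] = dist
    extended[n, n] = 0
    for idx in (agent_idx, goal_idx):
        extended[n, idx] = extended[idx, n] = 0
    return extended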

Finding the weights for the remaining nodes involves estimating the number of steps required for navigating between them. There are multiple independent variables involved in this estimation. For instance, the time required for navigation increases as the game progresses due to the growing number of random obstacles in the map caused by clear events. Polynomial regression, a technique used to model a non-linear relationship between independent and dependent variables, was used to find the relationship between navigation distances, the current step of the game, the chance of clear events, and the number of required steps. For more in-depth information on this process, the reader is referred to [22].

The presented structure lends itself well to a single crossover point chosen at a multiple of the number of agents (i.e. conceptually between the rows of $M$). This ensures that the aforementioned constraints (1) and (2) are not violated. After a crossover point has been selected, constraint (3) is evaluated. If it holds, the new individuals are added to the next generation; otherwise, the crossover point is reselected.

Finally, some of the individuals in the new generation are randomly mutated. Depending on the specified mutation ratio, a number of objectives are reassigned. Conceptually, this amounts to randomly changing the position of the positive value in some rows of $M$. Similar to the crossover step, constraint (3) is evaluated on the result. Likewise, the mutation is repeated until the individual meets all constraints.

As fitness levels differ between tasks, the algorithm is terminated after a specified number of generations. The individual with the lowest fitness score is selected as the best option.


3.7.2 Ranking tasks

After evaluating the individual tasks, a ranking is made of all tasks that are estimated to finish before their deadline, based on the point density $d$ of each task. The point density is a value measurement for a task: it represents the estimated number of points gained per game-step. More specifically, it is equal to the number of points rewarded for a task minus the estimated number of steps required for completing the task (normalised by the number of agents), divided by the total number of steps required (i.e. summing the steps of the agents):

$$
d = \frac{r - \frac{s}{v}}{s} = \frac{rv - s}{sv}
$$

where $v$ denotes the number of agents assigned to the task, $r$ the number of points gained for the task, and $s$ the total number of estimated steps.

The tasks are then added to a ranking list in order of highest to lowest point density. As agents are only able to execute one task at a time, duplicate assignments are removed from the list from bottom to top. In other words, this operation ensures that only the task with the highest point density remains for every agent. In the final step, all tasks in the list are added to the intention queues of the assigned agents. In other words, the agents adopt the goal of completing the assigned task.
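The ranking step amounts to sorting by the point density defined above and keeping at most one task per agent. A sketch, assuming the GA produces (task, agents, reward, estimated_steps) tuples (these names are illustrative, not the framework's own):

def rank_tasks(assignments):
    # point density d = (r*v - s) / (s*v) for reward r, v agents, and s total steps
    def density(a):
        task, agents, reward, steps = a
        v = len(agents)
        return (reward * v - steps) / (steps * v)

    busy, final = set(), []
    for task, agents, reward, steps in sorted(assignments, key=density, reverse=True):
        if busy.isdisjoint(agents):   # keep only the highest-density task per agent
            final.append((task, agents))
            busy.update(agents)
    return final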

3.7.3 Time complexity

Evaluating the time complexity of a GA is a research topic in and of itself [19]. However, in most cases, the complexity originates largely from iteratively calculating the fitness of an individual [19]. Due to the optimisation required to solve the TSP in the presented GA, evaluating the fitness of individuals is most likely to account for the majority of the computational requirements. Therefore, the complexity of the algorithm can best be compared in terms of the number of times the fitness of individuals has to be evaluated.

The brute-force approach evaluates the fitness of all possible distributions of agents. This implies that for $v$ agents, $k$ tasks, and tasks consisting of $n$ sets, the fitness of $k \cdot v^n$ distributions of agents has to be calculated. In contrast, as the GA calculates the fitness of every individual in each generation, the number of times the GA calculates the fitness is $g \cdot p$, where $g$ denotes the number of generations and $p$ the population size.


In the current implementation, a value of 100 for $g$ and a value of 50 for $p$ was chosen, making the variation of the complexity solely reliant on the complexity of the fitness function. As the current implementation of the fitness function solves the TSP using the two-opt solver, the growth of the problem is best conceptualised in the complexity of this algorithm: $O(n^2)$ (Section 2.4.1). Note that this measure does not represent the 'real' complexity of the problem, but merely serves as an indication of the significant reduction in growth.
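As an indication of the difference in scale (using an assumed example, not a measurement from this project): for a single task consisting of, say, $n = 4$ sets and $v = 16$ available agents, the brute-force approach would require $v^n = 16^4 = 65{,}536$ fitness evaluations, whereas the GA's budget remains fixed at

$$
g \cdot p = 100 \cdot 50 = 5{,}000
$$

evaluations, regardless of $v$ and $n$.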

3.8 Evaluation

This research project presents a BDI-inspired framework that can be extended with a GA. The framework is evaluated for its ability to extend a symbolic method for MAS with sub-symbolic components. Moreover, the advantages and limitations of the current approach are highlighted. In this context, evaluating the GA provides insight into the higher-level question of what the requirements are for implementing an 'ML-extendable' BDI framework.

Further elaboration on the work required to set up an implementation of the BDI model that can feed data to a learning model will be provided in the discussion.

3.8.1 Evaluating the genetic algorithm

The GA is evaluated by comparing its task assignments with the 'optimally assigned tasks', i.e. the tasks and agent assignments that yield the best available point density. These values are found using a brute-force solver that iterates over all possible combinations of agent assignments.

The locations of the agents, dispensers, the goal, and the taskboard are randomly sampled from a set of locations L. This set contains locations ranging from (0, 0) to (69, 69), i.e. the locations of a map with a width of 70 and a height of 70 are considered. Moreover, the tasks are randomly sampled from a set T containing 2000 instances of tasks generated by the server. Every element of T contains the number of blocks for each of the three block types, and a reward (e.g. 3 b0, 1 b1, and 0 b2 blocks for a reward of 53 points).

The solution of the GA is deemed correct if both the brute-force solver and the GA yield the same 'best task' (i.e. the task with the highest point density), given identical input. The accuracy is tested for a number of agents v and a number of tasks k, both ranging from 2 to 16 in steps of 2, i.e. v_i ∈ R and k_j ∈ R where R = {2, 4, ..., 14, 16}. Each possible combination of v and k is evaluated for 200 iterations, yielding the algorithm's accuracy.
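Schematically, the evaluation procedure amounts to the loop below; sample_scenario, ga_best_task, and brute_force_best_task are stand-ins for the actual implementation:

```python
R = range(2, 17, 2)     # v and k both range over {2, 4, ..., 16}
ITERATIONS = 200

def evaluate_accuracy(L, T, sample_scenario, ga_best_task, brute_force_best_task):
    """Fraction of iterations in which the GA and the brute-force solver pick the same best task."""
    accuracy = {}
    for v in R:
        for k in R:
            correct = 0
            for _ in range(ITERATIONS):
                # Sample agent/dispenser/goal/taskboard locations from L and k tasks from T.
                scenario = sample_scenario(L, T, n_agents=v, n_tasks=k)
                if ga_best_task(scenario) == brute_force_best_task(scenario):
                    correct += 1
            accuracy[(v, k)] = correct / ITERATIONS
    return accuracy
```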


The above-mentioned evaluation finds the accuracy of the algorithm by checking whether the GA selects the optimal task assignment. In addition, the algorithm is evaluated by comparing the point densities of the best task calculated by the GA and by the solver for every combination of v and k. More specifically, the point densities based on the GA (defined as d_ga) are compared relative to the optimal point densities (defined as d_s), yielding a percentage p of the optimal point density:

p = \frac{d_{ga}}{d_s} \cdot 100\%

Note that p cannot surpass 100%, as the point densities found using the solver cannot be lower than those found using the GA (i.e. d_ga ≤ d_s).

4 Results

The GA’s accuracy for values of v and k as described in Section 3.8.1 can be found in Figure 8. The mean accuracy of the algorithm is 0.9, with a minimum of 0.7.

Figure 8: The accuracy of the GA in selecting the best task, averaged over 200 iterations. Here, k denotes the number of tasks and v the number of agents.


Table 1: The minimum, maximum, and mean values for p, where p = (d_ga / d_s) · 100%, d_ga denotes the point densities found using the GA, and d_s denotes the optimal point densities found using the solver.

    min    92.65%
    max   100%
    mean   96.48%

The minimum, maximum, and mean values for p can be found in Table 1.

5 Discussion

This research project presents a BDI-inspired framework extended with GA-based optimisation. In this chapter, the presented results are discussed and evaluated. Subsequently, the results are placed in a broader context, with suggestions for future work.

The presented results show an average GA accuracy of 0.9. Figure 8 shows a noticeable decrease in accuracy as the number of tasks and agents increases. From a perspective where the goal is to select the optimal task, the resulting accuracies for these higher values of k and v (down to the minimum accuracy of 0.7) could be seen as relatively low. However, as can be seen in Table 1, the task distributions selected by the GA are at worst valued 92.65% 'as good' as the optimal task distributions (and 96.48% on average).

The goal selection in the presented framework allows BDI agents to adopt nearly optimal goals during execution. These nearly optimal goals could not have been found online by a brute-force approach, as its complexity poses intractable problems for online reasoning. Therefore, the results indicate that extending a BDI-inspired model with GA-based goal-selection generates promising results.

In a broader context, the framework enabled agents to operate in an environment by reasoning from a BDI-inspired framework. To achieve this, agents were required to:

• Receive information about the environment. (Communication)
• Process and store this information in beliefs. (Beliefs)
• Navigate through the environment to facilitate interaction. (Navigation)
• Reason about goals from plans. (Reasoning)
• Execute these plans accordingly through intentions. (Execution)

These abilities allowed the model to be extended with a sub-symbolic method. The implementations of these abilities for the current project are evaluated and discussed to obtain a better understanding of the requirements for extending a BDI-inspired framework with sub-symbolic components.

Communication In the current project, agents were able to receive information about the environment through communication with the game-server. The presented approach successfully established communication with the server through JSON messaging. This was used to provide the agents with local percepts, information about the progress of the game, and information about tasks. Moreover, the agents were able to inform the server of the actions they would like to execute. To allow no ambiguity between agents, storing and updating information about the state of the environment in a central location can be considered a requirement for multi-agent reasoning.

Beliefs Agents in the game were able to process the information received from the server and store it in their beliefs. The current implementation provided agents with an object containing a grid graph and additional attributes. Furthermore, agents were able to identify teammates and merge their beliefs [5]. Because multiple agents accessed the same beliefs, the beliefs were saved centrally in the Strategist class. As this 'meta-agent' was originally not intended for saving agents' beliefs, the current implementation lacks an architecture coherent with the role assignments. Future work could improve the coherency of the architecture by creating a separate 'meta-agent' that saves the beliefs of the agents centrally.

As agents use beliefs to reason about actions, saving beliefs about the environment is a requirement for agents reasoning from a BDI-inspired model. Moreover, in environments where agents with limited perception are not provided with any knowledge about their global position in the environment, agent identification is essential to reason about identical goal locations. For example, if two agents are collaboratively building a construct, they should be able to identify a location with identical global coordinates to build the construct on.
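A minimal sketch of such a centrally stored belief object could look as follows; the attribute names and the offset-based merge are illustrative assumptions rather than the exact implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Beliefs:
    """Illustrative central belief store shared between agents."""
    grid: dict = field(default_factory=dict)        # (x, y) -> cell contents ('empty', 'obstacle', ...)
    dispensers: dict = field(default_factory=dict)  # block type -> set of (x, y) coordinates
    tasks: dict = field(default_factory=dict)       # task name -> task description

    def merge(self, other, offset):
        """Merge another agent's beliefs, shifted by its offset into this agent's frame."""
        dx, dy = offset
        for (x, y), cell in other.grid.items():
            self.grid[(x + dx, y + dy)] = cell
        for block, coords in other.dispensers.items():
            shifted = {(x + dx, y + dy) for (x, y) in coords}
            self.dispensers.setdefault(block, set()).update(shifted)
        self.tasks.update(other.tasks)
```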

Navigation To facilitate interaction with the environment, agents are provided with means for navigation. In the presented framework, navigation is based on an implementation of D* lite. The benefit of this algorithm is that it does not require agents to replan the entire path when the agent is presented with new obstacles. After generating the initial path, the weights used for planning the path are saved and only updated for the nodes surrounding new observations (excluding the nodes that were assumed to be empty). This means agents can follow a generated path and evade newly presented obstacles efficiently, thereby allowing for scalability. However, in the situation that many agents generate their initial paths at the same time, the presented method can require an excessive amount of computational power for online reasoning. Therefore, future work could attempt to distribute the initial generation of paths by, for example, scheduling agent navigation. Moreover, as the MAPC environment allowed agents to clear obstacles, the current implementation adapted the cost function used by D* lite for path planning through obstacles. However, the optimisation was done on an empirical basis. Future work could further optimise this heuristic with, for example, optimisation algorithms.

In systems where agents are required to visit points of interest throughout their environment, navigation is considered a basis for agents to interact with the environment. Systems that require a sizeable amount of agents to operate concurrently additionally require efficient replanning methods or a combination of path planning with low-level obstacle avoidance. This way, agents are left with sufficient time to reason about their actions.
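As an illustration of how a traversal-cost function might weight clearable obstacles, consider the sketch below; the penalty value and cell encoding are illustrative and do not correspond to the empirically tuned values of the framework:

```python
# Hypothetical traversal-cost function for a grid in which obstacles can be cleared:
# clearable obstacles are not impassable, but carry an extra cost (roughly the number
# of clear actions needed), which the planner weighs against taking a longer detour.
OBSTACLE_PENALTY = 4  # illustrative constant; the framework tuned this empirically

def traversal_cost(grid, node):
    cell = grid.get(node, "unknown")
    if cell in ("empty", "unknown"):
        return 1                        # ordinary move
    if cell == "obstacle":
        return 1 + OBSTACLE_PENALTY     # move plus estimated clearing effort
    return float("inf")                 # e.g. another entity or a fixed structure
```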

Reasoning In this research project, agents are presented with plans to reason about their actions. More specifically, agents adopt the goals presented in the plans by adding them to an intention queue. The presented structure places individual actions into the context of plans, making them explainable to an observer. Although the implementation allowed agents to operate successfully, the framework lacks solid theoretical grounding. This can mainly be attributed to the fact that work on the framework had already started before the decision was made to adopt BDI reasoning. On top of this, structural adaptations to the current problem contributed to further divergence from the theoretical origin. Therefore, future work could instead build the framework around the theoretical basis of the BDI model.

In a broader context, agents enacting BDI reasoning should be able to reason from plans. Additionally, in circumstances where reasoning about goals poses intractable problems with a brute-force approach, using optimisation algorithms can provide a tractable solution.

Execution Executing plans through an intention queue enables agents to prioritise goals and perform sequences of actions in a transparent way. In the current approach, higher-level plans at the head of the queue are iteratively broken down into sub-plans. This allows agents to reason based on their latest beliefs, thus minimising the impact of unexpected events. Similar to the section about reasoning from plans, the implementation of intentions lacks a solid theoretical grounding. This, too, can be attributed to the later adaptation of the created framework to the BDI model.

For systems enacting the BDI model, an implementation of storing intentions is required to prioritise plans in a transparent way. Moreover, for agents operating in a dynamic environment, 'unpacking' intentions at the head of the intention queue (i.e. breaking them down into sub-plans) could serve as a method for limiting the impact of unexpected events, as agents are then able to reason about their latest beliefs.
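A minimal sketch of such an intention queue is given below, assuming plans expose an is_primitive check and an expand method that returns sub-plans given the agent's latest beliefs (both names are illustrative):

```python
from collections import deque

class IntentionQueue:
    """Illustrative intention queue that unpacks high-level plans lazily."""

    def __init__(self):
        self.queue = deque()

    def adopt(self, plan):
        self.queue.append(plan)                  # adopt a new goal/plan

    def next_action(self, beliefs):
        while self.queue:
            head = self.queue[0]
            if head.is_primitive():              # directly executable action
                return self.queue.popleft()
            # Unpack the head into sub-plans using the latest beliefs, so that
            # unexpected changes in the environment are taken into account.
            self.queue.popleft()
            self.queue.extendleft(reversed(head.expand(beliefs)))
        return None                              # nothing left to do
```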

Due to time restrictions, the framework was not fully finished for the MAPC. Therefore, a summary of the work required to finish the framework is provided.

To complete the Builder role, more work is required on collaborative building. More specifically, the agents are not yet able to collaboratively create the shapes required by the tasks. Additionally, strategies for evading attacks by enemy agents could be investigated. Moreover, the Strategist role could be optimised to dynamically assign roles to the agents based on the current state of the game. For example, when the team is behind in points, the strategist could assign more agents the builder role; otherwise, more agents could be assigned the attacker role.

Concluding this research project, the results of this study indicate the positive impact of extending a BDI-inspired model with GA-based goal selection. In a broader context, the presented implementation sheds light on the requirements for extending a BDI-inspired approach to MAS with sub-symbolic components.

References

[1] Hosny Ahmed Abbas, Samir Ibrahim Shaheen, and Mohammed Hussein Amin. “Organization of multi-agent systems: an overview”. In: arXiv preprint arXiv:1506.09032 (2015).

[2] Y. Abdelrahman. “Undermining the opponent: extending BDI agents with disruptive behaviour for the Multi-Agent Programming Contest”. unpublished. 2020.

[3] Tobias Ahlbrecht, Jürgen Dix, and Niklas Fiekas. The Multi-Agent Programming Contest 2018 - Agents teaming up in an urban environment. Springer, 2019.

[4] David L Applegate et al. The traveling salesman problem: a computational study. Princeton university press, 2006.

[5] D.J. Bekaert. “Using Topology-Grid Hybrid Map-based Exploration in an Unknown Environment”. unpublished. 2020.


[6] Michael Bratman. Intention, plans, and practical reason. Harvard University Press Cambridge, MA, 1987.

[7] Mathijs De Weerdt and Brad Clement. “Introduction to planning in multiagent systems”. In: Multiagent and Grid Systems 5.4 (2009), pp. 345–355.

[8] Michael L Dowell and Larry M Stephens. “MAGE: Additions to the AGE algorithm for learning in multi-agent systems”. In: Proceedings of the Second International Working Conference on Cooperating Knowledge Based Systems (CKBS94). Citeseer. 1994.

[9] Thomas Haynes and Sandip Sen. “Learning cases to resolve conflicts and improve group behavior”. In: International Journal of Human-Computer Studies 48.1 (1998), pp. 31–49.

[10] Fenton Ho and Mohamed Kamel. “Learning coordination strategies for cooperative multiagent systems”. In: Machine Learning 33.2-3 (1998), pp. 155–177.

[11] D.P. Jensen. “Teaching belief-desire-intent agents how to learn: an integration of machine learning and BDI agents”. unpublished. 2020.

[12] Sven Koenig and Maxim Likhachev. “Fast replanning for navigation in unknown terrain”. In: IEEE Transactions on Robotics 21.3 (2005), pp. 354–363.

[13] Shen Lin. “Computer solutions of the traveling salesman problem”. In: Bell System Technical Journal 44.10 (1965), pp. 2245–2269.

[14] Viviana Mascardi, Daniela Demergasso, and Davide Ancona. “Languages for Programming BDI-style Agents: an Overview.” In:

[15] Felipe Meneguzzi and Lavindra De Silva. “Planning in BDI agents: a survey of the integration of planning algorithms and agent reasoning”. In: The Knowledge Engineering Review 30.1 (2015), pp. 1–44.

[16] Lynne E Parker. L-ALLIANCE: A mechanism for adaptive action selection in heterogeneous multi-robot teams. Tech. rep. Oak Ridge National Lab., TN (United States), 1995.

[17] Anand S Rao. “AgentSpeak (L): BDI agents speak out in a logical computable language”. In: European workshop on modelling autonomous agents in a multi-agent world. Springer. 1996, pp. 42–55.

[18] Anand S Rao, Michael P Georgeff, et al. “BDI agents: from theory to practice.” In: ICMAS. Vol. 95. 1995, pp. 312–319.

[19] Bart Rylander. Computational complexity and the genetic algorithm. University of Idaho, 2001.


[20] Sandip Sen, Mahendra Sekaran, John Hale, et al. “Learning to coordinate without sharing information”. In: AAAI. Vol. 94. 1994, pp. 426–431.

[21] Sam D.J. Stephens. pydstarlite. https://github.com/samdjstephens/pydstarlite. 2018.

[22] T. Stolp. “Extending the BDI model with plan cost estimation for the Multi-Agent Programming Contest”. unpublished. 2020.
