Stochastic Approximation for Simulation Optimization: An Agent-Based Model Case Study

Academic year: 2021



Layout: typeset by the author using LaTeX.


Stochastic Approximation for

Simulation Optimization

An Agent-Based Model Case Study

Luc J.B. Vermeer 11041633

Bachelor thesis
Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
Ing. P. Fratric

Institute for Logic, Language and Computation
Faculty of Science
University of Amsterdam
Science Park 907
1098 XG Amsterdam


Contents

1 Introduction

2 Background
2.1 Agent-based models
2.1.1 Agents from an AI perspective
2.1.2 Dynamics of (multi) agent-based models
2.2 Simulation-based Optimisation and Stochastic Approximation
2.2.1 Stochastic Approximation Algorithm
2.2.2 Kiefer-Wolfowitz Algorithm
2.2.3 Noisy testing functions (1D and 2D)

3 Case Study: Predator-prey model
3.1 Predator-prey model
3.1.1 Intelligence of Agents
3.1.2 Parameters
3.2 Policy-making interpretation
3.3 Objective functions
3.3.1 Stabilization case
3.3.2 Calibration case
3.4 Implementation of the KW-algorithm
3.5 Results
3.5.1 Stationary case
3.5.2 Calibration case


Acknowledgements

This thesis could not have been completed without the patient guidance of my supervisor Peter Fratric. I would like to express my gratitude for the feedback and insights he provided, which were paramount to this thesis.

I would also like to thank Sander van Splunter for his support and understanding throughout the process of making this thesis.


Abstract

Agent-based models are potentially a useful tool for analyzing complex systems. Even a simple agent-based model can exhibit complex aggregated behavior patterns and provide valuable information about the dynamics of the real-world system that it emulates. The weakness of agent-based models lies in their unknown algebraic representation, which greatly complicates analysis by analytical methods. When ABMs are used to find values or parameters that reach a desired behavior, simulation-based optimization methods need to be used. In our study we use a predator-prey model to find the parameters under which stationary nonzero population values of both species are reached. One such simulation-based optimization method is stochastic approximation, an iterative method that can be viewed as the stochastic counterpart of steepest descent in deterministic optimization. This thesis aims to explore the usage of stochastic approximation in the context of (multi) agent-based models. We test the algorithm on two objective functions formalising stabilisation of the system and calibration with respect to real-world data.

Keywords: agent-based models, simulation optimization, stochastic approximation, Kiefer-Wolfowitz algorithm, dynamic systems


Chapter 1

Introduction

Agent-based models (ABMs) are used to model a system of interacting entities. Each entity is called an agent. An agent can be anything that perceives its environment through sensors and acts, through effectors, in accordance with its perceived representation of the environment. In an ABM the agent is represented by a computer program that acts upon its perception of the environment in a predefined way. Strictly speaking there is a difference between multi agent-based models (mABMs) and agent-based models, because ABMs do not necessarily have more than one agent.

mABMs have a wide variety of applications. Examples exist for biological systems, including the analysis of the spread of epidemics [13], such as the SARS-CoV-2 epidemic in 2020 [1], and population dynamics [4], but also for systems where the agents are non-human, such as modelling landscape diversity [15]. Another field of interest includes business and network theory, where agent-based models are used to research organization behavior and cognition, such as team working [6], or for marketing purposes [11]. mABMs are also used in economics and the social sciences for economic or social analysis and are a great tool for investigating non-linearity within a system, unlike more traditional mathematical models that often make too many simplifying linear assumptions. Such systems can exhibit various aggregated dynamics, i.e. an equilibrium state, a cyclic state or a chaotic state. Due to their 'bottom-up' nature, ABMs are capable of emulating unstable systems with crashes and booms, such as the stock market, that can develop out of disproportionate responses to relatively small state changes (i.e. the 'Butterfly Effect'); hence the interest in mABMs as possible tools for economic analysis grew in the wake of the financial crisis of 2007/8 [14].

All of these fields use mABMs differently, with different purposes. A model emulating an epidemic investigates how the spread of a virus can be minimized. A model of team working may seek the circumstances under which productivity is maximized.


An economic model would like to prevent a crash of the market and maximize growth. While these are all different motivations for utilizing an mABM, what they have in common is that a model is used to establish the circumstances needed to reach a specific state, or state transition. In the model these circumstances, or beginning states, are a consequence of the values at which the model's parameters are initialized. In mathematical models one would be able to solve arithmetically which parameters and parameter value combinations have the largest influence in reaching a certain state. However, due to the "bottom-up" approach of ABMs, the underlying arithmetic is not always clear, and to quantify the states to be reached an objective function is necessary to judge to what degree a desired state is reached. An objective function measures either a reward or a loss, depending on the nature of the function, which is maximized or minimized respectively to approach the desired state as closely as possible.

As a consequence of the inherent stochasticity of these models, an optimization algorithm is needed that is capable of handling noisy functions. A group of algorithms capable of optimizing such problems is stochastic approximation algorithms, which find their roots in the Robbins-Monro algorithm. Because of the non-mathematical nature of mABMs, the gradient that is to be optimized is unknown; therefore not all stochastic approximation algorithms are capable of finding the optimum values. We need a gradient-free stochastic approximation algorithm, such as the Kiefer-Wolfowitz algorithm, which uses finite differences to estimate the gradient.

In this thesis an mABM will be implemented to simulate a population of predators and prey, to analyze its dynamics and to research through what parameter changes its state can be brought from a cyclic state to an equilibrium state. This information could be used in policy-making for any organization concerned with population control, such as a wildlife conservation program or nature reserve managers.


Chapter 2

Background

In this chapter the fundamental knowledge for this thesis will be briefly discussed to motivate the approach taken in the next chapter.

2.1

Agent-based models

ABMs come in many forms but always consist of agents interacting with each other in an environment in some well-defined way. These agents are defined by the following properties:

• The manner in which agents interact is defined by the modeler and expressed through a decision mechanism.

• Each individual agent is programmed to possess autonomous behavior: each agent is identifiable as a discrete individual with a set of properties such as attributes, behaviors and a decision-making capability. It is thus self-contained, in that it has recognizable boundaries and can be identified as an element of the whole model.

• Lastly, an agent is social: it interacts with other agents in a manner described by the previously mentioned decision mechanisms.

2.1.1

Agents from an AI perspective

In one of the most well-known books on AI, Russell and Norvig's Artificial Intelligence: A Modern Approach [12], agents are described as any entity that perceives its environment through sensors and acts upon this environment through effectors. All of these agents are implemented by a function that maps percepts to actions. This function defines the type of the agent. Agents can be categorized in


various ways into different types, as described for example in [12]. The type that will be used in our model is the simple reflex agent. A simple reflex agent acts upon condition-action rules, comparable to if-then statements in code. When the agent perceives something in its environment it acts upon it in the same way every time, e.g. when a predator sees prey it eats it. It does not have information about the whole world; it does not estimate the amount of prey that is left in order to decide whether it should wait until the prey has reproduced before the prey population goes extinct. It only acts upon the current percept, and therefore it has limitations and may not represent the real-world system accurately. However, because of their simplicity, simple reflex agents are easy to implement in a model. If a sufficient number of such agents are put into the model to interact on a 2D grid environment, interesting emergent phenomena can be observed.
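The condition-action behaviour described above can be sketched as a single function mapping percepts to actions. The percept and action names below are illustrative assumptions, not taken from the thesis implementation:

```python
# Minimal sketch of a simple reflex agent as a condition-action rule.
# Percept and action names ("prey", "eat", ...) are illustrative only.
def reflex_predator(percept: str) -> str:
    """Map the current percept directly to an action (if-then rules)."""
    if percept == "prey":
        return "eat"    # a predator that sees prey always eats it
    if percept == "empty":
        return "move"   # otherwise wander randomly
    return "wait"       # same-type neighbor: do nothing

print(reflex_predator("prey"))  # -> eat
```

The agent carries no state and no world model; the same percept always triggers the same action.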

2.1.2

Dynamics of (multi) agent-based models

As was briefly discussed in the introduction, the exact dynamics of mABMs are almost always unknown beforehand, and for each configuration of initial parameters the aggregated behaviour of the system needs to be assessed by simulation. In this aspect lies the beauty as well as the difficulty of such simulation-based models. As a result of the dynamics between the agents (and possibly their environment), even with relatively simple decision mechanisms, certain properties can manifest that were not foreseeable at the beginning. This is what makes mABMs an adequate tool for simulating complex systems. Given the possibly high complexity of interactions, the only way to investigate the dynamics of mABMs depending on internal parameters is by running Monte-Carlo simulations. This is one of the main differences from mathematically oriented approaches such as differential equations, where the solution can be obtained either by analytical or numerical methods. One such example is the Lotka-Volterra model, which is known to be solvable by numerical methods; information about its equilibrium points as well as about the cycle of growth and decline of the populations can be obtained using analytical methods.

From examples in the real world we know that realistic systems will usually exhibit equilibrium, periodic, or chaotic dynamics. A dynamic equilibrium means that changes in the system are countered by changes of equal force, so that there is no net change. An example of this in chemistry is the CO2 content of a closed bottle of soda. When a molecule of CO2 leaves the liquid for the gas, at the same time a molecule moves from the gas to the liquid, and because these motions occur at the same rate, the CO2 content in the liquid (and the gas) stays equal. The system is thus in equilibrium [2], even though it is constantly fluctuating around this equilibrium. An example of a periodic state could


be the populations that will be modeled in this thesis: a population of predators and prey. If there is a large number of predators relative to the prey, there will not be enough prey to feed all predators, and so the predators will slowly die off. As a result the prey population will grow, bounded only by the resources in the environment (logistic growth). If there is more prey than predators, the predators will have more food and will be able to reproduce more, until there are once again more predators than prey and the cycle repeats itself. The populations follow one another in this feedback loop, resulting in a cyclic motion. The last dynamic to mention is a chaotic system, where all behavior appears to be random, meaning it lacks regularity. Examples of this tend to be a bit more abstract, but one example could be the stock market at certain scales. The prices of stocks are considered hard to predict and appear to have a somewhat chaotic motion [5]. Chaos theory lies outside the scope of this thesis, but it is important to realize that this is a state that can, in principle, emerge as a result of interaction in mABMs.

2.2

Simulation-based Optimisation and Stochastic

Approximation

Optimization is the selection of the best solution among a set of feasible solutions. This can be a solution that either minimizes some cost or maximizes a reward in a given context, making optimization an important tool in decision making and in analyzing physical systems [7].

The simplest and most common form of an optimisation problem can be written as a minimum-seeking optimisation problem of a real-valued objective function

$$\min_{x \in \Theta} f(x) \quad (2.1)$$

where f is the objective function, x is the parameter vector and Θ is the feasible space.

In general, simulation-based optimisation methods are applied where the following two properties of the objective function are present:

• Unknown algebraic representation or unknown gradient: since the objective function measures the output of the mABM, we do not possess the algebraic representation of the objective function, because we do not have such a representation for the mABM.

• Stochasticity: the randomness of the mABM translates to stochasticity of the objective function.


2.2.1

Stochastic Approximation Algorithm

Stochastic approximation (SA) is a recursive algorithm introduced in 1951 by Robbins and Monro. It was initially developed to solve noisy root-finding problems and was later applied to the setting of stochastic optimization by solving for the zero of a gradient; this approach does, however, require information about the gradient, which in our case is not available. The gradient-free variation was developed one year later by Kiefer and Wolfowitz (1952) and is able to find local optima of a function. SA is an online method that requires very little memory and is currently one of the most widely applicable and most useful methods for simulation optimization. SA is predominantly used in applications where stochastic observations of the objective function are available, as is the case for the mABM in this thesis: the model gives variable results even if initialized with identical parameters.

SA seeks to optimize the problem where $f(x) = E[Y(x, \xi)]$ is the objective function, $Y(x, \xi)$ is the output random variable, $\xi$ represents randomness and $\Theta$ denotes the parameter space. We try to find a sequence of parameters $(x_i)_{i=0}^{n}$ that converges to a local optimum of the objective function. The algorithm does so by using the recursion

$$x_{n+1} = \Pi_\Theta\left(x_n - a_n \hat{\nabla} f(x_n)\right) \quad (2.2)$$

where $\Pi_\Theta$ is a projection of $x$ back into the feasible parameter space, and $a_n$ is a learning rate determining the intensity of the reaction to the gradient estimate $\hat{\nabla} f(x_n)$, scaling down the value of the gradient estimate as well. Because the problem in this thesis aims to minimize the cost of a function, the sign before $a_n$ is negative, but the stochastic approximation algorithm serves just as well for maximization if the sign is turned positive. There are two main methods to estimate $\hat{\nabla} f(x_n)$. The original Robbins-Monro algorithm requires an unbiased direct gradient estimate, whereas the Kiefer-Wolfowitz variation estimates the gradient using finite differences and therefore does not require knowledge of the gradient beforehand, making it a suitable choice for optimizing ABMs; it will thus be used in this thesis.

2.2.2

Kiefer-Wolfowitz Algorithm

The Kiefer-Wolfowitz (KW) stochastic approximation requires only a relatively small number of objective function evaluations, deeming it fit for use with ABMs. The recursion of the original KW is given as

$$x_{n+1} = x_n - a_n \frac{Y(x_n + c_n, \xi_n^+) - Y(x_n - c_n, \xi_n^-)}{2 c_n} \quad (2.3)$$


where $Y(\cdot, \cdot)$ is a sample value, $x_n$ is the given parameter, $c_n$ is the difference along the x-axis used to estimate the gradient, and $\xi$ is the stochastic component. Note that KW requires only values of $Y$ and does not require an estimate of the expected value, as SA in general is defined by the recursion (2.2). This dramatically reduces the number of evaluations we need to perform.

This iterative scheme converges to an optimal solution as $n \to \infty$ if the following assumptions, taken from the Handbook of Simulation Optimization [8], hold:

Assume $f(x) = E[Y(x, \xi)]$, and let $x^*$ be the $x$ value of the (local) optimum.

1. Let $a_n$ and $c_n$ be positive tuning sequences satisfying the conditions
$$c_n \to 0, \quad \sum_{n=1}^{\infty} a_n = \infty, \quad \sum_{n=1}^{\infty} a_n c_n < \infty, \quad \sum_{n=1}^{\infty} a_n^2 c_n^{-2} < \infty.$$

2. $f(x)$ is strictly decreasing for $x < x^*$ and strictly increasing for $x > x^*$.

3. $Var[Y(x, \xi)] < \infty$ and the following regularity conditions hold:

(a) There exist positive constants $\beta$ and $B$ such that
$$|x' - x^*| + |x'' - x^*| < \beta \;\Rightarrow\; |f(x') - f(x'')| < B|x' - x''|.$$

(b) There exist positive $\rho$ and $R$ such that
$$|x' - x''| < \rho \;\Rightarrow\; |f(x') - f(x'')| < R.$$

(c) For every $\delta > 0$ there exists a positive $\pi(\delta)$ such that
$$|x - x^*| > \delta \;\Rightarrow\; \inf_{\delta/2 > \varepsilon > 0} \frac{|f(x + \varepsilon) - f(x - \varepsilon)|}{\varepsilon} > \pi(\delta).$$

These conditions assure correct functioning of the KW algorithm. The first condition warrants that $a_n$ and $c_n$ both approach 0, but that $c_n$ converges slower than $a_n$, and that $a_n$ does not approach zero too quickly or too slowly, as this would result in the algorithm getting stuck at a poor estimate, as seen in figure 2.1. If $c_n$ does not converge correctly, similar outcomes are expected [3]. The second condition ensures that there is a local minimum to be found. The first and second regularity conditions require $f(x)$ to be locally Lipschitz, meaning there exists a double cone whose origin can be moved along the function $f(x)$ such that the function stays outside of the cone at all times. This ensures that there is a limit to how


(a) $a_n$ converges to zero too quickly. (b) $a_n$ converges to zero too slowly.

Figure 2.1: The algorithm will either get stuck on a poor estimate (a), or not approach the optimum (b), if $a_n$ does not converge at the right pace.

fast the function can change its slope. If this limit were not there, the KW algorithm would not be able to approach $x^*$. The last regularity condition restricts the function from being too flat when distant from $x^*$. These regularity conditions are important to take into account when constructing the objective function, to ensure convergence where possible. In our case these regularity conditions are too difficult to verify, but in principle they give good mathematical insight into the convergence properties of KW.

2.2.3

Noisy testing functions (1D and 2D)

First we test KW on a function $f(x)$ where $x$ is a one-dimensional parameter space. For that we use the scheme (2.3) on the function

$$f(x) = x^2 + \xi \quad (2.4)$$

in which $\xi$ denotes Gaussian noise, added to test the capability to optimize a noisy function. The performance can be seen in figure 2.2.
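As a rough illustration, the one-dimensional KW recursion (2.3) on this noisy quadratic can be sketched as follows. The tuning sequences $a_n = 1/n$ and $c_n = n^{-1/3}$ are an illustrative choice that satisfies the conditions of section 2.2.2, not necessarily the ones used in the thesis:

```python
import random

def Y(x):
    """Noisy sample of f(x) = x^2 + xi, with xi ~ N(0, 0.1^2)."""
    return x * x + random.gauss(0.0, 0.1)

def kw_1d(x0, n_iter=500):
    """Kiefer-Wolfowitz recursion with symmetric finite differences.
    a_n = 1/n and c_n = n^(-1/3) satisfy the standard tuning conditions
    (illustrative choice)."""
    x = x0
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n
        c_n = n ** (-1.0 / 3.0)
        # gradient estimate from two noisy samples only
        grad = (Y(x + c_n) - Y(x - c_n)) / (2.0 * c_n)
        x = x - a_n * grad
    return x

random.seed(0)
print(kw_1d(x0=3.0))  # converges toward the minimum at x = 0
```

Note that each iteration uses only two noisy evaluations of $Y$, never the expectation $f$ itself.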

KW can also be extended to a multi-dimensional parameter space, and after testing on the one-dimensional function we extend to two dimensions. This can be done using the symmetric difference gradient estimate:

$$\hat{\nabla} f_i(x_n) = \frac{Y(x_n + c_n e_i, \xi_{n,i}^+) - Y(x_n - c_n e_i, \xi_{n,i}^-)}{2 c_n} \quad (2.5)$$

For reasons formulated below, we use an alternative gradient estimation method


Figure 2.2: The optimization of a noisy one-dimensional function. The orange line indicates the path of the KW algorithm.

using the one-sided forward difference estimate:

$$\hat{\nabla} f_i(x_n) = \frac{Y(x_n + c_n e_i, \xi_{n,i}^+) - Y(x_n, \xi_{n,i})}{c_n} \quad (2.6)$$

In both estimation methods $e_i$ is a unit vector with the same dimension as the parameter space; both estimates work by going through the components of $x_n$ one at a time and adding (and, in the symmetric case, subtracting) $c_n$ to the component to calculate the difference over the given interval. The one-sided forward difference estimate offsets each perturbed evaluation against the base value $Y(x_n, \xi_{n,i})$, which only has to be calculated once per iteration. Therefore the one-sided forward difference estimate requires only $d + 1$ runs of the model per iteration, whereas the symmetric difference gradient estimate requires $2d$ runs, where $d$ is the dimensionality of the parameter space. As the dimension increases, the one-sided forward difference estimate becomes increasingly beneficial computation-wise [8]. With regard to time constraints and limited computational power, in this thesis the one-sided forward difference estimate is preferred over the symmetric difference estimate when estimating the gradient of the function.
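The $d + 1$ evaluations of the forward difference scheme (2.6) can be sketched as below. The noisy test function and the value of $c_n$ are illustrative assumptions:

```python
import random

def forward_diff_grad(Y, x, c_n):
    """One-sided forward-difference gradient estimate in the style of
    eq. (2.6): d + 1 noisy evaluations of Y instead of the 2d needed
    by symmetric differences."""
    d = len(x)
    y0 = Y(x)                # single base evaluation, reused for every component
    grad = []
    for i in range(d):
        x_plus = list(x)
        x_plus[i] += c_n     # perturb only component i by c_n
        grad.append((Y(x_plus) - y0) / c_n)
    return grad

def noisy_quadratic(x):
    """Illustrative noisy objective: sum of squares plus Gaussian noise."""
    return sum(v * v for v in x) + random.gauss(0.0, 0.01)

random.seed(1)
print(forward_diff_grad(noisy_quadratic, [1.0, 2.0], c_n=0.1))
```

For the quadratic above the estimate lands near the true gradient $(2, 4)$, up to the finite-difference bias of order $c_n$ and the noise amplified by $1/c_n$.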

The Kiefer-Wolfowitz algorithm was tested on a multi-dimensional case, for which the Rosenbrock function with added noise was used, defined by

$$f(x) = \sum_{i=1}^{n-1} \left[ 100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \right] + \xi \quad (2.7)$$

where $\xi$ denotes Gaussian noise, added to make the function noisy and test the capabilities of the stochastic approximation algorithm. For $n = 2$ dimensions the starting point was chosen at $x = (-2, 1)$, which resulted in a minimum of $y = 0.083$ found at $x_1 = 1.287$, $x_2 = 1.656$ after 200 iterations. The minimum possible for this function is 0, so the algorithm managed to find a fairly good result given the added noise. The path it took can be seen in figure 2.3. Note, however, that the reported minimum is itself a noisy value because of the additive Gaussian noise, but it is enough to demonstrate the convergence of the algorithm.

Figure 2.3: The black dots denote the path taken by the Kiefer-Wolfowitz algorithm in finding the minimum of the Rosenbrock function.


Chapter 3

Case Study: Predator-prey model

In this chapter the case study of using stochastic approximation on the agent-based model will be described. The agent-based model as well as the stochastic approximation algorithm were implemented in Python 3.

3.1

Predator-prey model

The agent-based model used for the optimization will be the predator-prey model, chosen for its simplicity and relatability. This mABM is based on the Lotka-Volterra equations, which aim to emulate the dynamics between a prey and a predator population. Instead of an equation specifying how these populations react to each other, a grid-based mABM will be implemented. The predator and prey agents will be put on a 2D grid represented by a matrix. On this grid the agents will 'randomly walk' and interact with one another. Most of the behavior of the agents will be based on a random number generator, hence the previously emphasised stochasticity of the model. To restrict the computational costs we limit the grid to 75x75 cells. A larger grid would mean there is more potential for the population to grow, but it would also require a larger initial population, as the population might die off because interactions are less likely to occur if the grid is too large relative to the population.

3.1.1

Intelligence of Agents

The agents in this mABM are simple reflex agents, meaning that they react to their environment in a direct manner; if the predator sees prey, it eats it. It


does not reason about whether it is hungry or whether it is worth the effort to hunt. The agents are autonomous, meaning they do not make decisions together with other agents. In this way they might differ from predators or prey showing herd behavior, such as wolves and sheep. The agents display randomness in their behavior by walking in a random direction at each time step and by acting upon an array of probabilities, explained later in this section.

In this way an agent-based model is always an incomplete representation of reality. However, this does not necessarily make it useless, as it might still display some of the dynamics that an actual wildlife population does.

The model in this thesis is largely based on a blog post by Aleksejus Kononovicius, in which he implemented the grid-based mABM in HTML5 [9]. At each time step all agents are selected in random order; for each agent, one of the eight neighboring cells is selected. Agents move under periodic boundary conditions, e.g. if an agent exits the grid through the left side it will reenter the grid at the right side. The agent is compared to the randomly selected cell, and the interaction between these cells is formulated by the following rules:

• If one cell is occupied by a predator and the other by prey, then the prey is eaten. After doing so, the predator gives birth to another predator with a certain probability. The new predator is placed in the former prey's cell.

• If both cells are occupied by the same type of agent, then nothing happens.

• If we have a prey cell and an empty cell, then the prey gives birth to another prey with a certain probability.

• If we have a predator cell and an empty cell, then the predator dies with a certain probability.

• If, after the application of the rules above, still nothing has changed and movement of the agent is possible, then the agent moves from its cell to the other, unoccupied cell.
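The probabilistic rules above could be sketched as a single dispatch function over a cell pair. The cell encodings and parameter names below are illustrative assumptions, and the movement rule is omitted for brevity:

```python
import random

# Sketch of the pairwise interaction rules between an agent's cell and a
# randomly chosen neighboring cell. Cell contents are encoded as strings;
# the exact implementation in the thesis may differ.
def interact(cell, neighbor, p_pred_birth, p_pred_death, p_prey_birth):
    """Return the new (cell, neighbor) contents after one interaction."""
    if cell == "predator" and neighbor == "prey":
        # prey is eaten; predator reproduces into the prey's cell w.p. p_pred_birth
        new = "predator" if random.random() < p_pred_birth else "empty"
        return "predator", new
    if cell == "prey" and neighbor == "empty":
        # prey reproduces into the empty cell with probability p_prey_birth
        new = "prey" if random.random() < p_prey_birth else "empty"
        return "prey", new
    if cell == "predator" and neighbor == "empty":
        # a predator that finds no prey dies with probability p_pred_death
        me = "empty" if random.random() < p_pred_death else "predator"
        return me, neighbor
    return cell, neighbor  # same-type pair: nothing happens

random.seed(2)
print(interact("predator", "prey", 1.0, 0.1, 0.5))  # -> ('predator', 'predator')
```

Running this for every agent, in random order, once per time step would give one sweep of the model.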

3.1.2

Parameters

From the rules the agents act upon, the following three probabilities can be derived:

• predator reproduction probability, the chance a predator reproduces if it eats a prey,

• predator dying probability, the chance a predator dies if it does not encounter prey,


• prey reproduction probability, the chance a prey reproduces if it is paired with an empty cell.

These probabilities will be the parameters of the model, each expressed by a float value. Since probabilities need to be in the $[0, 1]$ interval, the feasible region $\Theta$ is the unit cube: each component of $x = (x_1, x_2, x_3)$ can have a value between 0 and 1.
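With this feasible region, the projection $\Pi_\Theta$ from recursion (2.2) reduces to a componentwise clamp. A minimal sketch (not necessarily the thesis implementation):

```python
def project_unit_cube(x):
    """Projection Pi_Theta of a parameter vector back onto the feasible
    region Theta = [0, 1]^3: each probability is clamped to [0, 1]."""
    return [min(1.0, max(0.0, v)) for v in x]

print(project_unit_cube([1.2, -0.05, 0.4]))  # -> [1.0, 0.0, 0.4]
```

This keeps every SA iterate a valid probability vector even when a noisy gradient step overshoots the boundary.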

The model will also take certain fixed parameters. One, previously discussed, is the grid size, which is set to 75x75 to limit the computational cost of the model runs. As the grid becomes larger, the amount of activity possible in the grid increases quadratically, meaning the estimated computation cost does so as well. Another fixed parameter is the number of steps of the model, which causes a linear increase in computation cost; however, because the stochastic approximation algorithm needs to be run for multiple iterations, this can dramatically increase the computational cost of the simulation optimization. Because the population generally seems to be stable before 500 steps, this is the number of steps chosen.

3.2

Policy-making interpretation

For any wildlife conservation program it can be desirable to stabilize a population of animals in a nature reserve. This means having the least amount of variance possible in the predator as well as the prey population. To achieve this stabilization, the population dynamics will need to transition from one state to another, namely from a cyclic state to an equilibrium state. The stochastic approximation algorithm will adjust the parameters $x^i = (x^i_1, x^i_2, x^i_3)$ (where $i$ is an iteration of the algorithm) to find the parameters that satisfy this transition. One can imagine a sequence of decisions $(x^i)_{i=0}^{n}$ where each decision is made, for example, on a year-to-year basis. The initial value $x^0$ can be regarded as a parameter vector representing the current state of the population. Each subsequent $x^i$ for $i \geq 1$ can be considered a decision in a decision plan $(x^1, x^2, \ldots, x^n)$ for $n$ years ahead. This means that in order to successfully apply policy-making and create a decision plan for a given nature reservation one needs to:

1. Infer the starting point $x^0$ from data. For this, a method of simulated moments will be employed, and SA will be used to find the minimum of the error function.

2. Use SA to generate the decision plan $(x^1, x^2, \ldots, x^n)$.

This decision plan can be used by an organization concerned with the preservation of populations to adjust its policy-making by trying to influence these


parameters in the real world. Furthermore, the parameters related to the step size of the stochastic approximation algorithm can be interpreted as the rate of change the decision makers are willing to accept in a given period, because adjusting parameters "too quickly" might have a stressful impact on the lives of the animal population, resulting in types of behaviour not captured by the mABM.

The objective function should therefore have a value that is somehow representative of the state that is to be reached: an equilibrium state. In other words, it should represent how stable the population is. The population is expected to oscillate around a mean, as the population is likely to have a periodic movement: as the amount of prey increases, the food for predators increases, the number of predators increases and the prey decreases, causing the number of predators to decrease and the prey to increase again, which defines the periodic motion. What is to be minimized in this case is the amplitude of the periodic motion. To measure this, the maximum difference between an established mean and the populations is calculated. This maximum gives a representation of how stable the population is, and as it is minimized the population is stabilized.
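This amplitude measure could be sketched as the maximum absolute deviation of a population time series from its mean. The warm-up cutoff below is an illustrative assumption, not the value used in the thesis:

```python
# Sketch of the stabilization objective: the maximum absolute deviation of
# the (post-warm-up) population time series from its mean. Smaller values
# mean a more stable, less oscillatory population.
def amplitude_objective(population, warm_up=100):
    """Return the amplitude of oscillation around the series mean."""
    tail = population[warm_up:]          # ignore the initial transient
    mean = sum(tail) / len(tail)
    return max(abs(p - mean) for p in tail)

# A perfectly flat series after warm-up has amplitude 0.
print(amplitude_objective([0] * 100 + [50] * 400))  # prints 0.0
```

Minimizing this value with the KW algorithm then drives the population toward the equilibrium state.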

If the results of an agent-based model were actually used to change policy for wildlife management, it is important to note how these parameters could be influenced in the real world. In a computer system it is possible to change e.g. the predator reproduction probability by 0.001 and obtain a different outcome, but it is not possible to make these types of changes in the real world. One might be able to influence these probabilities, for example by feeding the animals so they have to hunt less and are able to mate more. However, one should be aware that such measures can only influence these factors to a certain extent: there is a limit on the available resources, and also a limit on how much influence the measures will have on the actual probabilities of reproduction or death. These probabilities are usually not linearly dependent on control variables we can change in the real-world system and are most likely non-linearly dependent. To actually determine these dependencies on exact probabilities is a whole other field of research, far outside the scope of this thesis, and therefore we work under the assumption that this practical aspect would be taken up by a different research initiative.

3.3

Objective functions

In this thesis, SA will be implemented to optimize an objective function that detects two desirable behaviors in the agent-based model. One is a stationary state, where we want the predator as well as the prey population to have the least amount of variance possible (while remaining non-zero). In the other case we will try to fit the population of the model to real-world data about the hare and lynx


population in Canada. What is returned is a sequence of parameters $x^i = (x^i_1, x^i_2, x^i_3)$ which describes the route from the initial point of the SA to a (locally) optimal calibrated point.

When the model is simulated, it runs for the given number of steps or until one of the populations dies off (to minimize computation cost, as we are not interested in what happens after the extinction of either population). When the model is finished it returns two time series, one for each population's count at each time step. Their length is thus either the time-step limit or the step at which one of the populations went extinct; in the latter case both series have the same length, as one population dying off stops the whole model.

3.3.1 Stabilization case

In the model, if the prey die off then the predators also die off, and the resulting populations will, technically speaking, have a low variance. If the predators go extinct first, the prey may multiply until every cell of the grid is occupied by prey. However, both of these cases are generally not desired goals of any wildlife conservation program, so if the model results in a dead population this should be penalized by the objective function. This is the only constraint the objective function will have. If the populations did not die off by the time the step limit of the model is reached, we assume the population survives and no penalty is given. If a population went extinct before the step limit was reached, a penalty is given that adapts to how long the population survived, since the last regularity condition of the Kiefer-Wolfowitz algorithm states that the function to be optimized (the objective function) may not be flat away from the optimal solution.

After the model is initiated it always takes some time for the populations to settle into a certain type of behavior. This is likely because all the agents are instantiated on random coordinates within the system, making a distribution of agents that results in a stable population less likely. As the populations start to form clusters and patterns, the model's dynamics become more likely to settle into one of the states discussed in section 2.1.1. Therefore the objective function ignores this 'warm-up phase' and only takes the succeeding part of the resulting time series into account when evaluating it.

We want to stabilize both populations, but because their population counts depend on one another, only one population needs to be stabilized to stabilize both. It would also be hard to estimate two population means at which the populations can be stable, as a certain ratio in population size might be needed for stable behavior to occur. There is no need for us to estimate this, as the algorithm will be able to take care of it. Therefore the objective function will only take the prey population into account; this choice is essentially arbitrary.

Objective function for the stabilization case

The objective function for the stationary case will thus be

    min_{x∈Θ} max(|Y(x, ξ) − µ|) · s/n    (3.1)

where Y(x, ξ) denotes the vectorized time series containing the prey population without the warm-up phase and µ is the mean around which the population should center. Because the mean should not differ between iterations, it is passed to the function as a constant; it is set to a value the population seemed to tend towards, chosen to be 3000 for the prey. Finally, n is the length of the time series and s is the step limit the model is initiated with, chosen as 500 as discussed earlier, so the factor s/n equals 1 for a full-length run and penalizes early extinction.
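As a minimal Python sketch of how Eq. 3.1 could be evaluated on a finished model run (the function name and warm-up length are illustrative assumptions, not the thesis code; the penalty factor s/n is taken from the equation above):

```python
import numpy as np

def stabilization_objective(prey_series, mu=3000.0, warmup=100, step_limit=500):
    """Evaluate the stabilization objective of Eq. 3.1 (illustrative sketch).

    prey_series : prey counts per step; shorter than step_limit if a
                  population went extinct and the model stopped early.
    mu          : target mean the prey population should settle around.
    warmup      : warm-up steps to ignore (hypothetical value).
    """
    n = len(prey_series)  # length of the full time series
    y = np.asarray(prey_series[warmup:], dtype=float)
    # Maximum absolute deviation from the target mean after the warm-up.
    deviation = np.max(np.abs(y - mu))
    # Adaptive penalty: s/n equals 1 for a full-length run and grows the
    # earlier an extinction cut the simulation short.
    return deviation * (step_limit / n)
```

A run that survives all 500 steps is judged only on its deviation from µ; a run that dies at step 250 has its deviation doubled.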

3.3.2 Calibration case

If we want a model to be of use in a real-world scenario, for example to research what is needed for a population to remain at stable levels as explained in the case above, it is first necessary to ensure the model somewhat represents a real scenario. To do so one can analyze an existing population of predator and prey to see in what dynamic they co-exist, and try to replicate these dynamics in the model. Say we want to analyze what happens if we somehow increase the prey reproduction chance in an existing hare and lynx population in Canada. Real-world data is analyzed and plotted in figure 3.1. The simulation's parameters can then be optimized to fit this data, and these parameters could serve as a starting point for other research into this specific population; e.g. one could optimize the simulation as in the previous case, steering the parameters towards a more stationary population.

To replicate these predator-prey dynamics, the dynamics of the data first need to be captured. This is done by calculating the moments of the data: quantitative measures related to the shape of the graph of the data. The n-th moment of a function f(x) around a value c is

    µ_n = ∫_{−∞}^{+∞} (x − c)^n f(x) dx    (3.2)

where c in this thesis will be 0. The first, second and third moments will be used, which represent the mean, variance and skewness respectively. The moments will be estimated using the Python package SciPy.

Figure 3.1: Data from Hudson's Bay Company of Canada.
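The three moments of a population time series can be estimated with NumPy and SciPy along these lines (the function name is illustrative; the thesis code may organize this differently):

```python
import numpy as np
from scipy import stats

def estimate_moments(series):
    """Estimate the mean, variance and skewness of a time series,
    returned in that order as a moment vector."""
    x = np.asarray(series, dtype=float)
    # scipy.stats.skew computes the (biased) sample skewness.
    return np.array([np.mean(x), np.var(x), stats.skew(x)])
```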

The goal will be to minimize the difference between the moments of the data and the moments of the model, given parameters Θ. For this the Simulated Method of Moments (SMM) estimation [10] is utilized, which seeks to minimize some distance measure between the data moments m(x) and the simulated model moments m(x|Θ):

    min_Θ ||m(x|Θ) − m(x)||    (3.3)

Here m(x) are the moments of the data and m(x|Θ) the moments of the simulated data given parameters Θ. As in the stabilization case, only one of the populations will be analyzed; this time the predators are chosen to fit the model to. The minimization will be performed by the stochastic approximation algorithm. The distance measure can be any kind of norm of the moment vectors, but the resulting estimate Θ̂_SMM will depend on the chosen norm. To put all the moments in the same units, the following error function is chosen:

    e(x, x|Θ) = (m(x|Θ) − m(x)) / m(x)    (3.4)

where the division is element-wise, so the resulting error vector contains the percent differences between the real-world moments and the simulated model moments. To turn this into a number that judges the fit of the data, the objective function is defined as follows:

    min_Θ e(x, x|Θ)ᵀ W e(x, x|Θ)    (3.5)

where W is a weighting matrix; with W the identity, this effectively takes the dot product of the error vector with itself. For future research it might be interesting to see how a different weighting matrix affects the convergence rate, but this lies outside the scope of this thesis as there is no general way to determine this matrix.
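A sketch of Eqs. 3.4 and 3.5 in Python (the function name and the identity default for W are assumptions made here for illustration):

```python
import numpy as np

def smm_objective(sim_moments, data_moments, W=None):
    """Weighted SMM criterion e^T W e of Eq. 3.5 (illustrative sketch)."""
    sim = np.asarray(sim_moments, dtype=float)
    data = np.asarray(data_moments, dtype=float)
    # Percent differences (Eq. 3.4) put all moments in the same units.
    e = (sim - data) / data
    if W is None:
        W = np.eye(len(e))  # identity weighting: reduces to the dot product e.e
    return float(e @ W @ e)
```

With the identity weighting, perfectly matched moments give a criterion of 0, and each moment contributes its squared percent difference.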

3.4 Implementation of the KW-algorithm

As discussed, the parameter space consists of three parameters, so the optimization scheme is as described in equation 2.1 with x = (x1, x2, x3) and Θ = (0, 1)³, meaning there are three parameters all ranging from 0 to 1. The scheme the stochastic algorithm uses is as described in equation 2.2, where x_{n+1} denotes the next three-dimensional parameter point in the iteration and Π_Θ denotes the projection back into the feasible region. When one of the parameters exceeds the feasible region by taking a value either below zero or above one, all parameters are set back to the starting point. The step size was tweaked to a_n = 1/(1000n) for the stabilization case and a_n = 1/n for the calibration case. This difference is due to the difference in objective functions: in the stabilization case the objective is based on the population count, so the error value ranges between 1000 and 10000, while in the calibration case a percentage difference is measured, so it ranges from 0 to 1 (although later multiplied by a penalty for breaching constraints). If the objective function does not converge we conclude that, from this starting point, converging with this model and algorithm is not possible.

To estimate the gradient ∇̂ we use the Kiefer-Wolfowitz algorithm as referenced in equation 2.6. For the finite difference c_n = 1/n was used, which seemed to yield the best results as it is somewhat in scale with the parameter space.
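One iteration of the scheme described above could be sketched as follows, assuming a (noisy) `objective` callable that wraps a model run; the function name is illustrative, and the reset-to-start projection mirrors the description above:

```python
import numpy as np

def kw_step(x, n, objective, start, a_scale=1000.0):
    """One Kiefer-Wolfowitz iteration for the 3-parameter model (sketch).

    x         : current point in Theta = (0, 1)^3.
    n         : iteration number (1-based); step size a_n = 1/(a_scale*n)
                and finite-difference width c_n = 1/n.
    objective : (noisy) objective function of the parameter vector.
    start     : starting point; all parameters are reset to it when
                the feasible region is breached.
    """
    x = np.asarray(x, dtype=float)
    a_n = 1.0 / (a_scale * n)
    c_n = 1.0 / n
    grad = np.zeros_like(x)
    # Central finite differences, one coordinate at a time.
    for i in range(len(x)):
        e_i = np.zeros_like(x)
        e_i[i] = c_n
        grad[i] = (objective(x + e_i) - objective(x - e_i)) / (2.0 * c_n)
    x_next = x - a_n * grad
    # Projection: reset all parameters to the start on leaving (0, 1)^3.
    if np.any(x_next <= 0.0) or np.any(x_next >= 1.0):
        return np.asarray(start, dtype=float)
    return x_next
```

Note that for a noisy objective, each call to `objective` would internally run the agent-based model, so one iteration costs six model runs here (two per coordinate).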

3.5 Results

In this section the results of the stochastic approximation algorithm are shown: plots of the error to display convergence, and plots of the progression of the parameter values.

3.5.1 Stationary case

With starting point x = (0.2, 0.75, 0.77), estimated manually by analyzing the dynamics of the model, the objective function seems to converge and somewhat stabilize after 20 iterations, as seen in figure 3.2. The spikes in the error function are due to the stochastic effects of the mABMs. The progress the error function made was: f(0.2, 0.75, 0.77) = 2484 → f(0.232, 0.791, 0.579) = 752 after n = 20 iterations.

Figure 3.2: (a) Error of the stochastic approximation, with iterations on the x-axis and error on the y-axis. (b) The progress of the parameters during the algorithm's run.

Figure 3.3: The populations at various iterations: (a) n = 1, (b) n = 10, (c) n = 20.

At n = 1 the population still shows cyclic motion and is far from the desired population mean of 3000 for the prey (denoted by the grey line). At n = 10 the variation seems to have decreased a bit, and the mean is moving towards the selected mean. At n = 20 the population seems to have converged towards the grey line and the variance has decreased. In this case of manually chosen starting points, the populations seem to converge towards stabilization, as can be seen in figure 3.3, which depicts the population dynamics at various iterations.

3.5.2 Calibration case

As the calibration case required more than just a stabilization of the populations, it was harder to manually estimate decent starting points. Therefore, a 3D grid was used to initialize the model with different combinations of starting values. The values 0.25, 0.50 and 0.75 were taken, as they are evenly spaced through the parameter space, and all possible 3-tuples of these values were formed, leaving us with 3³ = 27 sets of starting points. Most of these were not able to converge: error values varied per iteration without approaching a point. However, three starting points seemed to give somewhat decent results. From these starting points the simulation was run with n = 20 iterations; the three were later run with more iterations (n = 50) to test whether the error would decrease further, but this was not the case. The error functions are visible in figure 3.4 and the progression of parameters through the simulations in figure 3.5.
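The grid of starting points described above can be generated with `itertools.product` (a sketch; the variable names are illustrative):

```python
from itertools import product

# Every 3-tuple drawn from the three evenly spaced values gives the
# 3^3 = 27 candidate starting points covering (0, 1)^3.
values = (0.25, 0.50, 0.75)
starting_points = list(product(values, repeat=3))
```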

(a) Starting point x = (0.25, 0.25, 0.75)

(b) Starting point x = (0.5, 0.25, 0.75)

(c) Starting point x = (0.75, 0.5, 0.75)

Figure 3.4: The objective functions from various starting points with the iterations along the x-axis.

(a) Starting point x = (0.25, 0.25, 0.75)

(b) Starting point x = (0.5, 0.25, 0.75)

(c) Starting point x = (0.75, 0.5, 0.75)

Figure 3.5: The progression of parameters from various starting points with the iterations along the x-axis.

Note that in figure 3.4 the subfigures do not have the same scale along the y-axis. Here x1 denotes the predator dying probability, x2 the predator reproduction probability and x3 the prey reproduction probability. The end values of the parameters were:

1. f(0.25, 0.25, 0.75) = 2.384 → f(0.158, 0.261, 0.779) = 2.282 (after n = 20 iterations)

3. f(0.75, 0.5, 0.75) = 2.467 → f(0.708, 0.580, 0.784) = 2.367 (after n = 20 iterations)

Of these three cases which did seem to converge, the first found the best optimum, although we are still unable to claim that this is the global optimum. From this data it can be concluded that a high prey reproduction probability is necessary for the populations to stay alive. This is likely because the predators also feed on the prey, so larger numbers of prey can sustain larger numbers of predators as well.

Figure 3.6: (a) Modeled data; (b) real-world data. The population produced by parameters (0.158, 0.261, 0.779), which resulted in an error of 2.282, the best result among all tried starting points.

Although the calibration cases did seem to converge, the populations produced by the best runs are still not very representative of the real-world data, as can be seen in figure 3.6. This could be because the global optimum was still not found, but what likely plays a bigger part is that the agent-based model produced in this thesis is not capable of reproducing the complexity of the dynamics found in the real world. As discussed, the agents show simple behavior; for a more realistic view, complexity would need to be added to their behavior. Various other factors, such as seasonal behavior of the agents, are also not taken into account: in reality reproduction happens mostly during spring, a large difference in behavior that our model does not capture. Besides that, we do not know whether any external factors influenced the populations in the real-world data, which leads to dynamics that cannot be reproduced with a model as simple as this one.


Chapter 4

Conclusion

In this thesis a mABM was implemented to simulate a population of predator and prey, in order to research to what extent a stochastic approximation algorithm can be used to analyze its dynamics, and through which parameter changes the model's state can be brought from a cyclic state to an equilibrium state.

The Kiefer-Wolfowitz algorithm was able to transition the model from a cyclic to an equilibrium state by adjusting the parameters. The error that resulted from the starting points showed stability around the chosen mean, and variation decreased significantly as the parameters progressed from the values the model was initialized with. Because this objective was relatively simple, it yielded a better result than the calibration case, likely because the model is not capable of simulating the complicated array of dynamics needed for more precise calibration.

In the calibration case the Kiefer-Wolfowitz algorithm does seem to converge, but it is hard to say whether this was towards a global optimum. Tweaking a_n might be able to fix this. However, the model that was implemented might not be sophisticated enough for the error of the objective function to get any lower.

Correct calibration of a_n and c_n are important factors in the success of SA as well; these parameters need to be adjusted for different objective functions. The agent-based model seems to have a limit to the extent to which it can satisfy an objective function. Especially when the agent-based model has large stochastic components it is hard to converge, as each run of the model with identical parameters can produce different results. For future work this could be addressed by averaging over a large number of simulations; for this thesis, however, the computational power and time necessary for such an undertaking were not available.

From this we can conclude that the Kiefer-Wolfowitz algorithm, as a gradient-free stochastic approximation algorithm, is capable of finding local optima when optimizing agent-based models, but the limitations of the model to be optimized should be taken into account when applying the results for policy-making.


Bibliography

[1] David Adam. "Modeling the pandemic - The simulations driving the world's response to COVID-19". In: Nature 580 (2020), pp. 316–318.

[2] P. W. Atkins. Atkins' Physical Chemistry. Oxford; New York: Oxford University Press, 2006. isbn: 0-19-870072-5.

[3] S. Bhatnagar, H. Prasad, and L. Prashanth. "Kiefer-Wolfowitz algorithm". In: Lecture Notes in Control and Information Sciences 434 (2013), pp. 31–39. issn: 01708643. doi: 10.1007/978-1-4471-4285-0_4.

[4] Paul Caplat, Madhur Anand, and Chris Bauch. "Symmetric competition causes population oscillations in an individual-based model of forest dynamics". In: Ecological Modelling 211.3-4 (2008), pp. 491–500. issn: 03043800. doi: 10.1016/j.ecolmodel.2007.10.002.

[5] L. Chen. "The chaotic dynamics analysis of stock market". In: 2013 6th International Congress on Image and Signal Processing (CISP). Vol. 03. 2013, pp. 1382–1386. doi: 10.1109/CISP.2013.6743889.

[6] Richard M. Crowder et al. "The development of an agent-based modeling framework for simulating engineering team work". In: IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans 42.6 (2012), pp. 1425–1439. issn: 10834427. doi: 10.1109/TSMCA.2012.2199304.

[7] George B. Dantzig and Gerd Infanger. "Multi-stage stochastic linear programs for portfolio optimization". In: Annals of Operations Research 45.1 (1993), pp. 59–76. issn: 02545330. doi: 10.1007/BF02282041.

[8] Michael C. Fu. Handbook of Simulation Optimization. Vol. 216. 2015. isbn: 9781493913831.

[9] Aleksejus Kononovicius. Agent-based prey-predator model. Oct. 2012. url: https://rf.mokslasplius.lt/agent-based-prey-predator-model/.

[10] Daniel McFadden. "A Method of Simulated Moments for Estimation of Discrete Response Models Without Numerical Integration". In: Econometrica 57.5 (1989), pp. 995–1026.

[11] William Rand and Roland T. Rust. "Agent-based modeling in marketing: Guidelines for rigor". In: International Journal of Research in Marketing 28.3 (2011), pp. 181–193. issn: 01678116. doi: 10.1016/j.ijresmar.2011.04.002.

[12] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. 3rd ed. USA: Prentice Hall Press, 2009. isbn: 0136042597.

[13] Hokky Situngkir. "Epidemiology Through Cellular Automata: Case of Study Avian Influenza in Indonesia". In: (2004). arXiv: nlin/0403035. url: http://arxiv.org/abs/nlin/0403035.

[14] Matias Vernengo, Esteban Perez Caldentey, and Barkley J. Rosser Jr., eds. The New Palgrave Dictionary of Economics. Palgrave Macmillan UK, 2020. doi: 10.1057/978-1-349-95121-5.

[15] E. Wirth, Gy. Szabó, and A. Czinkóczky. "Measure of Landscape Heterogeneity By Agent-Based Methodology". In: ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III-8 (July 2016), pp. 145–151. doi: 10.5194/isprsannals-iii-8-145-2016.
