Faculty of Economics and Business

Amsterdam School of Economics

Genetic algorithm learning in a spatial Cournot oligopoly

Learning, stability, and equilibrium selection

Leslie Dao (10561234)

Supervisor: Dávid Kopányi

Second reader: Marco van der Leij

Abstract

In situations where complete information is unavailable, learning about the setting has a natural role. Genetic algorithms provide an intuitive way to learn by using old solutions to build better ones. This paper investigates genetic algorithms in a circular market with quantity competition. It aims to examine the consequences of applying genetic algorithm learning in this setting: to see if the equilibria can be learned at all and, if so, which ones are stable and whether there is a natural preference for one equilibrium or the other. Multiple simulation scenarios are run to check the impact of the different model parameters. Firms appear to be able to learn the equilibria, but the number of firms which are active in the market seems to heavily influence the outcomes of the simulations. Additionally, only one of the examined equilibrium types appears to be stable.

A thesis presented for the degree of

MSc in Econometrics, track Mathematical Economics

July 8th, 2018


Statement of Originality

This document is written by Student Leslie Dao who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Table of Contents

1 Introduction
2 Literature review
   2.1 Bounded rationality and its importance
   2.2 Reinforcement learning
   2.3 Genetic algorithm learning
   2.4 The circular market model and its equilibria
3 Equilibrium learning with genetic algorithms
   3.1 Genetic algorithm modules
      3.1.1 Crossover
      3.1.2 Mutation
      3.1.3 Social and individual learning
      3.1.4 Hypothetical and realized fitness
      3.1.5 Election
   3.2 Baseline model
   3.3 Extensions
4 Results
   4.1 Simulation outcomes
   4.2 Equilibrium stability
   4.3 The effects of crossover
5 Conclusion
6 Discussion and recommendations
References
A Pseudocode

1 Introduction

Economic models with agent interaction often impose a rather strict rationality assumption: Agents are aware of the setting, themselves, and the other agents and act rationally to optimize their utility. Becker (2013) states that all humans behave as utility maximizers. To achieve this, they have a stable set of preferences and gather information from multiple markets. Consequently, the agents’ behavior becomes fully predictable. This makes the assumption very powerful, but entirely reliant on rational behavior and knowledge of all agents.

The rationality assumption is often violated in practice. It has requirements which may not always hold. For instance, an agent must have well-defined preferences to be able to maximize his utility. Next, the agent must have complete knowledge of the problem: He must know his possible decisions and their consequences, the environment he is in, and potential interactions with other agents. Finally, the agent must know how to optimize his utility given this knowledge, meaning that he must be capable of calculating the optimal strategy in a reasonable amount of time.

There have been many articles about ways to relax the rationality assumption; this is what is often called bounded rationality. Unbounded rationality can be modelled in essentially one way and is therefore straightforward to implement. Bounded rationality, on the other hand, can be modelled in nearly infinitely many ways, which has led to many discussions about which parts of the rationality assumption should be relaxed (e.g. Munier et al., 1999). This is known by many as the wilderness of bounded rationality (see e.g. Hommes, 2011).

One example of a boundedly rational implementation is imitation (Vega-Redondo, 1997). When agents cannot explicitly calculate their best strategy, they can copy the best observed strategy instead. This makes sense from the point of view of the agent, as the strategy has proven itself to be successful already in the past. This particular model without memory leads to a Walrasian equilibrium, which is not desirable for the firms in Vega-Redondo’s model.

Another example is letting the agents 'learn on the job'. Learning is a valuable tool in unknown settings and encourages exploration and adaptation. For instance, in repeated games one might use the outcome of one round to adapt his strategy for the next round. Then after the following round, more information is available, allowing more informed decisions. This can be modelled with reinforcement learning. Similarly to how people train their pets, good strategies (behaviour) are rewarded and bad strategies are assigned a punishment. By reinforcing the use of good strategies, bad ones are crowded out. As with imitation, it is important to add an exploration element to prevent reinforcing sub-optimal strategies to the point where the actual optimum will never be reached. A basic solution is to use ε-greedy, which assigns probability ε to exploring new strategies and 1 − ε to exploiting the best performing strategy thus far. However, other methods prove to be better than ε-greedy (see e.g. Tokic and Palm, 2011; Li et al., 2010).
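As an illustration, the following is a minimal sketch of ε-greedy selection in Python; the strategy pool and the fitness table are hypothetical placeholders, not part of this thesis's code.

```python
import random

def epsilon_greedy(strategies, fitness, eps=0.1):
    """Pick a strategy: explore with probability eps, exploit otherwise.

    strategies -- list of candidate strategies (e.g. binary strings)
    fitness    -- dict mapping each strategy to its estimated payoff
    """
    if random.random() < eps:
        # Explore: try a uniformly random strategy.
        return random.choice(strategies)
    # Exploit: play the best-performing strategy so far.
    return max(strategies, key=lambda s: fitness[s])
```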

One specific example discussed in this paper is the genetic algorithm. This involves agents creating new strategies from old ones in an attempt to improve their performance and payoffs. One big advantage of genetic algorithms is that they are modular: Tweaking individual components of the algorithm is easy, since the components are independent of each other. This has the added effect of making genetic algorithms robust to dynamic changes (Sivanandam and Deepa, 2007). They can also provide solutions to problems for which no optimal or analytical solution exists, or for which the solutions take a long time to calculate. The fact that genetic algorithms are applicable to many fields and problems, ranging from discrete to continuous, linear to non-linear, and constrained to unconstrained, makes them interesting to investigate further.

This paper examines the performance of learning using genetic algorithms in the setting of a market with location choice. Because it is difficult to form an optimal solution in an unknown market with agent interaction and with choice in more than one dimension, learning algorithms are an attractive option for approximating the optimal solution instead. Markets with location choice have the interesting mechanism of having to transport the commodity to the consumer, adding another consideration when determining prices. Perhaps one of the most well-known spatial market models is the market on the line by Hotelling (1929). An extension of this is the circular market (Salop, 1979), in which agents compete in prices. This research focusses on the circular market with quantity competition. The circular market has multiple equilibria, and modifications to the model can lead to different outcomes. We focus on the following questions in this research: Can boundedly rational agents learn where to locate on the circle using the genetic algorithm in this market? Can they learn equilibrium configurations, and are all equilibria learnable? Agents keep track of their own performance and regularly update their choices using genetic algorithms. Strategies which do not look promising are discarded. The impact of crossover methods and of initial locations near equilibrium configurations is also investigated.

Firms can learn the equilibria of the model if they are allowed to explore the market. There are strong indications that not all equilibria are stable. The number of firms on the market seems to influence the learning rate, the equilibrium basin of attraction, or both. The firms' key performance indicator, profit, also differs per equilibrium type, which could play a role in their behaviour. The crossover method used in the genetic algorithm is also varied, but does not change the results apart from a few outliers.

This paper is structured as follows. Section 2 summarizes the theoretical background for this paper with the models and results of other articles. In Section 3, the model as it is used in this research is described. It involves a baseline and small extensions to the model to test the robustness. Section 4 shows the raw results of multiple runs of the model. Section 5 interprets the results and Section 6 closes with a discussion of the model and recommendations for further research.

2 Literature review

2.1 Bounded rationality and its importance

Rationality requires agents to exhaust the entire strategy space to find the optimum. This rarely happens in reality. For example in chess, one can opt to search the entire space for each move he has to make. However, considering that the space can have up to 10^120 elements, this can be very time consuming or even infeasible. Therefore people will often use rules of thumb (Conlisk, 1996) or satisficing (Simon, 1972). Satisficing is a blend of satisfy and suffice: Satisficing agents will optimize their objective function up until a certain point, at which the solution is at least satisfactory to them. This is an attractive method of optimizing when the problem is complex and having the best solution is desirable, but not necessary. Satisficing can be more time and resource efficient than optimizing when computing solutions.

Kahneman (2003) explains that we use two systems when it comes to decision making. We use System 1 for fast tasks, or things we are already used to. This also makes System 1 difficult to train or control and prone to errors. System 2 is more deliberate and precise, but also slower than System 1 and requires more effort. We use System 2 for tasks which require more time and attention. The amount of effort a task requires determines if it gets assigned to System 1 or 2. In reality, people are not very diligent when assigning tasks and will opt to settle for good enough, similar to satisficing agents.

Realistically, people are not completely rational. This opens the way for bounded rationality, which is applicable in more situations (see e.g. Arthur, 1994; Ellison, 2006). By relaxing certain rationality assumptions, we can investigate if the results under the rationality assumptions can be replicated. The results are interesting in either case: If they still hold, we know the results can be achieved independently of the assumption. If they no longer hold, we know that the results are not robust with respect to the assumption.

2.2 Reinforcement learning

When agents are not able to simply calculate the optimal strategies, they must employ other ways to figure them out. In sufficiently stable environments, it is possible for agents to learn the optima (Grüne-Yanoff, 2007). There are multiple ways an agent can learn, and although an agent can also learn about other aspects of its environment, learning the optimal strategies is the main focus here.

One very intuitive way of learning is to do it on the job. Reinforcement learning (Erev and Roth, 1998) is a way to teach an agent how to interact with its environment by giving it feedback according to the actions it takes. The agent gets positive feedback (a reward) for performing good actions, while it gets negative feedback (a punishment) for harmful or bad actions. By reinforcing good actions, the agent will use these more often. An important aspect of reinforcement learning is knowing how to balance exploration and exploitation, i.e. when is it fruitful to look at new actions and when should the best known action be played? This is a topic which has been investigated widely (see e.g. Audibert, Munos, and Szepesvári, 2009) and for which there exists no clear-cut answer. A widely-used way to balance exploring and exploiting is the so-called upper confidence bound (UCB) algorithm (Carpentier et al., 2011), but researchers are still finding ways to improve on this algorithm as well.

Reinforcement learning is used in many areas of research, one of which is machine learning. It provides a simple way to train a model given a set of strategies when it is not known which one is best. With sufficient feedback, the model can learn to apply the best strategy. However, reinforcement learning requires some human intervention, as it uses many exogenously set tuning parameters and assumes distributions on the rewards, which are used to determine the best strategy. Additionally, reinforcement learning works best in static environments, as in dynamic environments even small learning spaces are a big challenge for reinforcement learning algorithms (Matarić, 1997).

2.3 Genetic algorithm learning

Another way to learn optima is with genetic algorithms (Haupt and Haupt, 1998; Arifovic, 1991). The genetic algorithm (GA) is an iterative algorithm which draws its concept from biology. Ideas such as generations and survival of the fittest are part of the core of this algorithm. Unlike constructive algorithms, which build their solutions step by step, iterative algorithms change their solutions by a little bit each iteration to try and improve their performance. Compared to reinforcement learning, genetic algorithm learning does not require as much human input and as many assumptions.

GAs require a population of solutions (of size K) to try and make improved solutions. In general, a solution is a binary string of fixed length l composed of 0’s and 1’s. Each solution corresponds to a unique value. The starting population is usually randomly generated while making sure they are feasible. The most basic GA performs three operators on the population each iteration to generate new solutions: selection, crossover and mutation. These operators change the population according to each solution’s so-called fitness value, which is an indicator of the solution’s performance. This closely resembles survival of the fittest in evolutionary biology, causing only the best performing individuals in a population to survive. In a way, the GA is only a framework because there are many different ways to implement the operators.
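To make this framework concrete, here is a minimal sketch of one GA iteration in Python, assuming elitism sampling, bitwise crossover, and single-bit mutation; it is an illustrative sketch under those assumptions, not the implementation used in this thesis.

```python
import random

def ga_generation(population, fitness, elite_frac=0.2, p_mut=0.05):
    """One generation of a basic GA on binary-string solutions.

    population -- list of solutions, each a list of 0/1 bits
    fitness    -- function mapping a solution to a number
    """
    k = len(population)
    # Selection: keep only the fittest solutions (elitism sampling).
    elite = sorted(population, key=fitness, reverse=True)[:max(2, int(elite_frac * k))]
    children = []
    while len(children) < k:
        mother, father = random.sample(elite, 2)
        # Crossover: bitwise, each bit from the mother with probability 1/2.
        child = [m if random.random() < 0.5 else f for m, f in zip(mother, father)]
        # Mutation: flip one uniformly chosen bit with probability p_mut.
        if random.random() < p_mut:
            i = random.randrange(len(child))
            child[i] = 1 - child[i]
        children.append(child)
    return children
```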

Selection is the process of preparing the solutions for crossover, which creates the new solutions. In general, crossover takes two parent solutions, namely the ‘father’ and the ‘mother’ solution. The process of selecting these parents can be done in multiple ways. One particular example, which is also used in this research, is called elitism sampling. This only allows the top x per cent of the population in terms of fitness to be selected for crossover.

When two parents have been selected, a new child solution is created by combining parts of the parent solutions in the crossover process according to a certain rule. Again, there are multiple ways to implement this. This paper aims to investigate the differences between them; they are discussed more extensively in Section 3.1.1. To prevent the population from becoming homogeneous, the mutation operator is performed on every solution created by crossover. This operator alters the resulting solution by a small amount, which can be interpreted as a defect or an experiment. This process continues until the size of the new population equals the size of the original population. After applying these operators, all solutions in the population still need to be feasible, if applicable. If any of them are not, they can be modified in such a way that they are, or they can be deleted and more child solutions can be created. For example, in the travelling salesman problem (Grefenstette et al., 1985), the salesman is supposed to visit all nodes once and only once. Hence routes in which a certain node appears twice because of crossover are invalid.

In the context of learning, genetic algorithms provide an easy way to generate solutions when the problem is complex, such as in environments with agent interaction. By feeding the algorithm a pool of 'example solutions', i.e. initial guesses, it can improve on them using the operators mentioned above. The genetic algorithm represents combining parts of strategies which worked well in the past, in order to make even better strategies. The reason this algorithm works is often attributed to the building block hypothesis (Forrest and Mitchell, 1993); however, the validity of the hypothesis is not yet recognized throughout the entire GA research community.

One application of genetic algorithm learning is in Arifovic (1994), in which the cobweb model is used. The results of GA learning are compared to some other learning methods: Cobweb expectations (Ezekiel, 1938), sample average of past prices (Carlson, 1968), and least squares learning (Bray and Savin, 1986). In the cobweb model, GA learning converges to the rational expectations equilibrium (REE) for a wider range of parameters than the other learning methods. GA learning also exhibited features of some experimental economies, which the other learning methods failed to capture, such as price fluctuations. This, combined with the fact that the GA framework lets the agents learn how to maximize the objective function, makes GA learning an interesting method to investigate.

2.4 The circular market model and its equilibria

The setting of this research is similar to the circular market described in Salop (1979) but uses quantity competition instead of price competition. In this model, customers and n firms are distributed on the circumference of a circle. In addition to production choice, firms must also choose where to locate on the circle. To deliver the product to the customers, firms must pay transportation costs which they fully pass down to the customer.

This can be set up as a two-stage game, which is what Gupta et al. (2004) investigated. The firms choose where to locate first, and in the second stage they compete in quantities after observing the locations of all other firms. Otherwise, all firms are identical: They sell a homogeneous product, and have the same production and transportation technology. Gupta et al. (2004) assume there is no interaction between consumers, i.e. customers can only buy the product from the firms, as resale is not allowed.

The circumference of the circle is normalized at 1. Every point x on the circle has its own inverse demand function, depending on the quantity supplied at that point and some constants which are independent of the location. The behaviour of the consumers at location x is characterized by a demand curve of the form p(x) = a − bQ(x), where p(x) is the market price at x, a, b > 0 are constants and Q(x) is the aggregate supply at x. To make sure that all firms serve the entire market, a is restricted: a ≥ n/2. Firms can produce their good at constant marginal and average cost of 0, but have to pay a linear cost ci(x) = t · dist(ξi, x) to transport their product from ξi to the consumer at location x. Parameter t is set to 1 without loss of generality. The distance dist(ξi, x) = min{|ξi − x|, 1 − |ξi − x|} is the arc length between the two locations, because transportation may only happen along the perimeter of the circle. Using all market information, the profit of firm i located at ξi on the market at point x can be calculated. Gupta et al. (2004) show that the equilibrium profit of a single firm i from serving at location x is equal to

\[
\pi_i\left(x, \xi_i, \xi_{-i}\right) = \frac{\left(a - n\,c_i(x) + \sum_{j \neq i} c_j(x)\right)^2}{(n+1)^2\, b},
\]

where ξ−i is a vector of the locations of all firms excluding firm i, and ci(x) is the marginal cost associated with transporting from the location of firm i to x. The total profit of firm i on this market can be obtained by integrating the above expression over all locations:

\[
\Pi_i\left(\xi_i, \xi_{-i}\right) = \int_0^1 \pi_i\left(x, \xi_i, \xi_{-i}\right) dx.
\]
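As an illustration, these profits could be computed numerically as follows in Python; the function names and grid size are illustrative choices, and the default parameter values follow Table 4 of this thesis.

```python
def arc_dist(a, b):
    """Arc length between two points on the circle of circumference 1."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def profit_at(x, i, xi, a=10.0, b=0.5, t=1.0):
    """Equilibrium profit of firm i at market point x (formula above).

    xi -- list of all firms' locations; defaults a, b, t follow Table 4
    """
    n = len(xi)
    c = [t * arc_dist(loc, x) for loc in xi]  # transport cost of each firm to x
    rival_costs = sum(c) - c[i]               # sum of c_j(x) over j != i
    return (a - n * c[i] + rival_costs) ** 2 / ((n + 1) ** 2 * b)

def total_profit(i, xi, grid=1000):
    """Approximate the integral of profit_at over the circle on a uniform grid."""
    return sum(profit_at(k / grid, i, xi) for k in range(grid)) / grid
```

For instance, total_profit(0, [0.0, 1/3, 2/3]) evaluates a firm's profit in the equidistant configuration with three firms.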

The equilibrium locations are given by the vector ξ∗, which is the profit-maximizing location of each firm given the locations of the other firms. The equilibria of this model are subject to some restrictions (Gupta et al., 2004). For example, firms supply the same quantity on the markets to the left and to the right of their equilibrium locations. The locations form the subgame perfect Nash equilibria (SPNE), because none of the firms have the incentive to deviate unilaterally in these equilibria. The number and form of the SPNE of this market depend on n, the number of firms in the market. For example, Figure 1 illustrates the possible SPNE locations for the case of n = 3 firms.


Figure 1: Salop SPNE with 3 firms (Gupta et al., 2004, diagram 3)

In the equilibrium on the left, the firms are spaced equidistantly from each other. In the equilibrium on the right, the firms are located directly across from each other, with 2 firms at the filled dot. With an even number of firms on the market, there are infinitely many equilibria. The equilibria with 4 firms are illustrated in Figure 2.

Figure 2: Salop SPNE with 4 firms (Gupta et al., 2004, diagram 3)

Parameter θ1 represents an arbitrary angle. In other words, as long as the firms choose locations in such a way that there is at least 1 other firm directly across from them, that set-up is an equilibrium. This is the general case with an even number of firms, but such markets usually have an additional equilibrium (see Gupta et al., 2004, for further details). In contrast, markets with an odd number of firms have a finite number of equilibria.

This paper’s analysis focusses on markets with 3 and 5 firms, since these have a finite number of equilibria and are small enough for firms to be able to reliably coordinate on an equilibrium. Figure 3 illustrates the equilibria for the model with 5 firms. The last equilibrium is an additional one compared to the market with 3 firms.


Figure 3: Salop SPNE with 5 firms (Gupta et al., 2004, diagram 3)

3 Equilibrium learning with genetic algorithms

The goal of this research is to let firms learn the equilibrium locations in the model. We assume that once they have chosen a location (strategy), they are able to calculate the optimal quantity to produce and get the optimal profit, given their chosen location. Strategies are represented by binary strings, i.e. strings consisting of 0's and 1's. By using binary strings, genetic algorithms can be applied very easily and the model becomes more tractable. However, this also means that only a discrete set of locations is available in the model, so that, depending on the length of the binary strings, an arbitrary location ξ might not have a direct opposite in the model. Nonetheless, this provides the opportunity to perform stability analysis on the theoretical SPNE. This research involves simulating multiple scenarios of the circular Cournot market using different parameters. This section describes the parameters and modelling decisions, and the motivation behind them.

3.1 Genetic algorithm modules

One advantage of the genetic algorithm is that it is modular: It consists of multiple parts, which can be adjusted independently of the others. Because of this, the algorithm is very easy to tune without breaking other functionalities. However, this makes it important to describe every module in more detail than usual to be able to reproduce the steps. The details of the algorithm used in this research are described below.

3.1.1 Crossover

There are many different methods of crossover, but no single method is theoretically preferred over the others. Because there are multiple equilibria in the Salop model, it is interesting to see if different crossover methods lead to different equilibria.


Bitwise crossover (Table 1) produces one child solution. This method requires an additional parameter pM: The ith bit of the child solution has probability pM to be the ith bit of the mother solution and 1 − pM to be the ith bit of the father solution. In this research, bitwise crossover uses an unbiased pM = 1/2 for each i.

M 1 2 3 4 5 6

F a b c d e f

C 1 2 c 4 5 f

Table 1: Bitwise crossover

Single-bounded crossover (Table 2) produces two child solutions. The first i bits of the first child are from the mother solution and the remaining bits are from the father solution, where i is picked randomly. The second child flips this around. The value of i is chosen according to a discrete uniform distribution along all bits. Therefore, it is possible that i is either the start or the end of the string, making the children exact copies of the parent solutions.

M 1 2 3 4 5 6

F a b c d e f

C1 1 2 c d e f

C2 a b 3 4 5 6

Table 2: Single-bounded crossover

Double-bounded crossover (Table 3) also produces two child solutions. It requires randomly sampled i and j (i ≠ j) which define the bounds where the parent solutions flip their bits.

M 1 2 3 4 5 6

F a b c d e f

C1 1 2 c d e 6

C2 a b 3 4 5 f

Table 3: Double-bounded crossover


Since the single- and double-bounded crossover methods produce two child solutions, they are naturally suited for the optional election operator, further discussed in Section 3.1.5. However, performing two bitwise crossovers achieves the same result, albeit less naturally.
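For concreteness, minimal Python sketches of the three crossover methods on lists of bits follow; these are illustrative renderings of Tables 1, 2, and 3, not code from the thesis.

```python
import random

def bitwise_crossover(mother, father, p_m=0.5):
    """Each bit comes from the mother with probability p_m (Table 1)."""
    return [m if random.random() < p_m else f for m, f in zip(mother, father)]

def single_bounded_crossover(mother, father):
    """Swap tails after a uniformly chosen cut point i (Table 2).

    i may equal 0 or the string length, in which case the children are
    exact copies of the parents.
    """
    i = random.randint(0, len(mother))
    return mother[:i] + father[i:], father[:i] + mother[i:]

def double_bounded_crossover(mother, father):
    """Swap the segment between two distinct cut points i < j (Table 3)."""
    i, j = sorted(random.sample(range(len(mother) + 1), 2))
    return (mother[:i] + father[i:j] + mother[j:],
            father[:i] + mother[i:j] + father[j:])
```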

3.1.2 Mutation

After every crossover step, there is a chance that a solution will mutate. Mutation can be interpreted as experimentation and helps to prevent the strategy pool from becoming homogeneous. There are several ways to facilitate mutation. They all use an exogenous parameter pmut to determine the probability of a mutation occurring.

One class of mutation methods adjusts multiple bits in a single operation. An example is going through the entire string bit by bit and flipping each bit with probability pmut. Another method involves sampling two bits and swapping those. An application for this method is in the travelling salesman problem, where the solutions are ordered numbers and the ordering has a distinct meaning. This class of methods can alter the result of the crossover quite a bit, and can mutate the parts of a solution which cause it to perform well, which might not be desirable.

The other class of mutation methods adjusts a single bit. For example, the method used in this paper has a probability pmut to flip a random bit in the solution. The random bit is chosen according to a discrete uniform distribution along the entire bitstring. This method keeps most of the solution intact, which is why it is used in this research.
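A minimal sketch of this single-bit mutation in Python (an illustration, not the thesis code):

```python
import random

def mutate(solution, p_mut=0.05):
    """With probability p_mut, flip one uniformly chosen bit; copy otherwise."""
    child = list(solution)
    if random.random() < p_mut:
        i = random.randrange(len(child))  # discrete uniform over all bits
        child[i] = 1 - child[i]
    return child
```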

3.1.3 Social and individual learning

One important distinction to make in GAs is between social and individual learning, as they can affect the outcome of the model. Social learning means agents share the same strategy pool on which they apply the GA operators. Individual learning means all agents have their own strategy pool and only interact with their own pool. These designs are also called the single-population and multiple-population designs, respectively, in Arifovic (1994).

In social learning, the model has a single strategy pool which represents the agents, each of which has a different idea about the market. One can say that the learning and playing field are one and the same: Agents use their own and others’ strategies to learn where they should locate on the circle. This gives rise to the spite effect (Vriend, 2000).


Spite occurs when an action hurts the user, but hurts others even more. So while this is a performance hit for the user in absolute terms, he gains an advantage in relative terms. Vriend discovered that in a standard Cournot market with social learning, firms converge to the Walrasian quantity, at which they have profits close to zero.

With the social learning GA, when the aggregate quantity produced is below the Walrasian level, producing more increases a firm's profits. On the other hand, when the aggregate quantity produced is above the Walrasian level, all firms are suffering losses and producing less decreases their losses. Since deviations towards the Walrasian quantity are encouraged, firms converge towards this quantity. Firms which do try to deviate from the Walrasian quantity immediately notice that it does not pay off and move back.

In individual learning, each agent in the model has its own pool, with each strategy representing a different belief about the market. In this variant, the learning process does not directly interact with the market: Every agent has his own pool of strategies he thinks are good, or wants to try out. Each period, all agents pick one strategy from their pool to use in the market. Spite is still present, but is isolated to the market environment and does not affect the learning process. Agents use the performance of the strategy they picked to assess their own pool and update it if necessary. That is, only strategies which look promising are allowed to enter the pool, at the cost of older strategies which did not perform well. Practically speaking, the learning and market environments only interact through the one picked strategy, but are separate otherwise.

3.1.4 Hypothetical and realized fitness

The fitness value of a strategy is an indicator of its performance. Naturally, one would only know how a strategy would perform if it was actually used. However, one could estimate the fitness values of unused strategies by using historical realizations. These are called the realized and hypothetical fitness, respectively.

Hypothetical fitness is a way for agents to see how strategies other than the ones they used would have performed against the decisions of other agents. After every iteration, the fitness values of all strategies in an agent's pool are updated. This allows the agent to more easily compare the strategies in his pool: Because the fitness values are recalculated after every iteration, they are never out of date. An application of hypothetical fitness is the individual evolutionary learning (IEL) method in Arifovic and Ledyard (2004), where learning methods with and without hypothetical evaluation are compared.


Realized fitness is when agents only update fitness values of the strategies they actually used. These values are generally more reliable because agents have actually realized the payoffs associated with the strategy. This measure of fitness requires a baseline fitness to be assigned to newly created strategies, since they have not been played before. This baseline fitness is an important parameter to tune and acts as an exploration affinity of sorts: Setting it low means the newly created strategy will never be played. Setting it high will ensure the new strategies will be played first.

Concerning practical differences, hypothetical fitness is easier to implement. It requires only the strategy pool and a function to calculate the fitness value. The decision can be found by calculating the fitness values of the strategies and choosing the strategy with the highest fitness. By using the fitness function as a key function when taking this maximum, the hypothetical fitness values themselves do not need to be stored. Realized fitness requires an additional data structure to store the realized fitness associated with each strategy. However, this also means that only one fitness value has to be calculated per iteration, instead of for the whole pool.
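A minimal sketch of the two bookkeeping styles in Python; the helper names and the dictionary representation are illustrative assumptions:

```python
# Hypothetical fitness: recompute fitness for the whole pool every
# iteration; nothing needs to be stored.
def best_strategy_hypothetical(pool, fitness_fn):
    """fitness_fn evaluates any strategy against last period's outcomes."""
    return max(pool, key=fitness_fn)

# Realized fitness: store one value per strategy (strategies must be
# hashable, e.g. tuples of bits) and update only the strategy played.
def update_realized(realized, played, payoff):
    realized[played] = payoff

def best_strategy_realized(pool, realized, baseline=0.0):
    """Strategies never played fall back to an exogenous baseline fitness."""
    return max(pool, key=lambda s: realized.get(s, baseline))
```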

3.1.5 Election

Election (Arifovic, 1994) is an optional GA operator which is applied after crossover and mutation. It provides a way to evaluate the child solutions obtained from crossover and mutation. By comparing the hypothetical fitness of the children to the realized fitness of the parents, one can determine if the child solutions are promising.

Election works as follows. Two child solutions are created by applying crossover and mutation on two parent solutions. The solutions are then ranked by their fitness values; hypothetical for the children and realized for the parents. The top 2 solutions are put back in the population. There are three possible outcomes: Both parents are discarded, the parent with the lower fitness is discarded, or both children are discarded. In case three or more solutions qualify to be re-entered in the population, e.g. when solutions have equal fitness values, it is preferred to discard parent solutions before child solutions.
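A minimal sketch of the election operator in Python, assuming the caller supplies the parents' realized and the children's hypothetical fitness values; ties discard parents before children, as described above:

```python
def election(parents, children, parent_fit, child_fit):
    """Keep the two fittest of {2 parents, 2 children}.

    parent_fit -- realized fitness values of the parents
    child_fit  -- hypothetical fitness values of the children
    """
    # Children are listed first so that, on equal fitness, the stable
    # sort discards parents before children.
    ranked = sorted(
        list(zip(children, child_fit)) + list(zip(parents, parent_fit)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [solution for solution, _ in ranked[:2]]
```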

Mutation is a key operator in genetic algorithms to prevent the strategy pool from becoming homogeneous too quickly. However, when the system is in equilibrium and pmut is non-zero, we waste part of the strategy pool on non-optimal solutions. A simple way to fix this is to let pmut decrease over time, but there is no way to tell beforehand how many periods it takes to reach an equilibrium. Election provides an endogenous way to shut down mutation when an equilibrium has been reached, and thus it is a powerful operator to add to the genetic algorithm.

3.2 Baseline model

The baseline model is a one-stage game in which firms only decide their location on the circle. After they have chosen a location, they are able to calculate the equilibrium production level. The model closely follows the set-up discussed in Gupta et al. (2004). To be able to compare the results, only odd n are considered, because there are infinitely many subgame perfect equilibria when n is even. Firms each have K strategies in their pool, which they reassess using genetic algorithms every g periods. The exact values of the model parameters used can be found in Table 4. The majority of the parameters are chosen arbitrarily, but certain values have some motivation. The reevaluation period is larger than the size of the strategy pools, such that the firms have a sufficient number of periods to try all promising strategies.

Parameter Notation Value

Number of firms n {3, 5}

Simulation periods T 2000

Reevaluation period g 150

Binary string length l 10

Strategy pool size K 100

Demand constant a 10

Demand slope b 0.5

Transportation cost per distance unit t 1

Bitwise crossover probability pM 0.5

Mutation probability pmut 0.05

Table 4: Parameters used in this model

In the application of the genetic algorithm, individual learning has an advantage over social learning. Vriend (2000) shows that with individual learning, the spite effect is isolated from the learning process, because every firm has its own strategy pool and therefore the payoff of any strategy is not influenced by strategies used in other periods. As the spite effect is not the focus of this research, individual learning is preferred and used. Furthermore, since firms decide about location and quantity choice independently, it is more natural to assume that the learning process also happens independently, giving even more reason to opt for individual learning.

Additionally, realized fitness is used in favor of hypothetical fitness. Arifovic and Maschek (2006) and Vallée and Yıldızoğlu (2009) argue that realized fitness encourages firms to explore other strategies and enforces equilibrium strategies in symmetric Cournot settings. While hypothetical fitness allows firms to compare all strategies in their own pool to see if they made the best choice given what other firms have played, it remains only an indication and is less accurate than realized fitness. Moreover, computation times are substantially lower when using realized fitness.

For the sake of tractability and usability with the genetic algorithm, the circular market is discretized. Each discrete location on the circle (and thus each strategy) is represented by a binary string of length l. A binary string of length l has 2^l different values. The location ξ it encodes is its fraction of 2^l:

\[
\xi = \frac{\sum_{k=0}^{l-1} s_k \cdot 2^k}{2^l}, \qquad s_k \in \{0, 1\},
\]

where sk is the value of the (k+1)th bit of the binary string. To calculate profits, the formula stated earlier has to be adjusted. Define L as the set of all discrete locations on the circle. Then the profit of firm i is

\[
\Pi^d_i\left(\xi_i, \xi_{-i}\right) = \sum_{x \in L} \pi_i\left(x, \xi_i, \xi_{-i}\right).
\]
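A small sketch of this encoding in Python (an illustration; with l = 10 there are 1024 discrete locations). The point-profit function is assumed to be something like the profit_at sketch shown in Section 2.4:

```python
def decode_location(bits):
    """Map a binary string (list of 0/1 bits, length l) to a location in [0, 1)."""
    l = len(bits)
    return sum(s * 2 ** k for k, s in enumerate(bits)) / 2 ** l

def discrete_total_profit(i, xi_bits, profit_at):
    """Sum firm i's profit over all 2^l discrete market locations.

    xi_bits   -- list of every firm's binary string
    profit_at -- a point-profit function like the sketch in Section 2.4
    """
    l = len(xi_bits[0])
    xi = [decode_location(b) for b in xi_bits]
    return sum(profit_at(x / 2 ** l, i, xi) for x in range(2 ** l))
```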



Finally, there are other model parameters which do not have a clearly superior value. An example is the size of the strategy pools: It is difficult to determine or argue for a specific optimal value in advance. In this research, these parameters are held constant at values chosen after some testing. The choices could be improved by performing additional impact analysis.

3.3 Extensions

Besides the baseline model, there are other factors which are interesting to investigate to see if they influence model convergence: They might speed up convergence or lead to a specific equilibrium. All extensions described below are implemented as additions to the baseline model.

The choice of crossover method is largely up to personal preference: what feels right, and ease of implementation. Because the crossover method is key in the genetic algorithm, it is worth investigating what the effects of using different methods are on the convergence of the model. As described in Section 3.1.1, there are 3 main crossover methods. This research uses all of them and compares the results to see if there are any differences between them.

Election is a very promising operator in helping the model stay in equilibrium and helps shut down mutation. Usually without election, the top x% of the strategy pool in terms of fitness is kept and used for creating new solutions. This is called elitism sampling. However with election, there is a possibility that these top performing solutions will not re-enter the population in favor of the promising child solutions. Single- and double-bounded crossover are perfect candidates to use for election since they already provide two child solutions naturally. For bitwise crossover, the operator is performed twice in a row.

Since the circle is discretized, it might be impossible for the firms to locate at the exact SPNE locations as presented in Gupta et al. (2004). To help achieve convergence, the initial locations of the firms are also varied across simulations. The firms are initialized either randomly or close to one of the two main equilibrium types illustrated in Figure 4.

Figure 4: The two main types of equilibria, (a) equidistant and (b) across (Gupta et al., 2004)

In equidistant equilibria, each firm is located at a unique location at the same distance from both its neighbours. In across equilibria, there are only two locations, over which the firms try to distribute themselves evenly. Because the firms cannot be initialized at the exact equilibrium locations, they are put at the closest location representable by a binary string of length l. As a control group, for some simulations the firms are initialized randomly on the circle, disregarding the equilibria. By initializing the firms near the equilibrium locations, the equilibria can also be checked for their local stability.

4 Results

4.1 Simulation outcomes

For every n, initial location configuration, and crossover method, 50 simulations are run, with each batch using the same starting seed to produce comparable results. On a computer with low to medium specifications, this takes about 3 hours for 5 firms and 2000 periods. Each simulation is categorized based on the median locations of every firm in the last 100 periods. We consider a larger timespan than just the last period, since the simulation results showed that the firms would still move when nearing the end of the simulation, even if only a little. The median is used in favor of the mean, as 0 and 1 refer to the same location: If a firm were to move around this point, taking the mean would result in the wrong location. We distinguish three categories, of which two are equilibria, as shown in Figure 5. For n = 5 there is an additional equilibrium, but it never appears in the simulation results; the non-convergent outcomes appear to be heading towards either the equidistant or the across equilibrium instead. Furthermore, the equilibria in Figures 5a and 5b appear for all values of n. Therefore the third equilibrium has been omitted from the analysis.

Figure 5: Examples of the simulation outcomes for n = 3: (a) equidistant, (b) across, (c) non-convergent

The different colored circles represent the different firms in the market. The dashed lines only serve to visualize the locations of the firms relative to each other, and have no meaning otherwise. The non-convergent outcome can occur when the simulation has not been run long enough for the system to settle in an equilibrium, or when the equilibria are unstable. If an equilibrium is unstable, firms close to the equilibrium locations will move away from them rather than converge to them. For every simulation outcome, its distances to both equilibria are calculated. The distance is defined as the total deviation of the simulation result from the projected equilibrium along the perimeter of the circle. For each simulation result, the equidistant and across equilibria which are closest to the realized simulation result are used as the projected equilibria. Figure 6 shows how the distance is calculated for a single simulation result. The orange markers indicate the projected equilibrium, and the distance is the sum of the lengths of the black arrows. Mathematically, this can be calculated with the following formula, where \(\hat{\xi}_i\) is the projected equilibrium location of firm i:

\[
d(\xi, \hat{\xi}) = \sum_{i=1}^{n} |\xi_i - \hat{\xi}_i|.
\]

Figure 6: Distance of a simulation result (a) to its projected equidistant equilibrium and (b) to its projected across equilibrium

If the distances to both equilibria exceed 0.05n, the simulation is assigned the non-convergent label. Otherwise, the simulation is assigned the label of the equilibrium it is closest to. Notice that the threshold increases in the number of firms. This is a way to account for the increased difficulty of coordinating on a specific equilibrium when there are more firms on the market. The counts of assigned labels are shown in Table 5.
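A minimal sketch of this labelling rule in Python; the two distance helpers are hypothetical and would implement the projection described above:

```python
def label_outcome(xi, dist_to_equidistant, dist_to_across):
    """Classify the median final locations xi using the 0.05 * n rule.

    dist_to_equidistant, dist_to_across -- hypothetical helpers returning
    the distance of xi to its nearest projected equilibrium of that type
    """
    n = len(xi)
    d_eq = dist_to_equidistant(xi)
    d_ac = dist_to_across(xi)
    if min(d_eq, d_ac) > 0.05 * n:
        return "non-convergent"
    return "equidistant" if d_eq <= d_ac else "across"
```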


n   Initial location   Crossover   Equidistant   Across   No convergence
3   Random             Bitwise     49            1        0
3   Random             Single      50            0        0
3   Random             Double      48            0        2
3   Equidistant        Bitwise     47            0        3
3   Equidistant        Single      47            0        3
3   Equidistant        Double      47            0        3
3   Across             Bitwise     47            1        2
3   Across             Single      48            1        1
3   Across             Double      49            0        1
5   Random             Bitwise     14            1        35
5   Random             Single      23            0        27
5   Random             Double      11            1        38
5   Equidistant        Bitwise     16            0        34
5   Equidistant        Single      10            1        39
5   Equidistant        Double      15            0        35
5   Across             Bitwise     15            1        34
5   Across             Single      13            1        36
5   Across             Double      16            0        34

Table 5: Number of appearances of equilibria per simulation configuration

At first glance, it is clear that the equidistant equilibrium appears more frequently than the across equilibrium, regardless of the simulation parameters. Additionally, there are more non-convergent outcomes when there are more firms on the market. The across equilibrium almost never appears: Firms do not seem to be able to settle in the across configuration. Table 6 shows the average distance to each equilibrium per simulation configuration, separated into equidistant-converged and all other simulations. Across-converged simulations are not shown separately, since the firms do not seem to be able to settle in this configuration.

From Table 6, it is apparent that the difference in distances to the equilibria is smaller with fewer firms. In both cases, with 3 and 5 firms, on average the simulations ended up closer to the equidistant equilibrium than to the across equilibrium.

                                   Converged only           Other
n   Initial location   Crossover   Equidistant   Across     Equidistant   Across
3   Random             Bitwise     0.05721       0.34202    0.49560       0.00439
3   Random             Single      0.04557       0.34559    -             -
3   Random             Double      0.05348       0.34299    0.18499       0.25708
3   Equidistant        Bitwise     0.05615       0.34228    0.18833       0.29358
3   Equidistant        Single      0.05725       0.33318    0.27060       0.28250
3   Equidistant        Double      0.04077       0.33929    0.26735       0.25822
3   Across             Bitwise     0.05747       0.33602    0.27859       0.21863
3   Across             Single      0.04983       0.34332    0.26246       0.15420
3   Across             Double      0.05830       0.34536    0.20430       0.27370
5   Random             Bitwise     0.17728       0.57617    0.40534       0.51054
5   Random             Single      0.18684       0.59979    0.39535       0.46625
5   Random             Double      0.18040       0.59757    0.36234       0.54566
5   Equidistant        Bitwise     0.19040       0.57963    0.39723       0.51122
5   Equidistant        Single      0.16505       0.51686    0.41481       0.49259
5   Equidistant        Double      0.17535       0.61019    0.37269       0.52619
5   Across             Bitwise     0.16001       0.58895    0.39248       0.55219
5   Across             Single      0.18633       0.66869    0.38035       0.52929
5   Across             Double      0.17199       0.61675    0.39705       0.47152

Table 6: Average distances to the equilibria per simulation configuration, for equidistant-labelled simulations only, and for other simulations

There seem to be small differences in the average distances between the crossover methods as well, but there is no crossover method which outperforms the others in every parameter configuration. Because firms never settle in the across equilibrium, the distances to this equilibrium can be disregarded, but they are reported anyway for the sake of completeness.

For the firms in the market, their profit is the most important indicator of how well they are doing. There seems to be a difference in profits as well when the firms are in different equilibrium configurations. However, because we know the formula for the profits (Gupta et al., 2004), we can also calculate them without simulating. Figure 7 shows the profits of the firms in one simulation outcome each for the equidistant and the across equilibrium. The profits are calculated by taking the median profits of the last 100 periods.

Figure 7: Firm profits in equidistant and across equilibria, (a) for n = 3, across initial locations, and bitwise crossover; (b) for n = 5, random initial locations, and double crossover

To be able to check the profits of the individual firms in each simulation and to be able to compare the two equilibria, only one outcome is sampled for each equilibrium.

In the equidistant equilibrium, the profits of the firms are not exactly identical, but they are close to each other. This could be due to the discretization of the circle, because it may not allow the firms to locate at exactly the same distance from each other. In the across equilibrium, on the other hand, the firms which share their location with the fewest other firms enjoy higher profits, and the dispersion in the profits is larger than in the equidistant equilibrium. This could be a reason why the across equilibrium does not appear often in the simulation outcomes.

There are also differences in the industry profits, shown in Table 7. The industry profits are calculated by taking the median profits of the individual firms over the last 100 periods and summing them over all firms. The average industry profits in the equidistant equilibrium are higher than in the across equilibrium and than when the system has not converged. Furthermore, the average industry profits in non-converged systems are higher than in the across equilibrium.

The results from the simulations with 5 firms seem to indicate that the firms have not had enough time to learn the equilibria.

n   Initial location   Crossover   Equidistant   Across     No convergence
3   Random             Bitwise     36628.44      36619.48   -
3   Random             Single      36629.19      -          -
3   Random             Double      36628.47      -          36622.19
3   Equidistant        Bitwise     36627.43      -          36623.29
3   Equidistant        Single      36628.68      -          36623.35
3   Equidistant        Double      36629.07      -          36624.68
3   Across             Bitwise     36628.84      36619.50   36619.44
3   Across             Single      36628.82      36615.39   36619.70
3   Across             Double      36628.47      -          36624.22
5   Random             Bitwise     27251.04      27245.33   27249.83
5   Random             Single      27251.32      -          27249.08
5   Random             Double      27251.61      27244.99   27249.22
5   Equidistant        Bitwise     27249.76      -          27248.28
5   Equidistant        Single      27251.32      27231.93   27249.51
5   Equidistant        Double      27251.98      -          27249.74
5   Across             Bitwise     27250.99      27240.94   27248.21
5   Across             Single      27252.02      27244.16   27249.91
5   Across             Double      27251.67      -          27249.17

Table 7: Average industry profits (total profits of the firms)

Two possible ways to solve this are either to give the firms more time to learn, or to introduce inertia to the system, such that fewer firms change their locations at the same time. Running the simulation longer is the most straightforward solution. Adding inertia introduces a probability α of firms not choosing the best location, but instead remaining at their current one. This causes the firms' choices not to interfere with each other as much, since fewer firms change their location every period. The adjustments are applied to a single simulation configuration, of which the results are shown in Table 8 alongside the respective baseline results. Adding more periods seems to help the firms by letting them reach the equidistant equilibrium more frequently; however, this effect seems limited and diminishes with the number of periods added. Adding inertia has a smaller effect. The equidistant equilibrium remains the dominant equilibrium, but the system still does not converge in the majority of the cases.

n   Initial location   Crossover   T      α     Equidistant   Across   No convergence
5   Random             Bitwise     2000   0     14            1        35
5   Random             Bitwise     3000   0     19            0        31
5   Random             Bitwise     5000   0     20            0        30
5   Random             Bitwise     2000   0.3   16            1        33
5   Random             Bitwise     5000   0.3   16            0        34

Table 8: Number of appearances per equilibrium configuration for n = 5, random initial locations, and bitwise crossover: baseline model, more periods, inertia with probability α
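A minimal sketch of the inertia mechanism in Python (the names are illustrative assumptions):

```python
import random

def choose_with_inertia(current, best, alpha=0.3):
    """With probability alpha the firm keeps its current location;
    otherwise it switches to the best location in its pool."""
    return current if random.random() < alpha else best
```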

Figure 8: Firm profit distributions for n = 3 and equidistant initial locations, for (a) bitwise, (b) single-bounded, and (c) double-bounded crossover

4.2 Equilibrium stability

Table 5 shows that the equidistant equilibrium appears more often than the across equilibrium in all cases: The latter appeared at most once out of 50 times. This seems to indicate that the equidistant equilibrium is locally stable. Furthermore, it seems that the basin of attraction of the equidistant equilibrium depends on the number of firms in the market: When there are more firms on the market, the equidistant equilibrium appears less often. When initializing the firms near the across equilibrium, they diverge away from it most of the time. This strongly indicates that the across equilibrium is locally unstable, and this seems to be independent of the model parameters which were adjusted. Another indication that the across equilibrium might be unstable can be seen in Figure 7 and Table 7. In the equidistant equilibrium, all firms have about the same profits; the small differences in the figures are due to the set-up of the simulation, where exact equidistance might not be possible. In the across equilibrium, on the other hand, the firms who share their locations with the fewest competitors have higher profits than those on the opposite side of the circle. This profit difference might not be sustainable, causing the across equilibrium to be unstable. Furthermore, the industry profits in the across equilibrium are lower than in the equidistant equilibrium.

Because Table 6 shows unnormalized numbers, i.e. the numbers are not adjusted for the different values of n, the top and bottom half of the table cannot be compared as is. In any case, it is clear that simulation outcomes are closer to the equidistant equilibrium than the across equilibrium, confirming the findings from Table 5. However, when looking at the distances for the converged simulations, there is no substantial difference between 3 and 5 firms. This seems to further confirm the likelihood that the basin of attraction of the equidistant equilibrium depends on the number of firms.

4.3 The effects of crossover

The choice of crossover method was varied across the simulations to investigate its impact on the outcomes. The hypothesis is that it could have an impact on equilibrium selection, meaning that a certain type of equilibrium might appear more frequently when using a certain type of crossover.

The results do not seem to fully support this hypothesis. In general, the label counts in Table 5 do not vary substantially from one crossover method to the next. However, there are two outlying groups: 5 firms with random initial locations, and 5 firms with equidistant initial locations. In the first case, the number of equidistant labels assigned using single-bounded crossover is more than twice as large as when using double-bounded crossover. However, because the firms' locations are randomly initialized, it could be that the equidistant equilibrium was reached more often by pure luck. In the second case, the number of equidistant labels assigned using bitwise or double-bounded crossover is 50% larger than in the single-bounded crossover case. However, Table 6 shows that the distances of the converged outcomes to the projected equilibrium do not differ much. More simulations are required to fully investigate this observation. There are some small differences in the equilibrium distances between the crossover methods, but there does not seem to be conclusive proof that one crossover method outperforms the others.

The crossover methods do not seem to lead to differences in profits either. When looking at the profits of the firms for each simulation configuration, they follow roughly the same distribution regardless of crossover method, as shown in Figure 8. In this case, the mode of the profit distributions is around 12210, all with comparable variance. Hence, the key performance indicator for the firms does not seem to depend on the type of crossover used. From their perspective, it would make no difference if they switched the crossover method in the learning process.

The lack of a significant effect of changing the crossover method is an interesting result in favor of the robustness of the genetic algorithm with respect to the crossover method. In other words, the fact that there exist multiple ways to apply the crossover operation in the genetic algorithm should not lead to significantly different results from one study to another.

5 Conclusion

This paper describes the research set-up and results of genetic algorithm learning in the circular market model with location and quantity competition. Gupta et al. (2004) derived the subgame perfect Nash equilibria for the location and quantity choice of the firms for different numbers of firms on the market. This paper focusses on two types of equilibria of the model, the equidistant and the across equilibrium, because these two equilibria were prominent in the results and appear for any number of firms. Multiple parameters of the model were varied, such as the number of firms, their initial locations, and the type of crossover they use when applying the genetic algorithm. The goal of this research is to see if the equilibria of the circular market can be learned, and to perform stability analysis on the equilibria. Furthermore, the impact of the model parameters on the learning rate and the stability of the equilibria has also been investigated.

When firms are allowed to explore long enough, they can indeed learn to settle at certain equilibrium locations. The rate at which they learn seems to depend on the number of firms active on the market: learning is faster in a smaller market, and the outcomes were categorized as an equilibrium more frequently when the model included fewer firms.

Firms reach the equidistant equilibrium much more frequently than the across equilibrium, regardless of the model parameters; in the cases with fewer firms it is reached almost all the time. This strongly suggests that the equidistant equilibrium is locally stable. However, the number of times this equilibrium is reached decreases as the number of firms increases, indicating that either the basin of attraction of the equidistant equilibrium or the learning rate decreases in the number of firms. Two adjustments were made to test this further. Adding more periods increases how often the equidistant equilibrium is reached, but this effect rapidly diminishes in strength and seems to plateau at a certain value. Introducing inertia only had a small effect on the results. The across equilibrium remains absent regardless of the model parameters, which indicates that it is unstable.

There are also differences in profits between the equilibria. In the equidistant equilibrium, profits are identical across all firms. In the across equilibrium, profits are unequal: firms that share their location with the fewest other firms earn the highest profits. This profit difference could also play a role in why the across equilibrium is unstable, since firms could notice it and deem the across equilibrium unsustainable. Furthermore, industry profits are higher in the equidistant equilibrium, giving firms a reason to coordinate on it.

The type of crossover used in the genetic algorithm was also varied across the simulation set-ups. In general, there are no glaring differences between the crossover methods in terms of the simulation outcomes, the average distance to the equilibria, or the profits of the firms. There are some outlying location configurations where a case could be made for an impact of the crossover method, but more simulations are required to confirm this. The differences in profits that do occur are caused by the type of equilibrium reached rather than by the type of crossover used. The crossover methods thus do not seem to affect the learning process or its outcome, i.e. the equilibrium type. One could say that crossover, while integral to genetic algorithms, merely serves as a way to combine parts of good solutions, and that the exact method of doing so matters little. This is a good result for the robustness of the genetic algorithm.

6 Discussion and recommendations

This research relaxes the rationality assumption in the circular market model with quantity competition by letting firms learn the equilibria instead of calculating them. There are, however, some aspects that could be improved. The model only lets firms learn the location part of the equilibria, and lets them calculate the corresponding optimal quantity from the locations of the other firms. This gives them the optimal profit for their chosen location. Firms could instead also use genetic algorithm learning to set their produced quantity. One simple way to do this is to give firms access to a second pool of binary strings encoding their quantity choices, so that each firm chooses two binary strings per period: one for the location and one for the quantity. The drawback of this method is that location and quantity are decoupled: the optimal quantity for the best location might not correspond to the best quantity in a firm's pool. Another way to include quantity in the learning process is similar to what has been done in Arifovic (1996): firms use a single binary string that encodes two values, with the first bits encoding the location and the last bits the quantity. Such strings carry more information, because they represent which combinations of choices worked well in the past rather than individual choices.
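
A minimal sketch of such a joint encoding is given below; the equal 8/8 bit split and the quantity range are hypothetical choices made purely for illustration.

L_BITS, Q_BITS = 8, 8  # hypothetical split of a 16-bit string

def decode(chromosome, max_quantity=50.0):
    # The first L_BITS bits encode the grid location, the remaining
    # Q_BITS bits encode a quantity on [0, max_quantity].
    loc_bits, qty_bits = chromosome[:L_BITS], chromosome[L_BITS:]
    location = int(loc_bits, 2)  # index on the discretized circle
    quantity = int(qty_bits, 2) / (2**Q_BITS - 1) * max_quantity
    return location, quantity

print(decode("1000000011111111"))  # -> (128, 50.0)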

The results in this research mainly concern the end states of the simulations, i.e. the assigned equilibrium labels, the distances to the different equilibria, and the profits. However, other quantities also characterize the learning process, such as the convergence speed: after how many periods can one say that the model has converged? Even though the crossover methods show no signs of producing different end results, they might influence the speed of convergence of the model. It would be interesting to define such indicators for the learning process and examine those in addition to the end results.
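
As an example of such an indicator, the sketch below computes the first period from which every firm's location remains constant until the end of a run; both the indicator and the tolerance are illustrative choices.

def convergence_period(location_history, tol=1e-9):
    # First period from which all locations stay (numerically) constant.
    final = location_history[-1]
    t = len(location_history) - 1
    while t > 0 and all(abs(a - b) < tol
                        for a, b in zip(location_history[t - 1], final)):
        t -= 1
    return t  # 0 means the locations never changed at all

history = [[0.1, 0.5], [0.2, 0.5], [0.2, 0.5], [0.2, 0.5]]
print(convergence_period(history))  # -> 1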

To make the analysis tractable, the circular market is discretized. The consequence is that not all locations on the circle can be reached, so some equilibria might never be attained, and some might not be stable when the number of locations is too small. It would be an interesting extension to increase the number of possible locations on the circle. The easiest way to do this is to increase the parameter l, the length of the binary strings. Another interesting direction is to model the market differently so that the dependence on discretization for tractability disappears.
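
The decoding step implied by this discretization can be sketched as follows (the unit circumference is an assumption); note that increasing l refines the grid while keeping all existing grid points reachable.

def bits_to_location(bits, circumference=1.0):
    # Map an l-bit string to one of 2**l equally spaced points on the circle.
    l = len(bits)
    return int(bits, 2) / 2**l * circumference

print(bits_to_location("101"))   # 0.625 with l = 3
print(bits_to_location("1010"))  # 0.625 again: l = 4 keeps the old grid points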

For an odd number of firms, the number of equilibria in the model is finite. Because the focus of this paper is on the outcomes of the simulations, only odd numbers are considered, keeping the analysis tractable and comparable. However, the learning process and the convergence speed, for example, are also interesting to study, and these might differ between even and odd numbers of firms because of the number of equilibria: since markets with an even number of firms have infinitely many equilibria, does convergence become faster because there are more equilibria to settle in, or slower because it becomes harder to coordinate on a single equilibrium? This could be an interesting characteristic of the market to investigate further.

The equidistant and the across equilibrium are theoretical equilibria for any number of firms, which is why they are investigated extensively here. There are, however, many other equilibrium configurations that have not been examined. For example, for 5 firms there is an additional equilibrium, but it did not appear at all in the simulation results. The number of equilibrium types increases in the number of firms, and these types could appear in addition to the equilibria analyzed here. A logical next step would be to keep track of all possible types of equilibria and extend the stability analysis. This requires grouping the different types of equilibria, which could be done according to the theorem in Gupta et al. (2004).

Lastly, the model depends on a handful of exogenously set parameters, such as the number of periods to simulate or the number of periods between strategy-pool updates. While the values of some parameters are based on others, e.g. the number of periods between updates is larger than the size of the strategy pool, the values of others are independent or not motivated. Naturally, it would be good to check the influence of these parameters on the results. The most interesting ones to vary are the number of simulation periods and the size of the strategy pools, since these give firms more time and more choices when selecting their location, possibly increasing the probability that they find an equilibrium. A small experiment (see Table 8) shows that adding more periods causes the equidistant equilibrium to be reached more often, but the effect seems limited and vanishes completely after a certain number of extra periods.


References

Arifovic, Jasmina (1991). “Learning by genetic algorithms in economic environments.” Doctoral dissertation. University of Chicago, Department of Economics.

— (1994). “Genetic algorithm learning and the cobweb model.” In: Journal of Economic Dynamics and Control 18.1, pp. 3–28.

— (1996). “The behavior of the exchange rate in the genetic algorithm and experimental economies.” In: Journal of Political Economy 104.3, pp. 510–541.

Arifovic, Jasmina and John Ledyard (2004). “Scaling up learning models in public good games.” In: Journal of Public Economic Theory 6.2, pp. 203–238.

Arifovic, Jasmina and Michael K Maschek (2006). “Revisiting individual evolutionary learning in the cobweb model – an illustration of the virtual spite-effect.” In: Computational Economics 28.4, pp. 333–354.

Arthur, W Brian (1994). “Inductive reasoning and bounded rationality.” In: The American Economic Review 84.2, pp. 406–411.

Audibert, Jean-Yves, Rémi Munos, and Csaba Szepesvári (2009). “Exploration–exploitation tradeoff using variance estimates in multi-armed bandits.” In: Theoretical Computer Science 410.19, pp. 1876–1902.

Becker, Gary S (2013). The Economic Approach to Human Behavior. University of Chicago Press.

Bray, Margaret M and Nathan E Savin (1986). “Rational expectations equilibria, learning, and model specification.” In: Econometrica: Journal of the Econometric Society, pp. 1129–1160.

Carlson, JA (1968). “An invariably stable cobweb model.” In: The Review of Economic Studies 35.3, pp. 360–362.

Carpentier, Alexandra et al. (2011). “Upper-confidence-bound algorithms for active learning in multi-armed bandits.” In: Kivinen J., Szepesvári C., Ukkonen E., Zeugmann T. (eds) Algorithmic Learning Theory. ALT 2011. Lecture Notes in Computer Science, vol 6925. Springer.

Conlisk, John (1996). “Why bounded rationality?” In: Journal of Economic Literature 34.2, pp. 669–700.

Ellison, Glenn (2006). “Bounded rationality in industrial organization.” In: Econometric Society Monographs 42, p. 142.


Erev, Ido and Alvin E Roth (1998). “Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria.” In: American Economic Review, pp. 848–881.

Ezekiel, Mordecai (1938). “The cobweb theorem.” In: The Quarterly Journal of Economics 52.2, pp. 255–280.

Forrest, Stephanie and Melanie Mitchell (1993). “Relative building-block fitness and the building-block hypothesis.” In: Foundations of Genetic Algorithms. Vol. 2. Elsevier, pp. 109–126.

Grefenstette, John et al. (1985). “Genetic algorithms for the traveling salesman problem.” In: Proceedings of the first International Conference on Genetic Algorithms and their Applications, pp. 160–168.

Grüne-Yanoff, Till (2007). “Bounded rationality.” In: Philosophy Compass 2.3, pp. 534–563.

Gupta, Barnali et al. (2004). “Where to locate in a circular city?” In: International Journal of Industrial Organization 22.6, pp. 759–782.

Haupt, Randy L and Sue Ellen Haupt (1998). Practical Genetic Algorithms. Wiley New York.

Hommes, Cars (2011). “The heterogeneous expectations hypothesis: Some evidence from the lab.” In: Journal of Economic Dynamics and Control 35.1, pp. 1–24.

Hotelling, Harold (1929). “Stability in competition.” In: The Economic Journal 39.153, pp. 41–57.

Kahneman, Daniel (2003). “Maps of bounded rationality: Psychology for behavioral economics.” In: American Economic Review 93.5, pp. 1449–1475.

Li, Lihong et al. (2010). “A contextual-bandit approach to personalized news article recommendation.” In: Proceedings of the 19th International Conference on World Wide Web. ACM, pp. 661–670.

Matarić, Maja J (1997). “Reinforcement learning in the multi-robot domain.” In: Robot Colonies. Springer, pp. 73–83.

Munier, Bertrand et al. (1999). “Bounded rationality modeling.” In: Marketing Letters 10.3, pp. 233–248.

Salop, Steven C (1979). “Monopolistic competition with outside goods.” In: The Bell Journal of Economics, pp. 141–156.


Simon, Herbert A (1972). “Theories of bounded rationality.” In: Decision and Organization 1.1, pp. 161–176.

Sivanandam, SN and SN Deepa (2007). Introduction to Genetic Algorithms. Springer Science & Business Media.

Tokic, Michel and Günther Palm (2011). “Value-difference based exploration: adaptive control between epsilon-greedy and softmax.” In: Annual Conference on Artificial Intelligence. Springer, pp. 335–346.

Vallée, Thomas and Murat Yıldızoğlu (2009). “Convergence in the finite Cournot oligopoly with social and individual learning.” In: Journal of Economic Behavior & Organization 72.2, pp. 670–690.

Vega-Redondo, Fernando (1997). “The evolution of Walrasian behavior.” In: Econometrica 65.2, pp. 375–384.

Vriend, Nicolaas J (2000). “An illustration of the essential difference between individual and social learning, and its consequences for computational analyses.” In: Journal of Economic Dynamics and Control 24.1, pp. 1–19.


Appendices

A Pseudocode

 1  initialize seed;
 2  initialize firms: locations and strategies;
 3  for t ← 1 to T do
 4      keep track of current locations and profits;
 5      if t is a multiple of g then
 6          keep the best X% performing locations;
 7          while size of the strategy pools < K do
 8              select two random strategies for crossover;
 9              apply crossover method;
10              perform election;
11              mutate new solutions;
12              add new solutions to pool;
13          end
14      end
15      update realized fitness;
16      choose new best location;
17  end
18  export location and profit histories;

Algorithm 1: Genetic algorithm learning in the circular market model
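
For readers who prefer executable code, the following is a minimal, self-contained Python sketch of Algorithm 1. The payoff function, all parameter values, and the operator details (one-point crossover and a simplified election step) are illustrative placeholders, not the implementation used for the simulations; the comments refer to the corresponding lines of Algorithm 1.

import random

L, K, T, G = 8, 20, 200, 25  # string length, pool size, periods, update interval
KEEP, MUT = 0.5, 0.02        # survivor fraction and mutation rate (placeholders)
N = 3                        # number of firms

def decode(bits):
    return int(bits, 2) / 2**L  # point on the unit circle

def payoff(loc, rivals):
    # Placeholder profit: reward circular distance to the rivals.
    return sum(min(abs(loc - r), 1 - abs(loc - r)) for r in rivals)

def crossover(a, b):
    cut = random.randint(1, L - 1)  # one-point crossover
    return a[:cut] + b[cut:]

def mutate(bits):
    return "".join(b if random.random() > MUT else str(1 - int(b)) for b in bits)

def rand_string():
    return "".join(random.choice("01") for _ in range(L))

random.seed(0)                                                 # line 1
pools = [[rand_string() for _ in range(K)] for _ in range(N)]  # line 2
locations = [decode(pool[0]) for pool in pools]                # arbitrary start
history = []

for t in range(1, T + 1):                                      # line 3
    history.append(list(locations))                            # line 4 (profits omitted)
    if t % G == 0:                                             # line 5
        for i in range(N):
            rivals = locations[:i] + locations[i + 1:]
            fitness = lambda s: payoff(decode(s), rivals)
            survivors = sorted(pools[i], key=fitness, reverse=True)[:int(KEEP * K)]
            new_pool = list(survivors)                         # line 6
            while len(new_pool) < K:                           # lines 7-12
                a, b = random.sample(survivors, 2)
                child = mutate(crossover(a, b))
                if fitness(child) < min(fitness(a), fitness(b)):
                    child = max((a, b), key=fitness)           # simplified election
                new_pool.append(child)
            pools[i] = new_pool
    for i in range(N):                                         # lines 15-16
        rivals = locations[:i] + locations[i + 1:]
        locations[i] = decode(max(pools[i], key=lambda s: payoff(decode(s), rivals)))

print(sorted(locations))                                       # line 18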

B Simulation distance spread

Because firms never settle in the across equilibrium, the distances to this equilibrium are disregarded. The histograms in Figure 9 show the distributions of the distances to the equidistant equilibrium for a specific simulation configuration. Two major types of histograms occur. The first type has its mass near 0.05, meaning that the majority of the simulations converged to the equidistant equilibrium; this is usually the case for 3 firms. The second type has its mass spread over a larger interval, so the equidistant equilibrium is not reached very frequently; this is usually the case for 5 firms, even when allowing the firms to learn over a longer period or introducing inertia to the system.

Figure 9: Distances to the equidistant equilibrium for across initial locations and single-bounded crossover. Panel (a): n = 3; panel (b): n = 5.
