
Stability and basins of attraction for the best-reply and Gradient Learning Processes in Dynamic Cournot Duopoly Games with decreasing marginal costs


Academic year: 2021



by:

Mark Verhagen,

University of Amsterdam, Faculty of Economics and Business

Supervised by:

Dr. M.I. Ochea,

University of Amsterdam, Faculty of Economics and Business

CeNDEF

Abstract

The gradient and best-reply learning rules are examined in a duopoly game with linear demand and quadratic costs. Depending on the quadratic cost term, the duopoly is of type I or type II. In both types the interior equilibrium is a Nash equilibrium, but in type II duopolies there are two additional boundary equilibria. Analysis is done for homogeneous and heterogeneous behaviour, first for the simple duopoly case where two firms play a game and then for the case where firms are randomly selected from an infinite population to play a duopoly. In this population both learning rules are represented through fractions of the total population. Analysis in this infinite population game is done both for fixed population fractions and for evolutionary fractions, where firms may switch between behavioural rules. Population dynamics are governed by the replicator dynamics. In all simple duopolies, it is found that in type I duopolies the interior equilibrium is stable and in type II duopolies the boundary equilibria are stable. Whenever an infinite population is defined, the boundary equilibria disappear and there is no significant difference between type I and type II duopolies whenever the heuristics are treated as equally costly. Both potentially converge towards an interior or asymmetrical equilibrium. Only for unrealistically high fractions of best-reply learners is a two-cycle found.


Contents

1 Introduction
2 The Cournot Duopoly Model
2.1 The Boundary Equilibria Model
3 Best-Reply Learning Dynamics
3.1 The Model
3.2 Stability of the Nash Equilibria
3.3 Stability of the Two-Cycle
4 Gradient Learning Dynamics
4.1 The Model
4.2 Stability of the Nash Equilibria
5 Analysis and Basins of Attraction
5.1 The Model
5.2 Best-Reply Learning
5.3 Gradient Learning
6 Heterogeneous Heuristics
6.1 The Simple Heterogeneous Duopoly
6.1.1 Analysis and Basins of Attraction
6.2 Infinite Population Game and Population Dynamics
6.3 The Model with Fixed Population Fractions
6.3.1 Analysis and Basins of Attraction
6.4 The Model with Evolutionary Population Fractions
6.4.1 Analysis and Basins of Attraction

1 Introduction

The Nash equilibrium is a keystone of modern economic theory and is central to game theory. In practice, finding the Nash equilibrium is guaranteed only conditional on the existence of a unique equilibrium and complete information. These two conditions will be the central focus of this paper. First of all, there are scant arguments that imply more Nash equilibria than just the interior one. Secondly, complete information is obviously an unrealistic assumption.

Complete information implies that every firm can predict the competing firm's quantity, conditional on its own. From this it necessarily follows that the Nash equilibrium is the profit-maximising strategy. In practice it is exactly this quantity of the other firm that is unknown. This information has to be estimated, and a process of estimation and learning commences. This raises the question whether the Nash equilibrium will eventually be converged upon when using certain learning processes, or whether another, non-Nash equilibrium is potentially found. Since the Cournot oligopoly model is one of the most popular settings in industrial organisation, this paper will analyse differing types of behavioural rules in a quantity-setting, oligopoly-style game.

Several different learning processes have been shown to converge on the unique Nash equilibrium, most notably the dynamics set forth by Cournot himself. In these dynamics, the so-called best-reply dynamics, each firm ‘myopically’ chooses the output quantity that would maximise profit if all other firms played the same quantity as last period. This strategy is often dubbed ‘naïve’ play. It is found that no matter what starting quantities are chosen, the Nash equilibrium will be converged upon when using the best-reply dynamics (footnote 1). Theocharis (1960) [15] shows that this convergence does not hold when the number of firms in the game exceeds three, and as such this paper will limit its analysis to the duopoly game.

Milgrom and Roberts (1991) [11] show that not only for the best-reply learning process, but for a broad class of learning models the Nash equilibrium is converged upon in a duopoly. Similar results have been found in an experimental setting by Holt (1995) [8]: the Nash equilibrium is a reasonable prediction for the eventual behaviour of players in a duopoly game for multiple learning processes. All the above findings, however, build on the existence of a unique interior Nash equilibrium point on which convergence will take place. This unique interior point follows necessarily whenever firms' marginal costs are constant. See Kolstad (1987) for one of the many proofs of the uniqueness of the Nash equilibrium when there are constant marginal costs [9].

As has been mentioned, exactly this uniqueness of the Nash equilibrium will be relaxed. Some research has already been done on the implications of multiple Nash equilibria. For instance, Cox and Walker examine stability in a scenario where marginal costs are decreasing instead of constant, potentially creating additional Nash equilibria [5]. The widespread applicability of economies of scale corroborates the existence of these cost functions and thus validates the analysis of these scenarios. It is found that for sufficiently decreasing marginal costs there are, in addition to the interior point, two boundary equilibria. Exactly this phenomenon of boundary Nash equilibria, combined with incomplete information and the necessity of applying certain learning processes, will be examined.

Footnote 1: Obviously following non-negativity constraints on prices, quantities and other basic

Cox and Walker define two types of duopolies. Whenever marginal costs are sufficiently decreasing, there is a so-called type II duopoly with boundary equilibria. Whenever marginal costs are constant or just slightly decreasing, there is a type I duopoly without boundary equilibria. This can be understood intuitively when examining a figure with two linear reaction functions. Whenever the marginal costs are sufficiently decreasing, the slope of the reaction curves decreases. This can possibly create two additional intersections between the two reaction curves, as is evident from figure 1. Whenever type II duopolies appear, the question arises towards which of the equilibria (if any) there will be convergence and whether this differs for certain learning behaviours.

Cox and Walker examine various models, like the model used by Moreno and Walker (1991, 1994) [12][13], where expectations are formed by optimising output against the average of all previous outputs of the other firm. This learning process is much more retrospective than the aforementioned best-reply dynamics. Another model that is discussed by Cox and Walker is the fictitious play model, where a probability distribution is created based on the frequency of previous quantities played by the other firm.

All the above mentioned models and the best-reply dynamics converged to the unique interior point in case of a single interior Nash equilibrium [12][13][15]. When marginal costs were sufficiently decreasing, however, and boundary equilibria appeared, Cox and Walker expected convergence towards one of the boundary equilibria. Towards which equilibrium convergence takes place depended on the starting quantities of the system. When using the best-reply dynamics, it is even possible that an alternating equilibrium is reached, where in one time period both firms play the non-zero boundary quantity and in the next both play zero. Since all three models try to estimate the opposing quantity, the best-reply dynamics will be examined as representative of these types of heuristics, in contrast to firms with different kinds of learning behaviour.

Figure 1: (a) Type I duopoly; (b) Type II duopoly.

In addition to the quantity-estimating best-reply dynamics, gradient learning behaviour will be examined. This makes for an interesting addition to the best-reply model, since gradient learning uses an introspective learning method that is not based on forming expectations about the opposing firm's quantity. This is obviously in stark contrast to the above mentioned models, which all, to a greater or lesser degree, apply a naïve strategy based on past experiences and observations from which they seek to estimate the quantities they will face.

The gradient learning model is built around the idea that firms change their strategic variable in the direction in which their profit would have been higher. This requires the definition and derivation of the firms' profit function. Naturally, this sort of learning can settle on a locally optimal equilibrium, while global optima may never even be remotely reached. This motivates the analysis of boundary equilibria and their stability.

For these two learning rules, the implications of a type I or a type II duopoly will be examined. Firstly, the best-reply dynamics are subjected to the type II duopoly case and the assumptions and expectations of Cox and Walker are verified. Then it will be examined if the gradient learning is also part of the ’broad range of learning dynamics’ converging towards the interior equilibrium in a type I duopoly and towards a boundary equilibrium in a type II duopoly. When the homogeneous duopoly case has been examined for both the Gradient and the Best-Reply learning models, a heterogeneous case is introduced.

First, analysis will be done for a simple duopoly game with one best-reply firm and one gradient firm for both the type I and type II duopolies. For more complex analyses, where there might be a majority of one heuristic, an infinite population is defined where the two heuristics are represented by a certain fraction of this population. To remain in the duopoly format and avoid the instability threshold proposed by Theocharis [15], two firms will be randomly selected from this population to play a duopoly game. Through this scheme, multiple behavioural rules and fractions can be applied and have an effect in the duopoly without necessarily increasing the number of firms in the game. This infinite population game will be examined for fixed fractions and for evolutionary fractions, where ‘switching’ between behavioural rules is allowed.

These population dynamics will be modelled through the so-called replicator dynamics. These dynamics depend on the profit that one heuristic yields in contrast to the other. This can be supplemented with an evolutionary pressure variable θ, pressing evolutionary change. These dynamics have been used by Droste et al. (2002) [6] and are motivated by Gale et al. (1995), Binmore and Samuelson (1997) and Schlag (1998) as representative population dynamics in a learning, aspiration or imitation style setting[7][1][14].


equilibria and interior equilibrium. Dynamical systems will be defined for the various models and Jacobian matrices derived where possible. Fixed points of the system will be analysed through simulations with E & F Chaos. These simulations will show for which starting values there will be convergence to each of the stable equilibria. Bifurcation analysis will show the differing equilibria that arise when varying different parameters. Basin examination is motivated by the weak real-life applicability of asymptotic stability (footnote 2).

Section two will briefly touch on the global Cournot duopoly framework, and some simple conditions are discussed in order for the type II boundary equilibria to appear. In section three the best-reply dynamics will be examined, where stability of the boundary equilibria is analysed as well as the alternating equilibrium. Section four will examine boundary equilibria in a gradient learning process and determine if gradient learners are also part of the ‘broad’ range of learning processes that converge towards the boundary equilibria in a type II duopoly. Section five will be dedicated to simulations of the above mentioned models and to examining those sets of starting values that correspond to the different equilibria. Lastly, section six will investigate the stability properties and basins of attraction of a heterogeneous game. Discussion of the results and possible future studies will be presented in section seven.

2 The Cournot Duopoly Model

In this section the global framework of the Cournot duopoly model will be discussed, beginning with the simplest conditions concerning the game. Some requirements will then be added to the framework that will guarantee a model with boundary equilibria. With these requirements in place, the interior and boundary Nash equilibria are defined. These equilibria will be examined for stability in different behavioural rules in the following sections.

2.1 The Boundary Equilibria Model

A quantity-setting duopoly is examined, where each firm produces identical goods. For simplicity, a setup with linear reaction functions is defined, where both firms have the same cost function. Total quantity produced in the market is defined as $Q = q_1 + q_2$, with the inverse demand function $P = \alpha - \beta Q$ governing the price, where $\alpha, \beta \geq 0$. In order to avoid negative prices it is assumed that $Q < \alpha/\beta$. Cost functions are defined by $C(q_i) = \gamma q_i + \delta q_i^2$. The objective of every firm is to maximise its profit function, dependent on its estimate of the other player's output:

$$\max_{q_i} \Pi(q_i, \hat{q}_{-i}) = (\alpha - \beta Q) q_i - C(q_i).$$

Footnote 2: Asymptotic stability is defined as an arbitrarily small area around the equilibrium in which there will always be convergence towards the equilibrium. If this area is extremely small, the equilibrium is obviously not likely to be converged upon in real-life learning processes.

Simple optimisation gives the following linear relationship between a firm's strategic variable and its estimate of the competitor's quantity:

$$q_i = R_i(\hat{q}_{-i}) = a - b\,\hat{q}_{-i},$$

where $a = (\alpha - \gamma)/(2(\beta + \delta))$ and $b = \beta/(2(\beta + \delta))$. Note from this notation that the reaction functions are linear. To guarantee an interior Nash equilibrium, it is required that the reaction functions be monotonically decreasing, and as such $\alpha - \gamma > 0$ and $\beta + \delta > 0$ are imposed.
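For reference, the optimisation step can be written out explicitly. The derivation below is a sketch that uses only the symbols defined above and assumes an interior optimum; the second-order condition is where the requirement $\beta + \delta > 0$ enters.

```latex
\begin{align*}
\Pi(q_i, \hat{q}_{-i}) &= \bigl(\alpha - \beta(q_i + \hat{q}_{-i})\bigr) q_i - \gamma q_i - \delta q_i^2, \\
\frac{\partial \Pi}{\partial q_i} &= \alpha - \gamma - \beta \hat{q}_{-i} - 2(\beta + \delta)\, q_i = 0, \\
q_i &= \frac{\alpha - \gamma}{2(\beta + \delta)} - \frac{\beta}{2(\beta + \delta)}\, \hat{q}_{-i} = a - b\, \hat{q}_{-i}, \\
\frac{\partial^2 \Pi}{\partial q_i^2} &= -2(\beta + \delta) < 0 .
\end{align*}
```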

From equating the reaction curves it follows that the interior Nash equilibrium is given by

$$q_i^* = (a - ba)/(1 - b^2).$$

Note that for $b \geq 1$, the curves intersect on the axes and boundary equilibria appear. Whenever $\delta$ gets sufficiently negative, thus making marginal costs more strongly decreasing, $b$ will become sufficiently large to make the denominator negative (footnote 3).

In the interval of $\delta$ for which $b^2 < 1$ there is a type I duopoly and in the interval for which $b^2 > 1$ there is a type II duopoly. This can be understood intuitively by rewriting the reaction functions in equation 2.1, solving both for $q_1$ (or both for $q_2$), and equating their derivatives. When the derivatives are equal, the critical value is found for which the functions also intersect ‘from below’. Examining the parameter $b = \beta/(2(\beta + \delta))$, straightforward derivation yields the following condition on the parameter $\delta$ for boundary equilibria to exist:

$$\beta < |2\delta|. \tag{1}$$

This condition will return often as a critical point for stability and as such will be dubbed accordingly:

Condition 1 for a type II Duopoly Whenever β < |2δ| holds, there is a type II duopoly.

To determine the boundary equilibria, one can simply insert $q_{-i} = 0$ into the reaction function and find the intersecting quantity to be $q_i = (\alpha - \gamma)/(2(\beta + \delta))$. As such the boundary equilibria are given by

$$\left(\frac{\alpha - \gamma}{2(\beta + \delta)},\ 0\right), \qquad \left(0,\ \frac{\alpha - \gamma}{2(\beta + \delta)}\right).$$
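The definitions of this section can be collected in a few lines of code. The following is a minimal sketch, not part of the original analysis: the function names and the parameter values in the usage note are illustrative assumptions, and the type II test additionally assumes $\delta < 0$, which is the case the condition $\beta < |2\delta|$ is meant for.

```python
def cournot_parameters(alpha, beta, gamma, delta):
    """Return (a, b) of the linear reaction function q_i = a - b * q_other."""
    if alpha - gamma <= 0 or beta + delta <= 0:
        raise ValueError("need alpha - gamma > 0 and beta + delta > 0")
    a = (alpha - gamma) / (2.0 * (beta + delta))
    b = beta / (2.0 * (beta + delta))
    return a, b


def classify_duopoly(beta, delta):
    """Condition 1: type II when beta < |2*delta| (assuming delta < 0), else type I."""
    return "II" if (delta < 0 and beta < abs(2.0 * delta)) else "I"


def equilibria(alpha, beta, gamma, delta):
    """Interior equilibrium, plus the two boundary equilibria in the type II case."""
    a, b = cournot_parameters(alpha, beta, gamma, delta)
    q_star = a / (1.0 + b)  # equals (a - b*a) / (1 - b**2) whenever b != 1
    result = {"type": classify_duopoly(beta, delta), "interior": (q_star, q_star)}
    if result["type"] == "II":
        result["boundary"] = [(a, 0.0), (0.0, a)]
    return result
```

For instance, `equilibria(10, 1, 6, -0.55)` describes a type II duopoly under these hypothetical parameter values, while `equilibria(10, 1, 6, -0.25)` describes a type I duopoly.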

With these Nash equilibria and conditions for the two types of duopoly in place, analysis can be done for the different behavioural rules. The following section will discuss the stability of the above equilibria for the best-reply dynamics, section four will examine them for the gradient dynamics.

3 Best-Reply Learning Dynamics

This section is dedicated to the aforementioned types of duopoly when firms apply the best-reply learning process. The best-reply strategy is viewed as ‘naïve’, since it expects competing firms to play the exact same quantity in the coming time period as they did in the previous one. This is modelled through the decision to optimise output conditional on last period's quantity levels. This adaptive process can be expanded by incorporating a sense of hesitance on the part of the firm in changing its output level, called ‘the level of inertia’.

3.1 The Model

Cournot (1838) postulates the now-named best-reply dynamics [4], which are a purely ‘myopic’ learning process. The firm uses its reaction function, provided that it knows it, conditional on last period's quantities of its competitors. This yields the following dynamical system:

$$\begin{aligned} q_{1,t} &= R_1(q_{2,t-1}), \\ q_{2,t} &= R_2(q_{1,t-1}), \end{aligned} \tag{2}$$

where $R_i(\cdot)$ is the best-reply function for firm $i$. From Theocharis (1960) it follows that the system with constant marginal costs yields the following solution for a duopoly (footnote 4):

$$q_i^* = A_1 (1/2)^t + B_1 (-1/2)^t + (\alpha - 3\gamma)/3\beta,$$

which converges to the final term as $t$ grows. $A_1$ and $B_1$ depend on the starting values, but do not influence the equilibrium that is eventually reached. When expanding the system to a three-player game, the following solution is found:

$$q_i^* = A_1 (1/2)^t + B_1 (-1)^t + (\alpha - 5\gamma)/3\beta,$$

which oscillates around some value, given by the final term in the equation. From this it becomes clear that a complex situation arises when there are three or more firms in the game. For this reason, this paper will go to great lengths to keep the game in a duopoly format.
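The contrast between two and three firms can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than Theocharis's own formulation: it uses the textbook best reply for linear demand $P = \alpha - \beta Q$ with a common constant marginal cost (called `cost` here), and the function names, starting values and parameter values are hypothetical.

```python
def best_reply_step(quantities, alpha, beta, cost):
    """One simultaneous best-reply update for n firms with constant marginal cost."""
    total = sum(quantities)
    return [max(0.0, (alpha - cost - beta * (total - q)) / (2.0 * beta))
            for q in quantities]


def simulate(start, periods, alpha=10.0, beta=1.0, cost=1.0):
    """Iterate the best-reply map and return the whole path of quantity vectors."""
    q = list(start)
    path = [q]
    for _ in range(periods):
        q = best_reply_step(q, alpha, beta, cost)
        path.append(q)
    return path


def nash_quantity(n_firms, alpha=10.0, beta=1.0, cost=1.0):
    """Symmetric Cournot-Nash quantity for n firms in this textbook setting."""
    return (alpha - cost) / ((n_firms + 1) * beta)


duopoly = simulate([1.0, 4.0], periods=60)
triopoly = simulate([2.0, 2.1, 2.3], periods=60)
```

With two firms the path settles on `nash_quantity(2)`. With three firms the deviation of total output from its equilibrium level changes sign every period without shrinking, which is the persistent oscillation described above.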

As has been mentioned, the best-reply dynamics can be expanded with the introduction of inertia, as proposed by Bischi and Kopel (2001) [3]. This inertia represents hesitant behaviour on the part of the firm in changing quantities. The parameter $\psi \in [0, 1]$ describes the tendency towards fully adapting to the newly derived quantity, which is the case for $\psi = 1$, or only partially adjusting the quantity of last period. The first will be dubbed the ‘Classical Best-Reply’ model, while for $\psi \neq 1$ the model will be called the ‘Best-Reply with Inertia’ model. This model has the following dynamical system:

$$\begin{aligned} q_{1,t+1} &= (1 - \psi_1)\, q_{1,t} + \psi_1 R_1(q_{2,t}), \\ q_{2,t+1} &= (1 - \psi_2)\, q_{2,t} + \psi_2 R_2(q_{1,t}). \end{aligned} \tag{3}$$

Future analyses will be done for $\psi_i = 1$ first, after which the effect of differing levels of inertia is examined.

Footnote 4: Note that in Theocharis (1960) there are different constant marginal costs taken into

3.2 Stability of the Nash Equilibria

The Nash equilibria that are defined in section 2 will be examined in the type I and type II case. Stability of the equilibria is found by evaluating the eigenvalues of the Jacobian matrices of the dynamical systems. In the simple dynamical system (2), the Jacobian matrix is given by:

$$J^* = \begin{pmatrix} 0 & R'(q^*_{-1,t}) \\ R'(q^*_{-2,t}) & 0 \end{pmatrix}. \tag{4}$$

In order to evaluate stability, it is of importance to note that the derivatives of the reaction functions are not continuous, which is obvious from figure 1. To circumvent having to analyse the derivatives in these kinks, the figure is divided into nine sets. This strategy has been applied by Bischi et al. in their 2010 book [2] and will be reproduced in this paper (footnote 5). In every subset the functions will be continuous. This is illustrated in figure 2. The dividing values can be found by close analysis of figure 1, which provides the following definition of the best-reply function:

$$R_i(q_{-i}) = \begin{cases} 0 & \text{if } q_{-i} \geq q^L_{-i}, \\ L_i & \text{if } q_{-i} < \dfrac{a - L_i}{b}, \\ a - b\, q_{-i} & \text{otherwise,} \end{cases} \tag{5}$$

where $q^L_i$ is the intersection for which firm $-i$ stops producing, $R_{-i}(q^L_i) = 0$, and $(a - L_i)/b$ is the quantity $q_{-i}$ for which firm $i$'s best-reply function gives the maximum quantity producible. When analysing this reaction function it becomes immediately clear that the function is piecewise linear. Whenever the value $q^L_i$ is greater than the monopoly quantity $q^m$, the monopoly quantity will never be played since it is not an intersection of the reaction curves. Whenever the maximum quantity $L_i$ is smaller than the monopoly value $q^m$, it will also never be a feasible equilibrium. In order for (5) to have three distinct parts, it is necessary that $q^L_i < q^m < L_i$. Assuming that this is the case, analysis can be done per subset of the total strategy set $[0, L_1] \times [0, L_2]$, where kinks are located on the boundaries of these subsets (footnote 6). The following nine variants of the system defined in (2) are found, where $\psi \in [0, 1]$ is the inertia parameter:

$$T_\psi|_{D_1}: \begin{cases} q_{1,t+1} = (1 - \psi_1) q_{1,t} + \psi_1 (a - b q_{2,t}), \\ q_{2,t+1} = (1 - \psi_2) q_{2,t} + \psi_2 (a - b q_{1,t}), \end{cases}$$

$$T_\psi|_{D_2}: \begin{cases} q_{1,t+1} = (1 - \psi_1) q_{1,t} + \psi_1 (a - b q_{2,t}), \\ q_{2,t+1} = (1 - \psi_2) q_{2,t} + \psi_2 \cdot L_2, \end{cases}$$

$$T_\psi|_{D_3}: \begin{cases} q_{1,t+1} = (1 - \psi_1) q_{1,t} + \psi_1 \cdot 0, \\ q_{2,t+1} = (1 - \psi_2) q_{2,t} + \psi_2 \cdot L_2, \end{cases}$$

$$T_\psi|_{D_4}: \begin{cases} q_{1,t+1} = (1 - \psi_1) q_{1,t} + \psi_1 \cdot 0, \\ q_{2,t+1} = (1 - \psi_2) q_{2,t} + \psi_2 (a - b q_{1,t}), \end{cases}$$

$$T_\psi|_{D_5}: \begin{cases} q_{1,t+1} = (1 - \psi_1) q_{1,t} + \psi_1 \cdot 0, \\ q_{2,t+1} = (1 - \psi_2) q_{2,t} + \psi_2 \cdot 0, \end{cases}$$

$$T_\psi|_{D_6}: \begin{cases} q_{1,t+1} = (1 - \psi_1) q_{1,t} + \psi_1 (a - b q_{2,t}), \\ q_{2,t+1} = (1 - \psi_2) q_{2,t} + \psi_2 \cdot 0, \end{cases}$$

$$T_\psi|_{D_7}: \begin{cases} q_{1,t+1} = (1 - \psi_1) q_{1,t} + \psi_1 \cdot L_1, \\ q_{2,t+1} = (1 - \psi_2) q_{2,t} + \psi_2 \cdot 0, \end{cases}$$

$$T_\psi|_{D_8}: \begin{cases} q_{1,t+1} = (1 - \psi_1) q_{1,t} + \psi_1 \cdot L_1, \\ q_{2,t+1} = (1 - \psi_2) q_{2,t} + \psi_2 (a - b q_{1,t}), \end{cases}$$

$$T_\psi|_{D_9}: \begin{cases} q_{1,t+1} = (1 - \psi_1) q_{1,t} + \psi_1 \cdot L_1, \\ q_{2,t+1} = (1 - \psi_2) q_{2,t} + \psi_2 \cdot L_2. \end{cases}$$

Footnote 5: This book also derives the stability properties of the interior and boundary equilibria for the best-reply duopoly. The derivation is nevertheless included and examined here, since it shows the importance of condition one and is necessary to understand the derivation of the two-cycle in the next sub-section.

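When evaluating the derivatives of the above functions, it becomes clear that there are only a handful of possible derivatives. If the second term of the system is the constant $\psi_i L_i$ or zero, the derivative is either $1 - \psi_i$ (with respect to $q_i$) or zero (with respect to $q_{-i}$), and it does not exist whenever $q_{-i} = q^m$. If the second term is not constant but given by the best-reply curve (times $\psi_i$), the derivatives are $1 - \psi_i$ and $-b\psi_i$. The diagonal elements are therefore always equal to $1 - \psi_i$, and the off-diagonal elements can only be zero or $-b\psi_i$. This yields the following four possible Jacobian matrices:

$$J^{(1)} = \begin{pmatrix} 1 - \psi_1 & -\psi_1 b \\ -\psi_2 b & 1 - \psi_2 \end{pmatrix}; \qquad J^{(2)} = J^{(6)} = \begin{pmatrix} 1 - \psi_1 & -\psi_1 b \\ 0 & 1 - \psi_2 \end{pmatrix};$$

$$J^{(4)} = J^{(8)} = \begin{pmatrix} 1 - \psi_1 & 0 \\ -\psi_2 b & 1 - \psi_2 \end{pmatrix}; \qquad J^{(3)} = J^{(5)} = J^{(7)} = J^{(9)} = \begin{pmatrix} 1 - \psi_1 & 0 \\ 0 & 1 - \psi_2 \end{pmatrix}. \tag{6}$$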

Footnote 6: Dependent on the values of the dividing lines, it is possible that only four subsets exist since non-negativity of quantities must hold in Cournot competition. Imagine for instance that the value a−Li

Figure 2: The strategy set [0, L1] × [0, L2] divided in sub-sets.

Interior Equilibrium. Examining the value of the interior equilibrium it becomes clear that it is located in the first set, with accompanying Jacobian:

$$J^{(1)} = \begin{pmatrix} 1 - \psi_1 & -\psi_1 b \\ -\psi_2 b & 1 - \psi_2 \end{pmatrix}.$$

This Jacobian matrix has the characteristic equation $\lambda^2 + (-2 + \psi_1 + \psi_2)\lambda + (1 - \psi_1)(1 - \psi_2) - \psi_1 \psi_2 b^2 = 0$. This can be rewritten in the form $\lambda^2 + p\lambda + q = 0$ for

$$p = -2 + \psi_1 + \psi_2, \qquad q = (1 - \psi_1)(1 - \psi_2) - \psi_1 \psi_2 b^2.$$

The following lemma is applied to examine the eigenvalues:

Lemma 1 When considering a quadratic polynomial $\lambda^2 + p\lambda + q = 0$, all roots of the polynomial lie within the unit circle if and only if $1 + p + q > 0$, $1 - p + q > 0$ and $q < 1$.
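Lemma 1 can be cross-checked against a direct computation of the roots. The following sketch uses hypothetical helper names and only the standard library; it compares the three inequalities with the moduli of the roots obtained from the quadratic formula, using the coefficients $p$ and $q$ of $J^{(1)}$ given above.

```python
import cmath


def roots_inside_unit_circle(p, q):
    """Lemma 1: the three inequalities for lambda^2 + p*lambda + q = 0."""
    return (1 + p + q > 0) and (1 - p + q > 0) and (q < 1)


def quadratic_roots(p, q):
    """Both roots of lambda^2 + p*lambda + q = 0, possibly complex."""
    disc = cmath.sqrt(p * p - 4 * q)
    return (-p + disc) / 2, (-p - disc) / 2


def interior_coefficients(psi1, psi2, b):
    """Coefficients p and q of the characteristic polynomial of J^(1)."""
    p = -2 + psi1 + psi2
    q = (1 - psi1) * (1 - psi2) - psi1 * psi2 * b * b
    return p, q


def agrees_with_direct_check(psi1, psi2, b):
    """True when Lemma 1 and the explicit root moduli give the same verdict."""
    p, q = interior_coefficients(psi1, psi2, b)
    inside = all(abs(root) < 1 for root in quadratic_roots(p, q))
    return roots_inside_unit_circle(p, q) == inside
```

Calling `agrees_with_direct_check(0.5, 0.8, 0.9)` or `agrees_with_direct_check(0.5, 0.8, 1.1)` compares the two approaches for one parameter combination on either side of $b = 1$. Parameter values that put a root exactly on the unit circle are best avoided in such a floating-point comparison.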

From this, it follows that the following restrictions must hold for stability:

$$\begin{aligned} q &= (1 - \psi_1)(1 - \psi_2) - \psi_1 \psi_2 b^2 < 1, \\ 1 + p + q &= 1 - 2 + \psi_1 + \psi_2 + (1 - \psi_1)(1 - \psi_2) - \psi_1 \psi_2 b^2 > 0, \\ 1 - p + q &= 1 + 2 - \psi_1 - \psi_2 + (1 - \psi_1)(1 - \psi_2) - \psi_1 \psi_2 b^2 > 0, \end{aligned}$$

where the first inequality holds since $\psi_i \in [0, 1]$ and thus the first term is less than one and the second term negative. The second inequality can be rewritten as $\psi_1 \psi_2 - \psi_1 \psi_2 b^2 > 0$, from which $b^2 < 1$ follows. The third inequality is implied by the second, since it can be rewritten as $4 - 2(\psi_1 + \psi_2) + (1 - b^2)\psi_1 \psi_2 > 0$, which holds for any value of $\psi_{1,2} \in [0, 1]$ if $b^2 < 1$.

It follows that, irrespective of the levels of inertia, stability of the interior equilibrium is solely dependent on the parameters $\beta$ and $\delta$ that define the value of $b$. The stability threshold coincides with condition one for a type II duopoly. This implies that whenever boundary equilibria appear ($|b| > 1$), the interior equilibrium becomes unstable.

Boundary Equilibria. When examining the boundary equilibria, analysis is done in subsets 2, 4, 6 or 8, with accompanying Jacobian matrices:

$$J^{(4)} = J^{(8)} = \begin{pmatrix} 1 - \psi_1 & 0 \\ -\psi_2 b & 1 - \psi_2 \end{pmatrix}, \qquad J^{(2)} = J^{(6)} = \begin{pmatrix} 1 - \psi_1 & -\psi_1 b \\ 0 & 1 - \psi_2 \end{pmatrix}.$$

Applying the same strategy as for the interior equilibrium, the characteristic equation $(1 - \psi_1 - \lambda)(1 - \psi_2 - \lambda) = 0$ (the product of the off-diagonal elements vanishes, since one of them is zero) is written in the form $\lambda^2 + p\lambda + q = 0$, giving the following values for $p$ and $q$:

$$p = -2 + \psi_1 + \psi_2, \qquad q = 1 - \psi_1 - \psi_2 + \psi_1 \psi_2.$$

Lemma 1 then yields the following conditions for stability:

$$\begin{aligned} q - 1 &= \psi_1 \psi_2 - \psi_1 - \psi_2 < 0, \\ 1 + p + q &= \psi_1 \psi_2 > 0, \\ 1 - p + q &= 4 - 2(\psi_1 + \psi_2) + \psi_1 \psi_2 > 0. \end{aligned} \tag{7}$$

The first inequality holds for $\psi_i \in (0, 1]$. The second holds whenever $\psi_i > 0$. The last inequality also holds, since $\psi_i \in [0, 1]$ and the left-hand side is minimal for $\psi_{1,2} = 1$. For this value, the first two terms cancel out but the third is still positive. Note that the same results follow for all four Jacobian matrices, since they have the same characteristic equation.

The boundary equilibria are always locally stable, but are not fixed points of the map whenever condition one is violated. In effect, stability is transferred from the interior equilibrium to the boundary equilibria whenever there is a type II duopoly.
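The transfer of stability can be illustrated by iterating the best-reply map with the piecewise reaction function of (5). The code below is a minimal sketch under stated assumptions: both firms share the same capacity (named `capacity`, standing in for $L_1 = L_2$), the capacity exceeds $a$, and the parameter values in the usage note are hypothetical.

```python
def reaction(q_other, a, b, capacity):
    """Piecewise-linear best reply of (5): a - b*q_other clipped to [0, capacity]."""
    return min(capacity, max(0.0, a - b * q_other))


def best_reply_path(q1, q2, a, b, capacity, psi1=1.0, psi2=1.0, periods=200):
    """Iterate the best-reply map with inertia and return the final quantities."""
    for _ in range(periods):
        q1, q2 = ((1 - psi1) * q1 + psi1 * reaction(q2, a, b, capacity),
                  (1 - psi2) * q2 + psi2 * reaction(q1, a, b, capacity))
    return q1, q2


# Type I example (b < 1): asymmetric start, classical best reply.
type_one = best_reply_path(0.5, 4.0, a=4 / 1.5, b=1 / 1.5, capacity=5.0)

# Type II example (b > 1): asymmetric start, identical inertia of 0.5.
type_two = best_reply_path(1.0, 3.0, a=4 / 0.9, b=1 / 0.9, capacity=5.0,
                           psi1=0.5, psi2=0.5)
```

In the type I run both quantities approach the interior value $a/(1+b)$. In the type II run the firm that starts with the smaller quantity is driven towards zero, while the other approaches the boundary quantity $a$.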

3.3 Stability of the Two-Cycle

Cox and Walker proposed the existence of an alternating equilibrium. When examining the value $(q^m, q^m)$ and inserting it into the dynamical system (2), it becomes clear that next period's quantities will be $(0, 0)$. The period after that will obviously be $(q^m, q^m)$ again. This two-cycle will be analysed for stability. To this end, the system is rewritten as a two-period dynamical system by substituting $q_{i,t} = R(q_{-i,t-1})$, which follows directly from the best-reply rule. This gives the following two-period mapping:

$$\begin{aligned} q_{1,t} &= R(R(q_{1,t-2})), \\ q_{2,t} &= R(R(q_{2,t-2})). \end{aligned}$$

This system can obviously be expanded with the inertia parameter ψ. Similar to the previous sub-section and expanding on the findings of Bischi et al., the Jacobian matrix for the two-cycle will be defined and analysed.

Classical Best-Reply. When setting the inertia parameter to $\psi_{1,2} = 1$, the following Jacobian matrix is found:

$$J = \begin{pmatrix} \dfrac{\partial R_1(R_2(q_1))}{\partial q_1} & 0 \\ 0 & \dfrac{\partial R_2(R_1(q_2))}{\partial q_2} \end{pmatrix}.$$

From the reaction function defined in (5) it becomes clear that these derivatives must both be zero when evaluated at the points $(0, 0)$ and $(q^m, q^m)$, and thus the two-cycle for the classical best-reply model is stable (footnote 7).
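The alternating equilibrium of the classical best-reply model can be traced directly. The sketch below assumes a type II duopoly ($b > 1$), a common capacity above $a$, and that the monopoly quantity equals $R(0) = a$; the function names and parameter values are illustrative.

```python
def reaction(q_other, a, b, capacity):
    """Piecewise-linear best reply of (5): a - b*q_other clipped to [0, capacity]."""
    return min(capacity, max(0.0, a - b * q_other))


def classical_best_reply(q1, q2, a, b, capacity, periods):
    """Path of the classical (psi = 1) best-reply dynamics of system (2)."""
    path = [(q1, q2)]
    for _ in range(periods):
        q1, q2 = reaction(q2, a, b, capacity), reaction(q1, a, b, capacity)
        path.append((q1, q2))
    return path


a, b, capacity = 4 / 0.9, 1 / 0.9, 5.0
on_cycle = classical_best_reply(a, a, a, b, capacity, periods=4)
perturbed = classical_best_reply(a - 0.1, a + 0.05, a, b, capacity, periods=4)
```

Starting exactly on the cycle, the path alternates between $(a, a)$ and $(0, 0)$. The slightly perturbed start is mapped to $(0, 0)$ in one step and is back on the cycle after two, which is what a zero derivative at both points of the cycle implies.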

Best-Reply with Inertia. For simplicity, inertia is considered to be identical: $\psi_1 = \psi_2 = \psi < 1$. Whenever inertia is introduced, the fixed points of the two-cycle ‘shrink’ along the diagonal. This can be understood intuitively, since it still holds that $R(0) = q^m$, but the reaction is damped by the inertia. For instance, when $(0, 0)$ is played, the reaction will be $(\psi q^m, \psi q^m)$ instead of $(q^m, q^m)$. This means that two new fixed points define the two-cycle. These points no longer benefit from the aforementioned unconditional stability, since the derivative of the reaction function evaluated at that specific point is no longer zero. In order to analyse the two-cycle in this case, the dynamical system is rewritten in the following form:

$$\begin{aligned} q_{1,t+1} &= (1 - \psi)^2 q_{1,t-1} + (1 - \psi)\psi R(q_{2,t-1}) + \psi R\bigl((1 - \psi) q_{2,t-1} + \psi R(q_{1,t-1})\bigr), \\ q_{2,t+1} &= (1 - \psi)^2 q_{2,t-1} + (1 - \psi)\psi R(q_{1,t-1}) + \psi R\bigl((1 - \psi) q_{1,t-1} + \psi R(q_{2,t-1})\bigr). \end{aligned} \tag{8}$$

Whenever the system is evaluated in full form and the Jacobian matrix is examined, the following eigenvalues are found (footnote 8):

$$\lambda_1 = (\psi(b - 1) + 1)^2, \qquad \lambda_2 = (\psi + \psi b - 1)^2.$$

Footnote 7: Full derivation has been omitted, since the results are obvious.

Footnote 8: Note that the derivative at points between $0$ and $q^m$ is always $-b$.

One of these is strictly larger than one for any $\psi$, meaning the two-cycle is strictly unstable. However, when taking the derivatives of the system (8), it is found that the top row of the Jacobian gets the following form (footnote 9):

$$\begin{aligned} J(1, 1) &= (1 - \psi)^2 + \psi^2 R'\bigl((1 - \psi) q_{2,t-1} + \psi R(q_{1,t-1})\bigr) R'(q_{1,t-1}), \\ J(1, 2) &= (1 - \psi)\psi R'(q_{2,t-1}) + \psi (1 - \psi) R'\bigl((1 - \psi) q_{2,t-1} + \psi R(q_{1,t-1})\bigr). \end{aligned}$$

This matrix ends up yielding the same results as presented in the previous case, except when the term $(1 - \psi) q_{2,t-1} + \psi R(q_{1,t-1})$ is less than zero or more than the monopoly quantity. When this is the case, it follows from (5) that the derivatives at these points are zero, yielding the following Jacobian matrix (footnote 10):

$$J = \begin{pmatrix} (1 - \psi)^2 & -(1 - \psi)\psi b \\ -(1 - \psi)\psi b & (1 - \psi)^2 \end{pmatrix},$$

with accompanying eigenvalues:

$$\lambda_1 = (\psi - 1)(\psi + \psi b - 1), \qquad \lambda_2 = -(\psi - 1)(\psi(b - 1) + 1).$$

These eigenvalues lie within the unit circle and thus the two-cycle is stable. As such, the stability of the two-cycle depends primarily on the value of $(1 - \psi) q_{2,t-1} + \psi R(q_{1,t-1})$ evaluated at the two-cycle. This two-cycle is defined by $(0, 0)$ and $(q', q')$. The point $(0, 0)$ is obviously still stable by the same reasoning as in the classical best-reply case, making it only necessary to examine $q'$, which is given by (footnote 11):

$$q' = \psi a. \tag{9}$$

From this it is finally possible to define the stability of the two-cycle as a function of the parameters $a$, $b$ and $\psi$. By inserting the non-zero fixed point $q'$ into $(1 - \psi) q_{2,t-1} + \psi R(q_{1,t-1})$, the stability of the two-cycle is given whenever:

$$(1 - \psi)\psi a + \psi(a - b\psi a) \leq 0. \tag{10}$$

Closed Set for Symmetric Starting Values. This paragraph is added to note the following: for any best-reply duopoly with identical levels of inertia ($\psi_1 = \psi_2$), the equality $q_{1,t} = q_{2,t}$ at any time period $t$ can only have been reached if $q_1 = q_2$ in every previous time period, and will continue to hold indefinitely.

Lemma 2 For any duopoly where two firms apply identical behaviour $x_{t+1} = f(x_t)$ and choose identical starting values, the firms will experience the same trajectory. As such, if $q_{1,t} = q_{2,t}$ for any period $t$, it is the case for every $t$.

This means that there will always be a closed set that will never reach any of the boundary equilibria.
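The invariance of the diagonal can be checked in a few lines. This is a small sketch with hypothetical names, assuming the piecewise reaction function of (5) with a common capacity and identical inertia for both firms.

```python
def reaction(q_other, a, b, capacity):
    """Piecewise-linear best reply of (5): a - b*q_other clipped to [0, capacity]."""
    return min(capacity, max(0.0, a - b * q_other))


def step(q1, q2, a, b, capacity, psi):
    """One period of the best-reply map with identical inertia psi."""
    return ((1 - psi) * q1 + psi * reaction(q2, a, b, capacity),
            (1 - psi) * q2 + psi * reaction(q1, a, b, capacity))


def stays_symmetric(q_start, a, b, capacity, psi, periods=100):
    """True when a symmetric start (q_start, q_start) never leaves the diagonal."""
    q1 = q2 = q_start
    for _ in range(periods):
        q1, q2 = step(q1, q2, a, b, capacity, psi)
        if q1 != q2:
            return False
    return True
```

Because both coordinates are updated by the same expression with the same inputs, `stays_symmetric` returns `True` for any symmetric start, for example `stays_symmetric(1.0, 4 / 0.9, 1 / 0.9, 5.0, 0.5)`.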

Footnote 9: Note that due to symmetry, the bottom values will be mirrored.

Footnote 10: Note that the quantities need to exceed $q^m$ or be negative in order for the term to be.

Footnote 11: This is found by inserting $(q', q')$ into the dynamical system and equating to $(0, 0)$.

4 Gradient Learning Dynamics

In this section gradient learning will be introduced. First the model will be defined and the reasoning behind the strategy will be touched upon briefly. It will then be verified that the Nash equilibria of the previous sections are fixed points of the gradient mapping. When this is the case, their stability will be analysed accordingly, applying a similar strategy as in the previous section.

4.1 The Model

Gradient learning is focused around the slope of a firm’s profit function, con-ditional on the firm’s strategic variable. This means that if there was a higher profit to be found last period, the firm will change its quantity in that direction. The profit function in a Cournot duopoly, discussed in section two, is given by:

πi(qi, q−i) = (α − β(qi+ q−i))qi− (γqi+ δq2i),

the derivative of which is given by: δiπi(qi, q−i)

δiqi

= α − βq−i− 2βqi− γ + 2δqi. (11)

The size of the adjustment in this profit-increasing direction is given by the derivative (11), multiplied by the previous quantity and by a parameter φ that denotes the speed of adjustment.12 This yields the following dynamical system for a homogeneous gradient-learning duopoly:

$$q_{1,t+1} = q_{1,t} + \phi\, q_{1,t}\,(\alpha - 2q_{1,t}(\beta+\delta) - \beta q_{2,t} - \gamma),$$

$$q_{2,t+1} = q_{2,t} + \phi\, q_{2,t}\,(\alpha - 2q_{2,t}(\beta+\delta) - \beta q_{1,t} - \gamma).$$
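As a minimal sketch, the map can be iterated numerically. The parameter values are those the paper sets in section 5.1 for the type I case; the function names `gradient_step` and `simulate` are hypothetical helpers introduced here, not part of the original analysis.

```python
# Parameter values from section 5.1 of the paper (type I case, delta = -11).
ALPHA, BETA, GAMMA, DELTA = 450.0, 30.0, 275.0, -11.0
PHI = 0.01  # speed of adjustment used in the paper


def gradient_step(q1, q2, phi=PHI, delta=DELTA):
    """One iteration of the homogeneous gradient-learning map."""
    slope1 = ALPHA - 2.0 * q1 * (BETA + delta) - BETA * q2 - GAMMA
    slope2 = ALPHA - 2.0 * q2 * (BETA + delta) - BETA * q1 - GAMMA
    return q1 + phi * q1 * slope1, q2 + phi * q2 * slope2


def simulate(q1, q2, steps=200, **kwargs):
    """Iterate the map and return the whole trajectory (hypothetical helper)."""
    path = [(q1, q2)]
    for _ in range(steps):
        q1, q2 = gradient_step(q1, q2, **kwargs)
        path.append((q1, q2))
    return path


if __name__ == "__main__":
    # Symmetric start close to the interior equilibrium.
    print(simulate(2.5, 2.5)[-1])
```

Starting from (2.5, 2.5), the trajectory settles on the symmetric point (α − γ)/(3β + 2δ), which solves q = R(q) for identical firms.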

Fixed Points of the System It is first of importance to investigate whether the interior equilibrium and the boundary equilibria are fixed points of the system. If they are not, it would be useless to examine their stability. For any positive quantity that maximises profit given the other firm’s output, qi = Ri(q−i), the derivative is zero, so the gradient rule makes no further adjustment. A firm that produces nothing also stays at zero, because the adjustment term is multiplied by its own quantity. Since the boundary equilibria and the interior equilibrium are intersections of the reaction curves, they are necessarily also fixed points of the gradient mapping.
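The fixed-point property can be checked exactly with rational arithmetic. The sketch below assumes the parameter values of section 5.1 and uses the closed forms q∗ = (α − γ)/(3β + 2δ) and qm = (α − γ)/(2(β + δ)) that follow from the first-order condition; the helper names are hypothetical.

```python
from fractions import Fraction as F

ALPHA, BETA, GAMMA = F(450), F(30), F(275)


def gradient_step(q1, q2, delta, phi=F(1, 100)):
    """One iteration of the gradient map, in exact rational arithmetic."""
    s1 = ALPHA - 2 * q1 * (BETA + delta) - BETA * q2 - GAMMA
    s2 = ALPHA - 2 * q2 * (BETA + delta) - BETA * q1 - GAMMA
    return q1 + phi * q1 * s1, q2 + phi * q2 * s2


def candidate_equilibria(delta):
    """Interior equilibrium and the two boundary equilibria."""
    q_star = (ALPHA - GAMMA) / (3 * BETA + 2 * delta)  # solves q = R(q)
    q_m = (ALPHA - GAMMA) / (2 * (BETA + delta))       # best reply to zero output
    return [(q_star, q_star), (q_m, F(0)), (F(0), q_m)]


def is_fixed_point(point, delta):
    """True if the map leaves the point unchanged."""
    return gradient_step(point[0], point[1], delta) == point


if __name__ == "__main__":
    for delta in (F(-11), F(-17)):
        print(delta, [is_fixed_point(p, delta) for p in candidate_equilibria(delta)])
```

Because the check uses fractions rather than floats, equality is exact and does not depend on a tolerance.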

4.2 Stability of the Nash Equilibria

To evaluate the stability of the equilibria (interior and boundary), the Jacobian matrix will again be determined and its eigenvalues examined. For the interior equilibrium, this paper will once again follow the strategy applied by Bischi et al. in their 2010 book [2] and will then continue to verify the stability of the boundary equilibria.

12This obviously has some restrictions on the parameter values, i.e. if the derivative would

Firstly, note that it is not necessary to divide the strategy space into subsets, since there are no kinks in the function. Evaluating the partial derivatives yields the following Jacobian matrix for the gradient-learning duopoly:

$$J = \begin{pmatrix} 1 + \phi(\alpha - \beta q_2 - \gamma - 4q_1(\beta+\delta)) & -\beta\phi q_1 \\ -\beta\phi q_2 & 1 + \phi(\alpha - \beta q_1 - \gamma - 4q_2(\beta+\delta)) \end{pmatrix}.$$

Interior Equilibrium When evaluating the Jacobian matrix at the interior equilibrium (q1∗, q2∗), the quantity q2∗ can be rewritten as a function of q1∗, since it necessarily holds at the equilibrium that q2 = R(q1). Substituting this into the Jacobian and rewriting gives the following matrix:

$$J|_{(q_1^*, q_2^*)} = \begin{pmatrix} 1 - 2\phi(\beta+\delta)q_1^* & -\phi\beta q_1^* \\ -\phi\beta q_2^* & 1 - 2\phi(\beta+\delta)q_2^* \end{pmatrix}.$$

Applying lemma 1 gives the following values for p and q:

$$q = (1 - 2\phi(\beta+\delta)q_1^*)(1 - 2\phi(\beta+\delta)q_2^*) - \phi^2\beta^2 q_1^* q_2^*,$$

$$p = -2 + 2\phi(\beta+\delta)q_1^* + 2\phi(\beta+\delta)q_2^*.$$

For the eigenvalues to lie within the unit circle, the following conditions must hold:

$$q = (1 - 2\phi(\beta+\delta)q_1^*)(1 - 2\phi(\beta+\delta)q_2^*) - \phi^2\beta^2 q_1^* q_2^* < 1,$$

$$q + p + 1 = 4\phi^2(\beta+\delta)^2 q_1^* q_2^* - \phi^2\beta^2 q_1^* q_2^* > 0,$$

$$q - p + 1 = 4 - 4(\phi(\beta+\delta)q_1^* + \phi(\beta+\delta)q_2^*) + 4\phi^2(\beta+\delta)^2 q_1^* q_2^* - \phi^2\beta^2 q_1^* q_2^* > 0.$$

These conditions can be rewritten in the following forms:

$$q = (1 - 2\phi(\beta+\delta)q_1^*)(1 - 2\phi(\beta+\delta)q_2^*) - \phi^2\beta^2 q_1^* q_2^* < 1,$$

$$q + p + 1 > 0 \iff 4(\beta+\delta)^2 > \beta^2,$$

$$q - p + 1 = \phi^2 q_1^* q_2^*\,(4(\beta+\delta)^2 - \beta^2) - 4((\beta+\delta)q_1^*\phi + (\beta+\delta)q_2^*\phi - 1) > 0. \tag{12}$$

The second condition shares its threshold with condition one for a type II duopoly: inserting the threshold value δ = −β/2 gives equality of the left- and right-hand sides, so the condition holds when condition one fails (type I) and is violated once condition one holds (type II). From this, the first condition automatically holds as well. Considering the value of p, however, shows that for certain parameter values this value is positive and as such may violate the third condition. This introduces a parameter restriction on φ, corroborating the intuitive problems that may arise when adjustment is too fast. For this reason, the parameter φ is set to the conservative value of φ = 0.01 for the remainder of the paper. Bischi et al. find that the threshold value for stability is found at:

$$\phi_{1,2} \le \frac{1}{(\beta+\delta)q^*}. \tag{13}$$

The parameter value of φ = 0.01 will be shown to fall within this range for the coming analyses.
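The three conditions in (12) can be evaluated numerically. The sketch below assumes that lemma 1 refers to the characteristic polynomial λ² + pλ + q, with p the negative trace and q the determinant, which matches the expressions given for p and q; parameter values are those of section 5.1 and the helper names are hypothetical.

```python
import cmath

ALPHA, BETA, GAMMA = 450.0, 30.0, 275.0


def interior_jacobian(delta, phi):
    """Jacobian of the gradient map at the symmetric interior equilibrium."""
    q_star = (ALPHA - GAMMA) / (3.0 * BETA + 2.0 * delta)
    diag = 1.0 - 2.0 * phi * (BETA + delta) * q_star
    off = -phi * BETA * q_star
    return [[diag, off], [off, diag]]


def jury_conditions(J):
    """The three conditions q < 1, q + p + 1 > 0 and q - p + 1 > 0."""
    p = -(J[0][0] + J[1][1])
    q = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return q < 1.0, q + p + 1.0 > 0.0, q - p + 1.0 > 0.0


def eigenvalues(J):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


if __name__ == "__main__":
    for delta in (-11.0, -17.0):
        J = interior_jacobian(delta, 0.01)
        print(delta, jury_conditions(J), eigenvalues(J))
```

For φ = 0.01 all three conditions hold when δ = −11, while for δ = −17 the second condition fails. Doubling the adjustment speed to φ = 0.02 with δ = −11 violates the third condition, which illustrates the restriction on φ.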

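Boundary Equilibria When evaluating the boundary equilibria (q1m, 0) and (0, q2m), the Jacobian gets the following form:

$$J|_{(q_1^m, 0)} = \begin{pmatrix} 1 + \phi(\alpha - \gamma - 4q_1^m(\beta+\delta)) & -\beta\phi q_1^m \\ 0 & 1 + \phi(\alpha - \beta q_1^m - \gamma) \end{pmatrix}, \qquad J|_{(0, q_2^m)} = \begin{pmatrix} 1 + \phi(\alpha - \beta q_2^m - \gamma) & 0 \\ -\beta\phi q_2^m & 1 + \phi(\alpha - \gamma - 4q_2^m(\beta+\delta)) \end{pmatrix}. \tag{14}$$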

Since one of the non-diagonal elements of the matrices is equal to zero, the eigenvalues follow directly from the diagonal elements of the matrices. The following restrictions for stability follow from these diagonal elements:

$$\lambda_1 = 1 + \phi(\alpha - \gamma - 4q_1^m(\beta+\delta)) < 1,$$

$$\lambda_2 = 1 + \phi(\alpha - \beta q_1^m - \gamma) < 1.$$

When inserting the monopoly quantity derived in (2.1), the inequalities become:

$$\lambda_1 = 1 - \phi(\alpha - \gamma) < 1,$$

$$\lambda_2 = 1 + \phi\left(\alpha - \gamma - \frac{\beta(\alpha - \gamma)}{2(\beta+\delta)}\right) < 1.$$

The first inequality holds automatically, since by design α − γ > 0. The second inequality holds whenever β/(2(β + δ)) > 1, which is once again condition one for a type II duopoly. For sufficiently small φ both eigenvalues also stay above −1. This means that once again the stability is transferred from the interior equilibrium to the boundary equilibria for sufficiently decreasing marginal costs.
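A short numerical check of the two diagonal eigenvalues, assuming the parameter values of section 5.1 and φ = 0.01. The function names are hypothetical.

```python
ALPHA, BETA, GAMMA, PHI = 450.0, 30.0, 275.0, 0.01


def boundary_eigenvalues(delta, phi=PHI):
    """Diagonal entries of the Jacobian at the boundary equilibrium (q_m, 0)."""
    q_m = (ALPHA - GAMMA) / (2.0 * (BETA + delta))
    lam1 = 1.0 + phi * (ALPHA - GAMMA - 4.0 * q_m * (BETA + delta))
    lam2 = 1.0 + phi * (ALPHA - BETA * q_m - GAMMA)
    return lam1, lam2


def is_stable(delta, phi=PHI):
    """Both eigenvalues strictly inside the unit interval (-1, 1)."""
    return all(abs(lam) < 1.0 for lam in boundary_eigenvalues(delta, phi))


if __name__ == "__main__":
    for delta in (-11.0, -17.0):
        print(delta, boundary_eigenvalues(delta), is_stable(delta))
```

The first eigenvalue equals 1 − φ(α − γ) for either value of δ, so the difference between the two duopoly types comes entirely from the second eigenvalue.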

Closed Set for Symmetric Starting Values Note that for the gradient learners the same holds as for the best-reply learners: whenever behaviour is identical, the trajectories will be identical for identical starting values.


5 Analysis and Basins of Attraction

In this section time-sequences and basins of attraction will be determined for the homogeneous duopoly with best-reply learners and with gradient learners. After this homogeneous analysis, the next section will continue with the introduction of heterogeneous learning rules. The results found in this section will be a baseline for further analysis in the following section.

5.1 The Model

In order to analyse the basins of attraction, the following parameter values are set: α = 450, β = 30, γ = 275, δ = −11 for the type I duopoly and δ = −17 for the type II duopoly. Furthermore there is a capacity limit of L1 = L2 = 7.5. Note that the cost functions are identical and as such there will be identical reaction functions and a symmetric interior equilibrium. Also note that condition one holds whenever δ = −17 and as such there is a type II duopoly. Inserting the parameter values into (2.1) for the type II case gives the following reaction functions:

$$R_i(q_{-i}) = \frac{450 - 275}{2(30 - 17)} - \frac{30}{2(30 - 17)}\,q_{-i} \iff R_i(q_{-i}) = \frac{175}{26} - \frac{15}{13}\,q_{-i}.$$

From section 2 it follows that the interior and boundary equilibria are given by:

$$(q_1^*, q_2^*) = \left(\tfrac{25}{8}, \tfrac{25}{8}\right), \qquad (q_1^m, 0) = \left(\tfrac{175}{26}, 0\right), \qquad (0, q_2^m) = \left(0, \tfrac{175}{26}\right).$$
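These values can be reproduced with exact rational arithmetic. The sketch writes the reaction function as R(q) = a − bq; the variable names are introduced here for illustration.

```python
from fractions import Fraction

alpha, beta, gamma, delta = 450, 30, 275, -17

a = Fraction(alpha - gamma, 2 * (beta + delta))  # intercept of the reaction function
b = Fraction(beta, 2 * (beta + delta))           # absolute value of its slope
q_star = a / (1 + b)                             # solves q = a - b*q
q_m = a                                          # best reply to zero output

print(a, b, q_star, q_m)  # 175/26 15/13 25/8 175/26
```

Since b = 15/13 exceeds one, condition one holds and the duopoly is of type II; both equilibrium quantities lie below the capacity limit of 7.5.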

These equilibria will be analysed, since they have been shown to be fixed points of both the best-reply map and the gradient map.

5.2 Best-Reply Learning

In this sub-section the Nash equilibria in a type II duopoly are checked for stability. After all stable equilibria have been defined, basins of attraction are examined for differing parameter values of the inertia level ψ. Note that the threshold values that are used to divide the strategy space into subsets are not all positive. The value of (a − Li)/b is negative for the chosen parameter values and as such the strategy space [0, 7.5] × [0, 7.5] is divided into just four areas. This means that only areas D1, D4, D5 and D6 in figure 2 are of interest. Examination of the type I duopoly case has been omitted for the best-reply learners, since it has already been shown by Cournot that the interior equilibrium is globally attracting [4].

Basins of Attraction Analysis of the basins of attraction will show exactly how ’local’ the stability of the equilibria is, meaning exactly which set of starting values converges towards each of the equilibria. Since the model is a type II duopoly, the boundary equilibria and the two-cycle are stable and thus will be analysed. Analysis is done for the classical best-reply model, ψ1 = ψ2 = 1, and for the best-reply model with inertia, where inertia is equal: ψ1 = ψ2 < 1.

Basins for the Classical Best-Reply Model (ψ1 = ψ2 = 1) When setting the inertia parameters to one, effectively creating a classical best-reply learning model, four rectangular basins appear. The top left and bottom right go to the boundary equilibria they represent, while all other starting values in the strategy set [0, 7.5] × [0, 7.5] converge to the alternating equilibrium, i.e. the two-cycle. See figure 3d for a graphical representation of the basins, figure 3b for the time sequence of q1,t for starting values in the top left, figure 3c for starting values in the bottom right and figure 3a for symmetric starting values.
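The three kinds of outcome can be reproduced with a small simulation. Two assumptions are made here, since the best-reply map itself is defined in an earlier section: the reaction function is truncated to [0, L], and inertia enters as q(t+1) = (1 − ψ)q(t) + ψR(q−i(t)), which reduces to the classical rule for ψ = 1. All function names are hypothetical.

```python
ALPHA, BETA, GAMMA, DELTA = 450.0, 30.0, 275.0, -17.0
CAPACITY = 7.5
A = (ALPHA - GAMMA) / (2.0 * (BETA + DELTA))  # intercept, equal to the monopoly quantity
B = BETA / (2.0 * (BETA + DELTA))             # absolute slope of the reaction function


def reaction(q_other):
    """Best reply, truncated to [0, CAPACITY] (assumed form of the kinked curve)."""
    return min(CAPACITY, max(0.0, A - B * q_other))


def best_reply_step(q1, q2, psi=1.0):
    """Assumed inertia rule: move a fraction psi of the way to the best reply."""
    return ((1.0 - psi) * q1 + psi * reaction(q2),
            (1.0 - psi) * q2 + psi * reaction(q1))


def simulate(q1, q2, steps=200, psi=1.0):
    """Return the state after the given number of iterations."""
    for _ in range(steps):
        q1, q2 = best_reply_step(q1, q2, psi)
    return q1, q2


if __name__ == "__main__":
    print(simulate(6.0, 1.0))   # firm 1 ends at the monopoly quantity
    print(simulate(1.0, 6.0))   # firm 2 ends at the monopoly quantity
    state = simulate(2.5, 2.5)  # symmetric start: alternates between two states
    print(state, best_reply_step(*state))
```

Under these assumptions the symmetric start ends up alternating between both firms producing nothing and both producing the monopoly quantity.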

(a) Time-sequence for q1,t and q2,t for starting values of (2.5, 2.5)

(b) Time-sequence for q1,t for starting values of (6, 1)

(c) Time-sequence for q1,t for starting values of (1, 6)

(d) Basins of attraction. Dark-grey for (0, qm), light-grey for (qm, 0) and ocher for the two-cycle.

Figure 3: Homogeneous duopoly with best-reply learners for α = 450, β = 30, γ = 275, δ = −17 and ψ1,2 = 1

Basins for identical levels of inertia (ψ1 = ψ2 < 1) When the levels of inertia are decreased and adjustments are effectively ’slowed’, stability of the boundary equilibria remains the same, as is clear from the conditions set forth in (7). Their basins, however, increase significantly for slower adjustment. This increase must coincide with a decrease of the basin for the two-cycle. Inserting the values for a and b into the stability restriction for the two-cycle, given by (10), shows that the two-cycle is stable for ψ ≥ 13/14 ≈ 0.9286. Note that this is corroborated by the bifurcation diagram in figure 4d.

In figure 4a the basins are represented for ψ = 0.98, figure 4b represents ψ = 0.93, which is just before the convergence swap, and figure 4c is after the threshold. In figures 4e and 4f, time-sequences for starting values in the basin of the cycle are shown. It is clear that the non-zero point of the two-cycle has decreased from the monopoly quantity, and when the inertia parameter passes the critical value of ψ ≈ 0.9286, the two-cycle ceases to exist. Also note the basin converging towards the interior equilibrium in green. This convergence is due to the diagonal set that has been shown to be closed. The ’basin’ is a subset of the diagonal q1 = q2 and thus does not fulfil the properties of a stable point.
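The threshold can be illustrated on the diagonal q1 = q2. The sketch assumes the inertia rule q(t+1) = (1 − ψ)q(t) + ψR(q(t)), whose slope at the interior equilibrium is (1 − ψ) − ψb; this slope has modulus one at ψ = 2/(1 + b). The capacity limit is left out because it never binds on these trajectories, and the helper names are hypothetical.

```python
from fractions import Fraction

B = Fraction(30, 2 * (30 - 17))  # absolute slope b = 15/13 of the reaction function
PSI_THRESHOLD = 2 / (1 + B)      # |(1 - psi) - psi*b| = 1  <=>  psi = 2 / (1 + b)
A = 175.0 / 26.0                 # intercept of the reaction function


def diagonal_step(q, psi):
    """Assumed inertia rule restricted to the diagonal q1 = q2 = q."""
    best_reply = max(0.0, A - float(B) * q)
    return (1.0 - psi) * q + psi * best_reply


def long_run_pair(q, psi, steps=5000):
    """Two consecutive states after a long burn-in."""
    for _ in range(steps):
        q = diagonal_step(q, psi)
    return q, diagonal_step(q, psi)


if __name__ == "__main__":
    print(PSI_THRESHOLD)             # 13/14
    print(long_run_pair(2.5, 0.93))  # two distinct values: a two-cycle
    print(long_run_pair(2.5, 0.92))  # both close to 25/8: interior equilibrium
```

Just above the threshold the symmetric trajectory alternates between a small quantity and a quantity below the monopoly level; just below it, the same starting value converges to the interior equilibrium.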

(a) Basins of Attraction for ψ1,2 = 0.98. Dark-grey for (0, qm ), light-grey for (qm , 0) and ocher for the two-cycle.

(b) Basins of Attraction for ψ1,2 = 0.93. Dark-grey for (0, qm), light-grey for (qm, 0) and ocher for the two-cycle.

(c) Basins of Attraction for ψ1,2 = 0.91. Dark-grey for (0, qm ), light-grey for (qm , 0) and green for the interior equilibrium.

(d) Bifurcation analysis on q1 for differing levels of ψ.

(e) Time-sequence for starting values of (2.5, 2.5) and ψ1,2 = 0.93.

(f) Time-sequence for starting values of (2.5, 2.5) and ψ1,2 = 0.92.

Figure 4: Basins of attraction and time-sequences for differing levels of inertia ψ1,2


5.3 Gradient Learning

For the gradient learners the same strategy will be implemented as for the best-reply learners. As has become clear from section 4, the adjustment parameters are bounded and as such analysis will be done for φ = 0.01 unless specified differently. Note that inequality (13) holds for φ = 0.01.

Basins of Attraction First of all, a short analysis of the type I duopoly for gradient learning will be discussed. This will illustrate the inherent divergent tendencies that gradient learners experience, defined by φ. After that the more interesting boundary equilibria are examined. Note the bifurcation diagram in figure 5a illustrating the instability threshold dependent on the parameter value of δ. Analysis will be done in the right-hand part of the diagram first, after which the left-hand part is examined.

The Case for δ = −11 Note that for δ = −11, condition one is violated and there is a type I duopoly present, as was illustrated by the bifurcation diagram in figure 5a. In figure 5b the basins of attraction for the interior equilibrium are shown. It becomes clear that for starting values in a significant proportion of the strategy set, there is divergent behaviour. This reiterates the importance of starting values for the gradient-type learners and is in stark contrast to best-reply learning under a type I duopoly, where the interior equilibrium is globally attracting. The size of this basin has to be taken into account at all times, especially in a heterogeneous context. Whenever the gradient learners diverge, they will spin out of control and effectively destroy themselves. When facing different learners, this might result in an asymmetrical equilibrium that resembles the boundary equilibria, where the gradient learners explode and produce zero and the other learner can produce the monopoly quantity.

The Case for δ = −17 For the case where condition one holds and there is a type II duopoly, it has already become clear that the boundary equilibria are stable. This is illustrated by time-sequences in figure 5c. Fixing the adjustment speed at φ = 0.01 and inspecting the time-sequences for different starting values motivates four different basins, as is shown in figure 5c. The attracting sets are shown in figure 5d. The basin to infinity is once again present, but the convergent part is split into two equal basins along the diagonal. Note the thin green line, representing the set that converges to the interior equilibrium, similar to the best-reply learning.13 It is found that equally sized basins arise for the boundary equilibria, but diverging tendencies are still present in the type II duopoly and have to be taken into account.
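The four outcomes can be reproduced by labelling individual starting values. The sketch below iterates the gradient map for δ = −17 and φ = 0.01 and uses the starting values named in the caption of figure 5c; the labels, the tolerance and the blow-up bound are choices made here for illustration, and no truncation at zero or at the capacity limit is applied.

```python
ALPHA, BETA, GAMMA, DELTA, PHI = 450.0, 30.0, 275.0, -17.0, 0.01
Q_M = (ALPHA - GAMMA) / (2.0 * (BETA + DELTA))  # monopoly quantity


def gradient_step(q1, q2):
    """One iteration of the homogeneous gradient-learning map."""
    s1 = ALPHA - 2.0 * q1 * (BETA + DELTA) - BETA * q2 - GAMMA
    s2 = ALPHA - 2.0 * q2 * (BETA + DELTA) - BETA * q1 - GAMMA
    return q1 + PHI * q1 * s1, q2 + PHI * q2 * s2


def classify(q1, q2, steps=2000, tol=1e-6, blow_up=1e6):
    """Label a starting point by where its trajectory ends (illustrative labels)."""
    for _ in range(steps):
        q1, q2 = gradient_step(q1, q2)
        if abs(q1) > blow_up or abs(q2) > blow_up:
            return "divergent"
    if abs(q1 - Q_M) < tol and abs(q2) < tol:
        return "(q_m, 0)"
    if abs(q1) < tol and abs(q2 - Q_M) < tol:
        return "(0, q_m)"
    if abs(q1 - q2) < tol:
        return "diagonal"
    return "other"


if __name__ == "__main__":
    for start in [(3.0, 2.0), (2.0, 3.0), (2.5, 2.5), (5.0, 5.0)]:
        print(start, classify(*start))
```

Applying `classify` to a grid over [0, 7.5] × [0, 7.5] would give a coarse picture of the basins; the grid resolution is then a further choice.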

13This convergence is obviously ’cut off’ by the divergent basin. There will still be equal


(a) Bifurcation for asymmetrical starting values (q1, q2), where the blue line represents the lesser value of the two.

(b) Basin of attraction for φ = 0.01. The light-grey basin attracts towards the interior equilibrium and the white basin diverges.

(c) Time-sequences for δ = −17. Blue for q1 = q2 = 2.5, red and green for q1,0 = 3 and q2,0 = 2, where the red line is q1 and the green line is q2. Purple for starting values of q1,0 = q2,0 = 5.

(d) Basin of attraction for (0, qm) in dark-grey, (qm, 0) in light-grey and the diverging basin in ocher. The thin green line is the closed set q1 = q2 that converges towards the interior equilibrium.

Figure 5: Homogeneous duopoly with gradient learners for α = 450, β = 30, γ = 275, δ = −17 or −11 and ψ1,2 = 1

6 Heterogeneous Heuristics

In this section the implications of a type II duopoly are examined for heterogeneous heuristics. The simple case where a single best-reply firm plays a single gradient firm is analysed first. Jacobian matrices will be defined and basins of attraction examined. Then a more general setting is explored, where an infinite population of firms is defined. In this population both best-reply and gradient learners will be active. Every time period, two firms are randomly selected from this population to play a duopoly game. The population is divided into two fractions, one for the best-reply and one for the gradient learners. At first these fractions will be kept fixed, after which the possibility to switch behaviour is introduced. The dynamics of this switching will be modelled through the replicator dynamics. The classical best-reply model will be examined for simplicity, instead of the best-reply with inertia model.

6.1 The Simple Heterogeneous Duopoly

Analysis will first be done for the simple case, where a single gradient firm plays against a single best-reply firm. This game yields the following dynamical system:

$$q^g_{t+1} = q^g_t + \phi\, q^g_t\,(\alpha - 2q^g_t(\beta+\delta) - \beta q^c_t - \gamma),$$

$$q^c_{t+1} = R(q^g_t).$$

In this system, qg will be the quantity produced by the gradient firm and qc the quantity produced by the best-reply firm. Note first of all that the boundary equilibria are once again fixed points of the mapping. When inserting (qm, 0) or (0, qm), identical values follow. The interior equilibrium is also once again a fixed point of the mapping. This motivates the analysis of the Jacobian in these three points. The Jacobian matrix is given by:

$$J = \begin{pmatrix} -\phi\beta q^g_t & 1 + \phi(\alpha - \beta q^c_t - \gamma - 4q^g_t(\beta+\delta)) \\ 0 & R'(q^g) \end{pmatrix}.$$

Evaluation at the first boundary equilibrium of (qm, 0) yields:

$$J|_{(q^m, 0)} = \begin{pmatrix} -\phi\beta q^m & 1 + \phi(-(\alpha - \gamma)) \\ 0 & 0 \end{pmatrix}.$$

Note that, due to the discontinuity of the reaction function, the second diagonal element is also zero when evaluating the boundary equilibrium. This automatically implies stability of the boundary equilibria, since λ1,2 = 0. Since the derivative of the reaction function is also zero when qct = qm, by the same reasoning the other boundary equilibrium is stable.

For the interior equilibrium the following Jacobian is found:

$$J|_{(q^*, q^*)} = \begin{pmatrix} -\phi\beta q^* & 1 + \phi(\alpha - \beta q^* - \gamma - 4q^*(\beta+\delta)) \\ 0 & -b \end{pmatrix},$$

with eigenvalues equal to:

$$\lambda_1 = -\phi\beta q^*, \qquad \lambda_2 = -b.$$

This implies that for sufficiently low values of the adjustment speed parameter φ the interior equilibrium is stable for |b| < 1. Once again it is condition one that defines the attracting equilibrium. In type I duopolies the interior equilibrium is stable, and in type II duopolies the boundary equilibria are.
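As a cross-check under stated assumptions, the sketch below linearises the heterogeneous system at the interior equilibrium with the state ordered as (qg, qc) in both rows and columns, and computes the eigenvalues from the trace and determinant. The ordering, the helper names and the use of the section 5 parameter values with φ = 0.01 are choices made here for illustration.

```python
import cmath

ALPHA, BETA, GAMMA, PHI = 450.0, 30.0, 275.0, 0.01


def interior_eigenvalues(delta, phi=PHI):
    """Eigenvalues at (q*, q*) with the state ordered (q_g, q_c) in rows and columns."""
    q_star = (ALPHA - GAMMA) / (3.0 * BETA + 2.0 * delta)
    b = BETA / (2.0 * (BETA + delta))
    # d q_g' / d q_g: the first-order condition removes the remaining terms at q*.
    j11 = 1.0 - 2.0 * phi * (BETA + delta) * q_star
    j12 = -phi * BETA * q_star  # d q_g' / d q_c
    j21 = -b                    # d q_c' / d q_g = R'(q_g) on the interior branch
    j22 = 0.0                   # d q_c' / d q_c
    tr = j11 + j22
    det = j11 * j22 - j12 * j21
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


def spectral_radius(delta, phi=PHI):
    """Largest eigenvalue modulus; below one means local stability."""
    return max(abs(lam) for lam in interior_eigenvalues(delta, phi))


if __name__ == "__main__":
    for delta in (-11.0, -17.0):
        print(delta, interior_eigenvalues(delta), spectral_radius(delta))
```

With these values the spectral radius is below one for δ = −11 and above one for δ = −17, in line with the conclusion that condition one determines which equilibrium attracts.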

6.1.1 Analysis and Basins of Attraction

The same parameter values as in section five are set.

The Case for δ = −11 As becomes clear from figure 6, the interior equilibrium is stable and attracts a large portion of the strategy space. Note, however, the dark-grey basin that converges to the boundary equilibrium where the best-reply firm produces the monopoly quantity and the gradient firm produces nothing. This is due to the inherent diverging nature of the gradient learners, as discussed before. This illustrates a case where the boundary equilibrium occurs ’by divergence’. It is not organically converged upon, which was not to be expected either since the duopoly is of type I.

(a) Time-sequences for starting values (qc0 , qg0) of (2.5, 2.5) where qc is the blue line and qg the red and time-sequences for starting values (5, 5) where qc is the green line and qg is the purple line.

(b) Basin of attraction for the interior Nash equilibrium (q∗, q∗) and the boundary equilibrium (qm, 0).

Figure 6: Basins of attraction and time-sequences for a type I heterogeneous duopoly, where a best-reply firm plays a gradient firm.

(a) Time-sequences for starting values (qc0, qg0) of (1, 6) where qc is the blue line and qg the red and time-sequences for starting values (2.5, 2.5) where qc is the green line and qg is the purple line.

(b) Basin of attraction for the boundary equilibrium (qm, 0) in light grey and the boundary equilibrium (0, qm) in dark-grey.

(c) Basin of attraction for the boundary equilibrium (qm, 0) in light grey and the boundary equilibrium (0, qm) in dark-grey and φ = 0.005.

(d) Basin of attraction for the boundary equilibrium (qm, 0) in light grey and the boundary equilibrium (0, qm) in dark-grey and φ = 0.001.

Figure 7: Basins of attraction and time-sequences for a type II heterogeneous duopoly, where a best-reply firm plays a gradient firm.


The Case for δ = −17 As becomes clear from figure 7, the entire strategy space is divided between the two boundary equilibria. The basin for the gradient learner is significantly smaller than the basin for the best-reply learner, showing how easily the gradient learner will be overruled in a type II duopoly. When decreasing the adjustment speed parameter, however, it is found that these basins change significantly, as becomes clear from the basins for φ = 0.005 and φ = 0.001. Notwithstanding, the adjustment speed parameter will remain fixed at φ = 0.01 for the remainder of the heterogeneous analysis, since the nature of the equilibria remains the same. Possible future studies could implement an evolutionary process for the adjustment parameter, or an optimisation of the parameter.

6.2 Infinite Population Game and Population Dynamics

For the game with population fractions, an infinite population is defined. In this population there are two fractions, ηc for the best-reply learners and ηg for the gradient learners. Since these are the only heuristics in the population, it holds that ηc = 1 − ηg.

As has been noted before, Gale (1995), Binmore (1997) and Schlag (1998) motivate the usage of replicator dynamics to model a learning, aspiring and/or imitation style population [7][1][14]. Switching from one heuristic to another is determined through two variables: firstly a variable that represents the utility that a behavioural rule earns (expressed by Ui,t, the utility earned from behaviour i in time period t) and secondly a so-called evolutionary pressure variable θ. This variable introduces a more random allocation.

The utility function is slightly different from, but obviously related to, the standard profit function. Firstly, the profits must be calculated for a hypothetical game against all the possible encounters that could occur and be weighted in order to find the average utility associated with a certain behavioural rule. The profit that is yielded by a firm using behavioural rule i, while playing a firm using behavioural rule j, is given by:

$$\pi_{i,j} = P(q_i + q_j)\,q_i - C(q_i).$$

The costs that might be associated with certain heuristics will be represented by the variable Ti, since it is intuitively clear that more sophisticated information gathering might incur higher costs than the ’simpler’ behavioural rules. This gives the utility function in full form:

$$U_{i,t} = \sum_{j=1}^{K} \left[\eta^j_t \cdot \pi_{i,j,t}\right] - T_i. \tag{15}$$

The variable ηjt can be seen as the probability that a firm with behavioural rule i encounters a firm with behavioural rule j, and πi,j is the profit associated with behavioural rule i when playing against behavioural rule j.

The replicator dynamics governing the fraction ηi of firms using heuristic i are then defined as follows:

$$\eta^i_{t+1} = (1 - K\theta)\,\frac{\eta^i_t U_{i,t}}{\sum_{j=1}^{K} \eta^j_t U_{j,t}} + \theta, \qquad i = 1, \dots, K. \tag{16}$$
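Equations (15) and (16) translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical and the payoff matrix in the usage example is made up, not taken from the model.

```python
def utilities(eta, profit, cost):
    """U_i = sum_j eta_j * pi_ij - T_i for each behavioural rule i, as in (15)."""
    K = len(eta)
    return [sum(eta[j] * profit[i][j] for j in range(K)) - cost[i] for i in range(K)]


def replicator_step(eta, util, theta):
    """eta_i' = (1 - K*theta) * eta_i * U_i / sum_j eta_j * U_j + theta, as in (16)."""
    K = len(eta)
    mean_utility = sum(e * u for e, u in zip(eta, util))
    return [(1.0 - K * theta) * e * u / mean_utility + theta
            for e, u in zip(eta, util)]


if __name__ == "__main__":
    eta = [0.5, 0.5]
    profit = [[4.0, 6.0], [3.0, 5.0]]  # made-up payoffs pi_ij, for illustration only
    cost = [0.0, 0.0]
    u = utilities(eta, profit, cost)
    print(u, replicator_step(eta, u, theta=0.05))
```

The updated fractions sum to one because the weights (1 − Kθ) and Kθ do. The sketch presumes positive utilities, so that the weighted mean in the denominator is non-zero.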

It becomes immediately clear that an increase in Ui,t increases the new fraction of firms using that heuristic, but larger values of θ decrease this effect. Since the evolutionary analysis will only be done with best-reply and gradient players, K = 2. The two profit functions are given by:

$$\pi^c_t(q^c_t, \eta^c) = P(Q)\,q^c_t - C(q^c_t) - T_c,$$

$$\pi^g_t(q^g_t, \eta^g) = P(Q)\,q^g_t - C(q^g_t) - T_g,$$

where Q = ηc qct + ηg qgt, πc is the profit for the best-reply learners and πg the profit for the gradient learners. Inserting these into (16) gives the following relation, expressed in terms of q, θ and η:

$$\eta^c_{t+1} = (1 - 2\theta)\,\frac{\eta^c_t\,(\pi^c_t(\eta^c q^c_t, \eta^g q^g_t) - T_c)}{\eta^c_t\,(\pi^c_t(\eta^c q^c_t, \eta^g q^g_t) - T_c) + \eta^g_t\,(\pi^g_t(\eta^c q^c_t, \eta^g q^g_t) - T_g)} + \theta.$$

It will be this dynamical system that is added to the previous systems when evolutionary population fractions are modelled.

6.3 The Model with Fixed Population Fractions

In this sub-section the model will be restricted to fixed fractions of the two behavioural rules. A significant implication of the infinite population game must be noted immediately: because all the gradient learners and best-reply learners are playing the single quantity qgt and qct respectively, it is possible for a firm to play against ’one of its own kind’. This means that the boundary equilibria are no longer fixed points of the map. Whenever the boundary equilibria are played at any time period, they will immediately be weighted by the population fractions. The question arises what the implications of a type II duopoly, in contrast to a type I duopoly, then are.

Combining the gradient and best-reply learning in an infinite game yields the following dynamical system:

$$q^c_{t+1} = R(q^g_t \eta^g + q^c_t \eta^c),$$

$$q^g_{t+1} = q^g_t + \phi\, q^g_t\,(\alpha - 2(\beta+\delta)q^g_t - \beta(q^g_t \eta^g + q^c_t \eta^c) - \gamma). \tag{17}$$

Note the addition of the ’own’ fraction in the dynamical system that destroys the boundary equilibria as fixed points. Taking derivatives yields the following Jacobian matrix:

$$J = \begin{pmatrix} -\eta^c b & -\eta^g b \\ -\phi q^g \beta \eta^c & 1 + \phi(\alpha - \beta q^c \eta^c - 4(\beta+\delta)q^g - 2\beta q^g \eta^g - \gamma) \end{pmatrix}.$$
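The loss of the boundary equilibria can be checked exactly for system (17). The sketch below assumes the type I parameter values of section 5.1 with ηc = 0.5 and φ = 0.01, truncates the reaction function at zero and omits the capacity limit. The closed form used for the asymmetrical point, qc = (α − γ)/(2(β + δ) + βηc) with qg = 0, is derived here from qc = R(ηc qc) and is not stated in this form in the text; all names are hypothetical.

```python
from fractions import Fraction as F

ALPHA, BETA, GAMMA, DELTA = F(450), F(30), F(275), F(-11)
PHI, ETA_C = F(1, 100), F(1, 2)
ETA_G = 1 - ETA_C
A = (ALPHA - GAMMA) / (2 * (BETA + DELTA))  # intercept of the reaction function
B = BETA / (2 * (BETA + DELTA))             # absolute slope of the reaction function


def reaction(q):
    """Best reply truncated at zero (capacity limit omitted; an assumption)."""
    return max(F(0), A - B * q)


def population_step(q_c, q_g):
    """One iteration of system (17) with fixed population fractions."""
    q_avg = q_g * ETA_G + q_c * ETA_C  # expected opponent quantity
    slope = ALPHA - 2 * (BETA + DELTA) * q_g - BETA * q_avg - GAMMA
    return reaction(q_avg), q_g + PHI * q_g * slope


def is_fixed_point(q_c, q_g):
    """True if the map leaves (q_c, q_g) unchanged."""
    return population_step(q_c, q_g) == (q_c, q_g)


Q_M = A                                                          # monopoly quantity
Q_STAR = (ALPHA - GAMMA) / (3 * BETA + 2 * DELTA)                # symmetric point
Q_ASYM = (ALPHA - GAMMA) / (2 * (BETA + DELTA) + BETA * ETA_C)   # q_c when q_g = 0

if __name__ == "__main__":
    print(is_fixed_point(Q_M, F(0)), is_fixed_point(F(0), Q_M))
    print(is_fixed_point(Q_STAR, Q_STAR), is_fixed_point(Q_ASYM, F(0)))
    print(float(Q_ASYM))
```

Under these assumptions neither boundary point is left unchanged by the map, while the symmetric point and the point with qg = 0 and qc of roughly 3.3 both are.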


For the Jacobian matrix evaluated at the boundary equilibria, note that the first row is equal to zero and as such the boundaries would be stable if they had been fixed points of the mapping. Because there is no obvious stability transfer when a type II arises, the following sections will examine if there are any real implications left.

Implications of Heterogeneous Heuristics From the above-mentioned model it becomes clear that the boundary equilibria cease to exist in every model where an infinite population is defined and firms have to take their own quantity into account. The main reason is that whenever firms obtain a dynamical system where their own quantity plays a role that is not cancelled out by design, the boundaries are simply no longer fixed points of the mappings. This means that analysis of the boundary equilibria is pointless. Analysis will instead be conducted on the difference between a type I and type II duopoly in order to see if there are any real implications left, since it has become clear that condition one no longer implies the stability of the boundary equilibria over the interior equilibrium.

6.3.1 Analysis and Basins of Attraction

Basins of attraction and implications for differing parameters will be examined for the infinite population game with fixed population fractions.

The Case for δ = −11 First the population-fractions will be fixed at 0.5. Time-sequences show that two possible equilibria are found. First of all, the interior equilibrium that is obviously stable in a type I duopoly is found at (2.55, 2.55), which is the Cournot-Nash equilibrium. Another asymmetrical equilibrium (3.3, 0) is found. This asymmetrical equilibrium is established due to the inherent diverging nature of the gradient learners. The equilibrium is settled upon when the gradient quantity spins out of control and converges to zero. Note that in this case the best-reply learners consistently best-reply against their own quantity times their population fraction and it is not the monopoly quantity that is produced.14

Basins of attraction for these two equilibria are shown in figure 8. It becomes clear that the Nash equilibrium becomes more attracting for decreases in the adjustment speed. This is intuitive, since for lower adjustment speeds there will be less ’spinning out of control’ and as such the basin for the asymmetrical equilibrium decreases. This tendency to go to the asymmetrical equilibrium is once again illustrated when briefly examining two bifurcation diagrams for the values of qc and qg on the parameter φ. It follows that for increased values of φ, the same starting value that used to converge to the interior equilibrium will converge to the asymmetrical one. See figures 8d and 8e for starting quantities that are always in the light-grey area, (2.5, 2.5), and starting values that are in either one of the basins, dependent on the value of φ: (1, 4). The parameter φ influences the stability and nature of the equilibria.

From figures 9a, 9b and 9c it becomes clear that besides the adjustment speed parameter φ, an increase in the share of best-reply players will also change the basins of the two equilibria. Thus the basins for the interior equilibrium depend on both the population fraction ηc and the adjustment speed parameter φ. This presents an interesting situation, since differing levels of the population fractions will obviously occur and fixing the adjustment speed parameter will not prevent starting values from changing basins.

The influence of both ηc and φ motivates parameter-basin analysis. This is further motivated by the bifurcations in figures 8d and 8e, showing that multiple cycles can arise for certain parameter values. Parameter-basin analysis will show for exactly which parameter values a cycle of a certain period is reached.15 These parameter basins are illustrated in figure 10d for starting values of (2.5, 2.5), with corresponding time-sequences for the two-period, four-period and chaotic cycle in figures 10a, 10b and 10c. When fixing the parameter φ at 0.01 or less, the basins show that these multi-period cycles will not be reached. Two additional parameter basins are plotted for starting values of (1, 4) and (7, 7). It is found that due to the potential divergence of (1, 4) the basins for higher-period cycles are reduced. For the strictly diverging starting values (7, 7) there are only one-period equilibria that can be converged upon. This is illustrated in figures 10e and 10f and follows from the fact that these starting values will strictly converge to the asymmetrical equilibrium for certain threshold parameter values.

(a)Basins of attraction for φ = 0.01 and ηc = 0.5.

(b) Basins of attraction for φ = 0.005 and ηc = 0.5.

(c) Basins of attraction for φ = 0.005 and ηc = 0.5.

(d)Bifurcation analysis of parameter φ and qct . (e)Bifurcation analysis of parameter φ and qgt.

Figure 8: Basins of attraction for ηc= 0.5 and differing levels of adjustment speed φ.

15Bear in mind that both the symmetrical and asymmetrical equilibria are equilibria of


(a)Basins of attraction for φ = 0.01 and ηc = 0.2.

(b)Basins of attraction for φ = 0.01 and ηc = 0.5.

(c)Basins of attraction for φ = 0.01 and ηc = 0.7.

Figure 9: Basins of attraction for differing levels of the parameter ηc, where φ = 0.01.

The dark-grey basin is for the asymmetrical equilibrium.

(a)Time-sequence for parameter values ηc = 0.5 and φ = 0.013.

(b)Time-sequence for parameter values ηc = 0.5 and φ = 0.016.

(c)Time-sequence for parameter values ηc = 0.5 and φ = 0.017.

(d) Parameter-basins for starting values (2.5, 2.5). A one-cycle in light-grey, a two-cycle in dark-grey, a three-cycle in green, a four-cycle in ocher and higher period cycles in blue. Non-converging behaviour is depicted in white.

(e) Parameter-basins for starting values (1, 6). A one-cycle in light-grey, a two-cycle in dark-grey, a three-cycle in green, a four-cycle in ocher and higher period cycles in blue. Non-converging behaviour is depicted in white.

(f) Parameter-basins for starting values (7, 7).A one-cycle in light-grey, a two-cycle in dark-grey, a three-cycle in green, a four-cycle in ocher and higher period cycles in blue. Non-converging behaviour is depicted in white.


The Case for δ = −17 When examining time-sequences for different starting values, it is found that both the interior and asymmetrical equilibrium are converged upon for ηc = ηg = 0.5. Figure 12d further shows that these are the only equilibria for φ = 0.01. It has followed from the previous paragraph that differing cycles can arise whenever φ and ηc vary. As before, three starting values will be examined that experience differing behaviour whenever φ = 0.01: (2.5, 2.5) is strictly converging, (7, 7) strictly diverging and (1, 6) is found to be either diverging or converging dependent on the value of ηc.16

Examining the parameter-basins for these three starting values, plotted in figure 11, all possible equilibria become immediately clear. For a fixed value of φ = 0.01 it is shown that either a two-cycle or a fixed point is converged upon. The most striking difference between the basins for the type I case and the type II case is that the two-cycle is stable for any parameter φ and for all types of starting values, as long as there is a sufficiently high value of ηc.

When examining the potentially diverging starting values (1, 6), it is found that the basin for the two-cycle remains intact but those for the multi-period cycles retreat. This is due to the same reasoning as in the type I duopoly. For the strictly diverging starting values, it is again found that there are no multi-period cycles left, except for the two-cycle.

In figure 12a, bifurcation diagrams are shown for the three above-mentioned starting values. These corroborate what was already illustrated in the parameter-basins: there is a large interval of ηc for which the two-cycle is stable, but it retreats when the starting values near the diverging basin. When examining (2.5, 2.5), the threshold value of ηc at which the two-cycle disappears is found at approximately ηc ≈ 0.55. For the other two starting values, the same diagram is initially plotted until ηc reaches a threshold value that pushes the gradient learners into divergence. For instance, bifurcation analysis of starting values (1, 6) is shown in figure 12b and 12f. For the larger part of the diagram, the exact same equilibria are found as in the converging case when looking at the best-reply quantities. The gradient quantity, however, is effectively nullified under the two-cycle. After the initial part, the bifurcation is once again similar to the converging case, but whenever the starting values enter the diverging basin, the asymmetrical equilibrium is immediately converged upon.17 When analysing the strictly diverging starting value (7, 7), there would have been outright convergence towards the asymmetrical equilibrium in the type I case. In the type II duopoly, however, the two-cycle remains stable until the threshold value of ηc ≈ 0.86 is reached. From this value downwards, the asymmetrical equilibrium is immediately converged upon.
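The way such a bifurcation diagram is built can be illustrated with a minimal Python sketch: for a given ηc the fixed-fraction map is iterated past a transient, after which the orbit is checked for the smallest period at which it repeats. This is a sketch under stated assumptions, not the code behind the figures. The market values in `Market`, the explicit form of the best reply `best_reply`, and the names `step` and `attractor_period` are all hypothetical; in particular, the parameter values are placeholders rather than the calibration of section five.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Market:
    # All numeric values are hypothetical placeholders, not the thesis's calibration.
    alpha: float = 17.0   # demand intercept
    beta: float = 1.0     # demand slope
    gamma: float = 10.0   # linear cost coefficient
    delta: float = -0.25  # quadratic cost coefficient (negative: decreasing marginal costs)
    phi: float = 0.01     # adjustment speed of the gradient learners


def best_reply(expected_q, m):
    """Assumed form of R(.): interior profit maximiser, truncated at zero.

    Requires beta + delta > 0 so that profit is concave in the own quantity.
    """
    return max(0.0, (m.alpha - m.gamma - m.beta * expected_q) / (2.0 * (m.beta + m.delta)))


def step(q_c, q_g, eta_c, m):
    """One iteration of the fixed-fraction map for (q_c, q_g)."""
    expected_q = q_g * (1.0 - eta_c) + q_c * eta_c
    slope = m.alpha - 2.0 * (m.beta + m.delta) * q_g - m.beta * expected_q - m.gamma
    return best_reply(expected_q, m), q_g + m.phi * q_g * slope


def attractor_period(q_c, q_g, eta_c, m, transient=5000, max_period=16, tol=1e-9):
    """Return (period, state after the transient).

    The period is 0 when no cycle of length <= max_period is detected,
    which covers both divergence and longer or irregular orbits.
    """
    for _ in range(transient):
        q_c, q_g = step(q_c, q_g, eta_c, m)
    ref = (q_c, q_g)
    for period in range(1, max_period + 1):
        q_c, q_g = step(q_c, q_g, eta_c, m)
        if abs(q_c - ref[0]) < tol and abs(q_g - ref[1]) < tol:
            return period, ref
    return 0, ref
```

Calling `attractor_period(2.5, 2.5, eta_c, Market(delta=-0.6))` over a grid of `eta_c` values and plotting the visited quantities against `eta_c` would give a diagram of the same kind as figure 12. The thresholds obtained this way depend on the placeholder values and need not match those reported in the text.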

16 Note that in the type I duopoly, these values are slightly different since the market parameters are changed.

17 Remember that in this case the best-reply learners solve the equation $q^c = R(\eta^c \cdot q^c)$ and


Figure 11: Parameter-basins for differing starting values and time-sequences for a specific case for an infinite type II duopoly game.
(a) Parameter-basins for starting value (2.5, 2.5).
(b) Parameter-basins for starting value (1, 6).
(c) Parameter-basins for starting value (7, 7).
(d) Time-sequences for φ = 0.01 and $\eta^c = 0.5$ for a type II infinite game duopoly, for starting values (2.5, 2.5), where blue is $q^c$ and red is $q^g$, and for starting values (1, 6), where green is $q^c$ and purple is $q^g$.

Figure 12: Bifurcation analysis for $q^g_t$ and parameter $\eta^c$, for differing starting values and an adjustment speed of φ = 0.01.
(a) Bifurcation analysis of parameter $\eta^c$ and $q^c_t$ for starting values (2.5, 2.5).
(b) Bifurcation analysis of parameter $\eta^c$ and $q^c_t$ for starting values (1, 6).
(c) Bifurcation analysis of parameter $\eta^c$ and $q^c_t$ for starting values (7, 7).
(d) Basins of attraction for φ = 0.01 and $\eta^c = 0.5$.
(e) Bifurcation analysis of parameter $\eta^c$ and $q^c_t$ for starting values (2.5, 2.5).
(f) Bifurcation analysis of parameter $\eta^c$ and $q^g_t$ for starting values (1, 6).


6.4 The Model with Evolutionary Population Fractions

In this sub-section the model with fixed fractions is augmented with the possibility to switch behaviour. These dynamics will be implemented through the replicator dynamics that have been discussed earlier. Analysis will be done for differing levels of costs for implementing a certain strategy and for differing levels of the evolutionary pressure variable θ.

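In addition to the model specified in (17), the population dynamics represented by (16) are added to the system. This yields the following three-dimensional system:

$$
\begin{aligned}
q^c_{t+1} &= R\left(q^g_t\eta^g + q^c_t\eta^c\right),\\
q^g_{t+1} &= q^g_t + \phi\, q^g_t\left(\alpha - 2(\beta+\delta)\,q^g_t - \beta\left(q^g_t\eta^g + q^c_t\eta^c\right) - \gamma\right),\\
\eta^c_{t+1} &= (1-2\theta)\,\frac{\eta^c_t\,U_{c,t}}{\left(\pi^c_t(q^c_t,\eta^c_t) - T_c\right) + \left(\pi^g_t(q^g_t,(1-\eta^c)) - T_g\right)} + \theta.
\end{aligned}
$$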
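The two quantity equations can be traced back to a single profit function. The derivation below is a sketch that assumes the linear-demand, quadratic-cost profit described in the abstract. The symbol $Q^e_t$ for the expected opponent quantity and the explicit, zero-truncated form of $R$ are introduced here for illustration, since both the profit function and $R$ are defined outside this section.

$$
\begin{aligned}
Q^e_t &= q^g_t\eta^g + q^c_t\eta^c,\\
\pi(q, Q^e_t) &= q\left(\alpha - \beta\,(q + Q^e_t)\right) - \gamma q - \delta q^2,\\
\frac{\partial \pi}{\partial q} &= \alpha - 2(\beta+\delta)\,q - \beta Q^e_t - \gamma,\\
R(Q^e_t) &= \max\left\{0,\ \frac{\alpha - \gamma - \beta Q^e_t}{2(\beta+\delta)}\right\}, \qquad \beta + \delta > 0.
\end{aligned}
$$

Evaluating the derivative at $q = q^g_t$ reproduces the bracketed term of the gradient rule, so gradient learners move in the direction of increasing profit at a rate proportional to $\phi\, q^g_t$. Setting the derivative to zero gives the interior best reply, which under this assumed form is a maximum only when $\beta + \delta > 0$.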

This model will mainly be subjected to analysis of the population fractions, since it is known from the previous section exactly what will happen for stable fractions of the population. The question that arises is: 'Will these population fractions converge, or will they exhibit some sort of oscillating or even chaotic behaviour?'

This model extends the analysis of Droste et al. (2002), in which best-reply learning was matched against rational players. That paper examined a type II duopoly as well. The costs imposed on the rational heuristic were approximately 1/6 of the total profit in the interior equilibrium, corresponding to T ≈ 21.167 for the parameters set forth in section five. Evolutionary pressure is examined on the interval θ ∈ [0, 0.5].
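The role of the interval for θ can be read off the form of the fraction update. In the sketch below, $s_t$ denotes the share term that multiplies $(1 - 2\theta)$ in the replicator equation. The symbol $s_t$ is introduced here for illustration, and it is assumed to lie in $[0, 1]$.

$$
\eta^c_{t+1} = (1 - 2\theta)\,s_t + \theta, \qquad s_t \in [0, 1] \;\Longrightarrow\; \theta \le \eta^c_{t+1} \le 1 - \theta .
$$

Under this assumption, θ = 0 leaves the pure replicator dynamics, in which a behavioural rule can die out, while θ = 0.5 pins the fraction at $\eta^c_{t+1} = 0.5$ regardless of profits. For an intermediate value such as θ = 0.06, the fraction of best-reply learners would stay within $[0.06, 0.94]$.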

6.4.1 Analysis and Basins of Attraction

Analysis will be done for differing levels of the three parameters governing the population fractions: Tc, Tg and θ. First, evolutionary pressure and the costs related to either gradient or best-reply learning are set to zero as a baseline example. Bifurcation analysis of the parameter ηc will then follow for differing values of the costs, since it has become clear from the previous sub-section that, depending on the steady state of ηc, the convergence can be predicted.

After examining the implications of different costs, the evolutionary pressure variable will be included in the analysis. It will be set to the conservative value of θ = 0.06, mirroring the pressure used in the analysis by Droste et al. in their paper. The adjustment speed parameter φ will be set to 0.01 and the inertia parameter ψ to 1. Analysis is done for the three types of starting values proposed in the previous sub-section: (2.5, 2.5), (1, 6) and (7, 7) for type II and (2.5, 2.5), (1, 4) and (5, 5) for type I.18

18 In the simulations, costs are never zero but either equal or unequal. This avoids the risk that the denominator in the replicator dynamics equals zero whenever profits are zero.
