A learning model of online keyword auctions : the Genetic Algorithm applied to the first- and second-price auction

University of Amsterdam

A Learning Model of Online Keyword Auctions:

The Genetic Algorithm Applied to the

First-and Second-Price Auction

Author:

I.A.M. ter Laak

First Supervisor:

dr. T.A. Makarewicz

Second Supervisor:

prof. dr. J. Tuinstra

A thesis submitted in fulfillment of the requirements

for the degree of Master of Science (MSc)

in

Econometrics

Economics and Business department

Statement of Originality

This document is written by Inge Aleide Mathilda ter Laak who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Abstract

In this study a Genetic Algorithm (GA) optimization procedure is used to model individual learning of players in the repeated first- and second-price auction. The bidding strategies of the players in these auctions are studied, which are specified as a quadratic function (second-order Taylor approximation) of a player’s personal evaluation and a market signal about the evaluations of the other players. These strategies are updated for each player separately by GA operators in every auction round. The players were able to learn profitable strategies in case of the first-price auction with a limited number of players and evaluations that changed each auction period. In these auctions the players learned a bidding strategy that was a nonlinear positive function of the private evaluations and the market signal. It was even more profitable for the players to ignore the (imperfect) market signal and to bid according to a strategy that was a concave function of their personal evaluation only. The bidding behavior in the second-price auction remained volatile, even after convergence. Imposing budget constraints resulted in a clear distinction between the performance of players, with in most cases only one player with a high final budget. The results are discussed in light of the current developments in online keyword auctions that are used by search engines.

1 Introduction

Online advertising is a form of marketing that has gained a great amount of popularity since the increased availability of the internet. One advantage is that this type of advertisement can be targeted such that it is selectively displayed to customers based on particular traits. As a result, only the ads that are relevant for a particular customer are shown to this customer. This type of marketing is extensively used by internet search engines in keyword advertising. In this type of advertising companies pay the search engine for their ad to be shown on top of the list of the search results of potential customers. The ranking of different ads is based on the outcome of an auction. The auction procedure is initiated each time a potential customer searches for a particular keyword, and advertisers can place a bid on keywords that are relevant for their marketing purposes. The bid that is placed represents the advertiser’s willingness to pay per click that is generated by a particular customer on the advertisement (Liu, Chen, & Whinston, 2009). Recently, keyword advertising has become the most important type of online marketing with a total revenue of $49.5 billion in 2014, according to the annual report of the Interactive Advertising Bureau (PricewaterhouseCoopers, 2014).

The two leading search engines, Google and Yahoo!, both use a generalized second-price auction procedure in their keyword advertising. In this procedure the advertisements are listed on the search page of a customer in descending order of the placed bids. Each time a particular customer clicks on an ad, the corresponding advertiser has to pay the search engine the amount that was bid by the advertiser one rank lower. Previous research has shown a primacy effect of ad rank on the performance of ad campaigns, in which higher listed ads were associated with more clicks and sales (Jansen, Liu, & Simon, 2013). Advertisers will therefore be inclined to bid in such a way that they can attain one of the top rank positions, especially for keywords that are searched for by customers of their target group. In addition to the outcome of the auction, Google Adwords uses a quality score of an advertiser that is based on past Click-Through-Rates (CTRs) to determine the ranking of different advertisements (Liu et al., 2009; Vickrey, 1961). These CTRs are a measure of the success of an online advertising campaign and are defined as the ratio of the number of clicks on the link to the number of customers to whom the ad was displayed (impressions).
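The ranking and payment rule of the generalized second-price auction described above can be illustrated with a minimal sketch. The function name `gsp_outcome`, the advertiser labels and the zero payment for the lowest-ranked advertiser are assumptions made for this illustration only; quality scores and ties are ignored.

```python
def gsp_outcome(bids):
    """Rank advertisers by bid and compute the price per click.

    bids: dict mapping an advertiser label to its bid per click.
    Returns a list of (advertiser, price_per_click) in rank order.
    """
    # Highest bid first, as in the descending listing on the search page.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    outcome = []
    for rank, (name, _own_bid) in enumerate(ranked):
        # Each advertiser pays the bid of the advertiser one rank lower.
        # Assumption: the last-ranked advertiser pays 0 (no reserve price).
        next_bid = ranked[rank + 1][1] if rank + 1 < len(ranked) else 0.0
        outcome.append((name, next_bid))
    return outcome


if __name__ == "__main__":
    print(gsp_outcome({"a": 3.0, "b": 5.0, "c": 1.0}))
```

In this example advertiser b is listed first and pays the bid of advertiser a per click, which is the second-price logic that the remainder of this study reduces to a single slot.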

Previous research on keyword auctions has shown that single-slot auctions that use a ranking system based on a CTR quality score are efficient (Liu & Chen, 2006) and that differentiated minimum bids in multiple-slot auctions can generate higher revenue compared to the efficient setting with equal minimum bids (Liu, Chen, & Whinston, 2010). These studies, however, are based on a one-period keyword auction, which is not realistic for the search-engine setting with a large number of keyword auctions each day. Since an auction procedure starts every time a keyword is entered in the search engine, there are billions of auctions held by Google Adwords each month (GoogleAdWords, 2015). Arnon and Mansour (2011) designed a model in which agents repeatedly bid for multiple identical items. The results showed that there always exists a pure Nash equilibrium if agents do not bid above their evaluations.

Besides the fact that in practice keyword auctions are held in a repeated setting, it is also the case that advertisers are often tied to a limited budget. It has been shown that budgets have a significant influence on the strategic behavior of agents (Borgs, Chayes, Immorlica, Mahdian, & Saberi, 2005; Dobzinski, Lavi, & Nisan, 2012; Fiat, Leonardi, Saia, & Sankowski, 2011). These budget constraints should therefore be taken into account in modeling keyword auctions. Shin (2015) showed that a budget constraint can cause advertisers to place the highest possible bid, which could be even higher than their evaluation. This contradicts the aforementioned findings of Arnon and Mansour (2011). It could be explained by the fact that for an advertiser that is on a tight budget, bidding the highest possible amount reduces advertising costs and in addition leads to a quick elimination of competing advertisers (Shin, 2015).

Most previous studies on keyword auctions use an open-bid auction. However, the current auction procedures of both Yahoo! and Google are sealed-bid auctions in which information on bids of other advertisers is usually not common knowledge (Kamijo, 2013). Therefore Kamijo (2013) modeled a repeated sealed-bid auction in which advertisers use trial bids to explore the bidding environment. With this trial bid, an advertiser temporarily increases his bid to acquire information about bids of competing advertisers. This study shows that the sealed-bid setting generally results in higher bids than an open-bid setting, since exploring the bids of the competition requires an increase in bids which is possibly unnecessary. According to Kamijo (2013) a sealed-bid auction design is more realistic than an open-bid auction, because in the former it can be investigated how bidders use their acquired personal experience from previous auction stages to adjust their bidding strategies. This trial bid is an important aspect of the study of Kamijo (2013), since it makes it possible to model how advertisers learn from their bidding experience. This learning aspect makes the study more realistic, since advertisers will use all the available information to learn an optimal bidding strategy.

The fact that information is not always available in practice makes it impossible for advertisers to always make rational decisions based on all past information and experience (Ching, Erdem, & Keane, 2013). In the last decades the idea of rational agents that maximize utility based on perfect, deductive reasoning has been challenged by many studies (see for an overview Plott, 1987) showing that many problems are too complex and that human reasoning is bounded (Arthur, 1994). Instead of exploring all possible options, agents often base their decisions on information from their own experience (Smith, 1991). Learning models are used to describe how agents make decisions based on experience (see for an overview of different types of learning models Weibull, 1997) and are therefore suitable to study keyword auctions, in which generally not all information is available to the advertisers. For example, in keyword auctions there is no available information about evaluations of the other advertisers. Consequently, advertisers have to base their strategies on other available information like a signal from the market about the other advertisers’ evaluations. This market signal could for example be based on other information about the type of goods that they are advertising or outcomes of previous auction rounds. Although this market signal is an imperfect measure of how the other advertisers value the keyword, it is still informative about their strategies. Hence, using a market signal to model strategies of advertisers makes it possible to model imperfect information in the keyword auction setting.

Learning models have been applied to study auctions in general, but there exist very few applications to keyword auctions. Learning models are based on the idea that players use their past experience to learn how to play in such a way that maximizes their utility (Tabandeh & Michalska, 2009). One of the first models that applied learning to auctions was proposed by Hon-Snir, Monderer and Sela (1998), which used generalized fictitious play. In this model the different (fixed) evaluations of players are determined before the auction starts. Then a repeated first-price auction procedure is initiated and at the end of each round the players observe the placed bids. In this study the players eventually learn the evaluations of the other players and the game converges to the Nash Equilibrium of a one-shot auction in which the types are known. Gaigl (2009) applied a similar model to both a first- and second-price auction with the extension that at the end of each auction round there is a probability for each player’s type to be revealed to the other players. In this model it is assumed that players learn through assessing the available (limited) information of the recent past, which is a restricted version of fictitious play in which players make decisions based on all past actions of competing players (Gaigl, 2009).

Other learning models that are used to model learning in auctions are reinforcement learning, in which agents interact with the environment to discover which actions yield the highest rewards (Bagnall & Toft, 2004), and evolutionary stochastic learning (Tabandeh & Michalska, 2009), which applies machine learning techniques to estimate the optimum bid in auctions. Finally, replicator dynamic methods model learning in such a way that the number of players that play a certain strategy grows proportionally to its success. These types of models stem from evolutionary game theory, which is based on the idea that certain strategies evolve because they are evolutionarily rewarding (Weibull, 1997).

In this study another type of learning model that is related to evolutionary game theory will be applied to the auction setting. In this model individual learning will be modeled based on a Genetic Algorithm (GA) model proposed by Holland (1970). The GA is a stochastic optimization procedure which makes it possible to model individual learning. This procedure is based on evolutionary operators like procreation, mutation, crossover and election, which appear in the process of DNA adaptation (Anufriev, Hommes, & Makarewicz, 2015). According to Hommes and Lux (2013) GA is a ’smart’ algorithm that is suitable for modeling learning by humans, because it provides tools to model learning experiences in new situations of which a person does not have a full understanding. Another advantage of GA is that it allows for individual learning in which agents learn solely from their own experience.¹ This property makes it possible to model heterogeneous learning by agents (Vriend, 2000). The assumption of individual learning is not unrealistic, since most people learn primarily from their own experiences and mistakes (Chernomaz, 2014). Previous research on the applications of GA to economic settings has shown that this optimization procedure provides a good approximation of how agents learn to anticipate fluctuations in exchange rates and to successfully use arbitrage opportunities (Arifovic & Gencay, 2000). Moreover, Hommes and Lux (2013) showed that GA can model individual heterogeneous behavior in price forecasting experiments. In their study the researchers modeled the strategies of the agents as a function of both a constant and a price trend, which was an extension to previous models that used GA to optimize a single parameter (i.e. price or exchange rate). Anufriev et al. (2015) state that an important advantage of the GA is that it is stochastic and nonlinear, which makes it applicable to optimization problems that occur in auctions. Previous research on the application of GA to auctions has shown that in an asymmetric first-price auction players benefit from the ability to learn non-linear strategies (Chernomaz, 2014).

¹ As opposed to social learning in which information of other strategies and their relative

The design of this study is based on Monte Carlo (M-C) numerical simulations that are used to investigate the dynamics in sealed-bid first- and second-price auctions. The individual learning aspect of the bidding behavior of players is modeled by the stochastic GA procedure. In the first study both types of auctions will be investigated to examine what the differences are in terms of convergence of placed bids, the learned strategies, the profit of the winning player and the revenue of the auctioneer. Two types of strategy functions will be examined in order to investigate which strategy yields the highest profit for the players. In the first strategy the placed bid of a player is a quadratic function (second-order Taylor approximation) of his personal evaluation and of an imperfect market signal about how the other players value the good that is on auction. The other strategy function is a quadratic function of a player’s private evaluation only. The evaluations of the players are either fixed or they change every auction round. The placed bids will be compared to the Nash Equilibrium to investigate whether GA updating results in a Nash solution. In the second study, budget constraints of the players will be incorporated to provide a more realistic environment, building upon the aforementioned studies on budget constraints in keyword auctions (Arnon & Mansour, 2011; Shin, 2015). It will be investigated how imposing the budget constraints influences the learned strategies of the players. Moreover, the different types of auctions will be compared in terms of auction duration and final budgets of the players.

The design of this study extends the literature on individual learning in (keyword) auctions in the following ways. By applying the GA to auctions, this study proposes a new way to model individual learning in (keyword) auctions and to study the dynamics of bidding behavior. By incorporating a learning aspect in the auction model, this study will be a more realistic representation of keyword auctions compared to the aforementioned studies. Similar to Kamijo (2013), players will be able to extract a limited amount of information from the auction and will select smart bids accordingly. Besides, applying GA to both a first- and a second-price auction model allows for comparison between learning in these two types of auctions. This will provide insight into whether it is beneficial for search engines to use a second-price auction procedure in their keyword advertising. Moreover, the GA updating in this study is more complicated compared to previous studies in the sense that it is used to optimize a strategy that is a function of multiple first- and second-order parameters (of the personal evaluation and the market signal) instead of optimizing only one specific parameter, similar to the aforementioned study of Hommes and Lux (2013). This makes it possible to study GA optimization of a more challenging strategy function. Furthermore, using GA to model individual learning provides a good tool to study realistic bidding behavior in online auctions for which no (costly) experiments are required. Finally, the fact that GA is stochastic and highly nonlinear (Anufriev et al., 2015) makes it applicable to learning in auctions, in particular to learning in the second-price auction in which the profit is not a linear function of the placed bids.

strategies in both types of auctions in case of changing evaluations. However, the convergence time for the bids was higher in the second-price auction compared to the first-price auction. Players were able to learn profitable strategies that were based on both personal evaluations and the market signal, but only when the number of players in the auction was not too high and the evaluations of the players were different in each time period. In that case the learned strategy was a nonlinear positive function of the personal evaluation and the market signal. However, the most profitable strategy was a positive, concave function of a player’s personal evaluation only (without accounting for the market signal). The players were able to generate a higher profit in the first- compared to the second-price auction, partly because the bidding behavior in the second-price auction remained volatile. This resulted in higher profits for the auctioneer in the second- compared to the first-price auction. The results of a second study with budget constraints showed that the final budget of the winning player was highest for random evaluations, with the highest final budget of the winning player occurring in the four-player first-price auction. In case of random evaluations, the winning player was able to distinguish himself from the competing players by a final budget that was proportionally higher than that of the competing players.

The paper is organized as follows. In Section 2 the model of the study is discussed with a further explanation of the GA procedure, the specification of the strategy function and the corresponding equilibrium concepts, the specification of the simulation procedure and the expectations. In Section 3 the results of Study 1 without the budget constraints are presented. In Section 4 an extension to the model is studied in which a budget constraint is incorporated in the model and the corresponding expectations and results are discussed. Finally, in Section 5 the results are examined in the light of the existing literature on keyword auctions by an overview of the contributions and limitations of this study and the managerial implications, followed by the References and the Appendix.

2 Model

2.1 Profit function

In the first-price auction² the winning player pays the amount of his placed bid and therefore his profit is defined as this bid subtracted from his personal evaluation. In case of a tie, the players with the highest bid each receive half of this profit and the remaining players do not receive any profit. Therefore the profit in the FPA is specified as

$$
\pi^{h}_{i,\mathrm{fpa}} =
\begin{cases}
s_i - b_i & \text{if } b_i > \max_{j \neq i} b_j \\
\tfrac{1}{2}(s_i - b_i) & \text{if } b_i = \max_{j \neq i} b_j \\
0 & \text{otherwise,}
\end{cases}
\tag{1}
$$

with $b_i$ the bid placed by player $i$ and $\max_{j \neq i} b_j$ the highest bid that is placed by the other players except for player $i$. In the second-price auction³ the winning player pays the second-highest bid in the auction and therefore his profit is defined as

$$
\pi^{h}_{i,\mathrm{spa}} =
\begin{cases}
s_i - \max_{j \neq i} b_j & \text{if } b_i > \max_{j \neq i} b_j \\
\tfrac{1}{2}(s_i - \max_{j \neq i} b_j) & \text{if } b_i = \max_{j \neq i} b_j \\
0 & \text{otherwise.}
\end{cases}
\tag{2}
$$

² henceforth FPA
³ henceforth SPA
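Equations 1 and 2 can be sketched in code as follows. The function name `auction_profit` and its arguments are hypothetical and serve only to illustrate that the two auction types differ solely in the price that the winning player pays.

```python
def auction_profit(evaluation, own_bid, other_bids, auction="fpa"):
    """Profit of one player under Equation 1 (FPA) or Equation 2 (SPA).

    evaluation: personal evaluation s_i of the player.
    own_bid:    bid b_i placed by the player.
    other_bids: bids b_j of all other players (j != i).
    auction:    "fpa" for first-price, "spa" for second-price.
    """
    highest_other = max(other_bids)
    # FPA: the winner pays his own bid; SPA: the highest competing bid.
    price = own_bid if auction == "fpa" else highest_other
    if own_bid > highest_other:
        return evaluation - price
    if own_bid == highest_other:
        # Tie: the profit is halved.
        return 0.5 * (evaluation - price)
    return 0.0


if __name__ == "__main__":
    print(auction_profit(9, 6, [4, 5], "fpa"))  # winner pays own bid
    print(auction_profit(9, 6, [4, 5], "spa"))  # winner pays highest other bid
```

With an evaluation of 9 and a winning bid of 6 against competing bids of 4 and 5, the same bid yields a profit of 3 in the FPA and 4 in the SPA.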

2.2 Strategy function

The strategy function is specified as a quadratic function (second-order Taylor approximation) of a player’s personal evaluation $s_i$ and the market signal $s_{-i}$ and is given by

$$
\beta^{h}_{i,t} = a^{h}_{0,t} + a^{h}_{1,t} s_{-i} + a^{h}_{2,t} s_i + a^{h}_{3,t} s_{-i}^{2} + a^{h}_{4,t} s_i^{2} + a^{h}_{5,t} s_{-i} s_i,
\tag{3}
$$

where $\beta^{h}_{i,t}$ is the bid (strategy) that is recommended to player $i$ at period $t$. The index ’t’ of the bidding function signals that the strategy changes over time and index ’h’ will be explained later. The market signal $s_{-i}$ is defined as the average of the evaluations of the other players, which is an imperfect measure of how the competing players value the good that is on auction. Although a player would benefit more from information about the actual evaluations of the other players and in particular about the highest evaluation, this information is not available in keyword auctions. Therefore the average of evaluations is used, which is unlikely to be informative about the highest evaluations under well-behaved distributions of evaluations. This makes it possible to model imperfect information in the auction environment, as was mentioned in the Introduction. Furthermore, because the average is taken over the evaluations of the other players, the market signal is different for every individual player and therefore a different strategy function is updated for each player.

Besides this function, another strategy function is studied in which the market signal is not taken into account. This function has the same definition as in (3), except for the market signal which is set equal to zero ($s_{-i} = 0$) and is given by

$$
\beta^{h}_{i,t} = a^{h}_{0,t} + a^{h}_{2,t} s_i + a^{h}_{4,t} s_i^{2}.
\tag{4}
$$
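A minimal sketch of strategy functions (3) and (4) is given below. The names `market_signal` and `quadratic_bid` and the example coefficients are hypothetical; the sketch only assumes that a chromosome is an ordered sequence of the six coefficients and that (4) is obtained from (3) by setting the market signal to zero.

```python
def market_signal(evaluations, i):
    """Average evaluation of all players except player i (the signal s_-i)."""
    others = evaluations[:i] + evaluations[i + 1:]
    return sum(others) / len(others)


def quadratic_bid(coeffs, s_own, s_signal=0.0):
    """Bid recommended by Equation 3; with s_signal = 0 it reduces to Equation 4.

    coeffs: the six coefficients (a0, a1, a2, a3, a4, a5) of one chromosome.
    """
    a0, a1, a2, a3, a4, a5 = coeffs
    return (a0
            + a1 * s_signal
            + a2 * s_own
            + a3 * s_signal ** 2
            + a4 * s_own ** 2
            + a5 * s_signal * s_own)


if __name__ == "__main__":
    evaluations = [2, 4, 7, 9]  # fixed evaluations of the four-player auction
    signal = market_signal(evaluations, 0)
    print(signal)
    # Illustrative coefficients only, not values learned in the simulations.
    print(quadratic_bid((1, 0, 0.5, 0, -0.02, 0), s_own=10))
```

The example shows that the market signal differs per player: player 1 with evaluation 2 observes the average of 4, 7 and 9, whereas the player with evaluation 9 would observe the average of 2, 4 and 7.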

The starting values of the coefficients $a^{h}_{k,t}$ for both strategy functions are specified as discrete random numbers between 0 and 9 for simplicity, as opposed to Anufriev et al. (2015), who used binary values. Also, discrete instead of continuous random numbers are used for simplicity and stability of the strategies. The starting values for fixed evaluations are S ∈ {2, 4, 7, 9} for the four-player auction, S ∈ {2, 0, 1, 7, 4, 3, 8, 5, 10, 6} for the ten-player auction and fixed numbers between 0 and 10 for the 50-player auction.⁴ For random evaluations the starting values are specified as discrete random numbers between 0 and 10.

The auction is held for i ∈ {1, ..., I} players in which each player has H = 20 chromosomes, which consist of K = 6 genes. The genes of a chromosome, $a^{h}_{k,t}$, represent the coefficients in the strategy function (3), such that this chromosome forms a linear prediction rule of the strategy (similar to Anufriev et al., 2015). The chromosomes that form the base of the strategy functions (3 and 4) are updated by GA in every auction period. Because each player has a different set of chromosomes, which are independently updated according to the relative profit of the corresponding strategy, the learning experience is different for each player. The use of GA updating makes it possible to model individual learning in which the players eventually learn different strategies, even though they began with the same distribution of starting values of the chromosomes.
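The initial set of chromosomes of one player could be generated as in the sketch below. The function name `initial_population` and the use of a seeded generator are assumptions for illustration; the sketch interprets "discrete random numbers between 0 and 9" as uniformly drawn integers with both bounds included.

```python
import random

H = 20  # number of chromosomes per player
K = 6   # number of genes (coefficients) per chromosome


def initial_population(rng, n_chromosomes=H, n_genes=K):
    """Draw the starting chromosomes of one player.

    Each gene is an integer between 0 and 9 (inclusive), which is an
    assumed reading of the discrete starting values described in the text.
    """
    return [[rng.randint(0, 9) for _ in range(n_genes)]
            for _ in range(n_chromosomes)]


if __name__ == "__main__":
    # One independent population per player, here for a four-player auction.
    populations = [initial_population(random.Random(seed)) for seed in range(4)]
    print(populations[0][0])  # first chromosome (h = 1) of the first player
```

Because every player draws from the same distribution but with independent draws, the players start from comparable yet distinct sets of strategies.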

2.3 The Genetic Algorithm

The GA procedure that is used in this study is based on the study of Anufriev et al. (2015). The procedure starts with a population of random chromosomes that is created as starting values, which will be the initial ’parent’ population. As was mentioned in the previous section, the strategies of the players are based on these chromosomes, in which each gene represents a coefficient in the strategy formula. The chromosomes are optimized by the GA operators procreation, mutation, crossover and election, which change the chromosomes each auction round. The first operator of the GA procedure is procreation, in which a new population of chromosomes is drawn from the parent population; this new population will be referred to as the offspring population. Which chromosomes are selected for the offspring population depends on the relative profit of the strategies that correspond to the chromosomes, which is given by

$$
p^{i}_{h} = \frac{\pi^{i}_{h}}{\sum_{h=1}^{H} \pi^{i}_{h}},
\tag{5}
$$

with the profit $\pi^{i}_{h}$ defined in (6) and (7). This relative profit is a measure of fitness of a chromosome, in which the fit chromosomes are more likely to be procreated. This means that the higher the relative profit of a particular strategy is, the higher the probability becomes that the corresponding chromosomes are selected for procreation.⁵
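The procreation step based on the relative profit of Equation 5 can be sketched as a draw with replacement from the parent population. The function names are hypothetical, and two choices below are assumptions that are not specified in the text: negative profits are clipped to zero before normalizing, and a uniform draw is used when no strategy has a positive profit.

```python
import random


def relative_profits(profits):
    """Selection probabilities in the spirit of Equation 5.

    Assumption: negative profits are clipped to zero so that the weights
    form a valid probability distribution; if all weights are zero, every
    chromosome is equally likely to be selected.
    """
    weights = [max(p, 0.0) for p in profits]
    total = sum(weights)
    if total == 0:
        return [1.0 / len(profits)] * len(profits)
    return [w / total for w in weights]


def procreate(parents, profits, rng):
    """Draw an offspring population of the same size, with replacement."""
    probs = relative_profits(profits)
    chosen = rng.choices(parents, weights=probs, k=len(parents))
    # Copy the genes so that later operators do not modify the parents.
    return [list(chromosome) for chromosome in chosen]


if __name__ == "__main__":
    parents = [[1, 0, 0], [2, 0, 0], [3, 0, 0]]
    print(relative_profits([1.0, 3.0, 0.0]))
    print(procreate(parents, [1.0, 3.0, 0.0], random.Random(1)))
```

Drawing with replacement means that a profitable chromosome can appear several times in the offspring population, while an unprofitable one may disappear.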

The profit in this function is based on a modification of general Equations 1 and 2, in which the bid $b_i$ of player $i$ is replaced by the bid that is proposed to player $i$ by GA ($\beta^{h}_{i,t}$). This means that the bid proposed by GA is compared to the highest actual bid of the previous period for all the players except for the player that won in the previous period. The proposed bid of the winning player is compared to the second-highest bid of the previous period, to see whether his proposed bid would still make him the winning player in the previous period. Therefore the profit of the FPA that corresponds to the bid that is proposed by GA is defined as

$$
\pi^{i}_{h} =
\begin{cases}
s_i - \beta^{h}_{i,t} & \text{if } \beta^{h}_{i,t} > b^{*}_{i,t-1} \\
\tfrac{1}{2}(s_i - \beta^{h}_{i,t}) & \text{if } \beta^{h}_{i,t} = b^{*}_{i,t-1} \\
0 & \text{otherwise}
\end{cases}
\tag{6}
$$

and for the SPA as

$$
\pi^{i}_{h} =
\begin{cases}
s_i - b^{*}_{i,t-1} & \text{if } \beta^{h}_{i,t} > b^{*}_{i,t-1} \\
\tfrac{1}{2}(s_i - b^{*}_{i,t-1}) & \text{if } \beta^{h}_{i,t} = b^{*}_{i,t-1} \\
0 & \text{otherwise,}
\end{cases}
\tag{7}
$$

with $\beta^{h}_{i,t}$ the bid proposed by GA and $b^{*}_{i,t-1}$ the winning bid of the previous period (or, for the player that won in the previous period, the second-highest bid of that period). These equations show that if the proposed bid would have been the highest bid in the previous auction period, the payment of the winning player would have been the amount of the proposed bid in case of the FPA (6). For the SPA (7) the payment would have been the bid that was actually highest in the previous auction period, since this is now the second-highest value when the proposed bid is also taken into account. Therefore, these profit functions are used in Equation 5 to compute the relative profit of each of the proposed strategies for procreation (and for selecting the strategies for the next period, as will be discussed later in this section).

⁵ In the procreation phase it is possible that the same chromosomes are selected multiple
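The comparison of a proposed bid with the bids of the previous period, as in Equations 6 and 7, can be sketched as follows. The function name `hypothetical_profit` is an assumption. The benchmark is computed as the highest previous bid among the other players, which coincides with the winning bid for players that lost in the previous period and with the second-highest bid for the player that won.

```python
def hypothetical_profit(evaluation, proposed_bid, prev_bids, i, auction="fpa"):
    """Profit that a GA-proposed bid would have earned in the previous period.

    evaluation:   evaluation s_i of player i.
    proposed_bid: bid proposed by one chromosome of player i.
    prev_bids:    bids actually placed by all players in the previous period.
    i:            index of the player in prev_bids.
    """
    # Benchmark b*: the highest previous bid of the competing players.
    benchmark = max(b for j, b in enumerate(prev_bids) if j != i)
    # FPA (6): pay the proposed bid; SPA (7): pay the benchmark bid.
    price = proposed_bid if auction == "fpa" else benchmark
    if proposed_bid > benchmark:
        return evaluation - price
    if proposed_bid == benchmark:
        return 0.5 * (evaluation - price)
    return 0.0


if __name__ == "__main__":
    prev_bids = [3, 8, 5]  # player with index 1 won the previous period
    print(hypothetical_profit(9, 6, prev_bids, i=1, auction="fpa"))
    print(hypothetical_profit(9, 6, prev_bids, i=1, auction="spa"))
    print(hypothetical_profit(9, 6, prev_bids, i=0, auction="fpa"))
```

In the example the previous winner could have lowered his bid from 8 to 6 and still have won, whereas the same proposed bid of 6 would not have been sufficient for the player with index 0, who faces a benchmark of 8.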

After the procreation phase, the two evolutionary operators mutation and crossover are applied to the offspring population; these operators change the values of the genes. In the mutation phase, all the genes of the chromosomes have a small probability to undergo a change of normal random noise, in which a small number from a normal distribution N(0, .25) is added to the value of the gene ($a^{h}_{k,t}$). Afterwards, pairs of chromosomes of the offspring population have a probability to cross over and exchange their first two genes. In the main results of this study, the probabilities of mutation and crossover are set at p = .50. In Appendix A.2, different values of these probabilities are investigated to examine their influence on the outcomes of the model. However, in the election phase only the strategies that are more profitable than before are selected, and therefore the specific size and probability of mutation and crossover should not have a substantial influence on the learned strategies.
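A minimal sketch of the mutation and crossover operators is given below. Several details are assumptions that are not fixed by the text: the value .25 in N(0, .25) is passed as the standard deviation (`noise_sd`), each gene mutates independently, and crossover pairs consecutive chromosomes (the first with the second, the third with the fourth, and so on). The function names are hypothetical.

```python
import random


def mutate(chromosome, rng, p_mut=0.5, noise_sd=0.25):
    """Add normal noise to each gene independently with probability p_mut.

    Assumption: noise_sd is the standard deviation of the noise; if .25 is
    meant as a variance, noise_sd = 0.5 should be passed instead.
    """
    return [g + rng.gauss(0.0, noise_sd) if rng.random() < p_mut else g
            for g in chromosome]


def crossover(population, rng, p_cross=0.5, n_swap=2):
    """Let consecutive pairs exchange their first n_swap genes.

    Assumption: chromosomes are paired as (0, 1), (2, 3), ...; an unpaired
    last chromosome is left unchanged.
    """
    result = [list(c) for c in population]
    for a in range(0, len(result) - 1, 2):
        if rng.random() < p_cross:
            b = a + 1
            result[a][:n_swap], result[b][:n_swap] = (
                result[b][:n_swap], result[a][:n_swap])
    return result


if __name__ == "__main__":
    rng = random.Random(42)
    offspring = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 0, 1, 2]]
    offspring = [mutate(c, rng) for c in offspring]
    print(crossover(offspring, rng))
```

Both operators return new lists, so the parent population remains available for the comparison in the election phase.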

Finally, in the election phase, the offspring population is compared to its parent population and only the chromosomes that are strictly better than their parents are selected for the new generation of the next period. Therefore, the election operator ensures that only the strategies that are most profitable survive. This way the strategies of a player will generally improve (or will be at least as good) each time period. An overview of the GA procedure is shown in Figure 1, which is executed in every auction period.
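The election operator can be sketched as a pairwise comparison between offspring and parents. The function name `elect` and the position-wise pairing of each offspring chromosome with the parent at the same index are assumptions; `fitness` stands for any function that returns the (hypothetical) profit of a chromosome.

```python
def elect(parents, offspring, fitness):
    """Keep an offspring chromosome only if it is strictly fitter.

    Assumption: the k-th offspring chromosome is compared with the k-th
    parent chromosome; otherwise the parent is carried over unchanged.
    """
    return [child if fitness(child) > fitness(parent) else parent
            for parent, child in zip(parents, offspring)]


if __name__ == "__main__":
    parents = [[1], [5]]
    offspring = [[3], [2]]
    # Toy fitness for illustration: the value of the first gene.
    print(elect(parents, offspring, fitness=lambda c: c[0]))
```

Because a parent is only replaced by a strictly better chromosome, the fitness at each position of the population cannot decrease from one period to the next under a fixed fitness function.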

Figure 1: The Genetic Algorithm

In every auction period the players select a strategy, based on the relative profit of (5), which generates a bid that depends on the personal evaluation and the market signal of that specific period. The strategy of the previous period with the highest relative profit has the highest probability to be chosen to generate the bid for the next auction period. For the first period the first item of the set of chromosomes (h = 1) is selected as a base for the strategy that generates the starting bid for each player.⁶ For the subsequent periods the recommended bids are generated by the GA procedure.

2.4 Simulation procedure

The learning process of the strategies that are updated by the GA procedure is studied by 200 M-C numerical simulations that consist of T = 1000, T = 2000 and T = 1000 auction periods for I = 4, I = 10 and I = 50 players respectively. Each simulation starts with selecting the initial bid for all the players. Like in Anufriev et al. (2015) the GA updating procedure is carried out for each player independently. From the second auction period, the GA procedure starts with the first- or second-price auction, in which each of the recommended bids is compared to the highest bid from the previous period. The procreation process is then initiated in which the offspring population of chromosomes is selected. The evolutionary operators mutation, crossover and election are applied to this offspring population. Finally the proposed bids are adjusted based on the elected chromosomes and the next auction period starts (see Figure 1).

This simulation procedure is carried out for various numbers of players (I = 4, I = 10, I = 50). For all these cases, different types of evaluations are examined: fixed evaluations, which remain the same every time period, and random evaluations, which change every time period. Also, separate simulations are carried out for different values of the probability of mutation and the probability of crossover (p = 0.05, p = 0.5 and p = 1), for all numbers of players.

2.5 Equilibrium concepts

In order to compare the bids that are proposed by the model to the outcomes in the existing literature on auctions, the values of these bids are compared to the Nash equilibrium.⁷ For the sealed-bid FPA with uniformly distributed evaluations the NE is given by

$$
b^{\text{fp-a}}_{i}(s_i) = \frac{n-1}{n}\, s_i,
\tag{8}
$$

with $n$ the number of players and $s_i$ the evaluation of player $i$ (Vickrey, 1961). The corresponding equilibrium payoff is given by

$$
\pi^{\text{fp-a}}_{i} =
\begin{cases}
s_i - b^{\text{fp-a}}_{i} & \text{if } b^{\text{fp-a}}_{i} > \max_{j \neq i} b^{\text{fp-a}}_{j} \\
0 & \text{otherwise,}
\end{cases}
\tag{9}
$$

with $\max_{j \neq i} b^{\text{fp-a}}_{j}$ the highest bid of all players except for player $i$.

For the sealed single-period SPA the NE strategy consists of truthful bidding in which players bid their evaluations (Vickrey, 1961), such that the equilibrium bid is given by

$$
b^{\text{sp-a}}_{i}(s_i) = s_i.
\tag{10}
$$

⁶ Since the initial set of chromosomes is based on uniform random numbers, it should not make a difference which initial bid is selected.

Assuming that with GA learning the players are myopic and therefore only care about the nearest auction round, truthful bidding is (one of) the equilibrium solution(s) in the model of this paper. The payoff in the NE is given by

$$
\pi^{\text{sp-a}}_{i} =
\begin{cases}
s_i - \max_{j \neq i} b^{\text{sp-a}}_{j} & \text{if } b^{\text{sp-a}}_{i} > \max_{j \neq i} b^{\text{sp-a}}_{j} \\
0 & \text{otherwise.}
\end{cases}
\tag{11}
$$

The revenue of the auctioneer in both types of auctions with uniformly distributed evaluations is given by

$$
R = \frac{n-1}{n+1}\, \chi,
\tag{12}
$$

with $\chi = 10$ the highest evaluation of the discrete, uniformly distributed evaluations $s_i \sim \text{uniform}[0, 10]$.⁸ Because in both the FPA and SPA the evaluations have the same uniform distribution, according to the Revenue Equivalence Theorem by Vickrey (1962), the revenue of the auctioneer is the same in both types of auctions.
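As a worked illustration of Equation 12, the expected revenue can be evaluated for the three numbers of players that are used in the simulations, with $\chi = 10$. The numbers below follow directly from the formula and are not simulation results.

```latex
\begin{align*}
n = 4:  \quad & R = \tfrac{3}{5} \cdot 10 = 6, \\
n = 10: \quad & R = \tfrac{9}{11} \cdot 10 \approx 8.18, \\
n = 50: \quad & R = \tfrac{49}{51} \cdot 10 \approx 9.61.
\end{align*}
```

The equilibrium revenue thus approaches the highest possible evaluation as the number of players grows.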

2.6 Expectations

In comparing the two auction types it is expected that the players will learn faster in the FPA. In the FPA the winning player pays exactly the amount of his bid, as opposed to the SPA, in which different winning bids can lead to the same payment. Therefore, the outcome of the FPA provides more information about which strategies are profitable than the outcome of the SPA. Moreover, it is expected that the winning bid converges to a lower amount in the FPA than in the SPA, because the winner pays the amount he bids. As was mentioned in Section 2.5, according to the NE the winning bid in the FPA is expected to be (n − 1)/n times the bid in the SPA. Therefore it is expected that the difference between the bids in both types of auctions decreases with the number of players. Because learning is expected to be easier in the FPA, it is expected that the profit of the winning player is higher in the FPA than in the SPA for the auctions with a relatively low number of players. However, because the bids in both types of auctions are expected to be approximately equal for a large number of players, the profit of the winning player will in that case be higher in the SPA than in the FPA. The revenue of the auctioneer is expected to be higher in the SPA, because in this type of auction the players need more time to learn an efficient strategy, resulting in a longer period in which players bid more than they should.
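The shrinking gap between the two auction types follows directly from Equations 8 and 10. As a worked illustration for the numbers of players used in the simulations:

$$\frac{b_i^{\text{fp-a}}(s_i)}{b_i^{\text{sp-a}}(s_i)} = \frac{n-1}{n}, \qquad b_i^{\text{sp-a}}(s_i) - b_i^{\text{fp-a}}(s_i) = \frac{s_i}{n},$$

which gives a ratio of 0.75 for n = 4, 0.90 for n = 10 and 0.98 for n = 50. For the highest evaluation $s_i = 10$ the gap between the NE bids is 2.5, 1 and 0.2, respectively.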

When the different types of evaluations are compared, it is expected that random evaluations that change every period will result in slower convergence of the placed bids than fixed evaluations, because players cannot rely on the bidding information of the previous period and the bidding environment is more unpredictable in this case. In case of fixed evaluations, the players with low evaluations do not have room to explore different bids and will be forced to bid low amounts. The players with high evaluations, on the other hand, will often win relatively early in the auction and will therefore hold on to this 'successful' strategy and will not be inclined to lower their bids any further. Because of this, the bids will remain higher than necessary in case of fixed evaluations and therefore it is expected that the average profit will be lower in this case compared to random evaluations. This will in particular be the case for the FPA, in which the players do not have room to bid above their evaluations. In that case the player with the highest evaluation can bid slightly below his private value to win the auction, leaving no room for the other players. This is not the case for random evaluations, since the players switch roles in every auction period.

8 In Vickrey (1961) the uniform distributed values are continuous. In this study these are

For the coefficients of the strategy function that is based on both the personal evaluation and the market signal, it is expected that the proposed bid is positively related to the player's evaluation ($a^h_{2t} > 0$), since a higher evaluation allows for higher bids. This will especially be true for the FPA because of the aforementioned argument that a player pays what he bids. This relationship is expected to be concave ($a^h_{4t} < 0$). Suppose the evaluation of a player increases from $s_i = 2$ to $s_i = 3$. Then it is more beneficial to increase the bid than when the evaluation increases from $s_i = 8$ to $s_i = 9$, since the highest possible evaluation is $s_i = 10$. This is because a player with a low evaluation will increase his chance of winning the auction by increasing the bid, while a player with a high evaluation would probably have won already and, in that case, increasing his bid will only lower his profit. For the SPA it is also expected that the relation between a player's strategy and the market signal is positive ($a^h_{1t} > 0$), since the higher the average evaluation of the other players, the higher the placed bid needs to be in order to win. This relationship is also expected to be concave ($a^h_{3t} < 0$), since the advantage of bidding a high amount is larger when the average evaluation of the other players is low than when it is high. The coefficient of the interaction is expected to have a positive sign ($a^h_{5t} > 0$), since bidding a high amount will be especially profitable when the evaluations of the other players are high.

For the strategy function that is based on personal evaluations only ($s_{-i} = 0$), it is expected that convergence occurs sooner than for the strategy function that also takes into account the market signal. This is because learning will be more efficient when there are only three coefficients that need to be updated. The learned strategy is also expected to be a positive concave function of the personal evaluation. However, because the strategies are based on less information, the profits are expected to be lower.

The convergence time is expected to increase with the number of players in the auction. When the number of players in the auction increases, it becomes more likely that the evaluations of the players will overlap. Therefore it becomes harder for the players to learn a strategy that generates a positive payoff, resulting in slower convergence of the winning bid. It is also expected that the mutation and crossover probabilities will influence the time before convergence to a certain extent. Increasing the probability of mutation and/or crossover will result in faster convergence, since players will be able to learn faster when mutation and crossover occur more often. Mutation and crossover adapt the coefficients of the strategy function, and if these processes occur more often it becomes more likely that the corresponding strategy improves. This is because only the coefficients of the offspring population that result in a more profitable strategy than those of their parents survive in the election phase. However, changing the mutation and crossover probabilities will not influence which strategies are learned.
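The role of the election phase in this argument can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the real-valued coding of the coefficients, the single-point crossover, the Gaussian mutation with standard deviation `scale`, the even population size and the toy fitness function are all hypothetical choices made for the example.

```python
import random


def crossover(parent_a, parent_b, p_cross, rng):
    """With probability p_cross, swap the tails of two coefficient vectors."""
    child_a, child_b = list(parent_a), list(parent_b)
    if rng.random() < p_cross:
        cut = rng.randrange(1, len(parent_a))  # single cut point (assumption)
        child_a[cut:], child_b[cut:] = parent_b[cut:], parent_a[cut:]
    return child_a, child_b


def mutate(chromosome, p_mut, rng, scale=0.1):
    """Perturb each coefficient with probability p_mut (Gaussian noise is an assumption)."""
    return [a + rng.gauss(0.0, scale) if rng.random() < p_mut else a
            for a in chromosome]


def election(parents, offspring, fitness):
    """Keep an offspring only if it is strictly fitter than the parent it would replace."""
    return [child if fitness(child) > fitness(parent) else parent
            for parent, child in zip(parents, offspring)]


def ga_step(population, fitness, p_cross, p_mut, rng):
    """One update of a single player's population (even population size assumed)."""
    offspring = []
    for k in range(0, len(population), 2):
        a, b = crossover(population[k], population[k + 1], p_cross, rng)
        offspring += [mutate(a, p_mut, rng), mutate(b, p_mut, rng)]
    return election(population, offspring, fitness)


def toy_fitness(chromosome):
    """Placeholder fitness for illustration only; the model would use the hypothetical payoff."""
    return -sum(a * a for a in chromosome)


rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
updated = ga_step(population, toy_fitness, p_cross=0.5, p_mut=0.05, rng=rng)
```

Because of the election step, no chromosome in `updated` is less fit than the one it replaced. Higher values of `p_cross` and `p_mut` only increase the number of candidate changes that are put to this test, which is the intuition behind the expectation of faster convergence.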


3 Study 1: Results

3.1 Number of players

The simulation results of the winning bid in the FPA and SPA for different numbers of players (I = 4, I = 10 and I = 50) are shown in Figure 2. The figure shows the course of the winning bid over 1000 periods for fixed and random evaluations. For both types of evaluations the winning bid converges sooner in the FPA than in the SPA. Moreover, for both types of auctions the convergence time increased with the number of players in the auction, as expected. For the FPA the convergence time was approximately t = 150 (I = 4), t = 1100 (I = 10) for fixed evaluations and t = 450 (I = 4), t = 1700 (I = 10) for random evaluations. For the SPA the convergence time for fixed evaluations was t = 370 (I = 4), t = 6500 (I = 10) and for random evaluations t = 650 (I = 4), t = 3500 (I = 10). For 50 players the convergence time was beyond the range of the simulations. Contrary to the expectations, there was no clear difference in convergence time between fixed (2.a, 2.b) and random evaluations (2.c, 2.d).

Figure 2: Winning bid in each period, average over all markets (200 Monte-Carlo iterations). Panels: (a) first-price auction (S fixed); (b) second-price auction (S fixed); (c) first-price auction (S random); (d) second-price auction (S random).


3.2 Comparison between the FPA and the SPA

Figure 2 also shows that the average winning bid is lower in the FPA than in the SPA for all numbers of players. This is according to expectations, since the winning players in the SPA paid the second-highest bid and could therefore bid higher amounts than in the FPA. The mean and median bids and profits of the winning player in both types of auctions are shown in Table 1. The winning player is defined as the player that won the most auction rounds during the last 100 periods of the auction. The profit was indeed higher in the FPA compared to the SPA, for all numbers of players and for both types of evaluations. Apparently, the players were better able to learn a profitable strategy in the FPA than in the SPA, which is in line with the idea that learning was easier in the FPA. Moreover, the difference between the bids in these two types of auctions was expected to be lower for higher numbers of players. Table 1 shows that the difference between the bids for both types of auctions was, according to expectations, lower for ten players than for four players, for both types of evaluations. However, the bids in the FPA were not a proportion (n − 1)/n of the bids in the SPA, as was predicted by the NE. For the 50-player auctions the bids did not converge, and therefore the NE solutions were not applicable to these auctions.

                    FPA                                                   SPA
                    I = 4             I = 10            I = 50            I = 4             I = 10            I = 50
S fixed
  bid     mean      7.61* (.067)      11.91* (.081)     266.75* (1.119)   30.01* (6.187)    16.40* (.853)     392.03* (2.838)
          median    4.91              9.38              267.05            27.37             14.10             391.57
  profit  mean      −.88* (.075)      −5.83* (.076)     −262.68* (1.118)  −9.34* (4.035)    −3.42* (.811)     −367.11* (2.067)
          median    1.55              −4.09             −260.57           −8.59             −3.016            −363.51
S random
  bid     mean      .41* (.016)       2.17* (.046)      285.71* (2.891)   18.40* (.193)     16.29* (.187)     421.55* (4.271)
          median    0.01              0.065             286.18            17.24             13.63             420.36
  profit  mean      4.62* (.046)      2.80* (.077)      −283.67* (3.154)  −6.96* (4.108)    −3.91 (2.555)     −395.48* (5.324)
          median    4.77              3.76              −281.43           −6.814            −3.40             −390.97

Table 1: Mean and median bids and profits (SE) of the winning player in the last 100 periods, average over 200 Monte-Carlo iterations. Values marked with (*) were significantly different from zero at the 5% level.


In comparing the mean and median values of the bids and profits, it appears that the bidding behavior in the SPA was volatile for both types of evaluations. This can be concluded from the fact that the standard errors of the bids, shown in Table 1, were higher in the SPA than in the FPA. For example, the standard errors of the bids in the SPA with fixed evaluations for the four-player (SE = 6.187) and ten-player auction (SE = .853) were large compared to the standard errors of the bids in the FPA with four (SE = .067) and ten (SE = .081) players. This volatility was caused by outliers in the bidding pattern, with an extremely high bid once every few periods. This is illustrated by Figure 3, in which the average profit over the last 100 periods of the SPA shows extreme negative spikes, which was not the case for the FPA.

Figure 3: Average profit of the winning player in the last 100 periods for the FPA and SPA for the 200 markets with random evaluations.

In the markets with extremely negative outliers the players were apparently not able to change their strategy of bidding very high amounts. This might have been caused by the fact that it was hard to learn in the SPA, because an extremely high bid could result in either positive profits or highly negative profits, depending on the second-highest bid. When a player experienced in previous auction rounds that a very high bid resulted in high profits, he might have been inclined to keep bidding very high amounts in the subsequent periods. When the second-highest bid was also very large in those periods, the profits were suddenly extremely negative, which might have accounted for the observed negative outliers.
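The way a single outlier separates the mean from the median and inflates the standard error can be sketched with made-up numbers. The profit values below are hypothetical and are not taken from the simulations; the helper `summarize` is likewise introduced only for this illustration.

```python
import statistics


def summarize(profits):
    """Return the mean, median and standard error of a list of profits."""
    n = len(profits)
    mean = statistics.mean(profits)
    median = statistics.median(profits)
    se = statistics.stdev(profits) / n ** 0.5
    return mean, median, se


# Hypothetical profits of ten markets: one without and one with an extreme outlier.
steady = [-3.0] * 9 + [-4.0]
spiky = [-3.0] * 9 + [-60.0]

steady_stats = summarize(steady)
spiky_stats = summarize(spiky)

print(steady_stats)
print(spiky_stats)
```

The median is the same in both samples, while the mean and the standard error of the second sample are driven almost entirely by the single extreme value.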

3.3 Four-player auction

Besides the cases of different fixed and random evaluations, the case in which two players had the same, highest evaluation was examined. For the FPA with evaluations S ∈ {2, 4, 9, 9} the average profit of the winning player ($\pi_i = −1.5422$, SE = .089) was smaller than the average profit of the winning player in the FPA with S ∈ {2, 4, 7, 9}.9 This might be explained by the fact that it was harder for the two players with the highest evaluation ($s_i = 9$) to win, because they had to outbid each other. For the player that was the only one with the highest evaluation ($s_i = 9$) it was easier to win, because he only had to bid higher than the player with the second-highest evaluation ($s_i = 7$). For the FPA this resulted in the situation where the players with the highest evaluation tried to overbid each other, resulting in a higher final bid and lower profit. The winning bid in the FPA with equal evaluations, $\beta^{h*}_{i,t} = 9.160$ (SE = .829), was indeed higher than in the case with separate evaluations, $\beta^{h*}_{i,t} = 7.611$ (SE = .067). This bid was even higher than the maximum evaluation, indicating that the players occasionally placed a bid that was higher than their evaluation, which explains the negative average profit. From these results it can be concluded that an auction environment in which two players had the same, highest evaluation made it more difficult for the players to learn a profitable strategy in the FPA.

For the SPA the profit was substantially lower than in the FPA in the case of two players with the same, highest evaluation, $\pi_i = −11.161$ (SE = 4.538), and was not significantly different from the profit for separate evaluations, $\pi_i = −9.341$ (SE = 4.035). This is not unexpected, since it was even harder for the players to win in the SPA than in the FPA when the two highest evaluations were equal. This was caused by the fact that first and second place were often taken by the two players with the highest evaluation, so that the winning player had to pay the second-highest bid, which was placed by the player that had the same evaluation. Consequently, the players with the highest evaluation had to be careful not to both bid very high amounts, to avoid risking a payment that was higher than the evaluation. However, because both players had an incentive to win, they both bid amounts that were higher than their evaluations ($\beta^{h*}_{i,t} = 11.010$, SE = 1.257), resulting in negative profits for the winning player. In this case, the players did not manage to learn that it was not profitable to bid an amount higher than their evaluation, since the other player with the same evaluation would have the same strategy, in which he would bid at least the amount of his evaluation. Both players had an incentive to overbid the other player, but because they had the same highest evaluation of $s_i = 9$, the highest bid that avoided a loss was exactly this evaluation. As a result, both players bid around this number, which made it impossible for either of them to generate positive profit. This made learning a profitable strategy difficult for the players, since bidding below the evaluation resulted in losing and bidding above the evaluation resulted in negative profits.
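The last step of this argument can be made explicit with the SPA payoff of Equation 11. As a sketch, assume that player i wins and that the second-highest bid is placed by the rival j with the same evaluation $s_j = s_i = 9$, who bids at least his evaluation:

$$\pi_i^{\text{sp-a}} = s_i - b_j^{\text{sp-a}} = 9 - b_j^{\text{sp-a}}, \qquad b_j^{\text{sp-a}} \geq 9 \;\Rightarrow\; \pi_i^{\text{sp-a}} \leq 0.$$

Under this assumption the winner can at best break even, and every unit by which the rival bids above 9 is a unit of loss for the winner.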

3.4 Ten-player auction

For the ten-player auction the case of equal evaluations was also examined with S ∈ {7, 3, 2, 6, 7, 4, 1, 10, 7, 5, 4}, in which three players had the same second-highest evaluation ($s_i = 7$). For both types of auctions the profits were lower in this case than with separate evaluations ($\pi_i = −7.64$, SE = .081 for the FPA and $\pi_i = −27.36$, SE = 1.869 for the SPA). Especially in the SPA, the winning players seemed to have problems generating positive profits, since the profits were substantially lower in this case than with separate evaluations ($\pi_i = −3.42$, SE = .811). Presumably the fact that there were three players with the second-highest evaluation made it too hard for the players to learn a strategy that resulted in positive profits. This might be because the winning player anticipated the second-highest bid, which was often placed by the player with the second-highest evaluation. However, in this case the second-highest bid alternated between these three players with different strategies, which made it harder for the winning player to anticipate the upcoming bids.

3.5 50-Player auction

As was mentioned in the previous sections, the players were not able to learn a profitable strategy in the 50-player auction. Besides the fact that the large number of competitors made it harder for a player to anticipate all competing strategies, it was also the case that in the 50-player auction there were a lot of players with the same evaluations. This is because all evaluations were specified as discrete numbers between 0 and 10, with the result that all evaluations appeared multiple times. In line with the aforementioned argument, it became impossible for the players to outbid the other players with a bid that was lower than their evaluations, since there would always be another player with the same (or higher) evaluation that would try to do the same.

In practice, it is not unlikely that a lot of advertisers participate in a single keyword auction. However, it could be the case that in practice the evaluations are spread over a wider range of values, as opposed to values between 0 and 10, since certain keywords could be very valuable for particular advertisers. Because of this wider spread, it may be possible for certain advertisers to bid high amounts to distance themselves from the other players and therefore they might be able to generate positive profits in practice. The bidding environment would in that case possibly be more similar to the four-player case in which there is one player with a distinct highest evaluation. In that case the (large number of) advertisers with low evaluations do not have a substantial influence on the auction environment, since they will never have the opportunity to win, similar to the players with an evaluation of si = 2 and si= 4 in the four-player auction.

3.6 Comparison between fixed and random evaluations

Although the convergence rate did not differ between fixed and random evaluations, the final bid and profit were different. The mean and median bids and profits of the winning player in both types of auctions are shown in Table 1. As expected, the mean and median profit were higher with random evaluations than with fixed evaluations for most auctions. This was caused by the fact that the winning bids were significantly lower for random evaluations. For example, for the four-player FPA, the placed bid of the winning player with random evaluations ($\beta^{h*}_{i,t} = .409$) was significantly lower than the placed bid for fixed evaluations ($\beta^{h*}_{i,t} = 7.611$).10 This shows that the players were able to learn a profitable strategy, but only in an environment in which the evaluations changed each period. This may be explained by the aforementioned argument that a player had more room to explore different strategies when his evaluation changed every period. Besides, the learning environment was more diverse in case of random evaluations, because the other players also had to anticipate new strategies in each auction period. Remarkably, the average bid for the four-player FPA with random evaluations was close to zero, which was substantially lower than the average evaluation. This is unexpected, since the losing players could have won with a positive profit by bidding an amount that was substantially higher than zero.

Another unexpected result is that all profits were negative on average for fixed evaluations (see Table 1), with the least negative for the FPA auction with four players. Apparently, the players were not able to learn to bid below their evaluations in case these were fixed. This might be explained by the aforementioned argument that the players did not have a lot of space to explore different strategies when their evaluations were fixed. Therefore the player with the highest evaluation won almost all auction rounds, leaving the other players no room to switch roles from losing to winning player. This could also have had the result that the winning player did not have to improve his strategy, since he already won with positive profits in each round, resulting in the fact that he continued to bid amounts that were higher than necessary.

The number of times each player won in the last 100 periods in the ten-player FPA is shown in Figure 4 for fixed and random evaluations. Notably, in the case of random evaluations the winning player won a lot more rounds than the other players. This was not the case for the fixed evaluations, where the number of rounds won was approximately proportional to the evaluations (S ∈ {2, 0, 1, 7, 4, 3, 8, 5, 10, 6}). A similar pattern was observed for the SPA. This finding is quite remarkable given the fact that the winning and losing player eventually did not learn very different strategies in case of random evaluations. Apparently the changing evaluations made it possible for one of the players to learn a strategy so profitable that it enabled him to distinguish himself from the other players. These findings are an illustration of the individual learning property of GA.

Figure 4: Total number of auction rounds won by each player in the first-price auction in the last 100 periods of the total of 200 Monte-Carlo iterations, with S ∈ {2, 0, 1, 7, 4, 3, 8, 5, 10, 6} in case of fixed evaluations. Panels: (a) FPA, S fixed; (b) FPA, S random.


3.7 Learned strategies

As mentioned in the previous section, the players managed to learn the most profitable strategy in the FPA with random evaluations. The learned coefficients of the winning player are shown in Table 2. The profit ($\pi_i = 4.62$) was highest in the four-player FPA with random evaluations, with bidding function

$$\beta^h_{i,t} = 3.536 + 1.647\, s_{-i} + 3.700\, s_i + .626\, s_{-i} s_i. \tag{13}$$

The remaining coefficients of the strategy function (3) were not significant. For the FPA with random evaluations, the relation between a player's evaluation and the proposed bid was positive, in line with the expectation. However, the coefficient of $s_i^2$ was not significantly different from zero (for four players at the 5% level and for 10 players at the 10% level), which was contrary to the expected concave relation. Moreover, there was also a positive relation with the evaluations of the other players, with again an insignificant coefficient for the second-order term $s_{-i}^2$. The interaction between the personal evaluation and the market signal, however, was significant and positive with a coefficient that was smaller than 1. This means that the higher the evaluations of the other players were, the more (moderately) beneficial it became to increase the bid. The positive nonzero constant reflects the fact that the winning player was biased towards always bidding a positive amount. For the SPA with random evaluations a similar relation was found, which was in line with expectations except for the insignificant second-order terms (Table 3).
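The role of the interaction term can be read off the partial derivatives of Equation 13. As a worked step, using only the significant coefficients reported there:

$$\frac{\partial \beta^h_{i,t}}{\partial s_i} = 3.700 + .626\, s_{-i}, \qquad \frac{\partial \beta^h_{i,t}}{\partial s_{-i}} = 1.647 + .626\, s_i.$$

The response of the bid to the personal evaluation therefore rises by .626 for every unit increase in the market signal, and the response to the market signal rises by the same amount for every unit increase in the personal evaluation.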

I         c                s_{-i}           s_i              s_{-i}^2         s_i^2            s_{-i} s_i
S fixed
  4       4.225* (.219)    2.868* (.189)    3.309* (.206)    2.446* (.171)    −1.510* (.069)   .692* (.126)
  10      4.158* (.200)    3.116* (.194)    3.822* (.214)    3.182* (.196)    −1.640* (.060)   1.149* (.135)
  50      4.517* (.213)    3.530* (.202)    3.736* (.021)    2.955* (.185)    .127* (.060)     1.436* (.127)
S random
  4       3.536* (.195)    1.647* (.167)    3.700* (.218)    −.036 (.220)     .260 (.225)      .626* (.162)
  10      3.704* (.201)    1.458* (.158)    3.343* (.210)    −.187 (.186)     .556* (.232)     .863* (.184)
  50      4.214* (.200)    3.934* (.201)    4.172* (.206)    2.544* (.168)    2.472* (.185)    2.384* (.185)

Table 2: Coefficients (SE) of the recommended bid in the last period for the winning player of the FPA, average over 200 markets. Coefficients marked with (*) were significantly different from zero at the 5% level.


I         c                s_{-i}           s_i              s_{-i}^2         s_i^2            s_{-i} s_i
S fixed
  4       3.978* (.203)    2.951* (.176)    3.355* (.196)    2.314* (.182)    −1.585* (.083)   .757* (.135)
  10      3.649* (.208)    .954* (.160)     3.170* (.199)    −1.447* (.089)   1.671* (.171)    .006 (.121)
  50      4.898* (.207)    3.973* (.199)    4.185* (.206)    3.1694* (.184)   .527* (.061)     1.962* (.135)
S random
  4       3.591* (.209)    1.816* (.187)    3.672* (.201)    .163 (.227)      .394 (.248)      .641* (.170)
  10      3.833* (.213)    1.622* (.169)    3.292* (.202)    −.125 (.170)     .664* (.236)     .912* (.172)
  50      4.412* (.199)    3.965* (.207)    4.034* (.188)    3.669* (.209)    3.002* (.203)    2.920* (.190)

Table 3: Coefficients (SE) of the recommended bid in the last period for the winning player of the SPA, average over 200 markets. Coefficients marked with (*) were significantly different from zero at the 5% level.

As was shown by Table 1, the players did not manage to learn a profitable strategy in the auctions with fixed evaluations. Especially in the SPA the players faced negative profits. The most important differences between the strategies with random and fixed evaluations were the magnitude of the coefficient $a_{1t}$ (corresponding to $s_{-i}$), and the sign and magnitude of the coefficient $a_{3t}$ (corresponding to $s_{-i}^2$), for the four- and ten-player FPA and the four-player SPA (see Table 2 and Table 3). Apparently the players tended to overreact to the market signal (high $a_{1t}$) in case of fixed evaluations. This reaction was convex ($a_{3t} > 0$), which means that the higher the average evaluation of the other players was, the disproportionately higher the bid placed by the winning player became. This may have resulted in extremely high bids when the average evaluation became very high. This part of the strategy might explain the relatively high bids that were observed in the auctions with fixed evaluations (Table 1).

For the ten-player SPA with fixed evaluations, the winning player learned a different strategy, which was given by

$$\beta^h_{i,t} = 3.649 + .954\, s_{-i} + 3.170\, s_i - 1.447\, s_{-i}^2 + 1.671\, s_i^2, \tag{14}$$

including again only the significant coefficients. With this strategy, the winning player did not seem to overreact to the market signal (as opposed to the four-player SPA with fixed evaluations), but to his personal evaluation. The bid was a positive convex function of the personal evaluation, since the coefficient of $s_i^2$ was positive. This convexity implies that the winning player learned to bid amounts that were disproportionately higher than his evaluation. Table 1 indeed shows that the average bid ($\beta^{h*}_{i,t} = 16.40$, SE = .853) was higher than the personal evaluation ($s_i = 10$). Since the corresponding profit was less negative and less volatile than in the four-player case ($\pi_i = −3.42$, SE = .811 for ten players compared to $\pi_i = −9.34$, SE = 4.035 for four players), the strategy of Equation 14 seemed to work best in case of fixed evaluations.

In further comparing the coefficients of the two types of auctions (Tables 2 and 3), the following results were noteworthy. Firstly, the learned coefficients of the winning player in the FPA and SPA were similar for random evaluations. It seemed to be the case that the players learned the same strategy despite the difference in convergence time. Secondly, in both types of auctions with random evaluations the coefficient of $s_{-i}^2$ was not significantly different from zero and in addition the coefficient of $s_i^2$ was also very small. This means that there was a moderately nonlinear positive relation between evaluations and the proposed bids.

3.8 Comparison between winning and losing strategies

Table 4 shows the coefficients of the losing player for both types of auctions with fixed evaluations. The losing player was defined as the player that won the fewest auction rounds during the last 100 periods of the auctions. For example, the proposed bid for the losing player with fixed evaluations in the four-player FPA was given by

$$\beta^h_{i,t} = 3.922 + 1.348\, s_{-i} + 4.338\, s_i - 1.476\, s_{-i}^2 + 3.776\, s_i^2 + 1.381\, s_{-i} s_i. \tag{15}$$

In comparing the coefficients of Table 4 to those of the winning player (top part of Tables 2 and 3), the main differences were the signs of the coefficients of $s_{-i}^2$ ($a^h_{3t}$) and $s_i^2$ ($a^h_{4t}$), and the value of the coefficient of $s_{-i}$ ($a^h_{1t}$). In the strategy function of the losing player the term $s_{-i}^2$ had a negative sign and $s_i^2$ a positive sign, which was the opposite of the signs in the function of the winning player. Moreover, the value of the coefficient of $s_{-i}$ was significantly lower for the losing player than for the winning player (for four players $a^h_{1t} = 1.348$ versus $a^h_{1t} = 2.868$ in the FPA11 and $a^h_{1t} = 1.776$ versus $a^h_{1t} = 2.951$ in the SPA12). This shows that the losing player was not as responsive to the market signal as the winning player. The winning player increased his bid more when the average evaluation became larger (convex), while the losing player showed a lower increase in his bid when the average evaluation became larger (concave). This resulted in a lower number of winning bids for the losing player compared to the winning player.

11 Results of the two-sample t-test for difference in mean: t(199) = −6.157, p < .001.
12 Results of the two-sample t-test for difference in mean: t(199) = −4.893, p < .001.


I         c                s_{-i}           s_i              s_{-i}^2         s_i^2            s_{-i} s_i
FPA
  4       3.922* (.213)    1.348* (.141)    4.338* (.207)    −1.476* (.046)   3.776* (.213)    1.381* (.133)
  10      3.872* (.201)    1.375* (.114)    4.220* (.202)    −.640* (.028)    4.214* (.196)    4.358* (.189)
  50      4.460* (.210)    3.835* (.204)    4.284* (.214)    1.784* (.133)    2.023* (.140)    2.118* (.137)
SPA
  4       3.840* (.206)    1.776* (.163)    3.844* (.214)    −1.473* (.102)   1.557* (.181)    .160 (.122)
  10      4.434* (.218)    1.778* (.166)    3.674* (.209)    1.127* (.177)    −1.359* (.100)   −.396* (.122)
  50      4.114* (.209)    4.166* (.216)    4.202* (.202)    2.850* (.186)    .556* (.061)     1.842* (.131)

Table 4: Coefficients (SE) of the recommended bid for the losing player in the FPA and SPA with fixed evaluations, average over 200 markets. Coefficients marked with (*) were significantly different from zero at the 5% level.

For the auctions with random evaluations, the strategies for the winning and losing player were similar.13 This might be explained by the fact that in case of random evaluations, all players had an equal probability of having a high evaluation each period and therefore all players had the same expected number of opportunities to learn a profitable strategy. Initially the players switched roles every period according to their evaluations, which provided every player with an equal probability of winning. This was not the case for fixed evaluations, where the player with the highest evaluation had a big advantage because he had an overall higher probability to win.

The finding that the players learned different strategies in case of fixed evaluations is another reflection of the individual learning property of GA. Although all players began the auction with the same starting distribution of evaluations, eventually they learned different strategies according to their evaluations and the market signal, which were different for every player. Because GA updated the coefficients for every player independently (in the same way as in Anufriev et al., 2015), the learning experiences were heterogeneous. For online keyword auctions this means that GA could possibly mimic the learning experiences of advertisers, since different types of advertisers value the goods that are on auction differently. Advertisers of luxury goods (e.g. cars) would for example always value keywords for provisions (e.g. basic groceries) at zero, while manufacturers of provisions would always value a keyword from that category to a certain extent.

13 See Appendix A.3 for the coefficients of the losing player for the auctions with random evaluations.


3.9 Comparison between Nash solutions and learned GA strategies

As mentioned in the Model section, in the NE of the FPA the bids are given by Equation 1, which equals $b_1^{\text{fp-a}}(2) = 1.5$, $b_2^{\text{fp-a}}(4) = 3$, $b_3^{\text{fp-a}}(7) = 5.25$ and $b_4^{\text{fp-a}}(9) = 6.75$ for S ∈ {2, 4, 7, 9}. In this case player i = 4 is the winning player in all final 100 periods, which should correspond to a bid of $b_4^{\text{fp-a}}(9) = 6.75$ according to the NE. The actual bid proposed by the GA model with fixed evaluations was slightly higher, $\beta^h_{4,t} = 7.611$ (SE = .067). The profit of the auctioneer in the FPA is given by Equation 12, which equals R = 8, R = 8.8 and R = 9.6 (one time period) in the NE for 4, 10 and 50 players, respectively. Table 1 showed an average profit for the auctioneer of R = 7.61, R = 11.91 and R = 267.047, respectively (the average of the placed bids), which roughly corresponds to the NE, but only for the auction with four players. For the FPA with random evaluations the bids were substantially lower than in the Nash equilibrium, resulting in a lower profit for the auctioneer (R = .409, R = 2.169 and R = 267.047). Apparently, the players managed to learn a strategy that is more profitable than the Nash solution.

As was mentioned in the Introduction, one of the NE of the SPA is to bid exactly the amount of one's evaluation, with corresponding profits given by Equation 2. For the SPA with fixed evaluations the NE bids would be $b^{sp-a}_1(2) = 2$, $b^{sp-a}_2(4) = 4$, $b^{sp-a}_3(7) = 7$ and $b^{sp-a}_4(9) = 9$. Table 1 shows, however, that the actual average (as well as the median) winning bid was proportionally higher than the highest evaluation for both types of evaluations.

For both fixed and random evaluations, the profit of the auctioneer grew with the number of players. However, the relative increase in profit with the number of players was higher in this model than in the NE (Equation 12). The auctioneer seemed to benefit more from an increase in the number of players in the model updated by the GA than the Nash solution predicts. This could be explained by the fact that the players in the auctions with a high number of players learned a strategy in which they tended to bid above their evaluations. This is not a rational strategy, since it is always more beneficial for a player to bid nothing than to bid an amount that is higher than his evaluation. This might explain the difference between the (rational) NE solutions and the results found.
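The argument that bidding above one's evaluation is not rational in the SPA can be illustrated with a small numerical sketch. It assumes the standard single-item second-price rule (the highest bidder wins and pays the second-highest bid, with profit equal to evaluation minus price); the overbidding profile is hypothetical and not taken from the simulations.

```python
def spa_outcome(bids):
    """Return (winner index, price paid) under the second-price rule."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return order[0], bids[order[1]]


def winner_profit(evaluations, bids):
    """Return (winner index, evaluation of the winner minus price paid)."""
    winner, price = spa_outcome(bids)
    return winner, evaluations[winner] - price


evaluations = [2, 4, 7, 9]

# Truthful bidding: every player bids exactly his evaluation.
truthful = winner_profit(evaluations, evaluations)

# Hypothetical overbid: the player with evaluation 7 bids 10 and wins,
# but pays the second-highest bid of 9, which exceeds his evaluation.
overbid = winner_profit(evaluations, [2, 4, 10, 9])

print(truthful, overbid)
```

In the truthful profile the player with the highest evaluation wins with a positive profit, whereas the overbidding player wins at a loss and would have earned zero by not winning at all.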

3.10 Strategy function without the market signal

Besides the strategy function in which the market signal ($s_{-i}$) was taken into account (3), the strategy function in which players only use information about their personal evaluations,

$$\beta^h_{i,t} = a^h_{0,t} + a^h_{2,t} s_i + a^h_{4,t} s_i^2, \tag{16}$$

was also examined. The average and median bids and profits of the winning player under this strategy in the different auctions are shown in Table 5. Remarkably, the average and median profits were positive in almost all cases in both types of auctions with four and ten players, even in the case of fixed evaluations. Comparing Table 5 with Table 1, it becomes clear that only in the case of the 10-player FPA with random evaluations did the players seem to benefit from the information in the market signal $s_{-i}$. Particularly in the SPA, the players were better able to learn a profitable strategy when $s_{-i} = 0$. Even the players in the 50-player auction managed to bid relatively low amounts in the last 100 out of 1000 periods.

This shows that in most cases the players did not benefit from using the information in the market signal. This is probably because the market signal is a noisy measure of the evaluations of the other players in the market, since it is based on the average of the evaluations. The players would probably benefit more from information about the exact evaluations in the market. However, since this information is not available in online keyword auctions, the market signal is the only information that the players can use. It would therefore be more beneficial for players to ignore this signal and to base their strategy on personal evaluations only.
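A small deterministic sketch may help to see why an average-based signal carries little information about the competition a player actually faces. It assumes that the signal of a player is the plain average of the other players' evaluations (the exact definition is not repeated in this section), and both evaluation profiles are hypothetical.

```python
from statistics import mean


def market_signal(other_evaluations):
    """ASSUMED signal: the plain average of the other players' evaluations."""
    return mean(other_evaluations)


# Two hypothetical sets of competitors faced by the same player.
profile_a = [5, 5, 5]
profile_b = [1, 5, 9]

signal_a = market_signal(profile_a)
signal_b = market_signal(profile_b)

# The evaluation that matters for winning is the highest competing one.
strongest_a = max(profile_a)
strongest_b = max(profile_b)

print(signal_a, signal_b, strongest_a, strongest_b)
```

Both profiles produce the same signal, yet the strongest competitor differs considerably, so a strategy conditioned on the signal cannot distinguish the two situations.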

                        FPA                                                SPA
S                       I = 4            I = 10           I = 50           I = 4            I = 10           I = 50
fixed   bid     mean    5.52* (.059)     10.89* (.095)    55.00* (.358)    3.48* (.723)     10.12* (.127)    119.16* (2.197)
                median  3.85             7.80             52.56            1.15             7.03             116.43
        profit  mean    1.63* (.059)     -4.38* (.086)    -48.30* (.352)   4.75* (.414)     2.07* (.080)     -98.12* (1.443)
                median  3.21             -2.13            -45.67           4.06             2.41             -95.94
random  bid     mean    1.69* (.096)     5.70* (.128)     57.83* (.587)    7.57* (1.391)    4.82* (.135)     118.40* (3.079)
                median  .01              2.13             55.84            3.96             .92              116.62
        profit  mean    3.34* (.098)     -.79* (.133)     -53.40* (.593)   2.92* (.758)     4.40* (.040)     -98.27* (2.369)
                median  4.100            1.308            -51.165          4.605            3.111            -95.189

Table 5: Average and median bids and profits (SE) of the winning player over the last 100 periods of the FPA and SPA with $s_{-i} = 0$, averaged over 200 markets.

Results marked with (*) were significantly different from zero at the 5% level.

The convergence time for this strategy function was proportionally lower for both types of auctions than for the strategy function that accounted for the market signal (3). As an example, for the FPA with fixed evaluations the convergence time was t = 50 for I = 4, t = 200 for I = 10 and t = 600 for I = 50. This means that when players did not take the market signal into account, they managed to learn relatively fast, even when the number of players was large. This can be explained by the fact that it was numerically easier to optimize a bidding function based on only three coefficients than one based on the seven coefficients in (3).

Although the winning player in the 50-player auction did manage to learn relatively fast when he did not take the market signal into account (Figure 5.a), his profit remained negative (Figure 5.b), even after convergence. Evidently, the GA still did not work well when the number of players became very large. This can again be explained by the aforementioned argument that learning was hard for the players in the 50-player auctions because the evaluations were very similar, leaving the players little room to bid below their evaluations.

[Figure 5, panels: (a) Bid of winning player; (b) Profit of winning player]

Figure 5: Bid and profit of the winning player in the 50-player auction with the strategy function based exclusively on personal evaluations ($s_{-i} = 0$), S random, T = 10000 and one M-C iteration.

The coefficients of the strategy functions in the auctions with $s_{-i} = 0$ are shown in Table 6. The coefficients were similar for both types of auctions in the case of random evaluations. The strategy of the winning player in the 4-player FPA with random evaluations was given by

$$\beta^h_{i,t} = 2.278 + 2.319 s_i - .519 s_i^2. \tag{17}$$

The profit of this strategy was higher than in the four-player FPA with the strategy based on the personal evaluation and the market signal (13). Compared to this latter strategy, the winning player had a smaller bias towards a positive bid when he ignored the market signal ($a^h_{0,t} = 2.278$ in Equation 17 versus $a^h_{0,t} = 3.536$ in Equation 13). Moreover, the relation between the private evaluation and the bid was positive, but weaker than in Equation 13 ($a^h_{2,t} = 3.700$), and (slightly) concave ($a^h_{4,t} < 0$). This means that the placed bid increased as the personal evaluation became higher, but that the size of this increase diminished as the private evaluation became higher. This again shows that a strategy based only on information about personal evaluations was more profitable than a strategy in which the market signal was taken into account.
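The shape of the learned strategy can be checked by evaluating Equation 17 and its derivative directly. The sketch below uses the coefficients reported in Equation 17; the function names and the evaluation points are illustrative choices, and the scale on which $s_i$ enters the strategy function is not restated in this section.

```python
def quadratic_bid(s_i, a0, a2, a4):
    """Equation 16: bid as a quadratic function of the own evaluation s_i."""
    return a0 + a2 * s_i + a4 * s_i ** 2


def marginal_bid(s_i, a2, a4):
    """Derivative of Equation 16 with respect to s_i."""
    return a2 + 2 * a4 * s_i


# Coefficients of the winning player as reported in Equation 17.
A0, A2, A4 = 2.278, 2.319, -0.519

# Illustrative evaluation points (an assumption, not values from the thesis).
for s in (0.0, 1.0, 2.0):
    bid = quadratic_bid(s, A0, A2, A4)
    slope = marginal_bid(s, A2, A4)
    print(s, round(bid, 3), round(slope, 3))

# Point at which the slope of the quadratic reaches zero.
turning_point = A2 / (-2 * A4)
print(round(turning_point, 3))
```

The slope falls as $s_i$ rises, which is the concavity described above. With these coefficients the slope reaches zero at the printed turning point, so the reading of a bid that increases with the evaluation applies to evaluations below that point on the scale used in Equation 17.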
