
Fragility of the Exploration Phase of Adaptive Bidding Algorithms

Boyd Biersteker (10623981)

Supervised by dr. K.A. Lasak

Second Reader: dr. N.P.A. van Giersbergen

15/07/2018

University of Amsterdam

Faculty of Economics and Business

MSc in Econometrics

Big Data Business Analytics


Abstract

Real-time bidding is the most recent innovation in display advertising. Algorithms control the trading of advertising spots, also known as impressions. Ghosh et al. (2009) designed an adaptive bidding agent that gathers a target number of impressions against a target price. Motivated by this research, this paper studies the consequences for a bidding agent that not only bids against an unknown distribution, but also bids against another adaptive bidding algorithm. The algorithm consists of an explore and an exploit phase. This study finds that the adaptive bidding algorithm is vulnerable during its explore phase when it interacts with other algorithms. The result is that it spends more than its target price and buys fewer impressions than its target fraction.


Statement of Originality

This document is written by student Boyd Biersteker, who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 Risks of Automated Trading
2 Real-time bidding
3 Optimal Bid
   3.1 Bidding Agents
   3.2 Feature Analysis
4 Ad Exchange
   4.1 Fully-Observable Exchange
   4.2 Partially-Observable Exchange
5 Data
   5.1 Data description
   5.2 Simulation
6 Strategies
   6.1 Fully-Observable
      6.1.1 Learn-Then-Bid
   6.2 Partially-Observable
      6.2.1 Bid-Two-T
      6.2.2 Max-Then-Bid
      6.2.3 Guess-Double-Panic
7 Experiment
   7.1 General Setup
   7.2 One Algorithm: Exploration in Isolation
   7.3 Two Algorithms: Simultaneous Exploration
8 Results
   8.1 Single Algorithm: Exploration in Isolation
      8.1.1 Learn-Then-Bid
      8.1.2 Bid-Two-T
      8.1.3 Max-Then-Bid
      8.1.4 Guess-Double-Panic
   8.2 Simultaneous Exploration
      8.2.1 Learn-Then-Bid
      8.2.2 Bid-Two-T
      8.2.3 Max-Then-Bid
      8.2.4 Guess-Double-Panic
9 Conclusion
Appendices
A Figures
   A.1 Actual Fraction Won: One Algorithm
   A.2 Actual Fraction Won: Two Algorithms
   A.3 Actual Spend Per Impression: One Algorithm
   A.4 Actual Spend Per Impression: Two Algorithms
   A.5 Empirical Distribution: One Algorithm
   A.6 Empirical Distribution: Two Algorithms
B Tables
   B.1 Indexes of Actual Fraction Won: One Algorithm
   B.2 Indexes of Actual Fraction Won: Two Algorithms
   B.3 Indexes of Actual Spend per Impression: One Algorithm
   B.4 Indexes of Actual Spend per Impression: Two Algorithms


1 Risks of Automated Trading

The sixth of May 2010 revealed just how vulnerable the stock markets have become. At 14:30 the day was still unmemorable; however, the circumstances for calamity were sliding into place. Two minutes later a stock market plunge started that would last 36 minutes (Kirilenko et al. (2017)). Newscasters screamed in disbelief. Traders did whatever they could to secure their positions. The Dow Jones Industrial Average dropped by 998.5 points (9%) (Kirilenko et al. (2017)). Then it stopped. The market recovered. But what happened?

In the years before this 2010 Flash Crash automated trading had taken over. Programmers built trading algorithms. In isolation, these algorithms were well understood. However, their interactions proved unpredictable. On that particular day low demand, extremely high supply and market manipulation came together (Kirilenko et al. (2017)). In part, this was the result of handing over the market to algorithms.

Automated trading doesn't only reside on stock exchanges. Around 2009 it also made its way to advertising. Real-time bidding (RTB) enabled buyers and sellers of advertisement space to trade single impressions in milliseconds (Yuan et al. (2013)). Algorithms manage these allocations. The similarities with the stock market raise the question whether the risks that were uncovered on stock exchanges in 2010 also apply to RTB.

On the buy side, bidding agents are tasked with buying the impressions. Bidding agents are automated; these are the trading algorithms. Their objective can be divided into three subtasks. First, buy a certain number of impressions. Second, achieve a set average price. And lastly, get the right quality, meaning impressions from the target audience the advertiser has in mind.

Ghosh et al. (2009) have developed an adaptive bidding algorithm. They focus on gathering a fraction of total impressions at a certain average price, leaving out the task of targeting an audience. This way they arrive at the Guess-Double-Panic (GDP) algorithm. As a benchmark they also use the Learn-Then-Bid (LTB), Bid-Two-T (BTT) and Max-Then-Bid (MTB) algorithms. Most of these algorithms have an explore and an exploit phase. During the explore phase they learn; when they have learned enough they enter the exploit phase.

Ghosh et al. (2009) prove that the Guess-Double-Panic algorithm performs well in simulations based on an RTB marketplace. However, they do not discuss that the introduction of explorer-type algorithms like Guess-Double-Panic would change the market. This paper focuses on the performance of Guess-Double-Panic in a market where more than one such algorithm resides. Furthermore, the main interest lies in the vulnerability of the exploration phase. Algorithms that explore at the same time could see a reduction in performance. To be specific, the goal of this paper is to study the effect of simultaneous exploration on the performance of the adaptive bidding algorithms. The iPinYou competition data set is used to justify the simulation of new data. This simulation creates a sequence of highest bids from an RTB auction. The algorithms then participate in these auctions given a wide variety of targets. Moreover, two algorithms starting at the same time will be sent to the auctions. Afterward, their performance is analyzed.

The focus of this paper lies in uncovering the effect of simultaneous exploration on the algorithms' ability to gather the target fraction of impressions against the target price. However, other questions also come up, such as: what happens to the empirical distribution formed by the algorithm? How do the algorithms act when targets are not feasible? What is the effect of a changing market? And what are possible solutions to any vulnerability?

This paper is organized as follows. In section 2 we describe the RTB process, in section 3 we discuss how an optimal bid is formed and in section 4 we examine two types of ad exchanges. Furthermore, in section 5 a data description is given, in section 6 the strategies that are used in the experiment are discussed and in section 7 we examine the setup of the experiment. Finally, the results are given in section 8, followed by the conclusion in section 9.

2 Real-time bidding

Real-time bidding (RTB) enables advertisers to select and buy ads on a per-impression basis. It is the most recent development in display advertising. Display advertising uses banners to reach people on online platforms. RTB has improved targeting and market efficiency (Yuan et al. (2013)).

Before the introduction of RTB in 2009, advertisements were sold in batches. Advertisers had contracts with publishers for a, usually guaranteed, number of impressions. They had little information about the impressions they bought. For example, they did not know when and to whom an ad was shown, how many times it was viewed by a person or where it was shown. Instead, advertisers relied on the reputation of publishers and on reports about the identities of the people they reached. Moreover, impressions were priced using the cost-per-mille (CPM). In this paper, the word impression shall be used interchangeably with the words advertisement, ad or view.

As a first step towards RTB, ad networks emerged. These networks combined publishers and matched impressions with advertisers. Ad networks were responsible for understanding an impression. They would describe impressions by attaching target words. Advertisers then could state their ad campaign by subscribing to target words. Ad networks had problems (Yuan et al. (2013)). Some networks grew so large that they had too much inventory compared to the demand. Also, advertisers didn't like having to subscribe to many different ad networks. Finally, ad exchanges emerged and they introduced RTB. This shifted the analysis of the impressions to the advertisers. Furthermore, it created an auction marketplace where impressions are analyzed, picked and sold in real time.

Advertisers, publishers, and ad exchanges are the main agents within the RTB framework. First, the advertisers want to execute their ad campaign. Secondly, the publishers sell advertising space. Finally, the ad exchanges bring the former two together. Furthermore, demand-side platforms (DSPs), supply-side platforms (SSPs) and data exchanges play a supporting role. Advertisers and publishers can do their trading through a DSP or SSP. This can be beneficial since they might not have enough expertise themselves. In their turn, the DSPs and SSPs might make use of data exchanges that sell information about the impressions. Note that the product of interest is the impression, i.e. the people that go online and supply views.

The RTB process is displayed schematically in figure 1. The numbered arrows are the actions and the squares are the agents. It starts with an impression that is created on the publisher's platform (1). A signal goes to the ad exchange through an SSP or an ad network (2). Then the ad exchange sends a request for bids to the DSPs (3). The DSPs possibly get information from a data platform (4). Finally, the bid is generated and submitted, and the message follows the same path back to the publisher. All of this happens in a fraction of a second. It could, for example, be triggered by someone opening a web page on their computer.

3 Optimal Bid

The challenge for an advertiser or DSP is determining the optimal bid. What optimal means depends on the goals set by the advertiser. Typically, advertisers want X impressions for an average price of Y. Additionally, advertisers target a particular audience, usually described by features, which can be used to value an impression. In this paper, we focus on the target amount and price of impressions. However, feature extraction is also briefly discussed.


Figure 1: A schematic of the RTB process by Yuan et al. (2013). The numbers stand for actions: 1: impression created, 2: signal to ad exchange, 3: bid request to DSP, 4: data gathering from data platform.

3.1 Bidding Agents

Bidding agents bid on behalf of the advertiser. Trading on an RTB exchange happens at high frequency. Therefore bidding agents are automated; that is, they usually are algorithms.

In this paper, we assume that impressions with a higher price also have more value, so no feature extraction is necessary. The reasoning behind this assumption is that different advertisers have different pieces of information, and a bid often reflects this information; therefore higher prices indicate a higher value. We also assume that bidding agents aim to exhaust their budget. In this paper, bidding agents want to buy d impressions and have a total budget of T to do so. This leads to a target price per impression t = T/d. Furthermore, assume the bidding agent knows that the total number of impressions is n. In practice, bidding agents don't know this; however, d often is small compared to the total available amount, which strengthens the assumption that n is known. If n and d are defined, then the fraction of impressions to be obtained is f = d/n.

Another important assumption is that the highest bid from the other bidders follows a distribution P. This is what the bidding agent aims to uncover. If P were known, coming up with the optimal bid would be easy: bidding z* = P^{-1}(f) would achieve fraction f. If p* is defined such that E[X | X ≤ p*] = t, then bidding p* achieves target price t. Finally, if z* ≤ p*, then bidding p* with probability A = f/P(p*) achieves both t and d. However, if p* < z*, no single bid achieves both targets; the target amount d is not in line with the target price t. If that is the case, the bidding agents in this paper will give priority to achieving d, and thus try to buy impressions cheaply. This is a matter of choice and could easily be the other way around.
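To make the roles of z*, p* and A concrete, the following is a minimal numerical sketch that computes them from a sample of a known P. It assumes a log-normal P and second-price payments; the variable names, the sample-based computation and the parameter values are illustrative, not taken from Ghosh et al. (2009).

```python
import numpy as np

rng = np.random.default_rng(0)

# Highest competing bid X, assumed log-normal (parameters illustrative).
X = np.sort(rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000))

f, t = 0.24, 1.2    # target fraction f = d/n and target price t = T/d

# z*: the bid that wins a fraction f of auctions, i.e. the f-quantile of X.
z_star = np.quantile(X, f)

# p*: the smallest bid whose conditional mean payment E[X | X <= p*] is t
# (in a second-price auction we pay X whenever we win).
running_mean = np.cumsum(X) / np.arange(1, len(X) + 1)  # E[X | X <= X[k]]
k = np.searchsorted(running_mean, t)
p_star = X[k] if k < len(X) else np.inf

if z_star <= p_star < np.inf:
    # Bidding p* with probability A = f / P(p*) hits both targets.
    A = f / (np.searchsorted(X, p_star, side="right") / len(X))
    print(f"bid {p_star:.2f} with frequency {A:.2f}")
else:
    # Targets out of line: prioritise d and bid z* every auction.
    print(f"targets infeasible together; bid z* = {z_star:.2f} to reach d")
```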

Usually, P is not known. Therefore, the task of the algorithm is to learn P. Learning means discovering what the other agents are bidding. Most auctions are second-price auctions, and thus bidding agents can learn by winning. This is done in the explore phase. Note that learning P comes at a cost. After the agent has learned enough it continues into the exploit phase. In this phase, the algorithm aims to achieve its targets by exploiting what it has learned about P. Bidding agents strive to find an optimal balance between exploring and exploiting. During exploration, an agent wants to gain as much knowledge as possible without overspending. After that, it is important that the algorithm starts exploiting at the right time and chooses the optimal bid and bidding frequency.


3.2 Feature Analysis

Advertisers gather data about impressions. For example, what browser is used, what time it is and where the impression is located. More specific information is also possible. For example, the person’s gender, hobbies, and age. These are all features. Features give information about a person or object. This is valuable for advertisers. And thus, feature extraction is used for optimizing an ad campaign.

For a detailed description of feature analysis applied to real-time bidding see Chen et al. (2011). Basically, Chen et al. formulate the feature extraction as an LP-problem.

In this paper, feature extraction plays no role. However, in practice, bidding agents could not do without it. In order to understand the consequences of the exploring versus exploiting structure, the model needs to be as simple as possible. One could view the model in this paper as if the feature extraction has already been done. Therefore, the pool of remaining impressions is in accordance with the target audience. Some people within the pool are more valuable than others and thus have a higher price. The goal is then to find a bid that achieves the remaining two targets, i.e. one that satisfies both d and t.

4 Ad Exchange

The two main types of ad exchanges are the fully-observable and the partially-observable exchange. Note that different exchanges require different strategies.

4.1 Fully-Observable Exchange

At a fully-observable auction, the winning bid is revealed after each auction. Therefore, bidding agents can learn without bidding. So one strategy is to first learn while refraining from bidding, and then, when ready, start exploiting. This strategy is called Learn-Then-Bid and is discussed in section 6.1.1.

Ghosh et al. (2009) prove that Learn-Then-Bid should theoretically get close to its targets under the following requirements: the problem must still be feasible after exploration, and the observations during the exploration phase must be accurate. Moreover, the exploration phase has a length of m, the number of auctions spent in the explore phase, and m can be chosen such that for most f and t the problem is feasible (Ghosh et al. (2009)). In conclusion, given an appropriate m and a representative exploration phase, the Learn-Then-Bid algorithm should do well in a fully-observable exchange.

Note that the fully-observable exchange is uncommon in practice. Nevertheless, Learn-Then-Bid is still of use as a benchmark.

4.2 Partially-Observable Exchange

At a partially-observable auction, only the winner knows the highest bid. At a second-price auction, the second-highest bid is also revealed to the winner, because the winner pays the second-highest bid. The partially-observable exchange presents a more difficult problem than the fully-observable case: you have to win to learn. In other words, learning has a cost.

A simple but risky strategy is bidding ∞. Obviously, you now run the risk of overspending by ∞, but you learn fast because you win every auction. Another approach would be to bid the whole budget T. This reduces the upper boundary for overspending to f·n·T. However, it is still a risky strategy.

A better strategy would reduce the risk of overspending and at the same time learn enough to reach the targets. Consider bidding 2t until you reach d impressions. Maximum overspending is reduced to 2t·f·n. Moreover, if the whole budget is spent you gather at least 0.5d impressions: the budget is T = t·d and each impression costs at most 2t, so at least T/(2t) = d/2 impressions are won. Thus this simple strategy will overspend at most by two times the budget and will gather at least half the target amount of impressions.


Already the strategy is getting more refined. Instead of 2t you could take ω·t, but this just gives a trade-off between overspending and the minimum number of impressions. Another modification would be guess-then-double. This introduces the exploration phase. During this phase, you start out by bidding 2t for a round of m auctions. You keep doubling your bid after each round until you have learned enough and can start exploiting, that is, until you can use the empirical distribution to determine an optimal bid.

Finally, Ghosh et al. (2009) introduce the Guess-Double-Panic algorithm. It works like the guess-then-double strategy but adds a third phase: the panic phase. After each round of exploring you check two things. First, you check whether you can enter the exploit phase. If not, you check whether it is feasible to continue your current bidding pattern. If this is also not the case, i.e. you expect to go over budget before reaching target d, you enter the panic phase.

5 Data

This chapter contains information about the iPinYou global RTB bidding algorithm competition dataset. Subsequently, it describes how we base the simulation on the data.

5.1 Data description

RTB is a recent development. Data is mostly in the possession of commercial parties. However, the Chinese technology company iPinYou decided to release a dataset. With it, they organized a competition. The goal of the competition is to design a bidding agent. For a thorough description of the dataset see Liao et al. (2014).

The data set is organized into three parts. First, there is the impression file, which contains all advertisements that were delivered. Second, there is a file with all the clicks, i.e. advertising that was interacted with through a click. And third, there is a file with the conversions, meaning the clicks that resulted in a purchase by the visitor. Furthermore, all three files contain information on the identity of the impression. These features could be used to perform feature analysis; as mentioned before, this is beyond the scope of this paper. In fact, the only file of interest is the impression file. Moreover, only two columns are used: one containing the publisher ID and the other containing the highest bid. The former is used to obtain subsamples for the justification of the simulations. The latter contains the pattern that the bidding agents aim to uncover.

5.2 Simulation

In order to test bidding agents, it is convenient to simulate data that fits the iPinYou data set. By simulating, we can test bidding agents on a variety of bidding sequences. This is a common approach, also taken by Ghosh et al. (2009) and Yuan et al. (2014).

Predicting the highest bid for an individual impression is hard. However, the highest bids belonging to a single publisher can be fitted with a log-normal distribution given a suitable mean and variance. Even though variances vary among publishers, impressions belonging to the same publisher generally follow a log-normal distribution.

Publishers with more than 20,000 impressions are selected from the iPinYou data set. Then, log-normal distributions with appropriate mean and variance are fitted to the data. For two publishers the result can be viewed in Figure 2. There the empirical win rate functions are presented in blue. These are the empirical CDFs belonging to each of the publishers; they represent the probability of winning given a certain bid. In fact, that is what the bidding agents try to uncover during the explore period. In red, the log-normal fit to the win rate function can be observed. Figure 2 indicates that the simulation is a reasonable substitute for the iPinYou data.

Figure 2: Log-normal distribution fitted to two single publishers from the iPinYou data set. The blue plots are the empirical CDFs of the publishers, also known as the win rate functions. In red is the log-normal fit.
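As an illustration of this fitting step, the sketch below assumes the impression file has been loaded into a DataFrame with a publisher-ID column and a highest-bid column; the file name and column names are hypothetical, not the iPinYou naming.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical impression log with a publisher ID and the highest bid.
df = pd.read_csv("impressions.csv")

for pub, grp in df.groupby("publisher_id"):
    bids = grp["highest_bid"].to_numpy(dtype=float)
    if len(bids) <= 20_000:        # keep publishers with enough impressions
        continue
    # Log-normal fit by matching mean and variance on the log scale.
    mu, sigma = np.log(bids).mean(), np.log(bids).std()
    fit = stats.lognorm(s=sigma, scale=np.exp(mu))
    # Empirical win rate function: probability of winning given a bid.
    xs = np.sort(bids)
    ecdf = np.arange(1, len(xs) + 1) / len(xs)
    gap = np.max(np.abs(ecdf - fit.cdf(xs)))   # crude goodness-of-fit check
    print(pub, f"max |ECDF - fit| = {gap:.3f}")
```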


6 Strategies

Bidding agents optimize their performance through strategies. As mentioned before, a strategy aims to achieve a target fraction f and an average price t. It further consists of checks, actions, and phases. The strategies below consist of at most three phases: the explore phase, the exploit phase and the panic phase. In the explore phase the focus is on learning. When enough data is gathered, it is put to use in the exploit phase. Finally, if it appears the targets are unfeasible, the panic mode starts and the focus narrows to gathering enough impressions. In this section, the strategies used in the experiment are explained. The algorithms are developed by Ghosh et al. (2009); the theoretical background can be found in their paper. First, we discuss strategies for fully-observable exchanges. After that, the algorithms acting in partially-observable exchanges are examined.

6.1 Fully-Observable

Fully-observable exchanges reveal the winning bid after each auction. Thus bidding agents can learn without bidding. This is what the Learn-Then-Bid algorithm does.

6.1.1 Learn-Then-Bid

In Figure 3 LTB is given. It first explores without bidding for m auctions. Then it constructs an empirical CDF P_m(x), also known as the win rate function. Using P_m(x) and the target price t it calculates the optimal bid p*_m, and then the bidding frequency A_m needed to achieve d. Furthermore, it checks whether p*_m is feasible by calculating z*_m, the minimum bid that buys d impressions.


Figure 3: Learn-Then-Bid algorithm from Ghosh et al. (2009)
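A strongly simplified sketch of this explore-then-exploit structure is given below. It condenses the routine of Figure 3 into a single fixed bid and frequency and handles ties crudely, so it illustrates the mechanics rather than reproducing the exact algorithm of Ghosh et al. (2009).

```python
import numpy as np

def learn_then_bid(highest_bids, f, t, m, seed=0):
    """Simplified Learn-Then-Bid: watch the first m auctions without
    bidding, build the empirical win rate function, then bid one price
    at one frequency until d impressions are won."""
    rng = np.random.default_rng(seed)
    n = len(highest_bids)
    d = int(f * n)
    sample = np.sort(np.asarray(highest_bids[:m], dtype=float))

    # z_m: empirical bid winning fraction f; p_m: bid with mean payment t.
    z_m = np.quantile(sample, f)
    running_mean = np.cumsum(sample) / np.arange(1, m + 1)
    k = np.searchsorted(running_mean, t)
    p_m = sample[k] if k < m else np.inf

    if z_m <= p_m < np.inf:                 # both targets look feasible
        bid = p_m
        A = f / (np.searchsorted(sample, p_m, side="right") / m)
    else:                                   # prioritise d: always bid z_m
        bid, A = z_m, 1.0

    won, spend = 0, 0.0
    for x in highest_bids[m:]:              # exploit phase
        if won >= d:
            break
        if rng.random() < A and bid >= x:   # win and pay the second price x
            won += 1
            spend += x
    return won / n, spend / max(won, 1)
```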

6.2 Partially-Observable

At a partially-observable auction, only the winner gets to know the highest and second-highest bids. This means that in order to learn, bidding agents need to win. There now is a cost to learning. This requires strategies that find a balance between exploring and exploiting. The strategies below address this problem: Bid-Two-T is the simplest, then comes Max-Then-Bid, and finally Guess-Double-Panic is the most sophisticated.

6.2.1 Bid-Two-T

Bid-Two-T bids 2t until it has gathered enough impressions, risking overspending of at most 2t·f·n. Moreover, it is expected to yield at least half of the target amount. Furthermore, Bid-Two-T goes directly into the exploit phase; it has no learning period.
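Because the strategy is this simple, it fits in a few lines. A sketch, assuming a list of simulated highest market bids and ignoring tie-breaking:

```python
def bid_two_t(highest_bids, f, t):
    """Bid-Two-T sketch: bid 2t in every auction until d = f*n impressions
    are won, paying the second-highest (i.e. the market's highest) bid."""
    n = len(highest_bids)
    d = int(f * n)
    won, spend = 0, 0.0
    for x in highest_bids:
        if won >= d:
            break
        if 2 * t >= x:
            won += 1
            spend += x
    return won / n, spend / max(won, 1)
```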


6.2.2 Max-Then-Bid

Max-Then-Bid is a brute-force strategy. During the explore phase it bids its whole budget until it has found a bid price that satisfies both targets. If the budget is strict, you run the risk of not gathering enough impressions. If you can go over the budget, you run the risk of overspending by f·n·T. In our case, the algorithms can go over their budget.

6.2.3 Guess-Double-Panic

The Guess-Double-Panic algorithm is the most advanced. It consists of three phases and learns from its wins. In Figure 4 the three phases of the algorithm are displayed. The input values are the fraction f it needs to win, the target price t, the total number of available impressions n, the length of the explore rounds m and the factor Φ by which the bid is multiplied. In our case, Φ is 2.

Guess-Double-Panic starts out by bidding 2t. It does so for m auctions; this is the first round. Then follows a check. If it has learned enough such that both targets could be achieved, it enters the exploit phase. If not, it continues with a new round of exploring, doubling the bid of the last round.

In the exploit phase, the algorithm uses the last exploring round to create an empirical CDF. From this CDF it constructs an optimal bid and bidding frequency. It then proceeds to bid this until it has gathered enough impressions or has exhausted its budget. However, it keeps checking whether it is going to make its targets. If the algorithm expects to fall short of d impressions or expects to exhaust its budget too early, it calculates a new bid, giving priority to achieving d. Finally, it enters the panic phase if it expects to exhaust its budget prematurely. It then goes into cautious exploration. Again, the priority shifts to achieving the target amount of impressions. This is a choice by Ghosh et al. and could easily have been the other way around; which target deserves priority depends on the priorities of the advertiser.


Figure 4: The Guess-Double-Panic algorithm from Ghosh et al. (2009). On top is the main algorithm with the Explore phase. In the middle the Panic phase is presented. Below is the Exploit phase.
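The following sketch keeps only this explore/exploit/panic skeleton. The actual feasibility and panic checks in Ghosh et al. (2009) are considerably more careful than the simple win-rate and budget tests used here; all names and thresholds are illustrative.

```python
import numpy as np

def guess_double_panic(highest_bids, f, t, m, phi=2.0, seed=0):
    """Strongly simplified Guess-Double-Panic: explore in rounds of m
    auctions, multiplying the bid by phi each round, until the observed
    win rate suffices for the target fraction; then exploit."""
    rng = np.random.default_rng(seed)
    n = len(highest_bids)
    d, budget = int(f * n), f * n * t        # d impressions, budget T = t*d
    bid = phi * t
    won, spend, i = 0, 0.0, 0

    while i < n and won < d:
        # Explore round: bid for m auctions and learn from wins only.
        round_wins = 0
        for x in highest_bids[i:i + m]:
            if bid >= x:
                won, spend, round_wins = won + 1, spend + x, round_wins + 1
        i += m
        if round_wins / m >= f:
            # Exploit: keep the bid, thin it to frequency A = f / win rate.
            A = f / (round_wins / m)
            for x in highest_bids[i:]:
                if won >= d:
                    break
                if rng.random() < A and bid >= x:
                    won, spend = won + 1, spend + x
            break
        elif spend > budget:                 # expect to bust the budget
            bid = t                          # "panic": explore cautiously
        else:
            bid *= phi                       # guess again: raise the bid
    return won / n, spend / max(won, 1)
```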


7 Experiment

This section covers the setup of the experiment. The experiment is divided into two parts. First, we run simulations with one algorithm; we do this for each of the strategies from section 6. Second, we run simulations with two algorithms that start and explore simultaneously.

7.1 General Setup

The algorithms bid on the impressions. A bid can be greater than or equal to zero. If an algorithm has the highest bid it wins the impression, and the price it pays is equal to the second-highest bid. Furthermore, if the two highest bids are equal, both bidders have a 0.5 chance of winning.
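As a concrete reading of these rules, here is a minimal sketch of how a single auction between an algorithm's bid and the simulated highest market bid could be resolved; the function name and structure are illustrative.

```python
import random

def resolve_auction(own_bid, market_bid):
    """Second-price resolution against the highest simulated market bid.
    Returns (won, price_paid); equal bids are decided by a fair coin."""
    if own_bid > market_bid:
        return True, market_bid       # winner pays the second-highest bid
    if own_bid == market_bid and random.random() < 0.5:
        return True, market_bid
    return False, 0.0
```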

The simulated data consists of a sequence of highest bids. The motivation for this simulation can be found in chapter 5. All simulations draw from a log-normal distribution with mean and variance one. Because this occasionally produces extremely high bids, an upper limit is established at the 99.7th percentile. Moreover, each simulation consists of 10,000 impressions. Furthermore, for each set of input variables the simulation is repeated 100 times. Finally, the reported results are the average over all those runs.

The main parameters in this experiment are f, t and m. The latter is m = 100 for all experiments, because at this size even small f are feasible (Ghosh et al. (2009)). Furthermore, a grid of combinations of f and t is used. In the single-algorithm case f takes on 16 evenly spaced values between zero and one. However, when two algorithms compete for the impressions, the upper limit for f is kept at its theoretical maximum. Thus, for two algorithms f ranges from 0 to 0.5 in eight evenly spaced steps. Furthermore, t is chosen such that feasible combinations with f are in the grid. Using simulation it is found that in order to acquire 95% of the impressions an average price of around 3.50 is needed. Therefore t ranges from zero to four in 16 evenly spaced steps. This means that at most 256 combinations of t and f are used.

7.2 One Algorithm: Exploration in Isolation

First, the experiment of Ghosh et al. (2009) is repeated. That is, each of the algorithms is tested using the simulated data. This serves to confirm that the algorithms are working properly. It also serves as a benchmark for the market with two algorithms later on.

Learn-Then-Bid is the only one that faces a fully-observable market. The other algorithms, Bid-Two-T, Max-Then-Bid and Guess-Double-Panic, perform in a partially-observable exchange. Since at this stage the algorithms are in the market alone, they use 16 values for f between zero and one and 16 values for t between zero and four.

7.3 Two Algorithms: Simultaneous Exploration

We examine two algorithms that simultaneously explore, both starting at the first auction. This is done for Learn-Then-Bid in the fully-observable market. For Bid-Two-T, Max-Then-Bid and Guess-Double-Panic it is done in the partially-observable market. Furthermore, t ranges from zero to four and f ranges from 0 to 0.5.

When more than one algorithm is let into the market there are two causes for a change in performance. Number one, higher demand increases competition. This drives up the average price and makes it harder to gather impressions. This is of less interest to us since in practice markets are large and such an effect is minor. The second thing that changes the results is the interaction between the algorithms. This is the topic of this study. Studying two algorithms does not result in a loss of generality.


8 Results

In this section, the results are discussed. First come the results of the experiment with one algorithm; in turn, the results for Learn-Then-Bid, Bid-Two-T, Max-Then-Bid and Guess-Double-Panic are examined. After that, the same is done with the results from two algorithms in the market. Note that all tables and figures we discuss in this section are in the Appendix.

8.1 Single Algorithm: Exploration in Isolation

Section 8.1 in part reproduces results from Ghosh et al. (2009). That is, the LTB, BTT, MTB and GDP strategies are examined in a market without other exploring-style algorithms; they therefore explore in isolation. Their performance is later used as a benchmark to evaluate the drop in performance as a consequence of simultaneous exploration. We discuss the latter in section 8.2. First, section 8.1.1 tests the LTB algorithm in a fully-observable market. Then sections 8.1.2, 8.1.3 and 8.1.4 examine respectively the BTT, MTB and GDP algorithms in a partially-observable market.

8.1.1 Learn-Then-Bid

The LTB algorithm performs well when it explores in isolation. There are two main reasons for this. First, apart from not competing for the first m impressions, there is no cost to learning. Second, it learns from all m impressions during the explore phase. This leads to a high-quality empirical distribution. Table 2 displays the results from the Learn-Then-Bid simulations in isolation. It contains indexed values of the achieved f for 256 different combinations of a target f and t. In each column, the base is the target f, which can be found at the top of the column. A score of 100 means the algorithm exactly achieves its target f. Anything below 100 means the algorithm gathers too few impressions, and vice versa. However, we observe no values above 100, because the algorithm stops bidding when it reaches f·n impressions.

Part of the information from table 2 is presented graphically in figure 5. The actual f versus the target f is given for five target t values. This gives five graphs plus one additional line representing the ideal outcome.

What stands out in table 2 is that most values are close to the ideal score of 100. We conclude the same from observing in figure 5 that all graphs stay close to the ideal line. Both indicate that LTB is usually able to buy the target fraction of impressions. Only 8% of the time is the acquired fraction smaller than 95% of the target fraction. In those cases we observe that both the target f and t are relatively small. Apparently, LTB finds it hard to buy few impressions for a low price.

In appendix B.3 we find table 10, which gives the indexed values of the achieved t. Here each row has a different base value: the target t that can be found on the left side of that row. As before, part of the results is presented graphically, in figure 11 in appendix A.3. This time the actual t versus the target t is given for five values of f. Moreover, the ideal line can be observed as the straight black graph.

Looking at table 10 we immediately notice that the target t is achieved less often than the target f. None of the indices is lower than 95, but 38% are bigger than 105. Moreover, 18% of the actual t values are twice as big as the target. We conclude that we observe (almost) no underspending, but we do see significant overspending. This observation is strengthened by figure 11: all graphs are above the ideal line up to the point where they merge with it.

The above indicates that the LTB algorithm is more likely to overspend than it is to buy too few impressions. This can be explained as follows. The LTB algorithm prioritizes f. This means that, whenever it is unfeasible to achieve both f and t, LTB discards t and focuses only on f. Consequently, it starts spending more to buy enough impressions.

When we combine our observations from tables 2 and 10 we come to the conclusion that LTB achieves both t and f when the targets are theoretically feasible. This is in line with Ghosh et al. (2009). This becomes even clearer when we look at figures 5 and 11 in appendix A. We immediately see that the achieved fraction is always close to the target. Moreover, we conclude that overspending only happens when f and t are not in line. As soon as they are in line, LTB achieves both targets.

Moreover, what makes these results possible, and is relevant to the topic of this paper, is the quality of the empirical distribution that is constructed by LTB. An example of this is given in figure 17 in appendix A.5. The empirical distribution presented there is taken from a simulation with target t = 1.88 and f = 0.24. Note that the empirical distribution constructed by LTB is as good as equal to the ideal distribution: it is both complete and accurate. Consequently, LTB succeeds in uncovering the underlying distribution P. Furthermore, it is exactly here that the algorithm is vulnerable during the explore phase. If it constructs a faulty win rate function (i.e. empirical distribution) its performance will deteriorate.

In conclusion, the Learn-Then-Bid algorithm performs (close to) as well as is theoretically possible. It under- and overspends only when targets are not in line. This is in agreement with Ghosh et al. (2009). What is more, the reason for its performance is the complete and accurate empirical distribution that is constructed by the algorithm.

8.1.2 Bid-Two-T

The Bid-Two-T algorithm performs well given its simple design. It bids two times t until it has gathered f·n impressions. This limits overspending. Moreover, BTT buys at least half its target amount when it spends the whole budget. Therefore, BTT shows us that simple can mean safe. In the remainder of this section, we discuss the performance of the BTT algorithm when it explores in isolation. We study the fraction it gathers and the average price it pays.


The results from the BTT-in-isolation simulations are given in tables 3 and 11 in appendices B.1 and B.3, respectively. The structure of these tables is the same as that of the ones discussed in section 8.1.1. Tables 3 and 11 give the indexed values of respectively the actual fraction and the actual spend per impression. In table 3, each column has a different base value, found at the top of the column. In table 11 the rows have different base values, found on the left of each row. Furthermore, in figures 6 and 12 in appendices A.1 and A.3 respectively, some of the results are presented graphically. In the same figures the results for Max-Then-Bid and Guess-Double-Panic are also given.

We conclude from table 3 that BTT gathers the target amount f·n more often for larger values of t. This makes sense since the bids are directly linked to t: higher bids tend to win more auctions and thus buy more impressions. In total, BTT achieves the target fraction in 58% of the simulations. In the remaining simulations two scenarios are possible: either the targets are not feasible, or the targets are feasible but BTT's bid is not optimal. The latter is the consequence of the lack of an exploration phase. When we now look at figure 6 we see the same pattern as in table 3. For higher t the graphs are closer to the ideal line. However, at some point the fraction becomes unfeasible given a bid of 2t; the graph then stops following the ideal line and continues horizontally. Next, consider table 11 and figure 12. The pattern is best observed in the graphs in figure 12. We see that BTT spends the same per impression regardless of the target fraction. This makes sense because the spending is linked to the bid, which in turn depends only on t and not on f. Furthermore, from theory we know that BTT never overspends by more than 2t per impression. This is confirmed by table 11: the highest observed index is 142. In conclusion, BTT does well if targets are feasible. It has no way of dealing with unfeasible targets other than failing to gather enough impressions. Its weakness is caused by its inability to learn and adapt. On the other hand, this could be a strength when multiple BTT algorithms are in a market together: when it doesn't learn, it cannot learn incorrectly. Thus its simplicity could protect it.

8.1.3 Max-Then-Bid

The Max-Then-Bid algorithm performs as well as the LTB algorithm when it explores in isolation. During its exploration period it bids its whole budget. This is a risky procedure; however, it does gather a lot of information in this period. Very likely it wins every auction while learning and thus learns as much as the LTB algorithm, which can be observed by comparing their empirical distributions. Note, however, that the MTB algorithm is bound to perform badly when exploring simultaneously.

Tables 4 and 12 in appendices B.1 and B.3 respectively contain the results from the MTB-in-isolation experiment. These tables are organized the same way as the tables discussed in sections 8.1.1 and 8.1.2. Moreover, figures 6 and 12 in appendices A.1 and A.3 present part of the results graphically. The MTB graphs are marked by the downward-facing open triangles.

MTB is able to gather the right amount of impressions most of the time. We first look at table 4. MTB has an index of 95 or higher 89% of the time, very close to the 92% achieved by LTB. Similarly to LTB, the MTB algorithm prioritizes f. In combination with a complete and accurate empirical distribution, this leads to good performance. We discuss the empirical distribution later on. Furthermore, the similarity between the achieved fractions of MTB and LTB becomes more apparent when we compare figures 5 and 6: both follow the ideal line.

We now discuss the spend per impression that MTB achieves. Results are in table 12 and figure 12. It is clear that MTB prioritizes f. In order to always achieve the target fraction, it systematically overspends when targets are not in line. MTB has an index higher than 105 in 40% of the studied cases; for LTB this percentage is 38%. The similarity in their performance becomes most clear from comparing the graphs from figure 11 with the MTB graphs in figure 12. For each of the target fractions, the actual spend per impression is above the target spend per impression until both targets become feasible. Then the graphs for both MTB and LTB are equal to the ideal line.

When MTB explores without the interference of other algorithms it constructs a near-perfect empirical distribution. Consider figure 18 in appendix A.5, which presents the empirical distribution for t = 1.88 and f = 0.24. It is identical to the ideal distribution. This was also true for the empirical distribution constructed by LTB. The similarity comes from the fact that both LTB and MTB know all m highest bids during their explore phase: LTB because it operates in a fully-observable market, MTB because it wins all auctions in that period. However, MTB is extremely vulnerable during this period. If any other MTB algorithm were to explore at the same time, one could easily see how this could go wrong. Both would bid their whole budget, and thus the winner pays the other's whole budget.

In conclusion, the MTB algorithm performs as well as the LTB algorithm. That is, it performs close to what is theoretically possible. It gathers the target fraction most of the time and it overpays only when targets are not in line. Moreover, its empirical distribution is equal to the ideal. However, it takes big risks in acquiring this distribution.

8.1.4 Guess-Double-Panic

Finally, we examine Guess-Double-Panic and how it performs when exploring in isolation. GDP starts by bidding 2t. After each round, it checks whether it has gathered enough information. If not, it doubles its bid and continues; otherwise, it enters the exploit phase. First, we examine the actual fraction GDP gathers. Then we look at the average spending. Finally, the empirical distribution is examined.

Tables 5 and 13 in appendices B.1 and B.3 contain the indexed results from the experiment. As before, base values for table 5 are at the top of each column; base values for table 13 are to the left of each row. Moreover, part of the information is also presented in figures 6 and 12. Lastly, GDP's empirical distribution for t = 1.88 and f = 0.24 is given in figure 19 in appendix A.5.

From examining table 5 we conclude that GDP gathers 95% of the target impressions 92% of the time. LTB scores 92% too, and MTB achieves 89%. One look at figure 6 then tells us that, just like for LTB and MTB, the graphs that represent the actual fraction gathered are very close to the ideal line. This comes from the fact that GDP prioritizes the target fraction. And apparently, GDP constructs a high-quality win rate function; we discuss this later.

We proceed to examine the spend per impression achieved by GDP. Looking at figure 12 we observe striking similarities between GDP and both LTB and MTB. When targets are not in line, it overspends: the graphs hover above the ideal line until targets are feasible. However, there is a noteworthy difference in the achieved target spend by MTB and GDP. For small f GDP overspends less, and for larger f MTB overspends less. The reason for this is that MTB overspends and overbuys a lot during its short exploration phase. This makes a big impact on the realization of its targets when the target f is small. On the other hand, GDP explores more cautiously. This causes it to spend a lot of time exploring when there is a big gap between the target t and the actual average price per impression necessary to achieve f. And thus it overspends more in those situations.

We now come to an important part of this section. We study figure 19, which holds the empirical distribution of GDP. Examining this helps us understand what happens when GDP algorithms explore simultaneously later on. First of all, we note that the empirical distribution constructed by the GDP algorithm is less complete than the ones constructed by LTB and MTB. It does not fully follow the ideal distribution. However, it does follow it up to a probability of winning of around 0.4. Note that it doesn't have to be more complete than this, since its target fraction in this case is f = 0.24. So the empirical distribution is perfect for the part that matters. The algorithm does not use the top part of the win rate function, and thus spends no resources on acquiring it. In conclusion, the empirical distribution might be incomplete, but it is effective.

In conclusion, the GDP algorithm performs just as well as LTB and MTB. LTB could be said to do better, but the difference is minor. GDP gathers the target fraction 92% of the time and overspends only when targets are unfeasible. Finally, while its win rate function might not be complete, it is very functional, since the parts that matter to its targets are as good as the ideal distribution.

8.2 Simultaneous Exploration

In this section we discuss the results from the experiments where two algorithms compete in the same market. In turn, we discuss Learn-Then-Bid, Bid-Two-T, Max-Then-Bid and Guess-Double-Panic.

8.2.1 Learn-Then-Bid

In this section, we study what happens to the performance of Learn-Then-Bid if two LTB algorithms explore simultaneously. From section 8.1.1, we know that LTB's performance in isolation is close to the theoretical ideal. In these simulations, however, the algorithms explore at the same time, both form their empirical distributions and then enter the market to exploit.

Tables 6 and 14 in appendices B.2 and B.4 contain respectively the indexed values of the achieved f and t. Moreover, some of these results are also graphically presented in figures 7 and 13 in appendices A.2 and A.4. Note that we only include theoretically feasible fractions, and thus f ranges from 0.06 to 0.47. Finally, in figure 20 in appendix A.6 the win rate function is presented. Note that the ideal empirical distribution now is slightly different from the one from the simulations with exploration in isolation.

From table 6 we conclude that LTB now, in contrast to LTB in isolation, is not able to gather the target fraction of impressions. In fact, only in 17% of the cases is LTB able to gather within 5% of the target fraction of impressions; when LTB explores in isolation this is 92%. We also note that if the target spend per impression is high and the target fraction is low, LTB tends to perform better, and vice versa. This pattern is also visible in figure 7. For any value of t, the graphs move away from the ideal line as the target fraction f increases. The reason for this is that a low f and a high t give a low bid frequency. In turn, this means less interaction between the algorithms, and thus LTB can perform more like it does in isolation. Finally, how much better LTB performs in isolation becomes apparent when comparing figures 5 and 7. In figure 5 the graphs representing actual and target f are nearly identical; in figure 7 they are not.

Next, we consider table 14. We note that overspending when the algorithms explore simultaneously is larger than in the LTB-in-isolation case. To be precise, only in 2% of the simulations is the target spend per impression achieved within a 5% interval; for LTB in isolation this number was 62%. Moreover, on average the index for the actual t in table 14 is 192, while for table 10 it is 125. That is an average increase in spending of 54%. When we compare figures 11 and 13 we see two different patterns. The former hovers above the ideal line while targets are unfeasible but eventually merges with it. Figure 13, in contrast, never connects with the ideal line and bends away from it in a curved shape, implying that LTB is not able to spend the target amount per impression when it explores simultaneously.

Part of the spending increase is caused by an increase in demand. Suppose both algorithms aim to buy 10% of the impressions. If both succeed, together they must have bought 20%. The cheapest 10% of the impressions have a lower average price than the cheapest 20% of the impressions. This increase in the average price can be observed in table 1. However, note that LTB does not always succeed in gathering the target amount of impressions. The other thing that drives the increase in spending is a faulty empirical distribution. This causes the algorithm to misjudge how often it is going to win and what it is going to pay. We discuss the empirical distribution next.

The gap between LTB's empirical distribution and the market it encounters is the main reason why its performance declines. The blue line in figure 20 is the empirical distribution formed by LTB. The red line is what would have been the ideal empirical distribution; that is, the win rate function for a market with one LTB algorithm in it. When LTB explores, it has no way of telling that another LTB algorithm is doing the same. Thus when they simultaneously enter the market, their bid is not optimal, because the market they enter is different from the market they learned from. Here a vulnerable aspect of the explore phase is uncovered: it is not prepared for a changing market. However, this could be fixed by creating a continuous or repetitive explore phase. Finally, when we look at figure 20 again, we see that the algorithm indeed overestimates its win chances for bids ranging from zero to four, which causes it to buy too few impressions.

In conclusion, the LTB algorithm does not gather its target amount of impressions when it explores simultaneously with another LTB algorithm, and consequently it overpays. In part, this is due to an increase in demand. However, a wrong empirical distribution is the main cause of the decline in performance compared to the case where LTB explores in isolation. In turn, this has to do with LTB's inability to adapt to a changing market, which could possibly be solved by a continuous or repetitive explore phase.

8.2.2 Bid-Two-T

In this section, we examine the performance of two Bid-Two-T algorithms that start bidding simultaneously. Remember, the BTT algorithm does not explore.


Table 1: Average price of the cheapest impressions, for lowest-price percentages ranging from 5% to 50%.

Cheapest % of impressions    5%    10%   15%   20%    25%    30%    35%    40%    45%    50%
Average price               183   504   934   1467   2104   2850   3712   4698   5820   7091

It simply bids two times the target spend per impression until it has gathered enough impressions. Moreover, from section 8.1.2 we know that if only one algorithm is in the market it does well as long as the targets are feasible. It has no way of dealing with unfeasible targets and therefore cannot buy enough impressions in such cases. However, BTT at maximum overspends by a factor of two and therefore buys at least half of f·n if it spends its whole budget. In this section, we discuss the fraction and the spend per impression that BTT achieves when it bids simultaneously with another BTT algorithm. Moreover, we compare these results with BTT in isolation and with the results of the LTB algorithm from section 8.2.1.

Table 7 in appendix B.2 presents the fraction that BTT wins in the experiment. In 59% of the simulations, BTT achieves 95% or more of the target fraction. Remember, when BTT performs in isolation, it achieves its target fraction 58% of the time; thus it does equally well in a market with two algorithms as with one. Furthermore, when we turn to figure 8 in appendix A.2 we come to the same conclusion. The graphs follow the ideal line until the targets become unfeasible for the algorithm. The same pattern is visible in figure 6, where it is the only algorithm in the market. BTT's lack of a learning period turns into an advantage when multiple algorithms start at the same time. In the simulations where two algorithms start at the same time, BTT wins at least as many impressions as LTB gathers.

The results in table 15 in appendix B.4 and figure 14 in appendix A.4 are clear. In every studied case BTT spends within 5 index points of twice its target price.


We explain this by considering its strategy. It bids 2t every auction until it has f·n impressions. If two identical algorithms bid simultaneously, there are three possibilities. One, the BTT bids are lower than the top market bid: nothing happens. Two, the BTT bids are equal to or higher than the highest simulated bid: a coin toss follows, and the winner wins the auction and pays the price, which is 2t. Three, one of the algorithms has bought enough impressions and withdraws; the other one continues bidding. In this last case an average price lower than 2t can be achieved. We observe this for some of the smaller target fractions in table 15. Finally, on average BTT has a slightly higher index for the actual t than LTB: the mean of the former is 199 and of the latter 192.

In conclusion, when two BTT algorithms start bidding simultaneously they nearly always win the target number of impressions. Moreover, since they bid against each other, they pay 2t on average. The simplicity of the strategy is its strength. Lastly, it could be argued that BTT, when algorithms start at the same time, performs better than the more sophisticated LTB.

8.2.3 Max-Then-Bid

The next algorithm that we examine in a setting where it explores simultaneously is Max-Then-Bid. Just as BTT, MTB is a strawman algorithm. It takes a lot of risk when it bids its whole budget during the explore phase. In the remainder of this section, we will see that MTB overspends in an extreme manner. We first study the fraction it gathers, then the price per impression and finally the win rate function. Note that the tables and figures discussed in this section sometimes have adjusted axes to show the high spending.

First of all, we examine the achieved fraction in table 8 in appendix B.2. We conclude that MTB is able to gather the target fraction most of the time. In fact, over the 128 combinations of t and f it gathers 95% or more of the target impressions 95% of the time. That is 6% more often than MTB when it explores in isolation. However, as we will see later on, MTB pays a price for this improvement. Note that MTB does not achieve an index of 100 every time, especially for the relatively higher values of f. Apparently, it is not able to predict its win rate accurately. This has to do with the win rate function, which we examine later.

When we turn our attention to table 16 in appendix B.4 and figure 15 in appendix A.4, we see the result of the risky explore phase. MTB overspends extremely: on average it pays 1200 times the target price. The two algorithms engage in a bidding war, blinding each other during the explore phase with extremely high bids. The only bids MTB learns from are those of the other MTB algorithm, which are equal to its whole budget. We will see that this leads to a useless empirical distribution.

In figure 21 in appendix A.6 the empirical distribution is given. Note that it is very different from the one in figure 18 that was formed by the MTB algorithm exploring in isolation. In the simultaneous case, MTB only sees bids from the other algorithm during its exploration phase. These bids are not at all representative of the rest of the market. The only information contained in figure 21 is that when the algorithm bids the whole budget it wins around 50% of the impressions. Therefore, it is not able to optimize its bidding correctly.

In conclusion, when MTB explores at the same time as another MTB algorithm, it often gathers the target fraction but overspends by a lot. In the simulations it overspends on average by a factor of 1200. The main reason for this is a failed explore phase: the algorithms cannot learn the underlying distribution because they only see the bid of the other algorithm. Therefore they construct a useless empirical distribution.

8.2.4 Guess-Double-Panic

Lastly, we examine the Guess-Double-Panic algorithm when it explores simultaneously. GDP is the most advanced and realistic strategy. Remember that the GDP algorithm starts by bidding 2t. Then it doubles this bid each explore round as long as it has not gathered enough information. When it has enough data it starts the exploit phase. Moreover, if at some point it expects not to meet its targets, it goes into panic mode. We start by discussing the fraction that is gathered by GDP. Then we continue with the average spend. Finally, we consider the win rate function.

In table 9 in appendix B.2 the indexed values of the achieved fraction can be found. GDP performs relatively well. In 84% of the simulations, it gathers at least 95% of its target number of impressions. When exploring in isolation it managed to achieve the same 95% of the time. Moreover, it rarely has an index lower than 90: only in 3% of the cases. We conclude the same when looking at figure 10 in appendix A.2. The graphs that represent the actual fraction are close to the ideal line for all four values of t. However, we do see that each graph diverges a little downwards around a different target fraction. This is a consequence of the censored empirical distribution, which we discuss later.

We now examine the achieved price per impression in table 17 in appendix B.4. On average GDP pays 4 times its target price. Remember, when GDP explored in isolation it paid 3.25 times its target. Furthermore, in only 8% of the studied cases does the GDP algorithm that explores together with another algorithm spend within 5% of its target. In isolation, GDP achieved this 65% of the time. However, this is not an entirely fair comparison, since what is feasible differs between the two situations. It is better to compare figure 12 and figure 16. What we see in figure 12 is that when GDP is on its own it is able to achieve the target spend per impression once both targets become feasible. This point of convergence is missing in the simultaneous case. When GDP explores simultaneously it loses its ability to reliably achieve t when both targets should be feasible. As we will see, this has a lot to do with the limited empirical distribution that it uses to pick an optimal bid.

Let us now compare the performance of the different strategies when they explore simultaneously. The GDP algorithm overspends by more than the BTT algorithm: BTT and GDP overspend by a factor of two and four, respectively. However, the reason GDP overspends is that it gives priority to the target fraction. GDP gets within 5% of f in 84% of the cases, BTT only in 59%. Taken over both targets, GDP could be said to outperform BTT. Moreover, since we have argued that BTT outperforms LTB, GDP does better than LTB as well. Finally, if we compare MTB and GDP, GDP wins again. It is true that MTB is better at gathering the target amount, but MTB pays such a high price per impression that GDP is the preferred algorithm.

Now consider figure 22, which holds the empirical distribution constructed by the GDP algorithm for t=1.88 and f=0.24. In red we see the win rate constructed by the algorithm. It looks similar to the one constructed by MTB: the probability of winning is zero up to some bid, and at that bid the chance of winning is high enough that the algorithm can achieve its target fraction and price. However, note that there seem to be multiple equilibria in the form of plateaux. This is part of the randomness that comes from the coin flips when two bids are equal. What is most important about the empirical distribution in figure 22 is that it is censored: it only gives us information about one bid value. This happens because when the algorithm wins and acquires information, it only sees the bid of the other algorithm, which is equal to its own bid. Let us now consider the consequence of this censored empirical distribution by going through the explore phase step by step. Both algorithms start bidding 2t. They keep doubling this bid until they win their target fraction of impressions. This means that if they both need 10% of the impressions, they raise their bid until the moment they together win 20%. This turns out to be an efficient method for achieving the target fraction, but a less efficient one for achieving the target price per impression. The latter is clarified by the example below, after a short sketch of how such a censored win-rate curve arises.
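As a minimal illustration of the censoring (our own construction, not the thesis code), the empirical win rate can be tallied per observed bid value. When every auction is a tie against an identical rival at the same bid, the curve carries information about a single bid only, with a win rate near 0.5 produced by the coin flips:

```python
import random
from collections import defaultdict

def empirical_win_rate(observations):
    """Tally an empirical win-rate function from (bid, won) explore observations."""
    tally = defaultdict(lambda: [0, 0])        # bid -> [wins, trials]
    for bid, won in observations:
        tally[bid][0] += int(won)
        tally[bid][1] += 1
    return {b: w / n for b, (w, n) in sorted(tally.items())}

# Against a mirroring rival, every auction is a tie at one bid value,
# decided by a coin flip, so only one point of the curve is ever observed.
rng = random.Random(2)
obs = [(32.0, rng.random() < 0.5) for _ in range(200)]
print(empirical_win_rate(obs))                 # {32.0: ~0.5}: a censored curve
```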

Suppose the starting bid for both algorithms is 4, the target spend per impression is 10, and the optimal bid that achieves this target spend is 17. The algorithms first double their bid to 8 and then to 16. Now suppose that at a bid of 16 they could win enough impressions, but that they continue doubling their bid one more time because their average spend per impression is not yet achieved. Their bid is now 32 and they enter the exploit phase. The censored empirical distribution then forces each algorithm to bid 32 at a bidding frequency A. This achieves the target fraction, but it leads to overspending, since the optimal bid is 17. A possible solution is to optimize the factor by which the previous bid is multiplied.
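The snippet below merely replays the arithmetic of this example; the bid sequence and the assumption that a mirroring rival makes the exploit phase pay its own final bid are the hypothetical values stated above.

```python
t = 10                      # hypothetical target spend per impression
bids = [4]                  # both algorithms start from the same guess
while bids[-1] < 16:        # double to 8, then 16: enough impressions won here
    bids.append(bids[-1] * 2)
bids.append(bids[-1] * 2)   # one extra doubling: spend criterion not yet met
# The exploit phase inherits 32, while a bid of 17 would have met the target;
# with the rival mirroring every bid, each win is paid at the own bid of 32.
print(bids, bids[-1] / t)   # [4, 8, 16, 32] and an overspend factor of 3.2
```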

In conclusion, when multiple GDP algorithms explore simultaneously they are relatively good at gathering the target number of impressions. However, GDP has trouble optimizing its bid such that it achieves the target price per impression; it manages this in only 13% of the studied cases, far less than when it explores in isolation. This has to do with its censored empirical distribution: only a small amount of information is contained in it, and that results in a bid that is not optimal. Consequently, the GDP algorithm performs worse when it enters a market where other GDP or explore-style algorithms are at work. Its explore phase is vulnerable.

9 Conclusion

We study the consequence of two adaptive bidding algorithms that explore simultaneously. This is inspired by the paper of Ghosh et al. (2009), who studied the problem of acquiring a given number of impressions under a given budget constraint by bidding against an unknown distribution. In this paper, we go one step further: we not only bid against an unknown distribution but also against another adaptive bidding algorithm. In this section, we summarize the results and conclusions from this study. First, we discuss the experiment in which one algorithm took part. Then we discuss the case where two algorithms compete in the market.


With only one algorithm in the market, Learn-Then-Bid, Max-Then-Bid and Guess-Double-Panic performed equally well and outperformed the simple Bid-Two-t. LTB, MTB, and GDP perform close to what is theoretically possible, in line with what Ghosh et al. (2009) found. These three algorithms are able to gather the target amount of impressions against the target price per impression as long as the targets are feasible. Another requirement is that the observations during the explore phase have to be accurate, which then results in an accurate empirical distribution. When they explore in isolation, all three strategies construct an accurate empirical distribution.

However, when two algorithms are in the market simultaneously, the performance of all four algorithms deteriorates. The risky Max-Then-Bid strategy is able to gather the target number of impressions but overpays by a factor of 1200. Furthermore, Learn-Then-Bid consistently overpays, and instead of always gathering the target number of impressions it now only does so in 17% of the cases. Strikingly, the BTT algorithm does arguably better than both Max-Then-Bid and Learn-Then-Bid in the market with two algorithms. Through its simplicity, it overspends by at most twice its target amount, and if it spends its whole budget it at least gathers half of its target amount of impressions. Finally, the performance of Guess-Double-Panic also gets worse. It is still able to gather within 5% of its target fraction of impressions 84% of the time, but it has trouble achieving the target price per impression and only succeeds in this 13% of the time. Still, Guess-Double-Panic performs best of the four algorithms in a situation where two algorithms are in the market simultaneously.

The reason for the deterioration of the performance of Learn-Then-Bid, Max-Then-Bid and Guess-Double-Panic is that they construct faulty empirical distributions. For Learn-Then-Bid this is the consequence of a changing market: the market it prepared for is different from the market it enters, because it has no way of knowing that another Learn-Then-Bid algorithm enters simultaneously. Furthermore, Max-Then-Bid and Guess-Double-Panic both suffer from a censored empirical distribution, because the only bid they learn from is the bid of the other algorithm. Their empirical distributions therefore contain information about just one bid, which especially hinders them in determining a bid that achieves the target spend.

The goal of this paper is to study the effect of simultaneous exploration by two adaptive bidding algorithms. We can conclude that a censored empirical distribution causes the adaptive bidding algorithms to underperform. Future studies could focus on improving the adaptive bidding algorithm; possible solutions include a continuous exploration phase, a delayed exploration phase, or a repeated explore phase.


References

[1] Kirilenko, A.; Kyle, A. S.; Samadi, M.; Tuzun, T. The Flash Crash: High-Frequency Trading in an Electronic Market. The Journal of Finance, 2017.

[2] Yuan, S.; Wang, J.; Zhao, X. Real-time bidding for online advertising: measurement and analysis. ADKDD'13, Proceedings of the Seventh International Workshop on Data Mining for Online Advertising, Chicago, 2013.

[3] Ghosh, A.; Rubinstein, B. I. P.; Vassilvitskii, S.; Zinkevich, M. Adaptive bidding for display advertising. WWW'09, Proceedings of the 18th International Conference on World Wide Web, pages 251-260, Madrid, 2009.

[4] Liao, H.; Peng, L.; Zhenchuan, L.; Shen, X. iPinYou Global RTB Bidding Algorithm Competition Dataset. ADKDD'14, Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, pages 1-6, New York, 2014.

[5] Chen, Y.; Berkhin, P.; Anderson, B.; Devanur, N. R. Real-time bidding algorithms for performance-based display ad allocation. KDD'11, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1307-1315, San Diego, 2011.


Appendices

A Figures

A.1 Actual Fraction Won: One Algorithm

Figure 5: Fraction of supply won by Learn-Then-Bid in a market with 1 Learn-Then-Bid algorithm, with m = 100 for 4 values of t.


Figure 6: Fraction of supply won by Bid-Two-t, Max-Then-Bid and Guess-Double-Panic in a market with 1 Bid-Two-t, Max-Then-Bid or Guess-Double-Panic algorithm, with m = 100 for 4 values of t.


A.2 Actual Fraction Won: Two Algorithms

Figure 7: Fraction of supply won by Learn-Then-Bid in a market with 2 Learn-Then-Bid algorithms, with m = 100 for 5 values of t.


Figure 8: Fraction of supply won by Bid-Two-t in a market with 2 Bid-Two-t algorithms, with m = 100 and for 4 values of t.

Figure 9: Fraction of supply won by Max-Then-Bid in a market with 2 Max-Then-Bid algorithms, with m = 100 and for 4 values of t.

Figure 10: Fraction of supply won by Guess-Double-Panic in a market with 2 Guess-Double-Panic algorithms, with m = 100 and for 4 values of t.

A.3 Actual Spend Per Impression: One Algorithm

Figure 11: Spend per impression by Learn-Then-Bid in a market with 1 Learn-Then-Bid algorithm, with m = 100 and for 5 values of f.


Figure 12: Spend per impression by Bid-Two-t, Max-Then-Bid and Guess-Double-Panic in a market with 1 Bid-Two-t, Max-Then-Bid or Guess-Double-Panic algorithm, with m = 100 for 4 values of t.


A.4 Actual Spend Per Impression: Two Algorithms

Figure 13: Spend per impression by Learn-Then-Bid in a market with 2 Learn-Then-Bid algorithms, with m = 100 for 4 values of t.


Figure 14: Spend per impression by Bid-Two-t in a market with 2 Bid-Two-t algorithms, with m = 100 for 4 values of t.

Figure 15: Spend per impression by Max-Then-Bid in a market with 2 Max-Then-Bid algorithms, with m = 100 for 4 values of t.


Figure 16: Spend per impression by Guess-Double-Panic in a market with 2 Guess-Double-Panic algorithms, with m = 100 for 4 values of t.


A.5 Empirical Distribution: One Algorithm

Figure 17: In red, the empirical distribution (or win rate function) constructed by Learn-Then-Bid in a market with 1 Learn-Then-Bid algorithm, with m=100, t=1.88 and f=0.24. In blue, the ideal distribution.


Figure 18: In red, the empirical distribution (or win rate function) constructed by Max-Then-Bid in a market with 1 Max-Then-Bid algorithm, with m=100, t=1.88 and f=0.24. In blue, the ideal distribution.

Figure 19: In red, the empirical distribution (or win rate function) constructed by Guess-Double-Panic in a market with 1 Guess-Double-Panic algorithm, with m=100, t=1.88 and f=0.24. In blue, the ideal distribution.


A.6 Empirical Distribution: Two Algorithms

Figure 20: In red, the empirical distribution (or win rate function) constructed by Learn-Then-Bid in a market with 2 Learn-Then-Bid algorithms, with m=100, t=1.88 and f=0.24. In blue, the ideal distribution.


Figure 21: In red, the empirical distribution (or win rate function) constructed by Max-Then-Bid in a market with 2 Max-Then-Bid algorithms, with m=100, t=1.88 and f=0.24. In blue, the ideal distribution.

Figure 22: In red, the empirical distribution (or win rate function) constructed by Guess-Double-Panic in a market with 2 Guess-Double-Panic algorithms, with m=100, t=1.88 and f=0.24. In blue, the ideal distribution.


B Tables

B.1 Indexes of Actual Fraction Won: One Algorithm

Table 2: Indexed values of fraction won by Learn-Then-Bid in a market with 1 Learn-Then-Bid algorithm, with m=100 for 256 combinations of f and t. Each column has a base value that is the target fraction at the top of the column.

f =       0.06  0.12  0.18  0.24  0.29  0.35  0.41  0.47  0.53  0.59  0.65  0.71  0.76  0.82  0.88  0.94
t = 0.24    86    90    92    94    93    94    96    95    97    96    98    96    98    98    98    98
t = 0.47    92    89    90    95    93    93    95    96    95    95    96    97    97    98    98    99
t = 0.71    94    95    94    95    93    94    95    94    95    94    97    97    97    98    98    99
t = 0.94    95    97    97    96    96    94    94    94    96    96    96    97    97    98    98    98
t = 1.18    99    97    98    97    98    97    97    96    95    96    97    96    97    97    98    98
t = 1.41    99    97    99    98    98    97   100    98    99    97    97    98    97    98    97    99
t = 1.65    97    98    99    99    98    99    98    99    98    97    96    97    98    97    98    98
t = 1.88    99    99    99    99    99    99    99    99    99    99    98    98    98    98    99    98
t = 2.12    99    99    99   100   100    99   100    99   100   100    99    99    98    98    98    98
t = 2.35    99    99    99   100   100    99   100   100    99    99    99    99    99    98    98    99
t = 2.59    99    99    99    99    99   100   100   100    99   100   100   100    99    99    98    98
t = 2.82    99   100   100   100   100   100   100   100   100   100   100   100   100    99    99    99
t = 3.06    99   100   100   100   100   100   100   100   100   100   100   100   100   100    99    99
t = 3.29   100   100   100   100   100   100   100   100   100   100   100   100   100   100   100    99
t = 3.53   100   100   100   100   100   100   100   100   100   100   100   100   100   100   100    99
t = 3.76    99   100   100   100   100   100   100   100   100   100   100   100   100   100   100   100


Table 3: Indexed values of fraction won by Bid-Two-t in a market with 1 Bid-Two-t algorithm, with m=100 for 256 combinations of f and t. Each column has a base value that is the target fraction at the top of the column.

f =       0.06  0.12  0.18  0.24  0.29  0.35  0.41  0.47  0.53  0.59  0.65  0.71  0.76  0.82  0.88  0.94
t = 0.24    68    34    23    17    14    11    10     9     7     7     6     6     5     5     5     4
t = 0.47   100   100    82    61    49    41    35    31    27    25    22    21    19    18    16    15
t = 0.71   100   100   100   100    87    73    62    55    49    44    40    36    34    31    29    27
t = 0.94   100   100   100   100   100   100    87    76    68    61    55    51    47    44    40    38
t = 1.18   100   100   100   100   100   100   100    94    84    75    69    63    58    54    50    47
t = 1.41   100   100   100   100   100   100   100   100    98    88    80    73    68    63    59    55
t = 1.65   100   100   100   100   100   100   100   100   100    98    89    82    75    70    65    61
t = 1.88   100   100   100   100   100   100   100   100   100   100    97    89    82    77    71    67
t = 2.12   100   100   100   100   100   100   100   100   100   100   100    95    88    82    76    71
t = 2.35   100   100   100   100   100   100   100   100   100   100   100   100    93    86    81    75
t = 2.59   100   100   100   100   100   100   100   100   100   100   100   100    97    90    84    79
t = 2.82   100   100   100   100   100   100   100   100   100   100   100   100   100    93    87    82
t = 3.06   100   100   100   100   100   100   100   100   100   100   100   100   100    96    90    84
t = 3.29   100   100   100   100   100   100   100   100   100   100   100   100   100    99    92    87
t = 3.53   100   100   100   100   100   100   100   100   100   100   100   100   100   100    94    88
t = 3.76   100   100   100   100   100   100   100   100   100   100   100   100   100   100    96    90
