
An approach of solving early phase multiplayer No-Limit Hold’em poker using empirical Bayesian statistics and vector spaces

A thesis presented by Damiaan Reijnaers (10804137)

for the degree of Bachelor of Science in Artificial Intelligence

Credits: 18 EC
University of Amsterdam, Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisor: dr. M.A.F. Lewis
Institute for Logic, Language and Computation
Faculty of Science, University of Amsterdam
Science Park 107, 1098 XG Amsterdam


Abstract

This thesis proposes an efficient methodology for representing early-game states in No-Limit Texas Hold’em poker and for making decisions by comparing them. The paper presents a predictive model in the form of an augmented decision tree based on the distance between geometrically represented game situations in a Euclidean vector space, wherein opponent models contribute as situational characteristics. This research identifies a shortcoming in existing work on opponent modelling and solves it by introducing a mixed technique based on Bayesian statistics and beta-binomial regression. An implementation is suggested and tested. By using the proposed method, it can be concluded that an artificial agent is able to distinguish between different game situations and to make (a variety of) decisions based on situational factors.


Contents

Abstract

1 Introduction

2 Putting the idea into perspective

3 Method
3.1 Proposed methodology
3.1.1 General definitions
3.1.2 Opponent modelling
3.1.3 Situational space
3.1.4 Decision trees
3.2 Suggested implementation
3.2.1 Dataset
3.2.2 Population
3.2.3 Decisions based on situations
3.3 Experiments and results
3.3.1 Weights
3.3.2 Estimated hole card ranges
3.3.3 Predictions for actions and raise sizing

4 Conclusion and discussion

Bibliography

Appendix A - Relevant rules of Texas Hold’em Poker

Appendix B - Glossary of used poker terms


List of Figures

1 Dependency graph for variables in a poker game
2 Visualisation of VPIP- and PFR/VPIP-values for players in dataset
3 Clusters of player characteristics
4 Relation between number of observations and player metrics
5 Beta distributions resulting from beta-binomial regression
6 Generated situational space for a raise from first position
7 Relation between folds and raising size
8 Sketch of situation with raise, call and fold to agent on BU
9 Sketch of situation with raise from first position, folds to agent on SB
10 Part of decision tree after agent re-raises

List of Tables

1 Estimations for player metrics using various methods
2 Analysis of parameter d in equation 14
3 Observations of situations in used dataset
4 Estimated weights for raise from first position
5 Estimated hole card range for a player who raised from first position

List of Algorithms

1 Estimating a raising size for the agent

1 Introduction

The rise of Deep Blue, ‘solving’ chess in 1996, and the recent victory of AlphaGo, ‘solving’ the game of Go in 2017 (the human player against whom AlphaGo competed actually won one out of three rounds, which can be called a victory for AlphaGo, but can hardly be called a definitive ‘solution’), have made developments towards solving Texas Hold’em poker more relevant. Occasionally, programs such as PokerSnowie gain a considerable amount of attention by making progress towards creating an ‘unbeatable poker algorithm1.’ However, poker is a totally different game, which, unlike chess and Go, involves a great deal of chance. It is a game with imperfect and unreliable information, involving opponent modelling, risk management and deception – this illustrates the relevance of the game to significant areas in AI research (Billings et al., 1998).

Despite the relatively narrow scope of this thesis, multiple different disciplines of AI are involved in this paper. Moreover, the methods presented in this paper are relevant for broader use than just poker. The formulas for beta-binomial regression introduced in section 3.1.2 are applicable to any problem in which the estimation of long-term probabilities plays a role – an example is the ‘batting average’ of baseball or cricket players, the number of a player’s ‘hits’ divided by their ‘at bats’2 (Robinson, 2017). Vector spaces (further explained in section 3.1.3) can be used as an approach for a wide variety of problems containing a large or an infinite number of (game) states that can be represented by numbers, such as drug repositioning and stock markets (Manchanda and Anand 2017; Bai et al. 2018, p. 217). For this reason, concepts in this paper are represented both symbolically, using first-order logic, and mathematically, in an attempt to make the concepts more accessible for purposes other than poker.

1 PokerSnowie, Challenge PokerSnowie, https://www.pokersnowie.com/blog/taxonomy/term/13, accessed on February 27th, 2020
2 Major League Baseball, What is a Batting Average (AVG)?, http://m.mlb.com/glossary/standard-stats/batting-average, accessed on February 27th, 2020

The game’s complexity does not necessarily imply that a computer can never play poker profitably. Since there is a clear distinction between profitable and non-profitable players, it can be assumed that at least a part of the game is not based on chance. And, however small this part in the distribution between skill and chance might be, it is big enough to turn hundreds of online players into winners of tens of thousands of US dollars3. Two studies involving groups of instructed and non-instructed poker players reinforced the claim that skill plays a greater role than chance (DeDonno and Detterman, 2008). Moreover, although broadly sceptical on the matter, Meyer et al. (2013) also state that “experts seem to be better able to minimize losses when confronted with disadvantageous conditions.” Nevertheless, it is worth noticing that besides concluding “that the outcomes of poker games are predominantly determined by chance,” Meyer et al. nuance this conclusion by stating that it applies “at least to short game sequences,” thus leaving long game sequences open for discussion. DeDonno and Detterman filled this gap by showing that skill is the determining factor in long-term outcomes. This is the same advice professional poker coaches teach their students: “it is about focusing on small advantages to win in the long run4,5.”

In contrast to the work discussed above, this thesis will not yield an answer to the debate over whether poker is a game of chance or skill, but will instead focus on how a computer could flawlessly take over certain aspects of the game, using the theory of Bayesian statistics and vector spaces. The aim of this thesis is to answer the following question: how can an agent efficiently make use of the theory of Bayesian statistics and vector spaces to represent and compare game situations in order to make decisions in the early stages of a multi-player No-Limit Texas Hold’em game? In order to adequately come up with an answer, this research question is subdivided into two subquestions: How can opponents be modelled by considering a population of players? and How can these opponent models be used to make decisions based on publicly available information?

3 HighstakesDB, Biggest Poker Winners - Top Money Winners in Online Poker, https://www.highstakesdb.com/poker-players.asp


The approach presented in this paper is based on a couple of personal observations. First, if, for a willing and competent human, taking part in a relatively small sample of poker hands (a couple of hundred thousand or million) suffices to turn the person into a winning player, a computer can certainly do it with the same number of hands and probably fewer. Secondly, when playing against an unknown opponent, one uses the knowledge gained previously by playing against other opponents, which indicates the use of a ‘population’ – a statement supported by Chen and Ankenman (2006, p. 38, 67).

It should be noted that it is assumed that the reader is aware of the rules of Texas Hold’em, which is necessary to understand the essence of the method presented in this paper. A brief introduction to the rules of the game relevant for this thesis is outlined in Appendix A. Throughout the document, conventional poker terms will be used. A list of these terms and their explanations can be found in the glossary in Appendix B. This document will often use terms such as ‘the agent’ or ‘the program,’ referring to a hypothetical program implementing the strategy described in this thesis.

2 Putting the idea into perspective

The significance of poker as a testbed for Artificial Intelligence has led to extensive research in the field, which has yielded a wide variety of attempted approaches towards ‘solving the game.’ These include reinforcement learning (Dahl, 2001) and neural networks (Davidson 1999; Billings et al. 2002, p. 226-227). The approach presented in this thesis will solely try to maximize the estimated expected value over a decision tree, using statistics directly derived from past observations (further explained in section 3.1.4) – this greatly benefits the explainability of the choices the agent makes.

According to popular opinion within the community of poker players, players are divided into two groups: mathematical players (players who base their decisions on calculations) and intuitive players (players who base their decisions on a certain ‘gut feeling’). I challenge the distinction between these two ‘types’ of players and treat them as equivalent – mathematical players actively make use of existing theorems and formulas, while intuitive players apply these same methods subconsciously. Because of the uncertain and complex nature of the game, one has to infer the playing characteristics of opponents solely by observing them play. The sample sizes of these observations are often small or non-existent, which makes a Bayesian approach to poker a more natural choice than a frequentist interpretation6 of the game. This reasoning is followed by Chen and Ankenman (2006, p. 38).

A known approach to the problem of predicting opponents’ hole cards is to use Bayes’ theorem in a similar manner (Chen and Ankenman 2006, p. 60-64; Van der Kleij 2010, p. 39; Korb et al. 1999). A corresponding approach is taken for the prediction of opponents’ actions (Ponsen et al. 2008; Southey et al. 2012; Korb et al. 1999).

4 My Poker Coaching, How to become a professional poker player, https://www.mypokercoaching.com/become-professional-poker-player/, accessed on February 4, 2020
5 Best Poker Coaching, Create Good Habits with a Winning Pregame Routine, https://www.bestpokercoaching.com/create-good-habits-winning-pregame-routine/, accessed on February 4, 2020
6 In probability theory, a distinction is often made between frequentist and Bayesian approaches; the former defining probabilities as representing the occurrence of events in long-run frequencies, the latter basing a probability on indications of the plausibility of an event. “A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.” - Karl Pearson


This thesis introduces an empirical Bayesian approach towards the modelling of opponents. The proposition made in this paper is uncommon, and probably unprecedented, in the field of research on poker. The method allows for an accurate estimation of statistics on opponents (e.g. a player’s degree of aggression) based on the player population. It demonstrates a bias in existing work, such as the mentioned Ponsen et al. (2008), and solves it using beta-binomial regression, which is explained in section 3.1.2. Although Bayesian statistical methods will be used extensively in the most fundamental aspects of the opponent models, this paper differs from the earlier mentioned work by using vector spaces for the representation of different game states and the prediction of an opponent’s actions, hole cards and raise sizes. This choice is inspired by the idea of conceptual spaces.

Conceptual spaces allow for the representation of concepts in a geometric space, in which the dimensions represent characteristics (or features) of the concept. A similarity function is used to measure the similarity of concepts within the space (Gärdenfors, 2004). An implementation based on conceptual spaces is pre-computable and handles bet sizing and stack sizing in a continuous way, whereas other implementations opt for pre-specified static bet sizes through abstractions (Moravčík et al. 2017, p. 3; Brown and Sandholm 2017; Brown et al. 2018), including already cited work (Van der Kleij, 2010, p. 44); No-Limit Hold’em is often deemed ‘too complex’ for AIs to solve because of the abundance of possible game states resulting from unrestricted bet sizes7,8 (Johanson, 2013). The implementation proposed in this thesis does not pre-specify any kind of bet sizing.

7 Pokernews, Artificial Intelligence and Holdem, Part 3: No-Limit Holdem, The Next Frontier, https://www.pokernews.com/strategy/artificial-intelligence-hold-em-3-23218.htm, accessed on November 15, 2019
8 Cardschat, Poker Bots Aren’t Powerful Enough to Solve No Limit Hold’em (Yet), https://www.cardschat.com/news/poker-bots-arent-powerful-enough-solve-no-limit-holdem-yet-52909, accessed on November 15, 2019

In addition, when combining multiple variables into a Bayesian model, which is generally required when modelling a multi-player No-Limit Hold’em game, another problem arises: the huge complexity of the game. A large number of possible game states causes datasets to often be too sparse to reliably represent the needed prior beliefs9,29. The choice of vector spaces also formulates a solution to this problem, as any previously observed poker game state can be considered in a weighted comparison with a newly encountered game state in which the same action sequence occurred. This model will still behave similarly to a complete Bayesian model, while still avoiding the explained complications related to continuous bet and stack sizes, and being less prone to small samples. This will be further discussed at the end of section 3.1.3.

Essential for calculating the expected value for branches of the to-be-constructed decision trees, as explained in section 3.1.4, is estimating the relative strength or equity of a starting hand. After discounting the two hole cards initially dealt to the agent, $\binom{50}{5} = 2{,}118{,}760$ different board combinations exist. The strength of one single starting hand versus another hand can be precisely calculated by considering all possible run-outs of the board. The results of these calculations are readily available10. In more complex problems, such as when determining the strength of a starting hand versus a range of hands, a hand’s equity can be approximated with Monte Carlo simulations11, which are observed to converge quickly12 (Metropolis and Ulam, 1949, p. 335-341). Many different implementations of hand evaluators are available on software development platforms13, many of which are based on mechanisms invented for efficient computing14. Large lookup tables of pre-calculated evaluations, such as for the Two Plus Two evaluator (containing 32,487,834 entries), are freely available15.

9 More formally called the prior probability of an event occurring. It expresses one’s ‘belief’ of a random event happening when applying Bayesian statistical inference.
10 PokerStove, preflop-matchups.txt.gz, http://web.archive.org/web/20110612052656/http://www.pokerstove.com/analysis/preflop-matchups.txt.gz, accessed on February 28th, 2020
11 Monte Carlo simulations are a class of random sampling algorithms to approximate probabilities by considering subsets of a set of events.


Considering that a hand’s perceived equity depends on various factors, such as ‘post-flop playability,’ other approaches have been studied as well. Dalpasso and Lancia (2015) view equity as a combination of features of a (hand on the) flop, such as ‘the flop contains one overcard.’ Although based on the limit variant of the game, groupings of starting hands have been proposed by, among others, Johanson et al. (2013) and Sklansky and Malmuth (1999, p. 14-15). The hand groupings stemming from the latter work will be used in section 3.1.3 to ‘learn’ weights for the previously introduced vector spaces. As the scope of this thesis pertains only to the ‘pre-flop’ phase of the game, a precise interpretation of equity is preferred over an interpretation depending on ‘post-flop play.’ In this thesis, Monte Carlo simulations will be used to estimate the equity of a hand versus (possibly multiple) opponents’ ranges of hands.

12 Miscellaneous Remarks, Ideas, Trials, Poker 4: Monte Carlo Analysis, http://oscar6echo.blogspot.com/2012/09/poker-4-monte-carlo-analysis.html, accessed on February 28th, 2020
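
To make the Monte Carlo approach referred to above concrete, the sketch below estimates the pre-flop equity of a hand against a weighted opponent range by sampling an opponent holding and a random board many times. It is a minimal, heads-up-only sketch under stated assumptions: the card encoding (e.g. "As" for A♠), the function names and the naive 5-card scorer are all illustrative, and a practical implementation would rely on a fast lookup-table evaluator such as the Two Plus Two tables mentioned above.

```python
import random
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "shdc"
DECK = [r + s for r in RANKS for s in SUITS]
RANK_VALUE = {r: i for i, r in enumerate(RANKS, start=2)}

def score5(cards):
    """Comparable score tuple for a 5-card hand (higher tuple = stronger hand)."""
    ranks = sorted((RANK_VALUE[c[0]] for c in cards), reverse=True)
    groups = sorted(((n, r) for r, n in Counter(ranks).items()), reverse=True)
    flush = len({c[1] for c in cards}) == 1
    distinct = sorted(set(ranks), reverse=True)
    if distinct == [14, 5, 4, 3, 2]:                       # wheel (A-2-3-4-5)
        straight = 5
    elif len(distinct) == 5 and distinct[0] - distinct[4] == 4:
        straight = distinct[0]
    else:
        straight = 0
    if flush and straight:
        return (8, straight)
    if groups[0][0] == 4:
        return (7, groups[0][1], groups[1][1])
    if groups[0][0] == 3 and groups[1][0] == 2:
        return (6, groups[0][1], groups[1][1])
    if flush:
        return (5, *ranks)
    if straight:
        return (4, straight)
    if groups[0][0] == 3:
        return (3, groups[0][1], *[r for r in ranks if r != groups[0][1]])
    if groups[0][0] == 2 and groups[1][0] == 2:
        kicker = [r for r in ranks if r not in (groups[0][1], groups[1][1])][0]
        return (2, groups[0][1], groups[1][1], kicker)
    if groups[0][0] == 2:
        return (1, groups[0][1], *[r for r in ranks if r != groups[0][1]])
    return (0, *ranks)

def best7(cards):
    """Best 5-card score out of 7 cards (2 hole cards + 5 board cards)."""
    return max(score5(c) for c in combinations(cards, 5))

def equity_vs_range(hero, villain_range, n_sims=5000, seed=0):
    """Heads-up Monte Carlo equity of `hero` (e.g. ["As", "Kh"]) against a
    weighted range given as {("Qc", "Qd"): weight, ...}; ties count as half."""
    rng = random.Random(seed)
    combos = [(c, w) for c, w in villain_range.items() if not set(c) & set(hero)]
    hands, weights = zip(*combos)      # card removal ('blockers') applied above
    won = 0.0
    for _ in range(n_sims):
        villain = list(rng.choices(hands, weights=weights)[0])
        deck = [c for c in DECK if c not in hero and c not in villain]
        board = rng.sample(deck, 5)
        h, v = best7(list(hero) + board), best7(villain + board)
        won += 1.0 if h > v else 0.5 if h == v else 0.0
    return won / n_sims

# Toy example: AKs versus a range of {QQ, JTs} with equal weight.
print(equity_vs_range(["As", "Ks"], {("Qc", "Qd"): 1.0, ("Jh", "Th"): 1.0}))
```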

3 Method

This section is composed of three subsections. The first subsection is concerned with the proposed methodology and effectively answers the main research question stated in the introduction: how can an agent efficiently make use of the theory of Bayesian statistics and vector spaces to represent and compare game situations in order to make decisions in the early stages of a multi-player No-Limit Texas Hold’em game? In section 3.2, an implementation of the proposed methodology is suggested. Section 3.3 proceeds to present the findings of this implementation and serves to strengthen the case for further investigating the methods proposed in section 3.1. In appendix C, a questionnaire and its results regarding the quality of the suggested implementation are attached.

In this thesis, a database of 6,552,060 hands and 63,657 different players is consistently used. Each hand is played by six players. The choice of this particular dataset is motivated in section 3.2.1.

Only the first stage of the game is considered: the pre-flop betting round. This is the first betting round of the game, when no ‘community cards’ have been revealed. For convenience, this part of the game will be referred to using the following terminology: poker game, game, poker hand or hand. All games are assumed to take place in a ‘rake-free environment’: no commission is withheld, unlike what is usually the case in poker games. Although briefly explained in appendix B, it is necessary to clarify what is meant by a ‘hole card range.’ At the start of every hand, players are dealt two hidden cards, which are used to form combinations with five community cards visible to all players. These cards are a player’s ‘hole cards.’ A ‘hole card range’ is a group of cards which a player is assumed to hold in a certain situation – instead of predicting an opponent’s exact hole cards, a weighted spectrum of possible holdings is proposed.

13 GitHub, XPokerEval - A collection of poker hand evaluation source code compiled by James Devlin, https://github.com/tangentforks/XPokerEval, accessed on February 28th, 2020
14 Suffecool, Cactus Kev’s Poker Hand Evaluator, http://suffe.cool/poker/evaluator.html, accessed on February 28th, 2020
15 GitHub, HandRanks.dat, https://github.com/christophschmalhofer/poker/blob/master/XPokerEval/XPokerEval.TwoPlusTwo/HandRanks.dat, accessed on February 28th, 2020


Figure 1: Dependency graph for variables in a poker game ω. The dashed line illustrates the dynamics of players influencing each other. Dotted lines point to logical expressions for relations.

3.1 Proposed methodology

This subsection has been further subdivided into four parts. The first part introduces a formal description of the game and deals with general definitions which are used in the three subsequent subsections to answer the research question. Section 3.1.2 addresses the first subquestion: How can opponents be modelled by considering a population of players? The two remaining sections focus on the second subquestion: How can these opponent models be used to make decisions based on publicly available information?

3.1.1 General definitions

A set of definitions and variables is introduced below. The dependencies between these variables are shown as a graphical model in figure 1.

A poker game will be denoted by ω_h and is defined as a set consisting of the following elements:

• An integer ν_ω denoting the value of the big blind.

Players in a later position at the table have access to more information (the actions of earlier-positioned opponents), which influences the action taken from such a position (seat).

The player’s stack is divided by the number of big blinds in order to preserve the ability to shift between, and take into account, different stakes.

• An ordered action sequence A_ω of I actions ⟨α_1, ..., α_I⟩. During a game, this action sequence can be expanded by an action α_i ∈ ϕ(ω_h) performed by a player p_n = {{γ, ζ, η}, β}, p_n ∈ B_ω. A poker situation or situation is referred to by a game’s action sequence. When referring to a similar situation, an identical action sequence is assumed, regardless of the size of x when α_i = raise x.

– x denotes the value of the total bet, including the amount before the bet’s increment. In this paper, raise sizes are also denoted as a percentage of the total bet with respect to the pot. For example: if the pot contains $0.15 and a player raises to $0.25, this raise is denoted as a 166.67%-raise.

In general, the following are defined:

• A function ϕ(ω_h) returning the set of currently legal actions a ⊂ {fold, call, check, raise x}, where x is a discrete amount between ν_ω and γ_β.

• A function ψ(ω_h) returning a one-element set (or the empty set) containing the player p_n = {{γ, ζ, η}, β} who is to generate the next action α_{i+1}. The ordered sequence of all acting players is denoted by Ψ(ω_h), and we denote the i-th acting player p_i by Ψ_i.

• A set B_ω = {β_1, ..., β_N} ⊆ O of N players and a nested set B_ω = {p_1, ..., p_n} = {{γ, ζ, η}} × B, where γ_β denotes the player’s current stack size divided by ν_ω; ζ_β denotes the relative position at the table; and η_β1, η_β2 ∈ {2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A} × {♠, ♡, ♣, ♢}, η_β1 ≠ η_β2, denotes the player’s hole cards.

Since players can get raised, Ψ(ω_h) might contain the same β-element multiple times.

When referring to a player’s hole cards, a two-letter notation (such as AK and T9 for variants of A♣K♠ and 10♣9♣) is often used, which may include an additional o or s (see appendix B under ‘suited’). A ‘T’ refers to a ten (10).

• A set Ω = {ω_1, ..., ω_H} of H previously completed games. Each completed poker game ω_h = {A_ω, B_ω, ν_ω} is inferred by parsing hand histories.

• A variable τ representing the factor of time. A population changes its playing style over time, since opponents are influenced by each other and by the increasing availability of outlines on strategy and the mathematics of the game16,17.

• A set O of all possible opponents, which is referred to as the population. Every possible opponent is considered as having strategy Υ(o_m) = s_m. Further defined is a set O = {o_1, ..., o_M ∣ o_m ∈ B_ω_h ∧ ω_h ∈ Ω} of all previously encountered opponents.

3.1.2 Opponent modelling

One of the unknown variables defined in figure 1 is the strategy s_β, or playing style, of an opponent. In accordance with section 3.1.1, it is assumed that a player’s strategy depends on the previous games in which the player participated. This relation is illustrated in figure 1. The agent will emulate and enhance this behaviour by additionally taking into account games in which the agent did not participate.

A form of representing poker situations is needed in order to compare similar game situations. In the next section, a multi-dimensional space will be introduced in which vectors live whose components consist of values describing the characteristics of these situations. The described opponent models of each player participating in the hand serve as some of the components of these vectors. Since only the pre-flop part of the game is considered, Equation 1 and Equation 2 are introduced for describing sβ. These two metrics will be used solely to characterize each player and to identify similar players, by comparing these metrics in a geometric space.

$$\mathrm{VPIP}(\beta) = \frac{\mathrm{count}(\{\omega \mid \omega \in \Omega \wedge \beta \in B_\omega \wedge \exists \alpha_i(\alpha_i \in A_\omega \wedge \alpha_i \in \{\mathit{call}, \mathit{raise}\ x\} \wedge \beta \in \Psi_i(\omega))\})}{\mathrm{count}(\{\omega \mid \omega \in \Omega \wedge \beta \in B_\omega\})} \cdot 100 \tag{1}$$

$$\mathrm{PFR}(\beta) = \frac{\mathrm{count}(\{\omega \mid \omega \in \Omega \wedge \beta \in B_\omega \wedge \exists \alpha_i(\alpha_i \in A_\omega \wedge \alpha_i = \mathit{raise}\ x \wedge \beta \in \Psi_i(\omega))\})}{\mathrm{count}(\{\omega \mid \omega \in \Omega \wedge \beta \in B_\omega\})} \cdot 100 \tag{2}$$

16 Reddit, How has NLHE strategy changed over the last 5-10 years?, https://www.reddit.com/r/poker/comments/2ojlig/how_has_nlhe_strategy_changed_over_the_last_5_10/, accessed on November 20th, 2019
17 TwoPlusTwo, Will Poker games continue to get tougher?, https://forumserver.twoplustwo.com/32/beginners-questions/will-poker-games-continue-get-tougher-1435373/, accessed on November 20th, 2019


Equation 1 formalizes the concept of a VPIP-value – the percentage of hands a player voluntarily puts money into a pot when presented with the opportunity of doing so. This value is an indicator of the tightness or looseness of a player, i.e. how many hands a player likes to play. Equation 2 is similar to the preceding equation, except that in this equation only raises count towards the percentage. The resulting value is the player’s pre-flop raise- or PFR-value and indicates the aggressiveness or passiveness of a player pre-flop.
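
As a concrete reading of equations 1 and 2, the sketch below computes frequentist VPIP- and PFR-values from parsed hands. The hand representation (a list of (player, action) tuples per pre-flop betting round) is an assumption made purely for illustration and does not correspond to any particular hand-history format.

```python
def vpip_pfr(hands, player):
    """Frequentist VPIP and PFR for `player`, as in equations 1 and 2.
    `hands` is a list of hands; each hand is a list of (player, action)
    tuples for the pre-flop betting round, where action is 'fold',
    'call', 'check' or 'raise'."""
    played = voluntary = raised = 0
    for hand in hands:
        actions = [a for p, a in hand if p == player]
        if not actions:
            continue                      # player did not take part in this hand
        played += 1
        voluntary += any(a in ("call", "raise") for a in actions)
        raised += any(a == "raise" for a in actions)
    if played == 0:
        return None, None
    return 100.0 * voluntary / played, 100.0 * raised / played

# Example: a single toy hand in which 'hero' open-raises and 'villain' folds.
hands = [[("hero", "raise"), ("villain", "fold")]]
print(vpip_pfr(hands, "hero"))    # (100.0, 100.0)
```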

In agreement with commonly held beliefs in the poker community, different groups of poker players exist, distinguished by their playing characteristics. The plots in figure 2 hint at the existence of at least three of the ‘classical’ groups of players which are believed to significantly populate the overall player population: tight-aggressive, tight-passive and loose-passive players. For a minimum of 1,000 hand observations per player, the Mean Shift procedure (Fukunaga and Hostetler, 1975) seems to affirm this intuition, as shown in figure 3a. Thorndike’s famous elbow method (Thorndike, 1953) also hints at the existence of three clusters when a clustering range of 1 to 10 is specified. This is visualized together with the respective k-means clustering (with k = 3) in figures 3b and 3c (Lloyd, 1982). These findings confirm the usefulness of characterizing players by the two introduced metrics, as they convey sufficient information to classify a playing style.
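
A sketch of the clustering step described above, assuming the per-player (VPIP, PFR/VPIP) pairs are available as a NumPy array; scikit-learn’s MeanShift and KMeans are used here as stand-ins, since the thesis does not prescribe a particular library, and the randomly generated data only stands in for the real sample.

```python
import numpy as np
from sklearn.cluster import KMeans, MeanShift

# X: one row per player with >= 1,000 observed hands, columns (VPIP, PFR/VPIP).
# Random data stands in for the real dataset.
rng = np.random.default_rng(0)
X = np.clip(rng.normal(loc=(25, 70), scale=(10, 15), size=(500, 2)), 0, 100)

detected = MeanShift().fit(X)            # the number of clusters is inferred
print("Mean Shift clusters:", len(np.unique(detected.labels_)))

# Elbow method: inspect the distortion (inertia) for k = 1..9.
for k in range(1, 10):
    print(k, KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_)
```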

As there are often too few observations (or no observations at all) on opponents yet to be encountered, a frequentist approach to calculating an individual opponent’s VPIP and PFR values is often inaccurate. Nevertheless, many profitable online poker players use software based on frequentist analyses to keep track of these statistics18,19,20. Although some players opt to

18 BeatingBetting, How Valuable Are Poker HUDs?, https://www.beatingbetting.co.uk/poker/best-poker-hud/#How_Valuable_Are_Poker_HUDs, accessed on February 29th, 2020
19 CardsChat, How necessary is a HUD?, https://www.cardschat.com/f61/how-necessary-a-hud-234552/, accessed on February 29th, 2020
20 TwoPlusTwo, What % of players use a HUD?, https://forumserver.twoplustwo.com/32/beginners-questions/what-players-use-hud-962957/, accessed on February 29th, 2020

Figure 2: Visualisation of VPIP- and PFR/VPIP-values for players in the dataset. (a) Sample of 3,821 players (≥ 1,000 hands); (b) sample of 20,989 players (≥ 100 hands).


tweak their software21,22; the only known software in the field to have directly implemented an alternative acknowledges the shortcomings of a frequentist approach, but does not implement a full solution such as described below23. To solve the problem of inaccurate statistics, players using this kind of software generally wait until their statistics on players start to converge24,25,26, missing valuable information in the meantime.

Instead, in this paper these metrics are proposed to be estimated using the population as prior knowledge. By replacing the introduced PFR-metric with a derived metric, abbreviated as PFR/VPIP and defined as $\frac{\mathrm{PFR}}{\mathrm{VPIP}} \cdot 100$, both metrics can be interpreted as probabilities27. Since the population can be modelled as a distribution of these player characteristics, the population essentially becomes a distribution of probabilities. As the beta distribution28 is a continuous probability distribution defined on [0,1], it is a probability distribution of probabilities. This makes it perfectly suitable as the Bayesian prior29. Since the player characteristics are probabilities themselves, the Bayesian likelihood is Bernoulli distributed30. The theory of conjugate priors can be used to construct a player’s own beta distribution

21 TwoPlusTwo, Help - Using PT4 to analyse population tendencies, https://forumserver.twoplustwo.com/185/heads-up-sng-spin-gos/help-using-pt4-analyse-population-tendencies-1208561/, accessed on March 1st, 2020
22 TwoPlusTwo, https://forumserver.twoplustwo.com/15/poker-theory/wilson-score-interval-1023431/, accessed on March 1st, 2020
23 Poker Copilot, User Guide - Statistics or probabilities?, https://pokercopilot.com/userguide/6/en/topic/statistics-or-probabilities, accessed on February 29th, 2020
24 Smart Poker Study, HUD Reliability: Number of Hands and Sample Sizes, https://www.smartpokerstudy.com/hud-reliability-number-of-hands-and-sample-sizes-226, accessed on March 1st, 2020
25 PokerStars School, HUD Stats - You’re Doing it Wrong!, https://www.pokerstarsschool.com/strategies/hud-stats-doing-it-wrong/668/, accessed on March 1st, 2020
26 TwoPlusTwo, How Do I Use HUD Stats With Small Samples?, https://forumserver.twoplustwo.com/32/beginners-questions/how-do-i-use-hud-stats-small-samples-1551067/, accessed on March 1st, 2020
27 If we had not divided the value of a player’s PFR by their VPIP-value, their PFR-value would always be capped at their VPIP-value.
28 The probability density function of a beta distribution is $f(x; \alpha, \beta) = \frac{1}{B(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1}$, where $0 \leq x \leq 1$ and $\alpha, \beta > 0$.

over a probability (in this case a player’s VPIP or PFR/VPIP value), as shown in equations 3 to 6 (Raïffa and Schlaifer 1961; Diaconis and Ylvisaker 1979, p. 274). Here, s denotes the number of successes (e.g. a player voluntarily putting money into a pot), while f = n − s denotes the number of failures, where n is the total number of observations on a player. As P(θ) is beta distributed and P(X|θ) is binomially distributed, P(θ|X) is also beta distributed. By integrating the product of their probability mass and density functions, the probability density function of another beta distribution, with parameters α + s and β + f, is derived. This is the player-specific distribution based on observations on the analysed player. Note that α, β and σ in this section refer to parameters of distributions rather than the definitions in section 3.1.1.

$$\overbrace{P(\theta \mid X)}^{\text{player specific}} = \frac{P(X \mid \theta) \cdot \overbrace{P(\theta)}^{\text{population}}}{P(X)} = \frac{P(X \mid \theta) \cdot P(\theta)}{\int P(X \mid \theta) \cdot P(\theta)\, d\theta} \tag{3}$$

$$= \frac{\binom{n}{s} \theta^{s} (1-\theta)^{f} \cdot \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}}{\int \binom{n}{s} \theta^{s} (1-\theta)^{f} \cdot \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}\, d\theta} \tag{4}$$

$$= \frac{\binom{n}{s} \frac{\theta^{s+\alpha-1}(1-\theta)^{f+\beta-1}}{B(\alpha,\beta)}}{\int \binom{n}{s} \frac{\theta^{s+\alpha-1}(1-\theta)^{f+\beta-1}}{B(\alpha,\beta)}\, d\theta} \tag{5}$$

$$= \frac{\theta^{s+\alpha-1}(1-\theta)^{f+\beta-1}}{B(s+\alpha, f+\beta)} = \mathrm{Beta}(\alpha+s, \beta+f) \tag{6}$$

29 Bayes’ theorem defines a probability by taking prior knowledge into account. The probability of an event A given evidence B is given by $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$, where $P(B \mid A)$ conveys the likelihood of evidence B given that A indeed occurred, and $P(A)$ conveys the prior probability of A occurring at all.
30 The probability density function of a Bernoulli distribution is $f(k; p) = p^{k}(1-p)^{1-k}$, where $k \in \{0, 1\}$ and $0 \leq p \leq 1$. Here, p is the parameter of the Bernoulli distribution.
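
The net effect of equations 3 to 6 is the simple conjugate update sketched below: a population prior Beta(α, β) combined with s successes and f failures yields the player-specific posterior Beta(α + s, β + f). The prior values in the example are invented for illustration only.

```python
def beta_posterior(alpha, beta, successes, failures):
    """Conjugate beta-binomial update (equation 6): the posterior over a
    player's metric is Beta(alpha + s, beta + f)."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Expected value of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical population prior for VPIP and 20 observed hands,
# 4 of which the player voluntarily played.
a, b = beta_posterior(alpha=8.0, beta=22.0, successes=4, failures=16)
print(beta_mean(a, b))    # posterior expectation, pulled towards the prior
```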


Figure 3: Clusters of player characteristics. (a) Mean Shift clustering (3 clusters detected); (b) distortion score elbow for k-means clustering (k = 3); (c) k-means clustering of figure 2a (k = 3).

A subsequent problem arises: the majority of players in a population are often players on which there are very few observations. In the dataset used in this paper, 67.02% of the players have fewer than 100 observations. Attempting to improve the level of certainty of a dataset by leaving out all players with a number of observations below a certain threshold results in a bias. As pointed out in section 2 of this thesis, some well-received papers on this subject still incorrectly implement this process. The reason for the biased Bayesian prior is straightforward: professional players can be assumed to play differently than ‘recreational players,’ but professional players also play more often and thus have a higher chance of ending up (more frequently) in the dataset of observations on players.

By letting n denote the total number of played hands for a player in the dataset, I proceed to verify the statement above by calculating the mean VPIP and PFR/VPIP for all players for every possible n. So, the means for n = 55 are the averaged metrics of every player who participated in a total of 55 hands. To indicate convergence of a metric, an uncertainty value based on the Wilson score interval31 is introduced and shown in equation 7 (Wilson, 1927). This uncertainty value is plotted against a logarithmic scale of n in figure 4. By performing both weighted and unweighted linear regression, the figure illustrates a correlation between the VPIP and PFR/VPIP metrics and the number of observations on players. These graphs comply with a generally known idea among the poker community that recreational players (or fish in poker jargon) often play more hands and show less aggression32,33.

$$\mathrm{Uncertainty} = 2 \cdot \left( \hat{p} - \frac{\hat{p} + \frac{z^2}{2n} - z \sqrt{\frac{\hat{p}(1-\hat{p}) + \frac{z^2}{4n}}{n}}}{1 + \frac{z^2}{n}} \right) \tag{7}$$

31 The Wilson score interval is a method for constructing binomial proportion confidence intervals.

32 BlackRain79, What Does VPIP Mean? An Extremely Simple Explanation, https://www.blackrain79.com/2016/07/what-is-vpip.html, accessed on December 7th, 2019
33 PokerVIP, Tips for Identifying the Recreational Players, https://www.pokervip.com/strategy-articles/texas-hold-em-no-limit-beginner/tips-for-identifying-the-recreational-players, accessed on December 7th, 2019
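
A small sketch of the uncertainty measure of equation 7: twice the distance between the observed proportion and the lower bound of the Wilson score interval. The function name and the example values are illustrative, and z is left as a parameter.

```python
from math import sqrt

def wilson_uncertainty(p_hat, n, z):
    """Uncertainty as in equation 7: twice the distance between the observed
    proportion p_hat and the lower bound of the Wilson score interval for
    n observations and critical value z."""
    lower = (p_hat + z ** 2 / (2 * n)
             - z * sqrt((p_hat * (1 - p_hat) + z ** 2 / (4 * n)) / n)) / (1 + z ** 2 / n)
    return 2 * (p_hat - lower)

# Example: the same observed proportion becomes less uncertain as n grows.
for n in (10, 100, 1000):
    print(n, round(wilson_uncertainty(0.257, n, z=1.96), 3))
```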


In equations 3 to 6, a new beta distribution is introduced of which the parameters depend not only on observations on the population as a whole, but also on player-specific observations. As information on specific players gradually accumulates, information on the population becomes less relevant for the estimation of player-specific statistics. The more we observe a player’s play, the more we look at that player’s playing history instead of that of the population. The less information we have on a player, the more we use the population to fill the gap. If we totally lack information on an opponent (if we have never observed that particular opponent), the population is the only data we look at. In order to deal with the problem related to uncertainty, as explained above and illustrated in figure 4, we let the original parameters (α, β) depend on ln(n). This way, a different pair of (α, β) can be obtained for every possible n. This pair of parameters (α, β), which now only depends on the number of hands played, can then be updated in accordance with equation 6 as player-specific observations come through.

In equation 8, µ is defined as the inverse logit function (or expit function)34 of a linear function of ln(n). The parameters α and β can now be defined with regard to a mean and a dispersion value (Pananos, 2020). By defining σ as the dispersion value, α and β are written in terms of µ and σ in equation 9. By considering the probability density function of the beta distribution, a, b and σ can be estimated using maximum likelihood estimation35 (Robinson, 2017, p. 56-64). The effect is illustrated in figure 5, which plots the beta distributions for n equal to four successive powers of ten. As expected, the distribution shifts towards a lower value for a player’s VPIP as the number of played hands increases. Finally, the expected value for a metric estimated using this method is given in equation 10.

$$\mu = \frac{1}{1 + e^{-(a + b \cdot \ln(n))}} \tag{8}$$

$$\Rightarrow \alpha = \mu \cdot \sigma, \qquad \beta = (1 - \mu) \cdot \sigma \tag{9}$$

$$E[\mathrm{Beta}(\alpha, \beta)] = \frac{\alpha}{\alpha + \beta} \tag{10}$$

34 The logit function maps probabilities (values ranging from 0 to 1) to the whole real line. The inverse function, which we use in this thesis, does the opposite: it maps any real value to a probability.
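
The sketch below ties equations 8 to 10 together: hypothetical regression coefficients a, b and a dispersion σ (which the thesis estimates from the population by maximum likelihood; the values below are placeholders, not fitted values) define an n-dependent population prior, which is then updated with player-specific observations as in equation 6.

```python
from math import exp, log

def population_prior(n, a, b, sigma):
    """Equations 8 and 9: an n-dependent Beta prior for a player metric.
    mu is the expit of a linear function of ln(n); sigma is the dispersion."""
    mu = 1.0 / (1.0 + exp(-(a + b * log(n))))
    return mu * sigma, (1.0 - mu) * sigma            # (alpha, beta)

def estimate_metric(n, successes, a, b, sigma):
    """Empirical-Bayes estimate of e.g. VPIP: prior from the population,
    updated with the player's own n observations (equations 6 and 10)."""
    alpha, beta = population_prior(n, a, b, sigma)
    alpha, beta = alpha + successes, beta + (n - successes)
    return alpha / (alpha + beta)

# Placeholder coefficients (not the fitted values from the thesis):
a, b, sigma = -0.9, -0.05, 40.0
# A player seen for 20 hands who voluntarily played 12 of them:
print(estimate_metric(n=20, successes=12, a=a, b=b, sigma=sigma))
```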

Figure 4: Relation between number of observations and player metrics. Both panels plot a metric (as a probability) against ln(n) for 0 < n ≤ 5,000 (3,821 players), together with the uncertainty of equation 7 and both an unweighted and a player-count-weighted linear trend. (a) Average VPIP values among players seem to decrease as certainty increases; p̂ = 0.257 is chosen as it is the weighted average VPIP value over the whole sample. (b) Average PFR/VPIP values among players seem to increase as certainty increases, partly due to the decreased VPIP values shown in subfigure (a); p̂ = 0.720 is chosen as it is the weighted average PFR/VPIP value over the whole sample.

Figure 5: Non-player-specific beta distributions resulting from performing beta-binomial regression on the number of hands played by a player; densities of VPIP are shown for n = 10⁰, 10¹, 10², and 10³ hands.

3.1.3 Situational space

The VPIP and PFR/VPIP values introduced in the previous section can be directly used to set up a strategy against opponents by assuming a player’s hole card range to consist of the top portion of hands sampled from the distribution of all starting hands, where the player’s metrics determine the size of this portion36. As it is not impossible for a player to hold a hand outside of this implied range (for example, players can have a liking for a certain type of hand), I opt to diverge from this idea. More importantly, players can play different cards in different ways – a player could, for example, decide on a more conservative play while holding a weaker hand (e.g. η = 9♢8♢), and bet a lower amount compared to when the same player would have been dealt a stronger hand (e.g. K♠K♣). This kind of behaviour among opponents can be profitably exploited by taking a more flexible approach, such as the method I propose in this thesis.

35 In maximum likelihood estimation (or MLE) the most likely parameters for a probability distribution are estimated, either by differentiating or by iteratively trying different parameters.
36 The Pokerbank, How to use VPIP in poker, https://www.thepokerbank.com/articles/software/vpip/, accessed on March 8th, 2020

The idea of a ‘situational space’ is introduced, in which a player’s expected value for the VPIP and PFR/VPIP metrics are merely used as ‘input values’ for constructing this geometric space. For every poker situation, a multi-dimensional space is generated in which vectors live whose components represent ‘characteristics’ of the situation. Each vector, directing to a point in this space, refers to an identical historical situation (as explained in section 3.1.1, a poker situation is described to be identical if the same action sequence occurred) drawn from the dataset. Every player taking part in the situation (in other words: players who did not fold at the point of evaluation) produces a part of the attributes of the situation, captured by the components of the vectors living in the corresponding situational space. The following values are described for each active player:

1. the player’s expected VPIP-value, estimated by the method presented in section 3.1.2 and equation 10;
2. the player’s expected PFR/VPIP-value, estimated by the method presented in section 3.1.2 and equation 10;
3. the player’s stack size at the beginning of the hand;
4. whether the player was all-in (either 0 or 1);
5. the size of the raise made by the player (if applicable) as a percentage with respect to the pot size;
6. the size of a second raise (if applicable) as a percentage with respect to the pot size.
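
A minimal sketch of how a situation vector could be assembled from the values listed above; the component ordering and the data structure are assumptions made for illustration. With six active players contributing four values each, plus one raise size, the example discussed next indeed spans R^25.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ActivePlayer:
    vpip: float          # expected VPIP (section 3.1.2, equation 10)
    pfr_vpip: float      # expected PFR/VPIP (section 3.1.2, equation 10)
    stack_bb: float      # stack at the start of the hand, in big blinds
    all_in: bool         # whether the player is all-in
    raise_pct: Optional[float] = None     # first raise, as % of the pot
    reraise_pct: Optional[float] = None   # second raise, as % of the pot

def situation_vector(players: List[ActivePlayer]) -> List[float]:
    """Concatenate the per-player characteristics into one situation vector.
    Raise-size components are only present for players who actually raised,
    so identical action sequences always yield vectors of the same length."""
    v: List[float] = []
    for p in players:
        v += [p.vpip, p.pfr_vpip, p.stack_bb, float(p.all_in)]
        v += [x for x in (p.raise_pct, p.reraise_pct) if x is not None]
    return v

# Six players, one of whom open-raised to 300% of the pot: a vector in R^25.
players = [ActivePlayer(0.25, 0.7, 100, False, raise_pct=300.0)] + \
          [ActivePlayer(0.3, 0.5, 100, False) for _ in range(5)]
print(len(situation_vector(players)))    # 25
```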

Each vector has an additional corresponding ‘label.’ This label captures the hole cards of the analysed player (if known), the action of the analysed player, and the size of the raise made by the analysed player (if applicable). An example could be the situation in which the player in the first position raised. If focussing on the player directly due to act after the raising player, the constructed situational space would span R^25, since six players participate, of whom one specifies a raise size (in other words, a 25-dimensional space would be generated). The similarity of these situations can be measured by considering the distance between the points defined by these vectors. Building on the previous example, if facing a raise from the first position and analysing the player in the second position, all identical historical situations in the dataset are used to generate vectors with labels that


convey information about the successive action taken by the player in the second position. If we already know that the player in the second position took a certain action (e.g. called the bet raised by the player in the first position), we can consider all vectors with a label corresponding to that taken action (in this example we would only draw vectors reflecting historical situations in which the player in the second position called after the player in the first position raised) and suggest a hole card range for the currently encountered opponent. Since the vectors geometrically encompass the characteristics of historical situations with the same action sequence, the closer two points are in space, the more similar the situations they represent. Weight is given to the actually held hole cards with respect to the distance between the point representing the historical situation in which these hole cards are known to be held, and the point representing the current situation in which the opponent’s hole cards are still unknown. This example is illustrated in figure 6 below.

Figure 6: Situational space for a faced raise from first position. The hole card range for the faced raiser is approximated by using the weighted Minkowski distance for comparing with past events. The 1st, 2nd and 5th components of the normalized vectors (VPIP of the raiser, PFR/VPIP of the raiser, and raise size as % of the pot) are shown. For illustrative purposes, only 4 manually selected vectors out of 536,951 (of which 79,578 with known hole cards) are shown, e.g. [.48, .50, .03] labelled K♣6♢ and [.82, .31, .31] labelled A♠K♡. The coloured vector [.85, .61, .46] represents the ‘live situation’ for which hole cards should be predicted.

If a player is due to act after the agent, the labels can be used to predict a future action by the opponent. Elaborating on the previous example, if the agent were seated in the first position and had yet to decide on which action to take, these situational spaces can be constructed as if the agent had raised, in order to determine (1) the likelihood of the player in the second position calling the raise, and (2) the estimated hole card range with which this player would call. As will be explained in section 3.1.4, in order to make a decision based on predictions of every participating opponent’s actions, raise sizes and hole card ranges, the agent will generate situational spaces for all actions for every likely action sequence.

Considering that not all of a situation’s characteristics are of equal importance, the weights for each individual vector component (corresponding to a dimension in the space) should be estimated. For example, the player facing a raise from the first position is probably more interested in the VPIP-value of the raiser than in the VPIP-value of the person to act next. These weights cannot be obtained by solving linear systems of equations, as we lack knowledge of any distance (or similarity) between two given situations. In order to solve this problem, two ‘substitutes’ for distance values are introduced. These serve as ‘outcome values,’ with which the similarity (or distance) between two situations can be roughly approximated:

• Historical situations where the hole cards of the player for which the situational space is analysed are known are grouped together, according to Sklansky’s hand groups. These hand groups, which have been briefly mentioned earlier, embody nine ranges of hands in decreasing order of hand strength. These groups are used when estimating weights for the purpose of determining hole cards and raise sizes. The groups are divided as follows (Sklansky and Malmuth, 1999, p. 14-15):

Group 1: AA, AKs, KK, QQ, JJ
Group 2: AKo, AQs, AJs, KQs, TT
Group 3: AQo, ATs, KJs, QJs, JTs, 99
Group 4: AJo, KQo, KTs, QTs, J9s, T9s, 98s, 88
Group 5: A9s-A2s, KJo, QJo, JTo, Q9s, T8s, 97s, 87s, 77, 76s, 66
Group 7: K9s-K2s, J9o, T9o, 98o, 64s, 53s, 44, 43s, 33, 22
Group 8: A9o, K9o, Q9o, J8o, J7s, T8o, 96s, 87o, 85s, 76o, 74s, 65o, 54o, 42s, 32s
Group 9: All other hands

• Categories grouped by the actual action taken are used when determining weights for the purpose of predicting an opponent’s future action: (1) check/fold, (2) call, (3) raise x.

Each group now contains vectors representing historical situations (note that a vector can now occur both in one of the groups categorized by the Sklansky hand groups listed above, and in one of the groups divided by the action taken). Now, we generate a matrix of which every row represents a vector in the group. This matrix is then ‘divided’ by a vector holding the maximum encountered values for the considered components (defined as taking the element-wise division of every value in every row of the matrix by the corresponding value of a denominator vector). In other words: the vector components are normalized by dividing their individual values by the outcome of a max()-function executed column-wise37. If for a component the maximum encountered value equals 0, the maximum value is set to 1 to avoid division by zero. For every group, the means are taken for each component over the normalized values. The variance of these mean values corresponding to the same component among groups is then taken as the weight for that component in that situation. This idea rests on the intuition that the more a value varies among different groups, the more that value helps to determine to which group a vector belongs. This is summarized in equations 11 and 12. In the last step, all weights are normalized so that $\sum_{i} w_i = 1$, where i in $c^{k}_{ji}$ corresponds to the number of dimensions in the situational space, k to the number of groups, and j to the number of vectors within group k.

$$w_i = \frac{1}{k} \sum_{k=1}^{k} \Bigg( \bigg| \underbrace{\Big(\frac{1}{j} \sum_{j=1}^{j} c^{k}_{ji}\Big)}_{\text{mean within group}} - \underbrace{\Big(\frac{1}{k} \sum_{k=1}^{k} \Big(\frac{1}{j} \sum_{j=1}^{j} c^{k}_{ji}\Big)\Big)}_{\text{mean of means within groups}} \bigg|^{2} \Bigg) \tag{11}$$

$$\Rightarrow w_i = \frac{w_i - \min w}{\sum_{i=1} (w_i - \min w)} \tag{12}$$

37 NumPy v1.17 manual, numpy.ndarray.max, Return the maximum along a given axis, https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.max.html, accessed on March 9th, 2020
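
A compact sketch of the weight estimation of equations 11 and 12, assuming the grouped situation vectors are given as plain nested lists of non-negative values. The text leaves open whether the column maxima are taken per group or over all groups; the sketch uses global maxima, so it should be read as one possible interpretation rather than the thesis implementation.

```python
def component_weights(groups):
    """Estimate per-component weights (equations 11 and 12).
    `groups` is a list of groups, each a list of situation vectors
    (all vectors have the same length, values are non-negative)."""
    dims = len(groups[0][0])
    # column-wise maxima over all vectors, used to normalize components to [0, 1]
    maxima = [max(max(vec[i] for vec in grp) for grp in groups) or 1.0
              for i in range(dims)]
    # per-group mean of each normalized component
    group_means = [[sum(vec[i] / maxima[i] for vec in grp) / len(grp)
                    for i in range(dims)] for grp in groups]
    overall = [sum(gm[i] for gm in group_means) / len(group_means) for i in range(dims)]
    # raw weight: variance of the group means around the mean of means (eq. 11)
    raw = [sum((gm[i] - overall[i]) ** 2 for gm in group_means) / len(group_means)
           for i in range(dims)]
    # shift and rescale so the weights sum to one (eq. 12)
    lo = min(raw)
    total = sum(r - lo for r in raw) or 1.0
    return [(r - lo) / total for r in raw]

# Two toy groups differing mainly in their first component:
print(component_weights([[[0.9, 0.5], [0.8, 0.4]], [[0.1, 0.5], [0.2, 0.6]]]))
```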

The similarity of two vectors (representing situations) using these weights is then calculated by dividing by the weighted Minkowski distance38, as shown in equation 13. The sum of all the similarities for a specific element (e.g. A♡K♡ (hand); call (action); or 133.33 (raise size)) is divided by the sum of all similarities for elements of the same type, to obtain the weight of that element being relevant to the analysed opponent (e.g. holding A♡K♡; calling; or, when raising, raising with 133.33).

$$S(a, b) = \frac{1}{1 + \sum_{i} \mid w_i \cdot (a_i - b_i) \mid} \tag{13}$$
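
The sketch below applies equation 13 and the aggregation described above: every labelled historical vector contributes its similarity to the live situation as a vote for its label (a hole card combination, an action, or a raise size), and the votes are normalized into a weighted distribution. The data structures and example numbers are illustrative.

```python
from collections import defaultdict

def similarity(a, b, weights):
    """Equation 13: similarity as the inverse of a weighted L1 (Minkowski, p=1)
    distance between two normalized situation vectors."""
    return 1.0 / (1.0 + sum(w * abs(x - y) for w, x, y in zip(weights, a, b)))

def label_distribution(live, history, weights):
    """Turn labelled historical situations into a weighted distribution.
    `history` is a list of (vector, label) pairs, e.g. label = 'AhKh' or 'call'."""
    votes = defaultdict(float)
    for vec, label in history:
        votes[label] += similarity(live, vec, weights)
    total = sum(votes.values())
    return {label: v / total for label, v in votes.items()}

# Toy example with the three components shown in figure 6:
weights = [0.5, 0.3, 0.2]
history = [([0.48, 0.50, 0.03], "Kc6d"),
           ([0.18, 0.74, 0.09], "8h7c"),
           ([0.82, 0.31, 0.31], "AsKh")]
print(label_distribution([0.85, 0.61, 0.46], history, weights))
```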

The proposed method in this paper is a combination of two main ideas: empirical Bayesian statistics and vector spaces. As explained in section 2, this approach is chosen to diminish the negative effects of having too little data available. This effect is further reduced by not only considering situations at the same point of evaluation. In other words: based on the example used above, if we are constructing the vector space for the raiser, we would also include situations in which the player in the second position folded or re-raised, instead of only considering situations in which that player called the faced raise. If a player participated in many hands, the player will often have contributed to many different points in a vector space generated for an often-occurring situation. But as players are characterised by their VPIP and PFR/VPIP-values, which serve as components of the

38 The Minkowski distance is a generalization of the Euclidean distance (a straight line between two points) and the ‘taxicab distance’ (or ‘Manhattan distance’ – a stepwise distance calculated by summing the absolute differences of discrete coordinates), used in normalized vector spaces.


vectors used to compare situations, points reflecting situations in which the same player played in the same position will automatically be positioned close to each other, as their VPIP and PFR/VPIP-values are identical. This is compatible with ideas such as that a player’s range often widens in later positions39, and that holdings are usually stronger when a player raises from an early position compared to when that player is seated in a later position40. The more observations obtained on a specific player, the more significant the historical actions of that particular player, and the more heavily these are weighted in predicting future actions for that player. This can be seen as a geometric adaptation of prior information in a Bayesian model. Furthermore, this method allows for patterns in an opponent’s playing style to be recognized and exploited. Also, since all players are taken into account in the situational vectors, this model supports the idea that opponents adapt to each other at the table, instead of only the agent adapting to the opponents. Another advantage of this technique is the possibility of caching the matrices (together with their corresponding weights) in which the vectors are contained, in order to speed up future computation. In addition, the values of the matrix are explainable, which makes it easier to understand why the algorithm would decide on a particular move.

3.1.4 Decision trees

In order to explain the final concept which connects all previously mentioned techniques, a distinction is made between two types of player action sequences:

• Action that happened before the agent is due to act. For the opponents involved in this type of action only the hole card ranges will be estimated using the vector spaces discussed in section 3.1.3.

• Action that happens after the agent is due to act. For these opponents, the action and the hole card range corresponding to that action are predicted using vector spaces. If the action is a raise, the raising size is also predicted. If a raise is made by a player not seated in

39 CardsChat, Your Guide To Pre-Flop Calling Ranges, https://www.cardschat.com/preflop-calling-hand-ranges.php, accessed on March 9th, 2020
40 PartyPoker, How To Play Poker In ‘Early’ Position, https://www.partypoker.com/en/how-to-play/school/advanced/early-position, accessed on March 10th, 2020

the first position, players who did not fold before that raise are due to act more than once. This means that the implemented algorithm should be recursive, and causes the previously listed type of action sequence to be nested within this type as well.

When presented with a situation in which the agent has to make a decision, a tree of all possible action sequences and corresponding vector spaces will be generated. As the raise option could theoretically cause the tree to be infinite, the number of allowed raises is ideally limited. Having observed too few instances of a situation results in inadequate estimations, although to a lesser degree than when exact situations (including raise sizes and such) are compared with each other. This problem is inevitable, but can be mitigated by leaving such situations out of the equation. As this only applies to situations with a low number of observations, these situations are inherently statistically improbable. Table 3 shows the degree of relevance of this phenomenon to the dataset used in this thesis. As cutting out situations could cause the probabilities of actions occurring to not add up to one, the probabilities should be normalized after a few more steps are introduced.

An example is considered wherein the agent is seated in the fourth position, the ‘BU’ or ‘button,’ and A = ⟨raise .20, fold, call⟩. If the maximum number of raises is set to 1, all possible action sequences are generated as shown in the list below. Bold text denotes the agent’s action – this style is used throughout the paper.

1. A = ⟨raise .20, fold, call, call, call, call⟩
2. A = ⟨raise .20, fold, call, call, call, fold⟩
3. A = ⟨raise .20, fold, call, call, fold, call⟩
4. A = ⟨raise .20, fold, call, call, fold, fold⟩
5. A = ⟨raise .20, fold, call, fold, call, call⟩
6. A = ⟨raise .20, fold, call, fold, call, fold⟩
7. A = ⟨raise .20, fold, call, fold, fold, call⟩
8. A = ⟨raise .20, fold, call, fold, fold, fold⟩
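
A minimal sketch of how the continuations listed above can be enumerated: starting from the observed prefix ⟨raise .20, fold, call⟩, each of the three remaining players to act may only call or fold, since the single allowed raise has already been used. The string representation of actions is an assumption for illustration.

```python
from itertools import product

def continuations(prefix, players_to_act):
    """Enumerate all action sequences extending `prefix` when each remaining
    player may only call or fold (the one allowed raise is already in the
    prefix, matching the example with the raise cap set to 1)."""
    return [list(prefix) + list(tail)
            for tail in product(("call", "fold"), repeat=players_to_act)]

prefix = ("raise .20", "fold", "call")
for seq in continuations(prefix, players_to_act=3):
    print(seq)    # yields the eight sequences listed above, in the same order
```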

The likelihood of every listed action sequence is calculated by consulting the similarity measures for the actions derived from the situational spaces. If an opponent raises after the agent called or raised, the agent is due to act multiple times. This needs to be handled in a


recursive manner. Let us consider the action sequence A = ⟨call, fold, fold, call₁, raise .20, fold, fold, call₂⟩. At the second point at which the agent is due to act (when the agent calls for the second time), the agent will assume the action sequence before call₂ as if it had already happened, and evaluate it as the first type of action sequence. At the first point, when the agent decides to call for the first time, the agent already needs to evaluate all future actions (that are likely to happen), including the agent’s own call after an opponent in a later position raised. As mentioned in the previous paragraph, this means that the first element in the list of types of action sequences is nested within the second type of action sequence.

Besides needing to estimate the probabilities of every likely action sequence, we also need to estimate the equity of the hand held by the agent versus the estimated hole card ranges of the opponents still in the pot. Since an opponent’s range is weighted, the equity of the agent’s hand for all combinations of all possible simultaneous hand holdings for every involved player is approximated using a Monte Carlo simulation, as explained in the introduction (Metropolis and Ulam, 1949, p. 335-341). All different card combinations of a hand need to be taken into account: for example, 12 combinations of AKo exist, while only 4 combinations of AKs are possible, and every pocket pair, such as AA, has 6 possible combinations of suits. Since the agent holds cards itself, these cards act as blockers and allow for the removal of some of these possible combinations from the opponents’ ranges – this can skew an opponent’s range significantly when the agent, for example, holds an ace (see section 3.3.2). A hand’s equity is equal to the probability of that hand winning the pot. When only two players are actively involved in a situation, ties have to be divided equally among the players. When multiple players are involved, more complicated situations occur. For example: when a player with the lowest stack wins the pot, all ‘losing players’ can tie between themselves and divide up the remaining pot. In order to overcome this complexity, in all simulations, all players with the exact same ‘made hand combination’ (as shown in appendix A) should be regarded as having ‘won’ the pot. The significance of the simulation should then be discounted by a factor depending on the number of players who did win.

The last ingredient is the expected final size of the pre-flop pot for a generated action sequence. The implementation should ensure that raise and call sizes are compatible with the acting player's stack size: players cannot bet 'more than they have.' As explained in section 3.1.3, raise sizes for opponents are estimated using vector spaces. However, due to the findings presented in section 3.3.1, the size with which the agent raises is determined by linear regression. The first step is to create 'buckets' of a fixed number of historical situations. These buckets correspond to the sizes of raises made by players on a specific seat in historical situations, defined as a percentage of the pot. For every raise size encountered in the dataset, the percentage of folds by players on another specific seat is calculated. These buckets serve as data points with which a linear function (of the form y = ax + b) describing the relation between a raise size and the number of expected folds can be obtained using ordinary least squares linear regression41. An example is illustrated in figure 7 below. The likelihood of an opponent calling or raising can be predicted in a similar manner. If a specific raise size occurs more often than the fixed limit, the bucket corresponding to that raise size will simply contain more situations than the set limit.

[Figure: scatter of bucketed data points (buckets of ≥ 1000 situations) with the fitted regressor (a ≈ 0.026, b ≈ 73.096); x-axis: size of raise as % of total pot size, y-axis: % of folds.]

Figure 7: Folds by a player in the fourth position, reflected against the size of a raise by a player in the first position.

41By minimizing the squared distance of the sampled points (in this case the percentage of folds in a bucket) to a line defined by two parameters a and b (as in y = ax + b), the most 'optimal' parameters (a, b) can be obtained; that is to say, the values of a and b that best fit the presented data.
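A minimal sketch of this bucketing-and-regression step, with numpy.polyfit performing the ordinary least squares fit, could look as follows. The grouping of equal raise sizes into one larger bucket is simplified here to sorting by size and chunking.

```python
import numpy as np

def fit_fold_line(observations, bucket_size=1000):
    """Fit y = a*x + b through bucketed (raise size, % of folds) points.

    observations -- (raise size as % of the pot, 1 if the target seat folded)
                    pairs taken from historical situations
    bucket_size  -- fixed (minimum) number of situations per bucket
    """
    observations = sorted(observations, key=lambda obs: obs[0])
    xs, ys = [], []
    for start in range(0, len(observations) - bucket_size + 1, bucket_size):
        bucket = observations[start:start + bucket_size]
        xs.append(np.mean([size for size, _ in bucket]))
        ys.append(100.0 * np.mean([folded for _, folded in bucket]))
    # Ordinary least squares fit of a line through the bucketed data points.
    a, b = np.polyfit(xs, ys, deg=1)
    return a, b

# The expected percentage of folds for a raise of, say, 300% of the pot
# is then simply a * 300 + b.
```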


By distinguishing between all possible actions for the agent when the agent is due to act for the first time, and by multiplying, for every generated likely action sequence, the final pre-flop pot size by the likelihood of that action sequence and by the equity of the agent's hole cards versus the estimated weighted ranges of all active opponents, the expected value of every possible action for the agent is obtained. The action yielding the highest estimated expected value is the action the agent chooses in the presented game situation. Action sequences in which the agent folds are neglected and do not need to be analysed, since their expected value is always zero. Since the hole card ranges of opponents remain practically fixed, as shown in section 3.3.2, the size with which the agent raises can be optimised by maximising the expected value of a particular action sequence involving a raise, using the agent's equity in combination with the estimated fold, raise and call parameters for all active opponents. The likelihoods of future actions by opponents are not adjusted by the likelihoods returned by the linear function.
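The decision rule in the paragraph above can be condensed into a short sketch; the dictionary keys are assumptions about how the quantities produced by the earlier steps are passed around.

```python
from collections import defaultdict

def choose_action(candidate_sequences):
    """Select the agent's first action with the highest estimated EV.

    candidate_sequences -- iterable of dicts with the keys
        'agent_action' : the agent's first action in the sequence
        'likelihood'   : estimated probability of this action sequence
        'equity'       : agent's equity versus the remaining weighted ranges
        'final_pot'    : expected final pre-flop pot size for this sequence
    Sequences in which the agent folds are omitted, as their EV is zero.
    """
    expected_value = defaultdict(float)
    for seq in candidate_sequences:
        expected_value[seq["agent_action"]] += (
            seq["likelihood"] * seq["equity"] * seq["final_pot"])
    return max(expected_value, key=expected_value.get)
```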

Elaborating on the example suggested at the beginning of this section, which is visualised in figure 8, the steps below summarise all previously explained methods in this thesis. They form a rough sketch of an algorithm based on all of the material explained so far:

• The (cached) vector space is consulted for all situations wherein EP raised (including situations wherein MP did not fold and CO did not call). Using the (cached) weights (estimated by grouping all historical situations wherein the hole cards for EP are known, according to Sklansky's hand groups), a range for EP's possible holdings is estimated by consulting the Minkowski distance.

• Likewise, the (cached) vector space is loaded for the determination of the hand range of CO.

• All possible action sequences for the game are generated; these are listed with numbers 1-8 on page 18.

• For each action in each action sequence (except for the agent's action), a (cached) vector space and its weights (different from those used to determine hand ranges) are examined to estimate the likelihood of the opponents performing the supposed action and the hole card ranges with which the opponents would perform that action. If the analysed action concerns a raise, the raise size is estimated in the same manner. Since, in this example, a limit of one raise is set and the sequences in which the agent folds are ignored, three additional vector spaces are generated: a vector space describing the event in which the agent calls on BU, with the space's labels focussing on SB; and two spaces focussing on BB, for a call or a fold on SB given a call on BU.

• For all sequences, Monte Carlo simulations are used to approximate the equity of J♢8♢ against all active opponents' hand ranges, with J♢ and 8♢ removed from those ranges.

• For action sequences in which the agent acts by raising (which do not exist in this example due to the raise limit), a function defining the number of folds, calls and raises given a raise size is constructed for all active opponents. These functions are used, in combination with the agent's hand equity, to find the optimal raise size for the agent; a simplified sketch of this search is given below the list.

• All action sequences are grouped by the agent's first action and evaluated by calculating the expected value of every possible action of the agent: the agent's hand equity is multiplied by the size of the pot and by the likelihood of the action sequence occurring.
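The simplified sketch of the raise-size search referenced in the list is given below. It considers a single remaining opponent, treats the agent's raise as its only investment and ignores opponent re-raises, so it is only an approximation of the procedure described above (and not the thesis's Algorithm 1); the coefficient pairs come from the fitted linear models.

```python
import numpy as np

def optimal_raise_size(equity, pot, fold_line, call_line,
                       candidate_sizes=range(50, 701, 10)):
    """Grid search over raise sizes (as % of the pot) for the highest EV.

    fold_line, call_line -- (a, b) coefficients of the fitted lines giving
                            the opponent's fold/call percentage for a raise
                            of x% of the pot
    """
    best_size, best_ev = None, -np.inf
    for size in candidate_sizes:
        p_fold = np.clip((fold_line[0] * size + fold_line[1]) / 100.0, 0.0, 1.0)
        p_call = np.clip((call_line[0] * size + call_line[1]) / 100.0, 0.0, 1.0)
        raise_amount = pot * size / 100.0
        # Fold: the agent takes the current pot. Call: the agent realises
        # its equity in the enlarged pot, minus its own raise.
        ev = (p_fold * pot
              + p_call * (equity * (pot + 2 * raise_amount) - raise_amount))
        if ev > best_ev:
            best_size, best_ev = size, ev
    return best_size, best_ev
```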

[Figure: table diagram with the amounts put in by each player (EP $0.30, CO $0.30, SB $0.05, BB $0.10), the agent on BU holding J♢8♢, a total pot of $0.75, and the players' actions labelled a1-a5 and ai.]

Figure 8: Situation in which the agent is seated on the button holding J8s and faces a 200% raise and a call from the first and third positions respectively. A white node indicates that a player folded.


3.2 Suggested implementation

This section includes several personal decisions (e.g. for the parameters in section 3.2.3) and serves to demonstrate a possible implementation of the method presented in section 3.1. Although all of the personal choices posed in this section are motivated, the methods presented in this thesis can be implemented in various ways.

3.2.1 Dataset

The need for modelling the influence of time is outlined in section 3.1.1: a population changes its playing style over time, since opponents are influenced by each other and by the increasing availability of outlines of strategy and of the mathematics of the game16,17. The population can be further divided into multiple subpopulations, as different groups of players play at different 'stakes' (i.e. different values of ν, the big blind). As shown in figure 3, even further subdivisions make sense. Since this thesis only aims to demonstrate a general approach towards designing a poker-playing agent, intended to be expanded and developed further, the influence of these variables should ideally be diminished or even eliminated. In section 1, from a somewhat philosophical standpoint, I argued that an agent should not necessarily be based on a dataset of a size that is unrealistic compared to a real player's playing history.

For all of the reasons stated above, a dataset consisting of 6,552,060 6-max Zoom hands, played within a relatively short timeframe (21st of July 2017 - 4th of September 2017) at a fixed stakes size ($0.05/$0.10), is chosen. 'Zoom' is a subtype of poker wherein opponents are changed after every hand played, offering players a faster game pace42. As opponents change after every hand, players tend to play less exploitatively (it becomes harder for players to observe each other individually) and are thus expected to influence each other's play to a lesser degree. By selecting a dataset of hands played at fixed stakes during a short timeframe, the significance of the time and stakes variables decreases, allowing for a more convenient analysis of the presented method. Furthermore, in the chosen game variant, the number of players seated at the table is always six. This way, the model does not need to account for a variable number of players. The dataset is purchased from a commercial company that offers data mining services43.

42PokerStars, ZOOM! PokerStars launches new fast-paced poker game in beta, https://www.pokerstars.com/en/news/zoom-pokerstars-launches-new-fast-paced-092053/15308/, accessed on March 7th, 2020

3.2.2 Population

These hand histories can be imported using available commercial software44, which maintains these hands in a database45. An exported player report46, including the number of hands for a player and the frequentist calculations of a player's VPIP and PFR/VPIP values, is used for all methods described in section 3.1.2 and the corresponding figures 2, 3 and 4. The file is imported in Python 3.8.2 using the csv library47. The calculations are done using NumPy48 and SciPy49. For all further described methods involving data from past observations, the database is directly queried using SQL50 with the psycopg2 package in Python51. The database has the following features relevant for the implementation of the proposed algorithm:

• A table in which hand summaries are stored, including an ordered actors (players who did call or raise) sequence and an aggressors (the big blind and all players who raised) sequence. Players are denoted by the numbers 3-2-1-0-9-8, where 3 is the first player to act and 8 is the last player to act (the player seated on

43The hands are purchased from https://hhdealer.com

44The program used for this thesis is PokerTracker 4: https://www.pokertracker.com

45In this thesis, a PostgreSQL database is used which strongly resembles the database structure illustrated on https://www.pokertracker.com/guides/PT3/databases/pokertracker-3-database-schema-documentation, accessed on March 9th, 2020

46PokerTracker, Configuring reports, https://www.pokertracker.com/guides/PT4/tutorials/configuring-reports, accessed on March 9th, 2020

47Python 3.8.2 documentation, csv – CSV File Reading and Writing, https://docs.python.org/3/library/csv.html, accessed on March 9th, 2020

48NumPy, https://numpy.org, accessed on March 9th, 2020

49SciPy, https://scipy.org, accessed on March 9th, 2020

50SQL is a simplified language with which search queries in databases can be efficiently formulated.

51PyPI, psycopg2 - Python-PostgreSQL Database Adapter, https://pypi.org/project/psycopg2/, accessed on March 9th, 2020
