Measuring and Comparing the Performance of Elite Judokas

Academic year: 2021


Kristian Beumer, 2976226

Master’s Thesis Econometrics, Operations Research and Actuarial Studies

Supervisors: dr. G.H. Kuper and prof. dr. G. Sierksma

Second assessor: prof. dr. T.J. Wansbeek

Abstract

Contents

1 Introduction
  1.1 Interests of JBN
  1.2 Problem relevance and field of research
  1.3 Data limitations and drawbacks of WRL
  1.4 Contributions and research questions
2 Literature review
3 Tournament regulations and world ranking lists
  3.1 Types of events
  3.2 Tournament rankings
  3.3 Draw regulations
  3.4 Structure of the IJF world ranking list
4 Data
5 Methodology
  5.1 Probit models
    5.1.1 Clustering
    5.1.2 Setup of estimated models
    5.1.3 Drawbacks of probit models
  5.2 Athlete performance tracking
    5.2.1 General Elo rating system
    5.2.2 Specification of Elo systems for judo
    5.2.3 General Glicko model
    5.2.4 Specification of Glicko system for judo
  5.3 Simulations
    5.3.1 Inputs and general specification
    5.3.2 True winning probabilities
    5.3.3 Simulation procedure
6 Results
  6.1 Probit models
  6.2 Simulations
    6.2.1 General results and example output
    6.2.2 Simulation results of Elo systems
    6.2.3 Simulation results of Glicko systems
7 Discussion
9 Recommendations
10 Non-technical summary of thesis
References

1 Introduction

This section first introduces the interests and ambitions of the Dutch Judo Federation (JBN) and positions this thesis on the boundary between Human Resource Management and data analysis. Then we indicate the data limitations faced by this research. Thereafter, we explain the research approach of this thesis and state the research questions.

1.1 Interests of JBN

JBN is the governing body for the sports of aikido, jiujitsu and judo in the Netherlands. With approximately 48,900 members in 2018, it is one of the larger sports associations in the Netherlands (NOC*NSF, 2018).

JBN states its objectives for the future of its elite sports divisions in four-year plans. Two of these objectives are relevant to our research. Firstly, JBN discusses its ambition to further improve the coaching and development of talented young athletes. The most important tool currently in use is the Meerjarenopleidingsplan (Multiple Year Development Program, MYDP), which provides a framework for conveying relevant technical, tactical, physical, and mental skills to the judoka. The MYDP describes the roles of the coach, the parents and JBN in this process. It is based on the Long Term Athlete Development (LTAD) model: a framework used in many individual sports that divides the development of an athlete into seven stages, so that athletes can reach their full potential.¹ Four of the Regionale Trainingscentra (Regional Training Centres, RTCs) of JBN in the Netherlands play a key role in the MYDP, as they offer talented athletes a full-time training program from the age of twelve. The RTCs in turn prepare the athletes for a training career at the National Training Center at Papendal. One of the ambitions of JBN is to establish a central academy at Papendal, which should smooth the transition from junior athlete to full-time senior athlete training at Papendal. The performance of senior athletes is of major importance to JBN, especially at the European and World Championships and the Olympic Games (OG). The positions on the World Ranking List (WRL) are also essential, as they determine both which athletes may compete at the OG and the seeding for other tournaments. Furthermore, JBN states medal targets for tournaments.²

Whether the targets regarding WRL positions and medal wins can be realized depends on many factors. One of these is the quality of the procedure used to select athletes for tournaments. This is relevant, in different ways, at the cadet and junior level and at the senior level. At the cadet and junior level (consisting of judokas aged between 15 and 17, and between 15 and 20, respectively), an important question is when an athlete is deemed able enough to step up to more difficult tournaments, or to make the transition to senior-level tournaments. In other words, when can the athlete take the next step in his or her career? At the senior level, JBN faces the same challenge; however, there is also the crucial issue of selecting athletes for the OG. Winning medals at the OG, now and in the long run, is very important, and determining an optimal group of athletes to send to the OG must therefore be carefully considered.

¹ See Balyi, Way, Cardinal and Higgs (2005).

² Viz. for the cadets (aged 15-17) and juniors (18-20): at least three medals at the European Championships;

JBN would like to improve the quality of the selection procedures. For instance, whether a junior judoka can take the next step in his or her career is now primarily decided through dialogue between coach and judoka. Decision-making through these dialogues is inherently subjective, as coach and athlete may hold conflicting opinions on the athlete’s performance. Furthermore, it is difficult to assess objectively both the performance level of the judoka and whether the judoka has shown sufficient improvement over a given period. JBN is interested in the role data analysis can play in supporting the selection procedures for tournaments described in the previous paragraph (Jaarplan 2020 (2019)).

Another topic that JBN is interested in is the probability that an athlete will reach a certain tournament result. Estimates of these probabilities are relevant because they indicate the expected tournament performance of an athlete. Furthermore, they can be used in planning an athlete's tournament schedule for the next period. This way, coaches and athletes can make better choices regarding the tournaments an athlete will compete in, for example to maximize the athlete's WRL position or medal wins. One should ensure that a small group of elite athletes does not take up the majority of places in tournaments where a high number of WRL points can be earned. There should also be ample opportunities for junior athletes and for athletes ranked relatively low on the WRL. Lastly, these probabilities are relevant as inputs for a model to aid the decision on which athletes to designate for the OG. They support the Managementteam Topsport (Management Team Elite Sports) in deciding which athlete should be sent to the OG in case multiple athletes satisfy the selection criteria of the International Judo Federation (IJF).

1.2 Problem relevance and field of research

The challenges faced by JBN are problems in the field of Human Resource Management (HRM). Taylor, Doherty and McGraw (2008) define HRM as “the policies, practices, procedures, and systems that influence the behavior, attitudes, values, and performance of people who work for the organization”. This definition clearly encompasses the various athlete selection and tracking procedures of JBN: recognizing young talented athletes, tracking the performance of athletes over time, and making sure the right athletes compete in tournaments by maximizing the expected performances and developments. We focus on the second topic, whose primary use is to support the decision-making process with regard to the third topic. After all, tracking the performance of athletes is not an end in itself: its results should be used to ensure that athletes compete at the peak of their abilities at the level of competition that is suitable to them.

indispensable, and is important to avoid tensions and disappointments when athletes are not selected for major events such as the OG. For example, the selection committee of JBN faces the task of deciding which athletes are designated for the OG when multiple athletes within the same weight category fulfil the qualification requirements for participation.³ The selection committee makes a choice based on certain criteria. However, many of these criteria are fairly subjective.⁴ The potential negative consequences of such subjective selection procedures are numerous and have led to court proceedings between the athlete and the respective national sports organization.⁵ This selection problem applies to other tournaments as well, but primarily to the more elite tournaments such as the World Championships, European Championships and Grand Slams. Data analysis of tournament performances may lead to more objective selection criteria and thereby help to avoid the above problems.

1.3 Data limitations and drawbacks of WRL

In order to optimally assist JBN with respect to the questions raised in the previous paragraph, we need both the right types and a sufficient amount of data. Unfortunately, these are not available at this moment. We would require a data set that includes the match results of individual athletes. More precisely, we need to know for all relevant athletes the results of their matches, the names of the athletes, and preferably the WRL position of each athlete.

One might argue that we could simply use the evolution of an athlete's WRL position as a proxy for his or her development. There are, however, problems with this approach. One would like to account for the strength of the opponents when determining the true value of a tournament result, yet the number of ranking points earned at each tournament, which determines the WRL position, is independent of the WRL positions of the opponents. We may mitigate this problem somewhat by taking the WRL position of the opponent as a proxy for his or her strength. However, the ranking points do not completely reflect the true achievement, as these points are somewhat arbitrary. This is illustrated by the fact that they have been altered twice since the start of the WRL in 2009. For example, a first place at a Continental Open (CO) has been awarded 100 ranking points in all years, whereas the points for a first place at the World Championships Senior (WCS) have increased from 500 to 2000. It can hardly be true that the achievement of winning a WCS, relative to winning a CO, quadrupled over the same period. Hence, we should look beyond the rankings on the WRL and calculate a performance score based on match-level data. All the results from tournaments in the IJF ranking events should be included to get a precise idea of the athlete's current level and of how it has evolved over time.

³ See table A1 in appendix A for an overview of all weight categories.

⁴ See appendix B for the exact criteria.

⁵ For an example in judo, see Sullivan v. The Judo Federation of Australia Inc. et al. (2000).

1.4 Contributions and research questions

The first interest of this thesis is designing an athlete performance tracking system (APTS) to track the progress of judokas over time. As explained in the previous section, we require a more sophisticated way to determine the development of an athlete over time than the WRL. A common approach is to use an Elo rating system, which is what we shall focus on. The Elo rating system was developed in the 1950s and has been formalized in Elo (1978). The Elo system assigns each player a rating, which is updated over time based on the results of individual matches. The difference in rating points between two competing players reflects the expected winning chance of both players. For example, when a player with a high rating loses against a player with a low rating, the high-rated player loses many rating points, whereas the lower-rated player gains exactly the same number of points. The standard Elo model is rather simple; however, many extensions have been proposed in the literature. We shall make use of some of these extensions to create an Elo rating system for judo. Specifically, we investigate the results of six different specifications of the Elo system, divided into three groups of two specifications each. The specifications in the first group give equal weight to all matches in the simulation. The specifications in the second group give more weight to results in more prestigious tournaments, and the specifications in the third group give more weight to results in later tournament rounds than to those in earlier rounds. The Elo model requires data on individual matches. Since this data is not available to us, as we explained in the previous section, we conduct an extensive simulation study to generate individual match results. The simulations incorporate most characteristics of a judo season, namely the distinction between different types of tournaments, the dates of the tournaments, the draw regulations, and the structure of the WRL. We also need to specify a functional form for the true winning probabilities in the simulation. As we lack individual match results on which to base these probabilities, we adopt a procedure from Klaassen and Magnus (2003) to determine them. Based on the individual match results in the simulations, we determine Elo ratings for each of the specifications of the Elo system. This enables us to observe possible differences between the distributions of Elo ratings at the end of the simulations, and the effect of each Elo specification on the Elo ratings of individual athletes.
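The rating update described above can be sketched in a few lines. This is only a minimal illustration of the conventional Elo update, assuming the logistic expected-score curve with a scale of 400 and a constant K-factor of 32; both are common chess defaults and are not values taken from this thesis, and the function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score (winning chance) of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the updated ratings (A, B) after one match.

    score_a is 1.0 if A won and 0.0 if A lost.
    k is an assumed constant K-factor, not a value from the thesis.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the sum of both ratings is preserved.
    return rating_a + delta, rating_b - delta


# An upset: the 1800-rated favourite loses to a 1400-rated opponent.
upset_high, upset_low = elo_update(1800.0, 1400.0, score_a=0.0)

# The expected result: the favourite wins and gains only a few points.
normal_high, normal_low = elo_update(1800.0, 1400.0, score_a=1.0)
```

With these assumed defaults the favourite loses far more points after the upset than it gains after the expected win, which is the asymmetry the text describes.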

Furthermore, we shall discuss another more advanced model: the Glicko model (Glickman, 1999). Its primary advantage is that it accounts for the uncertainty of the ratings, whereas the Elo model treats the ratings as being estimated without error. As this model is more involved than the Elo model, we first take some time to explain it. The Glicko model requires two parameter values to be optimized using individual match results, in order for the rating estimates to be accurate. As we noted before, we lack this data, and thus we pick different values for these two parameters in order to demonstrate the general output of the Glicko models.
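To make the role of rating uncertainty concrete, the sketch below implements one rating-period update of the original Glicko system as described in Glickman's published account of the method. It is a hedged illustration only: the thesis's own specification for judo is given in its methodology section, which may differ, and the function names, the cap of 350 on the rating deviation and the parameter name `c` are assumptions made here.

```python
import math

Q = math.log(10) / 400.0  # scaling constant of the Glicko system


def g(rd: float) -> float:
    """Down-weighting factor for an opponent with rating deviation rd."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def glicko_expected(r: float, r_opp: float, rd_opp: float) -> float:
    """Expected score against an opponent, accounting for the opponent's uncertainty."""
    return 1.0 / (1.0 + 10.0 ** (-g(rd_opp) * (r - r_opp) / 400.0))


def glicko_update(r: float, rd: float, games):
    """Update (rating, rating deviation) after one rating period.

    games is a list of (opponent_rating, opponent_rd, score) tuples,
    where score is 1.0 for a win and 0.0 for a loss.
    """
    if not games:
        return r, rd
    info = 0.0      # information gained from the observed matches
    surprise = 0.0  # weighted sum of (actual score - expected score)
    for r_opp, rd_opp, score in games:
        weight = g(rd_opp)
        e = glicko_expected(r, r_opp, rd_opp)
        info += weight ** 2 * e * (1.0 - e)
        surprise += weight * (score - e)
    precision = 1.0 / rd ** 2 + Q ** 2 * info
    return r + (Q / precision) * surprise, math.sqrt(1.0 / precision)


def inflate_rd(rd: float, c: float, periods: int = 1, rd_max: float = 350.0) -> float:
    """Increase the rating deviation for elapsed rating periods (assumed cap rd_max)."""
    return min(math.sqrt(rd ** 2 + c ** 2 * periods), rd_max)


# One win against a well-established opponent, two losses against less certain ones.
new_r, new_rd = glicko_update(1500.0, 200.0,
                              [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)])
```

In this example the rating falls to roughly 1464 and the rating deviation shrinks from 200 to roughly 151: playing matches reduces uncertainty, while `inflate_rd` lets it grow again during inactivity.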

test whether we may pool tournament results across date, gender, weight categories and type of event. We use these findings in the design of the APTS, since if pooling across a certain dimension, e.g. gender, turns out to be justified, then we are able to design the same APTS for both genders.

We summarize the interests of this thesis by the following research questions:

Question 1: What is the effect of the specification of the Elo system on the distribution of all Elo ratings at the end of the simulations?

Question 2: What is the effect of the specification of the Elo system on the Elo rating of an individual athlete at the end of the simulation?

Question 3: Is it justified to pool match results across gender, weight categories, and event types?

The focus of this thesis lies in answering questions 1 and 2. That is, our primary goal is to develop an athlete tracking system (Elo system) and investigate the sensitivity of the outcomes of this system to the different specifications of the Elo system.

The remainder of this thesis is structured as follows. Section 2 gives a concise review of previous data-driven research on judo, and points out the reasons for the relatively limited research interest in judo. Section 3 gives the necessary general information about the different judo tournaments, the setup of these tournaments, and the composition of the official WRLs. Section 4 describes the data set used in the analysis of the probit models. Section 5 first describes the estimation framework of the probit models. Then we explain the general Elo system, followed by some of the extensions that have been proposed in the literature and the specifications of the Elo systems that we evaluate in the simulations. We then explain the Glicko model and the specifications of it that we evaluate in the simulations. Thereafter we explain the general characteristics of the simulations, and the procedure that is used to determine a functional form for the winning probabilities in the simulation. Section 6 presents the results of the probit models, the Elo specifications and the Glicko specifications. Section 7 provides some discussion of the results and section 8 concludes. Section 9 provides recommendations for the further development of the APTS. Finally, we provide a non-technical summary of this thesis.

2 Literature review

a match make it difficult to draw significant conclusions from match outcomes. This is in sharp contrast to individual sports where the anatomical and physiological characteristics of other athletes do not have a direct effect on one’s own performance. A second potential cause of the relatively small amount of scientific research related to judo may be the highly traditional nature of the sport, still apparent in the greeting rituals at the start of each match. At a more fundamental level, these traditions may have caused a reluctance to change training and selection methods, which in turn would reduce the need for scientific judo research. There is, however, certainly interest in judo-related research. For example, since 2014 the European Judo Union has held an annual Judo Festival, which includes both a research symposium and a scientific conference, where some of the latest research related to judo is discussed.⁶

Most research in judo has focused on the aforementioned physiological characteristics of athletes. Data-driven analyses focusing on the probabilities of winning matches are scarce. The only two papers which conduct a thorough analysis in this regard are Ferreira Julio, Panissa, Miarka, Takito and Franchini (2013) and Krumer (2017). Ferreira Julio et al. (2013) study the possible effect of home advantage in World Championship Senior, Grand Slam and Grand Prix competitions. Using a logistic model and a Poisson generalized linear model they find some evidence of home advantage, primarily for male athletes. Krumer (2017) is more closely related to our thesis. Krumer (2017) uses the outcomes of 3,302 matches from all World Championships Senior, Masters and Olympic Games during 2010-2013 to investigate the relationship between an athlete’s position on the WRL and winning probabilities in individual matches at these tournaments. A specific logistic model based on Koning (2011) is used with variables indicating the difference in world ranking, head-to-head results and home advantage as covariates, in order to estimate the winning probability of an athlete in a single match. The variable that indicates the difference in world rankings is significant when using the observations from all weight categories. More interestingly, Krumer (2017) finds that there is imbalance between different weight categories. Specifically, the -66kg category for men is significantly more balanced than the other male categories, meaning that the probability that a high-ranked athlete wins against a low-ranked athlete is lower in the -66kg category than in other male categories. The female categories are overall more balanced.

3 Tournament regulations and world ranking lists

This section provides general information about IJF tournaments and WRLs. Firstly, we describe the various tournaments that yield points for the senior WRL. Then we elaborate on the tournament regulations that are relevant for our analysis, namely the system that is used to assign the final tournament rankings, the draw regulations, and the number of ranking points that can be earned at each tournament. Furthermore, we discuss how the official WRL of the IJF is composed. This section is mostly based on the IJF Sport and Organisation Rules (2020).

3.1 Types of events

Some basic information about the tournaments we consider is given in table 1. These are all the tournaments that yield points for the WRL of the senior category of the IJF. We now give some additional information for each of the eight types of events.

• Olympic Games (OG): Judo was first included in the 1964 OG and has been included in every OG since 1972.⁷ Initially there were only male categories, but women have also participated since 1992. Judo at the OG is viewed as the most prestigious judo event. Because athletes must go through a long qualification period to enter the OG, and because winning a medal carries great prestige, the level of play is very high.

• World Championship Seniors (WCS): The WCS are considered the highest level of annual international judo competition. The event is held each year, except in years when the OG takes place. The tournament was first organized in 1956 and took its current form, with both men and women participating, in 1987.

• Masters: The Masters is a prestigious invitation tournament. The latest edition in 2019 invited the top 32 of the WRL, whereas the other editions since the beginning of the Masters in 2010 invited only the top 16. Mean player strength is high at this tournament.

• Grand Slam (GS): The GS tournaments in their current form were first organized in 2009. At that time there were four per year; this increased to five per year in 2014 and seven per year in 2019. The venues of GS tournaments have changed over time, although the GS in Paris has existed since 2009. The majority of athletes competing in GS tournaments are ranked in the top 50 of the WRL, making it one of the more prestigious tournaments.

• Continental Championships (CC): There are five CCs, namely the European, Asian, African, Pan-American, and Oceanian tournaments. These tournaments are organized on different dates throughout the year. Therefore, to treat the continents equally, the IJF has decided that the tournaments are treated as if held in week 17 of the year. This means that the addition and reduction of ranking points earned at CC tournaments is done in week 17 of the year for all CC tournaments. The mean player strength varies widely between Continental Championships, with the European and Asian tournaments being the strongest. The European Championships took their current form in 1987, as did the WCS. Approximately half of the participants of CCs are ranked in the top 70 of the WRL.

⁷ At the 61st International Olympic Committee (IOC) session in October 1963 in Baden-Baden, it was decided to accept a maximum of 18 sports in the 1968 OG programme. Thereby, judo was not included in the programme of the 1968 OG. At the 64th IOC session held in Madrid in October 1965, the number of sports

• Grand Prix (GP): The level of GPs is considered to be one step below the Grand Slam tournaments. Like the GS tournaments, they were first organized in 2009. The number of GPs per year has varied over time; however, in recent years there have been ten GP tournaments per year.

• World Championship Juniors (WCJ): These are the official World Championships of the Junior division of the IJF. Like the WCS, the WCJ are organized every year, except in years when the OG takes place. An athlete is allowed to compete in the WCJ in year x if he or she turns at least 15 in year x and does not turn 21 in year x. Note that an athlete is allowed to compete in both the WCJ and the WCS. Ranking points earned at a WCJ have counted for the senior WRL since 2017.

• Continental Open (CO): The CO tournaments are one step below the GP tournaments. On average they have the weakest field of competitors. As we observe in table 1, the number of COs per year varies considerably between years. These tournaments are meant as stepping stones towards the more prestigious tournaments.

Table 1: Names of the tournaments and their acronyms, the maximum number of athletes per weight category per country, and the number of tournaments in our data set. Note that there are limits on the number of participants per country, but no limits on the total number of participants per tournament. Regarding the right-most column, the data set contains all matches from February 2015 until March 2020.

| Tournament name | Maximum number of participants per weight category per country | No. of tournaments per year | No. of tournaments in data set |
|---|---|---|---|
| Olympic Games (OG) | One | Once in four years | 1 |
| World Championship Seniors (WCS) | Two; nine men and nine women over all categories | 1 | 4 |
| Masters (MA) | No limit | 1 | 5 |
| Grand Slam (GS) | Two; four for host nation | 5-7 | 29 |
| Continental Championships (CC) | Two; ten men and ten women over all categories | 1 | 5 |
| Grand Prix (GP) | Two; four for host nation | 8-10 | 47 |
| World Championships Juniors (WCJ) | Two; ten men and ten women over all categories | 1 | 3 |
| Continental Open (CO) | No limit | 12-19 | 78 |

3.2 Tournament rankings

3.3 Draw regulations

The draw regulations are the same for all tournaments in the data set and all weight categories. However, the draws are subject to many restrictions.

We first explain the issue of first-round matches. There is no limit to the total number of participants per weight category for each of the tournaments. The only relevant limit is the number of participants per country, as displayed in the second column of table 1. As a result, in many draws the number of participants is not equal to a power of two. This necessitates a number of first-round matches in order to bring the number of participants down to a power of two. For example, if there are 35 players registered in some weight category, then three first-round matches will be played in order to bring the number of players down to 32 (= 2^5). We will explain below which athletes are designated to compete in first-round matches.

Secondly, athletes are either seeded or not seeded in a tournament. At most eight athletes are seeded per weight category. The position of the athlete on the WRL at the start of the week in which the tournament takes place determines which athletes are seeded. Specifically, the athlete with the k-th highest WRL position of all participants in a weight category is the k-th seed, k ∈ {1, . . . , 8}. All other athletes are not seeded.

Seeded athletes benefit from two privileges. The first privilege is that they do not compete in first-round matches, whenever possible. This requires some explanation. Suppose the number of participants equals a power of two minus one, say 15. Then the number one seed is the only athlete who does not have to compete in the first round. The non-seeded athletes and seeds two through eight are paired randomly in such a way that (1) seeded athletes do not compete against each other and (2) separation by nations is respected as much as possible.⁸ More generally, when the number of participants equals a power of two minus k, seeds one through k do not compete in the first round, k ∈ {1, . . . , 8}. For example, when the number of players equals 18, there will be two first-round matches, each between two randomly drawn non-seeded athletes, taking separation by nations into account.
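The arithmetic behind the first-round matches and the byes can be sketched as follows. This is a minimal illustration derived from the rule as described above, not an implementation of the official IJF draw software; the function name is hypothetical, and the random pairing and separation by nations are deliberately left out.

```python
def first_round_layout(n_participants: int, max_seeds: int = 8):
    """Return (first_round_matches, byes, seeds_with_bye) for one weight category.

    First-round matches reduce the field to the largest power of two that
    does not exceed the number of participants.
    """
    if n_participants < 2:
        raise ValueError("a draw needs at least two participants")
    # Largest power of two that is <= n_participants.
    lower_power = 1 << (n_participants.bit_length() - 1)
    first_round_matches = n_participants - lower_power
    # Everyone who is not in a first-round match skips the first round.
    byes = n_participants - 2 * first_round_matches
    # Byes go to the seeds first, in seeding order.
    seeds_with_bye = min(byes, max_seeds)
    return first_round_matches, byes, seeds_with_bye


layout_35 = first_round_layout(35)  # the 35-player example from the text
layout_15 = first_round_layout(15)  # a power of two minus one
```

For 35 participants this gives three first-round matches, and for 15 participants only the number one seed skips the first round, in line with the examples in the text.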

The second privilege of seeded athletes is that they are separated in the draw from the other seeded athletes in such a way that they do not meet each other before the quarter-finals. More specifically, each tournament consists of four pools: A, B, C and D. Seeds one and eight are placed in pool A, seeds two and seven in pool C, seeds three and six in pool D, and seeds four and five in pool B. In each pool, the two seeded athletes are placed such that they do not meet until the quarter-finals. The winners of pools A and B then compete in the semi-finals, and similarly the winners of pools C and D. Thus, seeds one and four, and seeds two and six, cannot meet before the semi-finals. Consequently, seeds one and two cannot meet before the final.
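The pool placement above determines the earliest round in which two seeded athletes can meet. The sketch below encodes that placement as a lookup table; the names `POOL_OF_SEED`, `HALF_OF_POOL` and `earliest_meeting` are hypothetical and serve only to illustrate the reasoning.

```python
# Pool placement of the eight seeds, as described in the text.
POOL_OF_SEED = {1: "A", 8: "A", 4: "B", 5: "B", 2: "C", 7: "C", 3: "D", 6: "D"}

# Winners of pools A and B meet in one semi-final, winners of C and D in the other.
HALF_OF_POOL = {"A": 1, "B": 1, "C": 2, "D": 2}


def earliest_meeting(seed_a: int, seed_b: int) -> str:
    """Earliest round in which two different seeds can face each other."""
    if seed_a == seed_b:
        raise ValueError("seeds must differ")
    pool_a, pool_b = POOL_OF_SEED[seed_a], POOL_OF_SEED[seed_b]
    if pool_a == pool_b:
        return "quarter-final"
    if HALF_OF_POOL[pool_a] == HALF_OF_POOL[pool_b]:
        return "semi-final"
    return "final"


round_1_vs_2 = earliest_meeting(1, 2)
```

Here `round_1_vs_2` evaluates to "final", matching the conclusion that the top two seeds cannot meet earlier.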

⁸ Note that seeding position takes priority over separation by nations (IJF Sport and Organisation Rules,

3.4 Structure of the IJF world ranking list

Table 2 lists the number of points an athlete earns for the WRL if he or she progresses to a specific stage in a tournament. Importantly, when an athlete does not win any match in a tournament, he or she only earns the number of points in the “Participation” row in table 2. For example, in a tournament with nine participants, seven athletes reach the quarter-finals directly. If one of these seven athletes loses both in the quarter-finals and in the repechage semi-final, he or she does not earn the points corresponding to the row “7th place” in table 2, but merely the points in the row “Participation”. As the IJF changed the distribution of points on 1 January 2017, there are two numbers for each stage of each tournament in table 2: the first and second number denote the number of points for results in tournaments after and before 1 January 2017, respectively.

From table 2 we observe that both before and after 1 January 2017 most points can be earned at the WCS, followed by the Masters and GS tournaments. Before 1 January 2017 the CC, GP and CO tournaments yielded, in that order, the most points, whereas after 1 January 2017 the CC and GP tournaments yield equal points. Furthermore, we observe that the number of points for each tournament has increased after 1 January 2017, though not by the same factor. Proportionally, points awarded at the WCS and Masters have increased the most, followed by the GS, CC and GP tournaments. The number of points for CO tournaments has remained virtually the same. Another notable change is that an athlete earns 200 points by participating in a Masters tournament after 1 January 2017, whereas participation yielded no points before.

The number of points of each athlete on the WRL is determined as follows. For each athlete, in every year the five results that yield the highest number of points count for the WRL, and furthermore an extra result from either the CC or the Masters counts as well. If the judoka competes in both the CC and the Masters in the same year, the higher of the two results counts as the sixth result and the other result may count as one of the five best results. Results obtained within the past year count for 100% on the ranking list, and results obtained between one and two years ago count for 50%. Results obtained more than two years ago do not count for the WRL.
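The counting rule can be sketched as follows. This is a hedged illustration under two assumptions that the text does not spell out: "year" is read as a rolling 12-month window of 365 days before the ranking date, and the best five results are chosen within each window before the 50% weight is applied. The function names are hypothetical and the point values in the example are made-up placeholders, not entries from table 2.

```python
EXTRA_EVENTS = {"CC", "MA"}  # event types eligible for the additional sixth result


def period_points(results):
    """Points counted from one 12-month window.

    results is a list of (event_type, points) tuples.
    """
    ranked = sorted(results, key=lambda item: item[1], reverse=True)
    # The best CC or Masters result is set aside as the extra (sixth) result.
    extra_index = next(
        (i for i, (event, _) in enumerate(ranked) if event in EXTRA_EVENTS), None
    )
    extra = ranked.pop(extra_index)[1] if extra_index is not None else 0
    # The five best remaining results count; a second CC/MA result may be among them.
    return extra + sum(points for _, points in ranked[:5])


def wrl_points(results, days_per_year: int = 365):
    """Total WRL points from a list of (days_ago, event_type, points) tuples."""
    recent = [(e, p) for d, e, p in results if 0 <= d < days_per_year]
    older = [(e, p) for d, e, p in results if days_per_year <= d < 2 * days_per_year]
    return period_points(recent) + 0.5 * period_points(older)


# Placeholder results (days ago, event type, points); the values are illustrative only.
example_results = [
    (30, "GP", 700), (60, "GP", 500), (90, "GS", 360), (120, "CC", 260),
    (150, "MA", 200), (200, "CO", 100), (250, "CO", 70), (300, "CO", 50),
    (400, "GS", 1000), (800, "GS", 1000),
]
example_total = wrl_points(example_results)
```

In the placeholder example, the CC result is taken as the extra result, the Masters result still counts among the five best, the result from 400 days ago counts for half, and the result from 800 days ago is ignored.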

Table 2: Number of ranking points for the IJF ranking tournaments in the data set. The first number corresponds to the period after 1 January 2017, and the second (in parentheses) to the period before that. See table 1 for the meanings of the acronyms.

4 Data

The data set consists of two parts. The first part consists of the number of ranking points earned by each judoka, in each weight category of each ranking tournament that was organized between 21 February 2015 and 7 March 2020. The second part consists of the world ranking lists for each weight category at all dates between 1 January 2015 and 7 March 2020.⁹

In order to define the first part of the data set, we first define the product set DS, which has four dimensions. This set DS is defined as

\begin{align}
DS &= (J_M \cup J_F) \times (W_M \cup W_F) \times (D_{15-16} \cup D_{17-20}) \times (S_1 \cup S_2) \tag{1} \\
   &= J \times W \times D \times S, \tag{2}
\end{align}

where

$J_M$ = the set of male athletes;

$J_F$ = the set of female athletes. Note that $J = J_M \cup J_F$;

$W_M$ = {-60kg, -66kg, -73kg, -81kg, -90kg, -100kg, +100kg}, the set of weight categories for male athletes;

$W_F$ = {-48kg, -52kg, -57kg, -63kg, -70kg, -78kg, +78kg}, the set of weight categories for female athletes. Note that $W = W_M \cup W_F$;

$D_{15-16}$ = {20-2-2015, . . . , 2-12-2016}, the set of starting dates of the tournaments in 2015 or 2016;

$D_{17-20}$ = {14-1-2017, . . . , 21-2-2020}, the set of starting dates of the tournaments from 2017 until 2020. Note that $D = D_{15-16} \cup D_{17-20}$;

$S_1$ = {WCS, MA, GS, CC, GP, OG}, the set of tournaments of the type World Championships Seniors, Masters, Grand Slam, Continental Championships, Grand Prix and the Olympic Games;

$S_2$ = {CO, WCJ}, the set of tournaments of the type Continental Open and World Championship Juniors. Note that $S = S_1 \cup S_2$.

The set DS covers all possible combinations of athletes, weight categories, events and dates that could have occurred during the time frame of the data set. Note that we divide the starting dates of the tournaments into two disjoint subsets, since the distribution of WRL points changed on 1 January 2017, as explained in section 3.4.

⁹ The data has been acquired through email contact with the IT department of the IJF between the 16th

Now we can define DB as the first part of the data set, which is the subset of DS that contains all combinations of athletes, weight categories, events and dates that occurred during the time frame of the data set. Thus, (j, w, d, s) ∈ DB means that judoka j has participated in weight category w in a tournament of type s that started at date d. The set DB has N = 57,855 elements. We have at our disposal the number of ranking points earned for each (j, w, d, s) ∈ DB, and thus we can reconstruct using table 2 the final rankings of athletes at each tournament. Thus, for each (j, w, d, s) ∈ DB we can define

$r_{jwds}$ = final ranking of judoka j in weight category w in the tournament of type s commencing on date d. (3)

The second part of the data set contains the WRLs of each of the fourteen weight categories during the time frame of the data set. We can thus define

$wr_{jwd}$ = position on the WRL of judoka j in weight category w at the start of the week in which the tournament commencing on date d is held. (4)

The draw of tournaments is based on the WRL at the start of the week in which the tournament is organized. Therefore we consider the positions on the WRL at that particular moment.

Table H1 in appendix H gives the total number of participants for each weight category in each type of tournament, and their grand totals. Comparing the number of participations between the men and the women, we note that the total number of male participations exceeds the number of female participations by approximately 50%. For the men, the intermediate categories are most densely populated, and the +100kg category has by far the fewest participations. In the women’s categories there are roughly as many participations in each of the five “lightest” categories, and significantly fewer in the -78kg and +78kg categories. Concerning the types of tournaments, the GS, GP and CO tournaments had the highest aggregate number of participations. This is to be expected, as there are multiple instances of these tournaments per year, in contrast to the WCS, MA, CC and WCJ tournaments.

To discern differences among types of events and weight categories, tables H2 and H3 in appendix H give some summary statistics of the number of participants for each type of tournament and each weight category respectively. From table H3 we observe that the distribution of the number of participants per tournament is right-skewed, as the mean exceeds the median for each weight category. Furthermore, we observe that for most weight categories, the number of participants can be as low as two. From table H2 we observe, similarly to table H3, that the distribution of the number of participants per type of tournament is right-skewed. The mean number of participants varies widely per event type. The WCS and CC tournaments attract the most participants, which is to be expected as they are among the most prestigious tournaments. The CO tournaments have the lowest mean number of participants, which is likely due to the following two factors. Firstly, CO tournaments for the continents of Oceania and Africa have relatively few participants, as there are relatively few professional judo athletes in those continents. Secondly, the number of ranking points one earns at CO tournaments is by far the lowest of all types of events.


with similar positions on the WRL, which we do in the probit models. Figures G1, G2 and G3 in appendix G give the distribution of WRL positions of the participants for each type of tournament. Athletes with the highest world rankings mostly participate in the WCS, MA, GS, CC and OG tournaments. A second group consists of athletes with a ranking from around 16 to 40, who compete mainly in GP tournaments, and to a lesser extent in the WCS, GS, CC and CO tournaments. Athletes below rank 40 compete mainly in the CO tournaments, although there is also a substantial number of participants in the WCS, GS, CC and GP tournaments. Lastly, the distribution of rankings at the WCJ tournaments varies widely, starting at 30. This is to be expected as these athletes are necessarily younger than 21 - usually much younger - and hence are mostly at the start of their professional careers.

5 Methodology

This section consists of three parts. The first part explains the framework used to estimate the relationship between the position on the WRL and the tournament round to which an athlete advances, e.g. the quarter-finals. This framework is applied to the data set described in section 4.

The second part explains two rating systems that we propose to use in the APTS: the Elo and Glicko rating systems. For both systems we first explain the general framework, and for the Elo system we also discuss some extensions that have been proposed in the literature. Lastly, we explain the specifications of both the Elo and Glicko systems that we use in the simulations.

The third part describes the general setup of the simulations, that is, aside from the specifications of the Elo and Glicko systems. We explain the general inputs of the simulations, and the characteristics of the tournaments that we incorporate in the simulations to make the simulations realistic. Furthermore, we describe the procedure that was used to obtain the winning probabilities that are used in the simulations.

5.1 Probit models

In this section we use probit models to estimate the relationship between the position on the WRL and the probability of reaching a specific tournament round. We discuss the issue of clustering tournament results of athletes with similar WRL positions. We then introduce dummy variables for gender, weight categories, event types and tournament dates, which are used to assess whether we can pool tournament results over these categories. We subsequently state the estimated models and explain some drawbacks of the probit models. The motivation for using the probit model and its general specification can be found in appendix C.

The approach in this section is based on Kuper, Sierksma and Spieksma (2014), who estimate the relationship between the ATP and WTA ranking of a player, and the probability that he or she reaches a specific round of a tennis Grand Slam or an OG.

5.1.1 Clustering


The reason is that the world rankings of the competitors vary substantially between the two groups, as we know from figures G1, G2 and G3 in the appendix. The participants of the first and second subset of tournaments are mostly relatively high-ranked and low-ranked athletes respectively. For the first subgroup of tournaments, we thus use clusters of size 4 for the world rankings 1-52 and clusters of size 12 for the world rankings 53-100. For the second subgroup of events, we use clusters of size 20 for the world rankings 1-100 and clusters of size 50 for the world rankings 101-300. Note that in both subgroups of events we use larger clusters for the lower world rankings, as in all events the number of participants decreases as the position on the WRL decreases. To ensure that clusters with low-ranked athletes contain sufficiently many observations, we therefore choose larger cluster sizes for the lower positions on the WRL. Furthermore, note that the cluster sizes are significantly larger for the CO and WCJ events, since participants in these events are both high- and low-ranked players.

5.1.2 Setup of estimated models

There are four possible dimensions across which we may pool the data, namely gender, weight categories, event types and whether the tournament is held before or after 1 January 2017, i.e. the date of the tournament. Our approach will be to estimate the winning probabilities, while controlling for gender, weight categories, type of events and the date of the tournament by adding corresponding dummy variables. The potential significance of these dummy variables indicates whether pooling tournament results across the aforementioned dimensions is justified.

We first turn to the question of pooling the tournaments before and after 1 January 2017. Note that we need to test whether the date of a tournament has a significant effect on the winning probabilities per cluster, since the distribution of WRL points changed on 1 January 2017, as explained in section 3.4. This has affected the relative importance of the tournaments, at least when judged by the number of ranking points that can be earned. Consequently, the mean world ranking of participants at certain event types may differ before and after the change, affecting the probability that an athlete reaches a certain stage of that particular tournament, ceteris paribus. To test whether pooling tournament results from all dates is allowed, we add a dummy variable $date_d$, $d \in D$, that equals 1 if $d \in D_{17-20}$, and 0 otherwise.

Furthermore, we want to test for gender effects and weight category specific effects. To test for differences in winning probabilities between genders, we add a dummy variable $male_j$, $j \in J$, that equals 1 if $j \in J_M$, and 0 otherwise. The weight category dummies are given by $weight_{jw}$, $j \in J$, $w \in W \setminus \{-48\text{kg}, -60\text{kg}\}$, which equal 1 if judoka j competes in weight category w, and 0 otherwise. Note that we need to leave out two weight categories, one for each gender, as otherwise problems of perfect multicollinearity arise.

Lastly, we turn to the question of pooling results from different types of tournaments. For tournaments of the type WCS, MA, GS, GP, CC and OG, we add dummy variables $tourn_{jds}$, $j \in J$, $d \in D$, $s \in S_1 \setminus \{GP\}$, which equal 1 if judoka j competed in the tournament of type s commencing on date d, and 0 otherwise. Thus the set of GP tournaments is the reference group. For the CO and WCJ tournaments we add a dummy variable $wcj_{jd}$, $j \in J$, $d \in D$, which equals 1 if judoka j competed in the WCJ tournament commencing on date d, and 0 otherwise. The CO tournaments are the reference group in this case.

Now we state the estimated models. We choose a minimum tournament result RE that must be attained, RE ∈ {3, 5, 7}.¹⁰ For the tournaments in $S_1$, that is, tournaments of the type WCS, MA, GS, GP, CC and OG, the individual clusters of WRL positions are given by the sets

$$C_k^1 = \begin{cases} \{1 + 4(k-1), \dots, 4k\}, & \text{for } k \in \{1, \dots, 13\}, \\ \{53 + 12(k-14), \dots, 52 + 12(k-13)\}, & \text{for } k \in \{14, \dots, 17\}, \end{cases} \tag{5}$$

and similarly for the tournaments in $S_2$, that is, tournaments of the type CO and WCJ, the clusters of WRL positions are given by

$$C_k^2 = \begin{cases} \{1 + 20(k-1), \dots, 20k\}, & \text{for } k \in \{1, \dots, 5\}, \\ \{101 + 50(k-6), \dots, 100 + 50(k-5)\}, & \text{for } k \in \{6, \dots, 9\}. \end{cases} \tag{6}$$

Now, define for all c ∈ {1, 2} and all cluster indices k, where k ∈ {1, . . . , 17} if c = 1 and k ∈ {1, . . . , 9} if c = 2,

$R_k^c$ = the set of all final ranking positions $r_{jwds}$ of athletes whose world ranking $wr_{jwd}$ is an element of $C_k^c$ at the start of the week in which the tournament is organized; (7)

$i$ = result index, where $i \in R_k^c$ refers to the $i$th result in $R_k^c$. (8)
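To make the cluster sets in (5) and (6) concrete, the following minimal Python sketch maps a WRL position to its cluster index k. The function name `cluster_index` and the convention of returning `None` for positions outside the clustered range are illustrative assumptions, not part of the thesis.

```python
def cluster_index(position, group):
    """Return the cluster index k of a WRL position.

    group 1 follows (5): 13 clusters of size 4, then 4 clusters of size 12.
    group 2 follows (6): 5 clusters of size 20, then 4 clusters of size 50.
    Returns None when the position lies outside the clustered range.
    """
    if group == 1:
        small, n_small, large, n_large = 4, 13, 12, 4
    elif group == 2:
        small, n_small, large, n_large = 20, 5, 50, 4
    else:
        raise ValueError("group must be 1 or 2")

    if position < 1:
        return None
    boundary = small * n_small  # last position covered by the small clusters
    if position <= boundary:
        return (position - 1) // small + 1
    if position <= boundary + large * n_large:
        return n_small + (position - boundary - 1) // large + 1
    return None
```

For example, an athlete ranked fifth who competes in a Grand Slam falls in cluster k = 2 of the first group, which covers positions 5 to 8.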

The dependent variable $y_{i,k^c}$ in the probit models is as follows: $y_{i,k^c}$ equals 1 if the $i$th element of $R_k^c$ is at most RE, and 0 otherwise. For example, if RE = 3, then $y_{i,k^c} = 1$ means that result $i \in R_k^c$ was either a first, second, or third place. We estimate for RE ∈ {3, 5, 7} and k ∈ {1, . . . , 17} the index function models¹¹

$$y^{*}_{i,k^1} = \alpha_{1k} + date_{idk} \cdot \beta_{1k} + male_{ik} \cdot \beta_{2k} + \sum_{w \in W \setminus \{-48\text{kg}, -60\text{kg}\}} weight_{iwk} \cdot \beta_{wk} + \sum_{s \in S_1 \setminus \{GP\}} tourn_{isk} \cdot \beta_{sk} + u_{ik}, \quad i \in R_k^1, \tag{9}$$

and for RE ∈ {3, 5, 7} and k ∈ {1, . . . , 9} we estimate the index function models

$$y^{*}_{i,k^2} = \alpha_{1k} + date_{idk} \cdot \beta_{1k} + male_{ik} \cdot \beta_{2k} + \sum_{w \in W \setminus \{-48\text{kg}, -60\text{kg}\}} weight_{iwk} \cdot \beta_{wk} + tourn_{i,WCJ,k} \cdot \beta_{WCJ,k} + u_{ik}, \quad i \in R_k^2. \tag{10}$$

Note that we do not estimate these models for RE ∈ {1, 2}, as the number of athletes who attained first or second place is too low for most clusters to produce precise estimates.

Using the estimates of the parameters in (9) and (10), we can compute probabilities of achieving a certain round by substituting the linear combination of the covariates and the parameter estimates into the standard normal distribution function. Suppose we want to estimate the probability that a female judoka in the weight class -52kg ranked fifth on the WRL reaches at least fifth place at any GS tournament in 2020. We would use the parameter estimates in (9) for RE = 5 and k = 2, with $date_{idk} = 1$, $male_{ik} = 0$, $weight_{iwk} = 1$ for w = -52kg and 0 otherwise, and $tourn_{isk} = 1$ for s = GS and 0 otherwise. The relevant probability estimate is then given by $\Phi(\hat\alpha_{1,2} + 1 \cdot \hat\beta_{1,2} + 0 \cdot \hat\beta_{2,2} + 1 \cdot \hat\beta_{-52\text{kg},2} + 1 \cdot \hat\beta_{GS,2})$.

¹⁰ A final result of “3”, “5” and “7” means that the athlete respectively won the final repechage, lost the final repechage or lost the semi-final repechage. See section 3.2.
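The substitution into the standard normal distribution function can be illustrated with a short Python sketch. The coefficient values below are invented purely for illustration and are not estimates from this thesis; the function name `probit_probability` and the dictionary keys are likewise hypothetical.

```python
from statistics import NormalDist


def probit_probability(intercept, coefficients, covariates):
    """Evaluate Phi(intercept + sum of coefficient * covariate)."""
    index = intercept + sum(
        coefficients[name] * value for name, value in covariates.items()
    )
    return NormalDist().cdf(index)  # standard normal distribution function


# Hypothetical parameter values for cluster k = 2 (illustration only).
alpha_hat = -0.50
beta_hat = {"date": 0.10, "male": -0.05, "weight_-52kg": 0.20, "tourn_GS": -0.30}

# Female judoka, -52kg, Grand Slam held after 1 January 2017.
x = {"date": 1, "male": 0, "weight_-52kg": 1, "tourn_GS": 1}

p = probit_probability(alpha_hat, beta_hat, x)
```

With these made-up values the index equals -0.5, so the sketch returns a probability of roughly 0.31.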

5.1.3 Drawbacks of probit models

Due to large variation in attendance levels between weight categories, a complication arises. That is, if we take all observations from all weight categories into account the parameter estimates in (9) and (10) will be biased. For example, the probability of attaining at least fifth place is ceteris paribus higher for weight categories with a lower mean number of participants. To mitigate this issue, we take only the results of players into account who reached at least the 32nd finals. Note that this means that the estimated probabilities of reaching a certain tournament round overestimate the true probabilities. For the higher clusters the deviation will likely be small, as the eight seeded athletes - those with the eight best positions on the WRL - generally do not have to play first-round matches. For example, in a tournament with 48 participants, the eight seeded athletes all do not have to compete in the first round. This implies that the probability that a seeded athlete reaches at least, say, fifth place is the same whether we include only athletes who reached the 32nd finals in this tournament or not. However, when the number of participants exceeds 56, certain seeded athletes do have to compete in the first round. In this case, there will be a difference in the winning probabilities. For relatively low-ranked athletes, the differences between the estimated and true probabilities will be even larger.

We also disregard draws with fewer than sixteen participants, as including these would greatly bias the estimated probabilities upwards. For example, in a tournament with 11 participants in total, there will be five participants who reach the quarter-finals without having to play a single match. These five participants have attained at least seventh place without competing.

Despite the above restrictions on the tournaments that are taken into account, the first round matches complicate things even more. This has been explained in section 3.3 already. The problem is most severe when comparing a draw with 17 players to a draw with 32 players. In both draws there will be eight players with final ranking 16. However, in the first draw only one player had to win a match to attain this, whereas in the second case all players needed to win a match. We thus make the assumption that for all draws where the number of participants lies between 16 and 32, the mean number of participants is equal for all weight categories.
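The participant counts used in the examples of this subsection can be checked with a small sketch. It assumes, as a simplification, that a draw is a single-elimination bracket padded to the smallest power of two that fits all participants, so that the number of athletes who skip the first round equals the bracket size minus the number of participants. The function name `first_round_byes` is hypothetical.

```python
def first_round_byes(n_participants):
    """Number of athletes who do not have to play a first-round match.

    Assumes a single-elimination draw whose size is the smallest power of
    two that is at least n_participants.
    """
    if n_participants < 2:
        raise ValueError("a draw needs at least two participants")
    bracket = 1
    while bracket < n_participants:
        bracket *= 2
    return bracket - n_participants
```

Under this assumption a draw of 11 yields five byes, a draw of 48 yields sixteen byes (enough for all eight seeded athletes), a draw of 57 yields only seven, and a draw of 17 yields fifteen, so that a single first-round match is played.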


5.2 Athlete performance tracking

This section consists of five parts. Firstly, we explain the standard form of the Elo rating system, which is commonly used to determine the relative skill levels of participants in various sports. Then we discuss extensions to the standard Elo model that have been proposed in the literature. Thirdly, we formulate specifications of the Elo system that we will analyze in the simulations, using extensions proposed in the literature. In the fourth subsection we discuss the Glicko model: an extension of the Elo model proposed in Glickman (1999), which improves over the Elo system by incorporating the uncertainty of the skill levels. Lastly, we formulate the specifications of the Glicko model that we analyze in the simulations.

5.2.1 General Elo rating system

The primary deficiency of using the number of world ranking points as an indicator of the strength of an athlete over time, is the fact that the strength of opponents is not taken into account. Clearly, wins against strong opponents should be valued higher than wins against weaker opponents. We thus need a model for measuring the performance level of athletes that depends on the strength of opponents.

We propose the use of the Elo model, formulated by Arpad Elo in the late 1950s and formalized in Elo (1978). Elo proposed this model for chess rankings; however, it can be used in any zero-sum game or sport, such as judo. Outside of chess, the method has been shown to predict match results and approximate the strength of players well in team sports such as soccer (Lasek, Szlávik and Bhulai, 2013) and basketball (Barrow, Drayer, Elliott, Gaut and Osting, 2013), but also in individual sports such as tennis (Kovalchik, 2016).

The general form of the Elo rating system is as follows. Consider a pool of n competitors indexed by i ∈ {1, . . . , n}. The strength of player i at the start of period t is summarized by a strength parameter $\theta_{it}$. The estimated probability of athlete i defeating athlete j in a single match in period t is then given by

$$\hat{F}(\hat\theta_{it}, \hat\theta_{jt}) = \frac{1}{1 + 10^{-(\hat\theta_{it} - \hat\theta_{jt})/400}}, \quad i, j \in \{1, \dots, n\},\ i \neq j,\ t \in \{1, 2, \dots\}, \tag{11}$$

where $\hat\theta_{it}$ and $\hat\theta_{jt}$ are estimators of $\theta_{it}$ and $\theta_{jt}$ respectively.¹²

Now we explain how one obtains the estimators $\hat\theta_{it}$, i ∈ {1, . . . , n}, t ∈ {1, 2, . . . }. In period t, player i plays $m_j$ matches against player j, j ∈ {1, . . . , n}, j ≠ i. Then the estimator of the strength parameter of player i after the end of period t, the so-called Elo rating, is given by

$$\hat\theta_{i,t+1} = \hat\theta_{it} + \sum_{j=1}^{n} \sum_{k=1}^{m_j} K \left[ b_{tjk} - \hat{F}(\hat\theta_{it}, \hat\theta_{jt}) \right], \quad i \in \{1, \dots, n\},\ t \in \{1, 2, \dots\}, \tag{12}$$

where $b_{tjk}$ equals 1 if player i won the k-th match against opponent j in time period t, and 0 otherwise. The constant K governs the variability of the strength parameters and is also known as the K-factor. Note that when player i wins a single match against player j, then the increase of the strength estimate of player i equals the decrease of the strength estimate of player j. This implies that the sum of Elo ratings over all players remains constant over time. When a new player enters the pool of competitors, he or she is either assigned some predetermined initial rating, or is assigned an initial rating based on earlier match results. Regarding when to update the ratings, there are multiple possibilities. One is to collect all results from a tournament and update the Elo ratings afterwards. Another option is to update the Elo ratings after a specific time interval, for example one month.

¹² Note that $\hat{F}(\cdot)$ is a logistic function with base 10 instead of e and growth rate parameter 400. These two parameters were arbitrarily chosen by Elo such that a difference of 400 between estimated strength

We present a small example to show how the Elo ratings are updated. Suppose at the start of period t = 1 athlete i has an Elo rating of $\hat\theta_{i,1} = 2000$. During period t = 1 she competes in one tournament, in which she wins in the eighth-finals, quarter-finals, and semi-finals, but loses the final. Her opponents are indexed 1, 2, 3 and 4 and have Elo ratings of 1800, 2050, 1900 and 1850 respectively. We now want to know player i’s Elo rating at the end of period t = 1. First, from (11) the estimated winning probabilities of athlete i are given by 0.760, 0.429, 0.640 and 0.703 for each match respectively. Using that $b_{1j1} = 1$ for j = 1, 2, 3 and $b_{1j1} = 0$ for j = 4, and using K = 32, we compute the Elo rating of athlete i in period t = 2 as $\hat\theta_{i,2} = 2000 + 32\,[(1 - 0.760) + (1 - 0.429) + (1 - 0.640) + (0 - 0.703)] = 2014.976$.

We see that the Elo rating of athlete i has increased. Note that the most Elo rating points are earned by defeating athlete 2, as athlete 2 has the highest Elo rating of all four opponents. If instead the four opponents had Elo ratings of 1800, 1800, 1800 and 1650 respectively, then we would obtain $\hat\theta_{i,2} = 1994.816$. Thus, even though athlete i won three matches and lost only one, athlete i’s Elo rating still dropped compared to period t = 1. The reason is her loss against the player with rating 1650, which is very unexpected due to player i’s much higher rating. Thus, athlete i loses many rating points due to this loss.
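The worked example can be reproduced with a few lines of Python. This is a minimal sketch of (11) and (12) with a constant K-factor; the function names `expected_score` and `elo_update` and the representation of matches as (opponent rating, result) pairs are illustrative choices. Because the code does not round the winning probabilities to three decimals, its output can differ from the hand computation in the second decimal.

```python
def expected_score(rating, opponent_rating):
    """Estimated probability of winning a single match, as in (11)."""
    return 1.0 / (1.0 + 10.0 ** (-(rating - opponent_rating) / 400.0))


def elo_update(rating, matches, k=32.0):
    """Rating after one period, as in (12).

    `matches` is a list of (opponent_rating, won) pairs, where won is 1 for
    a win and 0 for a loss. All opponent ratings are those held at the start
    of the period.
    """
    change = sum(k * (won - expected_score(rating, opp)) for opp, won in matches)
    return rating + change


# Three wins followed by a lost final, against the opponents of the example.
first = elo_update(2000, [(1800, 1), (2050, 1), (1900, 1), (1850, 0)])

# Alternative scenario: the final is lost against a much lower-rated opponent.
second = elo_update(2000, [(1800, 1), (1800, 1), (1800, 1), (1650, 0)])
```

Here `first` comes out at about 2014.98 and `second` at about 1994.83, which shows the rating gain in the first scenario and the loss in the second.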

5.2.2 Specification of Elo systems for judo

This section proposes various Elo systems for judo by using some extensions to the standard system. Note that our interest lies in investigating the implications of different Elo specifications. We do not attempt to design any form of an optimal system.

Many studies have proposed modifications to the basic Elo model to increase the fit of the Elo model to real-life match outcomes. The most significant is the choice of the K-factor, and extending the K-factor to depend on match characteristics and player history. The choice of the K-factor has a major impact on the dynamics of the Elo ratings, but is inherently sport-specific: it depends on the true rate at which players improve. Thus, in sports where one improves gradually, the K-factor cannot be too high, as otherwise a few recent results yield a very large increase in the Elo rating. More generally, if the value of K is relatively high, then the Elo rating depends predominantly on recent results and may thus vary considerably over time. A low K-factor may however lead to undervaluing recent results and thus the Elo ratings could lag significantly compared to the true strength of a player. Based on these observations, we decide to use both a relatively low and a relatively high K-factor in the simulations.


is for example incorporated in the Elo rating system of FIFA since 2018 (“2026 FIFA World Cup,” 2018), where the K-factor ranges from 20 (friendly matches) to 60 (World Cup and Olympic Games). With regard to judo, we similarly assign weighting factors to each tournament counting for the WRL. Lastly, we incorporate the use of different K-factors for different rounds in a tournament.

The two governing equations of the Elo system are (11) and (12). We decide to use (11) in the simulations without alterations, but use varying extensions of (12) to analyze deviations from the baseline simulations, where the K-factor is constant. To extend (12), we define

$r_{ijt}$ = the round of the tournament in which athlete i competes against athlete j in period t, $r_{ijt} \in R$ = {final, repechage-final, semi-final, repechage semi-final, quarter-final, eighth-final, 16th-final, 32nd-final, first round};¹³ (13)

$s_{it}$ = the type of the tournament in which athlete i competes in period t, $s_{it} \in S$. (14)

We update the ratings after each tournament. Thus, $\hat\theta_{i,0}$ is the initial Elo rating of athlete i and $\hat\theta_{it}$ is the Elo rating of athlete i between tournaments t and t + 1. The formula for the estimator of the rating of athlete i becomes

$$\hat\theta_{i,t+1} = \hat\theta_{it} + \sum_{j=1}^{n} K(r_{ijt}, s_{it}) \left[ b_{tj} - \hat{F}(\hat\theta_{it}, \hat\theta_{jt}) \right], \quad i, j \in \{1, \dots, n\},\ t \in \{1, 2, \dots\}, \tag{15}$$

where $b_{tj} = 1$ if player i won against player j in time period t and 0 otherwise. Note that compared to (12) we used $m_j = 1$, j ∈ {1, . . . , n}, since athletes cannot meet the same opponent twice in a tournament.

First we specify the initial Elo ratings. We base these ratings on the number of world ranking points of the athlete at t = 0, as this should be a reasonable approximation to the Elo ratings. If $p_i$, i ∈ {1, . . . , n}, denotes the number of world ranking points of athlete i at t = 0, then the initial ratings are given by

$$\hat\theta_{i0} = 1000 + \frac{1000}{\max\{p_1, \dots, p_n\}}\, p_i, \quad i \in \{1, \dots, n\}. \tag{16}$$

Thus, the initial Elo ratings range from 1000 to 2000. The athlete with the most world ranking points has an Elo rating of 2000, and athletes with zero world ranking points start with a rating of 1000. We can now list the various Elo systems that we will evaluate in the simulations. This comes down to specifying the functional form of K(rijt, sit) in (15).

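¹³ We acknowledge that basing the formula for the K-factor on the round of the tournament means that

• Cases 1.1 and 1.2: Use $K(r_{ijt}, s_{it}) = 35$ and $K(r_{ijt}, s_{it}) = 50$ respectively.

• Cases 2.1 and 2.2: Use respectively

$$K(r_{ijt}, s_{it}) = \begin{cases} 50 & \text{if } s_{it} = \text{WCS} \\ 50 & \text{if } s_{it} = \text{MA} \\ 40 & \text{if } s_{it} = \text{GS} \\ 40 & \text{if } s_{it} = \text{CC} \\ 35 & \text{if } s_{it} = \text{GP} \\ 30 & \text{if } s_{it} = \text{WCJ} \\ 25 & \text{if } s_{it} = \text{CO} \\ 60 & \text{if } s_{it} = \text{OG} \end{cases} \tag{17}$$

and

$$K(r_{ijt}, s_{it}) = \begin{cases} 40 & \text{if } s_{it} = \text{WCS} \\ 40 & \text{if } s_{it} = \text{MA} \\ 37 & \text{if } s_{it} = \text{GS} \\ 37 & \text{if } s_{it} = \text{CC} \\ 35 & \text{if } s_{it} = \text{GP} \\ 30 & \text{if } s_{it} = \text{WCJ} \\ 25 & \text{if } s_{it} = \text{CO} \\ 45 & \text{if } s_{it} = \text{OG} \end{cases} \tag{18}$$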

In these cases wins and losses at more prestigious tournaments lead to higher gains and losses of Elo rating points than wins and losses in less prestigious tournaments. This idea is inspired by the FIFA rating system explained above. Cases 2.1 and 2.2 differ in the sense that in the former case the weighting of results in different tournaments is more distinct than in the latter case.

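• Cases 3.1 and 3.2: Use respectively

$$K(r_{ijt}, s_{it}) = \begin{cases} 35 \cdot 2 & \text{if } r_{ijt} = \text{final} \\ 35 \cdot 1.5 & \text{if } r_{ijt} = \text{repechage-final} \\ 35 \cdot 1.75 & \text{if } r_{ijt} = \text{semi-final} \\ 35 \cdot 1.25 & \text{if } r_{ijt} = \text{repechage semi-final} \\ 35 \cdot 1 & \text{if } r_{ijt} = \text{quarter-final} \\ 35 \cdot 0.9 & \text{if } r_{ijt} = \text{eighth-final} \\ 35 \cdot 0.7 & \text{if } r_{ijt} = \text{16th-final} \\ 35 \cdot 0.7 & \text{if } r_{ijt} = \text{32nd-final} \\ 35 \cdot 0.7 & \text{if } r_{ijt} = \text{first round} \end{cases} \tag{19}$$

and

$$K(r_{ijt}, s_{it}) = \begin{cases} 35 \cdot 1.25 & \text{if } r_{ijt} = \text{final} \\ 35 \cdot 1.1 & \text{if } r_{ijt} = \text{repechage-final} \\ 35 \cdot 1.2 & \text{if } r_{ijt} = \text{semi-final} \\ 35 \cdot 1.05 & \text{if } r_{ijt} = \text{repechage semi-final} \\ 35 \cdot 1 & \text{if } r_{ijt} = \text{quarter-final} \\ 35 \cdot 0.95 & \text{if } r_{ijt} = \text{eighth-final} \\ 35 \cdot 0.9 & \text{if } r_{ijt} = \text{16th-final} \\ 35 \cdot 0.9 & \text{if } r_{ijt} = \text{32nd-final} \\ 35 \cdot 0.9 & \text{if } r_{ijt} = \text{first round} \end{cases} \tag{20}$$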

These two specifications of the Elo system value match results at distinct rounds in tournaments differently, in such a way that results in “later” tournament rounds lead to a higher increase (or decrease) of Elo rating points. This is motivated by the idea that the stakes of winning are higher in later rounds and thus the reward (or loss) should be higher.¹⁴ Cases 3.1 and 3.2 differ in the sense that the former case assigns more weight to results in “later” rounds and assigns much less weight to earlier tournament rounds. In the latter case, distinct rounds are weighted more evenly. Note that the factor of 35 in the definition of $K(r_{ijt}, s_{it})$ in (19) and (20) is based on case 1.1.
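The specifications of this subsection can be summarized in a short Python sketch covering the initial ratings in (16), the K-factors of cases 2.1 and 3.1, and the update in (15). It is a sketch under assumptions rather than the simulation code of this thesis: all names are hypothetical, matches are represented as (opponent rating, round, result) triples, and only the opponents actually met in the tournament enter the sum.

```python
# K-factor per event type, case 2.1, equation (17).
K_BY_EVENT = {"WCS": 50, "MA": 50, "GS": 40, "CC": 40,
              "GP": 35, "WCJ": 30, "CO": 25, "OG": 60}

# Multiplier of the base K-factor 35 per round, case 3.1, equation (19).
ROUND_MULTIPLIER = {"final": 2.0, "repechage-final": 1.5, "semi-final": 1.75,
                    "repechage semi-final": 1.25, "quarter-final": 1.0,
                    "eighth-final": 0.9, "16th-final": 0.7,
                    "32nd-final": 0.7, "first round": 0.7}


def initial_ratings(points):
    """Initial Elo ratings from world ranking points, equation (16)."""
    top = max(points)
    if top <= 0:
        raise ValueError("at least one athlete needs a positive number of points")
    return [1000.0 + 1000.0 * p / top for p in points]


def k_case_2_1(round_name, event_type):
    """K depends on the event type only."""
    return float(K_BY_EVENT[event_type])


def k_case_3_1(round_name, event_type):
    """K depends on the tournament round only."""
    return 35.0 * ROUND_MULTIPLIER[round_name]


def expected_score(rating, opponent_rating):
    """Estimated win probability, equation (11)."""
    return 1.0 / (1.0 + 10.0 ** (-(rating - opponent_rating) / 400.0))


def update_after_tournament(rating, matches, event_type, k_func):
    """Rating after one tournament, equation (15).

    `matches` is a list of (opponent_rating, round_name, won) triples.
    """
    return rating + sum(
        k_func(round_name, event_type) * (won - expected_score(rating, opp))
        for opp, round_name, won in matches
    )
```

As a usage example, an athlete rated 1500 who wins a Grand Slam final against an equally rated opponent gains 20 points under case 2.1 and 35 points under case 3.1, since the expected score is 0.5 in both cases.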

5.2.3 General Glicko model

The Elo model does not take uncertainty of the estimators $\hat\theta_{it}$ of $\theta_{it}$, i = 1, . . . , n, t ∈ {1, 2, . . . }, in (15) into account. Thus, the Elo model does not yield confidence intervals for the estimators $\hat\theta_{it}$. It is however desirable to know how certain we are that an estimated rating is close to its true value. Furthermore, the uncertainty of rating estimates should clearly matter when determining the increase or decrease in rating points due to respectively a win or a loss. For example, when one competes against an opponent who has not played a match in a ranking tournament for a long time, then the change in one’s number of rating points should be relatively small. This follows from the observation that the current rating of the opponent could be far off from his or her true rating, and hence one should not put too much weight on the result of this match. Also, if an opponent has competed recently, the effect of the outcome of this match on the rating should be larger, ceteris paribus. Taking uncertainty of the strength estimates into account is especially relevant in judo, as athletes play relatively few matches per year. Furthermore, using the uncertainty of the strength estimates may be useful when devising (semi-)annual schemes for the tournaments that an athlete will attend. In addition, the uncertainty estimates may be used when choosing which athlete will be selected for the OG in case multiple athletes satisfy the qualification requirements of the IJF.

¹⁴ One may also argue that wins in later rounds are more informative of an athlete’s true strength,

In order to explain how to incorporate uncertainty of the strength estimates, we shall first introduce a more general class of ranking models, of which the Elo model will turn out to be a special case.

The Elo model is a special case of a more general class of parametric pairwise comparison models. These models are meant to compare items in couples and yield a preference ordering between each of these couples. These models can be applied to the ordering of sports teams or individuals in order to give a ranking of the relative strengths of all teams or individuals, which is essentially our goal when we want to track the development of athletes.

Formally, these models assume that the true probability $\pi_{ijt}$ of athlete i defeating athlete j in period t is given by

$$\pi_{ijt} = F(\theta_{it} - \theta_{jt}), \quad i, j \in \{1, \dots, n\},\ i \neq j,\ t \in \{1, 2, \dots\}, \tag{21}$$

where F(·) is a symmetric distribution function, n is the total number of competitors and $\theta_{it}$ and $\theta_{jt}$ are the strength parameters of competitors i and j respectively. Glickman (1999) proposes a method to both derive estimators of $\theta_{it}$ in (21) when F(·) is the logistic distribution function, and to determine the standard errors of these estimates.¹⁵ We summarize the methodology of Glickman (1999) and the reader is directed to this paper for details. The model in Glickman (1999) has become known as the Glicko 1 model, which we call the Glicko model from this point on.¹⁶ The Glicko model casts the problem of pairwise comparison in a Bayesian framework. Coulom (2008) and Glickman (2016) show that the Glicko models yield superior match outcome predictions compared to the Elo model in respectively the game of Go and volleyball. We first discuss the main idea of the Glicko model.

The first step is to initialize the ratings. Assume time is discretized into periods of equal duration, indexed by t ∈ {1, 2, . . . }. Before competing, say at t = 0, the prior distribution of an athlete’s strength parameter θi0 takes the form

$$\theta_{i0} \mid \sigma_{i0}^2 \sim N(1500, \sigma_{i0}^2), \tag{22}$$

$$\pi(\sigma_{i0}^2) \propto 1, \tag{23}$$

for i ∈ {1, . . . , n}, where the notation $\pi(\sigma_{i0}^2) \propto 1$ means that $\sigma_{i0}^2$ has a uniform prior, i ∈ {1, . . . , n}. The initial variance parameter $\sigma_{i0}^2$ should be inferred from the data.¹⁷ Note that the mean of 1500 is arbitrarily chosen, similar to the initial rating of new players in the Elo system.

¹⁵ The choice of the logistic distribution function is known as the Bradley-Terry model (Bradley and Terry, 1952).

¹⁶ An extension of the Glicko model has been proposed in Glickman (2001), and this extension has become known as the Glicko 2 model.

¹⁷ Instead of using a uniform prior for $\sigma_{i0}^2$, one could make use of athlete characteristics such as age or training time and quality to obtain a more informative prior distribution. The procedure to optimize $\sigma^2$

At the start of some period $t$, the strength $\theta_{it}$ of player $i$ has distribution

$$\theta_{it} \mid \mu_{it}, \sigma_{it}^2 \sim N(\mu_{it}, \sigma_{it}^2). \tag{24}$$

Thus, the estimator of the strength parameter $\theta_{it}$ of player $i$ in period $t$ is normally distributed with mean $\mu_{it}$ and variance $\sigma_{it}^2$. To update the ratings from period to period, one makes the simplifying assumption that all matches are played at the start of the period. Now, given an estimate of the strength parameter of player $i$ in period $t$, one assumes that the strength parameter in period $t + 1$, denoted as $\theta_{i,t+1}$, has conditional distribution

$$\theta_{i,t+1} \mid \theta_{it}, \nu^2, t \sim N(\theta_{it}, \nu^2 t), \tag{25}$$

where $\nu^2$ denotes the increase in variance of the competitor's strength per time period $t$. A high value of $\nu$ means that the ability of athletes becomes uncertain after a relatively short amount of time. The value of $\nu$ should be inferred from the data.18 Note that the expected value of the strength estimator in period $t + 1$ equals the value of the strength estimator in period $t$. Glickman then integrates (25) with respect to (24), which yields

$$\theta_{i,t+1} \mid \mu_{i,t+1}, \sigma_{i,t+1}^2, \nu^2, t \sim N(\mu_{i,t+1}, \sigma_{i,t+1}^2 + \nu^2 t). \tag{26}$$

We indeed see that the variance of the strength estimator of player $i$ increases by $\nu^2 t$ after one period of time has elapsed. In fact, the increase is proportional to time.
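The variance inflation in (26) can be illustrated with a minimal Python sketch. The function name `inflate_variance` and the numerical values (a standard deviation of 50 and $\nu = 15$) are assumptions chosen for illustration only; they are not estimates from judo data.

```python
import math


def inflate_variance(sigma2, nu, periods_elapsed=1):
    """Variance of the strength estimator before new matches are processed.

    Implements sigma^2 + nu^2 * t as in (26), where t is the number of
    rating periods that have elapsed. The function name is hypothetical.
    """
    return sigma2 + nu ** 2 * periods_elapsed


# Illustrative values only: an athlete whose strength is known fairly well
# (standard deviation 50) and who stays inactive for up to four periods.
sigma2 = 50.0 ** 2
for t in range(5):
    sd = math.sqrt(inflate_variance(sigma2, nu=15.0, periods_elapsed=t))
    print(f"periods inactive: {t}, standard deviation: {sd:.1f}")
```

The standard deviation grows with the number of inactive periods, which reflects the statement that the uncertainty increase is proportional to time on the variance scale.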

For all players $i \in \{1, \dots, n\}$, (26) gives the posterior distribution of the player's strength at the end of period $t + 1$. We now require estimates of $\mu_{i,t+1}$ and $\sigma_{i,t+1}^2$ in order to update the strength estimators of the players. Glickman (1999) chooses not to determine all $n$ posterior distributions simultaneously, as the computations would be time-intensive.19 Instead, Glickman (1999) approximates the posterior distributions by integrating out the opponents' strength parameters over their prior distributions instead of over their posterior distributions. This is indeed less time-intensive, as all $n$ prior distributions are already known at the start of the period. Another advantage is that a set of closed-form solutions for the estimators of $\mu_{i,t+1}$ and $\sigma_{i,t+1}^2$ is obtained. A disadvantage is that some relevant information is lost.

Consider players $i, j, k \in \{1, \dots, n\}$ who all competed against each other once in some tournament in period $t$. Then, the result of the match between players $i$ and $j$ in period $t$ is not used to update the strength distribution of player $k$ at the end of period $t$, as only the prior distribution of the strength of players $i$ and $j$ is used to update the strength of player $k$.20

The posterior distribution for each $\theta_{i,t+1}$ is approximated by a normal distribution with mean $\mu_{i,t+1}$ and variance $\sigma_{i,t+1}^2$ and the interested reader finds the derivations in the appendix

18 See section 4 of Glickman (1999) for the optimization procedure.

19 At present this may however be computationally tractable.


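of Glickman (1999). The estimates of $\mu_{i,t+1}$ and $\sigma_{i,t+1}^2$ are given by

$$\mu_{i,t+1} = \mu_{it} + \frac{q}{1/\sigma_{it}^2 + 1/\delta_{i,t+1}^2} \sum_{j=1}^{n} \sum_{k=1}^{m_j} g(\sigma_{jt}^2) \left( s_{ijk} - E(s_{ijk} \mid \mu_{it}, \mu_{jt}, \sigma_{jt}^2) \right), \tag{27}$$

$$\sigma_{i,t+1}^2 = \left( \frac{1}{\sigma_{it}^2} + \frac{1}{\delta_{i,t+1}^2} \right)^{-1}, \tag{28}$$

where

$$q = \ln(10)/400, \tag{29}$$

$$g(\sigma_{jt}^2) = \frac{1}{\sqrt{1 + 3 q^2 \sigma_{jt}^2 / \pi^2}}, \tag{30}$$

$$E(s_{ijk} \mid \mu_{it}, \mu_{jt}, \sigma_{jt}^2) = \frac{1}{1 + 10^{-g(\sigma_{jt}^2)(\mu_{it} - \mu_{jt})/400}}, \tag{31}$$

$$\delta_{i,t+1}^2 = \left( q^2 \sum_{j=1}^{n} m_j \, g(\sigma_{jt}^2)^2 \, E(s_{ijk} \mid \mu_{it}, \mu_{jt}, \sigma_{jt}^2) \left[ 1 - E(s_{ijk} \mid \mu_{it}, \mu_{jt}, \sigma_{jt}^2) \right] \right)^{-1} \tag{32}$$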

for all $i \in \{1, \dots, n\}$. Note that $m_j$ is defined in (12). To interpret these updating equations, let us focus on player $i$ whose strength estimator has variance $\sigma_{it}^2$. As the value of $\sigma_{jt}^2$ increases, the value of $g(\sigma_{jt}^2)$ decreases. Consequently, the marginal weight of a match of player $i$ against opponent $j$ is reduced, as we note from (27). Thus, playing against opponents with uncertain strength estimators has relatively little impact on the distribution of one's own strength estimator, ceteris paribus. Furthermore, what happens if one's own strength estimate has high variance, i.e., if $\sigma_{it}^2$ is large? Then, the factor

$$\tilde{K}_i = \frac{q}{1/\sigma_{it}^2 + 1/\delta_{i,t+1}^2} \tag{33}$$

in (27) is relatively large. We thus see that when one's strength estimator has high variance, outcomes of games have a relatively large effect on one's own strength distribution. Conversely, if your strength is already precisely estimated, new games will have less effect on your strength estimate. In Elo terminology, one could say that active and inactive athletes have a low and a high K-factor, respectively.
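The updating equations (27) to (33) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the function names (`g`, `expected_score`, `glicko_update`) and the flat list of match results are hypothetical, scores are coded as 1 for a win and 0 for a loss, and the variances passed in are assumed to already include any inflation from (26). The numbers in the usage example are purely illustrative.

```python
import math

Q = math.log(10) / 400  # q in (29)


def g(sigma2_j):
    """Down-weighting factor (30) for an opponent with variance sigma2_j."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * sigma2_j / math.pi ** 2)


def expected_score(mu_i, mu_j, sigma2_j):
    """Expected score (31) of player i against opponent j."""
    return 1.0 / (1.0 + 10 ** (-g(sigma2_j) * (mu_i - mu_j) / 400))


def glicko_update(mu_i, sigma2_i, results):
    """Return updated (mean, variance) following (27) and (28).

    results is a list of (mu_j, sigma2_j, s) tuples, one per match,
    where s is 1 for a win and 0 for a loss (hypothetical data layout).
    """
    if not results:
        return mu_i, sigma2_i  # no matches: nothing to update
    information = 0.0  # the sum inside (32)
    surprise = 0.0     # the double sum in (27)
    for mu_j, sigma2_j, s in results:
        e = expected_score(mu_i, mu_j, sigma2_j)
        information += g(sigma2_j) ** 2 * e * (1.0 - e)
        surprise += g(sigma2_j) * (s - e)
    delta2 = 1.0 / (Q ** 2 * information)           # (32)
    precision = 1.0 / sigma2_i + 1.0 / delta2
    k_factor = Q / precision                        # (33)
    return mu_i + k_factor * surprise, 1.0 / precision


# Illustrative values: a player with mean 1500 and standard deviation 200
# beats a 1400-rated opponent and loses to opponents rated 1550 and 1700.
matches = [(1400.0, 30.0 ** 2, 1), (1550.0, 100.0 ** 2, 0), (1700.0, 300.0 ** 2, 0)]
new_mu, new_sigma2 = glicko_update(1500.0, 200.0 ** 2, matches)
print(round(new_mu, 1), round(math.sqrt(new_sigma2), 1))
```

In this example the mean drops to roughly 1464 and the standard deviation shrinks to roughly 151, so the rating moves noticeably while the uncertainty decreases. Re-running the update with a smaller own variance produces a smaller rating change, in line with the interpretation of the factor in (33).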

The Elo system turns out to be a special case of the Glicko updating scheme. In the Elo system we implicitly assume $\sigma_{it}^2 = 0$ for all $t \in \{1, 2, \dots\}$ and $i \in \{1, \dots, n\}$, namely the estimators of the strength parameters of all players are known without error. One easily verifies that the expression of $\mu_{i,t+1}$ in (27) then reduces to that of $\hat{\theta}_{i,t+1}$ in (11), with the K-factor given by (33).


strength estimates are deemed less important to update your own rating. Secondly, it takes one’s own strength estimate into account. Namely, when your strength estimate is uncertain, outcomes of new games have a relatively large effect on your rating, and vice versa.

There are some disadvantages, however. Firstly, the Glicko model has $n + 1$ parameters that have to be inferred from the data, namely $\sigma_{i0}^2$, $i \in \{1, \dots, n\}$, and $\nu$. Glickman (1999) describes a method to estimate these parameters from the actual data. Secondly, one needs to choose the length of a rating period $t$, which involves a typical bias-variance trade-off: using long rating periods means that players' strengths may have changed significantly within periods, whereas short rating periods lead to reduced accuracy of the estimates due to the approximations used in the Glicko algorithm. Glickman (1999) notes that at least five matches per player on average should be sufficient for the algorithm to produce adequate estimates. Since judo tournaments all follow a knock-out system, this may mean that rating periods should be relatively long. Otherwise, athletes who often lose in the first or second round (mostly low-rated players) will have played too few matches per rating period for the Glicko model to yield precise strength estimates. However, this problem also arises in the Elo rating system to some extent: when one has played few matches, the Elo rating will generally be imprecise. A final drawback of the Glicko rating system is that there is no obvious way to give more weight to wins in more prestigious tournaments or wins in later rounds of tournaments. One could alter (27) so as to give more weight to these types of matches, as we did when specifying the different forms of the Elo system in section 5.2.2. However, these parameters should probably depend on the uncertainty of a player's strength given in (28). This strategy may therefore not be useful. One may instead decide to treat each match equally, thereby giving each round in each type of tournament the same weight.
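The effect of the rating period length on the number of matches per player can be checked with a short Python sketch. The function name `mean_matches_per_player`, the data layout (day, athlete, athlete) and the toy tournaments are all assumptions for illustration; the example uses a plain eight-athlete single-elimination bracket and ignores any repechage rounds.

```python
from collections import Counter


def mean_matches_per_player(matches, period_days):
    """Average number of matches per active player in each rating period.

    matches is a list of (day, athlete_a, athlete_b) tuples, with day
    counted from the start of the observation window (hypothetical layout).
    """
    counts = {}
    for day, a, b in matches:
        period = day // period_days
        per_player = counts.setdefault(period, Counter())
        per_player[a] += 1
        per_player[b] += 1
    return {p: sum(c.values()) / len(c) for p, c in sorted(counts.items())}


def knockout(day):
    """Toy eight-athlete bracket in which the lower-numbered athlete wins."""
    quarter = [(day, 1, 2), (day, 3, 4), (day, 5, 6), (day, 7, 8)]
    semi = [(day, 1, 3), (day, 5, 7)]
    final = [(day, 1, 5)]
    return quarter + semi + final


# Two toy tournaments, on day 10 and day 60.
matches = knockout(10) + knockout(60)
print(mean_matches_per_player(matches, period_days=45))
print(mean_matches_per_player(matches, period_days=90))
```

With 45-day periods each tournament falls in its own period and the average is 1.75 matches per athlete; with 90-day periods both tournaments share one period and the average doubles to 3.5. Longer periods therefore move the average closer to the five matches mentioned above, at the cost of assuming constant strength over a longer time span.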

5.2.4 Specification of Glicko system for judo

As described in the previous subsection, the parameters $\sigma_{it}^2$, $i \in \{1, \dots, n\}$, and $\nu$ need to be inferred from actual match results in order for the Glicko model to yield reliable strength estimates. Since such data is not available, we choose a set of parameter values as a reference case, and vary each specific parameter to observe the effect on the ratings distribution. This is the same approach as for the Elo models. The following specifications of the Glicko model should not be viewed as a realistic choice to be applied to real-life judo data, but rather as an example to show and analyze the output of the Glicko model. In all cases, the initial ratings are as in (16).

• Case 1: $\sigma_{i0}^2 = 300$, $\nu = 15$ and the length of a time period is 45 days. This will be the reference case. Note that $\sigma_{i0}^2 = 300$ means that the estimators of the initial strengths of the athletes are very imprecise.

• Case 2: $\sigma_{i0}^2 = 100$, $\nu = 15$ and the length of a time period is 45 days. In this case the estimates of the initial strengths are less imprecise.

• Case 3: $\sigma_{i0}^2 = 300$, $\nu = 30$ and the length of a time period is 45 days. In this case the
