University of Groningen Faculty of Economics & Business

Fouls in Dutch soccer: A Poisson point process

Jorik Harbers (S2978245)
Supervisor: Prof. dr. R.H. Koning
Second assessor: Prof. dr. R.J.M. Alessie
January 6, 2021

Abstract


Contents

1 Introduction
2 Literature
   2.1 Unintentional foul
   2.2 Professional foul
   2.3 Hostile foul
   2.4 Referee
   2.5 Home advantage
   2.6 Morals of a foul
   2.7 Goal scoring
   2.8 Poisson point processes
3 Methodology
   3.1 Estimation process
   3.2 Poisson point process
   3.3 Homogeneous process
   3.4 Non-homogeneous process
   3.5 Time difference process
   3.6 Self-exciting process
   3.7 Likelihood ratio test
4 Data
   4.1 Team level
   4.2 Home vs away
   4.3 Match level
5 Results
   5.1 Homogeneous Poisson process
   5.2 Inhomogeneous Poisson process
   5.3 Time difference Poisson process
   5.4 Self-exciting Poisson process
   5.5 Fouls within a match
6 Conclusion
7 Discussion
A Derivation log-likelihood time difference process


1 Introduction

Soccer, one of the most popular and largest sports across Europe, is generally seen as a major entertainment industry. There are several professional competitions worldwide, with large fan bases following the league and supporting their favourite teams. These fan bases contribute to large turnovers and profits made by teams. The most common ways to make a profit are selling players, attaining a high rank in the competition and attracting sponsors. A higher rank yields more fans, more media coverage and could increase the willingness of sponsors to pay. Such a ranking is attained by gaining points, either by a draw or a win. The most important factor determining the result of a game is the number of goals scored and conceded by a team. However, there are more underlying factors for a team winning a game. These could be home advantage (Pollard & Pollard (2005)), playing tactics (Lago-Ballesteros et al. (2012)) or just plain luck (Yue et al. (2014)). One of the lesser-known factors determining the result of a match, however, is a foul.

Fouls occur widely during a match and in many different ways. Examples are a deliberate handball, holding an opponent, a flying tackle and many more. All have in common that the rules set by the organiser of the competition before the start of the season are broken. Whenever a foul is committed, several punishments are possible.

The most common punishment for fouls is a free-kick. In this way, the disadvantaged team keeps possession of the ball and can continue or start a new attempt to score a goal. Besides this free-kick, the referee can choose to give a yellow or a red card. A yellow card is a severe warning: the foul was not harsh enough for a red card, but the player has to be cautious, as another harsh foul will result in a second yellow card. A player receiving two yellow cards receives a red card and is excluded from the game without being allowed to be replaced. In addition, the player is not allowed to play in the next game. A red card can also be given directly, for the more extreme fouls, with the same effect of the player being sent off the pitch. The last punishment is the penalty kick, which is awarded when a foul by the opponent ends an attack inside the box. It yields a free shot at the goal with only the goalkeeper to beat, which has a high chance of resulting in a goal.

All of these different fouls can be classified into roughly three categories, using a similar approach as Gümüşdağ et al. (2011). The first category, the unintentional foul, is a foul occurring due to circumstances on the pitch. For instance, a player losing his balance and falling into an opponent, or an accidental handball. The player has no intention of making the foul; however, the opposing team is disadvantaged and cannot continue its action.

The second type is classified as the intentional instrumental foul. This is more commonly known as the strategic foul, an intentional foul made by a player to end a possibly dangerous counter-attack. According to Carmichael et al. (2000), these fouls occur increasingly often in soccer. They are made intentionally but are not meant to harm or injure the opponent.

The last type of foul is the intentional hostile foul. This foul is made intentionally and aims to harm the opponent. Reasons for such fouls can be payback or revenge after a harsh foul, or irritation and disappointment when a match is not going according to plan.


a team within a match. Is there a difference in the number of fouls made within parts of a match? For instance, is there a difference in the number of fouls made in the 30th minute compared to the 75th? What happens in the time following a foul: is a foul more likely to occur shortly after another foul, or is there some time without fouls? And lastly, does the number of fouls made before a certain point in the game give an insight into how many fouls will follow in the remainder of the game? In addition, it might be of interest whether the answers to these questions differ for teams playing at home or away, and how they relate at match level, the combination of both teams. This match setting could be of interest for the attractiveness of the game, possibly increasing the number of viewers at home or in the stadium.

This will be investigated with multiple models in the setting of a Poisson point process. First a constant rate is used to model the expected number of fouls within a match. After that, a model assuming an increasing number of fouls over time will be fitted, followed by a model taking into account the time that has passed since the last foul. Finally, the difference in time between the current time and all previous fouls is investigated, to see whether the expected number of fouls increases when more fouls are made, possibly giving more weight to recent fouls compared to fouls made earlier in the match.

These methods using the most recent events are quite common in modelling earthquakes and are emerging in the field of finance. For instance, one commonly cited paper in the field of earthquakes is Ogata (1988). He investigates the timing and magnitude of earthquakes and the influence of aftershocks and seismologic activity in predicting the next major earthquake. Similarly, in finance, the method has implications for estimating value-at-risk, as Chaves-Demoulin et al. (2005) show. They model a peaks-over-threshold approach for daily percentage returns of indices, estimating when a certain threshold is crossed and by how much. The main point of this process lies in tail estimation, and it emphasises the influence of recent events in comparison to more distant ones.

This paper will be structured as follows. First, a brief literature review of relevant factors will be given, followed by the different methods used in this paper. Section 4 will describe the data at the different investigated levels. In Section 5 the results will be discussed. The paper ends by summarising the conclusions and provides a brief discussion with suggestions for follow-up research.

2 Literature

Currently, there exists a gap in the literature regarding the frequency of fouls within game dynamics and possible relations between them. A lot of soccer-related research has focused on the frequency of goals being scored and the presence of home advantage during matches. Furthermore, research has been done in the medical field, concerning the presence of injuries and the influence of endurance within a match. Despite the lack of literature regarding the actual frequency of fouls, there are related topics that could partly explain why fouls are made, what drives them and why they occur often during a match.


discussed, and finally one of our specific models will be discussed briefly.

2.1 Unintentional foul

The unintentional foul is a foul made by accident or through some sort of clumsiness. There is no real intention to make a foul or harm the opponent; it just happens during the match. For instance, Rampini et al. (2011) and Carling & Dupont (2011) both mention that the sprinting speed of players decreases during the game and that players cover less distance as the game progresses. This could result in players being slightly late for an interception, missing the ball and making a foul. These fouls are unintentional; nevertheless, they occur. Other reasons for this type of foul could be players who are too eager, misjudge a pass or mistime their tackle.

2.2 Professional foul

The professional foul can be seen as a foul made on purpose, but it is not meant to harm the opponent or cause injuries. It is a decision made by a player to end an attack. According to Carmichael et al. (2000) the professional foul has become part of a defender's repertoire. There are benefits to such fouls that seem to outweigh the punishment, making this particular foul beneficial for the team. Similarly, Gümüşdağ et al. (2011) mention that players seem to find a balance between the punishment and the profit of a foul, for instance by avoiding a goal with a foul near the end of a game. An example of this is the red card conceded by Luis Suarez during the quarter-final of the world championship against Ghana. The player handled the ball on the goal line, preventing a goal from being scored. Ghana was awarded a penalty kick and Suarez a red card. The resulting penalty was missed and Suarez's team made it to the next round. The penalty and the red card were accepted by the player in the hope of giving his team a chance to win the game and proceed. This is an extreme example of a professional foul. However, there are more examples, for instance a player not keeping up with the speed of an attacker and pulling the attacker back, or making him lose his balance.

2.3 Hostile foul

These intentional harsh fouls are not discussed intensively in the current literature. However, Gümüşdağ et al. (2011) mention that these are mostly emotional fouls: things not going according to plan trigger a reaction based on disappointment, frustration or payback. An example of an aggressive foul is the well-known foul by Zinedine Zidane in the world championship final in 2006. He headbutted opponent Marco Materazzi and received a red card. Other, more general examples of such fouls are intentional elbows, kicking out at an opponent after the ball has gone, or even biting.

2.4 Referee

After mentioning the different types of fouls and possible reasons for them, they still have to be classified as fouls during the game. The referee decides whether something is a foul and leads the match. A referee bases his decision on what he sees and judges whether an action can be classified as a foul. As this classification is based on a judgement call, there are differences between referees: some referees allow players more physical contact than others in the same situation. Depending on their style, the referee can have a large impact on the game.


(2002) conclude that referees were less certain about their decisions regarding home teams. This could lead to referee bias, for instance not whistling for a foul by the home team, or being unintentionally more willing to award fouls against the away team. However, it has not been investigated whether this results in an actual advantage for teams playing at home in terms of the frequency of fouls.

2.5 Home advantage

Regarding home advantage, Pollard & Pollard (2005) give a review of home advantage in different competitions and studies. Their conclusion is that home advantage is present at all levels of soccer. They state that it is probably due to the home crowd in the stadium, which in turn can result in the previously mentioned referee bias and possibly increases the home team's chances to win. Other reasons for home advantage could be familiarity with the location: knowing which part of the pitch is flat for clean passing, or making better use of the spaces on the field, as field sizes are allowed to differ. Similarly, the territorial drive of the players can have an influence as well, with players being more reluctant to lose during a game at home. Furthermore, some teams have different playing tactics at home: teams usually play more offensively at home than away, which could result in more goals being scored and could alter the result of the match. This difference in playing tactics could also be of interest to our research, as different playing styles could affect the fouls made by a team. In conclusion, Pollard & Pollard (2005) do conclude that home advantage is present, based on multiple studies; however, the reasons are not entirely clear.

2.6 Morals of a foul

Given that fouls are common practice, some publications treat fouls from a moral point of view. For instance, Moore (2017) elaborates on whether it is ethically acceptable to make a professional foul, and lists reasons both for and against its acceptability.

Differently, Traclet et al. (2011) have classified fouls and the players' reactions to these fouls as moral disengagement. They investigated what justifies these fouls from a player's point of view: in some cases the referee was blamed for the foul, in others the coaches, among several more reasons why players did not take responsibility for the foul made. Both publications offer a more moral point of view on fouls and give some more insight into what drives players to make a foul.

2.7 Goal scoring


2.8 Poisson point processes

This paper aims to estimate the different models using a Poisson point process, with a particular interest in the self-exciting processes. For our sample, these self-exciting processes turn out to be similar to a so-called Hawkes process. According to Hawkes (2018), the usage of this process was small in the last 25 years with an exception among seismologists. However, in recent years the use in finance and social network studies has been increasing fast.

The most commonly known paper in the seismologic field is Ogata (1988), investigating spatial influences in an earthquake process and using a specific variant of this model for modelling earthquakes. If an earthquake or shock is captured early, the model can predict the location and magnitude of a possible new large shock in the period ahead. Furthermore, Ogata performed multiple studies concerning the Hawkes process; Ogata (1978) investigates stationarity of the model and imposes certain restrictions on earthquake-related Poisson processes.

In finance the contribution has been growing over the years. One of the first papers using a similar approach is Chaves-Demoulin et al. (2005). Using a self-exciting process they model the timing of exceedances, with recent events affecting the current intensity more than distant ones. They conclude that their model yields a reasonable description of the behaviour of returns. Another paper involving a self-exciting process is Bacry et al. (2015), giving an overview of Hawkes process usage in finance and the underlying mathematical theory. They conclude that the Hawkes process allows a precise characterisation of the relations between different events and accounts for causal relations, while all of these relations can be modelled within a rather simple framework.

Considering fouls and the motivation behind them, there are large differences. We are not particularly interested in why a foul is made or whether it is justifiable to make the foul. The three categories mentioned distinguish between motivations for the foul; the point of interest of this paper, however, lies in the timing of fouls and the influence of previous fouls.

3 Methodology


3.1 Estimation process

All parameter estimates are based on maximum likelihood estimation. This implies that for each used specification of λ(t) a log-likelihood function has to be specified. Thereafter this log-likelihood has to be maximised with respect to the parameters to find the corresponding parameter estimates. Different methods can be used to maximise the log-likelihood function. For instance, following Azzalini (1996), we can use numerical differentiation and set the partial derivatives of the log-likelihood function equal to zero, or alternatively evaluate multiple parameter values and search for the parameters where the maximum value is obtained. Since a log-likelihood function can have multiple local maxima, in the latter option multiple starting values have to be used to find the global maximum. After comparing these function values, an overall optimal function value can be found with the corresponding parameters. These parameters equal our maximum likelihood estimates if a second-order condition is fulfilled: the Hessian should be negative definite. The Hessian is the matrix of second-order derivatives of the log-likelihood function and can be evaluated at points of interest.

This Hessian can be used to obtain the standard errors of our parameter estimates. According to McNeil et al. (2015), one of the properties of the maximum likelihood estimator is that
$$\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{d} N\!\left(0,\ I(\theta)^{-1}\right),$$
where $I(\theta)$ denotes the expected Fisher information matrix, defined by
$$I(\theta) = -E\!\left[\frac{\partial^2}{\partial\theta\,\partial\theta'} L(\theta; X)\right].$$
Due to the asymptotic properties, we have that
$$\hat{\theta} \sim N\!\left(\theta,\ \tfrac{1}{n} I(\theta)^{-1}\right).$$
Instead of using the Fisher information $I(\theta)$, we can use the Hessian to approximate it when the sample size is large. Using this, the standard error of an individual parameter $j$ of $\theta$ equals
$$se(\hat{\theta}_j) = \sqrt{\tfrac{1}{n}\, I(\theta)^{-1}_{jj}}.$$
Therefore the standard errors can be approximated from the Hessian after these transformations. The Hessian needs to be evaluated at the optimal parameter values. It can either be found analytically by differentiating the log-likelihood function twice and filling in the estimates, or, for more complex functions, it can be approximated numerically using software. The standard errors then equal the square roots of the diagonal elements of the inverse of the (negative) approximated Hessian.
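As an illustration of this routine, a minimal sketch is given below: it maximises an arbitrary log-likelihood numerically and reads standard errors off a numerical inverse Hessian. The helper name fit_mle and the use of SciPy's BFGS inverse-Hessian approximation are choices made here for illustration; the thesis does not specify its implementation.

```python
import numpy as np
from scipy.optimize import minimize

def fit_mle(neg_log_lik, starting_values, data):
    """Maximise a log-likelihood by minimising its negative from several starting values.

    neg_log_lik(theta, data) must return -L(theta; data).
    Returns the parameter estimates and Hessian-based standard errors.
    """
    best = None
    for theta0 in starting_values:          # several starts guard against local maxima
        res = minimize(neg_log_lik, theta0, args=(data,), method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    # BFGS keeps an approximation of the inverse Hessian of the *negative*
    # log-likelihood; its diagonal approximates the variances of the estimates.
    standard_errors = np.sqrt(np.diag(best.hess_inv))
    return best.x, standard_errors
```

For the homogeneous model below a closed-form solution exists, but the inhomogeneous, time-difference and self-exciting models have to be fitted numerically in this way.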

3.2 Poisson point process


The statistical properties of a point process can be defined by a set of non-negative integer-valued random variables $N(A)$, for $A \subset \mathcal{A}$, where $N(A)$ is the number of points in set $A$. These can be used to specify the probability distribution in a consistent way for each $N(A)$. One of the main features of the point process is
$$\Lambda(A) = E(N(A)),$$
which provides the expected number of points in any subset $A \subset \mathcal{A}$ and is the intensity measure of the process. For a one-dimensional process the common form is a Poisson process, either a homogeneous or an inhomogeneous Poisson process or another form of Poisson process. All of these make use of an intensity rate $\lambda(t)$ that has to be specified beforehand.

For a Poisson process it must hold that:
• for all $A = [t_1, t_2] \subset \mathcal{A}$, $N(A) \sim \mathrm{Poi}(\Lambda(A))$, where $\Lambda(A) = \int_{t_1}^{t_2} \lambda(t)\,dt$;
• for all non-overlapping subsets $A$ and $B$ of $\mathcal{A}$, $N(A)$ and $N(B)$ are independent random variables.

Therefore, one of the intrinsic properties of a Poisson process is that points occur independently of one another. The occurrence of an event at one point in time has no direct or causal influence on events happening at any other moment in time, and the process is memoryless regarding direct relations to previous events. Differences in the number of occurrences are possible due to variations in the rate $\lambda(t)$ and are not due to the presence or absence of events nearby.

Application of this model requires a set of $N$ observed points, occurring at random times $T_1, \ldots, T_N$, which are the realisations of the Poisson process on $\mathcal{A}$ with rate $\lambda(\cdot\,;\theta)$ for some value of the parameters $\theta$. For the likelihood we assume that events have occurred at times $T_1, \ldots, T_N$ and cannot occur at other times in $\mathcal{A}$. Let $I_i = [T_i, T_i + \delta_i]$, for $i = 1, \ldots, N$, be a set of small intervals around the observed occurrences, and let $I = \mathcal{A} \setminus \cup_{i=1}^{N} I_i$. Then by the Poisson property it holds that
$$Pr(N(I_i) = 1) = \exp(-\Lambda(I_i; \theta))\,\Lambda(I_i; \theta) \approx \lambda(T_i)\,\delta_i, \qquad \text{where} \quad \Lambda(I_i; \theta) = \int_{T_i}^{T_i + \delta_i} \lambda(u)\,du \approx \lambda(T_i)\,\delta_i,$$
and where we have used that $\exp(-\lambda(T_i)\delta_i) \approx 1$ for small $\delta_i$. Furthermore, $Pr(N(I) = 0) = \exp(-\Lambda(I; \theta)) \approx \exp(-\Lambda(\mathcal{A}; \theta))$.


Using these probabilities we can construct the likelihood function as
$$L(T_1, \ldots, T_N; \theta) = Pr(N(I) = 0, N(I_1) = 1, \ldots, N(I_N) = 1) = Pr(N(I) = 0) \prod_{i=1}^{N} Pr(N(I_i) = 1) \approx \exp(-\Lambda(\mathcal{A}; \theta)) \prod_{i=1}^{N} \lambda(T_i; \theta)\,\delta_i.$$

This expression can be turned into a density by dividing by the $\delta_i$, which leads to
$$L(T_1, \ldots, T_N; \theta) = \exp(-\Lambda(\mathcal{A}; \theta)) \prod_{i=1}^{N} \lambda(T_i; \theta), \qquad \text{where} \quad \Lambda(\mathcal{A}; \theta) = \int_{\mathcal{A}} \lambda(t; \theta)\,dt.$$

Alternatively to the likelihood function, the log-likelihood function can be constructed. This requires a monotonic transformation of the likelihood function by means of the natural logarithm, and it can be used more easily with regard to our modelling approach:
$$\mathcal{L}(T_1, \ldots, T_N; \theta) = \log\!\left(L(T_1, \ldots, T_N; \theta)\right) = \log\!\left(\exp(-\Lambda(\mathcal{A}; \theta)) \prod_{i=1}^{N} \lambda(T_i; \theta)\right) = -\Lambda(\mathcal{A}; \theta) + \sum_{i=1}^{N} \log(\lambda(T_i; \theta)). \tag{1}$$

This equation for $\mathcal{L}(T_1, \ldots, T_N; \theta)$ holds for a general Poisson point process and can be used for performing maximum likelihood estimation once a specific form of $\lambda(t)$ is known. This form has to be filled in and can then be used to find our maximum likelihood estimates. For $\lambda(t)$ different forms can be chosen, depending on the sort of process. Furthermore, once this general form of $\lambda(t)$ and our estimation interval are known, we can derive the actual expression for $\Lambda(\mathcal{A}; \theta)$ and fill it into the expression for the log-likelihood.

In the remainder of this paper, for a single observation $z$, $N_z$ events occur at times $T_1, \ldots, T_{N_z}$ within the $M$ minutes of our investigated sample, and we first calculate the individual log-likelihood $\mathcal{L}_z(T_1, \ldots, T_{N_z}; \theta)$. For the log-likelihood of multiple observations, the times of the events will be denoted as data, due to the different $T_i$ for each sample $z$.

3.3 Homogeneous process

The most simple version of a Poisson point process is the homogeneous Poisson process. This process assumes one constant parameter $\lambda > 0$ for $\lambda(t)$ over the whole time span. The log-likelihood can be obtained by filling $\lambda$ into equation (1) and calculating $\Lambda(\mathcal{A}; \theta)$. Doing this yields
$$\Lambda(\mathcal{A}; \theta) = \int_{0}^{M} \lambda\,ds = M\lambda.$$

Now filling this expression for $\Lambda(\mathcal{A}; \theta)$ and the expression for $\lambda(t)$ into our log-likelihood we have that
$$\mathcal{L}_z(T_1, \ldots, T_{N_z}; \theta) = -\Lambda(\mathcal{A}; \theta) + \sum_{i=1}^{N_z} \log(\lambda) = -M\lambda + N_z \log(\lambda).$$

Now for the full log-likelihood for a total number of matches within a sample of size $Z$, we need to add the log-likelihoods of the separate matches:
$$\mathcal{L}(\text{data}; \theta) = \sum_{z=1}^{Z} \mathcal{L}_z(T_1, \ldots, T_{N_z}; \theta) = \sum_{z=1}^{Z} \left(-M\lambda + N_z \log(\lambda)\right) = -Z M \lambda + \sum_{z=1}^{Z} N_z \log(\lambda).$$

To obtain our maximum likelihood estimator we differentiate with respect to $\lambda$ and equate this to zero:
$$-MZ + \frac{1}{\lambda}\sum_{z=1}^{Z} N_z = 0 \quad \Longrightarrow \quad \frac{1}{\lambda}\sum_{z=1}^{Z} N_z = MZ \quad \Longrightarrow \quad \hat{\lambda} = \frac{\sum_{z=1}^{Z} N_z}{M Z}.$$

To obtain the standard error we take the second-order derivative:
$$\mathrm{Hessian}(\text{data}; \theta) = \frac{\partial^2 \mathcal{L}(\text{data}; \theta)}{\partial \lambda^2} = \frac{\partial}{\partial \lambda}\!\left(-MZ + \frac{1}{\lambda}\sum_{z=1}^{Z} N_z\right) = -\frac{\sum_{z=1}^{Z} N_z}{\lambda^2}.$$

As this is the Hessian of a single-parameter model, it holds that
$$Var(\hat{\lambda}) = \frac{-1}{-\sum_{z=1}^{Z} N_z / \hat{\lambda}^2} = \frac{\hat{\lambda}^2}{\sum_{z=1}^{Z} N_z}.$$

Therefore we can find our maximum likelihood estimate for $\lambda$ by dividing the total number of events by the total time spanned, and the standard error of $\hat{\lambda}$ equals the square root of $\hat{\lambda}^2$ divided by the total number of events.
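In code, the homogeneous estimator and its standard error reduce to two lines; a sketch, where foul_counts is assumed to hold the counts N_z of whichever sample is being estimated:

```python
import numpy as np

def homogeneous_mle(foul_counts, M=90):
    """Closed-form MLE and standard error for the homogeneous Poisson process."""
    N = np.asarray(foul_counts)
    Z = len(N)
    lam_hat = N.sum() / (M * Z)              # lambda_hat = sum_z N_z / (M * Z)
    se = np.sqrt(lam_hat**2 / N.sum())       # se = sqrt(lambda_hat^2 / sum_z N_z)
    return lam_hat, se
```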


3.4 Non-homogeneous process

The previous model assumed a constant rate $\lambda(t)$. However, this rate does not have to be constant, as there might be differences in the number of events over time. Therefore, in this case, $\lambda(t)$ is allowed to change over time and another specification for $\lambda(t)$ has to be chosen. For this particular setting we make use of the function
$$\lambda(t) = \alpha + \beta \cdot t, \qquad \text{where } \alpha > 0,\ \beta \geq 0.$$

As the rate $\lambda(t)$ must be greater than zero, we assume that $\alpha > 0$; furthermore, as we assume a non-decreasing rate, it must hold that $\beta \geq 0$. Since the trend might not be increasing, $\beta$ must be allowed to equal 0 as well. We can estimate these parameters by filling in equation (1) for this process and obtaining the log-likelihood. First of all, this equation needs to be simplified in order to get to a solution. Starting with $\Lambda(\mathcal{A}; \theta)$:
$$\Lambda(\mathcal{A}; \theta) = \int_{0}^{M} \lambda(s)\,ds = \int_{0}^{M} (\alpha + \beta s)\,ds = \Big[\alpha s\Big]_{0}^{M} + \Big[\tfrac{1}{2}\beta s^2\Big]_{0}^{M} = \alpha(M - 0) + \tfrac{1}{2}\beta(M^2 - 0) = M\alpha + \tfrac{1}{2}M^2\beta.$$

Filling in both expressions for $\Lambda(\mathcal{A}; \theta)$ and $\lambda(t)$ into equation (1), we find that
$$\mathcal{L}_z(T_1, \ldots, T_{N_z}; \theta) = -\Lambda(\mathcal{A}; \theta) + \sum_{i=1}^{N_z} \log(\lambda(T_i)) = -M\alpha - \tfrac{1}{2}M^2\beta + \sum_{i=1}^{N_z} \log(\alpha + \beta T_i).$$

Now we want to find our maximum likelihood estimators for the total sample, therefore we need to add the log-likelihoods of the individual cases. Then we have
$$\mathcal{L}(\text{data}; \theta) = \sum_{z=1}^{Z} \mathcal{L}_z(T_1, \ldots, T_{N_z}; \theta) = \sum_{z=1}^{Z}\left(-M\alpha - \tfrac{1}{2}M^2\beta\right) + \sum_{z=1}^{Z}\sum_{i=1}^{N_z} \log(\alpha + \beta T_i) = Z\!\left(-M\alpha - \tfrac{1}{2}M^2\beta\right) + \sum_{z=1}^{Z}\sum_{i=1}^{N_z} \log(\alpha + \beta T_i).$$

Now we have to take derivatives with respect to $\alpha$ and $\beta$ to find the equations the parameters have to fulfil in order to be the maximum likelihood estimators for our sample.


$$\frac{\partial \mathcal{L}(\text{data}; \theta)}{\partial \alpha} = -MZ + \sum_{z=1}^{Z}\sum_{i=1}^{N_z} \frac{1}{\alpha + \beta T_i}, \qquad \frac{\partial \mathcal{L}(\text{data}; \theta)}{\partial \beta} = -\tfrac{1}{2}M^2 Z + \sum_{z=1}^{Z}\sum_{i=1}^{N_z} \frac{T_i}{\alpha + \beta T_i}.$$

So it should hold simultaneously that
$$MZ = \sum_{z=1}^{Z}\sum_{i=1}^{N_z} \frac{1}{\alpha + \beta T_i} \qquad \text{and} \qquad \tfrac{1}{2}M^2 Z = \sum_{z=1}^{Z}\sum_{i=1}^{N_z} \frac{T_i}{\alpha + \beta T_i}.$$

These equations cannot be solved analytically, and therefore the maximum likelihood estimators fulfilling these conditions have to be found by numerical optimisation. Furthermore, to obtain standard errors of the parameters, the Hessian can be approximated numerically.
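A sketch of how the negative log-likelihood of this model could be coded for numerical optimisation (variable names are illustrative; the thesis does not state its implementation):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik_inhom(params, times_per_obs, M=90):
    """Negative log-likelihood for lambda(t) = alpha + beta * t.

    times_per_obs: list of arrays, the foul times T_i of each observation z.
    """
    alpha, beta = params
    if alpha <= 0 or beta < 0:                       # enforce alpha > 0, beta >= 0
        return np.inf
    Z = len(times_per_obs)
    ll = -Z * (M * alpha + 0.5 * M**2 * beta)        # minus the summed Lambda(A; theta)
    for times in times_per_obs:
        ll += np.sum(np.log(alpha + beta * np.asarray(times)))
    return -ll

# Example call (hypothetical data structure):
# result = minimize(neg_loglik_inhom, x0=[0.15, 1e-4], args=(times_per_obs,), method="Nelder-Mead")
```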

3.5 Time difference process

Another possible form for λ(t) takes into account the time that has passed since the previous event. In fact, this is a relaxation of the self-exciting Poisson process that follows in the next section; the difference lies in the assumptions regarding the parameter γ and in the inclusion of only the last event instead of all previous events. For this model we assume that
$$\lambda(t) = \tau + \psi \cdot \mathbb{1}_{T_j < t < T_{j+1}} \cdot \exp(-\gamma \cdot (t - T_j)),$$
where $T_j$ denotes the last event that has occurred before time $t$ and $T_{j+1}$ is the moment in time of the following event. For this particular case, $\tau$ is the initial rate and it should hold that $\tau > 0$. Furthermore, $\psi \geq 0$ gives the increase in the rate once a foul has been made, while $\gamma$ describes the excitation over time and can be both negative and positive, depending on whether there is an increase or decrease in the rate $\lambda(t)$ as more time passes. Filling this particular form of $\lambda(t)$ into equation (1), we find that
$$\mathcal{L}_z(T_1, \ldots, T_{N_z}; \theta) = -M\tau + \frac{\psi}{\gamma}\sum_{j=1}^{N_z - 1}\left(\exp(-\gamma(T_{j+1} - T_j)) - 1\right) + \frac{\psi}{\gamma}\left(\exp(-\gamma(M - T_{N_z})) - 1\right) + \sum_{i=1}^{N_z} \log(\lambda(T_i)).$$


To obtain the log-likelihood for the full sample, we again add the log-likelihoods of the individual observations:
$$\mathcal{L}(\text{data}; \theta) = \sum_{z=1}^{Z} \mathcal{L}_z(T_1, \ldots, T_{N_z}; \theta) = -MZ\tau + \frac{\psi}{\gamma}\sum_{z=1}^{Z}\left[\sum_{j=1}^{N_z - 1}\left(\exp(-\gamma(T_{j+1} - T_j)) - 1\right) + \left(\exp(-\gamma(M - T_{N_z})) - 1\right)\right] + \sum_{z=1}^{Z}\sum_{i=1}^{N_z} \log(\lambda(T_i)).$$

This can be maximised numerically using software to find the optimal values of the parameters. Furthermore, we can use a numerical approximation of the Hessian to find the standard errors of the parameters.
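The same log-likelihood in code, as a sketch that mirrors the expression above (by convention the intensity at a foul time is evaluated using the previous foul, and the rate before the first foul is just τ):

```python
import numpy as np

def neg_loglik_timediff(params, times_per_obs, M=90):
    """Negative log-likelihood for lambda(t) = tau + psi * exp(-gamma * (t - T_last))."""
    tau, psi, gamma = params
    if tau <= 0 or psi < 0 or gamma == 0:            # gamma may be negative, but not zero
        return np.inf
    ll = 0.0
    for times in times_per_obs:
        T = np.asarray(times, dtype=float)
        if len(T) == 0:                              # no fouls: only the baseline contributes
            ll += -M * tau
            continue
        # gaps between consecutive fouls, plus the gap from the last foul to minute M
        gaps = np.append(np.diff(T), M - T[-1])
        integral = M * tau - (psi / gamma) * np.sum(np.exp(-gamma * gaps) - 1.0)
        # intensity at each foul time: tau for the first foul, excited afterwards
        lam = np.full(len(T), tau)
        lam[1:] += psi * np.exp(-gamma * np.diff(T))
        ll += -integral + np.sum(np.log(lam))
    return -ll
```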

3.6 Self-exciting process

A special class of Poisson point processes are the self-exciting processes, which can be seen as an extension of the homogeneous and inhomogeneous Poisson point processes. The difference between the regular Poisson process and the self-exciting process is that the self-exciting process assumes an additional intensity function. This intensity function is based on the assumption that events, in this occasion fouls, are clustered: a first occurrence could start a set of events in the near future. Using McNeil et al. (2015) as a reference point, we can construct the form of $\lambda(t)$ with the corresponding likelihood function and assumptions. Given that for a match $z$ we have $N_z$ occurring events, the general self-exciting process is
$$\lambda(t) = \tau + \psi \cdot \sum_{T_j < t} h(t - T_j),$$
where $\tau > 0$, $\psi \geq 0$ and $h(\cdot)$ is a positive-valued function.

In this setting only the timing of fouls is relevant. In other settings, such as earthquakes in Ogata (1988), value-at-risk in Chaves-Demoulin et al. (2005) or the spatial sciences, the size of the occurring event is also of interest, and an occurrence is only considered an event after a certain threshold has been crossed for the variable of interest. For possible versions of this we refer to McNeil et al. (2015).

There are different possibilities for the intensity function. In this particular case, we have chosen the intensity function
$$h(s) = \exp(-\gamma \cdot s), \qquad \text{where } \gamma > 0.$$
Using this, we find our final equation for $\lambda(t)$:
$$\lambda(t) = \tau + \psi \cdot \sum_{T_j < t} \exp(-\gamma \cdot (t - T_j)).$$


In the time difference process, estimates for γ were allowed to be negative, to allow for an increase in the rate over time. However, in this setting a negative estimate for γ would imply that events occurring further in the past provide more information about the expected events at time t than recent ones, which contradicts the properties of this model. Therefore this model has the extra assumption γ ≥ 0.

Figure 1: Example simulation self-exciting process

Due to the combination of these assumptions, our process boils down to a so-called Hawkes process, as defined and focused on in Ozaki (1979). An example of a simulated self-exciting or Hawkes process can be seen in Figure 1. In this figure such a process has been simulated with τ = 0.2, ψ = 0.5 and γ = 1, where the latest event is allowed to occur at minute 90. The figure shows that events are clustered: multiple events occur close together in time, while in other long periods no events occur. The baseline rate τ determines the quiet periods between fouls: the smaller τ, the longer the expected wait until a new event. The number of events nearby depends on the parameter ψ; the larger ψ, the more events occur after one event. Lastly, γ governs the behaviour of λ(t) shortly after an event. A small γ suggests that more events are likely to occur in the following short period, while a large γ means the excitation decays quickly, so that the elevated rate stays local and has little influence on the rate for events further in the future.
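For illustration, such a process can be simulated with Ogata's thinning algorithm; the sketch below uses the parameters of Figure 1, but it is an assumption about how such a figure could be produced, not the code actually used for it.

```python
import numpy as np

def simulate_hawkes(tau=0.2, psi=0.5, gamma=1.0, horizon=90.0, seed=1):
    """Simulate event times of lambda(t) = tau + psi * sum_{T_j < t} exp(-gamma*(t - T_j))
    on [0, horizon] by Ogata's thinning algorithm."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while True:
        # Between events the intensity only decays, so its value just after t
        # upper-bounds lambda(s) for s > t until the next accepted event.
        lam_bar = tau + psi * np.sum(np.exp(-gamma * (t - np.asarray(events))))
        t += rng.exponential(1.0 / lam_bar)          # candidate event time
        if t >= horizon:
            return np.array(events)
        lam_t = tau + psi * np.sum(np.exp(-gamma * (t - np.asarray(events))))
        if rng.uniform() < lam_t / lam_bar:          # accept with probability lambda(t)/lam_bar
            events.append(t)
```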

To estimate the parameters, the likelihood should be maximised. From equation (1), the log-likelihood for the general Poisson point process depends on the form of $\lambda(t)$. We can adapt it to our specific case by filling in our expression for $\lambda(t)$ and finding an expression for $\Lambda(\mathcal{A}; \theta)$. To maximise this function numerically it needs to be rewritten into an easier form; doing so, we find the expression below for the log-likelihood of a single observation $z$. For the full manipulations in obtaining $\mathcal{L}_z(T_1, \ldots, T_{N_z}; \theta)$ we refer to Appendix B.
$$\mathcal{L}_z(T_1, \ldots, T_{N_z}; \theta) = -M\tau + \frac{\psi}{\gamma}\sum_{i=1}^{N_z}\left(\exp(-\gamma(M - T_i)) - 1\right) + \sum_{i=1}^{N_z} \log(\lambda(T_i)).$$


Again, for the log-likelihood of the full sample we add the log-likelihoods of the individual observations:
$$\mathcal{L}(\text{data}; \theta) = \sum_{z=1}^{Z} \mathcal{L}_z(T_1, \ldots, T_{N_z}; \theta) = \sum_{z=1}^{Z}\left[-M\tau + \frac{\psi}{\gamma}\sum_{i=1}^{N_z}\left(\exp(-\gamma(M - T_i)) - 1\right) + \sum_{i=1}^{N_z} \log(\lambda(T_i))\right] = -MZ\tau + \frac{\psi}{\gamma}\sum_{z=1}^{Z}\sum_{i=1}^{N_z}\left(\exp(-\gamma(M - T_i)) - 1\right) + \sum_{z=1}^{Z}\sum_{i=1}^{N_z} \log(\lambda(T_i)).$$

This expression can be maximised numerically using software to obtain our maximum likelihood estimates. Furthermore, the Hessian can be approximated to get the corresponding standard errors.
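The corresponding negative log-likelihood can be coded directly from this expression; a sketch (the inner loop is quadratic in the number of fouls per match, which is harmless for at most a few dozen events):

```python
import numpy as np

def neg_loglik_hawkes(params, times_per_obs, M=90):
    """Negative log-likelihood of the self-exciting (Hawkes) process, summed over observations."""
    tau, psi, gamma = params
    if tau <= 0 or psi < 0 or gamma <= 0:            # restrictions of this model
        return np.inf
    ll = 0.0
    for times in times_per_obs:
        T = np.asarray(times, dtype=float)
        # compensator term: -M*tau + (psi/gamma) * sum_i (exp(-gamma*(M - T_i)) - 1)
        ll += -M * tau + (psi / gamma) * np.sum(np.exp(-gamma * (M - T)) - 1.0)
        # log-intensity at each foul time, using all earlier fouls of the same match
        for i, t in enumerate(T):
            ll += np.log(tau + psi * np.sum(np.exp(-gamma * (t - T[:i]))))
    return -ll
```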

3.7 Likelihood ratio test

For comparison of different samples a likelihood ratio test will be used. The likelihood ratio test is based on the log-likelihood and asymptotically follows a $\chi^2$ distribution, with the degrees of freedom equal to the difference in the number of parameters between the models. The test has a null hypothesis of the parameters being equal, versus the alternative hypothesis of different parameters. The likelihood ratio equals $-2 \cdot (\mathcal{L}_{\text{null}} - \mathcal{L}_{\text{alternative}})$, and the null hypothesis is rejected if this ratio is larger than the critical value of the corresponding $\chi^2$ distribution; if it is smaller, we fail to reject the null hypothesis. For full details regarding this test we refer to Azzalini (1996).
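In code, the test amounts to a single χ² tail probability. The sketch below is illustrative; with the rounded log-likelihoods reported later (for instance in Table 6), it reproduces the reported likelihood ratios only up to rounding.

```python
from scipy.stats import chi2

def lr_test(ll_null, ll_alternative, df):
    """Likelihood ratio test: LR = -2*(LL_null - LL_alternative), compared to a
    chi-square with df equal to the number of extra parameters in the alternative."""
    lr = -2.0 * (ll_null - ll_alternative)
    return lr, chi2.sf(lr, df)

# e.g. one combined rate versus separate home and away rates differs by one parameter:
# lr, p = lr_test(ll_combined, ll_home + ll_away, df=1)
```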

4 Data

The data is provided by Infostrada and contains records of matches in the Eredivisie, the national soccer league of the Netherlands. The Eredivisie is a round-robin competition with 18 teams, each team playing each other twice: once at home in their own stadium and once away in the stadium of the opponent. Due to this set-up of the league, every season contains 306 distinct matches and in total 612 team-level observations to investigate. The data is available from the season 2007-2008 up to and including the season 2013-2014, totalling 2142 matches.

For each match, data is available for every distinct action regarding foul play. For every foul, yellow card or red card there is information on the minute it occurred, the player who made the foul, the team he plays for, the opponent and the referee. For the yellow and red cards there is also a general description of the foul available. As we are interested in modelling the occurrence of fouls, the main variable of interest is the minute a foul occurred, together with identifiers for the team and the match.


For instance, when advantage has been given, the referee may only give the card to the player at the end of the attack. Therefore the data on all actions is combined and afterwards filtered on whether a player has multiple recorded fouls in the same minute or the minute before.
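A possible way to perform this filtering with pandas is sketched below. The column names match_id, player and minute are assumptions for illustration; the actual Infostrada field names are not given in the text.

```python
import pandas as pd

def drop_double_counted(actions: pd.DataFrame) -> pd.DataFrame:
    """Keep one record when the same player has actions recorded in the same minute
    or in the minute directly before it (assumed columns: match_id, player, minute)."""
    actions = actions.sort_values(["match_id", "player", "minute"])
    gap = actions.groupby(["match_id", "player"])["minute"].diff()
    return actions[gap.isna() | (gap > 1)]
```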

Figure 2: Distribution fouls across matches

Examples of how fouls are distributed within a match are given in Figure 2. The fouls made in the match De Graafschap-Ajax on the 19th of August 2007 can be seen on the left, while on the right the fouls made in the match Excelsior-FC Groningen on the 18th of December 2010 can be seen. From this figure it can be noticed that, in general, fouls do not come on their own and there is clustering present: there are occasions where multiple fouls are made in the same minute or shortly after a foul in the next minutes. However, there are also periods where no fouls are made for more than 30 minutes after the previous foul. These two examples look somewhat similar to our simulated example in Figure 1. However, there is a lot of variability in the time between fouls within matches; this will be addressed in more detail later via the average waiting times per season.

Lastly, since multiple models are estimated, a distinction between teams playing at home and teams playing away will be made, and a combined evaluation at match level will be investigated. Multiple samples will be used for estimating these differences. All of these samples are described separately, starting with separate teams, followed by home and away teams, and finishing with fouls at match level.

4.1 Team level


that on average a team makes a foul in less than every 3 minutes. The average number of fouls for the general sample lies around 14.660, implying a foul every 6 minutes. More summary statistics can be found in Table 1.

Table 1: Summary statistics all teams over seasons per match

Season       Total no. fouls   Mean no. fouls   Sd. no. fouls   Avg. wait time (min)   Sd. avg. wait time
2007-2008    9813              16.034           4.614           5.753                  1.832
2008-2009    9781              15.982           4.421           5.761                  1.736
2009-2010    9308              15.209           4.379           6.062                  1.891
2010-2011    9171              14.985           4.278           6.153                  2.022
2011-2012    8484              13.863           4.358           6.752                  2.440
2012-2013    8362              13.663           4.261           6.803                  2.390
2013-2014    7871              12.861           4.032           7.214                  2.634
All          62790             14.660           4.479           6.357                  2.221

When inspecting Table 1, it first of all stands out that the number of fouls is decreasing over the seasons. This can be observed both from the decrease in the total number of fouls per season and from the mean number of fouls per match. The standard deviation is decreasing over the seasons as well. Performing an ANOVA test for an equal mean number of fouls per match over all seasons confirms this: a p-value of less than 0.001 rejects the null hypothesis of equal means, and we therefore conclude that the mean differs between seasons. Besides the average number of fouls per match, the time between fouls is of interest as well.
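Such a test is a one-line call in most statistical software; a sketch, where fouls_by_season is a hypothetical container holding the per-match foul counts of each season:

```python
from scipy.stats import f_oneway

def anova_equal_means(fouls_by_season):
    """One-way ANOVA for equal mean fouls per match across seasons.
    fouls_by_season: dict mapping season label -> array of per-match foul counts."""
    f_stat, p_value = f_oneway(*fouls_by_season.values())
    return f_stat, p_value
```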

Figure 3: Count average waiting time in a match over seasons


the 7th minute for all distinct seasons. Furthermore, it can be noticed that the shapes are quite similar for all seasons, all having a bell shape. This indicates that most of the observations lie around the mean. However, the right tail of the distribution is somewhat heavier than the left, partly due to the presence of matches with a small number of fouls, resulting in large average waiting times of, for instance, more than 20 minutes.

Figure 4: Histogram number of fouls per minute across season

Besides the previously mentioned summary statistics, it is important to know at what time a foul occurs. Figure 4 shows the total number of fouls per minute for each distinct season. First of all, it stands out that the maximum of the y-axis differs between seasons. Overall there appears to be a baseline level of fouls per minute around which the counts move, with both highs and lows depending on the minute. However, based on visual inspection no clear trend can be observed; there is, for instance, no obvious increase in the number of fouls over time.

Since the differences between seasons are all significant at the 1 per cent level, all seasons are evaluated separately. However, to draw general conclusions, the combination of all seasons will be evaluated as well.

4.2 Home vs away


differences in the mean number of fouls per match amongst seasons and home and away teams at a 1 per cent significance level. However, based on the statistics in Table 2, the differences seem to become smaller over time. In the season 2007-2008 there is a difference of 1.395 fouls per match, decreasing to 0.709 for the season 2013-2014. Apart from the averages, differences between the two samples are small. For instance, the maximum number of fouls made by home teams equals 31, while for away teams this is 32; both samples have a minimum of 3 fouls, and the frequencies of both high and low foul counts are quite similar across both samples.

Table 2: Summary statistics number of fouls home vs away teams per match

Season       Home: no. fouls   Home: avg.   Home: std   Away: no. fouls   Away: avg.   Away: std   p-value t-test
2007-2008    4693              15.337       4.415       5120              16.732       4.710       1.714E-04
2008-2009    4703              15.369       4.347       5078              16.595       4.417       5.800E-04
2009-2010    4504              14.719       4.435       4804              15.699       4.274       5.528E-03
2010-2011    4397              14.369       4.126       4774              15.601       4.345       3.484E-04
2011-2012    4105              13.415       4.389       4379              14.310       4.288       1.093E-02
2012-2013    4053              13.289       4.220       4309              14.082       4.272       2.128E-02
2013-2014    3827              12.507       3.896       4044              13.216       4.140       2.949E-02
All          30282             14.144       4.379       32508             15.176       4.519       3.776E-14

Table 3: Summary statistics average waiting time (minutes) home vs away teams per match

Season       Home avg.   Home std   Away avg.   Away std   p-value t-test
2007-2008    6.010       1.972      5.496       1.645      4.866E-04
2008-2009    5.966       1.723      5.556       1.726      3.382E-03
2009-2010    6.292       2.056      5.831       1.681      2.502E-03
2010-2011    6.386       2.179      5.919       1.826      4.209E-03
2011-2012    7.039       2.653      6.466       2.174      3.623E-03
2012-2013    7.024       2.601      6.583       2.139      2.249E-02
2013-2014    7.334       2.485      7.095       2.860      2.607E-01
All          6.579       2.300      6.135       2.117      5.740E-11


Figure 5: Average waiting time between fouls per match

When inspecting the distribution across minutes in Figure 6, it appears that for both home and away teams a sort of wave can be observed. This wave pattern seems more present for away teams than for home teams and can be observed in every distinct season. There are some differences in the height of the waves, as in the later seasons the peaks are lower compared to the earlier seasons. In relative comparison, the total number of fouls per minute by home teams is larger until roughly the 20th minute; from that moment on, the number of fouls by away teams seems to be larger than that of home teams.


4.3 Match level

In the previous paragraphs teams were analysed separately. We now investigate the combined number of fouls by both teams at match level. The differences between both settings are small; for instance, the distribution across minutes stays the same as in Figure 4. However, differences are present in the minimum number of fouls in a game, which equals 9, while the maximum number of fouls in a game equals 56. This implies a larger range in the number of fouls compared to the team-level samples. Summary statistics regarding the number of fouls can be found in Table 4. From these, it stands out that the mean number of fouls is decreasing over time. An ANOVA test confirms the differences in means, rejecting the null hypothesis of equal means at the 1 per cent significance level.

Table 4: Summary statistics fouls made at match level

Season       Mean no. fouls   Sd. no. fouls   Avg. wait time (min)   Sd. avg. wait time
2007-2008    32.069           6.828           2.870                  0.659
2008-2009    31.964           6.917           2.893                  0.682
2009-2010    30.418           6.435           3.019                  0.656
2010-2011    29.971           6.225           3.065                  0.688
2011-2012    27.725           6.892           3.384                  0.976
2012-2013    27.327           6.538           3.406                  0.918
2013-2014    25.722           5.848           3.570                  0.861
All          29.314           6.906           3.172                  0.827

Figure 7: Average waiting time distribution at match level


confirmed by an ANOVA test, significant at the 1 per cent level. This can also be seen in Figure 7: the peaks for the different seasons lie at slightly different positions but show similar patterns.

5 Results

First of all, before discussing the results, we want to stress that each foul is seen as an independent event in this process: there is no direct or causal relation between two distinct fouls within a match, and all models make use of the memoryless property of the Poisson process. Furthermore, due to the low number of events, the parameters have been estimated over the whole population, including all distinct teams within a particular subset of the data. All estimation is restricted to the first 90 minutes, setting M to 90; no events can occur after this. The focus therefore lies on the dynamics within a game: how the different models relate within a game, whether there is a difference in the number of fouls during the match, whether the expected number of fouls increases when more time has passed since the last foul, and the possible effect of including all previously made fouls.

5.1 Homogeneous Poisson process

First, the homogeneous Poisson point process will be discussed. This is a relatively simple model and can be used as a baseline for comparison with the other models. It assumes a constant rate over time, and the point estimate can be obtained from the mean number of fouls made. The estimate gives an indication of the expected number of fouls per minute.

Table 5: Results homogeneous Poisson process

Season       Team    se(team)    Home    se(home)    Away    se(away)    Match   se(match)
2007-2008    0.178   1.796E-03   0.170   2.482E-03   0.186   2.599E-03   0.356   3.594E-03
2008-2009    0.178   1.799E-03   0.171   2.493E-03   0.184   2.582E-03   0.355   3.590E-03
2009-2010    0.169   1.752E-03   0.164   2.444E-03   0.174   2.510E-03   0.338   3.503E-03
2010-2011    0.167   1.743E-03   0.160   2.413E-03   0.173   2.504E-03   0.333   3.477E-03
2011-2012    0.154   1.671E-03   0.149   2.326E-03   0.159   2.403E-03   0.308   3.344E-03
2012-2013    0.152   1.662E-03   0.148   2.325E-03   0.156   2.376E-03   0.304   3.324E-03
2013-2014    0.143   1.612E-03   0.139   2.247E-03   0.147   2.312E-03   0.286   3.224E-03
All          0.163   6.504E-04   0.157   9.022E-04   0.169   9.373E-04   0.326   1.301E-03

Note that all parameters are significant at 1 per cent level

The results for all different samples can be found in Table 5, starting at team level with the point estimates in column two and the standard errors in column three. It has to be noted that all parameters are significant at the 1 per cent level. Besides this high level of significance, it stands out that the point estimates become smaller over the seasons. This is in line with the decrease in the average number of fouls made and the increase in the average waiting time. Regarding the interpretation of the point estimates, we have that λ(t) = λ. According to the model, in the seasons 2007-2008 and 2008-2009 we expect 0.178 fouls to occur per minute; for the season 2009-2010 the expected number is 0.169 fouls per minute. For the other seasons the parameters can be interpreted in a similar manner, where the point estimate stands for the number of expected fouls per minute.
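As a quick cross-check against the reported data, using the 2007-2008 total of 9813 fouls from Table 1 and the $Z = 612$ team-match observations per season described in Section 4:
$$\hat{\lambda} = \frac{\sum_{z=1}^{Z} N_z}{M \cdot Z} = \frac{9813}{90 \cdot 612} \approx 0.178, \qquad se(\hat{\lambda}) = \frac{\hat{\lambda}}{\sqrt{\sum_{z=1}^{Z} N_z}} \approx \frac{0.178}{\sqrt{9813}} \approx 1.80\text{E-}03,$$
consistent with the team-level point estimate and standard error reported for 2007-2008 in Table 5.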


while also the trend of decreasing estimates can be seen. Furthermore, there is a notable difference between the point estimates of home and away teams, while both have small standard errors. For instance, the difference in the expected number of fouls per minute in the season 2007-2008 between home and away teams is 0.016, which within a 90-minute match results in an expected 0.016 · 90 = 1.440 more fouls made by away teams compared to home teams. This difference in expected fouls within a match is decreasing over time: in season 2013-2014, away teams are expected to make 0.008 · 90 = 0.720 more fouls than home teams. The differences in point estimates suggest that there is a difference between home and away teams. This is confirmed by the likelihood ratio test, which compares the log-likelihood of the combination of separate home and away models with the model without a distinction. The null hypothesis states that the model without the distinction fits the data adequately. According to the results in Table 6, we can reject this null hypothesis for all seasons. This implies that for every season the models with a distinction between teams playing at home and away fit the data better than one combined model. Based on this, we can conclude that there is indeed a difference between teams playing at home or away regarding the expected number of fouls made.

Table 6: Result likelihood ratio test homogeneous model

Season       LL_full       LL_home       LL_away       LR       p-value
2007-2008    -2.674E+04    -1.300E+04    -1.373E+04    18.585   1.625E-05
2008-2009    -2.669E+04    -1.302E+04    -1.366E+04    14.380   1.494E-04
2009-2010    -2.586E+04    -1.266E+04    -1.319E+04    9.672    1.871E-03
2010-2011    -2.561E+04    -1.246E+04    -1.314E+04    15.471   8.378E-05
2011-2012    -2.435E+04    -1.192E+04    -1.243E+04    8.843    2.942E-03
2012-2013    -2.411E+04    -1.181E+04    -1.230E+04    7.025    8.039E-03
2013-2014    -2.318E+04    -1.138E+04    -1.180E+04    5.994    1.435E-02
All          -1.767E+05    -8.632E+04    -9.037E+04    77.653   0

The interpretation of the models is quite similar to the models at a team level. For instance, a team playing at home in the season 2007-2008 has an expected number of fouls per minute equalling 0.170, while if the same team is playing away it would have an expected 0.186 fouls per minute. For the combination of all seasons, a team playing at home is expected to make 0.157 fouls per minute in a match, while the same team playing away would be expected to make 0.169 fouls per minute. All distinct seasons can be interpreted similarly regarding the point estimates shown in Table 5.

The estimates for the combination of the two teams within a match are presented in the last columns of Table 5. Again all parameters are significant at the 1 per cent level and the point estimates decrease over the seasons. Regarding the interpretation of the estimates, within a match in the season 2007-2008, 0.356 fouls per minute are expected to occur; this foul can be made by either the home team or the away team. For the season 2008-2009, 0.355 fouls per minute are expected. For the other seasons the point estimate likewise equals the number of expected fouls per minute. Overall, 0.326 fouls per minute are expected to occur within a match in our data.


From this model we can infer that there are differences in the expected number of fouls depending on whether a team is playing at home or away. Furthermore, it can be concluded that there are differences in the expected number of fouls over the seasons: in the later seasons fewer fouls per minute are expected compared to the early seasons, for all distinct samples.

5.2 Inhomogeneous Poisson process

The inhomogeneous Poisson point process estimates λ(t) according to the formula λ(t) = α + β · t, using the assumptions α > 0 and β ≥ 0. It therefore allows the expected number of fouls per minute to increase during the match.

Team level

Recall that in Figure 2 no clear trend can be observed regarding an increasing number of fouls during the match. Also, on average a foul is made in the 46th minute, slightly after half time. This does not give a clear indication that fouls are concentrated later in the match. However, it could still be true that the number of fouls increases during the match. The results for separate teams can be found in Table 7.

Table 7: Results inhomogeneous team-level

Season       α̂              se(α̂)        β̂              se(β̂)
2007-2008    1.711E-01***   2.764E-03    1.574E-04***   4.129E-05
2008-2009    1.752E-01***   3.572E-03    5.191E-05      6.892E-05
2009-2010    1.622E-01***   3.476E-03    1.507E-04**    6.762E-05
2010-2011    1.588E-01***   3.476E-03    1.694E-04**    6.745E-05
2011-2012    1.465E-01***   3.310E-03    1.657E-04**    6.454E-05
2012-2013    1.426E-01***   3.270E-03    2.119E-04***   6.392E-05
2013-2014    1.316E-01***   3.176E-03    2.502E-04***   6.245E-05
All          1.556E-01***   1.288E-03    1.628E-04***   2.507E-05

*** p<0.01, ** p<0.05, * p<0.1


expected to be made in the entire game. Similar interpretations can be made for the other seasons and the overall sample. Only the season 2008-2009 is different: due to the insignificance of β, we cannot conclude that this estimate differs from zero, so λ̂(t) effectively equals α and yields a homogeneous Poisson process. In general, we can conclude that at team level there is an increase in the expected number of fouls per minute made within a match.
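For reference, the expected number of fouls over a full match follows from integrating the fitted rate; with the 2007-2008 team-level estimates from Table 7 this gives
$$\int_{0}^{90} \hat{\lambda}(t)\,dt = 90\,\hat{\alpha} + \tfrac{1}{2}\cdot 90^2\,\hat{\beta} = 90 \cdot 0.1711 + 4050 \cdot 1.574\text{E-}04 \approx 16.0$$
fouls per team per match, in line with the observed mean of 16.034 in Table 1.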

Home vs away

According to the homogeneous Poisson point process, there is a significant difference between home and away teams. In Figure 6 a difference seems to be present between home and away teams with respect to the average waiting time. Therefore both samples are investigated separately and the results for the teams playing at home can be found in Table 8.

Table 8: results inhomogeneous home teams

Season       α̂              se(α̂)        β̂              se(β̂)
2007-2008    1.635E-01***   4.911E-03    1.531E-04      9.541E-05
2008-2009    1.708E-01***   4.962E-03    1.565E-06      9.542E-05
2009-2010    1.554E-01***   4.835E-03    1.812E-04*     9.436E-05
2010-2011    1.564E-01***   4.835E-03    7.276E-05      9.339E-05
2011-2012    1.379E-01***   4.579E-03    2.476E-04***   8.986E-05
2012-2013    1.395E-01***   4.568E-03    1.817E-04**    8.914E-05
2013-2014    1.290E-01***   4.422E-03    2.195E-04**    8.674E-05
All          1.503E-01***   1.789E-03    1.537E-04*     3.483E-05

*** p<0.01, ** p<0.05, * p<0.1

From Table 8 it can be observed that all estimates of α are significant at the 1 per cent level, while the significance of the estimates of β varies. For the seasons 2011-2012, 2012-2013 and 2013-2014 the estimates of β are significant at least at the 5 per cent level, while for the season 2009-2010 and the overall sample the estimates are significant at the 10 per cent level. The other seasons do not reach a relevant level of significance, indicating that those estimates do not statistically differ from zero. Furthermore, the estimates of α seem to decrease over the seasons, while for β no clear increasing or decreasing trend over the seasons can be observed.


Table 9: Results inhomogeneous away teams

Season       α̂              se(α̂)        β̂              se(β̂)
2007-2008    1.784E-01***   4.193E-03    1.669E-04**    6.559E-05
2008-2009    1.795E-01***   4.087E-03    1.090E-04      6.240E-05
2009-2010    1.691E-01***   4.995E-03    1.178E-04      9.688E-05
2010-2011    1.614E-01***   4.995E-03    2.636E-04***   5.752E-05
2011-2012    1.556E-01***   4.786E-03    7.728E-05      9.265E-05
2012-2013    1.456E-01***   4.674E-03    2.422E-04***   9.150E-05
2013-2014    1.339E-01***   4.557E-03    2.867E-04***   8.991E-05
All          1.605E-01***   1.851E-03    1.809E-04***   3.609E-05

*** p<0.01, ** p<0.05, * p<0.1

From Table 9, in general, similar results as for the home teams can be observed. All estimates of α are significant at the 1 per cent level, while the significance of β changes per season. The estimates of β in the seasons 2007-2008, 2010-2011, 2012-2013, 2013-2014 and the combined sample are significant at least at the 5 per cent level. However, the estimate of β is not significant in the seasons 2008-2009, 2009-2010 and 2011-2012; for these seasons we cannot conclude that there is an increase over time, and the expected number of fouls per minute equals α. For the seasons where β is significant, we have λ(t) = α + β · t for t smaller than or equal to 90. For instance, for 2010-2011 the expected number of fouls in minute t equals λ̂(t) = 0.1614 + 0.0002636 · t. The same applies to the other seasons where β is significant.

When comparing the estimates of the models, all estimates of α are larger for teams playing away than for teams playing at home. This is similar to the homogeneous model, where the expected number of fouls for away teams was larger. Regarding the estimates of β, no direct conclusion can be drawn: for some seasons the estimate of β for home teams is larger, while for other seasons the estimate for away teams is larger. In addition to the comparison of the estimates, the combination of models can also be compared using the likelihood ratio test. According to the results in Table 10, the null hypothesis can be rejected for every season. This implies that the model estimating home and away teams separately fits the data significantly better than one model for the combined data. Therefore, the conclusion can be drawn that there are significant differences between playing at home and playing away regarding the expected number of fouls per minute.

Table 10: Results likelihood ratio test inhomogeneous models

Season       LL_full       LL_home       LL_away       LR       p-value
2007-2008    -2.674E+04    -1.300E+04    -1.373E+04    18.586   9.207E-05
2008-2009    -2.669E+04    -1.302E+04    -1.366E+04    15.010   5.503E-04
2009-2010    -2.585E+04    -1.266E+04    -1.319E+04    9.963    6.864E-03
2010-2011    -2.561E+04    -1.246E+04    -1.314E+04    17.198   1.843E-04
2011-2012    -2.435E+04    -1.191E+04    -1.243E+04    10.953   4.184E-03
2012-2013    -2.411E+04    -1.180E+04    -1.230E+04    7.183    2.755E-02
2013-2014    -2.318E+04    -1.138E+04    -1.180E+04    6.167    4.581E-02


Match level

There can be differences between match and team level. The higher number of fouls could, for instance, lead to a higher estimate of β̂, implying a stronger increase over time, particularly since the distribution of the average waiting times is more centred around the mean. The results of fitting this model can be found in Table 11.

Table 11: Results inhomogeneous process match level

Season       α̂              se(α̂)        β̂              se(β̂)
2007-2008    3.421E-01***   7.112E-03    3.170E-04**    1.382E-04
2008-2009    3.502E-01***   6.956E-03    1.087E-04      1.287E-04
2009-2010    3.246E-01***   6.954E-03    3.004E-04**    1.353E-04
2010-2011    3.178E-01***   6.954E-03    3.396E-04***   1.253E-04
2011-2012    2.934E-01***   6.395E-03    3.271E-04***   1.181E-04
2012-2013    2.848E-01***   6.301E-03    4.178E-04***   1.166E-04
2013-2014    2.630E-01***   6.099E-03    5.062E-04***   1.129E-04
All          3.110E-01***   2.497E-03    3.277E-04***   3.609E-05

*** p<0.01, ** p<0.05, * p<0.1

First of all, from Table 11 it stands out that most estimates are significant at the 1 per cent level. The estimates of β in the seasons 2007-2008 and 2009-2010 are significant at the 5 per cent level, while the estimate of β in 2008-2009 is not significant. As in all previously investigated models, the estimates of α decrease over the seasons, while for β no clear trend in the size of the estimates can be observed. As the expected number of fouls per minute equals λ(t) = α + β · t, we can use the estimates to construct this rate. For instance, for the season 2007-2008 we have λ̂(t) = 0.3421 + 0.0003170 · t for t smaller than or equal to 90; this rate equals the expected number of fouls per minute at minute t. Such a formula can be constructed and interpreted similarly for all seasons except 2008-2009: for that season we cannot conclude that the estimate of β differs from zero, and the expected number of fouls per minute equals 0.3502.

So overall, there seems to be evidence for an increase in the expected number of fouls throughout the game, both for teams and for matches. Furthermore, there seems to be a decreasing trend in the estimates of α and an increasing trend for β in both the team and match samples. For home and away teams separately, the increase throughout the game is less pronounced: in certain seasons the point estimates for β are not significant and thus do not show the increasing time trend. However, based on the likelihood ratio test it can be concluded that the distinction between home and away teams fits the data better than estimating one combined model, so there are differences between home and away teams. This can be seen in the differences in the estimates of α.

5.3 Time difference Poisson process

Recall that for this model we assume that the rate equals λ(t) = τ + ψ · 1_{T_i < t < T_{i+1}} · exp(−γ · (t − T_j)), and therefore the rate at time t depends on the time that has passed since the most recent foul.


Team level

First of all, we start by modelling this for single teams within a match. The results can be found in Table 12.

Table 12: Results time difference model

Season τ̂ se(τ̂) ψ̂ se(ψ̂) γ̂ se(γ̂)

2007-2008 1.508E-01∗∗∗ 5.102E-03 2.198E-02∗∗∗ 4.533E-03 -5.067E-02∗∗∗ 7.153E-03

2008-2009 1.631E-01∗∗∗ 5.002E-03 1.171E-02∗∗∗ 4.177E-03 -4.676E-02∗∗∗ 8.744E-03

2009-2010 1.551E-01∗∗∗ 4.798E-03 1.099E-02∗∗∗ 4.025E-03 -4.866E-02∗∗∗ 9.778E-03

2010-2011 1.480E-01∗∗∗ 4.951E-03 1.514E-02∗∗∗ 4.372E-03 -4.403E-02∗∗∗ 8.366E-03

2011-2012 1.426E-01∗∗∗ 4.806E-03 9.885E-03∗∗ 4.254E-03 -3.485E-02∗∗∗ 1.036E-02

2012-2013 1.407E-01∗∗∗ 4.501E-03 9.288E-03∗∗ 3.823E-03 -4.154E-02∗∗∗ 9.676E-03

2013-2014 1.268E-01∗∗∗ 4.219E-03 1.310E-02∗∗∗ 3.620E-03 -4.111E-02∗∗∗ 6.921E-03

All 1.496E-01∗∗∗ 1.818E-03 1.135E-02∗∗∗ 1.5616E-03 -4.120E-02∗∗∗ 3.4837E-03

*** p<0.01, ** p<0.05, * p<0.1

Table 12 shows that all parameters are significant at least at a 5 per cent level. Furthermore, it can be observed that the estimates for each of the parameters are of similar size across seasons. To interpret the parameters, the formula λ(t) = τ + ψ · 1_{T_i < t < T_{i+1}} · exp(−γ · (t − T_j)) needs to be used. The estimate of τ provides a base level: it is the minimum level and equals the expected rate of fouls per minute until the first foul has been made. The estimate of ψ gives the increase in the rate directly after a foul has been made, and this increase is scaled by the term exp(−γ · (t − T_j)), where T_j equals the time of occurrence of the previous foul. The negative parameter estimate of γ for all seasons tells us that the more time has passed since the previous foul, the higher the expected number of fouls per minute is. Furthermore, it can be noticed that, except for season 2007-2008, there is a decreasing trend in τ across seasons, implying that on average fewer fouls are made.

As an example of the interpretation of λ(t), we take season 2007-2008. Until the first foul has been made we expect 0.1508 fouls per minute. Right at the moment the first foul is made, we expect a minimum of 0.1508 + 0.02198 · exp(0.05067 · 0) = 0.17278 fouls per minute. In the first minute after a foul the model expects 0.1508 + 0.02198 · exp(0.05067 · 1) = 0.1739 fouls per minute, in the second minute 0.1508 + 0.02198 · exp(0.05067 · 2) = 0.1751, while for the average waiting time, the 7th minute, 0.1508 + 0.02198 · exp(0.05067 · 7) = 0.1821 fouls are expected per minute. In a similar manner the expected number of fouls for every minute can be calculated, given that the time that has passed since the last foul is known. For other seasons the model can be interpreted similarly, and it holds that more fouls per minute are expected the more time has passed since the last foul.
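The worked example above can be reproduced with a few lines of code. The sketch below (ours, using the 2007-2008 team-level estimates from Table 12) evaluates the rate for a given number of minutes since the most recent foul; before the first foul the rate is simply τ̂.

    # Time-difference rate after a foul: lambda(t) = tau + psi * exp(-gamma * (t - T_j)),
    # where t - T_j is the number of minutes since the most recent foul.
    import math

    def td_rate(minutes_since_last_foul, tau, psi, gamma):
        return tau + psi * math.exp(-gamma * minutes_since_last_foul)

    tau_hat, psi_hat, gamma_hat = 0.1508, 0.02198, -0.05067
    for s in (0, 1, 2, 7):
        print(s, round(td_rate(s, tau_hat, psi_hat, gamma_hat), 4))
    # reproduces, up to rounding, the values 0.1728, 0.1739, 0.1751 and 0.1821 computed above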

Home vs away


Table 13: Results time difference model home teams

Season τ̂ se(τ̂) ψ̂ se(ψ̂) γ̂ se(γ̂)

2007-2008 1.477E-01∗∗∗ 6.927E-03 1.817E-02∗∗∗ 6.061E-03 -4.899E-02∗∗∗ 1.064E-02

2008-2009 1.639E-01∗∗∗ 7.009E-03 5.708E-03 5.803E-03 -3.960E-02∗∗ 1.847E-02

2009-2010 1.486E-01∗∗∗ 6.725E-03 1.208E-02∗∗ 5.766E-03 -4.553E-02∗∗∗ 1.297E-02

2010-2011 1.444E-01∗∗∗ 6.614E-03 1.221E-02∗∗ 5.705E-03 -4.637E-02∗∗∗ 1.299E-02

2011-2012 1.335E-01∗∗∗ 6.995E-03 1.440E-02∗∗ 6.634E-03 -2.495E-02∗∗ 1.158E-02

2012-2013 1.382E-01∗∗∗ 6.767E-03 8.363E-03 6.074E-03 -2.968E-02∗∗ 1.511E-02

2013-2014 1.244E-01∗∗∗ 5.253E-03 1.052E-02∗∗ 4.093E-03 -5.413E-02∗∗∗ 1.007E-02

All 1.438E-01∗∗∗ 6.927E-03 1.115E-02 6.061E-03 -3.900E-02∗∗∗ 1.064E-02

*** p<0.01, ** p<0.05, * p<0.1

From Table 13 it can be noticed that all estimates for τ and γ are significant at least at a 5 per cent level. For the estimates of ψ the significance depends on the season: several seasons provide significant estimates, with the exception of seasons 2008-2009 and 2012-2013 and the combined sample. Regarding the seasons where ψ is significant, we can conclude that the expected number of fouls per minute increases the more time has passed since the last foul. For the seasons where ψ is not significant, no effect of the time that has passed since the previous foul can be concluded: although the scaling parameter γ is significant, ψ is not statistically different from zero, and therefore the expected number of fouls per minute stays constant over the whole match. Furthermore, in this case a decrease of τ over the seasons is less present in comparison to the other models. Continuing with teams playing away, the results can be found in Table 14.

Table 14: Results time difference model away teams

Season τ̂ se(τ̂) ψ̂ se(ψ̂) γ̂ se(γ̂)

2007-2008 1.533E-01∗∗∗ 7.433E-03 2.571E-02∗∗∗ 6.662E-03 -5.575E-02∗∗∗ 9.949E-03

2008-2009 1.606E-01∗∗∗ 6.888E-03 1.778E-02∗∗∗ 5.692E-03 -6.199E-02∗∗∗ 1.064E-02

2009-2010 1.616E-01∗∗∗ 6.730E-03 9.947E-03∗ 5.493E-03 -5.248E-02∗∗∗ 1.471E-02

2010-2011 1.510E-01∗∗∗ 7.301E-03 1.864E-02∗∗∗ 6.551E-03 -4.307E-02∗∗∗ 1.073E-02

2011-2012 1.487E-01∗∗∗ 6.289E-03 7.930E-03 5.162E-03 -4.971E-02∗∗∗ 1.644E-02

2012-2013 1.422E-01∗∗∗ 5.915E-03 1.009E-02∗∗ 4.742E-03 -6.103E-02∗∗∗ 1.415E-02

2013-2014 1.266E-01∗∗∗ 6.583E-03 1.828E-02∗∗∗ 6.188E-03 -2.856E-02∗∗∗ 9.052E-03

All 1.507E-01∗∗∗ 2.642E-03 1.466E-02∗∗∗ 2.297E-03 -4.425E-02∗∗∗ 4.345E-03

*** p<0.01, ** p<0.05, * p<0.1

First of all, from Table 14 it stands out that again all estimates for τ and γ are significant at a 1 per cent level, while most of the estimates of ψ are significant at the 5 per cent level as well. The exceptions are the estimates for season 2009-2010 and season 2011-2012, which are significant at the 10 per cent level and not significant at any relevant level, respectively. Similar to the home teams, for most of the seasons the expected number of fouls increases when more time has passed since the previous foul. Only for seasons 2009-2010 and 2011-2012 this is not the case, and there the expected number of fouls per minute stays constant during the match and does not depend on the time that has passed since the last foul.


Comparing the estimates of τ, for all seasons except 2008-2009 the estimates are larger for away teams. Regarding the estimates of ψ, there is some fluctuation: for some seasons the estimates for away teams are larger and vice versa, and both samples show non-significant estimates in different seasons. Comparing the estimates of γ, it can be noticed that, except for season 2013-2014, the estimates for away teams are larger in absolute value. This could imply that the effect of the time passed since a foul is more relevant for away teams. However, most of these estimates lie within one standard error of each other, indicating that the estimates are close, while the full effect of the time increase also depends on ψ, whose estimates mostly lie within one standard error of each other as well.

Furthermore, from the results of the likelihood ratio test in Table 15, it can be concluded that there is a difference in the expected number of fouls depending on whether a team plays at home or away. For all investigated cases this test yields a p-value smaller than 0.05, so we reject the null hypothesis that the model without a distinction between playing at home and away fits the data as well as the separate models. As the null hypothesis is rejected, it can be concluded that the expected number of fouls per minute differs between teams playing at home and teams playing away.

Table 15: Results likelihood ratio test time difference

Season      LL_full      LL_home      LL_away      LR       p-value

2007-2008   -2.672E+04   -1.299E+04   -1.372E+04   22.196   5.938E-05
2008-2009   -2.669E+04   -1.301E+04   -1.365E+04   36.185   6.842E-08
2009-2010   -2.585E+04   -1.266E+04   -1.319E+04   10.064   1.803E-02
2010-2011   -2.560E+04   -1.246E+04   -1.313E+04   16.715   8.087E-04
2011-2012   -2.435E+04   -1.192E+04   -1.243E+04   10.618   1.398E-02
2012-2013   -2.411E+04   -1.180E+04   -1.230E+04   12.141   6.914E-03
2013-2014   -2.317E+04   -1.137E+04   -1.180E+04    9.881   1.960E-02

All         -1.767E+05   -8.630E+04   -9.034E+04   86.301   0

Match level

As this model relies mostly on the average waiting time, the results are likely to differ from those obtained at team level. The results after fitting the model can be found in Table 16.

Table 16: Results time difference model match

Season τ̂ se(τ̂) ψ̂ se(ψ̂) γ̂ se(γ̂)

2007-2008 2.521E-01∗∗∗ 1.136E-02 8.023E-02∗∗∗ 9.770E-03 -1.004E-01∗∗∗ 7.712E-03

2008-2009 2.476E-01∗∗∗ 1.075E-02 8.019E-02∗∗∗ 9.067E-03 -1.102E-01∗∗∗ 7.487E-03

2009-2010 2.492E-01∗∗∗ 1.107E-02 7.167E-02∗∗∗ 9.420E-03 -8.109E-02∗∗∗ 6.247E-03

2010-2011 2.222E-01∗∗∗ 1.010E-02 8.466E-02∗∗∗ 8.651E-03 -1.008E-01∗∗∗ 6.652E-03

2011-2012 2.364E-01∗∗∗ 1.094E-02 5.814E-02∗∗∗ 9.591E-03 -7.345E-02∗∗∗ 7.987E-03

2012-2013 2.373E-01∗∗∗ 1.042E-02 5.253E-02∗∗∗ 8.961E-03 -7.905E-02∗∗∗ 8.270E-03

2013-2014 2.059E-01∗∗∗ 9.426E-03 6.114E-02∗∗∗ 8.082E-03 -8.617E-02∗∗∗ 7.203E-03

All 2.391E-01∗∗∗ 4.062E-03 6.844E-02∗∗∗ 3.495E-03 -8.511E-02∗∗∗ 2.736E-03


From Table 16 it can be seen that all estimates are significant at a 1 per cent level. In this case, the same interpretation holds as for single teams. At the start of the match, the expected number of fouls per minute equals τ, while directly after the first foul is made by either of the teams it increases by ψ. Then, as time passes without a foul, this increase is scaled by the factor exp(−γ · (t − T_j)), where T_j denotes the time of the last foul. For instance, for season 2007-2008, the match starts with an expected number of 0.2521 fouls per minute by either of the teams. After the first foul has occurred, the expected number of fouls per minute increases to 0.33233. If one minute later no foul has occurred, the expected number of fouls increases to 0.3408, after 2 minutes without a foul it increases to 0.3502, after 3 minutes to 0.3605, and after 10 minutes without a foul we expect 0.4711 fouls in the next minute. All these expected numbers of fouls per minute differ with respect to the time that has passed, and all can be calculated using the formula λ(t) = τ + ψ · exp(−γ · (t − T_j)). This formula can be used for all seasons and yields the expected number of fouls to be made in minute t, given that T_j is known.

Overall, both for separate teams within a match and for matches as a whole, it can be concluded that the expected number of fouls per minute increases as more time has passed since the last foul. Furthermore, the first foul in a match can be seen as a starting point for more fouls to be made, both for teams and matches. After this first foul has occurred, the expected number of fouls increases by a relatively large amount, which could imply that at the start of the match players are somewhat reluctant to make the first foul.

Also, comparing teams playing at home with teams playing away, several differences can be observed. In the models for both samples, there are some insignificant parameters for the increase over time. The estimates of ψ and γ are furthermore of similar size and lie mostly within one standard error of each other. The greater differences are found in the parameter τ. Based on the likelihood ratio test and the larger estimates of τ for away teams, it can be concluded that a team playing away is expected to make more fouls than the same team playing at home. However, the effect of the time since the last foul seems similar for teams playing at home and teams playing away.

5.4 Self-exciting Poisson process

First of all, note that for the self-exciting Poisson process the rate λ(t) is estimated using the following formula: λ(t) = τ + ψ · Σ_{T_j < t} exp(−γ · (t − T_j)). So to obtain the value of λ(t) at a certain time t, we need to know all previously made fouls up to time t for that particular team. When these are known, we can construct the rate λ(t) and use it as the expected number of fouls per minute. Like the time difference model, this model is influenced by the waiting times between fouls; here, however, not only the most recent foul but all previously made fouls contribute to the rate.
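To make the construction of this rate explicit, the following sketch (our illustration; the parameter values are placeholders and not estimates from this thesis) sums the contribution of every foul made before time t.

    # Self-exciting rate: lambda(t) = tau + psi * sum over fouls T_j < t of exp(-gamma * (t - T_j)).
    import math

    def self_exciting_rate(t, foul_times, tau, psi, gamma):
        """Expected fouls per minute at time t, given the foul times observed so far."""
        return tau + psi * sum(math.exp(-gamma * (t - tj)) for tj in foul_times if tj < t)

    fouls_so_far = [5.0, 12.0, 14.0]  # hypothetical foul minutes
    print(self_exciting_rate(20.0, fouls_so_far, tau=0.15, psi=0.01, gamma=-0.04))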

Team level
