
The information inefficiencies in the English Premier League betting market

Are they mere anomalies or persistent occurrences?

Abstract

This paper investigates the Efficient Market Hypothesis within the English (Premier League) football betting market through statistical and economic tests. The paper employs a sample of more than 4500 football games, across 12 seasons, to construct an ordered Probit match-outcome forecasting model. The model’s forecasts are used to investigate the informational efficiency of bookmakers’ odds. If the market is efficient, the implicit probabilities derived from the odds set by bookmakers should be unbiased predictors of match outcomes. The results indicate that bookmakers’ odds are statistically significant predictors of match results; the market is therefore said to be informationally efficient.

Gitanjeli Kler
University of Amsterdam
Faculty of Economics and Business
15th July 2016

Research Supervisor
Junze Sun
University of Amsterdam
J.Sun@uva.nl


Statement of Originality

This document is written by Student Gitanjeli Kler who declares to take full responsibility for the contents of this document.

I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion


1. Introduction

In the mid-1960s Eugene Fama introduced the Efficient Market Hypothesis (EMH), a theory which asserts that the prices within an efficient market always fully reflect all available information (Fama, 1965). Thus, prices can be viewed as accurate, unbiased estimates of the underlying value of assets and can be considered informative. This is important because informative prices enable the efficient allocation of resources in production and investment decisions. If prices were uninformative, investors would be less willing to invest or trade because they would not know what value to expect. With less financing and liquidity, funding new projects would become more expensive, which would reduce activity and impede economic growth. This, in turn, affects incomes and, ultimately, jobs. Efficient markets are, therefore, important because they have the potential to improve welfare. As such, it is unsurprising that market efficiency has been widely studied within the finance and economics literature.

Fama (1970, 1998) explains that market efficiency can be considered at three informational levels. At the most basic level, prices reflect all information that can be derived from historical prices; this is known as weak-form efficiency. If prices reflect all relevant publicly available information, they are semi-strong form efficient, and if they also reflect all private information, they are strong-form efficient. Researchers explain that efficiency with respect to a particular information set implies that one cannot consistently generate abnormal returns by trading on the basis of that information set (Jensen, 1978; Fama, 1991; Malkiel, 1992). However, measuring ‘abnormal’ returns within financial markets is problematic because this requires the use of expected returns, which are predicted by asset-pricing models. This makes it difficult to interpret evidence of inefficiencies: such evidence could be the result of the market being inefficient, of a poor asset-pricing model, or of both (Jensen, 1978; Campbell & MacKinlay, 1997). Consequently, any test of the EMH is a joint test of market efficiency and of the asset-pricing model used, which is known as the joint hypothesis problem.

Most EMH studies have been conducted within standard financial markets such as stock markets or foreign exchange markets, but a growing number of researchers are shifting their attention towards football betting markets as an alternative venue for EMH tests. This is motivated by a consensus that these markets are comparable to financial markets and therefore offer a useful perspective on the evidence concerning investor behaviour within conventional financial markets (Thaler and Ziemba, 1988; Williams, 2005). Furthermore, there are practical reasons that make football betting markets better candidates for EMH studies. The key issue with testing the EMH within financial markets is that an asset-pricing model must be used to obtain a reliable estimate of the fundamental value of the assets. However, within financial markets, most assets are infinitely lived and there are many asset-pricing models in use, which makes it difficult to objectively determine the true value of a given asset. Within football markets, the prices, i.e. the football odds, are fixed and the time horizon is much shorter than in financial markets due to the existence of well-defined termination points. At the termination point, i.e. once the match has been played, the fundamental value of the asset is known with certainty, so there is no need to use an asset-pricing model to estimate its value. Since no asset-pricing model is required, the joint hypothesis problem is not encountered and, consequently, any evidence of inefficiency can be interpreted as a deviation from the EMH as opposed to potentially being the result of a poor asset-pricing model. This enables football betting markets to act as a laboratory setting for efficiency studies (Rajkovik, 2013).

According to researchers, the existence of a terminal point in a sports betting market enables a more productive and clearer learning process and should, therefore, promote informational efficiency (Paton & Williams, 2002). However, numerous efficiency studies in football betting markets have discovered exploitable market inefficiencies despite employing different testing methods (Cain, Law & Peel, 2000; Goddard & Asimakopoulos, 2004; Constantinou & Fenton, 2013; Keating, 2015). This contrary evidence, therefore, is of great interest and provides the motivation for this paper. The study explores whether the most recently discovered glaring violations of efficiency within the English Premier League, by Keating (2015), persist over prolonged periods of time, thus implying that these markets are indeed inefficient, or whether they can be classified as temporary anomalies that disappear over time, which would enable one to conclude that the markets are in fact efficient in the long run.

Statistical tests indicate that prices, i.e. bookmakers’ odds, are efficient predictors of match results and the forecasting model developed in this paper is not able to aggregate publicly available information more efficiently than the bookmakers. In other words, bookmakers’ odds seem to reflect all relevant public information. This implies that the EMH holds within the football betting market. Economic tests comparable to those used by Keating (2015) suggest that positive returns can be generated during the 2013-14 season but not consistently across the four consecutive seasons investigated in this paper. The possibility of generating positive returns using simple strategies like those presented in this paper suggests that there are inefficiencies, but that they do not persist over time. This paper identifies a new betting strategy that is able to exploit inefficiencies while consistently producing positive returns during each of the football seasons that are investigated. However, further expanding the investigation time-frame may indicate that the strategy is not robust. Therefore, this finding is not deemed sufficient to reject the EMH.


The remainder of this paper is arranged as follows. Section 2 reviews the existing literature on the subject matter. Section 3 describes the data employed in this study, the methodology used for a statistical analysis and the results that are obtained. Section 4 reviews efficiency through an economic perspective. Section 5 provides concluding remarks.

2. Literature review

The empirical work on testing the EMH within the football betting market domain generally entails a forecasting model that is built using publicly available information and is used to predict football match outcomes. After establishing a forecasting model, researchers test efficiency through two approaches: a statistical approach and an economic approach. The remainder of this section elaborates on each testing method and provides an overview of the findings from each method. The section closes with remarks regarding the gap in the literature.

During a statistical analysis of the EMH, researchers investigate whether bookmakers’ odds efficiently aggregate all available, relevant public information. When conducting such an analysis, researchers establish a benchmark model where football match outcomes are regressed on bookmakers’ odds. Next, a series of regression-based tests are done using bookmakers’ odds and other potential predictors of match outcomes. These predictors tend to incorporate the information contained within the forecasting model, which is generated as part of the analysis. The idea is that if the additional predictors are found to be statistically significant in predicting match results, then it would be possible to deduce that the market is inefficient as the odds do not aggregate all relevant public information.

The evidence found from statistical analysis of the EMH is mixed. Pope and Peel (1989) find that bookmakers’ odds are efficient at aggregating public information. However, Kuyper (2000) finds statistical evidence against efficiency. Further research by Forrest et al. (2005) illustrates that over time bookmakers’ odds become more efficient at aggregating public information. Likewise, Nyberg (2014) finds some evidence of inefficiency within the second half of the English Premier League’s football season, but confirms that this disappears when looking at the entire season. Keating (2015) finds some inefficiencies at the beginning of the season and an increasing amount within the second half of the season. Furthermore, his results indicate that the inefficiencies do not disappear when looking at the full season.

During an economic analysis of the EMH, researchers investigate whether it is possible to “beat” the market and yield excess returns by trading on the basis of the predictive power of forecasting models that are constructed using publicly available information. The idea is that in an efficient market a forecasting model should not be economically important and should not enable one to establish profitable betting strategies. With this in mind, researchers have investigated the potential return of numerous betting strategies and the potential arbitrage opportunities from differentials in the odds set by diverse bookmakers. For example, Pope and Peel (1989) investigate whether betting according to the match outcome predictions of a forecasting model would reduce losses and possibly generate profits compared to betting on the basis of the information contained within bookmakers’ odds. The returns are computed based on the amount that would have been wagered by betting on the basis of the forecasting model’s predictions or the bookmakers’ odds and the payoff that this would have generated, which depends on the realised match outcomes.

The results from economic analysis of the EMH are also mixed. Pope and Peel (1989) find that their trading rules could not generate abnormal returns and thus conclude that the football betting market is efficient. However, due to the statistical evidence of inefficiencies in the prices for draw outcomes, the authors also state that it may be possible to find strategies that reduce betting losses, which could be significant to the individual bettor in terms of expected utility. Dixon and Coles (1997) and Kuyper (2000) find that their forecasting models generate positive abnormal returns when used as the basis of a betting strategy, thus providing evidence against the EMH. Others find that inefficiencies within the market disappear over time, thus supporting the EMH (Forrest and Simmons, 2001; Forrest et al., 2005).

Most EMH studies within the football betting market are based on football data prior to the year 2000. Xu (2011) and Keating (2015) use more recent data in their analysis but both focus on a limited time span. The authors build their forecasting models using one and three years of data, respectively. This is much shorter than the time spans that previous studies have focused on, which range from five to fifteen years. Furthermore, the authors test the performance of their models on one season only, which can be misleading. This paper addresses both of these issues by studying the EMH through a forecasting model that is developed on the basis of more than ten years of recent data, i.e. data from the year 2000 onwards, and by investigating the model’s performance over four consecutive seasons.

3. Testing the EMH – A statistical approach

In terms of the football betting market, the EMH implies that the market is efficient if all publicly available information is fully reflected in the market prices, i.e. the bookmakers’ odds. Bookmakers set their odds on the basis of their beliefs about the probabilities of the match outcomes, which are in turn motivated by the technical analysis that they conduct. If the set odds fully reflect all available information, then it should not be possible for an informed bettor to aggregate the available information more efficiently than the bookmakers and conduct a technical analysis that yields more accurate predictions of match outcomes. The following section investigates this using a statistical approach that comprises an ordered Probit forecasting model and two binary Ordinary Least Squares (OLS) models. Studies by Goddard and Asimakopoulos (2004) and Forrest et al. (2005) used such models when investigating market efficiency using football data until the 2000s. More recently, Keating (2015) used such models when investigating efficiency using a smaller, but more recent, sample of data. Adopting the same type of models ensures consistency and, therefore, enables a comparison between the efficiency results of this study and those of previous studies, whilst making it feasible to determine whether expanding the sample size affects the inferences regarding efficiency. In other words, adopting these models makes it possible to determine whether the inefficiencies discovered by Keating (2015) persist over time.

Section 3.1 describes the data used. Section 3.2 presents the forecasting model that is built using publicly available information. Section 3.3 describes how bookmakers’ odds are converted into implicit probabilities. Section 3.4 uses the forecasting model and bookmakers’ implicit probabilities to test the EMH via the binary OLS models and reports the findings.

3.1 Data set

The data used to develop the parameters for the ordered Probit forecasting model is obtained from an online football archive developed and monitored by Buchdahl (2001). This paper also uses the official data published on the Barclays Premier League website (Barclays Premier League, 1992) and the FA Cup website (The FA, n.d.). For the distance parameter, the address of each team’s football ground is obtained from Clarkson’s online ground mapping tool (Football Ground Map, 2007). This information is used in conjunction with the Google Maps Distance Matrix API to compute the geographical distances between the grounds of the home and the away team. The big team effect parameters are computed using match attendance data obtained from European Football Statistics (n.d.). The paper studies the betting odds for home win, draw and away win outcomes from seven bookmakers – Bet 365 (B365), Bet & Win (BW), Interwetten (IW), Ladbrokes (LB), Stan James (SJ), VC Bet (VC) and William Hill (WH). This data is obtained from the online football archive developed and monitored by Buchdahl (2001). The reason for focusing on these bookmakers is that the accessible data is most complete for this set of bookmakers only.


3.2 Forecasted probabilities using observed characteristics

The ordered Probit forecasting model developed in this section is used to form predictions for the English Premier League match outcomes during the seasons 2012-13, 2013-14, 2014-15 and 2015-16. In each case the football data from the preceding 12 years is used to estimate the model. To illustrate, the data from seasons 2000-01 to 2011-12 is used to develop the forecasting model that is used to predict the match outcomes during the 2012-13 season. Subsequently, a rolling window approach is applied, where the window is rolled forward by one season at a time.
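The rolling window described above can be illustrated with a short sketch. It is only an illustration of how the estimation and forecast seasons line up; the function names `season_label` and `rolling_windows` are hypothetical and do not come from the paper.

```python
def season_label(start_year: int) -> str:
    """Format a season starting in `start_year`, e.g. 2000 -> '2000-01'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def rolling_windows(first_test_year: int, last_test_year: int, window: int = 12):
    """Yield (estimation seasons, forecast season) pairs.

    Each forecast season is paired with the `window` seasons that precede it,
    and the window moves forward by one season at a time.
    """
    for test_year in range(first_test_year, last_test_year + 1):
        train = [season_label(y) for y in range(test_year - window, test_year)]
        yield train, season_label(test_year)


# Forecast seasons 2012-13 to 2015-16, each with 12 preceding seasons.
windows = list(rolling_windows(2012, 2015))
for train, test in windows:
    print(f"estimate on {train[0]} to {train[-1]}, forecast {test}")
```

The first window pairs the seasons 2000-01 to 2011-12 with the 2012-13 forecast season, as in the text.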

The probability of a particular match outcome is a latent construct, as it cannot be measured directly. However, these probabilities are influenced by observable variables such as the team’s recent match results. In this paper, the unobserved (latent) variable $y^{*}_{i,j,c}$, for a match observation, c, played by the home team, i, and the away team, j, is said to be a linear function of a range of systematic influences that are relevant for predicting match results and is determined as follows:

$$
\begin{aligned}
y^{*}_{i,j,c} ={}& P^{d}_{i,t,s,c}\,(\beta_{1}, \dots, \beta_{8})' + P^{d}_{j,t,s,c}\,(\beta_{9}, \dots, \beta_{16})' \\
&+ R^{H}_{i,c-m}\,(\beta_{17}, \dots, \beta_{25})' + R^{A}_{i,c-n}\,(\beta_{26}, \dots, \beta_{29})' \\
&+ R^{H}_{j,c-n}\,(\beta_{30}, \dots, \beta_{33})' + R^{A}_{j,c-m}\,(\beta_{34}, \dots, \beta_{42})' \\
&+ \beta_{43} AP_{i,2} + \beta_{44} \Delta AP_{i,1} + \beta_{45} AP_{j,2} + \beta_{46} \Delta AP_{j,1} \\
&+ \beta_{47} SIG_{i,c} + \beta_{48} SIG_{j,c} + \beta_{49} CUP_{i,c} + \beta_{50} CUP_{j,c} + \beta_{51} DIST_{i,j} + u_{i,j}
\end{aligned}
\tag{1}
$$

(i, j = 1, 2, …, n) and (c = 1, 2, …, n), where each bracketed term, e.g. $(\beta_{1}, \dots, \beta_{8})'$, is a column vector of coefficients.

The error term $u_{i,j}$ is assumed to be independent and identically normally distributed. The dependent variable $y_{i,j,c}$ indicates all three possible match outcomes: ‘home win’ ($y_{i,j,c} = 1$), ‘draw’ ($y_{i,j,c} = 0.5$) or ‘away win’ ($y_{i,j,c} = 0$). These football match outcomes are determined by the latent variable $y^{*}_{i,j,c}$ through a three-outcome discrete choice ordered Probit model specified below:

$$
y_{i,j,c} =
\begin{cases}
1, & \text{if } \mu_{2} < y^{*}_{i,j,c} \\
0.5, & \text{if } \mu_{1} < y^{*}_{i,j,c} < \mu_{2} \\
0, & \text{if } y^{*}_{i,j,c} < \mu_{1}
\end{cases}
\tag{2}
$$

where $\mu_{1}$ and $\mu_{2}$ represent the cut-off points for the adjacent levels of the dependent variable.

The explanatory variables used in the forecasting model were selected in order to capture a variety of key factors such as team quality, recent performance, home ground advantage, etc. A detailed description of these explanatory variables, as well as a discussion of their relevance, can be found in Appendix A. The estimation results are reported in Table 1.

After estimating the coefficients implied by equations (1) and (2), the estimates are substituted into the ordered Probit model in order to generate fitted values for the matches during the seasons 2012-13 to 2015-16. These fitted values are denoted by $\hat{y}^{*}_{i,j}$ and are converted into probabilities using the following approach:

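$$
\begin{aligned}
\text{Home win:} \quad & P^{H}_{i,j} = 1 - \Phi(\mu_{2} - \hat{y}^{*}_{i,j}) \\
\text{Draw:} \quad & P^{D}_{i,j} = \Phi(\mu_{2} - \hat{y}^{*}_{i,j}) - \Phi(\mu_{1} - \hat{y}^{*}_{i,j}) \\
\text{Away win:} \quad & P^{A}_{i,j} = \Phi(\mu_{1} - \hat{y}^{*}_{i,j})
\end{aligned}
\tag{3}
$$

where $P^{r}_{i,j}$ denotes the forecasted probability for match outcome r, for r = H (home win), D (draw) or A (away win), and is computed using the standard normal cumulative distribution function, $\Phi$, and the cut-off points $\mu_{1}$ and $\mu_{2}$.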
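As a minimal sketch of equation (3), the conversion from a fitted value to the three outcome probabilities can be written as follows. The cut-off values are the estimates reported in Table 1; the fitted values passed to the function are arbitrary illustrative numbers, and the function names are hypothetical.

```python
from math import erf, sqrt

# Cut-off estimates reported in Table 1 (2000-01 to 2011-12 estimation window).
MU1, MU2 = -0.417, 0.341


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def outcome_probabilities(y_hat: float, mu1: float = MU1, mu2: float = MU2) -> dict:
    """Apply equation (3) to a fitted latent value y_hat."""
    p_away = norm_cdf(mu1 - y_hat)
    p_draw = norm_cdf(mu2 - y_hat) - norm_cdf(mu1 - y_hat)
    p_home = 1.0 - norm_cdf(mu2 - y_hat)
    return {"H": p_home, "D": p_draw, "A": p_away}


# Illustrative fitted values only; they are not taken from the paper.
for y_hat in (-0.5, 0.0, 0.5):
    probs = outcome_probabilities(y_hat)
    print(y_hat, {k: round(v, 3) for k, v in probs.items()})
```

By construction the three probabilities sum to one, and a larger fitted value shifts probability mass from the away win towards the home win.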


Table 1: Ordered Probit Forecasting Model coefficients (estimated based on data from 2000-01 to 2011-12 inclusive)

1. Win ratios across preceding 24 months ($P^{d}_{i,t,s,c}$ and $P^{d}_{j,t,s,c}$)

Home Team (i)

| Matches played | 0-12 months (t=0), current season (s=0) | 0-12 months (t=0), previous season (s=1) | 12-24 months (t=1), previous season (s=1) | 12-24 months (t=1), two seasons ago (s=2) |
|---|---|---|---|---|
| Current division (d=0) | 0.272 (0.142) | 0.693*** (0.129) | 0.222 (0.131) | 0.377** (0.122) |
| One division lower (d=-1) | | 0.085 (0.254) | 0.444 (0.256) | -0.007 (0.109) |
| Two divisions lower (d=-2) | | | | -0.818* (0.413) |

Away Team (j)

| Matches played | 0-12 months (t=0), current season (s=0) | 0-12 months (t=0), previous season (s=1) | 12-24 months (t=1), previous season (s=1) | 12-24 months (t=1), two seasons ago (s=2) |
|---|---|---|---|---|
| Current division (d=0) | -0.272 (0.144) | -0.454*** (0.129) | -0.393** (0.132) | -0.179 (0.121) |
| One division lower (d=-1) | | -0.162 (0.256) | -0.288 (0.260) | 0.081 (0.111) |
| Two divisions lower (d=-2) | | | | -0.500 (0.400) |

2. Recent performance ($R^{H}_{i,c-m}$; $R^{A}_{i,c-n}$; $R^{H}_{j,c-n}$; $R^{A}_{j,c-m}$)

| Number of matches ago (m, n) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Home Team (i), home matches | 0.064 (0.045) | -0.017 (0.044) | 0.040 (0.044) | 0.121** (0.044) | 0.041 (0.044) | 0.033 (0.044) | 0.042 (0.044) | -0.031 (0.044) | -0.006 (0.044) |
| Home Team (i), away matches | 0.039 (0.047) | 0.011 (0.045) | 0.076 (0.045) | -0.020 (0.045) | | | | | |
| Away Team (j), home matches | -0.055 (0.047) | -0.002 (0.045) | -0.044 (0.044) | 0.007 (0.044) | | | | | |
| Away Team (j), away matches | -0.061 (0.046) | -0.052 (0.045) | -0.126** (0.044) | -0.027 (0.044) | -0.033 (0.044) | -0.028 (0.044) | -0.056 (0.044) | -0.069 (0.044) | -0.071 (0.044) |

3. Additional explanatory variables

| Variable | Coefficient |
|---|---|
| AP2i | 0.261*** (0.070) |
| AP1i | 0.035 (0.153) |
| AP2j | -0.242*** (0.070) |
| AP1j | 0.070 (0.155) |
| SIGi | -0.0344 (0.050) |
| SIGj | -0.018 (0.049) |
| CUPi | 0.050 (0.053) |
| CUPj | -0.055 (0.054) |
| DISTi,j | 0.049** (0.016) |

4. Cut off parameters

| Parameter | Estimate |
|---|---|
| $\mu_{1}$ | -0.417* (0.166) |
| $\mu_{2}$ | 0.341* (0.166) |


3.3 Implicit probabilities from bookmakers’ odds

The bookmakers’ odds within the English football betting market are known as decimal odds; they include the bettor’s stake as well as the payoff that could be won. Bookmakers set a variety of odds but the most basic ones are odds for a home win, a draw, and an away win. The quoted prices can be converted into implicit probabilities, $\phi^{r}_{i,j}$, by first converting the odds into actual probabilities, $\theta^{r}_{i,j}$, and then normalizing to ensure that they add up to 100%. The implicit probabilities would therefore be

$$
\phi^{H}_{i,j} = \frac{\theta^{H}_{i,j}}{\theta^{H}_{i,j} + \theta^{D}_{i,j} + \theta^{A}_{i,j}}, \qquad
\phi^{D}_{i,j} = \frac{\theta^{D}_{i,j}}{\theta^{H}_{i,j} + \theta^{D}_{i,j} + \theta^{A}_{i,j}}, \qquad
\phi^{A}_{i,j} = \frac{\theta^{A}_{i,j}}{\theta^{H}_{i,j} + \theta^{D}_{i,j} + \theta^{A}_{i,j}}
$$

for a home win, a draw and an away win, respectively.

To illustrate, consider the match between Chelsea and Arsenal on 5th October 2014. The odds by the bookmaker Bet 365 are listed in Table 2. Given these odds, if a bettor were to place a £1 bet on Chelsea winning the match, then the bettor would receive a total payoff of £1.67 for that bet if Chelsea wins. This payoff comprises the £1 stake, which is refunded to the bet winner, as well as a £0.67 profit.

Table 2: Bet 365’s odds for Chelsea vs Arsenal match on 5th October 2014

| Outcome | Odds | Potential Profit per £1 stake |
|---|---|---|
| Chelsea to win | 1.67 | £0.67 |
| Draw | 4.20 | £3.20 |
| Arsenal to win | 5.25 | £4.25 |

Odds convey the bookmakers’ beliefs regarding the match outcome. In the example of Chelsea and Arsenal, the odds indicate that the bookmaker believes that Chelsea is likely to win, so the payoff offered is lower than for the outcomes that the bookmaker considers less likely. These odds can be converted to actual probabilities by taking their inverse. To illustrate, the odds in Table 2 imply that the probability of Chelsea winning is $\theta^{H}_{i,j}$ = 59.88% (=1/1.67), the probability of a draw is $\theta^{D}_{i,j}$ = 23.81% (=1/4.2), and the probability of Arsenal winning is $\theta^{A}_{i,j}$ = 19.05% (=1/5.25). However, these probabilities add up to 102.74%. The percentage exceeding 100%, in this case 2.74%, is known as the bookmaker’s margin; this is somewhat of a hidden transaction cost and is the manner in which bookmakers finance their services. Normalizing these probabilities yields the implicit probabilities. In the Chelsea and Arsenal example this translates into

$$
\phi^{H}_{i,j} = \frac{0.5988}{0.5988 + 0.2381 + 0.1905} = 0.5828, \qquad
\phi^{D}_{i,j} = \frac{0.2381}{0.5988 + 0.2381 + 0.1905} = 0.2318, \qquad
\phi^{A}_{i,j} = \frac{0.1905}{0.5988 + 0.2381 + 0.1905} = 0.1854,
$$

which add up to 1.
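The conversion from decimal odds to implicit probabilities can be sketched in a few lines. The function name `implicit_probabilities` is hypothetical; the odds are the Bet 365 odds from Table 2.

```python
def implicit_probabilities(odds: dict) -> tuple:
    """Convert decimal odds into normalized implicit probabilities.

    Returns the implicit probabilities (phi) and the bookmaker's margin,
    i.e. the amount by which the inverse odds (theta) exceed 1 in total.
    """
    theta = {outcome: 1.0 / price for outcome, price in odds.items()}
    total = sum(theta.values())
    phi = {outcome: value / total for outcome, value in theta.items()}
    return phi, total - 1.0


# Bet 365 odds for Chelsea vs Arsenal on 5th October 2014 (Table 2).
phi, margin = implicit_probabilities({"H": 1.67, "D": 4.20, "A": 5.25})
print({k: round(v, 4) for k, v in phi.items()}, round(margin, 4))
```

Up to rounding, this reproduces the worked example: implicit probabilities of about 0.5828, 0.2318 and 0.1854, and a margin of about 2.74%.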


3.4 Methodology and interpretations of the EMH statistical test

Once the implicit probabilities have been generated, the EMH can be tested using binary OLS models. For each football season, first, a benchmark model is produced whereby the match outcomes are regressed on the probabilities implied by the bookmakers’ odds. The motivation for this is to demonstrate that bookmakers’ odds are statistically significant predictors of match outcomes, which is necessary before one can continue to investigate whether the odds fully reflect all relevant available information. The model takes on the following form:

$$
r_{i,j} = \alpha_{r} + \beta_{r}\,\phi^{r}_{i,j} + u_{i,j},
\tag{4}
$$

where $r_{i,j}$ takes on a binary value: it is 1 if the match outcome is r, for r = H, a home win; D, a draw; or A, an away win, and 0 otherwise. Hence, the regression is repeated three times, once for each possible match outcome. The independent variable $\phi^{r}_{i,j}$ is the implicit probability for the match outcome r.

Next, a second binary OLS model is developed. This model takes the following form:

$$
r_{i,j} = \alpha_{r} + \beta_{r}\,\phi^{r}_{i,j} + \gamma_{r}\,x^{r}_{i,j} + u_{i,j},
\tag{5}
$$

where $x^{r}_{i,j} = P^{r}_{i,j} - \phi^{r}_{i,j}$. This captures the difference between the information contained within the forecasting model, from section 3.2, in the form of the predicted result probabilities, $P^{r}_{i,j}$, and that contained in the bookmakers’ odds in the form of implicit probabilities, $\phi^{r}_{i,j}$. Once again, this regression is conducted for r = H, a home win; D, a draw; or A, an away win.
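To make the two regressions concrete, the sketch below estimates equations (4) and (5) for a single outcome on synthetic data. Everything in it is an assumption made for illustration: the data are simulated so that the outcome occurs with exactly the implicit probability (an efficient market by construction), the hypothetical forecast is the implicit probability plus noise, and the small `ols` helper solves the normal equations directly rather than using a statistics library. It is not the estimation code used in the paper.

```python
import random


def ols(y, columns):
    """OLS with an intercept, solved via the normal equations (X'X) b = X'y."""
    rows = [[1.0, *values] for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    aug = [xtx[a] + [xty[a]] for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[a][k] / aug[a][a] for a in range(k)]


random.seed(0)
n = 50_000
# Synthetic implicit probabilities (phi) for one outcome, e.g. a home win.
phi = [random.uniform(0.2, 0.8) for _ in range(n)]
# Hypothetical model forecast P: phi plus noise that carries no information.
forecast = [p + random.gauss(0.0, 0.05) for p in phi]
# Outcome indicator drawn with probability phi: efficient by construction.
outcome = [1.0 if random.random() < p else 0.0 for p in phi]
x = [f - p for f, p in zip(forecast, phi)]

alpha4, beta4 = ols(outcome, [phi])               # equation (4)
alpha5, beta5, gamma5 = ols(outcome, [phi, x])    # equation (5)
print(f"eq (4): alpha={alpha4:.3f}, beta={beta4:.3f}")
print(f"eq (5): alpha={alpha5:.3f}, beta={beta5:.3f}, gamma={gamma5:.3f}")
```

In this simulated efficient market the slope on the implicit probability should be close to one and the coefficient on the forecast difference close to zero, which is the pattern the EMH predicts for equations (4) and (5).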

The binary OLS regression estimates of equation (4) are reported in the odd-numbered columns of Table 3. For convenient comparison, the estimates of equation (5) are reported in the even-numbered columns of Table 3. It must be noted that both models are based on the probabilities implied by the odds set by bookmaker Bet 365. The reason for focusing on one bookmaker’s odds is that the bookmakers’ implicit probabilities for each match outcome are almost perfectly correlated (see Appendix B), hence one can assume that the estimates resulting from the analysis of one bookmaker would be representative of the rest of the bookmakers.

When analysing the results reported in Table 3, hypothesis (6) is examined in order to determine whether the bookmakers’ odds are statistically significant predictors of match outcomes and hypothesis (7) is examined to find out whether the forecasting model from section 3.2 contains relevant information that has not been incorporated within the bookmakers’ odds:

$$
H_{0}: \beta_{r} = 0 \quad \text{vs} \quad H_{1}: \beta_{r} \neq 0
\tag{6}
$$

$$
H_{0}: \gamma_{r} = 0 \quad \text{vs} \quad H_{1}: \gamma_{r} \neq 0
\tag{7}
$$

Under the EMH, it is expected that $\beta_{r} \neq 0$ while $\gamma_{r} = 0$. The results indicate that all three coefficients, $\beta_{H}$, $\beta_{D}$ and $\beta_{A}$, are statistically significant. Therefore, the null hypothesis, $H_{0}$, from hypothesis (6), can be rejected for each outcome. There is not sufficient evidence to reject the null from hypothesis (7). The reported $R^{2}$ statistics are rather low; however, this is to be expected when the dependent variable is binary.

Table 3: Binary OLS estimation results – Testing EMH¹

Home Win

| | 2012-2013 (1) | 2012-2013 (2) | 2013-2014 (3) | 2013-2014 (4) | 2014-2015 (5) | 2014-2015 (6) | 2015-2016 (7) | 2015-2016 (8) |
|---|---|---|---|---|---|---|---|---|
| $\beta_{H}$ | 1.198*** (0.128) | 1.260*** (0.133) | 1.102*** (0.115) | 1.091*** (0.124) | 1.001*** (0.130) | 1.029*** (0.134) | 0.908*** (0.144) | 0.922*** (0.155) |
| $\gamma_{H}$ | | 0.443 (0.273) | | -0.071 (0.290) | | 0.202 (0.233) | | 0.061 (0.249) |
| Constant | -0.110 (0.063) | -0.144* (0.066) | -0.025 (0.057) | -0.020 (0.062) | 0.002 (0.063) | -0.015 (0.066) | 0.010 (0.068) | 0.002 (0.075) |
| $R^{2}$ | 0.188 | 0.194 | 0.197 | 0.197 | 0.136 | 0.138 | 0.096 | 0.096 |
| adj. $R^{2}$ | 0.186 | 0.189 | 0.195 | 0.193 | 0.134 | 0.134 | 0.093 | 0.091 |

Draw

| | 2012-2013 (1) | 2012-2013 (2) | 2013-2014 (3) | 2013-2014 (4) | 2014-2015 (5) | 2014-2015 (6) | 2015-2016 (7) | 2015-2016 (8) |
|---|---|---|---|---|---|---|---|---|
| $\beta_{D}$ | 2.307*** (0.547) | 2.496*** (0.572) | 1.142** (0.408) | 1.441** (0.466) | 1.155* (0.489) | 1.349* (0.523) | 1.178* (0.583) | 1.621* (0.648) |
| $\gamma_{D}$ | | 0.942 (0.828) | | 0.992 (0.753) | | 0.712 (0.682) | | 1.255 (0.812) |
| Constant | -0.294* (0.139) | -0.347* (0.147) | -0.077 (0.103) | -0.161 (0.121) | -0.046 (0.125) | -0.096 (0.134) | -0.022 (0.152) | -0.143 (0.170) |
| $R^{2}$ | 0.045 | 0.048 | 0.020 | 0.025 | 0.015 | 0.017 | 0.011 | 0.017 |
| adj. $R^{2}$ | 0.042 | 0.043 | 0.018 | 0.020 | 0.012 | 0.012 | 0.008 | 0.012 |

Away Win

| | 2012-2013 (1) | 2012-2013 (2) | 2013-2014 (3) | 2013-2014 (4) | 2014-2015 (5) | 2014-2015 (6) | 2015-2016 (7) | 2015-2016 (8) |
|---|---|---|---|---|---|---|---|---|
| $\beta_{A}$ | 1.066*** (0.137) | 1.035*** (0.145) | 1.092*** (0.124) | 1.007*** (0.140) | 1.033*** (0.137) | 1.061*** (0.145) | 0.801*** (0.156) | 0.827*** (0.174) |
| $\gamma_{A}$ | | -0.194 (0.297) | | -0.409 (0.313) | | 0.158 (0.261) | | 0.097 (0.285) |
| Constant | -0.034 (0.046) | -0.029 (0.046) | -0.007 (0.043) | 0.009 (0.045) | -0.005 (0.047) | -0.010 (0.047) | 0.067 (0.052) | 0.061 (0.054) |
| $R^{2}$ | 0.139 | 0.140 | 0.171 | 0.175 | 0.131 | 0.132 | 0.065 | 0.066 |
| adj. $R^{2}$ | 0.137 | 0.135 | 0.169 | 0.170 | 0.129 | 0.127 | 0.063 | 0.061 |

¹ Robustness checks using a Probit specification lead to identical results, including the decrease in significance for draw outcomes.

For home win and away win outcomes, the bookmakers’ odds are statistically significant predictors of match outcomes at the 1% level. For draw outcomes the significance decreases over time, suggesting that bookmakers are becoming less efficient at predicting draw outcomes. This result is interesting because Forrest et al. (2005) find a contrasting development using a similar analytical approach but with data prior to the 2000s. Their results indicate that bookmakers’ odds become more statistically significant over time, and the authors therefore conclude that inefficiencies within the market disappear over time. Both Goddard and Asimakopoulos (2004) and Keating (2015) speculate that there may be a cyclical trend of rising and diminishing inefficiencies within the market. The results of this section provide further motivation to investigate this.

For each match outcome, the benchmark binary models, equation (4), reported in the odd-numbered columns of Table 3 are compared to the models in the even-numbered columns of Table 3, equation (5), using a likelihood ratio test. The test is not significant for any outcome, which indicates that the variable $x^{r}_{i,j}$, whose coefficient is $\gamma_{r}$, does not contain additional relevant explanatory information that is not already incorporated within the bookmakers’ odds. This suggests that the EMH holds.

4. Reviewing efficiency through an economic approach

Researchers have suggested that conducting a technical analysis, which entails creating a forecasting model on the basis of publicly available data and using it in conjunction with different betting strategies to determine the potential returns that could have been generated, can provide some insight into market efficiency (Pope & Peel, 1989; Goddard & Asimakopoulos, 2004; Forrest et al., 2005; Keating, 2015). This approach is often called the economic approach for investigating efficiency and is based on the idea that, if a market is efficient, an informed investor should not be able to develop a model that yields profitable trading rules. Having already developed a forecasting model in section 3.2, this paper reviews efficiency through the economic approach as well. The following section presents several basic betting strategies that are implemented conditional on the forecasting model from section 3.2.

First, two very basic betting strategies are reviewed. Strategy 1 entails placing a £1 bet on all possible results (home win, draw, away win). For every season, the returns are calculated for each bookmaker individually². Strategy 2 entails placing a £1 bet with the best available odds for each outcome and every match (Forrest et al., 2005). Both strategies are applied to every match in seasons 2010-11 to 2015-16. Applying such naïve strategies over several years is expected to provide some indication of how exploitable bookmakers’ odds are from one year to the next. Accordingly, the approach should provide further insight into the potential cyclical trend in the efficiency of the football betting market. The results of Strategy 1 are reported in Panel 1 of Table 4 and the results of Strategy 2 are reported in Panel 2 of Table 4.

² The return for betting with Stan James during the 2014-15 and 2015-16 seasons has been excluded

The basic strategy of placing a £1 bet on every possible match outcome for every match, i.e. Strategy 1, generally yields negative returns in each season, except when the bets are placed with VC Bet during the 2010-11 and 2011-12 seasons, when this strategy generates a positive return of up to 0.61%. The results of Strategy 2 indicate that it is possible to generate positive returns of 1%-2% during the 2010-11 and 2011-12 seasons. These results could be viewed as a sign of inefficiencies.

Both strategies yield negative returns during the 2012-13 and 2013-14 seasons, which is expected in efficient markets. Interestingly, the magnitude of the losses made by both strategies falls during the 2014-15 season. The yields from Strategy 1 improve during the 2015-16 season; a bettor could have achieved positive returns of 1%-2% with the bookmakers Bet 365 and VC Bet. Using Strategy 2 would have generated a 3.83% return. These results may suggest that, after two seasons of efficiency, inefficiencies re-appear. However, looking at the results across the six seasons, it is clear that on average the bookmakers’ odds are not exploitable through Strategy 1 or Strategy 2, which suggests that the market is efficient. From the results of Strategies 1 and 2, there is no apparent cyclical trend in market efficiency in the time span under investigation.

The next strategies are established in order to gain insight into the predictive power of the forecasting model developed in section 3.2, and into the market’s efficiency. In Strategy 3, a £1 bet is placed on an outcome with a given bookmaker if the forecasted probability for that outcome exceeds the probability implied by that bookmaker’s odds. In Strategy 4, a £1 bet is placed on an outcome with the bookmaker offering the best price, given that the forecasted probability for that outcome exceeds the probability implied by the bookmaker’s odds. The results are reported in Panels 3 and 4 of Table 4.³ The returns from Strategy 4 are disaggregated into quarterly and monthly periods to enable further investigation into the potential cyclical trend of efficiency within this market. Finally, Strategy 5 is devised to further investigate the model’s profitability. Here, a £1 bet is placed on one of the three possible match outcomes, based on the model’s prediction of the match result, with the bookmaker that offers the best price. The returns are reported in Panel 5 of Table 4.

³ These strategies are not tested on the 2010-11 and 2011-12 seasons because the forecasting model developed during this study uses the 12 preceding seasons’ data for each forecast. So, applying these strategies to the 2011-12 season would require the use of data from 1999-00 to 2010-11, but this study has only focused on data since 2000.

Table 4: The average returns of different betting strategies

               2010-11   2011-12   2012-13   2013-14   2014-15   2015-16

Strategy 1: Betting on all outcomes with each bookmaker

B365            -1.99%    -2.72%    -6.17%    -4.98%    -2.72%     1.80%
BW              -4.29%    -3.85%    -8.10%    -9.68%    -7.21%    -4.21%
IW              -8.05%   -10.06%   -11.12%   -11.66%    -9.38%    -6.14%
LB              -3.47%    -5.47%    -9.47%    -9.05%    -5.59%    -1.44%
SJ              -2.47%    -3.36%    -7.25%    -8.32%
VC               0.61%     0.04%    -5.08%    -5.27%    -2.44%     2.07%
WH              -4.00%    -5.50%    -9.07%    -9.10%    -7.11%    -2.90%

Strategy 2: Betting on all outcomes with the bookmaker offering the best price

Best Price       1.87%     1.10%    -3.44%    -3.26%    -0.87%     3.83%

Strategy 3: Betting on all outcomes, with each bookmaker, given that the model’s forecasted probability exceeds the bookmaker’s implicit probability

B365                                -2.92%    -8.71%    -8.39%     4.58%
BW                                  -1.96%   -13.24%   -13.99%    -3.79%
IW                                  -6.49%   -16.60%   -14.74%    -6.25%
LB                                  -5.48%   -10.79%   -10.73%    -1.00%
SJ                                  -2.88%   -12.30%
VC                                  -0.50%    -6.96%   -11.38%     5.04%
WH                                  -2.16%    -9.66%    -7.65%    -2.23%

Strategy 4: Betting on all outcomes, with the bookmaker offering the best price, given that the model’s forecasted probability exceeds the bookmaker’s implicit probability

Best Price                           0.86%    -7.68%    -5.02%     5.90%

Disaggregated average returns from Strategy 4

AUG-OCT                              1.17%   -10.50%     2.35%     7.85%
NOV-DEC                              0.55%    -9.08%    -2.41%    22.86%
JAN-FEB                             -0.50%   -13.25%    -5.51%     3.44%
MAR-MAY                              1.98%     1.08%   -14.17%    -6.80%

AUG                                 -0.85%   -24.59%    20.41%    28.68%
SEPT                                 8.88%     2.81%     3.20%    -3.99%
OCT                                 -5.87%   -13.25%   -16.20%    -5.43%
NOV                                 11.79%    20.54%    -0.56%     9.97%
DEC                                -12.13%   -27.66%    -3.66%    30.90%
JAN                                 11.71%   -24.39%   -16.48%     4.48%
FEB                                -14.22%    -3.35%     4.18%     2.61%
MAR                                  0.00%   -10.94%   -17.35%     3.05%
APR                                 17.31%    12.08%    -1.37%   -17.67%
MAY                                -20.80%     1.69%   -25.50%     6.85%

Strategy 5: Betting on the outcome predicted by the model, with the bookmaker offering the best price
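The selection rule shared by Strategies 3 and 4 can be sketched as below. This is a hedged illustration rather than the procedure actually used in this paper: it assumes decimal odds and takes the implied probability as the plain reciprocal of the odds, without any adjustment for the bookmaker's margin. The function names and the example forecasts are hypothetical.

```python
# Hypothetical sketch of the "bet only when the model disagrees" filter.
# Assumption: implied probability = 1 / decimal odds (margin not removed).

def implied_probability(decimal_odds):
    return 1.0 / decimal_odds


def value_bets(forecast, odds):
    """Outcomes whose forecast probability exceeds the odds-implied probability."""
    return [o for o in forecast if forecast[o] > implied_probability(odds[o])]


def filtered_return(matches):
    """Average return from a unit stake on every outcome that passes the filter."""
    staked = 0.0
    payout = 0.0
    for match in matches:
        for outcome in value_bets(match["forecast"], match["odds"]):
            staked += 1.0
            if outcome == match["result"]:
                payout += match["odds"][outcome]
    return (payout - staked) / staked if staked else 0.0


# Made-up forecasts and odds for two matches.
matches = [
    {"result": "H",
     "forecast": {"H": 0.55, "D": 0.25, "A": 0.20},
     "odds": {"H": 2.0, "D": 3.4, "A": 4.0}},
    {"result": "A",
     "forecast": {"H": 0.30, "D": 0.30, "A": 0.40},
     "odds": {"H": 3.0, "D": 3.5, "A": 2.4}},
]

print(value_bets(matches[0]["forecast"], matches[0]["odds"]))
print(filtered_return(matches))
```

For Strategy 3 the `odds` entry would hold a single bookmaker's prices, while for Strategy 4 it would hold the best price available for each outcome.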

Strategy 3 generally yields negative returns except in season 2015-16, where it is possible to achieve a positive return of 4.5%-5% with bookmakers Bet 365 and VC Bet. Comparing the returns from Strategy 1 and Strategy 3 suggests that the forecasting model is able to dampen the losses during seasons 2012-13 and 2015-16, but not during the two seasons in between. Through Strategy 4, the model enables a 0.86% profit during the 2012-13 season, which is an improvement from the 3.44% loss generated by the comparable Strategy 2 during that season. The model also yields a positive return during season 2015-16 but makes a loss during seasons 2013-14 and 2014-15. This suggests that the model’s predictive power does not exceed that of the bookmakers’ odds during these seasons. Disaggregating the results shows no indication of a cyclical efficiency trend. Figure 1 illustrates this.

The results from Strategy 5 illustrate that during seasons 2013-14 and 2014-15 a bettor could have generated positive returns by betting according to the model’s predictions. However, betting according to Strategy 5 during seasons 2012-13 and 2015-16 would have yielded negative returns. On average, across the four seasons investigated, this strategy yields a negative return.

[Figure 1: Cyclical Trend in Efficiency. Potential return (vertical axis) for the periods AUG-OCT, NOV-DEC, JAN-FEB and MAR-MAY (horizontal axis), plotted for seasons 2012-2013 to 2015-2016.]

Figure 2 summarises the returns from Strategy 2, Strategy 4 and Strategy 5 of Table 4. A potential trading rule emerges from Figure 2: if the yield from betting on all outcomes (Strategy A) during season t-1 was below -1%, proceed to bet according to the model’s forecasts (Strategy C) during season t; if the yield from betting on all outcomes (Strategy A) during season t-1 exceeded -1%, proceed to bet according to Strategy B during season t. This could have generated an average annual return of 2.85% over the four seasons investigated in this paper (see Table 5). Consequently, there appears to be a profitable strategy, which yields positive returns consistently over the time span under investigation and is relatively easy to implement. Since markets are said to be efficient if it is not possible to consistently generate abnormal, i.e. positive, returns by trading on publicly available information, the identified profitable trading strategy would suggest that this market is not efficient. However, caution is required when deriving such conclusions, as there is no certainty that this strategy would consistently produce positive returns beyond the time frame investigated here. For the time being, the most that can be inferred from these results is that the market possibly contains some inefficiencies.

Table 5: Trading rule yield

Season (t)   Yield of Strategy A    Strategy to adopt    Yield during
             during season t-1      during season t      season t

2012-13       1.10%                 B                    0.86%
2013-14      -3.44%                 C                    4.47%
2014-15      -3.26%                 C                    0.17%
2015-16      -0.87%                 B                    5.90%

Average annual return: 2.85%

Note: The yield of Strategy A during season t-1 is derived from Panel 2 of Table 4.

Figure 2: Returns from different strategies, betting based on best available price

[Figure 2, titled "Potential Returns", plots the return (vertical axis) by season (horizontal axis) from 2012-2013 to 2015-2016 for three series. Strategy A: Bet on all outcomes. Strategy B: Bet if model's forecasted probability exceeds implicit probability of the odds. Strategy C: Bet according to model's forecasts.]

Data: For Strategy A refer to Strategy 2 of Table 4; for Strategy B refer to Strategy 4 of Table 4; for Strategy C refer to Strategy 5 of Table 4.
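The trading rule summarised in Table 5 can be replayed with a few lines of code. The sketch below is illustrative only: the yields are copied from Tables 4 and 5, the rule is read with the -1% cut-off that reproduces the choices shown in Table 5, and the variable and function names are hypothetical.

```python
# Yields in percent, copied from Table 4 (Panels 2 and 4) and Table 5.
STRATEGY_A = {"2011-12": 1.10, "2012-13": -3.44, "2013-14": -3.26, "2014-15": -0.87}
STRATEGY_B = {"2012-13": 0.86, "2013-14": -7.68, "2014-15": -5.02, "2015-16": 5.90}
# Only the Strategy C yields that appear in Table 5 are included here.
STRATEGY_C = {"2013-14": 4.47, "2014-15": 0.17}

THRESHOLD = -1.0  # percent; the cut-off quoted in the text
SEASONS = ["2011-12", "2012-13", "2013-14", "2014-15", "2015-16"]


def choose_strategy(previous_yield_a, threshold=THRESHOLD):
    """Pick B when last season's Strategy A yield exceeded the cut-off, else C."""
    return "B" if previous_yield_a > threshold else "C"


def trading_rule_yields():
    """Map each season t to (strategy adopted, yield obtained in season t)."""
    yields = {}
    for previous, current in zip(SEASONS, SEASONS[1:]):
        choice = choose_strategy(STRATEGY_A[previous])
        table = STRATEGY_B if choice == "B" else STRATEGY_C
        yields[current] = (choice, table[current])
    return yields


rule = trading_rule_yields()
average = sum(y for _, y in rule.values()) / len(rule)
print(rule)
print(round(average, 2))
```

With only four seasons available, the rule is fitted to and evaluated on the same observations, which is one reason for the caution expressed above.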

In efficiency studies of the football betting market, researchers have investigated various inefficiencies arising from specific biases, in an attempt to learn more about when and in what circumstances different biases could be expected to occur (Cain, Law and Peel, 2003). One inefficiency that is often studied is the favourite-longshot bias. Bettors with such a bias tend to overvalue longshots and undervalue favourites (Snowberg and Wolfers, 2010). Bookmakers can exploit such a bias by skewing the odds against longshots. Some researchers also find evidence of risk-averse behaviour amongst bettors. Hence, Xu (2011) proposes a strategy, known as ‘trim-the-tails’, that eliminates risk-loving and risk-averse behaviours by betting £1 only on those match outcomes for which the probability implied by the bookmakers’ odds falls between 0.333 and 0.667. Doing so essentially enables one to investigate whether it is profitable to bet after eliminating the match outcomes whose odds could potentially have been skewed by bookmakers. The results from this strategy, based on the odds provided by Bet 365, are reported in Table 6. For comparison purposes, benchmark results from simply betting on all match outcomes with Bet 365 are also included.

Table 6: Favourite-longshot bias

         2010-11   2011-12   2012-13   2013-14   2014-15   2015-16

1. Trim-the-tails strategy

B365      -6.44%    -9.76%   -10.47%     9.61%    -3.96%   -16.59%

2. Bet on all match outcomes (benchmark)

B365      -1.99%    -2.72%    -6.17%    -4.98%    -2.72%     1.80%
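The trim-the-tails filter can be sketched as follows. The snippet is a minimal illustration under stated assumptions: decimal odds, implied probability taken as the reciprocal of the odds, and inclusive bounds of 0.333 and 0.667. The function name and the example odds are hypothetical.

```python
# Hypothetical sketch of the trim-the-tails selection.
# Assumptions: decimal odds, implied probability = 1 / odds, inclusive bounds.

def trim_the_tails(odds, lower=0.333, upper=0.667):
    """Return the outcomes whose implied probability lies within [lower, upper]."""
    return [outcome for outcome, price in odds.items()
            if lower <= 1.0 / price <= upper]


# A strong home favourite: every outcome falls in a tail, so no bet is placed.
print(trim_the_tails({"H": 1.25, "D": 6.0, "A": 12.0}))

# A more balanced match: only the home win lies in the retained range.
print(trim_the_tails({"H": 2.0, "D": 3.4, "A": 4.0}))
```

A unit stake would then be placed on each outcome the filter returns, and the season's return computed in the same way as for the benchmark.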

The trim-the-tails strategy yields positive returns during the 2013-14 season only. Keating (2015) applies this strategy to the top performing teams of the Premier League to yield a 15.3% return, also with Bet 365, during the 2013-14 season. However, applying the strategy over several seasons indicates that the returns in 2013-14 are an outlier. Comparing the results to the benchmark strategy of betting on all match outcomes suggests that the more extreme odds generate a positive return which reduces the overall losses in all cases except for the 2013-14 season. This may indicate that there is a favourite-longshot bias within the market, which possibly reverses during the 2013-14 season. A final inquiry is conducted to determine whether the returns from betting on longshots, which have the lowest implicit probabilities, exceed the returns from betting on favourites, which have the highest implicit probabilities. The results suggest that there is a favourite-longshot bias during the years investigated, but there is no clear strategy that could enable a trader to exploit these biases (see Appendix C). Hence, the EMH cannot be rejected.

5. Conclusion

This paper assesses the informational efficiency within the English Premier League using an ordered Probit forecasting model and a series of regression-based tests as well as economic tests. The paper provides an update to the literature by studying the efficient market hypothesis using more than ten years of recent data whilst also addressing recent findings regarding inefficiencies within the football betting market.

Results from the regression-based tests indicate that the efficient market hypothesis cannot be rejected; however, there is some evidence indicating that the level of efficiency is decreasing. Applying an economic approach and reviewing the returns of different betting strategies also suggests that the market is becoming less efficient. The results suggest that there is a favourite-longshot bias within the market during the time span investigated; however, a strategy to exploit this could not be devised. There is no evidence for a cyclical trend, although this may be the result of applying the investigated betting strategies over six seasons only. Studying the performance of the betting strategies over a wider set of consecutive seasons would provide a better insight. Betting solely on the basis of the forecasting model presented in this paper does not consistently generate abnormal returns. Nevertheless, it can be used to devise a profitable betting strategy that can produce positive returns consistently over the four years investigated. Future studies are encouraged to test the robustness of the proposed trading strategy by analysing its performance over additional seasons, as this would provide further insights into the efficiency of the market.

Overall, since the results from the statistical analysis indicate that the bookmakers’ odds are informationally efficient, and since the robustness of the trading rule that currently appears to have the potential to generate consistent, positive returns is unconfirmed beyond the time span of this investigation, this paper concludes that the EMH holds in general.


References

Barclays Premier League (1992). Barclays Premier League table, current and previous standings. Retrieved from http://www.premierleague.com/en-gb/matchday/league-table.html

Buchdahl, J. (2001). Data Files: England. Retrieved from http://www.football-data.co.uk/englandm.php

Cain, M., Law, D., & Peel, D. (2000). The favourite-longshot bias and market efficiency in UK football betting. Scottish Journal of Political Economy, 47(1), 25-36.

Cain, M., Law, D., & Peel, D. (2003). The Favourite-Longshot Bias, Bookmaker Margins and Insider Trading in a Variety of Betting Markets. Bulletin of Economic Research, 55(3), 263-273.

Campbell, J. Y., Lo, A. W. C., & MacKinlay, A. C. (1997). The econometrics of financial markets (Vol. 2, pp. 149-180). Princeton, NJ: Princeton University Press.

Clarkson, R. (2007). Football Ground Map. Retrieved from http://www.footballgroundmap.com/grounds/england

Constantinou, A. C., & Fenton, N. E. (2013). Profiting from arbitrage and odds biases of the European football gambling market. The Journal of Gambling Business and Economics, 7(2), 41-70.

Dixon, M. J., & Coles, S. G. (1997). Modelling association football scores and inefficiencies in the football betting market. Journal of the Royal Statistical Society: Series C (Applied Statistics), 46(2), 265-280.

Dobson, S., & Goddard, J. (2000). Stochastic Modelling of Soccer Match Results. University College of Swansea, Department of Economics.

European Football Statistics (n.d.). Attendances England Average. Retrieved from http://www.european-football-statistics.co.uk/attn/nav/attnengleague.htm

Fama, E. F. (1965). The behavior of stock-market prices. Journal of Business, 34-105.

Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383-417.

Fama, E. F. (1991). Efficient capital markets: II. The Journal of Finance, 46(5), 1575-1617.

Fama, E. F. (1998). Market efficiency, long-term returns, and behavioral finance. Journal of Financial Economics, 49(3), 283-306.

Forrest, D., Goddard, J., & Simmons, R. (2005). Odds-setters as forecasters: The case of English football. International Journal of Forecasting, 21(3), 551-564.

Forrest, D., & Simmons, R. (2001). Globalisation and efficiency in the fixed-odds soccer betting market. University of Salford, Centre for the Study of Gambling and Commercial Gaming.

Goddard, J., & Asimakopoulos, I. (2004). Forecasting football results and the efficiency of fixed-odds betting. Journal of Forecasting, 23(1), 51-66.

Jensen, M. C. (1978). Some anomalous evidence regarding market efficiency. Journal of Financial Economics, 6(2), 95-101.

Keating, O. (2015). An investigation into the forecast efficiency of UK bookmakers’ betting-odds. The Barclays Premier League. (Unpublished dissertation). Universiteit van Amsterdam, Amsterdam.

Kuypers, T. (2000). Information and efficiency: an empirical study of a fixed odds betting market. Applied Economics, 32(11), 1353-1363.

Malkiel, B. (1992). Efficient market hypothesis. In P. Newman, M. Milgate, & J. Eatwell (Eds.), The New Palgrave Dictionary of Money and Finance.

Nyberg, H. (2014). A Statistical Test of Association Football Betting Market Efficiency. Retrieved from http://blogs.helsinki.fi/hknyberg/files/2014/01/Test_mlogit_wp.pdf

Paton, D., & Williams, L. V. (2002). ‘Quarbs’ and Efficiency in Spread Betting: can you beat the book?

Pope, P. F., & Peel, D. A. (1989). Information, prices and efficiency in a fixed-odds betting market. Economica, 323-341.

Rajkovic, I. (2013). Applied Financial Time Series Modelling: Forming a betting strategy based on the identifiable anomalies of European football fixed-odds betting markets. Retrieved from Academia.edu (3046335).

Thaler, R. H., & Ziemba, W. T. (1988). Parimutuel betting markets: Racetracks and lotteries. Journal of Economic Perspectives, 2(2), 161-174.

The FA (n.d.). The FA Cup Results. Retrieved from http://www.thefa.com/thefacup/results

Williams, L. V. (Ed.). (2005). Information efficiency in financial and betting markets. Cambridge University Press.

Xu, J. S. (2011). Online Sports Gambling: A look into the efficiency of bookmakers’ odds as forecasts in the case of English Premier League (Doctoral dissertation, University of California, Berkeley).


Appendix A

Descriptions, motivations and computational methodologies for the explanatory variables incorporated within the ordered Probit forecasting model in section 3.2.

Team Quality

$P^{d}_{i,t,s,c}$ and $P^{d}_{j,t,s,c}$: These variables model the team quality for team i, the home team, and team j, the away team. The indicators represent each team’s win ratios and are included in the model in order to capture the quality of the home team and the away team. They are computed as follows:

(9) $P^{d}_{i,t,s,c} = \dfrac{p^{d}_{i,t,s,c}}{n_{i,t,c-1}}$,

where $p^{d}_{i,t,s,c}$ is the total points scored by team i based on the following scale – home win = 1, draw = 0.5, away win = 0; and $n_{i,t,c-1}$ represents the total number of matches played by team i prior to the current match, which is represented using the subscript c-1. The same process is repeated for team j.

The win ratios are calculated using the results of matches played across the previous 24 months. Goddard and Asimakopoulos (2004) find that win ratios for the time frame 24-36 months were not statistically significant, therefore these are not considered. The subscript t represents the time frame being considered: t=0 represents the points scored across matches played 0-12 months prior to the current match, and t=1 represents the points scored across matches played 12-24 months prior to the current match. This sort of partitioning in the data is motivated by the structure of the English football league. The league comprises four levels. The top level is the Premier League, which is followed by the English Championship League, Football League One and Football League Two. At the end of every season the three teams that find themselves at the bottom of the league table are relegated; in other words, they are demoted from their current league. This means that at the end of the season the three teams at the bottom of the Premier League are demoted to the Championship League. On the other hand, the three teams that finish the season at the top of the Championship League table are promoted to the Premier League during the subsequent season. Partitioning the win ratios according to whether the matches were played within the previous 12 months or 12-24 months is the first step to accounting for this structure.

Next, the time frame considered is further sub-divided according to the season within which the points were scored, which is represented by the subscript s. To elaborate, t=0 s=0 denotes that the win ratio has been computed using the results from matches played within the last 0-12 months and within the current season; t=0 s=1 denotes that the win ratio has been computed using the results from matches played within the last 0-12 months but the matches are from one season before the current season to which the observation in question belongs. Similarly, t=1 s=1 denotes that the win ratio has been computed using the results from matches played within the last 12-24 months and the matches are from one season before the current season to which the observation in question belongs; t=1 s=2 denotes that the win ratio has been computed using the results from matches played within the last 12-24 months and the matches are from two seasons prior to the current season to which the observation in question belongs.

Lastly, to account for the division to which the match results used in the computation belong, the win ratios are further sub-divided using the superscript d. Hence, d=0 denotes that the win ratio is based on results for matches played in the current division, d=-1 denotes that the win ratio is based on results for matches played in one division below the current division, and d=-2 denotes that the win ratio is based on results for matches played in two divisions below the current division. Overall, the win ratios for each team are modeled using ten variables – (t=0,s=0,d=0), (t=0,s=1,d=0), (t=1,s=1,d=0), (t=1,s=2,d=0), (t=0,s=1,d=-1), (t=1,s=1,d=-1), (t=1,s=2,d=-1), (t=1,s=2,d=-2).

Recent Performance

$R^{H}_{i,c-m}$; $R^{A}_{i,c-n}$; $R^{H}_{j,c-n}$ and $R^{A}_{j,c-m}$: The variables $R^{H}_{i,c-m}$ and $R^{A}_{i,c-n}$ for the home team i and their counterparts $R^{H}_{j,c-n}$ and $R^{A}_{j,c-m}$ for the away team j model the teams’ most recent results for matches played on the home ground, denoted by the superscript H, and on the away ground, denoted by the superscript A. The motivation for including this information in the forecasting model is that short-term persistence in good recent performance can build confidence and morale, or it can breed complacency. The former effect can increase the likelihood that the team in question will perform well in the next match, whereas the latter effect can increase the likelihood of a poor performance in the next match. On the other hand, persistence in poor recent performance could sap morale and increase the likelihood of failure in the next match, or it could motivate the team to put in more effort, which would increase their chances of improving their performance. Dobson and Goddard (2000) find evidence of short-term persistence effects.

The full-time match outcomes – home win, draw or away win – are converted to numerical values with home win = 1, draw = 0.5 and away win = 0. Goddard and Asimakopoulos (2004) find that for home team i the recent results attained when playing on the home ground are more useful predictors than the team’s results from matches that are not played on their own grounds. More specifically, the authors find that the coefficients for the results for the most recent 9 home games are statistically significant and the coefficients for the results for the most recent 4 away games are statistically significant. As for the away team j, the recent results attained when playing on an away ground are more useful in predicting how the team will perform during the match in question, where it is playing from an away position. The authors find that for the away team j the coefficients for the results for the most recent 9 away games are statistically significant and the coefficients for the results for the most recent 4 home games are statistically significant. The subscripts m and n are used to distinguish between these games, where $m \leq 9$ and $n \leq 4$.

Match Significance

$SIG_{i,c}$ and $SIG_{j,c}$: A match can be significant to a team if it makes the difference between the team winning the league or not; it is also significant if it makes the difference between the team being relegated or promoted. A match can be significant for one team but not significant for the other, which would mean that the two teams face different incentives, and this could potentially influence the outcome of the match. Therefore the variables $SIG_{i,c}$ and $SIG_{j,c}$ are incorporated in the forecasting model. These variables are match specific; this is denoted by the subscript c, which refers to the current match observation. The variables $SIG_{i,c}$ and $SIG_{j,c}$ are dummy variables that take on the value of 1 if the match is significant to the team in question, either in terms of championship or relegation, and 0 otherwise. To determine whether the current match observation is significant for the team in question, the following assessments are conducted at the beginning of the match:

• Can the team win the league assuming that all other teams in contention for the championship take one point in each of the remaining matches?

• Can the team be relegated assuming that the rest of the teams in contention for relegation take one point in each of the remaining matches?

If the answer to either question is yes, the match is deemed significant for the team in question.

FA Cup Involvement

$CUP_{i,c}$ and $CUP_{j,c}$: Goddard and Asimakopoulos (2004) find statistically significant evidence that early elimination from the FA Cup affects match results in the English football league. Indeed, one can rationalise that early elimination from the FA Cup can have both positive and negative effects on a team’s performance in the Premier League. For instance, elimination from the FA Cup would enable the team to focus their efforts on the Premier League, which can lead to better performance. On the other hand, elimination could also have a negative effect on performance, for example, by harming the team’s confidence. Hence, the dummy variables $CUP_{i,c}$, for the home team i, and $CUP_{j,c}$, for the away team j, are included; they take on the value 1 if the team is still participating in the FA Cup and 0 if it has been eliminated.

Home Advantage

$DIST_{i,j}$: This variable models the distance between the grounds of the home team and the away team. The motivation to include this variable is the evidence for the significance of home ground advantage. Clarke and Norman (1995) find that home ground advantage increases as the distance travelled by the away team increases. Hence, by incorporating the geographical distance between the two grounds it is possible to model the effect of the home ground advantage.

Big Team Effect

$AP_{i,2}$; $\Delta AP_{i,1}$; $AP_{j,2}$ and $\Delta AP_{j,1}$: Big teams benefit from larger crowd support, which can have direct positive psychological implications that influence the overall match outcome. Big teams also benefit from larger revenues, which give them more resources and therefore also influence a team’s performance. This effect is modeled through average attendance data, whilst controlling for league position. This control is necessary since researchers have discovered that fans change their attendance patterns according to promotions or relegations of teams. The approach used is as follows:

(4) $\ln(A) = \alpha_1 + \beta_1 P_1 + e_1$ for 1 season before the current season

(5) $\ln(A) = \alpha_2 + \beta_2 P_2 + e_2$ for 2 seasons before the current season.

In both equations, ln(A) is the natural logarithm of the average attendance, and $P_k$ (k=1,2) is the final league position of the team in question k seasons before the current season.

There are 92 football teams within the English football league, so for the purpose of incorporating the final league position in equations (4) and (5), the teams are ranked using a scale of 92 to 1, where 92 denotes that the team is the winner of the Premier League. The motivation for incorporating a big team effect indicator for two seasons prior to the match observation in question is to decrease the potential effect of any temporary variations in the attendance-performance relationship. Finally, because successive values tend to be highly correlated, the difference between the variable for each season is incorporated. Overall, four variables are generated: $AP_{i,2}$ and $\Delta AP_{i,1}$ (= $AP_{i,1} - AP_{i,2}$) for the home team i and $AP_{j,2}$ and $\Delta AP_{j,1}$ (= $AP_{j,1} - AP_{j,2}$) for the away team j.
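Equations (4) and (5) are simple linear regressions of log attendance on league position, and fitting one of them can be sketched as below. This is a minimal ordinary least squares illustration with hypothetical function names and made-up data. How the AP variables are then obtained from the fitted equations is not spelled out in this appendix, so the residuals are returned purely for illustration and are not claimed to be the AP variables.

```python
import math


def fit_log_attendance(positions, attendances):
    """OLS fit of ln(A) = alpha + beta * P + e for one season.

    positions   -- final league positions on the 92-to-1 scale
    attendances -- average attendances of the same teams
    Returns (alpha, beta, residuals).
    """
    y = [math.log(a) for a in attendances]
    n = len(positions)
    mean_p = sum(positions) / n
    mean_y = sum(y) / n
    sxx = sum((p - mean_p) ** 2 for p in positions)
    sxy = sum((p - mean_p) * (v - mean_y) for p, v in zip(positions, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_p
    residuals = [v - (alpha + beta * p) for p, v in zip(positions, y)]
    return alpha, beta, residuals


# Made-up positions and average attendances for five teams.
positions = [92, 85, 70, 40, 10]
attendances = [75000, 41000, 30000, 9000, 3500]

alpha, beta, residuals = fit_log_attendance(positions, attendances)
print(alpha, beta)
```

The same function would be called once with the data for one season before the current season and once with the data for two seasons before.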


Appendix B

The correlations between bookmakers’ odds for each possible match outcome. In each season the odds are almost perfectly correlated, except for the draw odds. However, draws are known to be incredibly difficult to predict. Hence, the results are not surprising.

Season 2010-11 Correlations for Home Win, Draw, and Away Win (380 observations)

        B365H   BWH     IWH     LBH     SJH     VCH     WHH
B365H   1
BWH     0.9978  1
IWH     0.9915  0.9918  1
LBH     0.9965  0.9965  0.9935  1
SJH     0.9972  0.9969  0.9926  0.9968  1
VCH     0.9988  0.9979  0.9915  0.9965  0.9974  1
WHH     0.9975  0.9967  0.9911  0.9961  0.9964  0.9972  1

        B365D   BWD     IWD     LBD     SJD     VCD     WHD
B365D   1
BWD     0.9819  1
IWD     0.9692  0.9647  1
LBD     0.9758  0.9727  0.9741  1
SJD     0.9795  0.9721  0.9656  0.9702  1
VCD     0.9904  0.9828  0.9661  0.9749  0.9815  1
WHD     0.9672  0.9654  0.9499  0.9554  0.9573  0.9656  1

        B365A   BWA     IWA     LBA     SJA     VCA     WHA
B365A   1
BWA     0.9975  1
IWA     0.9907  0.9903  1
LBA     0.9961  0.9957  0.9924  1
SJA     0.9965  0.9959  0.9917  0.9959  1
VCA     0.9985  0.9974  0.9904  0.9961  0.9969  1
WHA     0.9969  0.9960  0.9900  0.9950  0.9954  0.9970  1


Appendix C

To investigate whether the favourite-longshot bias exists in the sample of this study, the implicit bookmakers’ probabilities for each match outcome are ordered from lowest to highest and grouped into ten categories. Then a basic strategy of placing a £1 bet on each outcome for each match is applied. The odds set by Bet 365 were analysed to see if the returns from betting on the longshots exceeded the returns from betting on favourites, in line with bettors’ beliefs. The results are presented in Figure 3. The x-axis presents the category of the implicit probabilities, with 1 being the lowest implicit probabilities, i.e. the longshots, and 10 being the highest implicit probabilities, i.e. the favourites.

During season 2010-11 the returns when betting for a home win or a draw increase as the odds become shorter. In other words, as the implicit probabilities rise, the returns that can be obtained also rise. However, the returns for the top categories are lower than the returns for the medium categories. The results for the away win outcome imply a negative longshot bias, where betting on the longshots yields higher returns than betting on the favourites. There is some evidence for negative longshot bias during season 2011-12 as well, but for the home win and away win outcomes only. During the 2012-13 season the results suggest that there is some favourite-longshot bias as well. However, in seasons 2013-14 and 2014-15 it is possible to generate abnormal positive returns by betting on home win outcomes for matches that have long odds that fall in category one, i.e. lowest implicit probabilities. Nevertheless, a clear strategy to exploit the bias could not be devised.
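The category analysis described in this appendix can be sketched as follows. The snippet is only an illustration: it assumes decimal odds, an implied probability equal to the reciprocal of the odds, and categories of (roughly) equal size, none of which is specified in the text. The function name and the example bets are hypothetical.

```python
# Hypothetical sketch: average return per implied-probability category.
# Assumptions: decimal odds, implied probability = 1 / odds, equal-sized groups.

def returns_by_category(bets, n_categories=10):
    """bets is a list of (decimal_odds, won) pairs, each for a unit stake.

    Returns one average return per category, ordered from longshots
    (lowest implied probability) to favourites (highest).
    """
    ordered = sorted(bets, key=lambda bet: 1.0 / bet[0])
    size = len(ordered) / n_categories
    results = []
    for k in range(n_categories):
        group = ordered[round(k * size):round((k + 1) * size)]
        if not group:
            results.append(None)  # too few bets to fill this category
            continue
        payout = sum(odds for odds, won in group if won)
        results.append((payout - len(group)) / len(group))
    return results


# Four made-up bets split into two categories for readability.
bets = [(10.0, False), (8.0, True), (2.0, True), (1.5, False)]
print(returns_by_category(bets, n_categories=2))
```

Applying the function separately to the home win, draw and away win bets of each season would give the kind of series plotted in Figure 3.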


Figure 3: The relationship between each result category and returns.

[Figure 3 comprises fifteen panels, one for each outcome (Home Win, Draw, Away Win) in each season from 2010-11 to 2014-15. Each panel plots returns (vertical axis) against the implicit-probability category, 1 to 10 (horizontal axis).]
