
Bachelor’s Thesis

Fantasy Football Player Performance

Forecasting Using Time-Varying Parameter

Models

January 31, 2018

Lasse Liebrand - 10469532

supervised by prof. dr. C.G.H. Diks

Abstract

In this thesis, different time-varying parameter models are used to forecast the performance of players in Fantasy Football, a popular online sports prediction game. Specifically, rolling autoregressive models with varying weight parameters and different window sizes are used. The forecasting methods are compared to an adaptive and a naive model that serve as benchmarks, using a stationary bootstrap implementation of the model confidence set procedure. The results indicate that there is no significant difference in predictive ability between the time-varying parameter models, but the set of time-varying parameter models as a whole performs better than the two benchmark models.

Keywords: fantasy football, time-varying parameters, rolling autoregressive model, stationary bootstrap, model confidence set


Statement of Originality

This document is written by Student Lasse Liebrand who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.


Contents

1 Introduction

2 Theoretical Framework
  2.1 Brief Guide on Fantasy Football
  2.2 Autoregressive Model
  2.3 Rolling Autoregressive Model
  2.4 Benchmark Models
    2.4.1 Adaptive Model
    2.4.2 Naive Model
  2.5 Model Confidence Set Procedure
  2.6 Stationary Bootstrap

3 Methodology
  3.1 Description of Fantasy Football Data
  3.2 Model Overview
    3.2.1 Autoregressive Model
    3.2.2 Rolling Autoregressive Model
    3.2.3 Adaptive Model
    3.2.4 Naive Model
  3.3 Model Comparison

4 Results


1 Introduction

The online sports prediction game Fantasy Football has seen a strong increase in popularity in recent years. The Fantasy Football industry is estimated to have a turnover of more than $7 billion a year in North America and 59 million annual players, according to the Fantasy Sports Trade Association (FSTA, 2017). This recent growth in market value and in the number of users¹ playing the game makes it interesting to study predictions made on Fantasy Football outcomes and to improve the quality of these predictions. Sports result prediction in general is rapidly growing in popularity (Miljković et al., 2010). This thesis aims to test different time-varying parameter models and evaluate their predictive performance in the context of Fantasy Football.

When playing the game, users fulfill the role of team manager for a virtual team of American football players. Users have two main tasks: the draft, which entails selecting football players before the season starts while being restricted to a virtual budget, and setting up the weekly line-up for their team. Selecting and lining up players that perform better than those of competing users awards more points, which is the condition for winning. In both tasks, users' perception of player performance is essential. Having more accurate predictions of player performance provides a competitive advantage, as user budgets are spent more efficiently and overpaying for players is minimized. In addition, better predictions of player performance result in better decisions when setting the weekly lineup. To make a choice, users have many tools at their disposal, such as player statistics, predictions by sports news websites, rankings of experts and recommendations of peers. Because some of these tools are subject to biases and personal preferences, statistically supported forecasting methods are desired.

This paper focuses on developing such methods and measuring their predictive performance. In order to improve the prediction quality of Fantasy Football outcomes, this paper analyzes the fantasy performance of players that play as Wide Receiver² using time-varying parameter models. In particular, the forecasting ability of different time-varying parameter models, as well as the adaptive model and naive model that serve as benchmarks, is compared using a bootstrap implementation of the model confidence set procedure by Hansen et al. (2011).

¹To avoid confusion, throughout the remainder of this text 'user' refers to a person playing Fantasy Football. 'Player' refers to an athlete playing American football.

As players play multiple games throughout a season and their career, the measured performances of a player form a time series. For time series, several techniques are available to analyze the underlying effects of explanatory variables. Traditional time series models such as autoregressive (AR) models do not account for possible time variation in model parameters. In some instances, it may be preferable to include possible time variation in parameters in a model (Giacomini and White, 2006). In the analysis of Fantasy Football data, the presence of structural breaks in the performance of players due to season breaks makes it desirable to account for time variation in parameters. Models that do account for such time variation include rolling AR models (Inoue et al., 2017). This paper focuses on applying an AR model and rolling AR models to data from players in the National Football League (NFL), and on comparing their forecasting ability. In particular, the data of currently top-performing Wide Receivers is used, because of the significance of elite Wide Receivers in Fantasy Football. Elite Wide Receivers account for a considerable amount of the total weekly points scored, which makes studying the forecasting accuracy of this position particularly interesting.

The remainder of this thesis is structured as follows. First, a brief guide on Fantasy Football and detailed background on the AR model and rolling AR model are given. Then, a bootstrap application of the model confidence set procedure is explained. Next, the data and methodology are described, before the results are discussed in the following section. Finally, Section 5 summarizes the paper and discusses possible future work. A structured overview of this paper can be found in the Contents.

²Wide Receiver is a position in American football. Wide Receivers are of particular interest in Fantasy Football, because within the position a relatively large variance in performance is present. This variance increases the utility of having more accurate predictions.


2 Theoretical Framework

2.1 Brief Guide on Fantasy Football

The popular online sports prediction game Fantasy Football requires users to form expectations on player performance as accurately as possible. For users, the first part of the game starts before the season starts: the draft. In either a turn-based selection or by means of auction, users select players to fill their team. Having accurate predictions in the draft prevents overpaying on players and ’sleeping’ on good players.

During a Fantasy Football season, which corresponds to an NFL season excluding the off-season and runs for sixteen weeks from August to December, users play in a group of eight to sixteen people, called a league. Every week, users select the players from their team for their lineup that they think will perform best, as they play another user head to head. The lined-up players are awarded points based on their performance in their real-life football match. The nature of American football makes it possible to objectively track player performance. The number of yards, receptions, passes and scored touchdowns is recorded for each player. Using a linear formula, these statistics are converted into the Fantasy Football score for each player. See Appendix A for standard scoring rules. At the end of the week, when every player has played, the scores of the players in a user's lineup are summed. The user with the highest sum wins that week's match-up.

For both the draft and the weekly line-up selection, users rely not only on their own gut feeling, but also on recommendations from peers, rankings from experts and statistics. Performance of NFL players measured week by week forms a time series, which can be analyzed and used to forecast the future. The following subsections explore different possible models to forecast player performance and compare forecasting ability of the different models.

2.2 Autoregressive Model

In this thesis a traditional autoregressive (AR) model is compared to rolling autoregressive models with various alternative parameter values. An AR model


is a simple model for a time series obtained by choosing a specific finite length for the autoregression. If the past p values are included in the regression, one obtains the model

y_t = α + φ_1 y_{t−1} + φ_2 y_{t−2} + ... + φ_p y_{t−p} + ε_t,   t = p+1, p+2, ..., p+n,   (1)

where α and φ_1 through φ_p are unknown parameters. The process ε_t is the disturbance term with the white noise property, meaning that E[ε_t y_{t−k}] = 0 for all k ≥ 1. The lagged explanatory variable y_{t−p} is available from time t = p+1 onwards, as the time series y_t is observed from time t = 1 to t = n. This model is referred to as an autoregressive model of order p, also written as AR(p) (Heij et al., 2004).

The general idea of estimation of time series models is to minimize the sum of squared prediction errors

S(θ) = Σ_{t=1}^{n} (y_t − f(Y_{t−1}, θ))²,   (2)

where f(Y_{t−1}, θ) is a specified function containing a limited number of unknown parameters θ and Y_{t−1} is the information set up to time t − 1. If a correct model specification is used, the prediction errors will be close to the innovations ε_t, so that f(Y_{t−1}, θ̂) ≈ E[y_t | Y_{t−1}]. For an AR(p) model, minimizing the sum of squared prediction errors S(θ) gives parameter estimates

β̂(n) = [X_t(n)′ X_t(n)]⁻¹ X_t(n)′ y_t(n),   (3)

σ̂²(n) = (1/(n − k)) ε̂_t(n)′ ε̂_t(n)   (4)

       = (1/(n − k)) [y_t(n) − X_t(n) β̂(n)]′ [y_t(n) − X_t(n) β̂(n)].   (5)
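For concreteness, the least-squares estimation in (3)–(5) can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the thesis; the function name and the simulated check are this sketch's own.

```python
import numpy as np

def fit_ar(y, p):
    """Estimate an AR(p) model by ordinary least squares.

    Returns the coefficient vector (intercept, phi_1, ..., phi_p)
    and the residual variance estimate as in equations (4)-(5)."""
    y = np.asarray(y, dtype=float)
    n = len(y) - p                        # number of usable observations
    # Design matrix: a constant column plus the p lagged values.
    X = np.column_stack(
        [np.ones(n)] + [y[p - j - 1:p - j - 1 + n] for j in range(p)]
    )
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    k = X.shape[1]                        # number of estimated parameters
    sigma2 = resid @ resid / (n - k)      # degrees-of-freedom correction
    return beta, sigma2

# Quick check on a simulated AR(1) with intercept 2 and phi = 0.6.
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 2.0 + 0.6 * y[t - 1] + rng.normal()
beta, sigma2 = fit_ar(y, 1)
```

On the simulated series, the estimates land close to the true intercept 2, slope 0.6 and unit innovation variance.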

2.3 Rolling Autoregressive Model

The need for time-varying parameters in forecasting models is rooted in the parameter instability that is often present in time series (Inoue et al., 2017). According to the authors, this problem is widely recognized in several often-used economic time series, including stock prices and exchange rates. Parameters tend to vary through time, because unobserved factors tend to affect the time series of interest starting at unknown times. For instance, the effect that variables have on the stock price of a certain company may change significantly after a new CEO is appointed. Another example could be a long period of inflation above a certain threshold causing consumers and firms to form their expectations differently (Tucci, 1995). Such a change in the effects of underlying variables is called a structural change (Inoue et al., 2017). When analysing Fantasy Football data, certain obvious potential structural breaks become apparent, such as breaks between seasons, the appointment of a new head coach, or the instalment of a new Quarterback³.

Most recent research in sports prediction focuses on correctly predicting winning sports teams. This paper focuses particularly on individual player performance, and on how an advantage can be gained for Fantasy Football users. A paper by Lutz (2015) analyses individual player performance using support vector regression. The author concludes that while individual player performance estimates are possible, the errors are relatively high. To achieve the aim of this thesis, improving player performance prediction on the individual level, time-varying parameter models are used.

When tackling the problem of having potential structural breaks in the data, the so-called rolling AR model mainly uses the more recent observations. If the underlying process of a certain time series indeed changes, then parameters estimated using a rolling AR model change through time. Consider a univariate time series y_t, with t = 1, ..., T. To assess parameter consistency, let n denote the width of the rolling window. The rolling sample means, variances and standard deviations are then given by, respectively:

μ̂_t(n) = (1/n) Σ_{i=0}^{n−1} y_{t−i},   (6)

σ̂_t²(n) = (1/(n − 1)) Σ_{i=0}^{n−1} (y_{t−i} − μ̂_t(n))²,   (7)

σ̂_t(n) = √(σ̂_t²(n)).   (8)

³Quarterback is an important position in American football. If a Quarterback performs well,

If the parameters are truly constant, then the rolling estimates should not differ by much. If the parameters change at some point during the measured period, this instability should be captured by the rolling estimates.
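As a sketch, the rolling estimates in (6)–(8) can be computed as follows. This is illustrative NumPy code; the function name and the toy level-shift series are chosen here for demonstration.

```python
import numpy as np

def rolling_stats(y, n):
    """Rolling mean, variance and standard deviation over windows of width n.

    Result position j corresponds to the window ending at observation
    n - 1 + j, matching equations (6)-(8); the variance uses the
    1/(n-1) divisor of equation (7)."""
    y = np.asarray(y, dtype=float)
    means, variances = [], []
    for t in range(n - 1, len(y)):
        window = y[t - n + 1:t + 1]
        means.append(window.mean())
        variances.append(window.var(ddof=1))
    means, variances = np.array(means), np.array(variances)
    return means, variances, np.sqrt(variances)

# A series whose level shifts halfway: the rolling mean should pick it up.
y = np.concatenate([np.full(50, 10.0), np.full(50, 20.0)])
mu, var, sd = rolling_stats(y, 10)
```

The first rolling mean sits at the old level and the last at the new level, illustrating how a structural break shows up in the rolling estimates.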

When applying the concept of a rolling window to a linear regression model, one obtains a rolling linear regression model. For a window of width n < T, the rolling linear regression model may be expressed as

y_t(n) = X_t(n) β_t(n) + ε_t(n),   t = n, ..., T,   (9)

where y_t(n) is an (n × 1) vector of observations, X_t(n) is an (n × k) matrix of explanatory variables, β_t(n) is a (k × 1) vector of regression parameters and ε_t(n) is an (n × 1) vector of error terms (Zivot and Wang, 2007). The rolling least squares estimates are given by

β̂_t(n) = [X_t(n)′ X_t(n)]⁻¹ X_t(n)′ y_t(n),   (10)

σ̂_t²(n) = (1/(n − k)) ε̂_t(n)′ ε̂_t(n)   (11)

         = (1/(n − k)) [y_t(n) − X_t(n) β̂_t(n)]′ [y_t(n) − X_t(n) β̂_t(n)].   (12)

The descriptive statistics derived using a rolling window with equal weights assigned to the periods within the window, as described above, are useful for detecting periods of instability in the time series. However, this method may produce misleading outcomes when used for short-term forecasting, because equally weighted averages are sensitive to extreme values. As described by Zivot and Wang (2007), by weighting the observations in a rolling window differently, the effects of extreme observations on moving average estimates can be mitigated. A common method consists of placing more weight on the most recent observations. The least squares estimates are obtained by minimizing the criterion

S_W(β) = Σ_{t=1}^{n} λ^{n−t} (y_t − x_t′β)²,   (13)

where 0 < λ < 1. This criterion assigns larger weights to more recent observations and allows for relatively larger residuals for older observations (Heij et al., 2004).
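A minimal sketch of minimizing the weighted criterion (13), assuming a NumPy environment; the function `weighted_ls` and the toy regression data are illustrative, not part of the thesis.

```python
import numpy as np

def weighted_ls(X, y, lam):
    """Minimise the exponentially weighted criterion S_W(beta) in (13).

    Observation t (t = 1..n) receives weight lam**(n - t), so the
    most recent observation gets weight 1 and older ones decay."""
    n = len(y)
    w = lam ** (n - 1 - np.arange(n))     # lam^(n-t) for t = 1, ..., n
    Xw = X * w[:, None]
    # Weighted normal equations: (X' W X) beta = X' W y.
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# Toy regression y = 1 + 2 x + noise; lam = 0.95 as used in the thesis.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=100)
X = np.column_stack([np.ones(100), x])
beta = weighted_ls(X, y, 0.95)
```

With λ = 1 the weights are all equal and the estimator reduces to ordinary least squares, which connects this criterion back to (10).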

2.4 Benchmark Models

The autoregressive model and rolling autoregressive model as described above are compared to the performance of two so-called benchmark models: the adaptive model and the naive model.

2.4.1 Adaptive Model

The adaptive model (Chow et al., 2011) is given by

ŷ_{t+1} = α y_t + (1 − α) ŷ_t.   (14)

With repeated substitution, the forecast can be expressed as

ŷ_{t+1} = α y_t + α(1 − α) y_{t−1} + α(1 − α)² y_{t−2} + ...   (15)

The adaptive model forecast consists of a weighted average of past observations of the variable at hand with exponentially declining weights.
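A minimal sketch of the adaptive recursion (14); the function name and initialisation choice are this sketch's own. Setting α = 1 recovers the naive model of the next subsection.

```python
def adaptive_forecast(y, alpha):
    """One-step-ahead forecast from the adaptive model (14):
    yhat_{t+1} = alpha * y_t + (1 - alpha) * yhat_t.

    The recursion is initialised with the first observation; the
    return value is the forecast for the period after the last
    observation in y."""
    yhat = y[0]
    for obs in y[1:]:
        yhat = alpha * obs + (1 - alpha) * yhat
    return yhat

# alpha = 0.5 on [4, 8]: 0.5 * 8 + 0.5 * 4 = 6.
f = adaptive_forecast([4.0, 8.0], 0.5)
# alpha = 1 recovers the naive model: the forecast is the last observation.
naive = adaptive_forecast([3.0, 7.0, 5.0], 1.0)
```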

2.4.2 Naive Model

The naive model is a variation of the adaptive model where α = 1. In this case, the model is given by

ŷ_{t+1} = y_t.   (16)

This means that the one-step-ahead prediction of y_{t+1} is simply the current value y_t itself, hence the name naive model.

2.5 Model Confidence Set Procedure

When multiple alternatives are available, deciding which forecasting model to use is a problem econometricians often face. When comparing just two models, relatively straightforward measures like the squared prediction error or information criteria allow the econometrician to decide which model to use. In this thesis, seven models are compared simultaneously. As described by Hansen et al. (2011), many applications will not have a single model that significantly dominates all alternatives. Nonetheless, by applying their proposed model confidence set (MCS) procedure, the number of models can be reduced to a smaller set that contains the optimal model at a certain confidence level.

Briefly summarized, an MCS procedure in this context starts with a set of all competing forecasts, M₀, and a criterion for evaluating these forecasts empirically. In each iteration, the alternative hypothesis that the worst performing forecast is indeed worse than the set of remaining forecasts is tested against the null hypothesis that all forecasts perform equally well. To measure the performance of the models, the root mean squared error (RMSE) is computed for every model, defined as

RMSE_{i,m} = ( (1/n_f) Σ_{t=1}^{n_f} (y_{i,t} − ŷ_{i,t})² )^{1/2},   (17)

for each player i = 1, 2, ..., 6 and model m = 1, 2, ..., 7, where n_f = 50 is the number of observations in the prediction sample.

The objective of the MCS procedure is to determine the set of superior objects, M*, that contains the best model(s) from a larger collection of models, M₀ (Hansen et al., 2011). To achieve this, a sequence of significance tests is performed in which objects that are found to be significantly inferior to other elements of M₀ are eliminated. The procedure is based on an equivalence test, δ_M, and an elimination rule, e_M. First, the equivalence test δ_M is used to test the null hypothesis that all remaining objects in M perform equally well, for any M ⊂ M₀. The elimination rule e_M then identifies the object of M that is to be removed from M in case the null hypothesis is rejected. Formally, Hansen et al. (2011) specify the algorithm as follows:

Step 0. Initially set M = M₀.

Step 1. Test H_{0,M} using δ_M at level α.

Step 2. If H_{0,M} is accepted, define M̂*_{1−α} = M; otherwise, use e_M to eliminate an object from M and repeat the procedure from Step 1.

The set M̂*_{1−α} that remains after the algorithm is completed is referred to as the model confidence set.

2.6 Stationary Bootstrap

To apply the procedure outlined in the previous section, the distribution of the RMSE must be known in order to make inferences about the significance of the differences between the RMSEs of different models. The bootstrap of Efron (1992) is a powerful tool for approximating the sampling distribution of complicated statistics. This method uses the original observations of a sample to construct a sample probability distribution F̂, putting mass 1/n at each point x₁, x₂, ..., xₙ.

With this sample probability distribution F̂ fixed, a random sample of size n is drawn from F̂ to create

X*_i = x*_i,   X*_i ∼ iid F̂,   (18)

which is called the bootstrap sample. The objective is to estimate the sampling distribution of a specified random variable R(X, F), possibly depending on both X and the unknown distribution function F. This sampling distribution is approximated by the bootstrap distribution of R* = R(X*, F̂).


Neglecting the dependence of observations will give incorrect answers (Künsch, 1989). Politis and Romano (1994) introduce a resampling method called the stationary bootstrap that is generally applicable to stationary, weakly dependent time series. This method of resampling generates a stationary pseudo-time series (Politis and Romano, 1994). Their proposed algorithm is as follows. Let X*_1 be drawn randomly from the N available original observations, so that X*_1 = X_{I₁}. With probability p, let X*_2 again be drawn at random from the N original observations; with probability q = 1 − p, let X*_2 = X_{I₁+1}, so that X*_2 is the next observation in the original time series. In case X_N has been drawn and the next draw requires the next observation in the original time series to be picked, the stationary bootstrap method wraps the data around in a circle so that X_1 follows X_N (Politis and Romano, 1994).
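The resampling scheme can be sketched as index generation (illustrative NumPy code; following the description above, p here is the probability of starting a new block, and the function name is this sketch's own):

```python
import numpy as np

def stationary_bootstrap_indices(N, p, rng):
    """Generate one stationary-bootstrap pseudo-time series of indices.

    With probability p a fresh random start is drawn; with probability
    1 - p the next index of the original series is taken, wrapping
    around in a circle so that index 0 follows index N - 1
    (Politis and Romano, 1994)."""
    idx = np.empty(N, dtype=int)
    idx[0] = rng.integers(N)
    for t in range(1, N):
        if rng.random() < p:
            idx[t] = rng.integers(N)       # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % N  # continue the current block
    return idx

rng = np.random.default_rng(0)
idx = stationary_bootstrap_indices(100, 0.3, rng)
sample = np.arange(100)[idx]               # the bootstrap sample X*
```

The resulting blocks have geometrically distributed lengths with mean 1/p, which is what makes the resampled series stationary.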

3 Methodology

3.1 Description of Fantasy Football Data

In this section, the data and methodology are discussed. Player data from the NFL website are used. This data set includes all relevant statistics necessary to compute the Fantasy Football score, such as number of receptions, touchdowns and receiving yards. These data are linearly transformed into a single value, the Fantasy Football Score (FFS), using standard Fantasy Football scoring as explained in Appendix A.

The data of a selection of 6 active Wide Receivers are used. The selection is based on performance in past seasons and having played a minimum of 6 seasons in the NFL. The minimum of 6 seasons played ensures a sufficient number of observations. Over the past 6 years, the selected players consistently finished in the top 15 of fantasy points for Wide Receivers. Because the Wide Receiver position accounts for a considerable amount of weekly Fantasy Football points, the performance of this position is especially interesting to study. In addition, only Wide Receivers that have performed well over the past 6 years are studied, because these players are relevant in every league, regardless of its size (mediocre-tier Wide Receivers are not considered in smaller leagues, as top players are available to fill the smaller number of user teams).

For convenience, throughout the rest of this text players will be referred to by the number in the first column of the table below. For instance, player 1 will refer to Antonio Brown. The fantasy football points scored by player i at time t will be referred to as y_{i,t}. For players 1 and 2, the fantasy football points scored per game are plotted in Figure 1.

ID   Name            Games played
1    Antonio Brown   115
2    TY Hilton       94
3    Julio Jones     95
4    Doug Baldwin    110
5    Dez Bryant      113
6    Jeremy Maclin   113

Table 1: Players used for model estimation

Figure 1: Fantasy Football Scores per NFL game for the careers of players 1 and 2 up to December 2017.

3.2 Model Overview

In this section, nine different models are presented. For clarity, the table below summarizes the models that are estimated.


Model number   Description      Window   Weights
1              AR(1)            -        -
2              rolling AR(1)    40       equal weights
3              rolling AR(1)    40       decaying, λ = 0.95
4              rolling AR(1)    20       equal weights
5              rolling AR(1)    20       decaying, λ = 0.95
6              rolling AR(1)    10       equal weights
7              rolling AR(1)    10       decaying, λ = 0.95
8              adaptive model   -        -
9              naive model      -        -

Table 2: Summary of estimated models

3.2.1 Autoregressive Model

An AR(1) model using data from the first 40 games of a player is estimated. Minimizing the sum of squared prediction errors of an AR(1) data generating process

y_t = c + φ_1 y_{t−1} + ε_t,   (19)

as described in Chapter 2 yields estimates ĉ for c and φ̂_1 for φ_1. For all players, the most recent 50 games of their career are forecasted using these estimates. The first (one-step-ahead) forecast is given by

ŷ_{n+1} = ĉ + φ̂_1 y_n,   (20)

where n is the index of the player's 51st-to-last observation. The subsequent forecasts are given by

ŷ_{n+h} = ĉ + φ̂_1 y_{n+h−1}.   (21)
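These one-step-ahead forecasts can be sketched as follows (illustrative NumPy code with made-up parameter values; equation (21) is read here as using the realised previous observation at each step):

```python
import numpy as np

def ar1_one_step_forecasts(c_hat, phi_hat, y):
    """One-step-ahead AR(1) forecasts as in equations (20)-(21):
    yhat_{t+1} = c_hat + phi_hat * y_t for each observed y_t.

    y holds the last in-sample observation followed by the hold-out
    observations; each forecast uses the realised previous value."""
    y = np.asarray(y, dtype=float)
    return c_hat + phi_hat * y[:-1]

# With c = 2, phi = 0.5 and observations [10, 8, 6], the forecasts for
# the second and third observations are 2 + 0.5*10 = 7 and 2 + 0.5*8 = 6.
f = ar1_one_step_forecasts(2.0, 0.5, [10.0, 8.0, 6.0])
errors = np.asarray([8.0, 6.0]) - f        # forecast errors on the hold-out
```

Squaring these errors gives exactly the squared prediction errors that feed into the RMSE of equation (17).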

3.2.2 Rolling Autoregressive Model

In this section the use of rolling autoregressive models on Fantasy Football data is explained. As explained in the theoretical framework, rolling autoregressive models contain several parameters. In this thesis, the window and exponential decay parameters are varied with the aim of finding the best model for predicting Fantasy Football outcomes. Because of the relatively short length of the data, the window size is chosen to be relatively short as well. For each rolling autoregressive model, window sizes of 40, 20 and 10 observations are tested.

Apart from the varying window size, the weight allocated to each observation is also varied between two alternatives. First, the rolling AR parameters are estimated without assigning different weights to different observations, that is, S_W(β) is minimized, where

S_W(β) = Σ_{t=1}^{n} λ^{n−t} (y_t − x_t′β)²,   (22)

and λ = 1. For models 3, 5, and 7 the decay-parameter is set at λ = 0.95 to place a heavier weight on more recent observations.

3.2.3 Adaptive Model

The adaptive model is estimated using a weighted average of past observations

ŷ_{t+1} = α y_t + α(1 − α) y_{t−1} + α(1 − α)² y_{t−2} + ...   (23)

with parameter α = 0.5.

3.2.4 Naive Model

Forecasts of the naive model follow from its definition

ŷ_{t+1} = y_t.   (24)

The squared prediction errors of this model are compared to the squared prediction errors of the best performing model out of models 1 through 7.


3.3 Model Comparison

Once the RMSE is known for each player and model, the following hypotheses are tested:

H0: all forecasts perform equally well

Ha: the worst forecast performs significantly worse.

Using the algorithm described in Section 2.5, the worst-ranked model is tested against the set of other models. Under the null hypothesis, forecasts from all seven different models perform equally well. To test whether the worst-ranked model (that is, the model with the highest RMSE) is significantly worse than the remaining set of models, a bootstrap sample is constructed using the squared prediction errors of the original sample. Let d^m_{i,t} = (y_{i,t} − ŷ_{i,t})² be the squared prediction error for player i at time t in model m. To create the bootstrap sample, all 350 squared prediction errors are shuffled in blocks to create 7 new vectors of squared prediction errors. Initially, a random d^m_{i,t} is drawn from all 350 squared prediction errors. With probability p = 0.7 the next value from the original sample is drawn; with probability q = 1 − p a new random value from the original set of squared prediction errors is drawn. This procedure 'shuffles' the vectors of squared prediction errors to create seven new vectors that are drawn from the distribution of squared prediction errors assumed under the null hypothesis.

By repeating this shuffling procedure a large number of times, a distribution of the largest RMSE is constructed. The distribution is created from 1000 simulations. To test whether the observed largest RMSE is indeed significantly larger than the observed RMSE of the other models, this largest observed RMSE is compared to the simulated distribution. Let the largest RMSE of each simulation be stored in a (1000 × 1) vector RMSE*. The null hypothesis should then be rejected if the critical value k > 0.95:

k = (1/1000) Σ_{j=1}^{1000} I( max{RMSE_obs} > RMSE*_j ) > 0.95,   (25)

where j denotes the j-th simulation and I(·) is the indicator function.
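The critical value k of equation (25) is a simple fraction over the simulated vector; a sketch in NumPy (the simulated values below are synthetic, not the thesis's results, and the function name is illustrative):

```python
import numpy as np

def mcs_critical_value(rmse_obs_max, rmse_star):
    """Critical value k from equation (25): the fraction of simulations
    in which the observed largest RMSE exceeds the simulated largest
    RMSE."""
    return float(np.mean(rmse_obs_max > np.asarray(rmse_star)))

# Synthetic illustration: 1000 sorted draws stand in for the simulated
# RMSE* vector; an 'observed' value at rank 900 exceeds exactly 900 of
# them, so k = 0.9 and H0 would not be rejected at the 0.95 threshold.
rng = np.random.default_rng(0)
rmse_star = np.sort(rng.normal(12.5, 0.2, size=1000))
k = mcs_critical_value(rmse_star[900], rmse_star)
```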

4 Results

Chapter 3 described the autoregressive model, the rolling autoregressive models and the two benchmark models, the adaptive model and the naive model, employed on the data for each of the six players. This chapter first highlights the results of the model estimation for player 1 in particular. Next, the Model Confidence Set algorithm is applied to the resulting RMSEs of the models using data of player 1.

Because presenting the model estimates and Model Confidence Set algorithm output in full for all six players and all nine models would not contribute to the ease of reading and interpreting the results, only the key outcomes of the forecasts and the MCS algorithm are presented for the other players. A summary of model forecasts is available in the Appendix⁴.

Estimation of models 1 through 7 for player 1 yields seven vectors of 50 squared prediction errors SPE_{t,m} when used to forecast the most recent 50 career games of player 1. These squared prediction errors are presented in Table 5 in Appendix B. Using these squared prediction errors to compute the RMSE of each model, as described in Chapter 2, gives the following table:

RMSE_{m=1}   RMSE_{m=2}   RMSE_{m=3}   RMSE_{m=4}   RMSE_{m=5}   RMSE_{m=6}   RMSE_{m=7}
11.6570      11.4011      11.9293      12.6480      11.3924      11.9751      12.69201

Table 3: Root mean squared error for m = 1 through m = 7 for player 1, based on forecast errors of player 1's 50 most recent games played.

Although there is a slight variation in the RMSE of the seven models, the spread is relatively small, indicating that the difference in forecasting errors is also relatively small. The model with the highest RMSE is model 7 (the rolling autoregressive model with window size w = 10 and weight parameter λ = 0.95). To test whether model 7 is indeed a worse model for predicting Fantasy Football scores, a stationary bootstrap application of the MCS procedure is applied. The following hypotheses are tested:

⁴Detailed output on parameter estimation for each player and model is available upon request from the author: lasse.liebrand@student.uva.nl.

H0: the forecasts of models 1 through 7 perform equally well

Ha: the forecasts by model 7 are significantly worse compared to the other six models.

Under the null hypothesis, the squared prediction errors between models for a given observation in time should not differ by much. The squared prediction error vectors are resampled 1000 times to draw 7 new root mean squared errors per simulation. The largest RMSE of each simulation is stored in a (1000 × 1) vector RMSE*. The null hypothesis should be rejected if the observed root mean squared error of model 7, RMSE_{m=7}, is larger than the largest simulated root mean squared error, RMSE*_j, for at least 95% of the total number of simulations, where j indicates the j-th simulation. In mathematical terms, this means H0 should be rejected if the critical value k > 0.95:

k = (1/1000) Σ_{j=1}^{1000} I( RMSE_{m=7} > RMSE*_j ) > 0.95.   (26)

The simulation returns k = 0.8780, which corresponds to a p-value of p = 1 − k = 0.1220; therefore there is not enough evidence to infer that model 7 performs significantly worse than the other six models. The values of RMSE* are displayed in the histogram below. In line with the result that k < 0.95, the histogram shows that RMSE_{m=7} lies within two standard deviations of the mean of RMSE*. The results of the simulation are also summarized in the following table.

Simulations   Mean      Standard deviation   Min       Max
1000          12.4642   0.18                 12.0768   13.1933


Figure 2: Histogram of simulated RMSE* from models 1 through 7 for player 1. The dotted vertical lines indicate the interval of two standard deviations from the mean.

The analysis above shows that there is not sufficient evidence to infer that models 1 through 7 perform significantly differently. The next step is to test whether the model with the lowest root mean squared error performs significantly better than the two benchmark models. Out of models 1 through 7, model 5 has the lowest root mean squared error (RMSE_{m=5} = 11.3924). The benchmark models, model 8 and model 9, have root mean squared errors of RMSE_{m=8} = 15.6596 and RMSE_{m=9} = 12.9868, respectively. Using the same procedure as outlined above, the following hypotheses are tested:

H0: the forecasts of models 5, 8 and 9 perform equally well

Ha: the forecasts of model 8 are significantly worse compared to the forecasts of models 5 and 9.


The null hypothesis should be rejected if

k = (1/1000) Σ_{j=1}^{1000} I( RMSE_{m=8} > RMSE*_j ) > 0.95.   (27)

The simulation returns k = 0.990, which corresponds to a p-value of p = 1 − k = 0.010; therefore there is enough evidence to infer that model 8 performs significantly worse than models 5 and 9. A histogram of the simulation shows the resampled distribution of the root mean squared errors of models 5, 8 and 9. The conclusion is in line with the histogram: the root mean squared error of model 8, RMSE_{m=8} = 15.6596, lies in the far-right tail of the distribution, further than two standard deviations from the mean. The results of the simulation are also summarized in the following table.

Figure 3: Histogram of simulated RMSE* from model 5, 8 and 9 for player 1. The dotted vertical lines indicate the interval of two standard deviations from the mean.


Simulations   Mean      Standard deviation   Min       Max
1000          14.3556   0.4340               13.5252   15.8355

Table 5: Summary of RMSE* for player 1

According to the Model Confidence Set algorithm, model 8 is removed from the set of models and the previous procedure is repeated for models 5 and 9. The root mean squared errors of the two remaining models are RMSE_{m=5} = 11.3924 and RMSE_{m=9} = 12.9868 for model 5 and model 9, respectively. Because model 9 has the higher root mean squared error, the following hypotheses are tested:

H0: the forecasts of models 5 and 9 perform equally well

Ha: the forecasts of model 9 are significantly worse compared to the forecasts of model 5.

The null hypothesis should be rejected if

\[
k = \frac{1}{1000} \sum_{j=1}^{1000} \mathbb{1}\left\{ \mathrm{RMSE}_{m=9} > \mathrm{RMSE}^{*}_{j} \right\} > 0.95. \tag{28}
\]

The simulation returns k = 0.9680, which corresponds to a p-value of p = 1 − k = 0.0320. Therefore, H0 should be rejected in favor of the alternative that the forecasts of model 5 are significantly better. The results of this simulation are shown below in the histogram and table.

Simulations   Mean      Standard deviation   Min       Max
1000          12.5314   0.2169               12.2157   13.2964

Table 6: Summary of RMSE* for player 1

In summary, two separate MCS procedures were applied to the Fantasy Football Scores of player 1. The first procedure tested whether there is a significantly better set of models within the total set containing models 1 through 7; it concluded that there is no such difference between the seven models (p = 0.1220). The second procedure tested whether model 5, the model with the lowest root mean squared error out of models 1 through 7, and the benchmark models 8 and 9 performed differently in forecasting the Fantasy Football Scores of player 1. After the first iteration, it was concluded that model 8 performs significantly worse than the other models (p = 0.010) and it was therefore removed from the set. In the second iteration, the hypothesis that models 5 and 9 perform equally well was tested against the alternative that model 9 performs worse than model 5. Another simulation returned that model 9 indeed performs worse than model 5 (p = 0.0320), so the null hypothesis was rejected. The conclusion of the second separate MCS procedure is that model 5 performs better than models 8 and 9 in forecasting the Fantasy Football Scores of player 1.

Figure 4: Histogram of simulated RMSE* from models 5 and 9 for player 1. The dotted vertical lines indicate the interval of two standard deviations from the mean.
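The two elimination passes just described follow one template: repeatedly drop the worst model while the bootstrap p-value stays below the significance level. A compact sketch of that loop, assuming a hypothetical `draw_rmse_star` callable standing in for the stationary-bootstrap step (names and toy numbers are illustrative, not the thesis's code):

```python
import numpy as np

def mcs_eliminate(rmse_obs, draw_rmse_star, alpha=0.05):
    """Iteratively remove the worst model while its p-value is below alpha.

    rmse_obs: dict mapping model id -> observed RMSE.
    draw_rmse_star: callable taking the surviving model ids and returning
    an array of resampled RMSE* values (hypothetical helper).
    Returns the surviving ids: the model confidence set.
    """
    surviving = dict(rmse_obs)
    while len(surviving) > 1:
        worst = max(surviving, key=surviving.get)        # highest RMSE
        star = draw_rmse_star(sorted(surviving))
        p = 1.0 - float(np.mean(surviving[worst] > star))
        if p >= alpha:                                   # cannot reject: stop
            break
        del surviving[worst]                             # reject: eliminate
    return set(surviving)

# Toy run mirroring player 1: the 15.66 model falls first, then 12.99.
rng = np.random.default_rng(1)
obs = {5: 11.39, 8: 15.66, 9: 12.99}
keep = mcs_eliminate(obs, lambda ids: rng.normal(11.8, 0.3, 1000))
print(keep)
```

With these toy inputs the loop leaves a singleton confidence set, matching the two-step elimination reported for player 1.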


iteration   M             min{RMSE_obs}    max{RMSE_obs}    p        eliminate max{RMSE_obs}?
1           {1,2,...,7}   {5}, 11.3924     {7}, 12.6920     0.1220   no
1           {5,8,9}       {5}, 11.3924     {8}, 15.6596     0.010    yes
2           {5,9}         {5}, 11.3924     {9}, 12.9868     0.0320   yes

Table 7: Summary of MCS procedure for player 1

In the same manner as presented above, these procedures were repeated for the other players, resulting in the following tables.

iteration   M             min{RMSE_obs}    max{RMSE_obs}    p        eliminate max{RMSE_obs}?
1           {1,2,...,7}   {1}, 9.0732      {4}, 10.0058     0.2210   no
1           {1,8,9}       {1}, 9.0732      {9}, 14.9727     0.000    yes
2           {1,8}         {1}, 9.0732      {8}, 11.3985     0.0120   yes

Table 8: Summary of MCS procedure for player 2

iteration   M             min{RMSE_obs}    max{RMSE_obs}    p        eliminate max{RMSE_obs}?
1           {1,2,...,7}   {1}, 11.3472     {4}, 13.7906     0.1660   no
1           {1,8,9}       {1}, 11.3472     {9}, 17.2589     0.0001   yes
2           {1,8}         {1}, 11.3472     {8}, 13.5801     0.0390   yes

Table 9: Summary of MCS procedure for player 3

iteration   M             min{RMSE_obs}    max{RMSE_obs}    p        eliminate max{RMSE_obs}?
1           {1,2,...,7}   {3}, 9.9160      {1}, 11.3348     0.1360   no
1           {3,8,9}       {3}, 9.9160      {9}, 12.8451     0.0070   yes
2           {3,8}         {3}, 9.9160      {8}, 10.2491     0.2900   no

Table 10: Summary of MCS procedure for player 4

iteration   M             min{RMSE_obs}    max{RMSE_obs}    p        eliminate max{RMSE_obs}?
1           {1,2,...,7}   {3}, 8.1593      {1}, 9.1627      0.2600   no
1           {3,8,9}       {3}, 8.1593      {9}, 11.6829     0.0000   yes
2           {3,8}         {3}, 8.1593      {8}, 9.2306      0.0640   no

Table 11: Summary of MCS procedure for player 5

5 Conclusion

This thesis explored possible methods of forecasting Fantasy Football player performance using time-varying parameter models. A total of nine models was estimated per player. The forecasts based on these nine models were compared using a stationary bootstrap implementation of the model confidence set procedure.
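The stationary bootstrap of Politis and Romano (1994), used for these comparisons, resamples blocks whose lengths are geometrically distributed, so the pseudo-series remains stationary. A minimal index-generating sketch (the mean block length 1/p_geo is an assumed tuning choice, not a value taken from the thesis):

```python
import numpy as np

def stationary_bootstrap_indices(n, p_geo, rng):
    """Indices for one stationary-bootstrap pseudo-series of length n.

    Each step either continues the current block (prob. 1 - p_geo,
    wrapping circularly around the sample) or jumps to a fresh random
    start (prob. p_geo), giving geometric block lengths with mean 1/p_geo.
    """
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < p_geo:
            idx[t] = rng.integers(n)          # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % n     # continue the current block
    return idx

rng = np.random.default_rng(2)
idx = stationary_bootstrap_indices(50, p_geo=0.2, rng=rng)
resampled = np.arange(50)[idx]                # apply to any series of length 50
```

Applying such index draws to the forecast-error series and recomputing the RMSE each time yields the resampled RMSE* distributions used in the tables above.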

The results indicated that among the first seven models, no single model performs significantly better than the rest of the set. In a second, separate model confidence set procedure, it was found that the time-varying parameter model with the lowest root mean squared error generally performs significantly better than the two benchmark models, the adaptive model and the naive model.

When interpreting these results, it should be noted that a large number of unknown factors affect the outcome of sports games and individual performance. For instance, not accounting for the interactive effects of team schedules limits the models' accuracy. Future research could incorporate factors such as team schedule and interactive effects of player performance between team members.


iteration   M             min{RMSE_obs}    max{RMSE_obs}    p        eliminate max{RMSE_obs}?
1           {1,2,...,7}   {1}, 9.2447      {7}, 11.7789     0.2180   no
1           {1,8,9}       {1}, 9.2447      {9}, 10.7081     0.1610   no

Table 12: Summary of MCS procedure for player 6

References

G. C. Chow. Usefulness of adaptive and rational expectations in economics. Center for Economic Policy Studies, Princeton University, 2011.

B. Efron. Bootstrap methods: another look at the jackknife. In Breakthroughs in statistics, pages 569–593. Springer, 1992.

R. Giacomini and H. White. Tests of conditional predictive ability. Econometrica, 74(6):1545–1578, 2006.

P. R. Hansen, A. Lunde, and J. M. Nason. The model confidence set. Econometrica, 79(2):453–497, 2011.

C. Heij, P. de Boer, P. Franses, T. Kloek, and H. van Dijk. Econometric Methods with Applications in Business and Economics. OUP Oxford, 2004. ISBN 9780199268016. URL https://books.google.nl/books?id=hp4vQZZHfbUC.

A. Inoue, L. Jin, and B. Rossi. Rolling window selection for out-of-sample forecasting with time-varying parameters. Journal of Econometrics, 196(1): 55–67, 2017.

H. R. Künsch. The jackknife and the bootstrap for general stationary observations. The Annals of Statistics, pages 1217–1241, 1989.

R. Lutz. Fantasy football prediction. arXiv preprint arXiv:1505.06918, 2015.

D. Miljković, L. Gajić, A. Kovačević, and Z. Konjović. The use of data mining for basketball matches outcomes prediction. In Intelligent Systems and Informatics (SISY), 2010 8th International Symposium on, pages 309–312. IEEE, 2010.


D. N. Politis and J. P. Romano. The stationary bootstrap. Journal of the American Statistical Association, 89(428):1303–1313, 1994.

M. P. Tucci. Time-varying parameters: a critical introduction. Structural Change and Economic Dynamics, 6(2):237–260, 1995.

E. Zivot and J. Wang. Modeling Financial Time Series with S-PLUS, volume 191. Springer, 2006.


Appendix A: Standard Fantasy Football Scoring

Passing Yards: 1 point per 25 yards passing

Passing Touchdowns: 4 points

Interceptions: -2 points

Rushing Yards: 1 point per 10 yards

Rushing Touchdowns: 6 points

Receptions: 1 point

Receiving Yards: 1 point per 10 yards

Receiving Touchdowns: 6 points

Fumble Recovered for a Touchdown: 6 points

2-Point Conversions: 2 points

Fumbles Lost: -2 points

Table 13: Standard offensive scoring with one point per reception according to NFL.com rules
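Read as a scoring rule, Table 13 is a linear map from a player's stat line to points. A small sketch (the dictionary keys are my own shorthand, not NFL.com field names):

```python
# Points per unit of each offensive statistic, following Table 13
# (standard scoring with one point per reception).
WEIGHTS = {
    "pass_yds": 1 / 25, "pass_td": 4, "interceptions": -2,
    "rush_yds": 1 / 10, "rush_td": 6,
    "receptions": 1, "rec_yds": 1 / 10, "rec_td": 6,
    "fumble_rec_td": 6, "two_pt_conv": 2, "fumbles_lost": -2,
}

def fantasy_points(stat_line):
    """Weighted sum of a stat line; statistics not supplied count as zero."""
    return sum(WEIGHTS[stat] * value for stat, value in stat_line.items())

# 250 passing yards, 2 passing TDs, 1 interception -> 10 + 8 - 2 points.
print(fantasy_points({"pass_yds": 250, "pass_td": 2, "interceptions": 1}))
```

Such a function produces the weekly Fantasy Football Scores that the time-varying parameter models forecast.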


t     SPE_{t,m=1}  SPE_{t,m=2}  SPE_{t,m=3}  SPE_{t,m=4}  SPE_{t,m=5}  SPE_{t,m=6}  SPE_{t,m=7}
66    97.6195      97.6195      64.4225      36.559       128.8154     73.8792      37.2231
67    0.0881       0.6964       2.0897       15.4008      0.4464       0.3233       15.3015
68    10.2077      7.6168       0.1065       2.7432       18.7935      0            2.2197
69    0.5755       0.0962       3.2537       8.4596       4.3436       1.2548       6.856
70    46.2407      39.1453      14.8177      5.5668       63.8219      19.851       5.5638
71    65.9816      50.8595      31.0283      16.9057      71.6135      35.6391      15.3946
72    190.3682     155.3937     127.8496     95.1647      189.3532     138.4398     90.6896
73    0.0424       7.3378       12.6303      22.0753      2.7829       8.9458       18.3045
74    102.744      148.3187     216.6599     245.0918     117.3042     204.0676     237.7352
75    84.4137      148.1531     241.5705     179.7325     128.4331     256.0194     206.4175
76    119.4724     180.7363     179.4243     29.6568      193.5287     245.1641     44.7555
77    9.6789       4.4159       15.3989      156.52       0.3048       0.0601       135.6962
78    4.1931       12.1552      13.1994      5.4795       9.4432       19.5253      6.7449
79    860.5875     770.2474     814.7288     927.5018     754.9361     758.5431     915.1838
80    124.1095     100.1675     10.7418      2.102        196.6641     18.5108      2.8615
81    125.4436     201.5355     318.137      362.4427     115.186      294.8719     359.3847
82    216.47       164.041      152.7206     231.8545     161.8051     129.0605     249.3733
83    32.9283      65.8164      90.4188      57.2073      45.131       92.571       49.7557
84    832.0903     691.6521     684.4408     781.0757     722.4698     689.506      831.7661
85    132.5356     153.396      205.5531     219.7007     126.5844     245.417      253.8543
86    407.7374     239.0751     220.3094     152.6654     265.8833     255.4868     191.4661
87    98.4356      99.8282      89.3193      89.7165      107.2489     72.2868      78.85
88    187.8677     240.7556     271.1601     349.0471     216.0347     278.082      346.7516
89    91.8832      9.5947       4.6262       210.9055     24.0697      18.9346      216.6605
90    4.7562       0.1621       1.513        17.9603      0.0792       1.3143       21.4543
91    11.1396      0.1289       0.6438       14.9828      0.0525       0.0507       16.5587
92    122.0247     213.297      219.3157     329.4276     194.1026     199.9467     344.8285
93    5.567        17.8108      15.1034      185.4972     6.708        3.3788       233.1104
94    7.6913       2.1618       0.2102       14.6934      1.4513       0.1919       28.2941
95    228.8197     128.8775     142.484      161.9613     125.8711     149.6508     145.3455
96    40.7663      55.6877      47.3763      29.7775      66.3264      57.366       30.8184
97    197.8804     72.0138      33.5627      107.6148     73.88        29.2075      110.6467
98    16.8213      30.1768      24.7433      2.4456       40.104       35.1125      5.1327
99    31.4875      112.0491     224.0298     106.0443     98.2233      271.9986     99.7645
100   74.7262      211.7778     312.7255     194.1272     189.593      349.157      188.5085
101   80.935       7.1042       3.3394       30.9277      8.7571       18.2175      25.5877
102   82.1733      41.1744      61.1016      114.0105     29.1685      46.422       115.5128
103   93.9307      130.6063     87.4417      103.2534     150.1578     95.9947      102.4959
104   97.8496      17.1668      1.5881       15.8016      21.9191      0.6375       11.4158
105   169.7636     220.0333     161.4796     154.2682     246.8298     174.0367     163.2851
106   88.1848      6.7957       1.2584       4.3861       13.8593      0.2378       4.7167
107   87.309       61.0386      101.1631     160.314      50.0521      87.7088      148.8495
108   19.8457      26.577       0.3241       0.2345       39.7384      0.0081       0.1324
109   38.7156      110.3704     94.3255      67.1862      103.3494     94.3859      58.7458
110   91.7049      219.6288     203.1986     183.6099     204.0573     203.2111     178.7947
111   677.4367     421.0486     449.9472     418.1487     446.216      424.4347     372.2254
112   232.0598     369.9709     703.5845     878.8452     264.5552     656.4716     844.7678
113   1.3671       8.5753       7.6928       6.8345       6.0091       16.145       24.4203
114   155.7568     109.3784     107.195      81.6146      119.3264     116.0035     92.8066
115   293.8969     316.9619     325.4209     411.0574     323.9655     282.4201     367.2943
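Each RMSE compared in the model confidence set procedures is obtained from a column of squared prediction errors like those above: the square root of their average. As a sketch:

```python
import math

def rmse_from_spe(spe):
    """Root mean squared error from a sequence of squared prediction errors."""
    return math.sqrt(sum(spe) / len(spe))

# Toy check: squared errors 9, 16 and 25 average to 50/3, whose square
# root is about 4.08.
print(round(rmse_from_spe([9, 16, 25]), 2))
```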
